From a5f6f88c3d1a453dd35cbaac2870f5fae866ad2e Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Fri, 24 May 2019 14:22:36 -0600
Subject: docs: Do not seek comments in kernel/rcu/tree_plugin.h

There are no kerneldoc comments in this file, so do not attempt to
include them in the docs build.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/kernel-api.rst | 2 --
 Documentation/driver-api/basics.rst   | 3 ---
 2 files changed, 5 deletions(-)

diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst
index a29c99d13331..a53ec2eb8176 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -358,8 +358,6 @@ Read-Copy Update (RCU)
 
 .. kernel-doc:: kernel/rcu/tree.c
 
-.. kernel-doc:: kernel/rcu/tree_plugin.h
-
 .. kernel-doc:: kernel/rcu/tree_exp.h
 
 .. kernel-doc:: kernel/rcu/update.c
diff --git a/Documentation/driver-api/basics.rst b/Documentation/driver-api/basics.rst
index e970fadf4d1a..1ba88c7b3984 100644
--- a/Documentation/driver-api/basics.rst
+++ b/Documentation/driver-api/basics.rst
@@ -115,9 +115,6 @@ Kernel utility functions
 .. kernel-doc:: kernel/rcu/tree.c
    :export:
 
-.. kernel-doc:: kernel/rcu/tree_plugin.h
-   :export:
-
 .. kernel-doc:: kernel/rcu/update.c
    :export:
 
-- 
cgit v1.2.3-59-g8ed1b


From e8d4f892bb245702ee23abfcd28eb98b5eca6c86 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Fri, 24 May 2019 14:31:50 -0600
Subject: docs: Fix a misdirected kerneldoc directive

The stratix10 service layer documentation tried to include kerneldoc
comments for a nonexistent struct, leading to a "no structured comments
found" message.  Switch it to stratix10_svc_command_config_type, which
appears at that spot in the sequence and was not included.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/firmware/other_interfaces.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/driver-api/firmware/other_interfaces.rst b/Documentation/driver-api/firmware/other_interfaces.rst
index a4ac54b5fd79..b81794e0cfbb 100644
--- a/Documentation/driver-api/firmware/other_interfaces.rst
+++ b/Documentation/driver-api/firmware/other_interfaces.rst
@@ -33,7 +33,7 @@ of the requests on to a secure monitor (EL3).
    :functions: stratix10_svc_client_msg
 
 .. kernel-doc:: include/linux/firmware/intel/stratix10-svc-client.h
-   :functions: stratix10_svc_command_reconfig_payload
+   :functions: stratix10_svc_command_config_type
 
 .. kernel-doc:: include/linux/firmware/intel/stratix10-svc-client.h
    :functions: stratix10_svc_cb_data
-- 
cgit v1.2.3-59-g8ed1b


From 41ce14e39bbe0683a2d49385ee8a8cb0b1d010eb Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Fri, 24 May 2019 14:43:42 -0600
Subject: docs: Do not seek kerneldoc comments in hw-consumer.h

There are no kerneldoc comments here, so looking for them just yields a
warning in the docs build.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/iio/hw-consumer.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/driver-api/iio/hw-consumer.rst b/Documentation/driver-api/iio/hw-consumer.rst
index e0fe0b98230e..819fb9edc005 100644
--- a/Documentation/driver-api/iio/hw-consumer.rst
+++ b/Documentation/driver-api/iio/hw-consumer.rst
@@ -45,7 +45,6 @@ A typical IIO HW consumer setup looks like this::
 
 More details
 ============
-.. kernel-doc:: include/linux/iio/hw-consumer.h
 .. kernel-doc:: drivers/iio/buffer/industrialio-hw-consumer.c
    :export:
 
-- 
cgit v1.2.3-59-g8ed1b


From 3aef4472665695be7cbdd2cc274814f56d36e4ef Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Fri, 24 May 2019 15:01:30 -0600
Subject: docs: No structured comments in target_core_device.c

Documentation/driver-api/target.rst is seeking kerneldoc comments in
drivers/target/target_core_device.c, but no such comments exist.  Take out
the kernel-doc directive and eliminate one warning from the build.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/target.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-api/target.rst b/Documentation/driver-api/target.rst
index 4363611dd86d..620ec6173a93 100644
--- a/Documentation/driver-api/target.rst
+++ b/Documentation/driver-api/target.rst
@@ -10,8 +10,8 @@ TBD
 Target core device interfaces
 =============================
 
-.. kernel-doc:: drivers/target/target_core_device.c
-    :export:
+This section is blank because no kerneldoc comments have been added to
+drivers/target/target_core_device.c.
 
 Target core transport interfaces
 ================================
-- 
cgit v1.2.3-59-g8ed1b


From dea20be5063c97bdac48e81ee2a85975f14885ed Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Fri, 24 May 2019 15:03:39 -0600
Subject: docs: no structured comments in fs/file_table.c

Remove the kernel-doc directive, since there are only warnings to be found
there.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/api-summary.rst | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/Documentation/filesystems/api-summary.rst b/Documentation/filesystems/api-summary.rst
index aa51ffcfa029..bbb0c1c0e5cf 100644
--- a/Documentation/filesystems/api-summary.rst
+++ b/Documentation/filesystems/api-summary.rst
@@ -89,9 +89,6 @@ Other Functions
 .. kernel-doc:: fs/direct-io.c
    :export:
 
-.. kernel-doc:: fs/file_table.c
-   :export:
-
 .. kernel-doc:: fs/libfs.c
    :export:
 
-- 
cgit v1.2.3-59-g8ed1b


From 3f715b147a6c5245ee25d7334f4053c339feef98 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Fri, 24 May 2019 15:05:41 -0600
Subject: docs: No structured comments in include/linux/interconnect.h

Remove the kernel-doc directive for this file, since there's nothing there
and it generates a warning.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/interconnect/interconnect.rst | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/interconnect/interconnect.rst b/Documentation/interconnect/interconnect.rst
index b8107dcc4cd3..c3e004893796 100644
--- a/Documentation/interconnect/interconnect.rst
+++ b/Documentation/interconnect/interconnect.rst
@@ -89,6 +89,5 @@ Interconnect consumers
 
 Interconnect consumers are the clients which use the interconnect APIs to
 get paths between endpoints and set their bandwidth/latency/QoS requirements
-for these interconnect paths.
-
-.. kernel-doc:: include/linux/interconnect.h
+for these interconnect paths.  These interfaces are not currently
+documented.
-- 
cgit v1.2.3-59-g8ed1b


From b0d60bfbb60cef1efd699a65e29a94487f8c7b1f Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Fri, 24 May 2019 14:52:01 -0600
Subject: kernel-doc: always name missing kerneldoc sections

The "no structured comments found" warning is not particularly useful if
there are several invocations, only one of which is looking for something
that does not exist.  So if something specific has been requested, make it
clear that it's the one we weren't able to find.
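
The decision described above can be sketched roughly as follows.  This is
an illustrative Python model, not the Perl in scripts/kernel-doc: the
function name and its parameters are hypothetical, and "specific symbols
were requested" is simplified to a non-empty list of names.

```python
def missing_doc_warnings(file, found_any, output_mode, requested_names):
    """Return the warning lines for one source file (illustrative model only)."""
    # Nothing to report if comments were found or output is suppressed.
    if found_any or output_mode == "none":
        return []
    if requested_names:
        # Specific symbols were requested: name each one that was not found.
        return [f"{file}:1: warning: '{name}' not found"
                for name in requested_names]
    # No specific request: fall back to the generic warning.
    return [f"{file}:1: warning: no structured comments found"]
```

With several invocations against the same file, each failing request now
yields a warning that names the symbol it was looking for.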

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/kernel-doc | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 3350e498b4ce..c0cb41e65b9b 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -285,7 +285,7 @@ use constant {
     OUTPUT_INTERNAL     => 4, # output non-exported symbols
 };
 my $output_selection = OUTPUT_ALL;
-my $show_not_found = 0;
+my $show_not_found = 0;	# No longer used
 
 my @export_file_list;
 
@@ -435,7 +435,7 @@ while ($ARGV[0] =~ m/^--?(.*)/) {
     } elsif ($cmd eq 'enable-lineno') {
 	    $enable_lineno = 1;
     } elsif ($cmd eq 'show-not-found') {
-	$show_not_found = 1;
+	$show_not_found = 1;  # A no-op but don't fail
     } else {
 	# Unknown argument
         usage();
@@ -2163,12 +2163,14 @@ sub process_file($) {
     }
 
     # Make sure we got something interesting.
-    if ($initial_section_counter == $section_counter) {
-	if ($output_mode ne "none") {
-	    print STDERR "${file}:1: warning: no structured comments found\n";
+    if ($initial_section_counter == $section_counter && $
+	output_mode ne "none") {
+	if ($output_selection == OUTPUT_INCLUDE) {
+	    print STDERR "${file}:1: warning: '$_' not found\n"
+		for keys %function_table;
 	}
-	if (($output_selection == OUTPUT_INCLUDE) && ($show_not_found == 1)) {
-	    print STDERR "    Was looking for '$_'.\n" for keys %function_table;
+	else {
+	    print STDERR "${file}:1: warning: no structured comments found\n";
 	}
     }
 }
-- 
cgit v1.2.3-59-g8ed1b


From 42f6ebd827832e62a37350ffad776ea785a2486b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Thu, 23 May 2019 07:43:43 -0300
Subject: docs: cdomain.py: get rid of a warning since version 1.8

Sphinx 1.8 and later warn that app.override_domain() is deprecated.  Add
logic to cdomain.py to call app.add_domain() with override=True on those
versions instead.
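
As a minimal sketch of the version check, assuming a stand-in application
object: RecordingApp and register_c_domain are hypothetical names used
only for illustration, while the two method calls mirror the ones in the
diff below.

```python
class RecordingApp:
    """Hypothetical stand-in for a Sphinx application; it only records calls."""

    def __init__(self):
        self.calls = []

    def override_domain(self, domain):
        self.calls.append(("override_domain", domain))

    def add_domain(self, domain, override=False):
        self.calls.append(("add_domain", domain, override))


def register_c_domain(app, domain, version_info):
    """Pick the registration call based on the Sphinx version tuple."""
    major, minor = version_info[:2]
    if major == 1 and minor < 8:
        # Older 1.x releases: the call that later became deprecated.
        app.override_domain(domain)
    else:
        # 1.8 and later: replace the existing domain explicitly.
        app.add_domain(domain, override=True)


app = RecordingApp()
register_c_domain(app, "CDomain", (1, 8, 0))
print(app.calls)
```

Passing the version tuple in as an argument keeps the check easy to
exercise without a Sphinx installation.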

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/sphinx/cdomain.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/sphinx/cdomain.py b/Documentation/sphinx/cdomain.py
index cf13ff3a656c..cbac8e608dc4 100644
--- a/Documentation/sphinx/cdomain.py
+++ b/Documentation/sphinx/cdomain.py
@@ -48,7 +48,10 @@ major, minor, patch = sphinx.version_info[:3]
 
 def setup(app):
 
-    app.override_domain(CDomain)
+    if (major == 1 and minor < 8):
+        app.override_domain(CDomain)
+    else:
+        app.add_domain(CDomain, override=True)
 
     return dict(
         version = __version__,
-- 
cgit v1.2.3-59-g8ed1b


From fe4ec72cca500b2f97ffa0429b4cd57f67e0821d Mon Sep 17 00:00:00 2001
From: Masanari Iida <standby24x7@gmail.com>
Date: Tue, 21 May 2019 21:30:00 +0900
Subject: docs: tracing: Fix typos in histogram.rst

Fix some spelling typos in histogram.rst.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/trace/histogram.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index fb621a1c2638..8408670d0328 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1010,7 +1010,7 @@ Extended error information
 
   For example, suppose we wanted to take a look at the relative
   weights in terms of skb length for each callpath that leads to a
-  netif_receieve_skb event when downloading a decent-sized file using
+  netif_receive_skb event when downloading a decent-sized file using
   wget.
 
   First we set up an initially paused stacktrace trigger on the
@@ -1843,7 +1843,7 @@ practice, not every handler.action combination is currently supported;
 if a given handler.action combination isn't supported, the hist
 trigger will fail with -EINVAL;
 
-The default 'handler.action' if none is explicity specified is as it
+The default 'handler.action' if none is explicitly specified is as it
 always has been, to simply update the set of values associated with an
 entry.  Some applications, however, may want to perform additional
 actions at that point, such as generate another event, or compare and
@@ -2088,7 +2088,7 @@ The following commonly-used handler.action pairs are available:
     and the saved values corresponding to the max are displayed
     following the rest of the fields.
 
-    If a snaphot was taken, there is also a message indicating that,
+    If a snapshot was taken, there is also a message indicating that,
     along with the value and event that triggered the global maximum:
 
     # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
@@ -2176,7 +2176,7 @@ The following commonly-used handler.action pairs are available:
     hist trigger entry.
 
     Note that in this case the changed value is a global variable
-    associated withe current trace instance.  The key of the specific
+    associated with current trace instance.  The key of the specific
     trace event that caused the value to change and the global value
     itself are displayed, along with a message stating that a snapshot
     has been taken and where to find it.  The user can use the key
@@ -2203,7 +2203,7 @@ The following commonly-used handler.action pairs are available:
     and the saved values corresponding to that value are displayed
     following the rest of the fields.
 
-    If a snaphot was taken, there is also a message indicating that,
+    If a snapshot was taken, there is also a message indicating that,
     along with the value and event that triggered the snapshot::
 
       # cat /sys/kernel/debug/tracing/events/tcp/tcp_probe/hist
-- 
cgit v1.2.3-59-g8ed1b


From 93285c01977729a2e046e065e4b99791b966130c Mon Sep 17 00:00:00 2001
From: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Date: Tue, 21 May 2019 10:32:08 +0800
Subject: doc: kernel-parameters.txt: fix documentation of nmi_watchdog
 parameter

The default behavior of hardlockup depends on the config of
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.

Fix the description of nmi_watchdog to make it clear.

Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..79d043b8850d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2836,8 +2836,9 @@
 			0 - turn hardlockup detector in nmi_watchdog off
 			1 - turn hardlockup detector in nmi_watchdog on
 			When panic is specified, panic when an NMI watchdog
-			timeout occurs (or 'nopanic' to override the opposite
-			default). To disable both hard and soft lockup detectors,
+			timeout occurs (or 'nopanic' to not panic on an NMI
+			watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set)
+			To disable both hard and soft lockup detectors,
 			please see 'nowatchdog'.
 			This is useful when you use a panic=... timeout and
 			need the box quickly up again.
-- 
cgit v1.2.3-59-g8ed1b


From 50c1f43a37d006ac24755397614b00064a8f293a Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Wed, 15 May 2019 10:29:05 +1000
Subject: docs: filesystems: vfs: Remove space before tab

Currently the file has a number of spaces before tabs.  This is a
nuisance when patching the file because they show up whenever we touch
these lines.  Let's just fix them all now in preparation for doing the
RST conversion.

Remove spaces before tabs.

Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/vfs.txt | 78 +++++++++++++++++++--------------------
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 57fc576b1f3e..cab5a36f39c6 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -134,7 +134,7 @@ struct file_system_type {
 	should be shut down
 
   owner: for internal VFS use: you should initialize this to THIS_MODULE in
-  	most cases.
+	most cases.
 
   next: for internal VFS use: you should initialize this to NULL
 
@@ -143,7 +143,7 @@ struct file_system_type {
 The mount() method has the following arguments:
 
   struct file_system_type *fs_type: describes the filesystem, partly initialized
-  	by the specific filesystem code
+	by the specific filesystem code
 
   int flags: mount flags
 
@@ -180,12 +180,12 @@ and provides a fill_super() callback instead. The generic variants are:
   mount_nodev: mount a filesystem that is not backed by a device
 
   mount_single: mount a filesystem which shares the instance between
-  	all mounts
+	all mounts
 
 A fill_super() callback implementation has the following arguments:
 
   struct super_block *sb: the superblock structure. The callback
-  	must initialize this properly.
+	must initialize this properly.
 
   void *data: arbitrary mount options, usually comes as an ASCII
 	string (see "Mount Options" section)
@@ -236,14 +236,14 @@ only called from a process context (i.e. not from an interrupt handler
 or bottom half).
 
   alloc_inode: this method is called by alloc_inode() to allocate memory
- 	for struct inode and initialize it.  If this function is not
- 	defined, a simple 'struct inode' is allocated.  Normally
- 	alloc_inode will be used to allocate a larger structure which
- 	contains a 'struct inode' embedded within it.
+	for struct inode and initialize it.  If this function is not
+	defined, a simple 'struct inode' is allocated.  Normally
+	alloc_inode will be used to allocate a larger structure which
+	contains a 'struct inode' embedded within it.
 
   destroy_inode: this method is called by destroy_inode() to release
-  	resources allocated for struct inode.  It is only required if
-  	->alloc_inode was defined and simply undoes anything done by
+	resources allocated for struct inode.  It is only required if
+	->alloc_inode was defined and simply undoes anything done by
 	->alloc_inode.
 
   dirty_inode: this method is called by the VFS to mark an inode dirty.
@@ -271,15 +271,15 @@ or bottom half).
 	(i.e. unmount). This is called with the superblock lock held
 
   sync_fs: called when VFS is writing out all dirty data associated with
-  	a superblock. The second parameter indicates whether the method
+	a superblock. The second parameter indicates whether the method
 	should wait until the write out has been completed. Optional.
 
   freeze_fs: called when VFS is locking a filesystem and
-  	forcing it into a consistent state.  This method is currently
-  	used by the Logical Volume Manager (LVM).
+	forcing it into a consistent state.  This method is currently
+	used by the Logical Volume Manager (LVM).
 
   unfreeze_fs: called when VFS is unlocking a filesystem and making it writable
-  	again.
+	again.
 
   statfs: called when the VFS needs to get filesystem statistics.
 
@@ -476,30 +476,30 @@ otherwise noted.
 	that.
 
   permission: called by the VFS to check for access rights on a POSIX-like
-  	filesystem.
+	filesystem.
 
 	May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in rcu-walk
-        mode, the filesystem must check the permission without blocking or
+	mode, the filesystem must check the permission without blocking or
 	storing to the inode.
 
 	If a situation is encountered that rcu-walk cannot handle, return
 	-ECHILD and it will be called again in ref-walk mode.
 
   setattr: called by the VFS to set attributes for a file. This method
-  	is called by chmod(2) and related system calls.
+	is called by chmod(2) and related system calls.
 
   getattr: called by the VFS to get attributes of a file. This method
-  	is called by stat(2) and related system calls.
+	is called by stat(2) and related system calls.
 
   listxattr: called by the VFS to list all extended attributes for a
 	given file. This method is called by the listxattr(2) system call.
 
   update_time: called by the VFS to update a specific time or the i_version of
-  	an inode.  If this is not defined the VFS will update the inode itself
-  	and call mark_inode_dirty_sync.
+	an inode.  If this is not defined the VFS will update the inode itself
+	and call mark_inode_dirty_sync.
 
   atomic_open: called on the last component of an open.  Using this optional
-  	method the filesystem can look up, possibly create and open the file in
+	method the filesystem can look up, possibly create and open the file in
 	one atomic operation.  If it wants to leave actual opening to the
 	caller (e.g. if the file turned out to be a symlink, device, or just
 	something filesystem won't do atomic open for), it may signal this by
@@ -687,13 +687,13 @@ struct address_space_operations {
        that all succeeds, ->readpage will be called again.
 
   writepages: called by the VM to write out pages associated with the
-  	address_space object.  If wbc->sync_mode is WBC_SYNC_ALL, then
-  	the writeback_control will specify a range of pages that must be
-  	written out.  If it is WBC_SYNC_NONE, then a nr_to_write is given
+	address_space object.  If wbc->sync_mode is WBC_SYNC_ALL, then
+	the writeback_control will specify a range of pages that must be
+	written out.  If it is WBC_SYNC_NONE, then a nr_to_write is given
 	and that many pages should be written if possible.
 	If no ->writepages is given, then mpage_writepages is used
-  	instead.  This will choose pages from the address space that are
-  	tagged as DIRTY and will pass them to ->writepage.
+	instead.  This will choose pages from the address space that are
+	tagged as DIRTY and will pass them to ->writepage.
 
   set_page_dirty: called by the VM to set a page dirty.
         This is particularly needed if an address space attaches
@@ -704,11 +704,11 @@ struct address_space_operations {
         PAGECACHE_TAG_DIRTY tag in the radix tree.
 
   readpages: called by the VM to read pages associated with the address_space
-  	object. This is essentially just a vector version of
-  	readpage.  Instead of just one page, several pages are
-  	requested.
+	object. This is essentially just a vector version of
+	readpage.  Instead of just one page, several pages are
+	requested.
 	readpages is only used for read-ahead, so read errors are
-  	ignored.  If anything goes wrong, feel free to give up.
+	ignored.  If anything goes wrong, feel free to give up.
 
   write_begin:
 	Called by the generic buffered write code to ask the filesystem to
@@ -745,12 +745,12 @@ struct address_space_operations {
         that were able to be copied into pagecache.
 
   bmap: called by the VFS to map a logical block offset within object to
-  	physical block number. This method is used by the FIBMAP
-  	ioctl and for working with swap-files.  To be able to swap to
-  	a file, the file must have a stable mapping to a block
-  	device.  The swap system does not go through the filesystem
-  	but instead uses bmap to find out where the blocks in the file
-  	are and uses those addresses directly.
+	physical block number. This method is used by the FIBMAP
+	ioctl and for working with swap-files.  To be able to swap to
+	a file, the file must have a stable mapping to a block
+	device.  The swap system does not go through the filesystem
+	but instead uses bmap to find out where the blocks in the file
+	are and uses those addresses directly.
 
   invalidatepage: If a page has PagePrivate set, then invalidatepage
         will be called when part or all of the page is to be removed
@@ -810,7 +810,7 @@ struct address_space_operations {
   putback_page: Called by the VM when isolated page's migration fails.
 
   launder_page: Called before freeing a page - it writes back the dirty page. To
-  	prevent redirtying the page, it is kept locked during the whole
+	prevent redirtying the page, it is kept locked during the whole
 	operation.
 
   is_partially_uptodate: Called by the VM when reading a file through the
@@ -921,7 +921,7 @@ otherwise noted.
   unlocked_ioctl: called by the ioctl(2) system call.
 
   compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
- 	 are used on 64 bit kernels.
+	 are used on 64 bit kernels.
 
   mmap: called by the mmap(2) system call
 
@@ -946,7 +946,7 @@ otherwise noted.
 	(non-blocking) mode is enabled for a file
 
   lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW
-  	commands
+	commands
 
   get_unmapped_area: called by the mmap(2) system call
 
-- 
cgit v1.2.3-59-g8ed1b


From 4ee33ea403ac7c1f2b04534132ebb9c3c5095b56 Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Wed, 15 May 2019 10:29:06 +1000
Subject: docs: filesystems: vfs: Use uniform space after period.

Currently the document sometimes has a single space after a period and
sometimes a double space.  Whichever we use, it should be uniform.

Use a double space after each period throughout.

Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/vfs.txt | 246 +++++++++++++++++++-------------------
 1 file changed, 123 insertions(+), 123 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index cab5a36f39c6..6088b925aa7f 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -14,12 +14,12 @@ Introduction
 
 The Virtual File System (also known as the Virtual Filesystem Switch)
 is the software layer in the kernel that provides the filesystem
-interface to userspace programs. It also provides an abstraction
+interface to userspace programs.  It also provides an abstraction
 within the kernel which allows different filesystem implementations to
 coexist.
 
 VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so
-on are called from a process context. Filesystem locking is described
+on are called from a process context.  Filesystem locking is described
 in the document Documentation/filesystems/Locking.
 
 
@@ -27,37 +27,37 @@ Directory Entry Cache (dcache)
 ------------------------------
 
 The VFS implements the open(2), stat(2), chmod(2), and similar system
-calls. The pathname argument that is passed to them is used by the VFS
+calls.  The pathname argument that is passed to them is used by the VFS
 to search through the directory entry cache (also known as the dentry
-cache or dcache). This provides a very fast look-up mechanism to
-translate a pathname (filename) into a specific dentry. Dentries live
+cache or dcache).  This provides a very fast look-up mechanism to
+translate a pathname (filename) into a specific dentry.  Dentries live
 in RAM and are never saved to disc: they exist only for performance.
 
-The dentry cache is meant to be a view into your entire filespace. As
+The dentry cache is meant to be a view into your entire filespace.  As
 most computers cannot fit all dentries in the RAM at the same time,
-some bits of the cache are missing. In order to resolve your pathname
+some bits of the cache are missing.  In order to resolve your pathname
 into a dentry, the VFS may have to resort to creating dentries along
-the way, and then loading the inode. This is done by looking up the
+the way, and then loading the inode.  This is done by looking up the
 inode.
 
 
 The Inode Object
 ----------------
 
-An individual dentry usually has a pointer to an inode. Inodes are
+An individual dentry usually has a pointer to an inode.  Inodes are
 filesystem objects such as regular files, directories, FIFOs and other
 beasts.  They live either on the disc (for block device filesystems)
-or in the memory (for pseudo filesystems). Inodes that live on the
+or in the memory (for pseudo filesystems).  Inodes that live on the
 disc are copied into the memory when required and changes to the inode
-are written back to disc. A single inode can be pointed to by multiple
+are written back to disc.  A single inode can be pointed to by multiple
 dentries (hard links, for example, do this).
 
 To look up an inode requires that the VFS calls the lookup() method of
-the parent directory inode. This method is installed by the specific
-filesystem implementation that the inode lives in. Once the VFS has
+the parent directory inode.  This method is installed by the specific
+filesystem implementation that the inode lives in.  Once the VFS has
 the required dentry (and hence the inode), we can do all those boring
 things like open(2) the file, or stat(2) it to peek at the inode
-data. The stat(2) operation is fairly simple: once the VFS has the
+data.  The stat(2) operation is fairly simple: once the VFS has the
 dentry, it peeks at the inode data and passes some of it back to
 userspace.
 
@@ -67,17 +67,17 @@ The File Object
 
 Opening a file requires another operation: allocation of a file
 structure (this is the kernel-side implementation of file
-descriptors). The freshly allocated file structure is initialized with
+descriptors).  The freshly allocated file structure is initialized with
 a pointer to the dentry and a set of file operation member functions.
-These are taken from the inode data. The open() file method is then
-called so the specific filesystem implementation can do its work. You
-can see that this is another switch performed by the VFS. The file
+These are taken from the inode data.  The open() file method is then
+called so the specific filesystem implementation can do its work.  You
+can see that this is another switch performed by the VFS.  The file
 structure is placed into the file descriptor table for the process.
 
 Reading, writing and closing files (and other assorted VFS operations)
 is done by using the userspace file descriptor to grab the appropriate
 file structure, and then calling the required file structure method to
-do whatever is required. For as long as the file is open, it keeps the
+do whatever is required.  For as long as the file is open, it keeps the
 dentry in use, which in turn means that the VFS inode is still in use.
 
 
@@ -92,7 +92,7 @@ functions:
    extern int register_filesystem(struct file_system_type *);
    extern int unregister_filesystem(struct file_system_type *);
 
-The passed struct file_system_type describes your filesystem. When a
+The passed struct file_system_type describes your filesystem.  When a
 request is made to mount a filesystem onto a directory in your namespace,
 the VFS will call the appropriate mount() method for the specific
 filesystem.  New vfsmount referring to the tree returned by ->mount()
@@ -106,7 +106,7 @@ file /proc/filesystems.
 struct file_system_type
 -----------------------
 
-This describes the filesystem. As of kernel 2.6.39, the following
+This describes the filesystem.  As of kernel 2.6.39, the following
 members are defined:
 
 struct file_system_type {
@@ -168,12 +168,12 @@ point of view is a reference to dentry at the root of (sub)tree to
 be attached; creation of new superblock is a common side effect.
 
 The most interesting member of the superblock structure that the
-mount() method fills in is the "s_op" field. This is a pointer to
+mount() method fills in is the "s_op" field.  This is a pointer to
 a "struct super_operations" which describes the next level of the
 filesystem implementation.
 
 Usually, a filesystem uses one of the generic mount() implementations
-and provides a fill_super() callback instead. The generic variants are:
+and provides a fill_super() callback instead.  The generic variants are:
 
   mount_bdev: mount a filesystem residing on a block device
 
@@ -184,7 +184,7 @@ and provides a fill_super() callback instead. The generic variants are:
 
 A fill_super() callback implementation has the following arguments:
 
-  struct super_block *sb: the superblock structure. The callback
+  struct super_block *sb: the superblock structure.  The callback
 	must initialize this properly.
 
   void *data: arbitrary mount options, usually comes as an ASCII
@@ -203,7 +203,7 @@ struct super_operations
 -----------------------
 
 This describes how the VFS can manipulate the superblock of your
-filesystem. As of kernel 2.6.22, the following members are defined:
+filesystem.  As of kernel 2.6.22, the following members are defined:
 
 struct super_operations {
         struct inode *(*alloc_inode)(struct super_block *sb);
@@ -231,7 +231,7 @@ struct super_operations {
 };
 
 All methods are called without any locks being held, unless otherwise
-noted. This means that most methods can block safely. All methods are
+noted.  This means that most methods can block safely.  All methods are
 only called from a process context (i.e. not from an interrupt handler
 or bottom half).
 
@@ -268,11 +268,11 @@ or bottom half).
   delete_inode: called when the VFS wants to delete an inode
 
   put_super: called when the VFS wishes to free the superblock
-	(i.e. unmount). This is called with the superblock lock held
+	(i.e. unmount).  This is called with the superblock lock held
 
   sync_fs: called when VFS is writing out all dirty data associated with
-	a superblock. The second parameter indicates whether the method
-	should wait until the write out has been completed. Optional.
+	a superblock.  The second parameter indicates whether the method
+	should wait until the write out has been completed.  Optional.
 
   freeze_fs: called when VFS is locking a filesystem and
 	forcing it into a consistent state.  This method is currently
@@ -283,10 +283,10 @@ or bottom half).
 
   statfs: called when the VFS needs to get filesystem statistics.
 
-  remount_fs: called when the filesystem is remounted. This is called
+  remount_fs: called when the filesystem is remounted.  This is called
 	with the kernel lock held
 
-  clear_inode: called then the VFS clears the inode. Optional
+  clear_inode: called then the VFS clears the inode.  Optional
 
   umount_begin: called when the VFS is unmounting a filesystem.
 
@@ -307,17 +307,17 @@ or bottom half).
 	implement ->nr_cached_objects for it to be called correctly.
 
 	We can't do anything with any errors that the filesystem might
-	encountered, hence the void return type. This will never be called if
+	encountered, hence the void return type.  This will never be called if
 	the VM is trying to reclaim under GFP_NOFS conditions, hence this
 	method does not need to handle that situation itself.
 
 	Implementations must include conditional reschedule calls inside any
-	scanning loop that is done. This allows the VFS to determine
+	scanning loop that is done.  This allows the VFS to determine
 	appropriate scan batch sizes without having to worry about whether
 	implementations will cause holdoff problems due to large scan batch
 	sizes.
 
-Whoever sets up the inode is responsible for filling in the "i_op" field. This
+Whoever sets up the inode is responsible for filling in the "i_op" field.  This
 is a pointer to a "struct inode_operations" which describes the methods that
 can be performed on individual inodes.
 
@@ -361,7 +361,7 @@ struct inode_operations
 -----------------------
 
 This describes how the VFS can manipulate an inode in your
-filesystem. As of kernel 2.6.22, the following members are defined:
+filesystem.  As of kernel 2.6.22, the following members are defined:
 
 struct inode_operations {
 	int (*create) (struct inode *,struct dentry *, umode_t, bool);
@@ -391,19 +391,19 @@ struct inode_operations {
 Again, all methods are called without any locks being held, unless
 otherwise noted.
 
-  create: called by the open(2) and creat(2) system calls. Only
-	required if you want to support regular files. The dentry you
+  create: called by the open(2) and creat(2) system calls.  Only
+	required if you want to support regular files.  The dentry you
 	get should not have an inode (i.e. it should be a negative
-	dentry). Here you will probably call d_instantiate() with the
+	dentry).  Here you will probably call d_instantiate() with the
 	dentry and the newly created inode
 
   lookup: called when the VFS needs to look up an inode in a parent
-	directory. The name to look for is found in the dentry. This
+	directory.  The name to look for is found in the dentry.  This
 	method must call d_add() to insert the found inode into the
-	dentry. The "i_count" field in the inode structure should be
-	incremented. If the named inode does not exist a NULL inode
+	dentry.  The "i_count" field in the inode structure should be
+	incremented.  If the named inode does not exist a NULL inode
 	should be inserted into the dentry (this is called a negative
-	dentry). Returning an error code from this routine must only
+	dentry).  Returning an error code from this routine must only
 	be done on a real error, otherwise creating inodes with system
 	calls like create(2), mknod(2), mkdir(2) and so on will fail.
 	If you wish to overload the dentry methods then you should
@@ -411,27 +411,27 @@ otherwise noted.
 	to a struct "dentry_operations".
 	This method is called with the directory inode semaphore held
 
-  link: called by the link(2) system call. Only required if you want
-	to support hard links. You will probably need to call
+  link: called by the link(2) system call.  Only required if you want
+	to support hard links.  You will probably need to call
 	d_instantiate() just as you would in the create() method
 
-  unlink: called by the unlink(2) system call. Only required if you
+  unlink: called by the unlink(2) system call.  Only required if you
 	want to support deleting inodes
 
-  symlink: called by the symlink(2) system call. Only required if you
-	want to support symlinks. You will probably need to call
+  symlink: called by the symlink(2) system call.  Only required if you
+	want to support symlinks.  You will probably need to call
 	d_instantiate() just as you would in the create() method
 
-  mkdir: called by the mkdir(2) system call. Only required if you want
-	to support creating subdirectories. You will probably need to
+  mkdir: called by the mkdir(2) system call.  Only required if you want
+	to support creating subdirectories.  You will probably need to
 	call d_instantiate() just as you would in the create() method
 
-  rmdir: called by the rmdir(2) system call. Only required if you want
+  rmdir: called by the rmdir(2) system call.  Only required if you want
 	to support deleting subdirectories
 
   mknod: called by the mknod(2) system call to create a device (char,
-	block) inode or a named pipe (FIFO) or socket. Only required
-	if you want to support creating these types of inodes. You
+	block) inode or a named pipe (FIFO) or socket.  Only required
+	if you want to support creating these types of inodes.  You
 	will probably need to call d_instantiate() just as you would
 	in the create() method
 
@@ -478,21 +478,21 @@ otherwise noted.
   permission: called by the VFS to check for access rights on a POSIX-like
 	filesystem.
 
-	May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in rcu-walk
-	mode, the filesystem must check the permission without blocking or
+	May be called in rcu-walk mode (mask & MAY_NOT_BLOCK).  If in rcu-walk
+        mode, the filesystem must check the permission without blocking or
 	storing to the inode.
 
 	If a situation is encountered that rcu-walk cannot handle, return
 	-ECHILD and it will be called again in ref-walk mode.
 
-  setattr: called by the VFS to set attributes for a file. This method
+  setattr: called by the VFS to set attributes for a file.  This method
 	is called by chmod(2) and related system calls.
 
-  getattr: called by the VFS to get attributes of a file. This method
+  getattr: called by the VFS to get attributes of a file.  This method
 	is called by stat(2) and related system calls.
 
   listxattr: called by the VFS to list all extended attributes for a
-	given file. This method is called by the listxattr(2) system call.
+	given file.  This method is called by the listxattr(2) system call.
 
   update_time: called by the VFS to update a specific time or the i_version of
 	an inode.  If this is not defined the VFS will update the inode itself
@@ -530,7 +530,7 @@ The first can be used independently to the others.  The VM can try to
 either write dirty pages in order to clean them, or release clean
 pages in order to reuse them.  To do this it can call the ->writepage
 method on dirty pages, and ->releasepage on clean pages with
-PagePrivate set. Clean pages without PagePrivate and with no external
+PagePrivate set.  Clean pages without PagePrivate and with no external
 references will be released without notice being given to the
 address_space.
 
@@ -538,7 +538,7 @@ To achieve this functionality, pages need to be placed on an LRU with
 lru_cache_add and mark_page_active needs to be called whenever the
 page is used.
 
-Pages are normally kept in a radix tree index by ->index. This tree
+Pages are normally kept in a radix tree index by ->index.  This tree
 maintains information about the PG_Dirty and PG_Writeback status of
 each page, so that pages with either of these flags can be found
 quickly.
@@ -624,7 +624,7 @@ struct address_space_operations
 -------------------------------
 
 This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. The following members are defined:
+your filesystem.  The following members are defined:
 
 struct address_space_operations {
 	int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -704,7 +704,7 @@ struct address_space_operations {
         PAGECACHE_TAG_DIRTY tag in the radix tree.
 
   readpages: called by the VM to read pages associated with the address_space
-	object. This is essentially just a vector version of
+	object.  This is essentially just a vector version of
 	readpage.  Instead of just one page, several pages are
 	requested.
 	readpages is only used for read-ahead, so read errors are
@@ -712,7 +712,7 @@ struct address_space_operations {
 
   write_begin:
 	Called by the generic buffered write code to ask the filesystem to
-	prepare to write len bytes at the given offset in the file. The
+	prepare to write len bytes at the given offset in the file.  The
 	address_space should check that the write will be able to complete,
 	by allocating space if necessary and doing any other internal
 	housekeeping.  If the write will update parts of any basic-blocks on
@@ -735,7 +735,7 @@ struct address_space_operations {
 	which case write_end is not called.
 
   write_end: After a successful write_begin, and data copy, write_end must
-        be called. len is the original len passed to write_begin, and copied
+        be called.  len is the original len passed to write_begin, and copied
         is the amount that was able to be copied.
 
         The filesystem must take care of unlocking the page and releasing it
@@ -745,7 +745,7 @@ struct address_space_operations {
         that were able to be copied into pagecache.
 
   bmap: called by the VFS to map a logical block offset within object to
-	physical block number. This method is used by the FIBMAP
+	physical block number.  This method is used by the FIBMAP
 	ioctl and for working with swap-files.  To be able to swap to
 	a file, the file must have a stable mapping to a block
 	device.  The swap system does not go through the filesystem
@@ -757,7 +757,7 @@ struct address_space_operations {
 	from the address space.  This generally corresponds to either a
 	truncation, punch hole  or a complete invalidation of the address
 	space (in the latter case 'offset' will always be 0 and 'length'
-	will be PAGE_SIZE). Any private data associated with the page
+	will be PAGE_SIZE).  Any private data associated with the page
 	should be updated to reflect this truncation.  If offset is 0 and
 	length is PAGE_SIZE, then the private data should be released,
 	because the page must be able to be completely discarded.  This may
@@ -767,7 +767,7 @@ struct address_space_operations {
   releasepage: releasepage is called on PagePrivate pages to indicate
         that the page should be freed if possible.  ->releasepage
         should remove any private data from the page and clear the
-        PagePrivate flag. If releasepage() fails for some reason, it must
+        PagePrivate flag.  If releasepage() fails for some reason, it must
 	indicate failure with a 0 return value.
 	releasepage() is used in two distinct though related cases.  The
 	first is when the VM finds a clean page with no active users and
@@ -787,7 +787,7 @@ struct address_space_operations {
 
   freepage: freepage is called once the page is no longer visible in
         the page cache in order to allow the cleanup of any private
-	data. Since it may be called by the memory reclaimer, it
+	data.  Since it may be called by the memory reclaimer, it
 	should not assume that the original address_space mapping still
 	exists, and it should not block.
 
@@ -809,32 +809,32 @@ struct address_space_operations {
 
   putback_page: Called by the VM when isolated page's migration fails.
 
-  launder_page: Called before freeing a page - it writes back the dirty page. To
+  launder_page: Called before freeing a page - it writes back the dirty page.  To
 	prevent redirtying the page, it is kept locked during the whole
 	operation.
 
   is_partially_uptodate: Called by the VM when reading a file through the
-	pagecache when the underlying blocksize != pagesize. If the required
+	pagecache when the underlying blocksize != pagesize.  If the required
 	block is up to date then the read can complete without needing the IO
 	to bring the whole page up to date.
 
   is_dirty_writeback: Called by the VM when attempting to reclaim a page.
 	The VM uses dirty and writeback information to determine if it needs
-	to stall to allow flushers a chance to complete some IO. Ordinarily
+	to stall to allow flushers a chance to complete some IO.  Ordinarily
 	it can use PageDirty and PageWriteback but some filesystems have
 	more complex state (unstable pages in NFS prevent reclaim) or
-	do not set those flags due to locking problems. This callback
+	do not set those flags due to locking problems.  This callback
 	allows a filesystem to indicate to the VM if a page should be
 	treated as dirty or writeback for the purposes of stalling.
 
   error_remove_page: normally set to generic_error_remove_page if truncation
-	is ok for this address space. Used for memory failure handling.
+	is ok for this address space.  Used for memory failure handling.
 	Setting this implies you deal with pages going away under you,
 	unless you have them locked or reference counts increased.
 
   swap_activate: Called when swapon is used on a file to allocate
 	space if necessary and pin the block lookup information in
-	memory. A return value of zero indicates success,
+	memory.  A return value of zero indicates success,
 	in which case this file can be used to back swapspace.
 
   swap_deactivate: Called during swapoff on files where swap_activate
@@ -844,14 +844,14 @@ struct address_space_operations {
 The File Object
 ===============
 
-A file object represents a file opened by a process. This is also known
+A file object represents a file opened by a process.  This is also known
 as an "open file description" in POSIX parlance.
 
 
 struct file_operations
 ----------------------
 
-This describes how the VFS can manipulate an open file. As of kernel
+This describes how the VFS can manipulate an open file.  As of kernel
 4.18, the following members are defined:
 
 struct file_operations {
@@ -916,7 +916,7 @@ otherwise noted.
 
   poll: called by the VFS when a process wants to check if there is
 	activity on this file and (optionally) go to sleep until there
-	is activity. Called by the select(2) and poll(2) system calls
+	is activity.  Called by the select(2) and poll(2) system calls
 
   unlocked_ioctl: called by the ioctl(2) system call.
 
@@ -925,13 +925,13 @@ otherwise noted.
 
   mmap: called by the mmap(2) system call
 
-  open: called by the VFS when an inode should be opened. When the VFS
-	opens a file, it creates a new "struct file". It then calls the
-	open method for the newly allocated file structure. You might
+  open: called by the VFS when an inode should be opened.  When the VFS
+	opens a file, it creates a new "struct file".  It then calls the
+	open method for the newly allocated file structure.  You might
 	think that the open method really belongs in
-	"struct inode_operations", and you may be right. I think it's
+	"struct inode_operations", and you may be right.  I think it's
 	done the way it is because it makes filesystems simpler to
-	implement. The open() method is a good place to initialize the
+	implement.  The open() method is a good place to initialize the
 	"private_data" member in the file structure if you want to point
 	to a device structure
 
@@ -939,7 +939,7 @@ otherwise noted.
 
   release: called when the last reference to an open file is closed
 
-  fsync: called by the fsync(2) system call. Also see the section above
+  fsync: called by the fsync(2) system call.  Also see the section above
 	 entitled "Handling errors during writeback".
 
   fasync: called by the fcntl(2) system call when asynchronous
@@ -954,13 +954,13 @@ otherwise noted.
 
   flock: called by the flock(2) system call
 
-  splice_write: called by the VFS to splice data from a pipe to a file. This
+  splice_write: called by the VFS to splice data from a pipe to a file.  This
 		method is used by the splice(2) system call
 
-  splice_read: called by the VFS to splice data from file to a pipe. This
+  splice_read: called by the VFS to splice data from file to a pipe.  This
 	       method is used by the splice(2) system call
 
-  setlease: called by the VFS to set or release a file lock lease. setlease
+  setlease: called by the VFS to set or release a file lock lease.  setlease
 	    implementations should call generic_setlease to record or remove
 	    the lease in the inode after setting it.
 
@@ -984,12 +984,12 @@ otherwise noted.
   fadvise: possibly called by the fadvise64() system call.
 
 Note that the file operations are implemented by the specific
-filesystem in which the inode resides. When opening a device node
+filesystem in which the inode resides.  When opening a device node
 (character or block special) most filesystems will call special
 support routines in the VFS which will locate the required device
-driver information. These support routines replace the filesystem file
+driver information.  These support routines replace the filesystem file
 operations with those for the device driver, and then proceed to call
-the new open() method for the file. This is how opening a device file
+the new open() method for the file.  This is how opening a device file
 in the filesystem eventually ends up calling the device driver open()
 method.
 
@@ -1002,10 +1002,10 @@ struct dentry_operations
 ------------------------
 
 This describes how a filesystem can overload the standard dentry
-operations. Dentries and the dcache are the domain of the VFS and the
-individual filesystem implementations. Device drivers have no business
-here. These methods may be set to NULL, as they are either optional or
-the VFS uses a default. As of kernel 2.6.22, the following members are
+operations.  Dentries and the dcache are the domain of the VFS and the
+individual filesystem implementations.  Device drivers have no business
+here.  These methods may be set to NULL, as they are either optional or
+the VFS uses a default.  As of kernel 2.6.22, the following members are
 defined:
 
 struct dentry_operations {
@@ -1024,10 +1024,10 @@ struct dentry_operations {
 	struct dentry *(*d_real)(struct dentry *, const struct inode *);
 };
 
-  d_revalidate: called when the VFS needs to revalidate a dentry. This
+  d_revalidate: called when the VFS needs to revalidate a dentry.  This
 	is called whenever a name look-up finds a dentry in the
-	dcache. Most local filesystems leave this as NULL, because all their
-	dentries in the dcache are valid. Network filesystems are different
+	dcache.  Most local filesystems leave this as NULL, because all their
+	dentries in the dcache are valid.  Network filesystems are different
 	since things can change on the server without the client necessarily
 	being aware of it.
 
@@ -1045,11 +1045,11 @@ struct dentry_operations {
 
  d_weak_revalidate: called when the VFS needs to revalidate a "jumped" dentry.
 	This is called when a path-walk ends at dentry that was not acquired by
-	doing a lookup in the parent directory. This includes "/", "." and "..",
+	doing a lookup in the parent directory.  This includes "/", "." and "..",
 	as well as procfs-style symlinks and mountpoint traversal.
 
 	In this case, we are less concerned with whether the dentry is still
-	fully correct, but rather that the inode is still valid. As with
+	fully correct, but rather that the inode is still valid.  As with
 	d_revalidate, most local filesystems will set this to NULL since their
 	dcache entries are always valid.
 
@@ -1057,17 +1057,17 @@ struct dentry_operations {
 
 	d_weak_revalidate is only called after leaving rcu-walk mode.
 
-  d_hash: called when the VFS adds a dentry to the hash table. The first
+  d_hash: called when the VFS adds a dentry to the hash table.  The first
 	dentry passed to d_hash is the parent directory that the name is
 	to be hashed into.
 
 	Same locking and synchronisation rules as d_compare regarding
 	what is safe to dereference etc.
 
-  d_compare: called to compare a dentry name with a given name. The first
+  d_compare: called to compare a dentry name with a given name.  The first
 	dentry is the parent of the dentry to be compared, the second is
-	the child dentry. len and name string are properties of the dentry
-	to be compared. qstr is the name to compare it with.
+	the child dentry.  len and name string are properties of the dentry
+	to be compared.  qstr is the name to compare it with.
 
 	Must be constant and idempotent, and should not take locks if
 	possible, and should not or store into the dentry.
@@ -1082,9 +1082,9 @@ struct dentry_operations {
 	"rcu-walk", ie. without any locks or references on things.
 
   d_delete: called when the last reference to a dentry is dropped and the
-	dcache is deciding whether or not to cache it. Return 1 to delete
-	immediately, or 0 to cache the dentry. Default is NULL which means to
-	always cache a reachable dentry. d_delete must be constant and
+	dcache is deciding whether or not to cache it.  Return 1 to delete
+	immediately, or 0 to cache the dentry.  Default is NULL which means to
+	always cache a reachable dentry.  d_delete must be constant and
 	idempotent.
 
   d_init: called when a dentry is allocated
@@ -1092,19 +1092,19 @@ struct dentry_operations {
   d_release: called when a dentry is really deallocated
 
   d_iput: called when a dentry loses its inode (just prior to its
-	being deallocated). The default when this is NULL is that the
-	VFS calls iput(). If you define this method, you must call
+	being deallocated).  The default when this is NULL is that the
+	VFS calls iput().  If you define this method, you must call
 	iput() yourself
 
   d_dname: called when the pathname of a dentry should be generated.
 	Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay
-	pathname generation. (Instead of doing it when dentry is created,
-	it's done only when the path is needed.). Real filesystems probably
+	pathname generation.  (Instead of doing it when dentry is created,
+	it's done only when the path is needed.).  Real filesystems probably
 	dont want to use it, because their dentries are present in global
-	dcache hash, so their hash should be an invariant. As no lock is
+	dcache hash, so their hash should be an invariant.  As no lock is
 	held, d_dname() should not try to modify the dentry itself, unless
-	appropriate SMP safety is used. CAUTION : d_path() logic is quite
-	tricky. The correct way to return for example "Hello" is to put it
+	appropriate SMP safety is used.  CAUTION : d_path() logic is quite
+	tricky.  The correct way to return for example "Hello" is to put it
 	at the end of the buffer, and returns a pointer to the first char.
 	dynamic_dname() helper function is provided to take care of this.
 
@@ -1166,7 +1166,7 @@ struct dentry_operations {
 	With NULL inode the topmost real underlying dentry is returned.
 
 Each dentry has a pointer to its parent dentry, as well as a hash list
-of child dentries. Child dentries are basically like files in a
+of child dentries.  Child dentries are basically like files in a
 directory.
 
 
@@ -1179,36 +1179,36 @@ manipulate dentries:
   dget: open a new handle for an existing dentry (this just increments
 	the usage count)
 
-  dput: close a handle for a dentry (decrements the usage count). If
+  dput: close a handle for a dentry (decrements the usage count).  If
 	the usage count drops to 0, and the dentry is still in its
 	parent's hash, the "d_delete" method is called to check whether
-	it should be cached. If it should not be cached, or if the dentry
-	is not hashed, it is deleted. Otherwise cached dentries are put
+	it should be cached.  If it should not be cached, or if the dentry
+	is not hashed, it is deleted.  Otherwise cached dentries are put
 	into an LRU list to be reclaimed on memory shortage.
 
-  d_drop: this unhashes a dentry from its parents hash list. A
+  d_drop: this unhashes a dentry from its parents hash list.  A
 	subsequent call to dput() will deallocate the dentry if its
 	usage count drops to 0
 
-  d_delete: delete a dentry. If there are no other open references to
+  d_delete: delete a dentry.  If there are no other open references to
 	the dentry then the dentry is turned into a negative dentry
-	(the d_iput() method is called). If there are other
+	(the d_iput() method is called).  If there are other
 	references, then d_drop() is called instead
 
   d_add: add a dentry to its parents hash list and then calls
 	d_instantiate()
 
   d_instantiate: add a dentry to the alias hash list for the inode and
-	updates the "d_inode" member. The "i_count" member in the
-	inode structure should be set/incremented. If the inode
+	updates the "d_inode" member.  The "i_count" member in the
+	inode structure should be set/incremented.  If the inode
 	pointer is NULL, the dentry is called a "negative
-	dentry". This function is commonly called when an inode is
+	dentry".  This function is commonly called when an inode is
 	created for an existing negative dentry
 
   d_lookup: look up a dentry given its parent and path name component
 	It looks up the child of that given name from the dcache
-	hash table. If it is found, the reference count is incremented
-	and the dentry is returned. The caller must use dput()
+	hash table.  If it is found, the reference count is incremented
+	and the dentry is returned.  The caller must use dput()
 	to free the dentry when it finishes using it.
 
 Mount Options
-- 
cgit v1.2.3-59-g8ed1b


From 90caa781f6402a08b4e602fab7017baa3cee3a28 Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Wed, 15 May 2019 10:29:07 +1000
Subject: docs: filesystems: vfs: Use 72 character column width

In preparation for conversion to RST format use the kernel's favoured
documentation column width.  If we are going to do this we might as well
do it thoroughly.  Just do the paragraphs (not the indented stuff); the
rest will be done during the indentation fix-up patch.

This patch is whitespace only, no textual changes.

Use 72 character column width for all paragraph sections.

Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/vfs.txt | 198 +++++++++++++++++++-------------------
 1 file changed, 97 insertions(+), 101 deletions(-)
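
Reviewer note: the "whitespace only, no textual changes" claim can be
spot-checked mechanically.  The sketch below is a heuristic and not part
of the kernel tooling; the function names changed_words() and
is_whitespace_only() are made up for this note.  It collects the words
on removed and added lines inside each hunk, using the line counts from
the hunk header to find where the hunk ends, and reports whether the two
word sequences are identical.  It assumes an ordinary unified diff and
does not detect words that moved across unchanged context lines.

```python
import re

# Hunk header, e.g. "@@ -12,15 +12,14 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def changed_words(patch_text):
    """Return (removed, added) word lists taken from the hunks of a diff."""
    removed, added = [], []
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: skip headers, commit text and the "-- "
            # mail signature until the next hunk header appears.
            match = HUNK_RE.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
            continue
        if line.startswith("-"):
            removed.extend(line[1:].split())
            old_left -= 1
        elif line.startswith("+"):
            added.extend(line[1:].split())
            new_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: counts against both sides of the hunk.
            old_left -= 1
            new_left -= 1
    return removed, added


def is_whitespace_only(patch_text):
    """Heuristic: True if removed and added words match in order."""
    removed, added = changed_words(patch_text)
    return removed == added
```

Feeding the text of this patch to is_whitespace_only() is expected to
return True if the reflow really left every word in place; a False
result points at a hunk worth reading by hand.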

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 6088b925aa7f..1cd0e658137a 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -12,15 +12,14 @@
 Introduction
 ============
 
-The Virtual File System (also known as the Virtual Filesystem Switch)
-is the software layer in the kernel that provides the filesystem
-interface to userspace programs.  It also provides an abstraction
-within the kernel which allows different filesystem implementations to
-coexist.
+The Virtual File System (also known as the Virtual Filesystem Switch) is
+the software layer in the kernel that provides the filesystem interface
+to userspace programs.  It also provides an abstraction within the
+kernel which allows different filesystem implementations to coexist.
 
-VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so
-on are called from a process context.  Filesystem locking is described
-in the document Documentation/filesystems/Locking.
+VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
+are called from a process context.  Filesystem locking is described in
+the document Documentation/filesystems/Locking.
 
 
 Directory Entry Cache (dcache)
@@ -34,11 +33,10 @@ translate a pathname (filename) into a specific dentry.  Dentries live
 in RAM and are never saved to disc: they exist only for performance.
 
 The dentry cache is meant to be a view into your entire filespace.  As
-most computers cannot fit all dentries in the RAM at the same time,
-some bits of the cache are missing.  In order to resolve your pathname
-into a dentry, the VFS may have to resort to creating dentries along
-the way, and then loading the inode.  This is done by looking up the
-inode.
+most computers cannot fit all dentries in the RAM at the same time, some
+bits of the cache are missing.  In order to resolve your pathname into a
+dentry, the VFS may have to resort to creating dentries along the way,
+and then loading the inode.  This is done by looking up the inode.
 
 
 The Inode Object
@@ -46,33 +44,32 @@ The Inode Object
 
 An individual dentry usually has a pointer to an inode.  Inodes are
 filesystem objects such as regular files, directories, FIFOs and other
-beasts.  They live either on the disc (for block device filesystems)
-or in the memory (for pseudo filesystems).  Inodes that live on the
-disc are copied into the memory when required and changes to the inode
-are written back to disc.  A single inode can be pointed to by multiple
+beasts.  They live either on the disc (for block device filesystems) or
+in the memory (for pseudo filesystems).  Inodes that live on the disc
+are copied into the memory when required and changes to the inode are
+written back to disc.  A single inode can be pointed to by multiple
 dentries (hard links, for example, do this).
 
 To look up an inode requires that the VFS calls the lookup() method of
 the parent directory inode.  This method is installed by the specific
-filesystem implementation that the inode lives in.  Once the VFS has
-the required dentry (and hence the inode), we can do all those boring
-things like open(2) the file, or stat(2) it to peek at the inode
-data.  The stat(2) operation is fairly simple: once the VFS has the
-dentry, it peeks at the inode data and passes some of it back to
-userspace.
+filesystem implementation that the inode lives in.  Once the VFS has the
+required dentry (and hence the inode), we can do all those boring things
+like open(2) the file, or stat(2) it to peek at the inode data.  The
+stat(2) operation is fairly simple: once the VFS has the dentry, it
+peeks at the inode data and passes some of it back to userspace.
 
 
 The File Object
 ---------------
 
 Opening a file requires another operation: allocation of a file
-structure (this is the kernel-side implementation of file
-descriptors).  The freshly allocated file structure is initialized with
-a pointer to the dentry and a set of file operation member functions.
-These are taken from the inode data.  The open() file method is then
-called so the specific filesystem implementation can do its work.  You
-can see that this is another switch performed by the VFS.  The file
-structure is placed into the file descriptor table for the process.
+structure (this is the kernel-side implementation of file descriptors).
+The freshly allocated file structure is initialized with a pointer to
+the dentry and a set of file operation member functions.  These are
+taken from the inode data.  The open() file method is then called so the
+specific filesystem implementation can do its work.  You can see that
+this is another switch performed by the VFS.  The file structure is
+placed into the file descriptor table for the process.
 
 Reading, writing and closing files (and other assorted VFS operations)
 is done by using the userspace file descriptor to grab the appropriate
@@ -93,11 +90,12 @@ functions:
    extern int unregister_filesystem(struct file_system_type *);
 
 The passed struct file_system_type describes your filesystem.  When a
-request is made to mount a filesystem onto a directory in your namespace,
-the VFS will call the appropriate mount() method for the specific
-filesystem.  New vfsmount referring to the tree returned by ->mount()
-will be attached to the mountpoint, so that when pathname resolution
-reaches the mountpoint it will jump into the root of that vfsmount.
+request is made to mount a filesystem onto a directory in your
+namespace, the VFS will call the appropriate mount() method for the
+specific filesystem.  New vfsmount referring to the tree returned by
+->mount() will be attached to the mountpoint, so that when pathname
+resolution reaches the mountpoint it will jump into the root of that
+vfsmount.
 
 You can see all filesystems that are registered to the kernel in the
 file /proc/filesystems.
@@ -156,21 +154,21 @@ The mount() method must return the root dentry of the tree requested by
 caller.  An active reference to its superblock must be grabbed and the
 superblock must be locked.  On failure it should return ERR_PTR(error).
 
-The arguments match those of mount(2) and their interpretation
-depends on filesystem type.  E.g. for block filesystems, dev_name is
-interpreted as block device name, that device is opened and if it
-contains a suitable filesystem image the method creates and initializes
-struct super_block accordingly, returning its root dentry to caller.
+The arguments match those of mount(2) and their interpretation depends
+on filesystem type.  E.g. for block filesystems, dev_name is interpreted
+as block device name, that device is opened and if it contains a
+suitable filesystem image the method creates and initializes struct
+super_block accordingly, returning its root dentry to caller.
 
 ->mount() may choose to return a subtree of existing filesystem - it
 doesn't have to create a new one.  The main result from the caller's
-point of view is a reference to dentry at the root of (sub)tree to
-be attached; creation of new superblock is a common side effect.
+point of view is a reference to dentry at the root of (sub)tree to be
+attached; creation of new superblock is a common side effect.
 
-The most interesting member of the superblock structure that the
-mount() method fills in is the "s_op" field.  This is a pointer to
-a "struct super_operations" which describes the next level of the
-filesystem implementation.
+The most interesting member of the superblock structure that the mount()
+method fills in is the "s_op" field.  This is a pointer to a "struct
+super_operations" which describes the next level of the filesystem
+implementation.
 
 Usually, a filesystem uses one of the generic mount() implementations
 and provides a fill_super() callback instead.  The generic variants are:
@@ -317,16 +315,16 @@ or bottom half).
 	implementations will cause holdoff problems due to large scan batch
 	sizes.
 
-Whoever sets up the inode is responsible for filling in the "i_op" field.  This
-is a pointer to a "struct inode_operations" which describes the methods that
-can be performed on individual inodes.
+Whoever sets up the inode is responsible for filling in the "i_op"
+field.  This is a pointer to a "struct inode_operations" which describes
+the methods that can be performed on individual inodes.
 
 struct xattr_handlers
 ---------------------
 
 On filesystems that support extended attributes (xattrs), the s_xattr
-superblock field points to a NULL-terminated array of xattr handlers.  Extended
-attributes are name:value pairs.
+superblock field points to a NULL-terminated array of xattr handlers.
+Extended attributes are name:value pairs.
 
   name: Indicates that the handler matches attributes with the specified name
 	(such as "system.posix_acl_access"); the prefix field must be NULL.
@@ -346,9 +344,9 @@ attributes are name:value pairs.
 	attribute.  This method is called by the the setxattr(2) and
 	removexattr(2) system calls.
 
-When none of the xattr handlers of a filesystem match the specified attribute
-name or when a filesystem doesn't support extended attributes, the various
-*xattr(2) system calls return -EOPNOTSUPP.
+When none of the xattr handlers of a filesystem match the specified
+attribute name or when a filesystem doesn't support extended attributes,
+the various *xattr(2) system calls return -EOPNOTSUPP.
 
 
 The Inode Object
@@ -360,8 +358,8 @@ An inode object represents an object within the filesystem.
 struct inode_operations
 -----------------------
 
-This describes how the VFS can manipulate an inode in your
-filesystem.  As of kernel 2.6.22, the following members are defined:
+This describes how the VFS can manipulate an inode in your filesystem.
+As of kernel 2.6.22, the following members are defined:
 
 struct inode_operations {
 	int (*create) (struct inode *,struct dentry *, umode_t, bool);
@@ -517,42 +515,40 @@ The Address Space Object
 ========================
 
 The address space object is used to group and manage pages in the page
-cache.  It can be used to keep track of the pages in a file (or
-anything else) and also track the mapping of sections of the file into
-process address spaces.
+cache.  It can be used to keep track of the pages in a file (or anything
+else) and also track the mapping of sections of the file into process
+address spaces.
 
 There are a number of distinct yet related services that an
-address-space can provide.  These include communicating memory
-pressure, page lookup by address, and keeping track of pages tagged as
-Dirty or Writeback.
+address-space can provide.  These include communicating memory pressure,
+page lookup by address, and keeping track of pages tagged as Dirty or
+Writeback.
 
 The first can be used independently to the others.  The VM can try to
-either write dirty pages in order to clean them, or release clean
-pages in order to reuse them.  To do this it can call the ->writepage
-method on dirty pages, and ->releasepage on clean pages with
-PagePrivate set.  Clean pages without PagePrivate and with no external
-references will be released without notice being given to the
-address_space.
+either write dirty pages in order to clean them, or release clean pages
+in order to reuse them.  To do this it can call the ->writepage method
+on dirty pages, and ->releasepage on clean pages with PagePrivate set.
+Clean pages without PagePrivate and with no external references will be
+released without notice being given to the address_space.
 
 To achieve this functionality, pages need to be placed on an LRU with
-lru_cache_add and mark_page_active needs to be called whenever the
-page is used.
+lru_cache_add and mark_page_active needs to be called whenever the page
+is used.
 
 Pages are normally kept in a radix tree index by ->index.  This tree
-maintains information about the PG_Dirty and PG_Writeback status of
-each page, so that pages with either of these flags can be found
-quickly.
+maintains information about the PG_Dirty and PG_Writeback status of each
+page, so that pages with either of these flags can be found quickly.
 
 The Dirty tag is primarily used by mpage_writepages - the default
 ->writepages method.  It uses the tag to find dirty pages to call
 ->writepage on.  If mpage_writepages is not used (i.e. the address
-provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is
-almost unused.  write_inode_now and sync_inode do use it (through
+provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is almost
+unused.  write_inode_now and sync_inode do use it (through
 __sync_single_inode) to check if ->writepages has been successful in
 writing out the whole address_space.
 
-The Writeback tag is used by filemap*wait* and sync_page* functions,
-via filemap_fdatawait_range, to wait for all writeback to complete.
+The Writeback tag is used by filemap*wait* and sync_page* functions, via
+filemap_fdatawait_range, to wait for all writeback to complete.
 
 An address_space handler may attach extra information to a page,
 typically using the 'private' field in the 'struct page'.  If such
@@ -562,25 +558,24 @@ handler to deal with that data.
 
 An address space acts as an intermediate between storage and
 application.  Data is read into the address space a whole page at a
-time, and provided to the application either by copying of the page,
-or by memory-mapping the page.
-Data is written into the address space by the application, and then
-written-back to storage typically in whole pages, however the
-address_space has finer control of write sizes.
+time, and provided to the application either by copying of the page, or
+by memory-mapping the page.  Data is written into the address space by
+the application, and then written-back to storage typically in whole
+pages, however the address_space has finer control of write sizes.
 
 The read process essentially only requires 'readpage'.  The write
 process is more complicated and uses write_begin/write_end or
-set_page_dirty to write data into the address_space, and writepage
-and writepages to writeback data to storage.
+set_page_dirty to write data into the address_space, and writepage and
+writepages to writeback data to storage.
 
 Adding and removing pages to/from an address_space is protected by the
 inode's i_mutex.
 
 When data is written to a page, the PG_Dirty flag should be set.  It
 typically remains set until writepage asks for it to be written.  This
-should clear PG_Dirty and set PG_Writeback.  It can be actually
-written at any point after PG_Dirty is clear.  Once it is known to be
-safe, PG_Writeback is cleared.
+should clear PG_Dirty and set PG_Writeback.  It can be actually written
+at any point after PG_Dirty is clear.  Once it is known to be safe,
+PG_Writeback is cleared.
 
 Writeback makes use of a writeback_control structure to direct the
 operations.  This gives the the writepage and writepages operations some
@@ -609,9 +604,10 @@ file descriptors should get back an error is not possible.
 Instead, the generic writeback error tracking infrastructure in the
 kernel settles for reporting errors to fsync on all file descriptions
 that were open at the time that the error occurred.  In a situation with
-multiple writers, all of them will get back an error on a subsequent fsync,
-even if all of the writes done through that particular file descriptor
-succeeded (or even if there were no writes on that file descriptor at all).
+multiple writers, all of them will get back an error on a subsequent
+fsync, even if all of the writes done through that particular file
+descriptor succeeded (or even if there were no writes on that file
+descriptor at all).
 
 Filesystems that wish to use this infrastructure should call
 mapping_set_error to record the error in the address_space when it
@@ -623,8 +619,8 @@ point in the stream of errors emitted by the backing device(s).
 struct address_space_operations
 -------------------------------
 
-This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem.  The following members are defined:
+This describes how the VFS can manipulate mapping of a file to page
+cache in your filesystem.  The following members are defined:
 
 struct address_space_operations {
 	int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -1231,8 +1227,8 @@ filesystems.
 Showing options
 ---------------
 
-If a filesystem accepts mount options, it must define show_options()
-to show all the currently active options.  The rules are:
+If a filesystem accepts mount options, it must define show_options() to
+show all the currently active options.  The rules are:
 
   - options MUST be shown which are not default or their values differ
     from the default
@@ -1240,14 +1236,14 @@ to show all the currently active options.  The rules are:
   - options MAY be shown which are enabled by default or have their
     default value
 
-Options used only internally between a mount helper and the kernel
-(such as file descriptors), or which only have an effect during the
-mounting (such as ones controlling the creation of a journal) are exempt
-from the above rules.
+Options used only internally between a mount helper and the kernel (such
+as file descriptors), or which only have an effect during the mounting
+(such as ones controlling the creation of a journal) are exempt from the
+above rules.
 
-The underlying reason for the above rules is to make sure, that a
-mount can be accurately replicated (e.g. umounting and mounting again)
-based on the information found in /proc/mounts.
+The underlying reason for the above rules is to make sure, that a mount
+can be accurately replicated (e.g. umounting and mounting again) based
+on the information found in /proc/mounts.
 
 Resources
 =========
-- 
cgit v1.2.3-59-g8ed1b


From e04c83cd53b59e422157c4cea0cdc4e2f33fe305 Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Wed, 15 May 2019 10:29:08 +1000
Subject: docs: filesystems: vfs: Use uniform spacing around headings

Currently spacing before and after headings is non-uniform.  Use two
blank lines before a heading and one after the heading.

Use uniform spacing around headings.

Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/vfs.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 1cd0e658137a..242fd644c97b 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -319,6 +319,7 @@ Whoever sets up the inode is responsible for filling in the "i_op"
 field.  This is a pointer to a "struct inode_operations" which describes
 the methods that can be performed on individual inodes.
 
+
 struct xattr_handlers
 ---------------------
 
@@ -511,6 +512,7 @@ otherwise noted.
   tmpfile: called in the end of O_TMPFILE open().  Optional, equivalent to
 	atomically creating, opening and unlinking a file in given directory.
 
+
 The Address Space Object
 ========================
 
@@ -584,8 +586,10 @@ and the constraints under which it is being done.  It is also used to
 return information back to the caller about the result of a writepage or
 writepages request.
 
+
 Handling errors during writeback
 --------------------------------
+
 Most applications that do buffered I/O will periodically call a file
 synchronization call (fsync, fdatasync, msync or sync_file_range) to
 ensure that data written has made it to the backing store.  When there
@@ -616,6 +620,7 @@ file->fsync operation, they should call file_check_and_advance_wb_err to
 ensure that the struct file's error cursor has advanced to the correct
 point in the stream of errors emitted by the backing device(s).
 
+
 struct address_space_operations
 -------------------------------
 
@@ -1207,9 +1212,11 @@ manipulate dentries:
 	and the dentry is returned.  The caller must use dput()
 	to free the dentry when it finishes using it.
 
+
 Mount Options
 =============
 
+
 Parsing options
 ---------------
 
@@ -1224,6 +1231,7 @@ The <linux/parser.h> header defines an API that helps parse these
 options.  There are plenty of examples on how to use it in existing
 filesystems.
 
+
 Showing options
 ---------------
 
@@ -1245,6 +1253,7 @@ The underlying reason for the above rules is to make sure, that a mount
 can be accurately replicated (e.g. umounting and mounting again) based
 on the information found in /proc/mounts.
 
+
 Resources
 =========
 
-- 
cgit v1.2.3-59-g8ed1b


From 90ac11a844f8859d5f960fb530190a9690a9a19b Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Wed, 15 May 2019 10:29:09 +1000
Subject: docs: filesystems: vfs: Use correct initial heading

Kernel RST has a preferred heading adornment scheme.  Currently all the
heading adornments follow this scheme except the document heading.

Use correct heading adornment for initial heading.

Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/vfs.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 242fd644c97b..1167dd94d84b 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -1,5 +1,6 @@
-
-	      Overview of the Linux Virtual File System
+=========================================
+Overview of the Linux Virtual File System
+=========================================
 
 	Original author: Richard Gooch <rgooch@atnf.csiro.au>
 
-- 
cgit v1.2.3-59-g8ed1b


From 099c5c7a3fba0c4686090075c6d214355aa67e47 Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Wed, 15 May 2019 10:29:10 +1000
Subject: docs: filesystems: vfs: Use SPDX identifier

Currently the license is indicated via a custom string.  We have SPDX
license identifiers now for this task.

Use SPDX license identifier matching current license string.

Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/vfs.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 1167dd94d84b..bd6dd782e8ca 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 =========================================
 Overview of the Linux Virtual File System
 =========================================
@@ -7,8 +9,6 @@ Overview of the Linux Virtual File System
   Copyright (C) 1999 Richard Gooch
   Copyright (C) 2005 Pekka Enberg
 
-  This file is released under the GPLv2.
-
 
 Introduction
 ============
-- 
cgit v1.2.3-59-g8ed1b


From e66b045715457ca6e18fce2b2fc61dd8af2e2440 Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Wed, 15 May 2019 10:29:11 +1000
Subject: docs: filesystems: vfs: Fix pre-amble indentation

Currently the file pre-amble contains custom indentation.  RST is not
going to like this, so let's left-align the text.  Put the copyright
notices in a list in preparation for converting the document to RST.

Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/vfs.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index bd6dd782e8ca..9ed5c8d6e656 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -4,10 +4,10 @@
 Overview of the Linux Virtual File System
 =========================================
 
-	Original author: Richard Gooch <rgooch@atnf.csiro.au>
+Original author: Richard Gooch <rgooch@atnf.csiro.au>
 
-  Copyright (C) 1999 Richard Gooch
-  Copyright (C) 2005 Pekka Enberg
+- Copyright (C) 1999 Richard Gooch
+- Copyright (C) 2005 Pekka Enberg
 
 
 Introduction
-- 
cgit v1.2.3-59-g8ed1b


From 1b44ae63deae020e172866871bd14a76376e0f8b Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Wed, 15 May 2019 10:29:12 +1000
Subject: docs: filesystems: vfs: Convert spaces to tabs

There are a bunch of places with 8 spaces.  In preparation for correctly
indenting all code snippets (during conversion to RST), change these to
use tabs.

This patch is whitespace only.

Convert instances of 8 consecutive spaces to a single tab.

Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/vfs.txt | 124 +++++++++++++++++++-------------------
 1 file changed, 62 insertions(+), 62 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 9ed5c8d6e656..4f4f4931bfa0 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -111,12 +111,12 @@ members are defined:
 struct file_system_type {
 	const char *name;
 	int fs_flags;
-        struct dentry *(*mount) (struct file_system_type *, int,
-                       const char *, void *);
-        void (*kill_sb) (struct super_block *);
-        struct module *owner;
-        struct file_system_type * next;
-        struct list_head fs_supers;
+	struct dentry *(*mount) (struct file_system_type *, int,
+		       const char *, void *);
+	void (*kill_sb) (struct super_block *);
+	struct module *owner;
+	struct file_system_type * next;
+	struct list_head fs_supers;
 	struct lock_class_key s_lock_key;
 	struct lock_class_key s_umount_key;
 };
@@ -205,26 +205,26 @@ This describes how the VFS can manipulate the superblock of your
 filesystem.  As of kernel 2.6.22, the following members are defined:
 
 struct super_operations {
-        struct inode *(*alloc_inode)(struct super_block *sb);
-        void (*destroy_inode)(struct inode *);
-
-        void (*dirty_inode) (struct inode *, int flags);
-        int (*write_inode) (struct inode *, int);
-        void (*drop_inode) (struct inode *);
-        void (*delete_inode) (struct inode *);
-        void (*put_super) (struct super_block *);
-        int (*sync_fs)(struct super_block *sb, int wait);
-        int (*freeze_fs) (struct super_block *);
-        int (*unfreeze_fs) (struct super_block *);
-        int (*statfs) (struct dentry *, struct kstatfs *);
-        int (*remount_fs) (struct super_block *, int *, char *);
-        void (*clear_inode) (struct inode *);
-        void (*umount_begin) (struct super_block *);
-
-        int (*show_options)(struct seq_file *, struct dentry *);
-
-        ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
-        ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
+	struct inode *(*alloc_inode)(struct super_block *sb);
+	void (*destroy_inode)(struct inode *);
+
+	void (*dirty_inode) (struct inode *, int flags);
+	int (*write_inode) (struct inode *, int);
+	void (*drop_inode) (struct inode *);
+	void (*delete_inode) (struct inode *);
+	void (*put_super) (struct super_block *);
+	int (*sync_fs)(struct super_block *sb, int wait);
+	int (*freeze_fs) (struct super_block *);
+	int (*unfreeze_fs) (struct super_block *);
+	int (*statfs) (struct dentry *, struct kstatfs *);
+	int (*remount_fs) (struct super_block *, int *, char *);
+	void (*clear_inode) (struct inode *);
+	void (*umount_begin) (struct super_block *);
+
+	int (*show_options)(struct seq_file *, struct dentry *);
+
+	ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
+	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 	int (*nr_cached_objects)(struct super_block *);
 	void (*free_cached_objects)(struct super_block *, int);
 };
@@ -479,7 +479,7 @@ otherwise noted.
 	filesystem.
 
 	May be called in rcu-walk mode (mask & MAY_NOT_BLOCK).  If in rcu-walk
-        mode, the filesystem must check the permission without blocking or
+	mode, the filesystem must check the permission without blocking or
 	storing to the inode.
 
 	If a situation is encountered that rcu-walk cannot handle, return
@@ -698,12 +698,12 @@ struct address_space_operations {
 	tagged as DIRTY and will pass them to ->writepage.
 
   set_page_dirty: called by the VM to set a page dirty.
-        This is particularly needed if an address space attaches
-        private data to a page, and that data needs to be updated when
-        a page is dirtied.  This is called, for example, when a memory
+	This is particularly needed if an address space attaches
+	private data to a page, and that data needs to be updated when
+	a page is dirtied.  This is called, for example, when a memory
 	mapped page gets modified.
 	If defined, it should set the PageDirty flag, and the
-        PAGECACHE_TAG_DIRTY tag in the radix tree.
+	PAGECACHE_TAG_DIRTY tag in the radix tree.
 
   readpages: called by the VM to read pages associated with the address_space
 	object.  This is essentially just a vector version of
@@ -721,7 +721,7 @@ struct address_space_operations {
 	storage, then those blocks should be pre-read (if they haven't been
 	read already) so that the updated blocks can be written out properly.
 
-        The filesystem must return the locked pagecache page for the specified
+	The filesystem must return the locked pagecache page for the specified
 	offset, in *pagep, for the caller to write into.
 
 	It must be able to cope with short writes (where the length passed to
@@ -730,21 +730,21 @@ struct address_space_operations {
 	flags is a field for AOP_FLAG_xxx flags, described in
 	include/linux/fs.h.
 
-        A void * may be returned in fsdata, which then gets passed into
-        write_end.
+	A void * may be returned in fsdata, which then gets passed into
+	write_end.
 
-        Returns 0 on success; < 0 on failure (which is the error code), in
+	Returns 0 on success; < 0 on failure (which is the error code), in
 	which case write_end is not called.
 
   write_end: After a successful write_begin, and data copy, write_end must
-        be called.  len is the original len passed to write_begin, and copied
-        is the amount that was able to be copied.
+	be called.  len is the original len passed to write_begin, and copied
+	is the amount that was able to be copied.
 
-        The filesystem must take care of unlocking the page and releasing it
-        refcount, and updating i_size.
+	The filesystem must take care of unlocking the page and releasing it
+	refcount, and updating i_size.
 
-        Returns < 0 on failure, otherwise the number of bytes (<= 'copied')
-        that were able to be copied into pagecache.
+	Returns < 0 on failure, otherwise the number of bytes (<= 'copied')
+	that were able to be copied into pagecache.
 
   bmap: called by the VFS to map a logical block offset within object to
 	physical block number.  This method is used by the FIBMAP
@@ -755,7 +755,7 @@ struct address_space_operations {
 	are and uses those addresses directly.
 
   invalidatepage: If a page has PagePrivate set, then invalidatepage
-        will be called when part or all of the page is to be removed
+	will be called when part or all of the page is to be removed
 	from the address space.  This generally corresponds to either a
 	truncation, punch hole  or a complete invalidation of the address
 	space (in the latter case 'offset' will always be 0 and 'length'
@@ -767,47 +767,47 @@ struct address_space_operations {
 	release MUST succeed.
 
   releasepage: releasepage is called on PagePrivate pages to indicate
-        that the page should be freed if possible.  ->releasepage
-        should remove any private data from the page and clear the
-        PagePrivate flag.  If releasepage() fails for some reason, it must
+	that the page should be freed if possible.  ->releasepage
+	should remove any private data from the page and clear the
+	PagePrivate flag.  If releasepage() fails for some reason, it must
 	indicate failure with a 0 return value.
 	releasepage() is used in two distinct though related cases.  The
 	first is when the VM finds a clean page with no active users and
-        wants to make it a free page.  If ->releasepage succeeds, the
-        page will be removed from the address_space and become free.
+	wants to make it a free page.  If ->releasepage succeeds, the
+	page will be removed from the address_space and become free.
 
 	The second case is when a request has been made to invalidate
-        some or all pages in an address_space.  This can happen
-        through the fadvise(POSIX_FADV_DONTNEED) system call or by the
-        filesystem explicitly requesting it as nfs and 9fs do (when
-        they believe the cache may be out of date with storage) by
-        calling invalidate_inode_pages2().
+	some or all pages in an address_space.  This can happen
+	through the fadvise(POSIX_FADV_DONTNEED) system call or by the
+	filesystem explicitly requesting it as nfs and 9fs do (when
+	they believe the cache may be out of date with storage) by
+	calling invalidate_inode_pages2().
 	If the filesystem makes such a call, and needs to be certain
-        that all pages are invalidated, then its releasepage will
-        need to ensure this.  Possibly it can clear the PageUptodate
-        bit if it cannot free private data yet.
+	that all pages are invalidated, then its releasepage will
+	need to ensure this.  Possibly it can clear the PageUptodate
+	bit if it cannot free private data yet.
 
   freepage: freepage is called once the page is no longer visible in
-        the page cache in order to allow the cleanup of any private
+	the page cache in order to allow the cleanup of any private
 	data.  Since it may be called by the memory reclaimer, it
 	should not assume that the original address_space mapping still
 	exists, and it should not block.
 
   direct_IO: called by the generic read/write routines to perform
-        direct_IO - that is IO requests which bypass the page cache
-        and transfer data directly between the storage and the
-        application's address space.
+	direct_IO - that is IO requests which bypass the page cache
+	and transfer data directly between the storage and the
+	application's address space.
 
   isolate_page: Called by the VM when isolating a movable non-lru page.
 	If page is successfully isolated, VM marks the page as PG_isolated
 	via __SetPageIsolated.
 
   migrate_page:  This is used to compact the physical memory usage.
-        If the VM wants to relocate a page (maybe off a memory card
-        that is signalling imminent failure) it will pass a new page
+	If the VM wants to relocate a page (maybe off a memory card
+	that is signalling imminent failure) it will pass a new page
 	and an old page to this function.  migrate_page should
 	transfer any private data across and update any references
-        that it has to the page.
+	that it has to the page.
 
   putback_page: Called by the VM when isolated page's migration fails.
 
-- 
cgit v1.2.3-59-g8ed1b


From af96c1e304f7051bf2ee64c9957724bdace05c58 Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Wed, 15 May 2019 10:29:13 +1000
Subject: docs: filesystems: vfs: Convert vfs.txt to RST

vfs.txt is currently stale.  If we convert it to RST this is a good
first step in the process of getting the VFS documentation up to date.

This patch does the following (all as a single patch so as not to
introduce any new Sphinx build warnings):

 - Use '.. code-block:: c' for C code blocks and indent the code blocks.
 - Use double backticks for struct member descriptions.
 - Fix a couple of build warnings by guarding pointers (*) with double
   backticks, e.g. ``*ptr``.
 - Add vfs to Documentation/filesystems/index.rst

The paragraph indentation of the member descriptions was not touched.
It is not pretty but it does not cause build warnings.  These
descriptions all need updating anyway, so leave it as it is for now.

Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |    1 +
 Documentation/filesystems/vfs.rst   | 1291 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/vfs.txt   | 1274 ----------------------------------
 3 files changed, 1292 insertions(+), 1274 deletions(-)
 create mode 100644 Documentation/filesystems/vfs.rst
 delete mode 100644 Documentation/filesystems/vfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 1131c34d77f6..35644840a690 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -16,6 +16,7 @@ algorithms work.
 .. toctree::
    :maxdepth: 2
 
+   vfs
    path-lookup.rst
    api-summary
    splice
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
new file mode 100644
index 000000000000..2ffbdf5f392c
--- /dev/null
+++ b/Documentation/filesystems/vfs.rst
@@ -0,0 +1,1291 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+Overview of the Linux Virtual File System
+=========================================
+
+Original author: Richard Gooch <rgooch@atnf.csiro.au>
+
+- Copyright (C) 1999 Richard Gooch
+- Copyright (C) 2005 Pekka Enberg
+
+
+Introduction
+============
+
+The Virtual File System (also known as the Virtual Filesystem Switch) is
+the software layer in the kernel that provides the filesystem interface
+to userspace programs.  It also provides an abstraction within the
+kernel which allows different filesystem implementations to coexist.
+
+VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
+are called from a process context.  Filesystem locking is described in
+the document Documentation/filesystems/Locking.
+
+
+Directory Entry Cache (dcache)
+------------------------------
+
+The VFS implements the open(2), stat(2), chmod(2), and similar system
+calls.  The pathname argument that is passed to them is used by the VFS
+to search through the directory entry cache (also known as the dentry
+cache or dcache).  This provides a very fast look-up mechanism to
+translate a pathname (filename) into a specific dentry.  Dentries live
+in RAM and are never saved to disc: they exist only for performance.
+
+The dentry cache is meant to be a view into your entire filespace.  As
+most computers cannot fit all dentries in the RAM at the same time, some
+bits of the cache are missing.  In order to resolve your pathname into a
+dentry, the VFS may have to resort to creating dentries along the way,
+and then loading the inode.  This is done by looking up the inode.
+
+
+The Inode Object
+----------------
+
+An individual dentry usually has a pointer to an inode.  Inodes are
+filesystem objects such as regular files, directories, FIFOs and other
+beasts.  They live either on the disc (for block device filesystems) or
+in the memory (for pseudo filesystems).  Inodes that live on the disc
+are copied into the memory when required and changes to the inode are
+written back to disc.  A single inode can be pointed to by multiple
+dentries (hard links, for example, do this).
+
+To look up an inode requires that the VFS calls the lookup() method of
+the parent directory inode.  This method is installed by the specific
+filesystem implementation that the inode lives in.  Once the VFS has the
+required dentry (and hence the inode), we can do all those boring things
+like open(2) the file, or stat(2) it to peek at the inode data.  The
+stat(2) operation is fairly simple: once the VFS has the dentry, it
+peeks at the inode data and passes some of it back to userspace.
+
+
+The File Object
+---------------
+
+Opening a file requires another operation: allocation of a file
+structure (this is the kernel-side implementation of file descriptors).
+The freshly allocated file structure is initialized with a pointer to
+the dentry and a set of file operation member functions.  These are
+taken from the inode data.  The open() file method is then called so the
+specific filesystem implementation can do its work.  You can see that
+this is another switch performed by the VFS.  The file structure is
+placed into the file descriptor table for the process.
+
+Reading, writing and closing files (and other assorted VFS operations)
+is done by using the userspace file descriptor to grab the appropriate
+file structure, and then calling the required file structure method to
+do whatever is required.  For as long as the file is open, it keeps the
+dentry in use, which in turn means that the VFS inode is still in use.
+
+
+Registering and Mounting a Filesystem
+=====================================
+
+To register and unregister a filesystem, use the following API
+functions:
+
+.. code-block:: c
+
+	#include <linux/fs.h>
+
+	extern int register_filesystem(struct file_system_type *);
+	extern int unregister_filesystem(struct file_system_type *);
+
+The passed struct file_system_type describes your filesystem.  When a
+request is made to mount a filesystem onto a directory in your
+namespace, the VFS will call the appropriate mount() method for the
+specific filesystem.  A new vfsmount referring to the tree returned by
+->mount() will be attached to the mountpoint, so that when pathname
+resolution reaches the mountpoint it will jump into the root of that
+vfsmount.
+
+You can see all filesystems that are registered to the kernel in the
+file /proc/filesystems.
+
+
+struct file_system_type
+-----------------------
+
+This describes the filesystem.  As of kernel 2.6.39, the following
+members are defined:
+
+.. code-block:: c
+
+	struct file_system_type {
+		const char *name;
+		int fs_flags;
+		struct dentry *(*mount) (struct file_system_type *, int,
+					 const char *, void *);
+		void (*kill_sb) (struct super_block *);
+		struct module *owner;
+		struct file_system_type * next;
+		struct list_head fs_supers;
+		struct lock_class_key s_lock_key;
+		struct lock_class_key s_umount_key;
+	};
+
+``name``: the name of the filesystem type, such as "ext2", "iso9660",
+	"msdos" and so on
+
+``fs_flags``: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE)
+
+``mount``: the method to call when a new instance of this filesystem should
+	be mounted
+
+``kill_sb``: the method to call when an instance of this filesystem
+	should be shut down
+
+``owner``: for internal VFS use: you should initialize this to THIS_MODULE in
+	most cases.
+
+``next``: for internal VFS use: you should initialize this to NULL
+
+``s_lock_key``, ``s_umount_key``: lockdep-specific
+
+The mount() method has the following arguments:
+
+``struct file_system_type *fs_type``: describes the filesystem, partly initialized
+	by the specific filesystem code
+
+``int flags``: mount flags
+
+``const char *dev_name``: the device name we are mounting.
+
+``void *data``: arbitrary mount options, usually comes as an ASCII
+	string (see "Mount Options" section)
+
+The mount() method must return the root dentry of the tree requested by
+caller.  An active reference to its superblock must be grabbed and the
+superblock must be locked.  On failure it should return ERR_PTR(error).
+
+The arguments match those of mount(2) and their interpretation depends
+on filesystem type.  E.g. for block filesystems, dev_name is interpreted
+as block device name, that device is opened and if it contains a
+suitable filesystem image the method creates and initializes struct
+super_block accordingly, returning its root dentry to caller.
+
+->mount() may choose to return a subtree of existing filesystem - it
+doesn't have to create a new one.  The main result from the caller's
+point of view is a reference to dentry at the root of (sub)tree to be
+attached; creation of new superblock is a common side effect.
+
+The most interesting member of the superblock structure that the mount()
+method fills in is the "s_op" field.  This is a pointer to a "struct
+super_operations" which describes the next level of the filesystem
+implementation.
+
+Usually, a filesystem uses one of the generic mount() implementations
+and provides a fill_super() callback instead.  The generic variants are:
+
+``mount_bdev``: mount a filesystem residing on a block device
+
+``mount_nodev``: mount a filesystem that is not backed by a device
+
+``mount_single``: mount a filesystem which shares the instance between
+	all mounts
+
+A fill_super() callback implementation has the following arguments:
+
+``struct super_block *sb``: the superblock structure.  The callback
+	must initialize this properly.
+
+``void *data``: arbitrary mount options, usually comes as an ASCII
+	string (see "Mount Options" section)
+
+``int silent``: whether or not to be silent on error
+
+
+The Superblock Object
+=====================
+
+A superblock object represents a mounted filesystem.
+
+
+struct super_operations
+-----------------------
+
+This describes how the VFS can manipulate the superblock of your
+filesystem.  As of kernel 2.6.22, the following members are defined:
+
+.. code-block:: c
+
+	struct super_operations {
+		struct inode *(*alloc_inode)(struct super_block *sb);
+		void (*destroy_inode)(struct inode *);
+
+		void (*dirty_inode) (struct inode *, int flags);
+		int (*write_inode) (struct inode *, int);
+		void (*drop_inode) (struct inode *);
+		void (*delete_inode) (struct inode *);
+		void (*put_super) (struct super_block *);
+		int (*sync_fs)(struct super_block *sb, int wait);
+		int (*freeze_fs) (struct super_block *);
+		int (*unfreeze_fs) (struct super_block *);
+		int (*statfs) (struct dentry *, struct kstatfs *);
+		int (*remount_fs) (struct super_block *, int *, char *);
+		void (*clear_inode) (struct inode *);
+		void (*umount_begin) (struct super_block *);
+
+		int (*show_options)(struct seq_file *, struct dentry *);
+
+		ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
+		ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
+		int (*nr_cached_objects)(struct super_block *);
+		void (*free_cached_objects)(struct super_block *, int);
+	};
+
+All methods are called without any locks being held, unless otherwise
+noted.  This means that most methods can block safely.  All methods are
+only called from a process context (i.e. not from an interrupt handler
+or bottom half).
+
+``alloc_inode``: this method is called by alloc_inode() to allocate memory
+	for struct inode and initialize it.  If this function is not
+	defined, a simple 'struct inode' is allocated.  Normally
+	alloc_inode will be used to allocate a larger structure which
+	contains a 'struct inode' embedded within it.
+
+``destroy_inode``: this method is called by destroy_inode() to release
+	resources allocated for struct inode.  It is only required if
+	->alloc_inode was defined and simply undoes anything done by
+	->alloc_inode.
+
+``dirty_inode``: this method is called by the VFS to mark an inode dirty.
+
+``write_inode``: this method is called when the VFS needs to write an
+	inode to disc.  The second parameter indicates whether the write
+	should be synchronous or not; not all filesystems check this flag.
+
+``drop_inode``: called when the last access to the inode is dropped,
+	with the inode->i_lock spinlock held.
+
+	This method should be either NULL (normal UNIX filesystem
+	semantics) or "generic_delete_inode" (for filesystems that do not
+	want to cache inodes - causing "delete_inode" to always be
+	called regardless of the value of i_nlink)
+
+	The "generic_delete_inode()" behavior is equivalent to the
+	old practice of using "force_delete" in the put_inode() case,
+	but does not have the races that the "force_delete()" approach
+	had.
+
+``delete_inode``: called when the VFS wants to delete an inode
+
+``put_super``: called when the VFS wishes to free the superblock
+	(i.e. unmount).  This is called with the superblock lock held
+
+``sync_fs``: called when VFS is writing out all dirty data associated with
+	a superblock.  The second parameter indicates whether the method
+	should wait until the write out has been completed.  Optional.
+
+``freeze_fs``: called when VFS is locking a filesystem and
+	forcing it into a consistent state.  This method is currently
+	used by the Logical Volume Manager (LVM).
+
+``unfreeze_fs``: called when VFS is unlocking a filesystem and making it writable
+	again.
+
+``statfs``: called when the VFS needs to get filesystem statistics.
+
+``remount_fs``: called when the filesystem is remounted.  This is called
+	with the kernel lock held
+
+``clear_inode``: called when the VFS clears the inode.  Optional
+
+``umount_begin``: called when the VFS is unmounting a filesystem.
+
+``show_options``: called by the VFS to show mount options for
+	/proc/<pid>/mounts.  (see "Mount Options" section)
+
+``quota_read``: called by the VFS to read from filesystem quota file.
+
+``quota_write``: called by the VFS to write to filesystem quota file.
+
+``nr_cached_objects``: called by the sb cache shrinking function for the
+	filesystem to return the number of freeable cached objects it contains.
+	Optional.
+
+``free_cached_objects``: called by the sb cache shrinking function for the
+	filesystem to scan the number of objects indicated to try to free them.
+	Optional, but any filesystem implementing this method needs to also
+	implement ->nr_cached_objects for it to be called correctly.
+
+	We can't do anything with any errors that the filesystem might
+	encounter, hence the void return type.  This will never be called if
+	the VM is trying to reclaim under GFP_NOFS conditions, hence this
+	method does not need to handle that situation itself.
+
+	Implementations must include conditional reschedule calls inside any
+	scanning loop that is done.  This allows the VFS to determine
+	appropriate scan batch sizes without having to worry about whether
+	implementations will cause holdoff problems due to large scan batch
+	sizes.
+
+Whoever sets up the inode is responsible for filling in the "i_op"
+field.  This is a pointer to a "struct inode_operations" which describes
+the methods that can be performed on individual inodes.
+
+
+struct xattr_handlers
+---------------------
+
+On filesystems that support extended attributes (xattrs), the s_xattr
+superblock field points to a NULL-terminated array of xattr handlers.
+Extended attributes are name:value pairs.
+
+``name``: Indicates that the handler matches attributes with the specified name
+	(such as "system.posix_acl_access"); the prefix field must be NULL.
+
+``prefix``: Indicates that the handler matches all attributes with the specified
+	name prefix (such as "user."); the name field must be NULL.
+
+``list``: Determine if attributes matching this xattr handler should be listed
+	for a particular dentry.  Used by some listxattr implementations like
+	generic_listxattr.
+
+``get``: Called by the VFS to get the value of a particular extended attribute.
+	This method is called by the getxattr(2) system call.
+
+``set``: Called by the VFS to set the value of a particular extended attribute.
+	When the new value is NULL, called to remove a particular extended
+	attribute.  This method is called by the setxattr(2) and
+	removexattr(2) system calls.
+
+When none of the xattr handlers of a filesystem match the specified
+attribute name or when a filesystem doesn't support extended attributes,
+the various ``*xattr(2)`` system calls return -EOPNOTSUPP.
+
+
+The Inode Object
+================
+
+An inode object represents an object within the filesystem.
+
+
+struct inode_operations
+-----------------------
+
+This describes how the VFS can manipulate an inode in your filesystem.
+As of kernel 2.6.22, the following members are defined:
+
+.. code-block:: c
+
+	struct inode_operations {
+		int (*create) (struct inode *,struct dentry *, umode_t, bool);
+		struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
+		int (*link) (struct dentry *,struct inode *,struct dentry *);
+		int (*unlink) (struct inode *,struct dentry *);
+		int (*symlink) (struct inode *,struct dentry *,const char *);
+		int (*mkdir) (struct inode *,struct dentry *,umode_t);
+		int (*rmdir) (struct inode *,struct dentry *);
+		int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
+		int (*rename) (struct inode *, struct dentry *,
+			       struct inode *, struct dentry *, unsigned int);
+		int (*readlink) (struct dentry *, char __user *,int);
+		const char *(*get_link) (struct dentry *, struct inode *,
+					 struct delayed_call *);
+		int (*permission) (struct inode *, int);
+		int (*get_acl)(struct inode *, int);
+		int (*setattr) (struct dentry *, struct iattr *);
+		int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
+		ssize_t (*listxattr) (struct dentry *, char *, size_t);
+		void (*update_time)(struct inode *, struct timespec *, int);
+		int (*atomic_open)(struct inode *, struct dentry *, struct file *,
+				   unsigned open_flag, umode_t create_mode);
+		int (*tmpfile) (struct inode *, struct dentry *, umode_t);
+	};
+
+Again, all methods are called without any locks being held, unless
+otherwise noted.
+
+``create``: called by the open(2) and creat(2) system calls.  Only
+	required if you want to support regular files.  The dentry you
+	get should not have an inode (i.e. it should be a negative
+	dentry).  Here you will probably call d_instantiate() with the
+	dentry and the newly created inode
+
+``lookup``: called when the VFS needs to look up an inode in a parent
+	directory.  The name to look for is found in the dentry.  This
+	method must call d_add() to insert the found inode into the
+	dentry.  The "i_count" field in the inode structure should be
+	incremented.  If the named inode does not exist a NULL inode
+	should be inserted into the dentry (this is called a negative
+	dentry).  Returning an error code from this routine must only
+	be done on a real error, otherwise creating inodes with system
+	calls like creat(2), mknod(2), mkdir(2) and so on will fail.
+	If you wish to overload the dentry methods then you should
+	initialise the "d_op" field in the dentry; this is a pointer
+	to a struct "dentry_operations".
+	This method is called with the directory inode semaphore held
+
+``link``: called by the link(2) system call.  Only required if you want
+	to support hard links.  You will probably need to call
+	d_instantiate() just as you would in the create() method
+
+``unlink``: called by the unlink(2) system call.  Only required if you
+	want to support deleting inodes
+
+``symlink``: called by the symlink(2) system call.  Only required if you
+	want to support symlinks.  You will probably need to call
+	d_instantiate() just as you would in the create() method
+
+``mkdir``: called by the mkdir(2) system call.  Only required if you want
+	to support creating subdirectories.  You will probably need to
+	call d_instantiate() just as you would in the create() method
+
+``rmdir``: called by the rmdir(2) system call.  Only required if you want
+	to support deleting subdirectories
+
+``mknod``: called by the mknod(2) system call to create a device (char,
+	block) inode or a named pipe (FIFO) or socket.  Only required
+	if you want to support creating these types of inodes.  You
+	will probably need to call d_instantiate() just as you would
+	in the create() method
+
+``rename``: called by the rename(2) system call to rename the object to
+	have the parent and name given by the second inode and dentry.
+
+	The filesystem must return -EINVAL for any unsupported or
+	unknown flags.  Currently the following flags are implemented:
+	(1) RENAME_NOREPLACE: this flag indicates that if the target
+	of the rename exists the rename should fail with -EEXIST
+	instead of replacing the target.  The VFS already checks for
+	existence, so for local filesystems the RENAME_NOREPLACE
+	implementation is equivalent to plain rename.
+	(2) RENAME_EXCHANGE: exchange source and target.  Both must
+	exist; this is checked by the VFS.  Unlike plain rename,
+	source and target may be of different type.
+
+``get_link``: called by the VFS to follow a symbolic link to the
+	inode it points to.  Only required if you want to support
+	symbolic links.  This method returns the symlink body
+	to traverse (and possibly resets the current position with
+	nd_jump_link()).  If the body won't go away until the inode
+	is gone, nothing else is needed; if it needs to be otherwise
+	pinned, arrange for its release by having get_link(..., ..., done)
+	do set_delayed_call(done, destructor, argument).
+	In that case destructor(argument) will be called once VFS is
+	done with the body you've returned.
+	May be called in RCU mode; that is indicated by NULL dentry
+	argument.  If request can't be handled without leaving RCU mode,
+	have it return ERR_PTR(-ECHILD).
+
+
+	If the filesystem stores the symlink target in ->i_link, the
+	VFS may use it directly without calling ->get_link(); however,
+	->get_link() must still be provided.  ->i_link must not be
+	freed until after an RCU grace period.  Writing to ->i_link
+	post-iget() time requires a 'release' memory barrier.
+
+``readlink``: this is now just an override for use by readlink(2) for the
+	cases when ->get_link uses nd_jump_link() or object is not in
+	fact a symlink.  Normally filesystems should only implement
+	->get_link for symlinks and readlink(2) will automatically use
+	that.
+
+``permission``: called by the VFS to check for access rights on a POSIX-like
+	filesystem.
+
+	May be called in rcu-walk mode (mask & MAY_NOT_BLOCK).  If in rcu-walk
+	mode, the filesystem must check the permission without blocking or
+	storing to the inode.
+
+	If a situation is encountered that rcu-walk cannot handle, return
+	-ECHILD and it will be called again in ref-walk mode.
+
+``setattr``: called by the VFS to set attributes for a file.  This method
+	is called by chmod(2) and related system calls.
+
+``getattr``: called by the VFS to get attributes of a file.  This method
+	is called by stat(2) and related system calls.
+
+``listxattr``: called by the VFS to list all extended attributes for a
+	given file.  This method is called by the listxattr(2) system call.
+
+``update_time``: called by the VFS to update a specific time or the i_version of
+	an inode.  If this is not defined the VFS will update the inode itself
+	and call mark_inode_dirty_sync.
+
+``atomic_open``: called on the last component of an open.  Using this optional
+	method the filesystem can look up, possibly create and open the file in
+	one atomic operation.  If it wants to leave actual opening to the
+	caller (e.g. if the file turned out to be a symlink, device, or just
+	something filesystem won't do atomic open for), it may signal this by
+	returning finish_no_open(file, dentry).  This method is only called if
+	the last component is negative or needs lookup.  Cached positive dentries
+	are still handled by f_op->open().  If the file was created,
+	FMODE_CREATED flag should be set in file->f_mode.  In case of O_EXCL
+	the method must only succeed if the file didn't exist and hence FMODE_CREATED
+	shall always be set on success.
+
+``tmpfile``: called at the end of O_TMPFILE open().  Optional, equivalent to
+	atomically creating, opening and unlinking a file in given directory.
+
+
+The Address Space Object
+========================
+
+The address space object is used to group and manage pages in the page
+cache.  It can be used to keep track of the pages in a file (or anything
+else) and also track the mapping of sections of the file into process
+address spaces.
+
+There are a number of distinct yet related services that an
+address-space can provide.  These include communicating memory pressure,
+page lookup by address, and keeping track of pages tagged as Dirty or
+Writeback.
+
+The first can be used independently of the others.  The VM can try to
+either write dirty pages in order to clean them, or release clean pages
+in order to reuse them.  To do this it can call the ->writepage method
+on dirty pages, and ->releasepage on clean pages with PagePrivate set.
+Clean pages without PagePrivate and with no external references will be
+released without notice being given to the address_space.
+
+To achieve this functionality, pages need to be placed on an LRU with
+lru_cache_add and mark_page_active needs to be called whenever the page
+is used.
+
+Pages are normally kept in a radix tree indexed by ->index.  This tree
+maintains information about the PG_Dirty and PG_Writeback status of each
+page, so that pages with either of these flags can be found quickly.
+
+The Dirty tag is primarily used by mpage_writepages - the default
+->writepages method.  It uses the tag to find dirty pages to call
+->writepage on.  If mpage_writepages is not used (i.e. the address_space
+provides its own ->writepages), the PAGECACHE_TAG_DIRTY tag is almost
+unused.  write_inode_now and sync_inode do use it (through
+__sync_single_inode) to check if ->writepages has been successful in
+writing out the whole address_space.
+
+The Writeback tag is used by filemap*wait* and sync_page* functions, via
+filemap_fdatawait_range, to wait for all writeback to complete.
+
+An address_space handler may attach extra information to a page,
+typically using the 'private' field in the 'struct page'.  If such
+information is attached, the PG_Private flag should be set.  This will
+cause various VM routines to make extra calls into the address_space
+handler to deal with that data.
+
+An address space acts as an intermediary between storage and
+application.  Data is read into the address space a whole page at a
+time, and provided to the application either by copying of the page, or
+by memory-mapping the page.  Data is written into the address space by
+the application, and then written back to storage, typically in whole
+pages; however, the address_space has finer control of write sizes.
+
+The read process essentially only requires 'readpage'.  The write
+process is more complicated and uses write_begin/write_end or
+set_page_dirty to write data into the address_space, and writepage and
+writepages to writeback data to storage.
+
+Adding and removing pages to/from an address_space is protected by the
+inode's i_mutex.
+
+When data is written to a page, the PG_Dirty flag should be set.  It
+typically remains set until writepage asks for it to be written.  This
+should clear PG_Dirty and set PG_Writeback.  It can be actually written
+at any point after PG_Dirty is clear.  Once it is known to be safe,
+PG_Writeback is cleared.
+
+Writeback makes use of a writeback_control structure to direct the
+operations.  This gives the writepage and writepages operations some
+information about the nature of and reason for the writeback request,
+and the constraints under which it is being done.  It is also used to
+return information back to the caller about the result of a writepage or
+writepages request.
+
+
+Handling errors during writeback
+--------------------------------
+
+Most applications that do buffered I/O will periodically call a file
+synchronization call (fsync, fdatasync, msync or sync_file_range) to
+ensure that data written has made it to the backing store.  When there
+is an error during writeback, they expect that error to be reported when
+a file sync request is made.  After an error has been reported on one
+request, subsequent requests on the same file descriptor should return
+0, unless further writeback errors have occurred since the previous file
+synchronization.
+
+Ideally, the kernel would report errors only on file descriptions on
+which writes were done that subsequently failed to be written back.  The
+generic pagecache infrastructure does not track the file descriptions
+that have dirtied each individual page however, so determining which
+file descriptors should get back an error is not possible.
+
+Instead, the generic writeback error tracking infrastructure in the
+kernel settles for reporting errors to fsync on all file descriptions
+that were open at the time that the error occurred.  In a situation with
+multiple writers, all of them will get back an error on a subsequent
+fsync, even if all of the writes done through that particular file
+descriptor succeeded (or even if there were no writes on that file
+descriptor at all).
+
+Filesystems that wish to use this infrastructure should call
+mapping_set_error to record the error in the address_space when it
+occurs.  Then, after writing back data from the pagecache in their
+file->fsync operation, they should call file_check_and_advance_wb_err to
+ensure that the struct file's error cursor has advanced to the correct
+point in the stream of errors emitted by the backing device(s).
+
+
+struct address_space_operations
+-------------------------------
+
+This describes how the VFS can manipulate mapping of a file to page
+cache in your filesystem.  The following members are defined:
+
+.. code-block:: c
+
+	struct address_space_operations {
+		int (*writepage)(struct page *page, struct writeback_control *wbc);
+		int (*readpage)(struct file *, struct page *);
+		int (*writepages)(struct address_space *, struct writeback_control *);
+		int (*set_page_dirty)(struct page *page);
+		int (*readpages)(struct file *filp, struct address_space *mapping,
+				 struct list_head *pages, unsigned nr_pages);
+		int (*write_begin)(struct file *, struct address_space *mapping,
+				   loff_t pos, unsigned len, unsigned flags,
+				   struct page **pagep, void **fsdata);
+		int (*write_end)(struct file *, struct address_space *mapping,
+				 loff_t pos, unsigned len, unsigned copied,
+				 struct page *page, void *fsdata);
+		sector_t (*bmap)(struct address_space *, sector_t);
+		void (*invalidatepage) (struct page *, unsigned int, unsigned int);
+		int (*releasepage) (struct page *, int);
+		void (*freepage)(struct page *);
+		ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
+		/* isolate a page for migration */
+		bool (*isolate_page) (struct page *, isolate_mode_t);
+		/* migrate the contents of a page to the specified target */
+		int (*migratepage) (struct page *, struct page *);
+		/* put migration-failed page back to right list */
+		void (*putback_page) (struct page *);
+		int (*launder_page) (struct page *);
+
+		int (*is_partially_uptodate) (struct page *, unsigned long,
+					      unsigned long);
+		void (*is_dirty_writeback) (struct page *, bool *, bool *);
+		int (*error_remove_page) (struct address_space *mapping, struct page *page);
+		int (*swap_activate)(struct file *);
+		int (*swap_deactivate)(struct file *);
+	};
+
+``writepage``: called by the VM to write a dirty page to backing store.
+      This may happen for data integrity reasons (i.e. 'sync'), or
+      to free up memory (flush).  The difference can be seen in
+      wbc->sync_mode.
+      The PG_Dirty flag has been cleared and PageLocked is true.
+      writepage should start writeout, should set PG_Writeback,
+      and should make sure the page is unlocked, either synchronously
+      or asynchronously when the write operation completes.
+
+      If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
+      try too hard if there are problems, and may choose to write out
+      other pages from the mapping if that is easier (e.g. due to
+      internal dependencies).  If it chooses not to start writeout, it
+      should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep
+      calling ->writepage on that page.
+
+      See the file "Locking" for more details.
+
+``readpage``: called by the VM to read a page from backing store.
+       The page will be Locked when readpage is called, and should be
+       unlocked and marked uptodate once the read completes.
+       If ->readpage discovers that it needs to unlock the page for
+       some reason, it can do so, and then return AOP_TRUNCATED_PAGE.
+       In this case, the page will be looked up again, relocked and if
+       that all succeeds, ->readpage will be called again.
+
+``writepages``: called by the VM to write out pages associated with the
+	address_space object.  If wbc->sync_mode is WB_SYNC_ALL, then
+	the writeback_control will specify a range of pages that must be
+	written out.  If it is WB_SYNC_NONE, then a nr_to_write is given
+	and that many pages should be written if possible.
+	If no ->writepages is given, then mpage_writepages is used
+	instead.  This will choose pages from the address space that are
+	tagged as DIRTY and will pass them to ->writepage.
+
+``set_page_dirty``: called by the VM to set a page dirty.
+	This is particularly needed if an address space attaches
+	private data to a page, and that data needs to be updated when
+	a page is dirtied.  This is called, for example, when a memory
+	mapped page gets modified.
+	If defined, it should set the PageDirty flag, and the
+	PAGECACHE_TAG_DIRTY tag in the radix tree.
+
+``readpages``: called by the VM to read pages associated with the address_space
+	object.  This is essentially just a vector version of
+	readpage.  Instead of just one page, several pages are
+	requested.
+	readpages is only used for read-ahead, so read errors are
+	ignored.  If anything goes wrong, feel free to give up.
+
+``write_begin``:
+	Called by the generic buffered write code to ask the filesystem to
+	prepare to write len bytes at the given offset in the file.  The
+	address_space should check that the write will be able to complete,
+	by allocating space if necessary and doing any other internal
+	housekeeping.  If the write will update parts of any basic-blocks on
+	storage, then those blocks should be pre-read (if they haven't been
+	read already) so that the updated blocks can be written out properly.
+
+	The filesystem must return the locked pagecache page for the specified
+	offset, in ``*pagep``, for the caller to write into.
+
+	It must be able to cope with short writes (where the length passed to
+	write_begin is greater than the number of bytes copied into the page).
+
+	flags is a field for AOP_FLAG_xxx flags, described in
+	include/linux/fs.h.
+
+	A void * may be returned in fsdata, which then gets passed into
+	write_end.
+
+	Returns 0 on success; < 0 on failure (which is the error code), in
+	which case write_end is not called.
+
+``write_end``: After a successful write_begin, and data copy, write_end must
+	be called.  len is the original len passed to write_begin, and copied
+	is the amount that was able to be copied.
+
+	The filesystem must take care of unlocking the page and releasing its
+	refcount, and updating i_size.
+
+	Returns < 0 on failure, otherwise the number of bytes (<= 'copied')
+	that were able to be copied into pagecache.
+
+``bmap``: called by the VFS to map a logical block offset within object to
+	physical block number.  This method is used by the FIBMAP
+	ioctl and for working with swap-files.  To be able to swap to
+	a file, the file must have a stable mapping to a block
+	device.  The swap system does not go through the filesystem
+	but instead uses bmap to find out where the blocks in the file
+	are and uses those addresses directly.
+
+``invalidatepage``: If a page has PagePrivate set, then invalidatepage
+	will be called when part or all of the page is to be removed
+	from the address space.  This generally corresponds to either a
+	truncation, punch hole  or a complete invalidation of the address
+	space (in the latter case 'offset' will always be 0 and 'length'
+	will be PAGE_SIZE).  Any private data associated with the page
+	should be updated to reflect this truncation.  If offset is 0 and
+	length is PAGE_SIZE, then the private data should be released,
+	because the page must be able to be completely discarded.  This may
+	be done by calling the ->releasepage function, but in this case the
+	release MUST succeed.
+
+``releasepage``: releasepage is called on PagePrivate pages to indicate
+	that the page should be freed if possible.  ->releasepage
+	should remove any private data from the page and clear the
+	PagePrivate flag.  If releasepage() fails for some reason, it must
+	indicate failure with a 0 return value.
+	releasepage() is used in two distinct though related cases.  The
+	first is when the VM finds a clean page with no active users and
+	wants to make it a free page.  If ->releasepage succeeds, the
+	page will be removed from the address_space and become free.
+
+	The second case is when a request has been made to invalidate
+	some or all pages in an address_space.  This can happen
+	through the fadvise(POSIX_FADV_DONTNEED) system call or by the
+	filesystem explicitly requesting it as nfs and 9fs do (when
+	they believe the cache may be out of date with storage) by
+	calling invalidate_inode_pages2().
+	If the filesystem makes such a call, and needs to be certain
+	that all pages are invalidated, then its releasepage will
+	need to ensure this.  Possibly it can clear the PageUptodate
+	bit if it cannot free private data yet.
+
+``freepage``: freepage is called once the page is no longer visible in
+	the page cache in order to allow the cleanup of any private
+	data.  Since it may be called by the memory reclaimer, it
+	should not assume that the original address_space mapping still
+	exists, and it should not block.
+
+``direct_IO``: called by the generic read/write routines to perform
+	direct_IO - that is IO requests which bypass the page cache
+	and transfer data directly between the storage and the
+	application's address space.
+
+``isolate_page``: Called by the VM when isolating a movable non-lru page.
+	If the page is successfully isolated, the VM marks the page as
+	PG_isolated via __SetPageIsolated.
+
+``migratepage``:  This is used to compact the physical memory usage.
+	If the VM wants to relocate a page (maybe off a memory card
+	that is signalling imminent failure) it will pass a new page
+	and an old page to this function.  migratepage should
+	transfer any private data across and update any references
+	that it has to the page.
+
+``putback_page``: Called by the VM when an isolated page's migration fails.
+
+``launder_page``: Called before freeing a page - it writes back the dirty page.  To
+	prevent redirtying the page, it is kept locked during the whole
+	operation.
+
+``is_partially_uptodate``: Called by the VM when reading a file through the
+	pagecache when the underlying blocksize != pagesize.  If the required
+	block is up to date then the read can complete without needing the IO
+	to bring the whole page up to date.
+
+``is_dirty_writeback``: Called by the VM when attempting to reclaim a page.
+	The VM uses dirty and writeback information to determine if it needs
+	to stall to allow flushers a chance to complete some IO.  Ordinarily
+	it can use PageDirty and PageWriteback but some filesystems have
+	more complex state (unstable pages in NFS prevent reclaim) or
+	do not set those flags due to locking problems.  This callback
+	allows a filesystem to indicate to the VM if a page should be
+	treated as dirty or writeback for the purposes of stalling.
+
+``error_remove_page``: normally set to generic_error_remove_page if truncation
+	is ok for this address space.  Used for memory failure handling.
+	Setting this implies you deal with pages going away under you,
+	unless you have them locked or reference counts increased.
+
+``swap_activate``: Called when swapon is used on a file to allocate
+	space if necessary and pin the block lookup information in
+	memory.  A return value of zero indicates success,
+	in which case this file can be used to back swapspace.
+
+``swap_deactivate``: Called during swapoff on files where swap_activate
+	was successful.
+
+
+The File Object
+===============
+
+A file object represents a file opened by a process.  This is also known
+as an "open file description" in POSIX parlance.
+
+
+struct file_operations
+----------------------
+
+This describes how the VFS can manipulate an open file.  As of kernel
+4.18, the following members are defined:
+
+.. code-block:: c
+
+	struct file_operations {
+		struct module *owner;
+		loff_t (*llseek) (struct file *, loff_t, int);
+		ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
+		ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
+		ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
+		ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
+		int (*iopoll)(struct kiocb *kiocb, bool spin);
+		int (*iterate) (struct file *, struct dir_context *);
+		int (*iterate_shared) (struct file *, struct dir_context *);
+		__poll_t (*poll) (struct file *, struct poll_table_struct *);
+		long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
+		long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
+		int (*mmap) (struct file *, struct vm_area_struct *);
+		int (*open) (struct inode *, struct file *);
+		int (*flush) (struct file *, fl_owner_t id);
+		int (*release) (struct inode *, struct file *);
+		int (*fsync) (struct file *, loff_t, loff_t, int datasync);
+		int (*fasync) (int, struct file *, int);
+		int (*lock) (struct file *, int, struct file_lock *);
+		ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
+		unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+		int (*check_flags)(int);
+		int (*flock) (struct file *, int, struct file_lock *);
+		ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
+		ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
+		int (*setlease)(struct file *, long, struct file_lock **, void **);
+		long (*fallocate)(struct file *file, int mode, loff_t offset,
+				  loff_t len);
+		void (*show_fdinfo)(struct seq_file *m, struct file *f);
+	#ifndef CONFIG_MMU
+		unsigned (*mmap_capabilities)(struct file *);
+	#endif
+		ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
+		loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in,
+					   struct file *file_out, loff_t pos_out,
+					   loff_t len, unsigned int remap_flags);
+		int (*fadvise)(struct file *, loff_t, loff_t, int);
+	};
+
+Again, all methods are called without any locks being held, unless
+otherwise noted.
+
+``llseek``: called when the VFS needs to move the file position index
+
+``read``: called by read(2) and related system calls
+
+``read_iter``: possibly asynchronous read with iov_iter as destination
+
+``write``: called by write(2) and related system calls
+
+``write_iter``: possibly asynchronous write with iov_iter as source
+
+``iopoll``: called when aio wants to poll for completions on HIPRI iocbs
+
+``iterate``: called when the VFS needs to read the directory contents
+
+``iterate_shared``: called when the VFS needs to read the directory contents
+	when filesystem supports concurrent dir iterators
+
+``poll``: called by the VFS when a process wants to check if there is
+	activity on this file and (optionally) go to sleep until there
+	is activity.  Called by the select(2) and poll(2) system calls
+
+``unlocked_ioctl``: called by the ioctl(2) system call.
+
+``compat_ioctl``: called by the ioctl(2) system call when 32 bit system calls
+	 are used on 64 bit kernels.
+
+``mmap``: called by the mmap(2) system call
+
+``open``: called by the VFS when an inode should be opened.  When the VFS
+	opens a file, it creates a new "struct file".  It then calls the
+	open method for the newly allocated file structure.  You might
+	think that the open method really belongs in
+	"struct inode_operations", and you may be right.  I think it's
+	done the way it is because it makes filesystems simpler to
+	implement.  The open() method is a good place to initialize the
+	"private_data" member in the file structure if you want to point
+	to a device structure
+
+``flush``: called by the close(2) system call to flush a file
+
+``release``: called when the last reference to an open file is closed
+
+``fsync``: called by the fsync(2) system call.  Also see the section above
+	 entitled "Handling errors during writeback".
+
+``fasync``: called by the fcntl(2) system call when asynchronous
+	(non-blocking) mode is enabled for a file
+
+``lock``: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW
+	commands
+
+``get_unmapped_area``: called by the mmap(2) system call
+
+``check_flags``: called by the fcntl(2) system call for F_SETFL command
+
+``flock``: called by the flock(2) system call
+
+``splice_write``: called by the VFS to splice data from a pipe to a file.  This
+		method is used by the splice(2) system call
+
+``splice_read``: called by the VFS to splice data from file to a pipe.  This
+	       method is used by the splice(2) system call
+
+``setlease``: called by the VFS to set or release a file lock lease.  setlease
+	    implementations should call generic_setlease to record or remove
+	    the lease in the inode after setting it.
+
+``fallocate``: called by the VFS to preallocate blocks or punch a hole.
+
+``copy_file_range``: called by the copy_file_range(2) system call.
+
+``remap_file_range``: called by the ioctl(2) system call for FICLONERANGE and
+	FICLONE and FIDEDUPERANGE commands to remap file ranges.  An
+	implementation should remap len bytes at pos_in of the source file into
+	the dest file at pos_out.  Implementations must handle callers passing
+	in len == 0; this means "remap to the end of the source file".  The
+	return value should be the number of bytes remapped, or the usual
+	negative error code if errors occurred before any bytes were remapped.
+	The remap_flags parameter accepts REMAP_FILE_* flags.  If
+	REMAP_FILE_DEDUP is set then the implementation must only remap if the
+	requested file ranges have identical contents.  If REMAP_FILE_CAN_SHORTEN
+	is set, the caller is ok with the implementation shortening the request
+	length to satisfy alignment or EOF requirements (or any other reason).
+
+``fadvise``: possibly called by the fadvise64() system call.
+
+Note that the file operations are implemented by the specific
+filesystem in which the inode resides.  When opening a device node
+(character or block special) most filesystems will call special
+support routines in the VFS which will locate the required device
+driver information.  These support routines replace the filesystem file
+operations with those for the device driver, and then proceed to call
+the new open() method for the file.  This is how opening a device file
+in the filesystem eventually ends up calling the device driver open()
+method.
+
+
+Directory Entry Cache (dcache)
+==============================
+
+
+struct dentry_operations
+------------------------
+
+This describes how a filesystem can overload the standard dentry
+operations.  Dentries and the dcache are the domain of the VFS and the
+individual filesystem implementations.  Device drivers have no business
+here.  These methods may be set to NULL, as they are either optional or
+the VFS uses a default.  As of kernel 2.6.22, the following members are
+defined:
+
+.. code-block:: c
+
+	struct dentry_operations {
+		int (*d_revalidate)(struct dentry *, unsigned int);
+		int (*d_weak_revalidate)(struct dentry *, unsigned int);
+		int (*d_hash)(const struct dentry *, struct qstr *);
+		int (*d_compare)(const struct dentry *,
+				 unsigned int, const char *, const struct qstr *);
+		int (*d_delete)(const struct dentry *);
+		int (*d_init)(struct dentry *);
+		void (*d_release)(struct dentry *);
+		void (*d_iput)(struct dentry *, struct inode *);
+		char *(*d_dname)(struct dentry *, char *, int);
+		struct vfsmount *(*d_automount)(struct path *);
+		int (*d_manage)(const struct path *, bool);
+		struct dentry *(*d_real)(struct dentry *, const struct inode *);
+	};
+
+``d_revalidate``: called when the VFS needs to revalidate a dentry.  This
+	is called whenever a name look-up finds a dentry in the
+	dcache.  Most local filesystems leave this as NULL, because all their
+	dentries in the dcache are valid.  Network filesystems are different
+	since things can change on the server without the client necessarily
+	being aware of it.
+
+	This function should return a positive value if the dentry is still
+	valid, and zero or a negative error code if it isn't.
+
+	d_revalidate may be called in rcu-walk mode (flags & LOOKUP_RCU).
+	If in rcu-walk mode, the filesystem must revalidate the dentry without
+	blocking or storing to the dentry.  d_parent and d_inode should not be
+	used without care (because they can change and, in the d_inode case,
+	even become NULL under us).
+
+	If a situation is encountered that rcu-walk cannot handle, return
+	-ECHILD and it will be called again in ref-walk mode.
+
+``d_weak_revalidate``: called when the VFS needs to revalidate a "jumped" dentry.
+	This is called when a path-walk ends at a dentry that was not acquired by
+	doing a lookup in the parent directory.  This includes "/", "." and "..",
+	as well as procfs-style symlinks and mountpoint traversal.
+
+	In this case, we are less concerned with whether the dentry is still
+	fully correct, but rather that the inode is still valid.  As with
+	d_revalidate, most local filesystems will set this to NULL since their
+	dcache entries are always valid.
+
+	This function has the same return code semantics as d_revalidate.
+
+	d_weak_revalidate is only called after leaving rcu-walk mode.
+
+``d_hash``: called when the VFS adds a dentry to the hash table.  The first
+	dentry passed to d_hash is the parent directory that the name is
+	to be hashed into.
+
+	Same locking and synchronisation rules as d_compare regarding
+	what is safe to dereference etc.
+
+``d_compare``: called to compare a dentry name with a given name.  The first
+	dentry is the parent of the dentry to be compared, the second is
+	the child dentry.  len and name string are properties of the dentry
+	to be compared.  qstr is the name to compare it with.
+
+	Must be constant and idempotent, and should not take locks if
+	possible, and should not store into the dentry.
+	Should not dereference pointers outside the dentry without
+	lots of care (e.g. d_parent, d_inode, d_name should not be used).
+
+	However, our vfsmount is pinned, and RCU held, so the dentries and
+	inodes won't disappear, neither will our sb or filesystem module.
+	->d_sb may be used.
+
+	It is a tricky calling convention because it needs to be called under
+	"rcu-walk", i.e. without any locks or references on things.
+
+``d_delete``: called when the last reference to a dentry is dropped and the
+	dcache is deciding whether or not to cache it.  Return 1 to delete
+	immediately, or 0 to cache the dentry.  Default is NULL which means to
+	always cache a reachable dentry.  d_delete must be constant and
+	idempotent.
+
+``d_init``: called when a dentry is allocated
+
+``d_release``: called when a dentry is really deallocated
+
+``d_iput``: called when a dentry loses its inode (just prior to its
+	being deallocated).  The default when this is NULL is that the
+	VFS calls iput().  If you define this method, you must call
+	iput() yourself
+
+``d_dname``: called when the pathname of a dentry should be generated.
+	Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay
+	pathname generation.  (Instead of doing it when dentry is created,
+	it's done only when the path is needed.)  Real filesystems probably
+	don't want to use it, because their dentries are present in the global
+	dcache hash, so their hash should be an invariant.  As no lock is
+	held, d_dname() should not try to modify the dentry itself, unless
+	appropriate SMP safety is used.  CAUTION: d_path() logic is quite
+	tricky.  The correct way to return, for example, "Hello" is to put it
+	at the end of the buffer, and return a pointer to the first char.
+	The dynamic_dname() helper function is provided to take care of this.
+
+	Example :
+
+.. code-block:: c
+
+	static char *pipefs_dname(struct dentry *dentry, char *buffer, int buflen)
+	{
+		return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
+				dentry->d_inode->i_ino);
+	}
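The "put it at the end of the buffer" convention described for d_dname can be tried outside the kernel. The following is only a userspace sketch: ``name_at_buffer_end`` is a hypothetical stand-in, not the kernel's dynamic_dname(), and it returns NULL on overflow where a kernel helper would report an error pointer.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace model of the d_dname convention: format the
 * name, place it (with its trailing NUL) at the END of the caller's
 * buffer, and return a pointer to its first character.
 * Returns NULL if the formatted name does not fit.
 */
static char *name_at_buffer_end(char *buffer, int buflen, const char *fmt, ...)
{
	char temp[64];
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(temp, sizeof(temp), fmt, args);
	va_end(args);

	if (len < 0)
		return NULL;
	len += 1;	/* account for the trailing NUL */
	if (len > (int)sizeof(temp) || len > buflen)
		return NULL;

	/* Copy so that the string ends exactly at buffer + buflen. */
	buffer += buflen - len;
	return memcpy(buffer, temp, len);
}
```

With a 32-byte buffer and the format "pipe:[%lu]", the returned pointer refers to the last bytes of the buffer rather than its start, which is why callers must use the return value and not the buffer address.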
+
+``d_automount``: called when an automount dentry is to be traversed (optional).
+	This should create a new VFS mount record and return the record to the
+	caller.  The caller is supplied with a path parameter giving the
+	automount directory to describe the automount target and the parent
+	VFS mount record to provide inheritable mount parameters.  NULL should
+	be returned if someone else managed to make the automount first.  If
+	the vfsmount creation failed, then an error code should be returned.
+	If -EISDIR is returned, then the directory will be treated as an
+	ordinary directory and returned to pathwalk to continue walking.
+
+	If a vfsmount is returned, the caller will attempt to mount it on the
+	mountpoint and will remove the vfsmount from its expiration list in
+	the case of failure.  The vfsmount should be returned with 2 refs on
+	it to prevent automatic expiration - the caller will clean up the
+	additional ref.
+
+	This function is only used if DCACHE_NEED_AUTOMOUNT is set on the
+	dentry.  This is set by __d_instantiate() if S_AUTOMOUNT is set on the
+	inode being added.
+
+``d_manage``: called to allow the filesystem to manage the transition from a
+	dentry (optional).  This allows autofs, for example, to hold up clients
+	waiting to explore behind a 'mountpoint' while letting the daemon go
+	past and construct the subtree there.  0 should be returned to let the
+	calling process continue.  -EISDIR can be returned to tell pathwalk to
+	use this directory as an ordinary directory and to ignore anything
+	mounted on it and not to check the automount flag.  Any other error
+	code will abort pathwalk completely.
+
+	If the 'rcu_walk' parameter is true, then the caller is doing a
+	pathwalk in RCU-walk mode.  Sleeping is not permitted in this mode,
+	and the caller can be asked to leave it and call again by returning
+	-ECHILD.  -EISDIR may also be returned to tell pathwalk to
+	ignore d_automount or any mounts.
+
+	This function is only used if DCACHE_MANAGE_TRANSIT is set on the
+	dentry being transited from.
+
+``d_real``: overlay/union type filesystems implement this method to return one of
+	the underlying dentries hidden by the overlay.  It is used in two
+	different modes:
+
+	Called from file_dentry() it returns the real dentry matching the inode
+	argument.  The real dentry may be from a lower layer already copied up,
+	but still referenced from the file.  This mode is selected with a
+	non-NULL inode argument.
+
+	With NULL inode the topmost real underlying dentry is returned.
+
+Each dentry has a pointer to its parent dentry, as well as a hash list
+of child dentries.  Child dentries are basically like files in a
+directory.
+
+
+Directory Entry Cache API
+--------------------------
+
+There are a number of functions defined which permit a filesystem to
+manipulate dentries:
+
+``dget``: open a new handle for an existing dentry (this just increments
+	the usage count)
+
+``dput``: close a handle for a dentry (decrements the usage count).  If
+	the usage count drops to 0, and the dentry is still in its
+	parent's hash, the "d_delete" method is called to check whether
+	it should be cached.  If it should not be cached, or if the dentry
+	is not hashed, it is deleted.  Otherwise cached dentries are put
+	into an LRU list to be reclaimed on memory shortage.
+
+``d_drop``: this unhashes a dentry from its parent's hash list.  A
+	subsequent call to dput() will deallocate the dentry if its
+	usage count drops to 0
+
+``d_delete``: delete a dentry.  If there are no other open references to
+	the dentry then the dentry is turned into a negative dentry
+	(the d_iput() method is called).  If there are other
+	references, then d_drop() is called instead
+
+``d_add``: add a dentry to its parent's hash list and then call
+	d_instantiate()
+
+``d_instantiate``: add a dentry to the alias hash list for the inode and
+	update the "d_inode" member.  The "i_count" member in the
+	inode structure should be set/incremented.  If the inode
+	pointer is NULL, the dentry is called a "negative
+	dentry".  This function is commonly called when an inode is
+	created for an existing negative dentry
+
+``d_lookup``: look up a dentry given its parent and path name component.
+	It looks up the child of that given name from the dcache
+	hash table.  If it is found, the reference count is incremented
+	and the dentry is returned.  The caller must use dput()
+	to free the dentry when it finishes using it.
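The dget/dput/d_drop rules above can be summarised as a small state model. This is a userspace sketch only: every ``demo_`` name is hypothetical, there is no locking, and "freed" and "on the LRU" are just recorded in a field instead of touching real memory or lists.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_IN_USE, DEMO_ON_LRU, DEMO_FREED };

/* Hypothetical, heavily simplified stand-in for struct dentry. */
struct demo_dentry {
	int count;				/* usage count */
	bool hashed;				/* still in the parent's hash? */
	int (*d_delete)(const struct demo_dentry *);	/* may be NULL */
	enum demo_state state;
};

/* Example d_delete policy: never keep the dentry cached. */
static int demo_never_cache(const struct demo_dentry *d)
{
	(void)d;
	return 1;
}

static struct demo_dentry *demo_dget(struct demo_dentry *d)
{
	d->count++;
	d->state = DEMO_IN_USE;
	return d;
}

static void demo_d_drop(struct demo_dentry *d)
{
	d->hashed = false;
}

static void demo_dput(struct demo_dentry *d)
{
	if (--d->count > 0)
		return;
	/* Last reference: cache only hashed dentries that d_delete allows. */
	if (d->hashed && !(d->d_delete && d->d_delete(d))) {
		d->state = DEMO_ON_LRU;
		return;
	}
	d->state = DEMO_FREED;
}
```

In this model, a hashed dentry with no d_delete method ends up on the LRU after its last demo_dput(), while one that was passed to demo_d_drop() first, or whose d_delete returns 1, is freed.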
+
+
+Mount Options
+=============
+
+
+Parsing options
+---------------
+
+On mount and remount the filesystem is passed a string containing a
+comma separated list of mount options.  The options can have either of
+these forms:
+
+  option
+  option=value
+
+The <linux/parser.h> header defines an API that helps parse these
+options.  There are plenty of examples on how to use it in existing
+filesystems.
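To make the two option forms concrete, here is a userspace sketch that splits a comma separated string into "option" and "option=value" items. It deliberately does not use the <linux/parser.h> API; the ``demo_opts`` structure, the "ro" and "size" option names, and ``demo_parse_options`` are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set for an imaginary filesystem. */
struct demo_opts {
	int read_only;		/* set by the flag-style option "ro" */
	long max_size;		/* set by "size=<number>" */
};

/*
 * Parse a comma separated option string in place (the string is
 * modified).  Returns 0 on success, -1 on an unknown or malformed option.
 */
static int demo_parse_options(char *options, struct demo_opts *opts)
{
	char *token, *next;

	for (token = options; token != NULL; token = next) {
		char *comma = strchr(token, ',');
		char *value;

		if (comma) {
			*comma = '\0';
			next = comma + 1;
		} else {
			next = NULL;
		}

		if (*token == '\0')
			continue;	/* tolerate empty items such as "a,,b" */

		value = strchr(token, '=');
		if (value)
			*value++ = '\0';	/* split "option=value" */

		if (strcmp(token, "ro") == 0 && !value) {
			opts->read_only = 1;
		} else if (strcmp(token, "size") == 0 && value) {
			char *end;

			opts->max_size = strtol(value, &end, 10);
			if (end == value || *end != '\0')
				return -1;
		} else {
			return -1;
		}
	}
	return 0;
}
```

Passing a writable copy of "ro,size=4096" sets both fields; an unrecognised name or a non-numeric size is rejected rather than silently ignored, which is one possible policy and not a requirement stated above.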
+
+
+Showing options
+---------------
+
+If a filesystem accepts mount options, it must define show_options() to
+show all the currently active options.  The rules are:
+
+  - options MUST be shown which are not default or their values differ
+    from the default
+
+  - options MAY be shown which are enabled by default or have their
+    default value
+
+Options used only internally between a mount helper and the kernel (such
+as file descriptors), or which only have an effect during the mounting
+(such as ones controlling the creation of a journal) are exempt from the
+above rules.
+
+The underlying reason for the above rules is to make sure that a mount
+can be accurately replicated (e.g. umounting and mounting again) based
+on the information found in /proc/mounts.
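The MUST/MAY rules can be illustrated with a userspace sketch that prints only the options differing from their defaults. A real show_options() writes to a seq_file; here a plain buffer stands in for it, and ``demo_mount_opts``, its defaults, and the leading-comma output format are assumptions for illustration.

```c
#include <stdio.h>
#include <stddef.h>

#define DEMO_DEFAULT_MAX_SIZE	0L	/* hypothetical default: no limit */

/* Hypothetical active options of an imaginary mounted filesystem. */
struct demo_mount_opts {
	int read_only;		/* default: 0 (read-write) */
	long max_size;		/* default: DEMO_DEFAULT_MAX_SIZE */
};

/*
 * Write every non-default option into buf as ",name" or ",name=value".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int demo_show_options(char *buf, size_t buflen,
			     const struct demo_mount_opts *opts)
{
	size_t used = 0;
	int n;

	if (buflen == 0)
		return -1;
	buf[0] = '\0';

	if (opts->read_only) {
		n = snprintf(buf + used, buflen - used, ",ro");
		if (n < 0 || (size_t)n >= buflen - used)
			return -1;
		used += n;
	}

	if (opts->max_size != DEMO_DEFAULT_MAX_SIZE) {
		n = snprintf(buf + used, buflen - used, ",size=%ld",
			     opts->max_size);
		if (n < 0 || (size_t)n >= buflen - used)
			return -1;
		used += n;
	}

	return 0;
}
```

A mount using only defaults produces an empty string here, which the rules permit; showing the defaults as well would also be allowed.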
+
+
+Resources
+=========
+
+(Note some of these resources are not up-to-date with the latest kernel
+ version.)
+
+Creating Linux virtual filesystems. 2002
+    <http://lwn.net/Articles/13325/>
+
+The Linux Virtual File-system Layer by Neil Brown. 1999
+    <http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html>
+
+A tour of the Linux VFS by Michael K. Johnson. 1996
+    <http://www.tldp.org/LDP/khg/HyperNews/get/fs/vfstour.html>
+
+A small trail through the Linux kernel by Andries Brouwer. 2001
+    <http://www.win.tue.nl/~aeb/linux/vfs/trail.html>
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
deleted file mode 100644
index 4f4f4931bfa0..000000000000
--- a/Documentation/filesystems/vfs.txt
+++ /dev/null
@@ -1,1274 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-=========================================
-Overview of the Linux Virtual File System
-=========================================
-
-Original author: Richard Gooch <rgooch@atnf.csiro.au>
-
-- Copyright (C) 1999 Richard Gooch
-- Copyright (C) 2005 Pekka Enberg
-
-
-Introduction
-============
-
-The Virtual File System (also known as the Virtual Filesystem Switch) is
-the software layer in the kernel that provides the filesystem interface
-to userspace programs.  It also provides an abstraction within the
-kernel which allows different filesystem implementations to coexist.
-
-VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
-are called from a process context.  Filesystem locking is described in
-the document Documentation/filesystems/Locking.
-
-
-Directory Entry Cache (dcache)
-------------------------------
-
-The VFS implements the open(2), stat(2), chmod(2), and similar system
-calls.  The pathname argument that is passed to them is used by the VFS
-to search through the directory entry cache (also known as the dentry
-cache or dcache).  This provides a very fast look-up mechanism to
-translate a pathname (filename) into a specific dentry.  Dentries live
-in RAM and are never saved to disc: they exist only for performance.
-
-The dentry cache is meant to be a view into your entire filespace.  As
-most computers cannot fit all dentries in the RAM at the same time, some
-bits of the cache are missing.  In order to resolve your pathname into a
-dentry, the VFS may have to resort to creating dentries along the way,
-and then loading the inode.  This is done by looking up the inode.
-
-
-The Inode Object
-----------------
-
-An individual dentry usually has a pointer to an inode.  Inodes are
-filesystem objects such as regular files, directories, FIFOs and other
-beasts.  They live either on the disc (for block device filesystems) or
-in the memory (for pseudo filesystems).  Inodes that live on the disc
-are copied into the memory when required and changes to the inode are
-written back to disc.  A single inode can be pointed to by multiple
-dentries (hard links, for example, do this).
-
-To look up an inode requires that the VFS calls the lookup() method of
-the parent directory inode.  This method is installed by the specific
-filesystem implementation that the inode lives in.  Once the VFS has the
-required dentry (and hence the inode), we can do all those boring things
-like open(2) the file, or stat(2) it to peek at the inode data.  The
-stat(2) operation is fairly simple: once the VFS has the dentry, it
-peeks at the inode data and passes some of it back to userspace.
-
-
-The File Object
----------------
-
-Opening a file requires another operation: allocation of a file
-structure (this is the kernel-side implementation of file descriptors).
-The freshly allocated file structure is initialized with a pointer to
-the dentry and a set of file operation member functions.  These are
-taken from the inode data.  The open() file method is then called so the
-specific filesystem implementation can do its work.  You can see that
-this is another switch performed by the VFS.  The file structure is
-placed into the file descriptor table for the process.
-
-Reading, writing and closing files (and other assorted VFS operations)
-is done by using the userspace file descriptor to grab the appropriate
-file structure, and then calling the required file structure method to
-do whatever is required.  For as long as the file is open, it keeps the
-dentry in use, which in turn means that the VFS inode is still in use.
-
-
-Registering and Mounting a Filesystem
-=====================================
-
-To register and unregister a filesystem, use the following API
-functions:
-
-   #include <linux/fs.h>
-
-   extern int register_filesystem(struct file_system_type *);
-   extern int unregister_filesystem(struct file_system_type *);
-
-The passed struct file_system_type describes your filesystem.  When a
-request is made to mount a filesystem onto a directory in your
-namespace, the VFS will call the appropriate mount() method for the
-specific filesystem.  New vfsmount referring to the tree returned by
-->mount() will be attached to the mountpoint, so that when pathname
-resolution reaches the mountpoint it will jump into the root of that
-vfsmount.
-
-You can see all filesystems that are registered to the kernel in the
-file /proc/filesystems.
-
-
-struct file_system_type
------------------------
-
-This describes the filesystem.  As of kernel 2.6.39, the following
-members are defined:
-
-struct file_system_type {
-	const char *name;
-	int fs_flags;
-	struct dentry *(*mount) (struct file_system_type *, int,
-		       const char *, void *);
-	void (*kill_sb) (struct super_block *);
-	struct module *owner;
-	struct file_system_type * next;
-	struct list_head fs_supers;
-	struct lock_class_key s_lock_key;
-	struct lock_class_key s_umount_key;
-};
-
-  name: the name of the filesystem type, such as "ext2", "iso9660",
-	"msdos" and so on
-
-  fs_flags: various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.)
-
-  mount: the method to call when a new instance of this
-	filesystem should be mounted
-
-  kill_sb: the method to call when an instance of this filesystem
-	should be shut down
-
-  owner: for internal VFS use: you should initialize this to THIS_MODULE in
-	most cases.
-
-  next: for internal VFS use: you should initialize this to NULL
-
-  s_lock_key, s_umount_key: lockdep-specific
-
-The mount() method has the following arguments:
-
-  struct file_system_type *fs_type: describes the filesystem, partly initialized
-	by the specific filesystem code
-
-  int flags: mount flags
-
-  const char *dev_name: the device name we are mounting.
-
-  void *data: arbitrary mount options, usually comes as an ASCII
-	string (see "Mount Options" section)
-
-The mount() method must return the root dentry of the tree requested by
-caller.  An active reference to its superblock must be grabbed and the
-superblock must be locked.  On failure it should return ERR_PTR(error).
-
-The arguments match those of mount(2) and their interpretation depends
-on filesystem type.  E.g. for block filesystems, dev_name is interpreted
-as block device name, that device is opened and if it contains a
-suitable filesystem image the method creates and initializes struct
-super_block accordingly, returning its root dentry to caller.
-
-->mount() may choose to return a subtree of an existing filesystem - it
-doesn't have to create a new one.  The main result from the caller's
-point of view is a reference to dentry at the root of (sub)tree to be
-attached; creation of new superblock is a common side effect.
-
-The most interesting member of the superblock structure that the mount()
-method fills in is the "s_op" field.  This is a pointer to a "struct
-super_operations" which describes the next level of the filesystem
-implementation.
-
-Usually, a filesystem uses one of the generic mount() implementations
-and provides a fill_super() callback instead.  The generic variants are:
-
-  mount_bdev: mount a filesystem residing on a block device
-
-  mount_nodev: mount a filesystem that is not backed by a device
-
-  mount_single: mount a filesystem which shares the instance between
-	all mounts
-
-A fill_super() callback implementation has the following arguments:
-
-  struct super_block *sb: the superblock structure.  The callback
-	must initialize this properly.
-
-  void *data: arbitrary mount options, usually comes as an ASCII
-	string (see "Mount Options" section)
-
-  int silent: whether or not to be silent on error
-
-
-The Superblock Object
-=====================
-
-A superblock object represents a mounted filesystem.
-
-
-struct super_operations
------------------------
-
-This describes how the VFS can manipulate the superblock of your
-filesystem.  As of kernel 2.6.22, the following members are defined:
-
-struct super_operations {
-	struct inode *(*alloc_inode)(struct super_block *sb);
-	void (*destroy_inode)(struct inode *);
-
-	void (*dirty_inode) (struct inode *, int flags);
-	int (*write_inode) (struct inode *, int);
-	void (*drop_inode) (struct inode *);
-	void (*delete_inode) (struct inode *);
-	void (*put_super) (struct super_block *);
-	int (*sync_fs)(struct super_block *sb, int wait);
-	int (*freeze_fs) (struct super_block *);
-	int (*unfreeze_fs) (struct super_block *);
-	int (*statfs) (struct dentry *, struct kstatfs *);
-	int (*remount_fs) (struct super_block *, int *, char *);
-	void (*clear_inode) (struct inode *);
-	void (*umount_begin) (struct super_block *);
-
-	int (*show_options)(struct seq_file *, struct dentry *);
-
-	ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
-	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
-	int (*nr_cached_objects)(struct super_block *);
-	void (*free_cached_objects)(struct super_block *, int);
-};
-
-All methods are called without any locks being held, unless otherwise
-noted.  This means that most methods can block safely.  All methods are
-only called from a process context (i.e. not from an interrupt handler
-or bottom half).
-
-  alloc_inode: this method is called by alloc_inode() to allocate memory
-	for struct inode and initialize it.  If this function is not
-	defined, a simple 'struct inode' is allocated.  Normally
-	alloc_inode will be used to allocate a larger structure which
-	contains a 'struct inode' embedded within it.
-
-  destroy_inode: this method is called by destroy_inode() to release
-	resources allocated for struct inode.  It is only required if
-	->alloc_inode was defined and simply undoes anything done by
-	->alloc_inode.
-
-  dirty_inode: this method is called by the VFS to mark an inode dirty.
-
-  write_inode: this method is called when the VFS needs to write an
-	inode to disc.  The second parameter indicates whether the write
-	should be synchronous or not; not all filesystems check this flag.
-
-  drop_inode: called when the last access to the inode is dropped,
-	with the inode->i_lock spinlock held.
-
-	This method should be either NULL (normal UNIX filesystem
-	semantics) or "generic_delete_inode" (for filesystems that do not
-	want to cache inodes - causing "delete_inode" to always be
-	called regardless of the value of i_nlink)
-
-	The "generic_delete_inode()" behavior is equivalent to the
-	old practice of using "force_delete" in the put_inode() case,
-	but does not have the races that the "force_delete()" approach
-	had. 
-
-  delete_inode: called when the VFS wants to delete an inode
-
-  put_super: called when the VFS wishes to free the superblock
-	(i.e. unmount).  This is called with the superblock lock held
-
-  sync_fs: called when VFS is writing out all dirty data associated with
-	a superblock.  The second parameter indicates whether the method
-	should wait until the write out has been completed.  Optional.
-
-  freeze_fs: called when VFS is locking a filesystem and
-	forcing it into a consistent state.  This method is currently
-	used by the Logical Volume Manager (LVM).
-
-  unfreeze_fs: called when VFS is unlocking a filesystem and making it writable
-	again.
-
-  statfs: called when the VFS needs to get filesystem statistics.
-
-  remount_fs: called when the filesystem is remounted.  This is called
-	with the kernel lock held
-
-  clear_inode: called when the VFS clears the inode.  Optional
-
-  umount_begin: called when the VFS is unmounting a filesystem.
-
-  show_options: called by the VFS to show mount options for
-	/proc/<pid>/mounts.  (see "Mount Options" section)
-
-  quota_read: called by the VFS to read from filesystem quota file.
-
-  quota_write: called by the VFS to write to filesystem quota file.
-
-  nr_cached_objects: called by the sb cache shrinking function for the
-	filesystem to return the number of freeable cached objects it contains.
-	Optional.
-
-  free_cached_objects: called by the sb cache shrinking function for the
-	filesystem to scan the number of objects indicated to try to free them.
-	Optional, but any filesystem implementing this method needs to also
-	implement ->nr_cached_objects for it to be called correctly.
-
-	We can't do anything with any errors that the filesystem might have
-	encountered, hence the void return type.  This will never be called if
-	the VM is trying to reclaim under GFP_NOFS conditions, hence this
-	method does not need to handle that situation itself.
-
-	Implementations must include conditional reschedule calls inside any
-	scanning loop that is done.  This allows the VFS to determine
-	appropriate scan batch sizes without having to worry about whether
-	implementations will cause holdoff problems due to large scan batch
-	sizes.
-
-Whoever sets up the inode is responsible for filling in the "i_op"
-field.  This is a pointer to a "struct inode_operations" which describes
-the methods that can be performed on individual inodes.
-
-
-struct xattr_handlers
----------------------
-
-On filesystems that support extended attributes (xattrs), the s_xattr
-superblock field points to a NULL-terminated array of xattr handlers.
-Extended attributes are name:value pairs.
-
-  name: Indicates that the handler matches attributes with the specified name
-	(such as "system.posix_acl_access"); the prefix field must be NULL.
-
-  prefix: Indicates that the handler matches all attributes with the specified
-	name prefix (such as "user."); the name field must be NULL.
-
-  list: Determine if attributes matching this xattr handler should be listed
-	for a particular dentry.  Used by some listxattr implementations like
-	generic_listxattr.
-
-  get: Called by the VFS to get the value of a particular extended attribute.
-	This method is called by the getxattr(2) system call.
-
-  set: Called by the VFS to set the value of a particular extended attribute.
-	When the new value is NULL, called to remove a particular extended
-	attribute.  This method is called by the setxattr(2) and
-	removexattr(2) system calls.
-
-When none of the xattr handlers of a filesystem match the specified
-attribute name or when a filesystem doesn't support extended attributes,
-the various *xattr(2) system calls return -EOPNOTSUPP.
-
-
-The Inode Object
-================
-
-An inode object represents an object within the filesystem.
-
-
-struct inode_operations
------------------------
-
-This describes how the VFS can manipulate an inode in your filesystem.
-As of kernel 2.6.22, the following members are defined:
-
-struct inode_operations {
-	int (*create) (struct inode *,struct dentry *, umode_t, bool);
-	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
-	int (*link) (struct dentry *,struct inode *,struct dentry *);
-	int (*unlink) (struct inode *,struct dentry *);
-	int (*symlink) (struct inode *,struct dentry *,const char *);
-	int (*mkdir) (struct inode *,struct dentry *,umode_t);
-	int (*rmdir) (struct inode *,struct dentry *);
-	int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
-	int (*rename) (struct inode *, struct dentry *,
-			struct inode *, struct dentry *, unsigned int);
-	int (*readlink) (struct dentry *, char __user *,int);
-	const char *(*get_link) (struct dentry *, struct inode *,
-				 struct delayed_call *);
-	int (*permission) (struct inode *, int);
-	int (*get_acl)(struct inode *, int);
-	int (*setattr) (struct dentry *, struct iattr *);
-	int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
-	ssize_t (*listxattr) (struct dentry *, char *, size_t);
-	void (*update_time)(struct inode *, struct timespec *, int);
-	int (*atomic_open)(struct inode *, struct dentry *, struct file *,
-			unsigned open_flag, umode_t create_mode);
-	int (*tmpfile) (struct inode *, struct dentry *, umode_t);
-};
-
-Again, all methods are called without any locks being held, unless
-otherwise noted.
-
-  create: called by the open(2) and creat(2) system calls.  Only
-	required if you want to support regular files.  The dentry you
-	get should not have an inode (i.e. it should be a negative
-	dentry).  Here you will probably call d_instantiate() with the
-	dentry and the newly created inode
-
-  lookup: called when the VFS needs to look up an inode in a parent
-	directory.  The name to look for is found in the dentry.  This
-	method must call d_add() to insert the found inode into the
-	dentry.  The "i_count" field in the inode structure should be
-	incremented.  If the named inode does not exist a NULL inode
-	should be inserted into the dentry (this is called a negative
-	dentry).  Returning an error code from this routine must only
-	be done on a real error, otherwise creating inodes with system
-	calls like creat(2), mknod(2), mkdir(2) and so on will fail.
-	If you wish to overload the dentry methods then you should
-	initialise the "d_op" field in the dentry; this is a pointer
-	to a struct "dentry_operations".
-	This method is called with the directory inode semaphore held
-
-  link: called by the link(2) system call.  Only required if you want
-	to support hard links.  You will probably need to call
-	d_instantiate() just as you would in the create() method
-
-  unlink: called by the unlink(2) system call.  Only required if you
-	want to support deleting inodes
-
-  symlink: called by the symlink(2) system call.  Only required if you
-	want to support symlinks.  You will probably need to call
-	d_instantiate() just as you would in the create() method
-
-  mkdir: called by the mkdir(2) system call.  Only required if you want
-	to support creating subdirectories.  You will probably need to
-	call d_instantiate() just as you would in the create() method
-
-  rmdir: called by the rmdir(2) system call.  Only required if you want
-	to support deleting subdirectories
-
-  mknod: called by the mknod(2) system call to create a device (char,
-	block) inode or a named pipe (FIFO) or socket.  Only required
-	if you want to support creating these types of inodes.  You
-	will probably need to call d_instantiate() just as you would
-	in the create() method
-
-  rename: called by the rename(2) system call to rename the object to
-	have the parent and name given by the second inode and dentry.
-
-	The filesystem must return -EINVAL for any unsupported or
-	unknown flags.  Currently the following flags are implemented:
-	(1) RENAME_NOREPLACE: this flag indicates that if the target
-	of the rename exists the rename should fail with -EEXIST
-	instead of replacing the target.  The VFS already checks for
-	existence, so for local filesystems the RENAME_NOREPLACE
-	implementation is equivalent to plain rename.
-	(2) RENAME_EXCHANGE: exchange source and target.  Both must
-	exist; this is checked by the VFS.  Unlike plain rename,
-	source and target may be of different type.
-
-  get_link: called by the VFS to follow a symbolic link to the
-	inode it points to.  Only required if you want to support
-	symbolic links.  This method returns the symlink body
-	to traverse (and possibly resets the current position with
-	nd_jump_link()).  If the body won't go away until the inode
-	is gone, nothing else is needed; if it needs to be otherwise
-	pinned, arrange for its release by having get_link(..., ..., done)
-	do set_delayed_call(done, destructor, argument).
-	In that case destructor(argument) will be called once VFS is
-	done with the body you've returned.
-	May be called in RCU mode; that is indicated by NULL dentry
-	argument.  If request can't be handled without leaving RCU mode,
-	have it return ERR_PTR(-ECHILD).
-
-	If the filesystem stores the symlink target in ->i_link, the
-	VFS may use it directly without calling ->get_link(); however,
-	->get_link() must still be provided.  ->i_link must not be
-	freed until after an RCU grace period.  Writing to ->i_link
-	post-iget() time requires a 'release' memory barrier.
-
-  readlink: this is now just an override for use by readlink(2) for the
-	cases when ->get_link uses nd_jump_link() or object is not in
-	fact a symlink.  Normally filesystems should only implement
-	->get_link for symlinks and readlink(2) will automatically use
-	that.
-
-  permission: called by the VFS to check for access rights on a POSIX-like
-	filesystem.
-
-	May be called in rcu-walk mode (mask & MAY_NOT_BLOCK).  If in rcu-walk
-	mode, the filesystem must check the permission without blocking or
-	storing to the inode.
-
-	If a situation is encountered that rcu-walk cannot handle, return
-	-ECHILD and it will be called again in ref-walk mode.
-
-  setattr: called by the VFS to set attributes for a file.  This method
-	is called by chmod(2) and related system calls.
-
-  getattr: called by the VFS to get attributes of a file.  This method
-	is called by stat(2) and related system calls.
-
-  listxattr: called by the VFS to list all extended attributes for a
-	given file.  This method is called by the listxattr(2) system call.
-
-  update_time: called by the VFS to update a specific time or the i_version of
-	an inode.  If this is not defined the VFS will update the inode itself
-	and call mark_inode_dirty_sync.
-
-  atomic_open: called on the last component of an open.  Using this optional
-	method the filesystem can look up, possibly create and open the file in
-	one atomic operation.  If it wants to leave actual opening to the
-	caller (e.g. if the file turned out to be a symlink, device, or just
-	something filesystem won't do atomic open for), it may signal this by
-	returning finish_no_open(file, dentry).  This method is only called if
-	the last component is negative or needs lookup.  Cached positive dentries
-	are still handled by f_op->open().  If the file was created,
-	FMODE_CREATED flag should be set in file->f_mode.  In case of O_EXCL
-	the method must only succeed if the file didn't exist and hence FMODE_CREATED
-	shall always be set on success.
-
-  tmpfile: called at the end of O_TMPFILE open().  Optional, equivalent to
-	atomically creating, opening and unlinking a file in given directory.
-
-
-The Address Space Object
-========================
-
-The address space object is used to group and manage pages in the page
-cache.  It can be used to keep track of the pages in a file (or anything
-else) and also track the mapping of sections of the file into process
-address spaces.
-
-There are a number of distinct yet related services that an
-address-space can provide.  These include communicating memory pressure,
-page lookup by address, and keeping track of pages tagged as Dirty or
-Writeback.
-
-The first can be used independently of the others.  The VM can try to
-either write dirty pages in order to clean them, or release clean pages
-in order to reuse them.  To do this it can call the ->writepage method
-on dirty pages, and ->releasepage on clean pages with PagePrivate set.
-Clean pages without PagePrivate and with no external references will be
-released without notice being given to the address_space.
-
-To achieve this functionality, pages need to be placed on an LRU with
-lru_cache_add and mark_page_active needs to be called whenever the page
-is used.
-
-Pages are normally kept in a radix tree indexed by ->index.  This tree
-maintains information about the PG_Dirty and PG_Writeback status of each
-page, so that pages with either of these flags can be found quickly.
-
-The Dirty tag is primarily used by mpage_writepages - the default
-->writepages method.  It uses the tag to find dirty pages to call
-->writepage on.  If mpage_writepages is not used (i.e. the address_space
-provides its own ->writepages), the PAGECACHE_TAG_DIRTY tag is almost
-unused.  write_inode_now and sync_inode do use it (through
-__sync_single_inode) to check if ->writepages has been successful in
-writing out the whole address_space.
-
-The Writeback tag is used by filemap*wait* and sync_page* functions, via
-filemap_fdatawait_range, to wait for all writeback to complete.
-
-An address_space handler may attach extra information to a page,
-typically using the 'private' field in the 'struct page'.  If such
-information is attached, the PG_Private flag should be set.  This will
-cause various VM routines to make extra calls into the address_space
-handler to deal with that data.
-
-An address space acts as an intermediary between storage and
-application.  Data is read into the address space a whole page at a
-time, and provided to the application either by copying of the page, or
-by memory-mapping the page.  Data is written into the address space by
-the application, and then written back to storage, typically in whole
-pages; however, the address_space has finer control of write sizes.
-
-The read process essentially only requires 'readpage'.  The write
-process is more complicated and uses write_begin/write_end or
-set_page_dirty to write data into the address_space, and writepage and
-writepages to writeback data to storage.
-
-Adding and removing pages to/from an address_space is protected by the
-inode's i_mutex.
-
-When data is written to a page, the PG_Dirty flag should be set.  It
-typically remains set until writepage asks for it to be written.  This
-should clear PG_Dirty and set PG_Writeback.  It can be actually written
-at any point after PG_Dirty is clear.  Once it is known to be safe,
-PG_Writeback is cleared.
-
-Writeback makes use of a writeback_control structure to direct the
-operations.  This gives the writepage and writepages operations some
-information about the nature of and reason for the writeback request,
-and the constraints under which it is being done.  It is also used to
-return information back to the caller about the result of a writepage or
-writepages request.
-
-
-Handling errors during writeback
---------------------------------
-
-Most applications that do buffered I/O will periodically call a file
-synchronization call (fsync, fdatasync, msync or sync_file_range) to
-ensure that data written has made it to the backing store.  When there
-is an error during writeback, they expect that error to be reported when
-a file sync request is made.  After an error has been reported on one
-request, subsequent requests on the same file descriptor should return
-0, unless further writeback errors have occurred since the previous file
-synchronization.
-
-Ideally, the kernel would report errors only on file descriptions on
-which writes were done that subsequently failed to be written back.  The
-generic pagecache infrastructure does not track the file descriptions
-that have dirtied each individual page however, so determining which
-file descriptors should get back an error is not possible.
-
-Instead, the generic writeback error tracking infrastructure in the
-kernel settles for reporting errors to fsync on all file descriptions
-that were open at the time that the error occurred.  In a situation with
-multiple writers, all of them will get back an error on a subsequent
-fsync, even if all of the writes done through that particular file
-descriptor succeeded (or even if there were no writes on that file
-descriptor at all).
-
-Filesystems that wish to use this infrastructure should call
-mapping_set_error to record the error in the address_space when it
-occurs.  Then, after writing back data from the pagecache in their
-file->fsync operation, they should call file_check_and_advance_wb_err to
-ensure that the struct file's error cursor has advanced to the correct
-point in the stream of errors emitted by the backing device(s).
-
-
-struct address_space_operations
--------------------------------
-
-This describes how the VFS can manipulate mapping of a file to page
-cache in your filesystem.  The following members are defined:
-
-struct address_space_operations {
-	int (*writepage)(struct page *page, struct writeback_control *wbc);
-	int (*readpage)(struct file *, struct page *);
-	int (*writepages)(struct address_space *, struct writeback_control *);
-	int (*set_page_dirty)(struct page *page);
-	int (*readpages)(struct file *filp, struct address_space *mapping,
-			struct list_head *pages, unsigned nr_pages);
-	int (*write_begin)(struct file *, struct address_space *mapping,
-				loff_t pos, unsigned len, unsigned flags,
-				struct page **pagep, void **fsdata);
-	int (*write_end)(struct file *, struct address_space *mapping,
-				loff_t pos, unsigned len, unsigned copied,
-				struct page *page, void *fsdata);
-	sector_t (*bmap)(struct address_space *, sector_t);
-	void (*invalidatepage) (struct page *, unsigned int, unsigned int);
-	int (*releasepage) (struct page *, int);
-	void (*freepage)(struct page *);
-	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
-	/* isolate a page for migration */
-	bool (*isolate_page) (struct page *, isolate_mode_t);
-	/* migrate the contents of a page to the specified target */
-	int (*migratepage) (struct page *, struct page *);
-	/* put migration-failed page back to right list */
-	void (*putback_page) (struct page *);
-	int (*launder_page) (struct page *);
-
-	int (*is_partially_uptodate) (struct page *, unsigned long,
-					unsigned long);
-	void (*is_dirty_writeback) (struct page *, bool *, bool *);
-	int (*error_remove_page) (struct address_space *mapping, struct page *page);
-	int (*swap_activate)(struct file *);
-	int (*swap_deactivate)(struct file *);
-};
-
-  writepage: called by the VM to write a dirty page to backing store.
-      This may happen for data integrity reasons (i.e. 'sync'), or
-      to free up memory (flush).  The difference can be seen in
-      wbc->sync_mode.
-      The PG_Dirty flag has been cleared and PageLocked is true.
-      writepage should start writeout, should set PG_Writeback,
-      and should make sure the page is unlocked, either synchronously
-      or asynchronously when the write operation completes.
-
-      If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
-      try too hard if there are problems, and may choose to write out
-      other pages from the mapping if that is easier (e.g. due to
-      internal dependencies).  If it chooses not to start writeout, it
-      should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep
-      calling ->writepage on that page.
-
-      See the file "Locking" for more details.
-
-  readpage: called by the VM to read a page from backing store.
-       The page will be Locked when readpage is called, and should be
-       unlocked and marked uptodate once the read completes.
-       If ->readpage discovers that it needs to unlock the page for
-       some reason, it can do so, and then return AOP_TRUNCATED_PAGE.
-       In this case, the page will be relocated, relocked and if
-       that all succeeds, ->readpage will be called again.
-
-  writepages: called by the VM to write out pages associated with the
-	address_space object.  If wbc->sync_mode is WB_SYNC_ALL, then
-	the writeback_control will specify a range of pages that must be
-	written out.  If it is WB_SYNC_NONE, then a nr_to_write is given
-	and that many pages should be written if possible.
-	If no ->writepages is given, then mpage_writepages is used
-	instead.  This will choose pages from the address space that are
-	tagged as DIRTY and will pass them to ->writepage.
-
-  set_page_dirty: called by the VM to set a page dirty.
-	This is particularly needed if an address space attaches
-	private data to a page, and that data needs to be updated when
-	a page is dirtied.  This is called, for example, when a memory
-	mapped page gets modified.
-	If defined, it should set the PageDirty flag, and the
-	PAGECACHE_TAG_DIRTY tag in the radix tree.
-
-  readpages: called by the VM to read pages associated with the address_space
-	object.  This is essentially just a vector version of
-	readpage.  Instead of just one page, several pages are
-	requested.
-	readpages is only used for read-ahead, so read errors are
-	ignored.  If anything goes wrong, feel free to give up.
-
-  write_begin:
-	Called by the generic buffered write code to ask the filesystem to
-	prepare to write len bytes at the given offset in the file.  The
-	address_space should check that the write will be able to complete,
-	by allocating space if necessary and doing any other internal
-	housekeeping.  If the write will update parts of any basic-blocks on
-	storage, then those blocks should be pre-read (if they haven't been
-	read already) so that the updated blocks can be written out properly.
-
-	The filesystem must return the locked pagecache page for the specified
-	offset, in *pagep, for the caller to write into.
-
-	It must be able to cope with short writes (where the length passed to
-	write_begin is greater than the number of bytes copied into the page).
-
-	flags is a field for AOP_FLAG_xxx flags, described in
-	include/linux/fs.h.
-
-	A void * may be returned in fsdata, which then gets passed into
-	write_end.
-
-	Returns 0 on success; < 0 on failure (which is the error code), in
-	which case write_end is not called.
-
-  write_end: After a successful write_begin, and data copy, write_end must
-	be called.  len is the original len passed to write_begin, and copied
-	is the amount that was able to be copied.
-
-	The filesystem must take care of unlocking the page and releasing its
-	refcount, and updating i_size.
-
-	Returns < 0 on failure, otherwise the number of bytes (<= 'copied')
-	that were able to be copied into pagecache.
-
-  bmap: called by the VFS to map a logical block offset within object to
-	physical block number.  This method is used by the FIBMAP
-	ioctl and for working with swap-files.  To be able to swap to
-	a file, the file must have a stable mapping to a block
-	device.  The swap system does not go through the filesystem
-	but instead uses bmap to find out where the blocks in the file
-	are and uses those addresses directly.
-
-  invalidatepage: If a page has PagePrivate set, then invalidatepage
-	will be called when part or all of the page is to be removed
-	from the address space.  This generally corresponds to either a
-	truncation, punch hole or a complete invalidation of the address
-	space (in the latter case 'offset' will always be 0 and 'length'
-	will be PAGE_SIZE).  Any private data associated with the page
-	should be updated to reflect this truncation.  If offset is 0 and
-	length is PAGE_SIZE, then the private data should be released,
-	because the page must be able to be completely discarded.  This may
-	be done by calling the ->releasepage function, but in this case the
-	release MUST succeed.
-
-  releasepage: releasepage is called on PagePrivate pages to indicate
-	that the page should be freed if possible.  ->releasepage
-	should remove any private data from the page and clear the
-	PagePrivate flag.  If releasepage() fails for some reason, it must
-	indicate failure with a 0 return value.
-	releasepage() is used in two distinct though related cases.  The
-	first is when the VM finds a clean page with no active users and
-	wants to make it a free page.  If ->releasepage succeeds, the
-	page will be removed from the address_space and become free.
-
-	The second case is when a request has been made to invalidate
-	some or all pages in an address_space.  This can happen
-	through the fadvise(POSIX_FADV_DONTNEED) system call or by the
-	filesystem explicitly requesting it as nfs and 9fs do (when
-	they believe the cache may be out of date with storage) by
-	calling invalidate_inode_pages2().
-	If the filesystem makes such a call, and needs to be certain
-	that all pages are invalidated, then its releasepage will
-	need to ensure this.  Possibly it can clear the PageUptodate
-	bit if it cannot free private data yet.
-
-  freepage: freepage is called once the page is no longer visible in
-	the page cache in order to allow the cleanup of any private
-	data.  Since it may be called by the memory reclaimer, it
-	should not assume that the original address_space mapping still
-	exists, and it should not block.
-
-  direct_IO: called by the generic read/write routines to perform
-	direct_IO - that is IO requests which bypass the page cache
-	and transfer data directly between the storage and the
-	application's address space.
-
-  isolate_page: Called by the VM when isolating a movable non-lru page.
-	If page is successfully isolated, VM marks the page as PG_isolated
-	via __SetPageIsolated.
-
-  migratepage: This is used to compact the physical memory usage.
-	If the VM wants to relocate a page (maybe off a memory card
-	that is signalling imminent failure) it will pass a new page
-	and an old page to this function.  migratepage should
-	transfer any private data across and update any references
-	that it has to the page.
-
-  putback_page: Called by the VM when isolated page's migration fails.
-
-  launder_page: Called before freeing a page - it writes back the dirty page.  To
-	prevent redirtying the page, it is kept locked during the whole
-	operation.
-
-  is_partially_uptodate: Called by the VM when reading a file through the
-	pagecache when the underlying blocksize != pagesize.  If the required
-	block is up to date then the read can complete without needing the IO
-	to bring the whole page up to date.
-
-  is_dirty_writeback: Called by the VM when attempting to reclaim a page.
-	The VM uses dirty and writeback information to determine if it needs
-	to stall to allow flushers a chance to complete some IO.  Ordinarily
-	it can use PageDirty and PageWriteback but some filesystems have
-	more complex state (unstable pages in NFS prevent reclaim) or
-	do not set those flags due to locking problems.  This callback
-	allows a filesystem to indicate to the VM if a page should be
-	treated as dirty or writeback for the purposes of stalling.
-
-  error_remove_page: normally set to generic_error_remove_page if truncation
-	is ok for this address space.  Used for memory failure handling.
-	Setting this implies you deal with pages going away under you,
-	unless you have them locked or reference counts increased.
-
-  swap_activate: Called when swapon is used on a file to allocate
-	space if necessary and pin the block lookup information in
-	memory.  A return value of zero indicates success,
-	in which case this file can be used to back swapspace.
-
-  swap_deactivate: Called during swapoff on files where swap_activate
-	was successful.
-
-
-The File Object
-===============
-
-A file object represents a file opened by a process.  This is also known
-as an "open file description" in POSIX parlance.
-
-
-struct file_operations
-----------------------
-
-This describes how the VFS can manipulate an open file.  As of kernel
-4.18, the following members are defined:
-
-struct file_operations {
-	struct module *owner;
-	loff_t (*llseek) (struct file *, loff_t, int);
-	ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
-	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
-	ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
-	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
-	int (*iopoll)(struct kiocb *kiocb, bool spin);
-	int (*iterate) (struct file *, struct dir_context *);
-	int (*iterate_shared) (struct file *, struct dir_context *);
-	__poll_t (*poll) (struct file *, struct poll_table_struct *);
-	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
-	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
-	int (*mmap) (struct file *, struct vm_area_struct *);
-	int (*open) (struct inode *, struct file *);
-	int (*flush) (struct file *, fl_owner_t id);
-	int (*release) (struct inode *, struct file *);
-	int (*fsync) (struct file *, loff_t, loff_t, int datasync);
-	int (*fasync) (int, struct file *, int);
-	int (*lock) (struct file *, int, struct file_lock *);
-	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
-	unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
-	int (*check_flags)(int);
-	int (*flock) (struct file *, int, struct file_lock *);
-	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
-	ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
-	int (*setlease)(struct file *, long, struct file_lock **, void **);
-	long (*fallocate)(struct file *file, int mode, loff_t offset,
-			  loff_t len);
-	void (*show_fdinfo)(struct seq_file *m, struct file *f);
-#ifndef CONFIG_MMU
-	unsigned (*mmap_capabilities)(struct file *);
-#endif
-	ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int);
-	loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in,
-				   struct file *file_out, loff_t pos_out,
-				   loff_t len, unsigned int remap_flags);
-	int (*fadvise)(struct file *, loff_t, loff_t, int);
-};
-
-Again, all methods are called without any locks being held, unless
-otherwise noted.
-
-  llseek: called when the VFS needs to move the file position index
-
-  read: called by read(2) and related system calls
-
-  read_iter: possibly asynchronous read with iov_iter as destination
-
-  write: called by write(2) and related system calls
-
-  write_iter: possibly asynchronous write with iov_iter as source
-
-  iopoll: called when aio wants to poll for completions on HIPRI iocbs
-
-  iterate: called when the VFS needs to read the directory contents
-
-  iterate_shared: called when the VFS needs to read the directory contents
-	when filesystem supports concurrent dir iterators
-
-  poll: called by the VFS when a process wants to check if there is
-	activity on this file and (optionally) go to sleep until there
-	is activity.  Called by the select(2) and poll(2) system calls
-
-  unlocked_ioctl: called by the ioctl(2) system call.
-
-  compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
-	 are used on 64 bit kernels.
-
-  mmap: called by the mmap(2) system call
-
-  open: called by the VFS when an inode should be opened.  When the VFS
-	opens a file, it creates a new "struct file".  It then calls the
-	open method for the newly allocated file structure.  You might
-	think that the open method really belongs in
-	"struct inode_operations", and you may be right.  I think it's
-	done the way it is because it makes filesystems simpler to
-	implement.  The open() method is a good place to initialize the
-	"private_data" member in the file structure if you want to point
-	to a device structure
-
-  flush: called by the close(2) system call to flush a file
-
-  release: called when the last reference to an open file is closed
-
-  fsync: called by the fsync(2) system call.  Also see the section above
-	 entitled "Handling errors during writeback".
-
-  fasync: called by the fcntl(2) system call when asynchronous
-	(non-blocking) mode is enabled for a file
-
-  lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW
-	commands
-
-  get_unmapped_area: called by the mmap(2) system call
-
-  check_flags: called by the fcntl(2) system call for F_SETFL command
-
-  flock: called by the flock(2) system call
-
-  splice_write: called by the VFS to splice data from a pipe to a file.  This
-		method is used by the splice(2) system call
-
-  splice_read: called by the VFS to splice data from file to a pipe.  This
-	       method is used by the splice(2) system call
-
-  setlease: called by the VFS to set or release a file lock lease.  setlease
-	    implementations should call generic_setlease to record or remove
-	    the lease in the inode after setting it.
-
-  fallocate: called by the VFS to preallocate blocks or punch a hole.
-
-  copy_file_range: called by the copy_file_range(2) system call.
-
-  remap_file_range: called by the ioctl(2) system call for FICLONERANGE and
-	FICLONE and FIDEDUPERANGE commands to remap file ranges.  An
-	implementation should remap len bytes at pos_in of the source file into
-	the dest file at pos_out.  Implementations must handle callers passing
-	in len == 0; this means "remap to the end of the source file".  The
-	return value should the number of bytes remapped, or the usual
-	negative error code if errors occurred before any bytes were remapped.
-	The remap_flags parameter accepts REMAP_FILE_* flags.  If
-	REMAP_FILE_DEDUP is set then the implementation must only remap if the
-	requested file ranges have identical contents.  If REMAP_CAN_SHORTEN is
-	set, the caller is ok with the implementation shortening the request
-	length to satisfy alignment or EOF requirements (or any other reason).
-
-  fadvise: possibly called by the fadvise64() system call.
-
-Note that the file operations are implemented by the specific
-filesystem in which the inode resides.  When opening a device node
-(character or block special) most filesystems will call special
-support routines in the VFS which will locate the required device
-driver information.  These support routines replace the filesystem file
-operations with those for the device driver, and then proceed to call
-the new open() method for the file.  This is how opening a device file
-in the filesystem eventually ends up calling the device driver open()
-method.
-
-
-Directory Entry Cache (dcache)
-==============================
-
-
-struct dentry_operations
-------------------------
-
-This describes how a filesystem can overload the standard dentry
-operations.  Dentries and the dcache are the domain of the VFS and the
-individual filesystem implementations.  Device drivers have no business
-here.  These methods may be set to NULL, as they are either optional or
-the VFS uses a default.  As of kernel 2.6.22, the following members are
-defined:
-
-struct dentry_operations {
-	int (*d_revalidate)(struct dentry *, unsigned int);
-	int (*d_weak_revalidate)(struct dentry *, unsigned int);
-	int (*d_hash)(const struct dentry *, struct qstr *);
-	int (*d_compare)(const struct dentry *,
-			unsigned int, const char *, const struct qstr *);
-	int (*d_delete)(const struct dentry *);
-	int (*d_init)(struct dentry *);
-	void (*d_release)(struct dentry *);
-	void (*d_iput)(struct dentry *, struct inode *);
-	char *(*d_dname)(struct dentry *, char *, int);
-	struct vfsmount *(*d_automount)(struct path *);
-	int (*d_manage)(const struct path *, bool);
-	struct dentry *(*d_real)(struct dentry *, const struct inode *);
-};
-
-  d_revalidate: called when the VFS needs to revalidate a dentry.  This
-	is called whenever a name look-up finds a dentry in the
-	dcache.  Most local filesystems leave this as NULL, because all their
-	dentries in the dcache are valid.  Network filesystems are different
-	since things can change on the server without the client necessarily
-	being aware of it.
-
-	This function should return a positive value if the dentry is still
-	valid, and zero or a negative error code if it isn't.
-
-	d_revalidate may be called in rcu-walk mode (flags & LOOKUP_RCU).
-	If in rcu-walk mode, the filesystem must revalidate the dentry without
-	blocking or storing to the dentry, d_parent and d_inode should not be
-	used without care (because they can change and, in d_inode case, even
-	become NULL under us).
-
-	If a situation is encountered that rcu-walk cannot handle, return
-	-ECHILD and it will be called again in ref-walk mode.
-
- d_weak_revalidate: called when the VFS needs to revalidate a "jumped" dentry.
-	This is called when a path-walk ends at dentry that was not acquired by
-	doing a lookup in the parent directory.  This includes "/", "." and "..",
-	as well as procfs-style symlinks and mountpoint traversal.
-
-	In this case, we are less concerned with whether the dentry is still
-	fully correct, but rather that the inode is still valid.  As with
-	d_revalidate, most local filesystems will set this to NULL since their
-	dcache entries are always valid.
-
-	This function has the same return code semantics as d_revalidate.
-
-	d_weak_revalidate is only called after leaving rcu-walk mode.
-
-  d_hash: called when the VFS adds a dentry to the hash table.  The first
-	dentry passed to d_hash is the parent directory that the name is
-	to be hashed into.
-
-	Same locking and synchronisation rules as d_compare regarding
-	what is safe to dereference etc.
-
-  d_compare: called to compare a dentry name with a given name.  The first
-	dentry is the parent of the dentry to be compared, the second is
-	the child dentry.  len and name string are properties of the dentry
-	to be compared.  qstr is the name to compare it with.
-
-	Must be constant and idempotent, and should not take locks if
-	possible, and should not or store into the dentry.
-	Should not dereference pointers outside the dentry without
-	lots of care (eg.  d_parent, d_inode, d_name should not be used).
-
-	However, our vfsmount is pinned, and RCU held, so the dentries and
-	inodes won't disappear, neither will our sb or filesystem module.
-	->d_sb may be used.
-
-	It is a tricky calling convention because it needs to be called under
-	"rcu-walk", ie. without any locks or references on things.
-
-  d_delete: called when the last reference to a dentry is dropped and the
-	dcache is deciding whether or not to cache it.  Return 1 to delete
-	immediately, or 0 to cache the dentry.  Default is NULL which means to
-	always cache a reachable dentry.  d_delete must be constant and
-	idempotent.
-
-  d_init: called when a dentry is allocated
-
-  d_release: called when a dentry is really deallocated
-
-  d_iput: called when a dentry loses its inode (just prior to its
-	being deallocated).  The default when this is NULL is that the
-	VFS calls iput().  If you define this method, you must call
-	iput() yourself
-
-  d_dname: called when the pathname of a dentry should be generated.
-	Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay
-	pathname generation.  (Instead of doing it when dentry is created,
-	it's done only when the path is needed.).  Real filesystems probably
-	dont want to use it, because their dentries are present in global
-	dcache hash, so their hash should be an invariant.  As no lock is
-	held, d_dname() should not try to modify the dentry itself, unless
-	appropriate SMP safety is used.  CAUTION : d_path() logic is quite
-	tricky.  The correct way to return for example "Hello" is to put it
-	at the end of the buffer, and returns a pointer to the first char.
-	dynamic_dname() helper function is provided to take care of this.
-
-	Example :
-
-	static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen)
-	{
-		return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
-				dentry->d_inode->i_ino);
-	}
-
-  d_automount: called when an automount dentry is to be traversed (optional).
-	This should create a new VFS mount record and return the record to the
-	caller.  The caller is supplied with a path parameter giving the
-	automount directory to describe the automount target and the parent
-	VFS mount record to provide inheritable mount parameters.  NULL should
-	be returned if someone else managed to make the automount first.  If
-	the vfsmount creation failed, then an error code should be returned.
-	If -EISDIR is returned, then the directory will be treated as an
-	ordinary directory and returned to pathwalk to continue walking.
-
-	If a vfsmount is returned, the caller will attempt to mount it on the
-	mountpoint and will remove the vfsmount from its expiration list in
-	the case of failure.  The vfsmount should be returned with 2 refs on
-	it to prevent automatic expiration - the caller will clean up the
-	additional ref.
-
-	This function is only used if DCACHE_NEED_AUTOMOUNT is set on the
-	dentry.  This is set by __d_instantiate() if S_AUTOMOUNT is set on the
-	inode being added.
-
-  d_manage: called to allow the filesystem to manage the transition from a
-	dentry (optional).  This allows autofs, for example, to hold up clients
-	waiting to explore behind a 'mountpoint' while letting the daemon go
-	past and construct the subtree there.  0 should be returned to let the
-	calling process continue.  -EISDIR can be returned to tell pathwalk to
-	use this directory as an ordinary directory and to ignore anything
-	mounted on it and not to check the automount flag.  Any other error
-	code will abort pathwalk completely.
-
-	If the 'rcu_walk' parameter is true, then the caller is doing a
-	pathwalk in RCU-walk mode.  Sleeping is not permitted in this mode,
-	and the caller can be asked to leave it and call again by returning
-	-ECHILD.  -EISDIR may also be returned to tell pathwalk to
-	ignore d_automount or any mounts.
-
-	This function is only used if DCACHE_MANAGE_TRANSIT is set on the
-	dentry being transited from.
-
-  d_real: overlay/union type filesystems implement this method to return one of
-	the underlying dentries hidden by the overlay.  It is used in two
-	different modes:
-
-	Called from file_dentry() it returns the real dentry matching the inode
-	argument.  The real dentry may be from a lower layer already copied up,
-	but still referenced from the file.  This mode is selected with a
-	non-NULL inode argument.
-
-	With NULL inode the topmost real underlying dentry is returned.
-
-Each dentry has a pointer to its parent dentry, as well as a hash list
-of child dentries.  Child dentries are basically like files in a
-directory.
-
-
-Directory Entry Cache API
---------------------------
-
-There are a number of functions defined which permit a filesystem to
-manipulate dentries:
-
-  dget: open a new handle for an existing dentry (this just increments
-	the usage count)
-
-  dput: close a handle for a dentry (decrements the usage count).  If
-	the usage count drops to 0, and the dentry is still in its
-	parent's hash, the "d_delete" method is called to check whether
-	it should be cached.  If it should not be cached, or if the dentry
-	is not hashed, it is deleted.  Otherwise cached dentries are put
-	into an LRU list to be reclaimed on memory shortage.
-
-  d_drop: this unhashes a dentry from its parents hash list.  A
-	subsequent call to dput() will deallocate the dentry if its
-	usage count drops to 0
-
-  d_delete: delete a dentry.  If there are no other open references to
-	the dentry then the dentry is turned into a negative dentry
-	(the d_iput() method is called).  If there are other
-	references, then d_drop() is called instead
-
-  d_add: add a dentry to its parents hash list and then calls
-	d_instantiate()
-
-  d_instantiate: add a dentry to the alias hash list for the inode and
-	updates the "d_inode" member.  The "i_count" member in the
-	inode structure should be set/incremented.  If the inode
-	pointer is NULL, the dentry is called a "negative
-	dentry".  This function is commonly called when an inode is
-	created for an existing negative dentry
-
-  d_lookup: look up a dentry given its parent and path name component
-	It looks up the child of that given name from the dcache
-	hash table.  If it is found, the reference count is incremented
-	and the dentry is returned.  The caller must use dput()
-	to free the dentry when it finishes using it.
-
-
-Mount Options
-=============
-
-
-Parsing options
----------------
-
-On mount and remount the filesystem is passed a string containing a
-comma separated list of mount options.  The options can have either of
-these forms:
-
-  option
-  option=value
-
-The <linux/parser.h> header defines an API that helps parse these
-options.  There are plenty of examples on how to use it in existing
-filesystems.
-
-
-Showing options
----------------
-
-If a filesystem accepts mount options, it must define show_options() to
-show all the currently active options.  The rules are:
-
-  - options MUST be shown which are not default or their values differ
-    from the default
-
-  - options MAY be shown which are enabled by default or have their
-    default value
-
-Options used only internally between a mount helper and the kernel (such
-as file descriptors), or which only have an effect during the mounting
-(such as ones controlling the creation of a journal) are exempt from the
-above rules.
-
-The underlying reason for the above rules is to make sure, that a mount
-can be accurately replicated (e.g. umounting and mounting again) based
-on the information found in /proc/mounts.
-
-
-Resources
-=========
-
-(Note some of these resources are not up-to-date with the latest kernel
- version.)
-
-Creating Linux virtual filesystems. 2002
-    <http://lwn.net/Articles/13325/>
-
-The Linux Virtual File-system Layer by Neil Brown. 1999
-    <http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html>
-
-A tour of the Linux VFS by Michael K. Johnson. 1996
-    <http://www.tldp.org/LDP/khg/HyperNews/get/fs/vfstour.html>
-
-A small trail through the Linux kernel by Andries Brouwer. 2001
-    <http://www.win.tue.nl/~aeb/linux/vfs/trail.html>
-- 
cgit v1.2.3-59-g8ed1b


From 44f42165177e6c32f3a6aaceeaf7d9cd1c95595f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 29 May 2019 20:09:24 -0300
Subject: scripts/sphinx-pre-install: make activate hint smarter

It is possible that multiple Sphinx virtualenvs are installed
in a given kernel tree. Change the logic to pick the latest
version among them, as this is probably what the user wants.
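
As a rough illustration of the selection rule (a Python sketch, not the
script's Perl; the function name and the example paths are hypothetical),
the candidates are sorted as plain strings in descending order and the
first one is used only if it does not sort below the minimum:

```python
def pick_activate(candidates, min_activate):
    """Return the string-wise highest activate path, or None.

    Mirrors the idea in the patch: sort descending with plain string
    comparison and accept the first entry only if it is >= the path
    built from the minimal version.
    """
    ordered = sorted(candidates, reverse=True)
    if ordered and ordered[0] >= min_activate:
        return ordered[0]
    return None


# Hypothetical virtualenv paths, for illustration only.
paths = [
    "/k/sphinx_1.4.9/bin/activate",
    "/k/sphinx_1.7.9/bin/activate",
]
best = pick_activate(paths, "/k/sphinx_1.3/bin/activate")
```

Because the ordering is string-based, a directory such as sphinx_1.10
would sort below sphinx_1.3; the sketch keeps that behaviour on purpose.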

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/sphinx-pre-install | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install
index 8c2d1bcf2e02..11239eb29695 100755
--- a/scripts/sphinx-pre-install
+++ b/scripts/sphinx-pre-install
@@ -1,7 +1,7 @@
 #!/usr/bin/perl
 use strict;
 
-# Copyright (c) 2017 Mauro Carvalho Chehab <mchehab@kernel.org>
+# Copyright (c) 2017-2019 Mauro Carvalho Chehab <mchehab@kernel.org>
 #
 # This program is free software; you can redistribute it and/or
 # modify it under the terms of the GNU General Public License
@@ -15,6 +15,7 @@ use strict;
 
 my $conf = "Documentation/conf.py";
 my $requirement_file = "Documentation/sphinx/requirements.txt";
+my $virtenv_prefix = "sphinx_";
 
 #
 # Static vars
@@ -28,7 +29,8 @@ my $need_symlink = 0;
 my $need_sphinx = 0;
 my $rec_sphinx_upgrade = 0;
 my $install = "";
-my $virtenv_dir = "sphinx_";
+my $virtenv_dir = "";
+my $min_version;
 
 #
 # Command line arguments
@@ -229,7 +231,6 @@ sub get_sphinx_fname()
 
 sub check_sphinx()
 {
-	my $min_version;
 	my $rec_version;
 	my $cur_version;
 
@@ -255,7 +256,7 @@ sub check_sphinx()
 
 	die "Can't get recommended sphinx version from $requirement_file" if (!$min_version);
 
-	$virtenv_dir .= $rec_version;
+	$virtenv_dir = $virtenv_prefix . $rec_version;
 
 	my $sphinx = get_sphinx_fname();
 	return if ($sphinx eq "");
@@ -612,18 +613,23 @@ sub check_needs()
 		       which("sphinx-build-3");
 	}
 	if ($need_sphinx || $rec_sphinx_upgrade) {
-		my $activate = "$virtenv_dir/bin/activate";
-		if (-e "$ENV{'PWD'}/$activate") {
+		my $min_activate = "$ENV{'PWD'}/${virtenv_prefix}${min_version}/bin/activate";
+                my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate";
+
+                @activates = sort {$b cmp $a} @activates;
+
+		if (scalar @activates > 0 && $activates[0] ge $min_activate) {
 			printf "\nNeed to activate virtualenv with:\n";
-			printf "\t. $activate\n";
+			printf "\t. $activates[0]\n";
 		} else {
+			my $rec_activate = "$virtenv_dir/bin/activate";
 			my $virtualenv = findprog("virtualenv-3");
 			$virtualenv = findprog("virtualenv-3.5") if (!$virtualenv);
 			$virtualenv = findprog("virtualenv") if (!$virtualenv);
 			$virtualenv = "virtualenv" if (!$virtualenv);
 
 			printf "\t$virtualenv $virtenv_dir\n";
-			printf "\t. $activate\n";
+			printf "\t. $rec_activate\n";
 			printf "\tpip install -r $requirement_file\n";
 
 			$need++ if (!$rec_sphinx_upgrade);
-- 
cgit v1.2.3-59-g8ed1b


From c4c562defedb7634a717293a5192071983e79781 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 29 May 2019 20:09:25 -0300
Subject: scripts/sphinx-pre-install: get rid of explicit RHEL7 check

RHEL8 has already been released. This test doesn't catch it and would
do the wrong thing. We could fix the test, but the script now checks
that the Sphinx version meets the minimum (1.3), so there's no need
for an explicit distro check.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/sphinx-pre-install | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install
index 11239eb29695..ded3e2ef3f8d 100755
--- a/scripts/sphinx-pre-install
+++ b/scripts/sphinx-pre-install
@@ -581,19 +581,6 @@ sub check_needs()
 		print "Unknown OS\n";
 	}
 
-	# RHEL 7.x and clones have Sphinx version 1.1.x and incomplete texlive
-	if (($system_release =~ /Red Hat Enterprise Linux/) ||
-	    ($system_release =~ /CentOS/) ||
-	    ($system_release =~ /Scientific Linux/) ||
-	    ($system_release =~ /Oracle Linux Server/)) {
-		$virtualenv = 1;
-		$pdf = 0;
-
-		printf("NOTE: On this distro, Sphinx and TexLive shipped versions are incompatible\n");
-		printf("with doc build. So, use Sphinx via a Python virtual environment.\n\n");
-		printf("This script can't install a TexLive version that would provide PDF.\n");
-	}
-
 	# Check for needed programs/tools
 	check_sphinx();
 	check_perl_module("Pod::Usage", 0);
-- 
cgit v1.2.3-59-g8ed1b


From 9b88ad5464af1bf7228991f1c46a9a13484790a4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 29 May 2019 20:09:26 -0300
Subject: scripts/sphinx-pre-install: always check if version is compatible
 with build

Call the script every time a make docs target is selected, in
a simplified check mode.

With this change, the script will set two vars:

$min_version - obtained from `needs_sphinx` var inside
	       conf.py (currently, '1.3')

$rec_version - obtained from sphinx/requirements.txt.

With those changes, a target like "make htmldocs" will do:

1) If no sphinx-build/sphinx-build-3 is found, it will run
   the script in normal mode as before, checking all
   system dependencies and providing install hints for the
   needed programs, and will abort the build;

2) If no sphinx-build/sphinx-build-3 is found, but there is
   a sphinx_${VER}/bin/activate file, and if
   ${VER} >= $min_version (string comparison), it will
   run in full mode and recommend activating the
   virtualenv. If there are multiple virtualenvs, it
   will sort the versions as strings, recommend the
   highest one, and abort the build;

3) If Sphinx is detected but has a version lower than
   $min_version, it will run in full mode, which will
   recommend creating a virtualenv using sphinx/requirements.txt,
   and will abort the build;

4) If Sphinx is detected and version is lower than
   $rec_version, it will run in full mode and will
   recommend creating a virtual env using sphinx/requirements.txt.

   In this case, it **won't** abort the build.

5) If Sphinx is detected and its version is equal to or higher than
   $rec_version, it will return just after detecting the
   version ("quick mode"), without checking whether there are any
   missing dependencies.
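
The five cases can be summarised with a small decision function. This is
only a Python sketch of the rules listed above, not the Perl in the
patch; classify(), have_venv and the recommended version used in the
example are hypothetical names and values:

```python
def classify(cur, min_version, rec_version, have_venv=False):
    """Map a detected Sphinx version to (case number, abort the build?).

    cur is the detected version string, or None when no sphinx-build
    binary was found. have_venv says whether a virtualenv whose version
    is >= min_version exists. Versions are compared as plain strings,
    as the commit message describes.
    """
    if cur is None:
        return (2, True) if have_venv else (1, True)
    if cur < min_version:
        return (3, True)
    if cur < rec_version:
        return (4, False)  # warn and suggest an upgrade, keep building
    return (5, False)      # quick mode: no dependency checks


# "1.3" is the minimum named above; "1.7.9" is a placeholder for
# whatever sphinx/requirements.txt recommends.
result = classify("1.4.9", "1.3", "1.7.9")
```

Only cases 1 to 3 stop the build; case 4 prints the upgrade hint and
carries on.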

Just like before, anyone who wants to install Sphinx from the
distro has to call the script manually with the `--no-virtualenv`
argument to get the hints for their OS:

    You should run:

	sudo dnf install -y python3-sphinx python3-sphinx_rtd_theme

While here, add a short help text for the three optional arguments
of the script.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/Makefile     |  5 +++++
 scripts/sphinx-pre-install | 40 +++++++++++++++++++++++++++-------------
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index e889e7cb8511..380e24053d6f 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -70,12 +70,14 @@ quiet_cmd_sphinx = SPHINX  $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
 	$(abspath $(BUILDDIR)/$3/$4)
 
 htmldocs:
+	@./scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
 
 linkcheckdocs:
 	@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
 
 latexdocs:
+	@./scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
 
 ifeq ($(HAVE_PDFLATEX),0)
@@ -87,14 +89,17 @@ pdfdocs:
 else # HAVE_PDFLATEX
 
 pdfdocs: latexdocs
+	@./scripts/sphinx-pre-install --version-check
 	$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
 
 endif # HAVE_PDFLATEX
 
 epubdocs:
+	@./scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
 
 xmldocs:
+	@./scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
 
 endif # HAVE_SPHINX
diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install
index ded3e2ef3f8d..f001fc2fcf12 100755
--- a/scripts/sphinx-pre-install
+++ b/scripts/sphinx-pre-install
@@ -38,6 +38,7 @@ my $min_version;
 
 my $pdf = 1;
 my $virtualenv = 1;
+my $version_check = 0;
 
 #
 # List of required texlive packages on Fedora and OpenSuse
@@ -277,20 +278,22 @@ sub check_sphinx()
 
 	die "$sphinx didn't return its version" if (!$cur_version);
 
-	printf "Sphinx version %s (minimal: %s, recommended >= %s)\n",
-		$cur_version, $min_version, $rec_version;
-
 	if ($cur_version lt $min_version) {
-		print "Warning: Sphinx version should be >= $min_version\n\n";
+		printf "ERROR: Sphinx version is %s. It should be >= %s (recommended >= %s)\n",
+		       $cur_version, $min_version, $rec_version;;
 		$need_sphinx = 1;
 		return;
 	}
 
 	if ($cur_version lt $rec_version) {
+		printf "Sphinx version %s\n", $cur_version;
 		print "Warning: It is recommended at least Sphinx version $rec_version.\n";
-		print "         To upgrade, use:\n\n";
 		$rec_sphinx_upgrade = 1;
+		return;
 	}
+
+	# On version check mode, just assume Sphinx has all mandatory deps
+	exit (0) if ($version_check);
 }
 
 #
@@ -575,14 +578,18 @@ sub check_distros()
 
 sub check_needs()
 {
+	# Check for needed programs/tools
+	check_sphinx();
+
 	if ($system_release) {
-		print "Detected OS: $system_release.\n";
+		print "Detected OS: $system_release.\n\n";
 	} else {
-		print "Unknown OS\n";
+		print "Unknown OS\n\n";
 	}
 
+	print "To upgrade Sphinx, use:\n\n" if ($rec_sphinx_upgrade);
+
 	# Check for needed programs/tools
-	check_sphinx();
 	check_perl_module("Pod::Usage", 0);
 	check_program("make", 0);
 	check_program("gcc", 0);
@@ -601,13 +608,14 @@ sub check_needs()
 	}
 	if ($need_sphinx || $rec_sphinx_upgrade) {
 		my $min_activate = "$ENV{'PWD'}/${virtenv_prefix}${min_version}/bin/activate";
-                my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate";
+		my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate";
 
-                @activates = sort {$b cmp $a} @activates;
+		@activates = sort {$b cmp $a} @activates;
 
-		if (scalar @activates > 0 && $activates[0] ge $min_activate) {
-			printf "\nNeed to activate virtualenv with:\n";
+		if ($need_sphinx && scalar @activates > 0 && $activates[0] ge $min_activate) {
+			printf "\nNeed to activate a compatible Sphinx version on virtualenv with:\n";
 			printf "\t. $activates[0]\n";
+			exit (1);
 		} else {
 			my $rec_activate = "$virtenv_dir/bin/activate";
 			my $virtualenv = findprog("virtualenv-3");
@@ -646,8 +654,14 @@ while (@ARGV) {
 		$virtualenv = 0;
 	} elsif ($arg eq "--no-pdf"){
 		$pdf = 0;
+	} elsif ($arg eq "--version-check"){
+		$version_check = 1;
 	} else {
-		print "Usage:\n\t$0 <--no-virtualenv> <--no-pdf>\n\n";
+		print "Usage:\n\t$0 <--no-virtualenv> <--no-pdf> <--version-check>\n\n";
+		print "Where:\n";
+		print "\t--no-virtualenv\t- Recommend installing Sphinx instead of using a virtualenv\n";
+		print "\t--version-check\t- if version is compatible, don't check for missing dependencies\n";
+		print "\t--no-pdf\t- don't check for dependencies required to build PDF docs\n\n";
 		exit -1;
 	}
 }
-- 
cgit v1.2.3-59-g8ed1b


From 9e78e7fc0b20bcc0d5599f71d297b6fa1a2e7c5f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 29 May 2019 20:09:27 -0300
Subject: scripts/documentation-file-ref-check: better handle translations

Only search for translation renames inside the translation
directory.
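
A minimal sketch of how the search root is chosen, written in Python
rather than the script's Perl (search_root() is a hypothetical name; the
pattern is the one used in the hunk below):

```python
import re


def search_root(ref):
    """Return the directory that a rename search should start from.

    References under Documentation/translations/<lang>/ are searched
    only inside that language's directory; everything else falls back
    to the top of the tree (".").
    """
    m = re.search(r"(Documentation/translations/[^/]+)", ref)
    return m.group(1) if m else "."


root = search_root("Documentation/translations/zh_CN/process/howto.rst")
```

This keeps a broken reference in one translation from being "fixed" to
point at a file of the same name in another language or in the English
tree.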

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/documentation-file-ref-check | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check
index 63e9542656f1..6b622b88f4cf 100755
--- a/scripts/documentation-file-ref-check
+++ b/scripts/documentation-file-ref-check
@@ -141,6 +141,10 @@ print "Auto-fixing broken references. Please double-check the results\n";
 foreach my $ref (keys %broken_ref) {
 	my $new =$ref;
 
+	my $basedir = ".";
+	# On translations, only seek inside the translations directory
+	$basedir  = $1 if ($ref =~ m,(Documentation/translations/[^/]+),);
+
 	# get just the basename
 	$new =~ s,.*/,,;
 
@@ -161,18 +165,18 @@ foreach my $ref (keys %broken_ref) {
 	# usual reason for breakage: file renamed to .rst
 	if (!$f) {
 		$new =~ s/\.txt$/.rst/;
-		$f=qx(find . -iname $new) if ($new);
+		$f=qx(find $basedir -iname $new) if ($new);
 	}
 
 	# usual reason for breakage: use dash or underline
 	if (!$f) {
 		$new =~ s/[-_]/[-_]/g;
-		$f=qx(find . -iname $new) if ($new);
+		$f=qx(find $basedir -iname $new) if ($new);
 	}
 
 	# Wild guess: seek for the same name on another place
 	if (!$f) {
-		$f = qx(find . -iname $new) if ($new);
+		$f = qx(find $basedir -iname $new) if ($new);
 	}
 
 	my @find = split /\s+/, $f;
-- 
cgit v1.2.3-59-g8ed1b


From aeaacbfed853c17b8ac5e73c21f54d7f0805d899 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 29 May 2019 20:09:28 -0300
Subject: scripts/documentation-file-ref-check: exclude false-positives

There are at least two cases where a documentation file is gone
for good, but the text still mentions it:

1) drivers/vhost/vhost.c:
   the reference to Documentation/virtual/lguest/lguest.c just
   gives credit to the original work that vhost replaced;

2) Documentation/scsi/scsi_mid_low_api.txt:
   it gives credit to, and mentions, the old Documentation/Configure.help
   file that used to be part of Kernel 2.4.x

As we don't want the script to keep pointing to those every time,
let's add logic to the script so that it can ignore valid
false-positives like the above.
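
As an illustration only, the check amounts to a lookup on a
(file, reference) pair. A minimal shell sketch, with a hypothetical
function name (the script itself uses a Perl hash):

```shell
# Hypothetical sketch: "file => reference" pairs that are intentional
# mentions of documentation files that no longer exist.
is_false_positive() {
    case "$1 => $2" in
        "Documentation/scsi/scsi_mid_low_api.txt => Documentation/Configure.help") return 0 ;;
        "drivers/vhost/vhost.c => Documentation/virtual/lguest/lguest.c") return 0 ;;
    esac
    # Any other pair is still reported as a broken reference.
    return 1
}
```

Only the exact pair is skipped; the same reference found in another
file would still be reported.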

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/documentation-file-ref-check | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check
index 6b622b88f4cf..05235775cc71 100755
--- a/scripts/documentation-file-ref-check
+++ b/scripts/documentation-file-ref-check
@@ -8,6 +8,14 @@ use warnings;
 use strict;
 use Getopt::Long qw(:config no_auto_abbrev);
 
+# NOTE: only add things here when the file was gone, but the text wants
+# to mention a past documentation file, for example, to give credits for
+# the original work.
+my %false_positives = (
+	"Documentation/scsi/scsi_mid_low_api.txt" => "Documentation/Configure.help",
+	"drivers/vhost/vhost.c" => "Documentation/virtual/lguest/lguest.c",
+);
+
 my $scriptname = $0;
 $scriptname =~ s,.*/([^/]+/),$1,;
 
@@ -122,6 +130,11 @@ while (<IN>) {
 			next if (grep -e, glob("$path/$ref $path/$fulref"));
 		}
 
+		# Discard known false-positives
+		if (defined($false_positives{$f})) {
+			next if ($false_positives{$f} eq $fulref);
+		}
+
 		if ($fix) {
 			if (!($ref =~ m/(scripts|Kconfig|Kbuild)/)) {
 				$broken_ref{$ref}++;
-- 
cgit v1.2.3-59-g8ed1b


From 4904aeed9f686c90dba72980f0067ac1a7dbbfb6 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 29 May 2019 20:09:29 -0300
Subject: scripts/documentation-file-ref-check: improve tools ref handling

There's a false positive on perf/util:

	tools/perf/util/s390-cpumsf.c: Documentation/perf.data-file-format.txt

The file is there at tools/perf/Documentation/, but the logic
which detects relative documentation references inside tools is
not capable of detecting it.

So, improve it.
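
A simplified shell sketch of the lookup, under the assumption that the
three arguments correspond to the script's $f, $ref and $fulref
variables (the function name is hypothetical, and wildcard expansion
done by the Perl glob() is ignored here):

```shell
# Hypothetical helper mirroring the candidate locations checked for a
# reference found in a file under tools/:
#   <dir of file>/<ref>, <parent of that dir>/<ref>, <dir of file>/<fulref>
tools_ref_exists() {
    file=$1
    ref=$2
    fulref=$3
    dir=${file%/*}    # strip the file name, keep its directory
    for candidate in "$dir/$ref" "$dir/../$ref" "$dir/$fulref"; do
        [ -e "$candidate" ] && return 0
    done
    return 1
}
```

With the example above, the second candidate resolves to
tools/perf/util/../Documentation/perf.data-file-format.txt, which is
where the file lives.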

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/documentation-file-ref-check | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check
index 05235775cc71..5d775ca7469b 100755
--- a/scripts/documentation-file-ref-check
+++ b/scripts/documentation-file-ref-check
@@ -127,7 +127,7 @@ while (<IN>) {
 		if ($f =~ m/tools/) {
 			my $path = $f;
 			$path =~ s,(.*)/.*,$1,;
-			next if (grep -e, glob("$path/$ref $path/$fulref"));
+			next if (grep -e, glob("$path/$ref $path/../$ref $path/$fulref"));
 		}
 
 		# Discard known false-positives
-- 
cgit v1.2.3-59-g8ed1b


From 0ca862e6f1c7e58e4eb9758fdb09255e6104d6a0 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 29 May 2019 20:09:30 -0300
Subject: scripts/documentation-file-ref-check: teach about .txt -> .yaml
 renames

In DT, files are being renamed from .txt to .yaml (json-schema).
Teach the script how to handle such renames when used in fix mode.
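
For illustration, a minimal shell sketch of the rename check (the
helper name is hypothetical; the script does the equivalent in Perl):

```shell
# Hypothetical helper: if a broken reference ends in .txt and a file with
# the same path but a .yaml suffix exists, print that path as the fix.
yaml_rename_candidate() {
    case "$1" in
        *.txt) candidate="${1%.txt}.yaml" ;;
        *) return 1 ;;
    esac
    [ -f "$candidate" ] && printf '%s\n' "$candidate"
}
```

When nothing is printed, the other devicetree heuristics (searching
by file name, then without the manufacturer prefix) would still run.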

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/documentation-file-ref-check | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check
index 5d775ca7469b..ff16db269079 100755
--- a/scripts/documentation-file-ref-check
+++ b/scripts/documentation-file-ref-check
@@ -165,13 +165,22 @@ foreach my $ref (keys %broken_ref) {
 
 	# usual reason for breakage: DT file moved around
 	if ($ref =~ /devicetree/) {
-		my $search = $new;
-		$search =~ s,^.*/,,;
-		$f = qx(find Documentation/devicetree/ -iname "*$search*") if ($search);
+		# usual reason for breakage: DT file renamed to .yaml
 		if (!$f) {
-			# Manufacturer name may have changed
-			$search =~ s/^.*,//;
+			my $new_ref = $ref;
+			$new_ref =~ s/\.txt$/.yaml/;
+			$f=$new_ref if (-f $new_ref);
+		}
+
+		if (!$f) {
+			my $search = $new;
+			$search =~ s,^.*/,,;
 			$f = qx(find Documentation/devicetree/ -iname "*$search*") if ($search);
+			if (!$f) {
+				# Manufacturer name may have changed
+				$search =~ s/^.*,//;
+				$f = qx(find Documentation/devicetree/ -iname "*$search*") if ($search);
+			}
 		}
 	}
 
-- 
cgit v1.2.3-59-g8ed1b


From cf08508d21ffae5aea6c7dcb771ebd28612c6120 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 29 May 2019 20:09:31 -0300
Subject: docs: by default, build docs a lot faster with Sphinx >= 1.7

Since Sphinx version 1.7, it is possible to use "-jauto" in
order to speed up documentation builds. On older versions,
while -j was already supported, one would need to set the
number of threads manually.

So, if SPHINXOPTS is not provided, add -jauto, in order to
speed up the build. That makes it *a lot* faster than
without -j.

If one really wants to slow things down, one can just use:

	make SPHINXOPTS=-j1 htmldocs
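
Note that dotted version strings do not order correctly when compared
as plain numbers (1.10 would rank below 1.7). A sketch of a
component-wise check in shell, assuming the version string has already
been extracted from "sphinx-build --version" (the function name is
hypothetical and this is not the code used in the Makefile):

```shell
# Hypothetical helper: print "-jauto" when a dotted Sphinx version string
# is 1.7 or newer, comparing major and minor as integers so that 1.10
# ranks above 1.7.
sphinx_jobs_flag() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # second component
    if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]; }; then
        printf '%s\n' "-jauto"
    fi
}
```

For older versions nothing is printed, so no -j option gets added.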

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
[ jc: fixed perl magic to determine sphinx version ]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 380e24053d6f..85d3cfafd77c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -28,6 +28,8 @@ ifeq ($(HAVE_SPHINX),0)
 
 else # HAVE_SPHINX
 
+export SPHINXOPTS = $(shell perl -e 'open IN,"sphinx-build --version 2>&1 |"; while (<IN>) { if (m/([\d\.]+)/) { print "-jauto" if ($$1 >= "1.7") } ;} close IN')
+
 # User-friendly check for pdflatex and latexmk
 HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
 HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi)
-- 
cgit v1.2.3-59-g8ed1b


From a700767a7682d9bd237e927253274859aee075e7 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 29 May 2019 20:09:32 -0300
Subject: docs: requirements.txt: recommend Sphinx 1.7.9

As discussed on the linux-doc ML, while we'll still support
version 1.3, it is time to recommend a more modern version.

So, let's switch requirements.txt to Sphinx 1.7.9, as it has
the "-jauto" flag, which makes building documentation a lot
faster.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/doc-guide/sphinx.rst    | 17 ++++++++---------
 Documentation/sphinx/requirements.txt |  4 ++--
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst
index c039224b404e..4ba081f43e98 100644
--- a/Documentation/doc-guide/sphinx.rst
+++ b/Documentation/doc-guide/sphinx.rst
@@ -27,8 +27,7 @@ Sphinx Install
 ==============
 
 The ReST markups currently used by the Documentation/ files are meant to be
-built with ``Sphinx`` version 1.3 or higher. If you desire to build
-PDF output, it is recommended to use version 1.4.6 or higher.
+built with ``Sphinx`` version 1.3 or higher.
 
 There's a script that checks for the Sphinx requirements. Please see
 :ref:`sphinx-pre-install` for further details.
@@ -56,13 +55,13 @@ or ``virtualenv``, depending on how your distribution packaged Python 3.
       those expressions are written using LaTeX notation. It needs texlive
       installed with amdfonts and amsmath in order to evaluate them.
 
-In summary, if you want to install Sphinx version 1.4.9, you should do::
+In summary, if you want to install Sphinx version 1.7.9, you should do::
 
-       $ virtualenv sphinx_1.4
-       $ . sphinx_1.4/bin/activate
-       (sphinx_1.4) $ pip install -r Documentation/sphinx/requirements.txt
+       $ virtualenv sphinx_1.7.9
+       $ . sphinx_1.7.9/bin/activate
+       (sphinx_1.7.9) $ pip install -r Documentation/sphinx/requirements.txt
 
-After running ``. sphinx_1.4/bin/activate``, the prompt will change,
+After running ``. sphinx_1.7.9/bin/activate``, the prompt will change,
 in order to indicate that you're using the new environment. If you
 open a new shell, you need to rerun this command to enter again at
 the virtual environment before building the documentation.
@@ -105,8 +104,8 @@ command line options for your distro::
 	You should run:
 
 		sudo dnf install -y texlive-luatex85
-		/usr/bin/virtualenv sphinx_1.4
-		. sphinx_1.4/bin/activate
+		/usr/bin/virtualenv sphinx_1.7.9
+		. sphinx_1.7.9/bin/activate
 		pip install -r Documentation/sphinx/requirements.txt
 
 	Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
diff --git a/Documentation/sphinx/requirements.txt b/Documentation/sphinx/requirements.txt
index 742be3e12619..14e29a0ae480 100644
--- a/Documentation/sphinx/requirements.txt
+++ b/Documentation/sphinx/requirements.txt
@@ -1,3 +1,3 @@
-docutils==0.12
-Sphinx==1.4.9
+docutils
+Sphinx==1.7.9
 sphinx_rtd_theme
-- 
cgit v1.2.3-59-g8ed1b


From 6c01edd395a7cc7bb82333e953992eb0e76b1c35 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Fri, 31 May 2019 10:02:11 -0600
Subject: docs: look for sphinx-pre-install in the source tree

Recent makefile changes included an invocation of
./scripts/sphinx-pre-install.  Unfortunately, that fails when a separate
build directory is in use with:

  /bin/bash: ./scripts/sphinx-pre-install: No such file or directory

Use $(srctree) to fully specify the location of this script.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 85d3cfafd77c..2edd03b1dad6 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -23,7 +23,7 @@ ifeq ($(HAVE_SPHINX),0)
 .DEFAULT:
 	$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
 	@echo
-	@./scripts/sphinx-pre-install
+	@$(srctree)/scripts/sphinx-pre-install
 	@echo "  SKIP    Sphinx $@ target."
 
 else # HAVE_SPHINX
-- 
cgit v1.2.3-59-g8ed1b


From 18e1572419d69f8d45248cccabc40352a3e281d6 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Tue, 4 Jun 2019 07:55:49 -0600
Subject: docs: Completely fix the remote build tree case

My previous fix miserably failed to catch all of the invocations of
"./scripts/sphinx-pre-install", so we got build errors.  Try again with
more caffeine.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/Makefile | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2edd03b1dad6..2df0789f90b7 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -72,14 +72,14 @@ quiet_cmd_sphinx = SPHINX  $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
 	$(abspath $(BUILDDIR)/$3/$4)
 
 htmldocs:
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
 
 linkcheckdocs:
 	@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
 
 latexdocs:
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
 
 ifeq ($(HAVE_PDFLATEX),0)
@@ -91,17 +91,17 @@ pdfdocs:
 else # HAVE_PDFLATEX
 
 pdfdocs: latexdocs
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
 
 endif # HAVE_PDFLATEX
 
 epubdocs:
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
 
 xmldocs:
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
 
 endif # HAVE_SPHINX
-- 
cgit v1.2.3-59-g8ed1b


From ee5dc0491c38ae4e4e583d7532d470754bb173f6 Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding" <tobin@kernel.org>
Date: Tue, 4 Jun 2019 10:26:56 +1000
Subject: docs: filesystems: vfs: Render method descriptions

Currently the method descriptions for VFS data structures in vfs.rst
do not render well into HTML.  We can improve the HTML output by putting
the description string on a new line following the method name.

Suggested-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/vfs.rst | 1147 +++++++++++++++++++++----------------
 1 file changed, 642 insertions(+), 505 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 2ffbdf5f392c..0f85ab21c2ca 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -125,35 +125,46 @@ members are defined:
 		struct lock_class_key s_umount_key;
 	};
 
-``name``: the name of the filesystem type, such as "ext2", "iso9660",
+``name``
+	the name of the filesystem type, such as "ext2", "iso9660",
 	"msdos" and so on
 
-``fs_flags``: various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.)
+``fs_flags``
+	various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.)
 
-``mount``: the method to call when a new instance of this filesystem should
-be mounted
+``mount``
+	the method to call when a new instance of this filesystem should
+	be mounted
 
-``kill_sb``: the method to call when an instance of this filesystem
-	should be shut down
+``kill_sb``
+	the method to call when an instance of this filesystem should be
+	shut down
 
-``owner``: for internal VFS use: you should initialize this to THIS_MODULE in
-	most cases.
 
-``next``: for internal VFS use: you should initialize this to NULL
+``owner``
+	for internal VFS use: you should initialize this to THIS_MODULE
+	in most cases.
+
+``next``
+	for internal VFS use: you should initialize this to NULL
 
   s_lock_key, s_umount_key: lockdep-specific
 
 The mount() method has the following arguments:
 
-``struct file_system_type *fs_type``: describes the filesystem, partly initialized
-	by the specific filesystem code
+``struct file_system_type *fs_type``
+	describes the filesystem, partly initialized by the specific
+	filesystem code
 
-``int flags``: mount flags
+``int flags``
+	mount flags
 
-``const char *dev_name``: the device name we are mounting.
+``const char *dev_name``
+	the device name we are mounting.
 
-``void *data``: arbitrary mount options, usually comes as an ASCII
-	string (see "Mount Options" section)
+``void *data``
+	arbitrary mount options, usually comes as an ASCII string (see
+	"Mount Options" section)
 
 The mount() method must return the root dentry of the tree requested by
 caller.  An active reference to its superblock must be grabbed and the
@@ -178,22 +189,27 @@ implementation.
 Usually, a filesystem uses one of the generic mount() implementations
 and provides a fill_super() callback instead.  The generic variants are:
 
-``mount_bdev``: mount a filesystem residing on a block device
+``mount_bdev``
+	mount a filesystem residing on a block device
 
-``mount_nodev``: mount a filesystem that is not backed by a device
+``mount_nodev``
+	mount a filesystem that is not backed by a device
 
-``mount_single``: mount a filesystem which shares the instance between
-	all mounts
+``mount_single``
+	mount a filesystem which shares the instance between all mounts
 
 A fill_super() callback implementation has the following arguments:
 
-``struct super_block *sb``: the superblock structure.  The callback
-	must initialize this properly.
+``struct super_block *sb``
+	the superblock structure.  The callback must initialize this
+	properly.
 
-``void *data``: arbitrary mount options, usually comes as an ASCII
-	string (see "Mount Options" section)
+``void *data``
+	arbitrary mount options, usually comes as an ASCII string (see
+	"Mount Options" section)
 
-``int silent``: whether or not to be silent on error
+``int silent``
+	whether or not to be silent on error
 
 
 The Superblock Object
@@ -240,87 +256,106 @@ noted.  This means that most methods can block safely.  All methods are
 only called from a process context (i.e. not from an interrupt handler
 or bottom half).
 
-``alloc_inode``: this method is called by alloc_inode() to allocate memory
-	for struct inode and initialize it.  If this function is not
+``alloc_inode``
+	this method is called by alloc_inode() to allocate memory for
+	struct inode and initialize it.  If this function is not
 	defined, a simple 'struct inode' is allocated.  Normally
 	alloc_inode will be used to allocate a larger structure which
 	contains a 'struct inode' embedded within it.
 
-``destroy_inode``: this method is called by destroy_inode() to release
-	resources allocated for struct inode.  It is only required if
+``destroy_inode``
+	this method is called by destroy_inode() to release resources
+	allocated for struct inode.  It is only required if
 	->alloc_inode was defined and simply undoes anything done by
 	->alloc_inode.
 
-``dirty_inode``: this method is called by the VFS to mark an inode dirty.
+``dirty_inode``
+	this method is called by the VFS to mark an inode dirty.
 
-``write_inode``: this method is called when the VFS needs to write an
-	inode to disc.  The second parameter indicates whether the write
-	should be synchronous or not, not all filesystems check this flag.
+``write_inode``
+	this method is called when the VFS needs to write an inode to
+	disc.  The second parameter indicates whether the write should
+	be synchronous or not, not all filesystems check this flag.
 
-``drop_inode``: called when the last access to the inode is dropped,
-	with the inode->i_lock spinlock held.
+``drop_inode``
+	called when the last access to the inode is dropped, with the
+	inode->i_lock spinlock held.
 
 	This method should be either NULL (normal UNIX filesystem
-	semantics) or "generic_delete_inode" (for filesystems that do not
-	want to cache inodes - causing "delete_inode" to always be
+	semantics) or "generic_delete_inode" (for filesystems that do
+	not want to cache inodes - causing "delete_inode" to always be
 	called regardless of the value of i_nlink)
 
-	The "generic_delete_inode()" behavior is equivalent to the
-	old practice of using "force_delete" in the put_inode() case,
-	but does not have the races that the "force_delete()" approach
-	had. 
+	The "generic_delete_inode()" behavior is equivalent to the old
+	practice of using "force_delete" in the put_inode() case, but
+	does not have the races that the "force_delete()" approach had.
 
-``delete_inode``: called when the VFS wants to delete an inode
+``delete_inode``
+	called when the VFS wants to delete an inode
 
-``put_super``: called when the VFS wishes to free the superblock
+``put_super``
+	called when the VFS wishes to free the superblock
 	(i.e. unmount).  This is called with the superblock lock held
 
-``sync_fs``: called when VFS is writing out all dirty data associated with
-	a superblock.  The second parameter indicates whether the method
+``sync_fs``
+	called when VFS is writing out all dirty data associated with a
+	superblock.  The second parameter indicates whether the method
 	should wait until the write out has been completed.  Optional.
 
-``freeze_fs``: called when VFS is locking a filesystem and
-	forcing it into a consistent state.  This method is currently
-	used by the Logical Volume Manager (LVM).
+``freeze_fs``
+	called when VFS is locking a filesystem and forcing it into a
+	consistent state.  This method is currently used by the Logical
+	Volume Manager (LVM).
 
-``unfreeze_fs``: called when VFS is unlocking a filesystem and making it writable
+``unfreeze_fs``
+	called when VFS is unlocking a filesystem and making it writable
 	again.
 
-``statfs``: called when the VFS needs to get filesystem statistics.
+``statfs``
+	called when the VFS needs to get filesystem statistics.
 
-``remount_fs``: called when the filesystem is remounted.  This is called
-	with the kernel lock held
+``remount_fs``
+	called when the filesystem is remounted.  This is called with
+	the kernel lock held
 
-``clear_inode``: called then the VFS clears the inode.  Optional
+``clear_inode``
+	called then the VFS clears the inode.  Optional
 
-``umount_begin``: called when the VFS is unmounting a filesystem.
+``umount_begin``
+	called when the VFS is unmounting a filesystem.
 
-``show_options``: called by the VFS to show mount options for
-	/proc/<pid>/mounts.  (see "Mount Options" section)
+``show_options``
+	called by the VFS to show mount options for /proc/<pid>/mounts.
+	(see "Mount Options" section)
 
-``quota_read``: called by the VFS to read from filesystem quota file.
+``quota_read``
+	called by the VFS to read from filesystem quota file.
 
-``quota_write``: called by the VFS to write to filesystem quota file.
+``quota_write``
+	called by the VFS to write to filesystem quota file.
 
-``nr_cached_objects``: called by the sb cache shrinking function for the
-	filesystem to return the number of freeable cached objects it contains.
+``nr_cached_objects``
+	called by the sb cache shrinking function for the filesystem to
+	return the number of freeable cached objects it contains.
 	Optional.
 
-``free_cache_objects``: called by the sb cache shrinking function for the
-	filesystem to scan the number of objects indicated to try to free them.
-	Optional, but any filesystem implementing this method needs to also
-	implement ->nr_cached_objects for it to be called correctly.
+``free_cache_objects``
+	called by the sb cache shrinking function for the filesystem to
+	scan the number of objects indicated to try to free them.
+	Optional, but any filesystem implementing this method needs to
+	also implement ->nr_cached_objects for it to be called
+	correctly.
 
 	We can't do anything with any errors that the filesystem might
-	encountered, hence the void return type.  This will never be called if
-	the VM is trying to reclaim under GFP_NOFS conditions, hence this
-	method does not need to handle that situation itself.
+	encountered, hence the void return type.  This will never be
+	called if the VM is trying to reclaim under GFP_NOFS conditions,
+	hence this method does not need to handle that situation itself.
 
-	Implementations must include conditional reschedule calls inside any
-	scanning loop that is done.  This allows the VFS to determine
-	appropriate scan batch sizes without having to worry about whether
-	implementations will cause holdoff problems due to large scan batch
-	sizes.
+	Implementations must include conditional reschedule calls inside
+	any scanning loop that is done.  This allows the VFS to
+	determine appropriate scan batch sizes without having to worry
+	about whether implementations will cause holdoff problems due to
+	large scan batch sizes.
 
 Whoever sets up the inode is responsible for filling in the "i_op"
 field.  This is a pointer to a "struct inode_operations" which describes
@@ -334,23 +369,31 @@ On filesystems that support extended attributes (xattrs), the s_xattr
 superblock field points to a NULL-terminated array of xattr handlers.
 Extended attributes are name:value pairs.
 
-``name``: Indicates that the handler matches attributes with the specified name
-	(such as "system.posix_acl_access"); the prefix field must be NULL.
+``name``
+	Indicates that the handler matches attributes with the specified
+	name (such as "system.posix_acl_access"); the prefix field must
+	be NULL.
 
-``prefix``: Indicates that the handler matches all attributes with the specified
-	name prefix (such as "user."); the name field must be NULL.
+``prefix``
+	Indicates that the handler matches all attributes with the
+	specified name prefix (such as "user."); the name field must be
+	NULL.
 
-``list``: Determine if attributes matching this xattr handler should be listed
-	for a particular dentry.  Used by some listxattr implementations like
-	generic_listxattr.
+``list``
+	Determine if attributes matching this xattr handler should be
+	listed for a particular dentry.  Used by some listxattr
+	implementations like generic_listxattr.
 
-``get``: Called by the VFS to get the value of a particular extended attribute.
-	This method is called by the getxattr(2) system call.
+``get``
+	Called by the VFS to get the value of a particular extended
+	attribute.  This method is called by the getxattr(2) system
+	call.
 
-``set``: Called by the VFS to set the value of a particular extended attribute.
-	When the new value is NULL, called to remove a particular extended
-	attribute.  This method is called by the the setxattr(2) and
-	removexattr(2) system calls.
+``set``
+	Called by the VFS to set the value of a particular extended
+	attribute.  When the new value is NULL, called to remove a
+	particular extended attribute.  This method is called by the the
+	setxattr(2) and removexattr(2) system calls.
 
 When none of the xattr handlers of a filesystem match the specified
 attribute name or when a filesystem doesn't support extended attributes,
@@ -399,128 +442,147 @@ As of kernel 2.6.22, the following members are defined:
 Again, all methods are called without any locks being held, unless
 otherwise noted.
 
-``create``: called by the open(2) and creat(2) system calls.  Only
-	required if you want to support regular files.  The dentry you
-	get should not have an inode (i.e. it should be a negative
-	dentry).  Here you will probably call d_instantiate() with the
-	dentry and the newly created inode
+``create``
+	called by the open(2) and creat(2) system calls.  Only required
+	if you want to support regular files.  The dentry you get should
+	not have an inode (i.e. it should be a negative dentry).  Here
+	you will probably call d_instantiate() with the dentry and the
+	newly created inode
 
-``lookup``: called when the VFS needs to look up an inode in a parent
+``lookup``
+	called when the VFS needs to look up an inode in a parent
 	directory.  The name to look for is found in the dentry.  This
 	method must call d_add() to insert the found inode into the
 	dentry.  The "i_count" field in the inode structure should be
 	incremented.  If the named inode does not exist a NULL inode
 	should be inserted into the dentry (this is called a negative
-	dentry).  Returning an error code from this routine must only
-	be done on a real error, otherwise creating inodes with system
+	dentry).  Returning an error code from this routine must only be
+	done on a real error, otherwise creating inodes with system
 	calls like create(2), mknod(2), mkdir(2) and so on will fail.
 	If you wish to overload the dentry methods then you should
-	initialise the "d_dop" field in the dentry; this is a pointer
-	to a struct "dentry_operations".
-	This method is called with the directory inode semaphore held
+	initialise the "d_dop" field in the dentry; this is a pointer to
+	a struct "dentry_operations".  This method is called with the
+	directory inode semaphore held
 
-``link``: called by the link(2) system call.  Only required if you want
-	to support hard links.  You will probably need to call
+``link``
+	called by the link(2) system call.  Only required if you want to
+	support hard links.  You will probably need to call
 	d_instantiate() just as you would in the create() method
 
-``unlink``: called by the unlink(2) system call.  Only required if you
-	want to support deleting inodes
+``unlink``
+	called by the unlink(2) system call.  Only required if you want
+	to support deleting inodes
 
-``symlink``: called by the symlink(2) system call.  Only required if you
-	want to support symlinks.  You will probably need to call
+``symlink``
+	called by the symlink(2) system call.  Only required if you want
+	to support symlinks.  You will probably need to call
 	d_instantiate() just as you would in the create() method
 
-``mkdir``: called by the mkdir(2) system call.  Only required if you want
+``mkdir``
+	called by the mkdir(2) system call.  Only required if you want
 	to support creating subdirectories.  You will probably need to
 	call d_instantiate() just as you would in the create() method
 
-``rmdir``: called by the rmdir(2) system call.  Only required if you want
+``rmdir``
+	called by the rmdir(2) system call.  Only required if you want
 	to support deleting subdirectories
 
-``mknod``: called by the mknod(2) system call to create a device (char,
-	block) inode or a named pipe (FIFO) or socket.  Only required
-	if you want to support creating these types of inodes.  You
-	will probably need to call d_instantiate() just as you would
-	in the create() method
+``mknod``
+	called by the mknod(2) system call to create a device (char,
+	block) inode or a named pipe (FIFO) or socket.  Only required if
+	you want to support creating these types of inodes.  You will
+	probably need to call d_instantiate() just as you would in the
+	create() method
 
-``rename``: called by the rename(2) system call to rename the object to
-	have the parent and name given by the second inode and dentry.
+``rename``
+	called by the rename(2) system call to rename the object to have
+	the parent and name given by the second inode and dentry.
 
 	The filesystem must return -EINVAL for any unsupported or
-	unknown	flags.  Currently the following flags are implemented:
-	(1) RENAME_NOREPLACE: this flag indicates that if the target
-	of the rename exists the rename should fail with -EEXIST
-	instead of replacing the target.  The VFS already checks for
-	existence, so for local filesystems the RENAME_NOREPLACE
-	implementation is equivalent to plain rename.
+	unknown flags.  Currently the following flags are implemented:
+	(1) RENAME_NOREPLACE: this flag indicates that if the target of
+	the rename exists the rename should fail with -EEXIST instead of
+	replacing the target.  The VFS already checks for existence, so
+	for local filesystems the RENAME_NOREPLACE implementation is
+	equivalent to plain rename.
 	(2) RENAME_EXCHANGE: exchange source and target.  Both must
-	exist; this is checked by the VFS.  Unlike plain rename,
-	source and target may be of different type.
-
-``get_link``: called by the VFS to follow a symbolic link to the
-	inode it points to.  Only required if you want to support
-	symbolic links.  This method returns the symlink body
-	to traverse (and possibly resets the current position with
-	nd_jump_link()).  If the body won't go away until the inode
-	is gone, nothing else is needed; if it needs to be otherwise
-	pinned, arrange for its release by having get_link(..., ..., done)
-	do set_delayed_call(done, destructor, argument).
-	In that case destructor(argument) will be called once VFS is
-	done with the body you've returned.
-	May be called in RCU mode; that is indicated by NULL dentry
+	exist; this is checked by the VFS.  Unlike plain rename, source
+	and target may be of different type.
+
+``get_link``
+	called by the VFS to follow a symbolic link to the inode it
+	points to.  Only required if you want to support symbolic links.
+	This method returns the symlink body to traverse (and possibly
+	resets the current position with nd_jump_link()).  If the body
+	won't go away until the inode is gone, nothing else is needed;
+	if it needs to be otherwise pinned, arrange for its release by
+	having get_link(..., ..., done) do set_delayed_call(done,
+	destructor, argument).  In that case destructor(argument) will
+	be called once VFS is done with the body you've returned.  May
+	be called in RCU mode; that is indicated by NULL dentry
 	argument.  If request can't be handled without leaving RCU mode,
 	have it return ERR_PTR(-ECHILD).
 
-
 	If the filesystem stores the symlink target in ->i_link, the
 	VFS may use it directly without calling ->get_link(); however,
 	->get_link() must still be provided.  ->i_link must not be
 	freed until after an RCU grace period.  Writing to ->i_link
 	post-iget() time requires a 'release' memory barrier.
 
-``readlink``: this is now just an override for use by readlink(2) for the
+``readlink``
+	this is now just an override for use by readlink(2) for the
 	cases when ->get_link uses nd_jump_link() or object is not in
 	fact a symlink.  Normally filesystems should only implement
 	->get_link for symlinks and readlink(2) will automatically use
 	that.
 
-``permission``: called by the VFS to check for access rights on a POSIX-like
+``permission``
+	called by the VFS to check for access rights on a POSIX-like
 	filesystem.
 
-	May be called in rcu-walk mode (mask & MAY_NOT_BLOCK).  If in rcu-walk
-	mode, the filesystem must check the permission without blocking or
-	storing to the inode.
+	May be called in rcu-walk mode (mask & MAY_NOT_BLOCK).  If in
+	rcu-walk mode, the filesystem must check the permission without
+	blocking or storing to the inode.
 
-	If a situation is encountered that rcu-walk cannot handle, return
+	If a situation is encountered that rcu-walk cannot handle,
+	return
 	-ECHILD and it will be called again in ref-walk mode.
 
-``setattr``: called by the VFS to set attributes for a file.  This method
-	is called by chmod(2) and related system calls.
-
-``getattr``: called by the VFS to get attributes of a file.  This method
-	is called by stat(2) and related system calls.
-
-``listxattr``: called by the VFS to list all extended attributes for a
-	given file.  This method is called by the listxattr(2) system call.
-
-``update_time``: called by the VFS to update a specific time or the i_version of
-	an inode.  If this is not defined the VFS will update the inode itself
-	and call mark_inode_dirty_sync.
-
-``atomic_open``: called on the last component of an open.  Using this optional
-	method the filesystem can look up, possibly create and open the file in
-	one atomic operation.  If it wants to leave actual opening to the
-	caller (e.g. if the file turned out to be a symlink, device, or just
-	something filesystem won't do atomic open for), it may signal this by
-	returning finish_no_open(file, dentry).  This method is only called if
-	the last component is negative or needs lookup.  Cached positive dentries
-	are still handled by f_op->open().  If the file was created,
-	FMODE_CREATED flag should be set in file->f_mode.  In case of O_EXCL
-	the method must only succeed if the file didn't exist and hence FMODE_CREATED
-	shall always be set on success.
-
-``tmpfile``: called in the end of O_TMPFILE open().  Optional, equivalent to
-	atomically creating, opening and unlinking a file in given directory.
+``setattr``
+	called by the VFS to set attributes for a file.  This method is
+	called by chmod(2) and related system calls.
+
+``getattr``
+	called by the VFS to get attributes of a file.  This method is
+	called by stat(2) and related system calls.
+
+``listxattr``
+	called by the VFS to list all extended attributes for a given
+	file.  This method is called by the listxattr(2) system call.
+
+``update_time``
+	called by the VFS to update a specific time or the i_version of
+	an inode.  If this is not defined the VFS will update the inode
+	itself and call mark_inode_dirty_sync.
+
+``atomic_open``
+	called on the last component of an open.  Using this optional
+	method the filesystem can look up, possibly create and open the
+	file in one atomic operation.  If it wants to leave actual
+	opening to the caller (e.g. if the file turned out to be a
+	symlink, device, or just something the filesystem won't do atomic
+	open for), it may signal this by returning finish_no_open(file,
+	dentry).  This method is only called if the last component is
+	negative or needs lookup.  Cached positive dentries are still
+	handled by f_op->open().  If the file was created, the
+	FMODE_CREATED flag should be set in file->f_mode.  In case of O_EXCL
+	the method must only succeed if the file didn't exist and hence
+	FMODE_CREATED shall always be set on success.
+
+``tmpfile``
+	called at the end of O_TMPFILE open().  Optional, equivalent to
+	atomically creating, opening and unlinking a file in given
+	directory.
 
 
 The Address Space Object
@@ -673,70 +735,75 @@ cache in your filesystem.  The following members are defined:
 		int (*swap_deactivate)(struct file *);
 	};
 
-``writepage``: called by the VM to write a dirty page to backing store.
-      This may happen for data integrity reasons (i.e. 'sync'), or
-      to free up memory (flush).  The difference can be seen in
-      wbc->sync_mode.
-      The PG_Dirty flag has been cleared and PageLocked is true.
-      writepage should start writeout, should set PG_Writeback,
-      and should make sure the page is unlocked, either synchronously
-      or asynchronously when the write operation completes.
-
-      If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
-      try too hard if there are problems, and may choose to write out
-      other pages from the mapping if that is easier (e.g. due to
-      internal dependencies).  If it chooses not to start writeout, it
-      should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep
-      calling ->writepage on that page.
-
-      See the file "Locking" for more details.
-
-``readpage``: called by the VM to read a page from backing store.
-       The page will be Locked when readpage is called, and should be
-       unlocked and marked uptodate once the read completes.
-       If ->readpage discovers that it needs to unlock the page for
-       some reason, it can do so, and then return AOP_TRUNCATED_PAGE.
-       In this case, the page will be relocated, relocked and if
-       that all succeeds, ->readpage will be called again.
-
-``writepages``: called by the VM to write out pages associated with the
+``writepage``
+	called by the VM to write a dirty page to backing store.  This
+	may happen for data integrity reasons (i.e. 'sync'), or to free
+	up memory (flush).  The difference can be seen in
+	wbc->sync_mode.  The PG_Dirty flag has been cleared and
+	PageLocked is true.  writepage should start writeout, should set
+	PG_Writeback, and should make sure the page is unlocked, either
+	synchronously or asynchronously when the write operation
+	completes.
+
+	If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
+	try too hard if there are problems, and may choose to write out
+	other pages from the mapping if that is easier (e.g. due to
+	internal dependencies).  If it chooses not to start writeout, it
+	should return AOP_WRITEPAGE_ACTIVATE so that the VM will not
+	keep calling ->writepage on that page.
+
+	See the file "Locking" for more details.
+
+``readpage``
+	called by the VM to read a page from backing store.  The page
+	will be Locked when readpage is called, and should be unlocked
+	and marked uptodate once the read completes.  If ->readpage
+	discovers that it needs to unlock the page for some reason, it
+	can do so, and then return AOP_TRUNCATED_PAGE.  In this case,
+	the page will be relocated, relocked and if that all succeeds,
+	->readpage will be called again.
+
+``writepages``
+	called by the VM to write out pages associated with the
 	address_space object.  If wbc->sync_mode is WBC_SYNC_ALL, then
 	the writeback_control will specify a range of pages that must be
-	written out.  If it is WBC_SYNC_NONE, then a nr_to_write is given
-	and that many pages should be written if possible.
-	If no ->writepages is given, then mpage_writepages is used
-	instead.  This will choose pages from the address space that are
-	tagged as DIRTY and will pass them to ->writepage.
-
-``set_page_dirty``: called by the VM to set a page dirty.
-	This is particularly needed if an address space attaches
-	private data to a page, and that data needs to be updated when
-	a page is dirtied.  This is called, for example, when a memory
-	mapped page gets modified.
+	written out.  If it is WBC_SYNC_NONE, then a nr_to_write is
+	given and that many pages should be written if possible.  If no
+	->writepages is given, then mpage_writepages is used instead.
+	This will choose pages from the address space that are tagged as
+	DIRTY and will pass them to ->writepage.
+
+``set_page_dirty``
+	called by the VM to set a page dirty.  This is particularly
+	needed if an address space attaches private data to a page, and
+	that data needs to be updated when a page is dirtied.  This is
+	called, for example, when a memory mapped page gets modified.
 	If defined, it should set the PageDirty flag, and the
 	PAGECACHE_TAG_DIRTY tag in the radix tree.
 
-``readpages``: called by the VM to read pages associated with the address_space
-	object.  This is essentially just a vector version of
-	readpage.  Instead of just one page, several pages are
-	requested.
+``readpages``
+	called by the VM to read pages associated with the address_space
+	object.  This is essentially just a vector version of readpage.
+	Instead of just one page, several pages are requested.
 	readpages is only used for read-ahead, so read errors are
 	ignored.  If anything goes wrong, feel free to give up.
 
-``write_begin``:
-	Called by the generic buffered write code to ask the filesystem to
-	prepare to write len bytes at the given offset in the file.  The
-	address_space should check that the write will be able to complete,
-	by allocating space if necessary and doing any other internal
-	housekeeping.  If the write will update parts of any basic-blocks on
-	storage, then those blocks should be pre-read (if they haven't been
-	read already) so that the updated blocks can be written out properly.
+``write_begin``
+	Called by the generic buffered write code to ask the filesystem
+	to prepare to write len bytes at the given offset in the file.
+	The address_space should check that the write will be able to
+	complete, by allocating space if necessary and doing any other
+	internal housekeeping.  If the write will update parts of any
+	basic-blocks on storage, then those blocks should be pre-read
+	(if they haven't been read already) so that the updated blocks
+	can be written out properly.
 
-	The filesystem must return the locked pagecache page for the specified
-	offset, in ``*pagep``, for the caller to write into.
+	The filesystem must return the locked pagecache page for the
+	specified offset, in ``*pagep``, for the caller to write into.
 
-	It must be able to cope with short writes (where the length passed to
-	write_begin is greater than the number of bytes copied into the page).
+	It must be able to cope with short writes (where the length
+	passed to write_begin is greater than the number of bytes copied
+	into the page).
 
 	flags is a field for AOP_FLAG_xxx flags, described in
 	include/linux/fs.h.
@@ -744,114 +811,128 @@ cache in your filesystem.  The following members are defined:
 	A void * may be returned in fsdata, which then gets passed into
 	write_end.
 
-	Returns 0 on success; < 0 on failure (which is the error code), in
-	which case write_end is not called.
-
-``write_end``: After a successful write_begin, and data copy, write_end must
-	be called.  len is the original len passed to write_begin, and copied
-	is the amount that was able to be copied.
-
-	The filesystem must take care of unlocking the page and releasing it
-	refcount, and updating i_size.
-
-	Returns < 0 on failure, otherwise the number of bytes (<= 'copied')
-	that were able to be copied into pagecache.
-
-``bmap``: called by the VFS to map a logical block offset within object to
-	physical block number.  This method is used by the FIBMAP
-	ioctl and for working with swap-files.  To be able to swap to
-	a file, the file must have a stable mapping to a block
-	device.  The swap system does not go through the filesystem
-	but instead uses bmap to find out where the blocks in the file
-	are and uses those addresses directly.
-
-``invalidatepage``: If a page has PagePrivate set, then invalidatepage
-	will be called when part or all of the page is to be removed
-	from the address space.  This generally corresponds to either a
-	truncation, punch hole  or a complete invalidation of the address
+	Returns 0 on success; < 0 on failure (which is the error code),
+	in which case write_end is not called.
+
+``write_end``
+	After a successful write_begin, and data copy, write_end must be
+	called.  len is the original len passed to write_begin, and
+	copied is the amount that was able to be copied.
+
+	The filesystem must take care of unlocking the page and
+	releasing its refcount, and updating i_size.
+
+	Returns < 0 on failure, otherwise the number of bytes (<=
+	'copied') that were able to be copied into pagecache.
+
+``bmap``
+	called by the VFS to map a logical block offset within object to
+	physical block number.  This method is used by the FIBMAP ioctl
+	and for working with swap-files.  To be able to swap to a file,
+	the file must have a stable mapping to a block device.  The swap
+	system does not go through the filesystem but instead uses bmap
+	to find out where the blocks in the file are and uses those
+	addresses directly.
+
+``invalidatepage``
+	If a page has PagePrivate set, then invalidatepage will be
+	called when part or all of the page is to be removed from the
+	address space.  This generally corresponds to either a
+	truncation, punch hole or a complete invalidation of the address
 	space (in the latter case 'offset' will always be 0 and 'length'
 	will be PAGE_SIZE).  Any private data associated with the page
-	should be updated to reflect this truncation.  If offset is 0 and
-	length is PAGE_SIZE, then the private data should be released,
-	because the page must be able to be completely discarded.  This may
-	be done by calling the ->releasepage function, but in this case the
-	release MUST succeed.
-
-``releasepage``: releasepage is called on PagePrivate pages to indicate
-	that the page should be freed if possible.  ->releasepage
-	should remove any private data from the page and clear the
-	PagePrivate flag.  If releasepage() fails for some reason, it must
-	indicate failure with a 0 return value.
-	releasepage() is used in two distinct though related cases.  The
-	first is when the VM finds a clean page with no active users and
-	wants to make it a free page.  If ->releasepage succeeds, the
-	page will be removed from the address_space and become free.
+	should be updated to reflect this truncation.  If offset is 0
+	and length is PAGE_SIZE, then the private data should be
+	released, because the page must be able to be completely
+	discarded.  This may be done by calling the ->releasepage
+	function, but in this case the release MUST succeed.
+
+``releasepage``
+	releasepage is called on PagePrivate pages to indicate that the
+	page should be freed if possible.  ->releasepage should remove
+	any private data from the page and clear the PagePrivate flag.
+	If releasepage() fails for some reason, it must indicate failure
+	with a 0 return value.  releasepage() is used in two distinct
+	though related cases.  The first is when the VM finds a clean
+	page with no active users and wants to make it a free page.  If
+	->releasepage succeeds, the page will be removed from the
+	address_space and become free.
 
 	The second case is when a request has been made to invalidate
-	some or all pages in an address_space.  This can happen
-	through the fadvise(POSIX_FADV_DONTNEED) system call or by the
-	filesystem explicitly requesting it as nfs and 9fs do (when
-	they believe the cache may be out of date with storage) by
-	calling invalidate_inode_pages2().
-	If the filesystem makes such a call, and needs to be certain
-	that all pages are invalidated, then its releasepage will
-	need to ensure this.  Possibly it can clear the PageUptodate
-	bit if it cannot free private data yet.
-
-``freepage``: freepage is called once the page is no longer visible in
-	the page cache in order to allow the cleanup of any private
-	data.  Since it may be called by the memory reclaimer, it
-	should not assume that the original address_space mapping still
-	exists, and it should not block.
-
-``direct_IO``: called by the generic read/write routines to perform
-	direct_IO - that is IO requests which bypass the page cache
-	and transfer data directly between the storage and the
-	application's address space.
-
-``isolate_page``: Called by the VM when isolating a movable non-lru page.
-	If page is successfully isolated, VM marks the page as PG_isolated
-	via __SetPageIsolated.
-
-``migrate_page``:  This is used to compact the physical memory usage.
-	If the VM wants to relocate a page (maybe off a memory card
-	that is signalling imminent failure) it will pass a new page
-	and an old page to this function.  migrate_page should
-	transfer any private data across and update any references
-	that it has to the page.
-
-``putback_page``: Called by the VM when isolated page's migration fails.
-
-``launder_page``: Called before freeing a page - it writes back the dirty page.  To
-	prevent redirtying the page, it is kept locked during the whole
-	operation.
-
-``is_partially_uptodate``: Called by the VM when reading a file through the
-	pagecache when the underlying blocksize != pagesize.  If the required
-	block is up to date then the read can complete without needing the IO
-	to bring the whole page up to date.
-
-``is_dirty_writeback``: Called by the VM when attempting to reclaim a page.
-	The VM uses dirty and writeback information to determine if it needs
-	to stall to allow flushers a chance to complete some IO.  Ordinarily
-	it can use PageDirty and PageWriteback but some filesystems have
-	more complex state (unstable pages in NFS prevent reclaim) or
-	do not set those flags due to locking problems.  This callback
-	allows a filesystem to indicate to the VM if a page should be
-	treated as dirty or writeback for the purposes of stalling.
-
-``error_remove_page``: normally set to generic_error_remove_page if truncation
-	is ok for this address space.  Used for memory failure handling.
+	some or all pages in an address_space.  This can happen through
+	the fadvise(POSIX_FADV_DONTNEED) system call or by the
+	filesystem explicitly requesting it as nfs and 9fs do (when they
+	believe the cache may be out of date with storage) by calling
+	invalidate_inode_pages2().  If the filesystem makes such a call,
+	and needs to be certain that all pages are invalidated, then its
+	releasepage will need to ensure this.  Possibly it can clear the
+	PageUptodate bit if it cannot free private data yet.
+
+``freepage``
+	freepage is called once the page is no longer visible in the
+	page cache in order to allow the cleanup of any private data.
+	Since it may be called by the memory reclaimer, it should not
+	assume that the original address_space mapping still exists, and
+	it should not block.
+
+``direct_IO``
+	called by the generic read/write routines to perform direct_IO -
+	that is IO requests which bypass the page cache and transfer
+	data directly between the storage and the application's address
+	space.
+
+``isolate_page``
+	Called by the VM when isolating a movable non-lru page.  If the
+	page is successfully isolated, the VM marks the page as
+	PG_isolated via __SetPageIsolated.
+
+``migrate_page``
+	This is used to compact the physical memory usage.  If the VM
+	wants to relocate a page (maybe off a memory card that is
+	signalling imminent failure) it will pass a new page and an old
+	page to this function.  migrate_page should transfer any private
+	data across and update any references that it has to the page.
+
+``putback_page``
+	Called by the VM when an isolated page's migration fails.
+
+``launder_page``
+	Called before freeing a page - it writes back the dirty page.
+	To prevent redirtying the page, it is kept locked during the
+	whole operation.
+
+``is_partially_uptodate``
+	Called by the VM when reading a file through the pagecache when
+	the underlying blocksize != pagesize.  If the required block is
+	up to date then the read can complete without needing the IO to
+	bring the whole page up to date.
+
+``is_dirty_writeback``
+	Called by the VM when attempting to reclaim a page.  The VM uses
+	dirty and writeback information to determine if it needs to
+	stall to allow flushers a chance to complete some IO.
+	Ordinarily it can use PageDirty and PageWriteback but some
+	filesystems have more complex state (unstable pages in NFS
+	prevent reclaim) or do not set those flags due to locking
+	problems.  This callback allows a filesystem to indicate to the
+	VM if a page should be treated as dirty or writeback for the
+	purposes of stalling.
+
+``error_remove_page``
+	normally set to generic_error_remove_page if truncation is ok
+	for this address space.  Used for memory failure handling.
 	Setting this implies you deal with pages going away under you,
 	unless you have them locked or reference counts increased.
 
-``swap_activate``: Called when swapon is used on a file to allocate
-	space if necessary and pin the block lookup information in
-	memory.  A return value of zero indicates success,
-	in which case this file can be used to back swapspace.
+``swap_activate``
+	Called when swapon is used on a file to allocate space if
+	necessary and pin the block lookup information in memory.  A
+	return value of zero indicates success, in which case this file
+	can be used to back swapspace.
 
-``swap_deactivate``: Called during swapoff on files where swap_activate
-	was successful.
+``swap_deactivate``
+	Called during swapoff on files where swap_activate was
+	successful.
 
 
 The File Object
@@ -912,91 +993,120 @@ This describes how the VFS can manipulate an open file.  As of kernel
 Again, all methods are called without any locks being held, unless
 otherwise noted.
 
-``llseek``: called when the VFS needs to move the file position index
+``llseek``
+	called when the VFS needs to move the file position index
 
-``read``: called by read(2) and related system calls
+``read``
+	called by read(2) and related system calls
 
-``read_iter``: possibly asynchronous read with iov_iter as destination
+``read_iter``
+	possibly asynchronous read with iov_iter as destination
 
-``write``: called by write(2) and related system calls
+``write``
+	called by write(2) and related system calls
 
-``write_iter``: possibly asynchronous write with iov_iter as source
+``write_iter``
+	possibly asynchronous write with iov_iter as source
 
-``iopoll``: called when aio wants to poll for completions on HIPRI iocbs
+``iopoll``
+	called when aio wants to poll for completions on HIPRI iocbs
 
-``iterate``: called when the VFS needs to read the directory contents
+``iterate``
+	called when the VFS needs to read the directory contents
 
-``iterate_shared``: called when the VFS needs to read the directory contents
-	when filesystem supports concurrent dir iterators
+``iterate_shared``
+	called when the VFS needs to read the directory contents when
+	filesystem supports concurrent dir iterators
 
-``poll``: called by the VFS when a process wants to check if there is
+``poll``
+	called by the VFS when a process wants to check if there is
 	activity on this file and (optionally) go to sleep until there
 	is activity.  Called by the select(2) and poll(2) system calls
 
-``unlocked_ioctl``: called by the ioctl(2) system call.
+``unlocked_ioctl``
+	called by the ioctl(2) system call.
 
-``compat_ioctl``: called by the ioctl(2) system call when 32 bit system calls
-	 are used on 64 bit kernels.
+``compat_ioctl``
+	called by the ioctl(2) system call when 32 bit system calls are
+	used on 64 bit kernels.
 
-``mmap``: called by the mmap(2) system call
+``mmap``
+	called by the mmap(2) system call
 
-``open``: called by the VFS when an inode should be opened.  When the VFS
+``open``
+	called by the VFS when an inode should be opened.  When the VFS
 	opens a file, it creates a new "struct file".  It then calls the
 	open method for the newly allocated file structure.  You might
-	think that the open method really belongs in
-	"struct inode_operations", and you may be right.  I think it's
-	done the way it is because it makes filesystems simpler to
-	implement.  The open() method is a good place to initialize the
+	think that the open method really belongs in "struct
+	inode_operations", and you may be right.  I think it's done the
+	way it is because it makes filesystems simpler to implement.
+	The open() method is a good place to initialize the
 	"private_data" member in the file structure if you want to point
 	to a device structure
 
-``flush``: called by the close(2) system call to flush a file
+``flush``
+	called by the close(2) system call to flush a file
 
-``release``: called when the last reference to an open file is closed
+``release``
+	called when the last reference to an open file is closed
 
-``fsync``: called by the fsync(2) system call.  Also see the section above
-	 entitled "Handling errors during writeback".
+``fsync``
+	called by the fsync(2) system call.  Also see the section above
+	entitled "Handling errors during writeback".
 
-``fasync``: called by the fcntl(2) system call when asynchronous
+``fasync``
+	called by the fcntl(2) system call when asynchronous
 	(non-blocking) mode is enabled for a file
 
-``lock``: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW
-	commands
+``lock``
+	called by the fcntl(2) system call for F_GETLK, F_SETLK, and
+	F_SETLKW commands
 
-``get_unmapped_area``: called by the mmap(2) system call
+``get_unmapped_area``
+	called by the mmap(2) system call
 
-``check_flags``: called by the fcntl(2) system call for F_SETFL command
+``check_flags``
+	called by the fcntl(2) system call for F_SETFL command
 
-``flock``: called by the flock(2) system call
+``flock``
+	called by the flock(2) system call
 
-``splice_write``: called by the VFS to splice data from a pipe to a file.  This
-		method is used by the splice(2) system call
+``splice_write``
+	called by the VFS to splice data from a pipe to a file.  This
+	method is used by the splice(2) system call
 
-``splice_read``: called by the VFS to splice data from file to a pipe.  This
-	       method is used by the splice(2) system call
+``splice_read``
+	called by the VFS to splice data from file to a pipe.  This
+	method is used by the splice(2) system call
 
-``setlease``: called by the VFS to set or release a file lock lease.  setlease
-	    implementations should call generic_setlease to record or remove
-	    the lease in the inode after setting it.
+``setlease``
+	called by the VFS to set or release a file lock lease.  setlease
+	implementations should call generic_setlease to record or remove
+	the lease in the inode after setting it.
 
-``fallocate``: called by the VFS to preallocate blocks or punch a hole.
+``fallocate``
+	called by the VFS to preallocate blocks or punch a hole.
 
-``copy_file_range``: called by the copy_file_range(2) system call.
+``copy_file_range``
+	called by the copy_file_range(2) system call.
 
-``remap_file_range``: called by the ioctl(2) system call for FICLONERANGE and
-	FICLONE and FIDEDUPERANGE commands to remap file ranges.  An
-	implementation should remap len bytes at pos_in of the source file into
-	the dest file at pos_out.  Implementations must handle callers passing
-	in len == 0; this means "remap to the end of the source file".  The
-	return value should the number of bytes remapped, or the usual
-	negative error code if errors occurred before any bytes were remapped.
-	The remap_flags parameter accepts REMAP_FILE_* flags.  If
-	REMAP_FILE_DEDUP is set then the implementation must only remap if the
-	requested file ranges have identical contents.  If REMAP_CAN_SHORTEN is
-	set, the caller is ok with the implementation shortening the request
-	length to satisfy alignment or EOF requirements (or any other reason).
+``remap_file_range``
+	called by the ioctl(2) system call for FICLONERANGE and FICLONE
+	and FIDEDUPERANGE commands to remap file ranges.  An
+	implementation should remap len bytes at pos_in of the source
+	file into the dest file at pos_out.  Implementations must handle
+	callers passing in len == 0; this means "remap to the end of the
+	source file".  The return value should be the number of bytes
+	remapped, or the usual negative error code if errors occurred
+	before any bytes were remapped.  The remap_flags parameter
+	accepts REMAP_FILE_* flags.  If REMAP_FILE_DEDUP is set then the
+	implementation must only remap if the requested file ranges have
+	identical contents.  If REMAP_CAN_SHORTEN is set, the caller is
+	ok with the implementation shortening the request length to
+	satisfy alignment or EOF requirements (or any other reason).
 
-``fadvise``: possibly called by the fadvise64() system call.
+``fadvise``
+	possibly called by the fadvise64() system call.
 
 Note that the file operations are implemented by the specific
 filesystem in which the inode resides.  When opening a device node
@@ -1041,89 +1151,104 @@ defined:
 		struct dentry *(*d_real)(struct dentry *, const struct inode *);
 	};
 
-``d_revalidate``: called when the VFS needs to revalidate a dentry.  This
-	is called whenever a name look-up finds a dentry in the
-	dcache.  Most local filesystems leave this as NULL, because all their
-	dentries in the dcache are valid.  Network filesystems are different
-	since things can change on the server without the client necessarily
-	being aware of it.
-
-	This function should return a positive value if the dentry is still
-	valid, and zero or a negative error code if it isn't.
-
-	d_revalidate may be called in rcu-walk mode (flags & LOOKUP_RCU).
-	If in rcu-walk mode, the filesystem must revalidate the dentry without
-	blocking or storing to the dentry, d_parent and d_inode should not be
-	used without care (because they can change and, in d_inode case, even
-	become NULL under us).
-
-	If a situation is encountered that rcu-walk cannot handle, return
+``d_revalidate``
+	called when the VFS needs to revalidate a dentry.  This is
+	called whenever a name look-up finds a dentry in the dcache.
+	Most local filesystems leave this as NULL, because all their
+	dentries in the dcache are valid.  Network filesystems are
+	different since things can change on the server without the
+	client necessarily being aware of it.
+
+	This function should return a positive value if the dentry is
+	still valid, and zero or a negative error code if it isn't.
+
+	d_revalidate may be called in rcu-walk mode (flags &
+	LOOKUP_RCU).  If in rcu-walk mode, the filesystem must
+	revalidate the dentry without blocking or storing to the dentry;
+	d_parent and d_inode should not be used without care (because
+	they can change and, in the d_inode case, even become NULL under
+	us).
+
+	If a situation is encountered that rcu-walk cannot handle,
+	return
 	-ECHILD and it will be called again in ref-walk mode.
 
-``_weak_revalidate``: called when the VFS needs to revalidate a "jumped" dentry.
-	This is called when a path-walk ends at dentry that was not acquired by
-	doing a lookup in the parent directory.  This includes "/", "." and "..",
-	as well as procfs-style symlinks and mountpoint traversal.
+``d_weak_revalidate``
+	called when the VFS needs to revalidate a "jumped" dentry.  This
+	is called when a path-walk ends at a dentry that was not acquired
+	by doing a lookup in the parent directory.  This includes "/",
+	"." and "..", as well as procfs-style symlinks and mountpoint
+	traversal.
 
-	In this case, we are less concerned with whether the dentry is still
-	fully correct, but rather that the inode is still valid.  As with
-	d_revalidate, most local filesystems will set this to NULL since their
-	dcache entries are always valid.
+	In this case, we are less concerned with whether the dentry is
+	still fully correct, but rather that the inode is still valid.
+	As with d_revalidate, most local filesystems will set this to
+	NULL since their dcache entries are always valid.
 
-	This function has the same return code semantics as d_revalidate.
+	This function has the same return code semantics as
+	d_revalidate.
 
 	d_weak_revalidate is only called after leaving rcu-walk mode.
 
-``d_hash``: called when the VFS adds a dentry to the hash table.  The first
+``d_hash``
+	called when the VFS adds a dentry to the hash table.  The first
 	dentry passed to d_hash is the parent directory that the name is
 	to be hashed into.
 
 	Same locking and synchronisation rules as d_compare regarding
 	what is safe to dereference etc.
 
-``d_compare``: called to compare a dentry name with a given name.  The first
+``d_compare``
+	called to compare a dentry name with a given name.  The first
 	dentry is the parent of the dentry to be compared, the second is
-	the child dentry.  len and name string are properties of the dentry
-	to be compared.  qstr is the name to compare it with.
+	the child dentry.  len and name string are properties of the
+	dentry to be compared.  qstr is the name to compare it with.
 
 	Must be constant and idempotent, and should not take locks if
-	possible, and should not or store into the dentry.
-	Should not dereference pointers outside the dentry without
-	lots of care (eg.  d_parent, d_inode, d_name should not be used).
-
-	However, our vfsmount is pinned, and RCU held, so the dentries and
-	inodes won't disappear, neither will our sb or filesystem module.
-	->d_sb may be used.
-
-	It is a tricky calling convention because it needs to be called under
-	"rcu-walk", ie. without any locks or references on things.
-
-``d_delete``: called when the last reference to a dentry is dropped and the
-	dcache is deciding whether or not to cache it.  Return 1 to delete
-	immediately, or 0 to cache the dentry.  Default is NULL which means to
-	always cache a reachable dentry.  d_delete must be constant and
-	idempotent.
-
-``d_init``: called when a dentry is allocated
-
-``d_release``: called when a dentry is really deallocated
-
-``d_iput``: called when a dentry loses its inode (just prior to its
-	being deallocated).  The default when this is NULL is that the
-	VFS calls iput().  If you define this method, you must call
-	iput() yourself
-
-``d_dname``: called when the pathname of a dentry should be generated.
-	Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay
-	pathname generation.  (Instead of doing it when dentry is created,
-	it's done only when the path is needed.).  Real filesystems probably
-	dont want to use it, because their dentries are present in global
-	dcache hash, so their hash should be an invariant.  As no lock is
-	held, d_dname() should not try to modify the dentry itself, unless
-	appropriate SMP safety is used.  CAUTION : d_path() logic is quite
-	tricky.  The correct way to return for example "Hello" is to put it
-	at the end of the buffer, and returns a pointer to the first char.
-	dynamic_dname() helper function is provided to take care of this.
+	possible, and should not store into the dentry.  Should not
+	dereference pointers outside the dentry without lots of care
+	(e.g. d_parent, d_inode, d_name should not be used).
+
+	However, our vfsmount is pinned, and RCU held, so the dentries
+	and inodes won't disappear, neither will our sb or filesystem
+	module.  ->d_sb may be used.
+
+	It is a tricky calling convention because it needs to be called
+	under "rcu-walk", i.e. without any locks or references on things.
+
+``d_delete``
+	called when the last reference to a dentry is dropped and the
+	dcache is deciding whether or not to cache it.  Return 1 to
+	delete immediately, or 0 to cache the dentry.  Default is NULL
+	which means to always cache a reachable dentry.  d_delete must
+	be constant and idempotent.
+
+``d_init``
+	called when a dentry is allocated
+
+``d_release``
+	called when a dentry is really deallocated
+
+``d_iput``
+	called when a dentry loses its inode (just prior to its being
+	deallocated).  The default when this is NULL is that the VFS
+	calls iput().  If you define this method, you must call iput()
+	yourself.
+
+``d_dname``
+	called when the pathname of a dentry should be generated.
+	Useful for some pseudo filesystems (sockfs, pipefs, ...) to
+	delay pathname generation.  (Instead of doing it when dentry is
+	created, it's done only when the path is needed.)  Real
+	filesystems probably don't want to use it, because their dentries
+	are present in global dcache hash, so their hash should be an
+	invariant.  As no lock is held, d_dname() should not try to
+	modify the dentry itself, unless appropriate SMP safety is used.
+	CAUTION : d_path() logic is quite tricky.  The correct way to
+	return for example "Hello" is to put it at the end of the
+	buffer, and return a pointer to the first char.  The
+	dynamic_dname() helper function is provided to take care of
+	this.
 
 	Example :
 
@@ -1135,52 +1260,57 @@ defined:
 				dentry->d_inode->i_ino);
 	}
 
-``d_automount``: called when an automount dentry is to be traversed (optional).
-	This should create a new VFS mount record and return the record to the
-	caller.  The caller is supplied with a path parameter giving the
-	automount directory to describe the automount target and the parent
-	VFS mount record to provide inheritable mount parameters.  NULL should
-	be returned if someone else managed to make the automount first.  If
-	the vfsmount creation failed, then an error code should be returned.
-	If -EISDIR is returned, then the directory will be treated as an
-	ordinary directory and returned to pathwalk to continue walking.
-
-	If a vfsmount is returned, the caller will attempt to mount it on the
-	mountpoint and will remove the vfsmount from its expiration list in
-	the case of failure.  The vfsmount should be returned with 2 refs on
-	it to prevent automatic expiration - the caller will clean up the
-	additional ref.
-
-	This function is only used if DCACHE_NEED_AUTOMOUNT is set on the
-	dentry.  This is set by __d_instantiate() if S_AUTOMOUNT is set on the
-	inode being added.
-
-``d_manage``: called to allow the filesystem to manage the transition from a
-	dentry (optional).  This allows autofs, for example, to hold up clients
-	waiting to explore behind a 'mountpoint' while letting the daemon go
-	past and construct the subtree there.  0 should be returned to let the
-	calling process continue.  -EISDIR can be returned to tell pathwalk to
-	use this directory as an ordinary directory and to ignore anything
-	mounted on it and not to check the automount flag.  Any other error
-	code will abort pathwalk completely.
+``d_automount``
+	called when an automount dentry is to be traversed (optional).
+	This should create a new VFS mount record and return the record
+	to the caller.  The caller is supplied with a path parameter
+	giving the automount directory to describe the automount target
+	and the parent VFS mount record to provide inheritable mount
+	parameters.  NULL should be returned if someone else managed to
+	make the automount first.  If the vfsmount creation failed, then
+	an error code should be returned.  If -EISDIR is returned, then
+	the directory will be treated as an ordinary directory and
+	returned to pathwalk to continue walking.
+
+	If a vfsmount is returned, the caller will attempt to mount it
+	on the mountpoint and will remove the vfsmount from its
+	expiration list in the case of failure.  The vfsmount should be
+	returned with 2 refs on it to prevent automatic expiration - the
+	caller will clean up the additional ref.
+
+	This function is only used if DCACHE_NEED_AUTOMOUNT is set on
+	the dentry.  This is set by __d_instantiate() if S_AUTOMOUNT is
+	set on the inode being added.
+
+``d_manage``
+	called to allow the filesystem to manage the transition from a
+	dentry (optional).  This allows autofs, for example, to hold up
+	clients waiting to explore behind a 'mountpoint' while letting
+	the daemon go past and construct the subtree there.  0 should be
+	returned to let the calling process continue.  -EISDIR can be
+	returned to tell pathwalk to use this directory as an ordinary
+	directory and to ignore anything mounted on it and not to check
+	the automount flag.  Any other error code will abort pathwalk
+	completely.
 
 	If the 'rcu_walk' parameter is true, then the caller is doing a
-	pathwalk in RCU-walk mode.  Sleeping is not permitted in this mode,
-	and the caller can be asked to leave it and call again by returning
-	-ECHILD.  -EISDIR may also be returned to tell pathwalk to
-	ignore d_automount or any mounts.
+	pathwalk in RCU-walk mode.  Sleeping is not permitted in this
+	mode, and the caller can be asked to leave it and call again by
+	returning -ECHILD.  -EISDIR may also be returned to tell
+	pathwalk to ignore d_automount or any mounts.
 
-	This function is only used if DCACHE_MANAGE_TRANSIT is set on the
-	dentry being transited from.
+	This function is only used if DCACHE_MANAGE_TRANSIT is set on
+	the dentry being transited from.
 
-``d_real``: overlay/union type filesystems implement this method to return one of
-	the underlying dentries hidden by the overlay.  It is used in two
-	different modes:
+``d_real``
+	overlay/union type filesystems implement this method to return
+	one of the underlying dentries hidden by the overlay.  It is
+	used in two different modes:
 
-	Called from file_dentry() it returns the real dentry matching the inode
-	argument.  The real dentry may be from a lower layer already copied up,
-	but still referenced from the file.  This mode is selected with a
-	non-NULL inode argument.
+	Called from file_dentry() it returns the real dentry matching
+	the inode argument.  The real dentry may be from a lower layer
+	already copied up, but still referenced from the file.  This
+	mode is selected with a non-NULL inode argument.
 
 	With NULL inode the topmost real underlying dentry is returned.
 
@@ -1195,40 +1325,47 @@ Directory Entry Cache API
 There are a number of functions defined which permit a filesystem to
 manipulate dentries:
 
-``dget``: open a new handle for an existing dentry (this just increments
+``dget``
+	open a new handle for an existing dentry (this just increments
 	the usage count)
 
-``dput``: close a handle for a dentry (decrements the usage count).  If
+``dput``
+	close a handle for a dentry (decrements the usage count).  If
 	the usage count drops to 0, and the dentry is still in its
 	parent's hash, the "d_delete" method is called to check whether
-	it should be cached.  If it should not be cached, or if the dentry
-	is not hashed, it is deleted.  Otherwise cached dentries are put
-	into an LRU list to be reclaimed on memory shortage.
-
-``d_drop``: this unhashes a dentry from its parents hash list.  A
-	subsequent call to dput() will deallocate the dentry if its
-	usage count drops to 0
-
-``d_delete``: delete a dentry.  If there are no other open references to
-	the dentry then the dentry is turned into a negative dentry
-	(the d_iput() method is called).  If there are other
-	references, then d_drop() is called instead
-
-``d_add``: add a dentry to its parents hash list and then calls
+	it should be cached.  If it should not be cached, or if the
+	dentry is not hashed, it is deleted.  Otherwise cached dentries
+	are put into an LRU list to be reclaimed on memory shortage.
+
+``d_drop``
+	this unhashes a dentry from its parent's hash list.  A subsequent
+	call to dput() will deallocate the dentry if its usage count
+	drops to 0.
+
+``d_delete``
+	delete a dentry.  If there are no other open references to the
+	dentry then the dentry is turned into a negative dentry (the
+	d_iput() method is called).  If there are other references, then
+	d_drop() is called instead.
+
+``d_add``
+	adds a dentry to its parent's hash list and then calls
 	d_instantiate()
 
-``d_instantiate``: add a dentry to the alias hash list for the inode and
-	updates the "d_inode" member.  The "i_count" member in the
-	inode structure should be set/incremented.  If the inode
-	pointer is NULL, the dentry is called a "negative
-	dentry".  This function is commonly called when an inode is
-	created for an existing negative dentry
-
-``d_lookup``: look up a dentry given its parent and path name component
-	It looks up the child of that given name from the dcache
-	hash table.  If it is found, the reference count is incremented
-	and the dentry is returned.  The caller must use dput()
-	to free the dentry when it finishes using it.
+``d_instantiate``
+	adds a dentry to the alias hash list for the inode and updates
+	the "d_inode" member.  The "i_count" member in the inode
+	structure should be set/incremented.  If the inode pointer is
+	NULL, the dentry is called a "negative dentry".  This function
+	is commonly called when an inode is created for an existing
+	negative dentry.
+
+``d_lookup``
+	look up a dentry given its parent and path name component.  It
+	looks up the child of that given name from the dcache hash
+	table.  If it is found, the reference count is incremented and
+	the dentry is returned.  The caller must use dput() to free the
+	dentry when it finishes using it.
 
 
 Mount Options
-- 
cgit v1.2.3-59-g8ed1b
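
Note on the d_dname convention described in the vfs patch above: a
minimal user-space sketch of "put it at the end of the buffer, and
return a pointer to the first char".  The function name name_at_end()
and the NULL-on-overflow behaviour are assumptions made for
illustration only; this is not the kernel's dynamic_dname()
implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the d_dname() calling convention: copy the name
 * so that it (including its terminating NUL) ends at the last byte of
 * buf, and return a pointer to its first character.  Returns NULL if
 * the name does not fit in buflen bytes.
 */
static char *name_at_end(char *buf, size_t buflen, const char *name)
{
	size_t len = strlen(name) + 1;	/* count the trailing NUL */

	if (len > buflen)
		return NULL;		/* assumption: signal overflow with NULL */

	/* memcpy() returns its destination, i.e. the start of the name */
	return memcpy(buf + buflen - len, name, len);
}
```

With a 16-byte buffer and the name "Hello", the returned pointer is
buf + 10, so the string and its NUL occupy the last six bytes.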


From b422124758c19db06c4c30c4abb8f57bf18995b9 Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Wed, 5 Jun 2019 19:39:44 +0300
Subject: docs/core-api: Add string helpers API to the list

Sometimes string helpers are needed, but there is nothing about them
in the generated documentation.

Fill the gap by adding a reference to string_helpers.c exported functions.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/kernel-api.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst
index a53ec2eb8176..65ae2bf1f86d 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -33,6 +33,9 @@ String Conversions
 .. kernel-doc:: lib/kstrtox.c
    :export:
 
+.. kernel-doc:: lib/string_helpers.c
+   :export:
+
 String Manipulation
 -------------------
 
-- 
cgit v1.2.3-59-g8ed1b


From 58d494669f36d0b61b7ec42c232877167ed3f5ce Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Wed, 5 Jun 2019 19:51:13 +0300
Subject: docs/core-api: Add integer power functions to the list

Sometimes integer power functions, such as int_sqrt(), are needed, but
there is nothing about them in the generated documentation.

Fill the gap by adding a reference to the corresponding exported functions.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/kernel-api.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst
index 65ae2bf1f86d..824f24ccf401 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -141,6 +141,15 @@ Base 2 log and power Functions
 .. kernel-doc:: include/linux/log2.h
    :internal:
 
+Integer power Functions
+-----------------------
+
+.. kernel-doc:: lib/math/int_pow.c
+   :export:
+
+.. kernel-doc:: lib/math/int_sqrt.c
+   :export:
+
 Division Functions
 ------------------
 
-- 
cgit v1.2.3-59-g8ed1b


From 99d2b938672944831035bef50c68a6e948e93abf Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Date: Fri, 7 Jun 2019 16:47:13 +0900
Subject: Documentation: DMA-API: fix a function name of max_mapping_size

The exported function name is dma_max_mapping_size(), not
dma_direct_max_mapping_size(), so fix the function name in the
documentation.

Fixes: 133d624b1cee ("dma: Introduce dma_max_mapping_size()")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DMA-API.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 0076150fdccb..e47c63bd4887 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -198,7 +198,7 @@ call to set the mask to the value returned.
 ::
 
 	size_t
-	dma_direct_max_mapping_size(struct device *dev);
+	dma_max_mapping_size(struct device *dev);
 
 Returns the maximum size of a mapping for the device. The size parameter
 of the mapping functions like dma_map_single(), dma_map_page() and
-- 
cgit v1.2.3-59-g8ed1b


From 4241d516b0041ae55092fb12739e12184427de5d Mon Sep 17 00:00:00 2001
From: Helen Koike <helen.koike@collabora.com>
Date: Tue, 4 Jun 2019 15:27:19 -0300
Subject: Documentation/dm-init: fix multi device example

The example in the docs regarding multiple device-mappers is invalid (it
has a wrong number of arguments); it is a leftover from previous
versions of the patch.
Replace the example with a valid and tested one.

Signed-off-by: Helen Koike <helen.koike@collabora.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/device-mapper/dm-init.txt | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/device-mapper/dm-init.txt b/Documentation/device-mapper/dm-init.txt
index 8464ee7c01b8..130b3c3679c5 100644
--- a/Documentation/device-mapper/dm-init.txt
+++ b/Documentation/device-mapper/dm-init.txt
@@ -74,13 +74,13 @@ this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned.
 An example of multiple device-mappers, with the dm-mod.create="..." contents is shown here
 split on multiple lines for readability:
 
-  vroot,,,ro,
-    0 1740800 verity 254:0 254:0 1740800 sha1
-      76e9be054b15884a9fa85973e9cb274c93afadb6
-      5b3549d54d6c7a3837b9b81ed72e49463a64c03680c47835bef94d768e5646fe;
-  vram,,,rw,
-    0 32768 linear 1:0 0,
-    32768 32768 linear 1:1 0
+  dm-linear,,1,rw,
+    0 32768 linear 8:1 0,
+    32768 1024000 linear 8:2 0;
+  dm-verity,,3,ro,
+    0 1638400 verity 1 /dev/sdc1 /dev/sdc2 4096 4096 204800 1 sha256
+    ac87db56303c9c1da433d7209b5a6ef3e4779df141200cbd7c157dcb8dd89c42
+    5ebfe87f7df3235b80a117ebc4078e44f55045487ad4a96581d1adb564615b51
 
 Other examples (per target):
 
-- 
cgit v1.2.3-59-g8ed1b


From e0cef9ff6315d48a4dfd39da09ca770e242f9cb5 Mon Sep 17 00:00:00 2001
From: Aurelien Thierry <aurelien.thierry@quoscient.io>
Date: Fri, 7 Jun 2019 10:07:02 +0200
Subject: Documentation: fix typo CLOCK_MONONOTNIC_COARSE

Fix typo in documentation file timekeeping.rst: CLOCK_MONONOTNIC_COARSE
should be CLOCK_MONOTONIC_COARSE.

Signed-off-by: Aurelien Thierry <aurelien.thierry@quoscient.io>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/timekeeping.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/core-api/timekeeping.rst b/Documentation/core-api/timekeeping.rst
index 93cbeb9daec0..5f87d9c8b04d 100644
--- a/Documentation/core-api/timekeeping.rst
+++ b/Documentation/core-api/timekeeping.rst
@@ -111,7 +111,7 @@ Some additional variants exist for more specialized cases:
 		void ktime_get_coarse_raw_ts64( struct timespec64 * )
 
 	These are quicker than the non-coarse versions, but less accurate,
-	corresponding to CLOCK_MONONOTNIC_COARSE and CLOCK_REALTIME_COARSE
+	corresponding to CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE
 	in user space, along with the equivalent boottime/tai/raw
 	timebase not available in user space.
 
-- 
cgit v1.2.3-59-g8ed1b
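
Note on the spelling fixed above: CLOCK_MONOTONIC_COARSE is the
identifier user space passes to clock_gettime() and clock_getres(), so
the documentation should match it exactly.  A small sketch, assuming
Linux with glibc; the helper name clock_res_ns() is hypothetical.

```c
#define _GNU_SOURCE
#include <time.h>

/*
 * Return the resolution of the given clock in nanoseconds,
 * or -1 if clock_getres() fails.
 */
static long long clock_res_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_getres(id, &ts) != 0)
		return -1;
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Comparing clock_res_ns(CLOCK_MONOTONIC_COARSE) with
clock_res_ns(CLOCK_MONOTONIC) typically shows the coarse clock reporting
a larger granularity, which is the accuracy trade-off the documentation
text mentions.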


From e47cf0c958775700c74223a1f21a8b3457c57069 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Fri, 7 Jun 2019 13:07:29 +0200
Subject: Documentation: tee: Grammar s/the its/its/

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/tee.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/tee.txt b/Documentation/tee.txt
index 56ea85ffebf2..afacdf2fd1de 100644
--- a/Documentation/tee.txt
+++ b/Documentation/tee.txt
@@ -32,7 +32,7 @@ User space (the client) connects to the driver by opening /dev/tee[0-9]* or
   memory.
 
 - TEE_IOC_VERSION lets user space know which TEE this driver handles and
-  the its capabilities.
+  its capabilities.
 
 - TEE_IOC_OPEN_SESSION opens a new session to a Trusted Application.
 
-- 
cgit v1.2.3-59-g8ed1b


From 6fb44c439eda692f94cf60aad55f130a34204ece Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Fri, 7 Jun 2019 13:08:42 +0200
Subject: Documentation: net: dsa: Grammar s/the its/its/

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/networking/dsa/dsa.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
index ca87068b9ab9..563d56c6a25c 100644
--- a/Documentation/networking/dsa/dsa.rst
+++ b/Documentation/networking/dsa/dsa.rst
@@ -531,7 +531,7 @@ Bridge VLAN filtering
   a software implementation.
 
 .. note:: VLAN ID 0 corresponds to the port private database, which, in the context
-        of DSA, would be the its port-based VLAN, used by the associated bridge device.
+        of DSA, would be its port-based VLAN, used by the associated bridge device.
 
 - ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a
   Forwarding Database entry, the switch hardware should be programmed to delete
@@ -554,7 +554,7 @@ Bridge VLAN filtering
   associated with this VLAN ID.
 
 .. note:: VLAN ID 0 corresponds to the port private database, which, in the context
-        of DSA, would be the its port-based VLAN, used by the associated bridge device.
+        of DSA, would be its port-based VLAN, used by the associated bridge device.
 
 - ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a
   multicast database entry, the switch hardware should be programmed to delete
-- 
cgit v1.2.3-59-g8ed1b


From 3f9564e680efb2092dfb826e2f768920c9eb203b Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Fri, 7 Jun 2019 13:29:51 +0200
Subject: KVM: arm/arm64: Always capitalize ITS

All but one reference is capitalized.  Fix the remaining one.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/virtual/kvm/devices/arm-vgic-its.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/devices/arm-vgic-its.txt b/Documentation/virtual/kvm/devices/arm-vgic-its.txt
index 4f0c9fc40365..eeaa95b893a8 100644
--- a/Documentation/virtual/kvm/devices/arm-vgic-its.txt
+++ b/Documentation/virtual/kvm/devices/arm-vgic-its.txt
@@ -103,7 +103,7 @@ Groups:
 The following ordering must be followed when restoring the GIC and the ITS:
 a) restore all guest memory and create vcpus
 b) restore all redistributors
-c) provide the its base address
+c) provide the ITS base address
    (KVM_DEV_ARM_VGIC_GRP_ADDR)
 d) restore the ITS in the following order:
    1. Restore GITS_CBASER
-- 
cgit v1.2.3-59-g8ed1b


From b1663d7e3a7961fc45262fd68a89253f2803036c Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Tue, 4 Jun 2019 09:26:27 -0300
Subject: docs: Kbuild/Makefile: allow check for missing docs at build time

While this doesn't make sense for production kernels, in order to
avoid regressions when documents are touched, let's add a
check target to the Makefile.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/Kconfig                | 13 +++++++++++++
 Documentation/Makefile               |  5 +++++
 Kconfig                              |  2 ++
 scripts/documentation-file-ref-check |  9 +++++++++
 4 files changed, 29 insertions(+)
 create mode 100644 Documentation/Kconfig

diff --git a/Documentation/Kconfig b/Documentation/Kconfig
new file mode 100644
index 000000000000..66046fa1c341
--- /dev/null
+++ b/Documentation/Kconfig
@@ -0,0 +1,13 @@
+config WARN_MISSING_DOCUMENTS
+
+	bool "Warn if there's a missing documentation file"
+	depends on COMPILE_TEST
+	help
+	   It is not uncommon that a document gets renamed.
+	   This option makes the Kernel check for missing dependencies,
+	   warning when something is missing. Works only if the Kernel
+	   is built from a git tree.
+
+	   If unsure, select 'N'.
+
+
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2df0789f90b7..e145e4db508b 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -4,6 +4,11 @@
 
 subdir-y := devicetree/bindings/
 
+# Check for broken documentation file references
+ifeq ($(CONFIG_WARN_MISSING_DOCUMENTS),y)
+$(shell $(srctree)/scripts/documentation-file-ref-check --warn)
+endif
+
 # You can set these variables from the command line.
 SPHINXBUILD   = sphinx-build
 SPHINXOPTS    =
diff --git a/Kconfig b/Kconfig
index 48a80beab685..990b0c390dfc 100644
--- a/Kconfig
+++ b/Kconfig
@@ -30,3 +30,5 @@ source "crypto/Kconfig"
 source "lib/Kconfig"
 
 source "lib/Kconfig.debug"
+
+source "Documentation/Kconfig"
diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check
index ff16db269079..440227bb55a9 100755
--- a/scripts/documentation-file-ref-check
+++ b/scripts/documentation-file-ref-check
@@ -22,9 +22,16 @@ $scriptname =~ s,.*/([^/]+/),$1,;
 # Parse arguments
 my $help = 0;
 my $fix = 0;
+my $warn = 0;
+
+if (! -d ".git") {
+	printf "Warning: can't check if file exists, as this is not a git tree\n";
+	exit 0;
+}
 
 GetOptions(
 	'fix' => \$fix,
+	'warn' => \$warn,
 	'h|help|usage' => \$help,
 );
 
@@ -139,6 +146,8 @@ while (<IN>) {
 			if (!($ref =~ m/(scripts|Kconfig|Kbuild)/)) {
 				$broken_ref{$ref}++;
 			}
+		} elsif ($warn) {
+			print STDERR "Warning: $f references a file that doesn't exist: $fulref\n";
 		} else {
 			print STDERR "$f: $fulref\n";
 		}
-- 
cgit v1.2.3-59-g8ed1b


From 889aa9ca930602a0e860cfb89e467c2a7a729b1b Mon Sep 17 00:00:00 2001
From: Luca Ceresoli <luca@lucaceresoli.net>
Date: Fri, 31 May 2019 16:30:16 +0200
Subject: docs: clk: fix struct syntax

The clk_foo_ops struct example has syntax errors. Fix it so it can be
copy-pasted and used more easily.

Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/clk.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/driver-api/clk.rst b/Documentation/driver-api/clk.rst
index 593cca5058b1..3cad45d14187 100644
--- a/Documentation/driver-api/clk.rst
+++ b/Documentation/driver-api/clk.rst
@@ -175,9 +175,9 @@ the following::
 To take advantage of your data you'll need to support valid operations
 for your clk::
 
-	struct clk_ops clk_foo_ops {
-		.enable		= &clk_foo_enable;
-		.disable	= &clk_foo_disable;
+	struct clk_ops clk_foo_ops = {
+		.enable		= &clk_foo_enable,
+		.disable	= &clk_foo_disable,
 	};
 
 Implement the above functions using container_of::
-- 
cgit v1.2.3-59-g8ed1b
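
Note on the initializer fixed above: the corrected form is an ordinary C
designated initializer, with "=" before the brace and commas between
members.  The sketch below shows it compiling outside the kernel.  The
definitions of struct clk_hw, struct clk_ops and container_of() are
simplified stand-ins, and the "enabled" field of struct clk_foo is
hypothetical.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumption, not the real ones) */
struct clk_hw {
	int unused;
};

struct clk_ops {
	int (*enable)(struct clk_hw *hw);
	void (*disable)(struct clk_hw *hw);
};

/* Simplified container_of(): recover the enclosing struct from a member */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct clk_foo {
	struct clk_hw hw;
	int enabled;		/* hypothetical driver state */
};

static int clk_foo_enable(struct clk_hw *hw)
{
	struct clk_foo *foo = container_of(hw, struct clk_foo, hw);

	foo->enabled = 1;
	return 0;
}

static void clk_foo_disable(struct clk_hw *hw)
{
	struct clk_foo *foo = container_of(hw, struct clk_foo, hw);

	foo->enabled = 0;
}

/* Same shape as the corrected example: '=' and comma-separated members */
static const struct clk_ops clk_foo_ops = {
	.enable		= &clk_foo_enable,
	.disable	= &clk_foo_disable,
};
```

Calling clk_foo_ops.enable(&foo.hw) on a struct clk_foo sets its
enabled field through container_of(), without the callback needing a
pointer to the outer structure.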


From 54002b56b04bc83f8961c8751f6bfef07461d587 Mon Sep 17 00:00:00 2001
From: Bjorn Helgaas <bhelgaas@google.com>
Date: Thu, 30 May 2019 16:59:14 -0500
Subject: scripts/sphinx-pre-install: fix "dependenties" typo

Fix typo ("dependenties" for "dependencies").

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/sphinx-pre-install | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install
index f001fc2fcf12..158f522f12ed 100755
--- a/scripts/sphinx-pre-install
+++ b/scripts/sphinx-pre-install
@@ -632,7 +632,7 @@ sub check_needs()
 	}
 	printf "\n";
 
-	print "All optional dependenties are met.\n" if (!$optional);
+	print "All optional dependencies are met.\n" if (!$optional);
 
 	if ($need == 1) {
 		die "Can't build as $need mandatory dependency is missing";
-- 
cgit v1.2.3-59-g8ed1b


From 165915c17d681c61962251728d72ecdabe95518e Mon Sep 17 00:00:00 2001
From: Federico Vaga <federico.vaga@vaga.pv.it>
Date: Thu, 30 May 2019 22:14:54 +0200
Subject: doc:it_IT: fix file references

Fix Italian translation file references based on
`scripts/documentation-file-ref-check` output.

Signed-off-by: Federico Vaga <federico.vaga@vaga.pv.it>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 .../translations/it_IT/admin-guide/kernel-parameters.rst     | 12 ++++++++++++
 Documentation/translations/it_IT/process/adding-syscalls.rst |  2 +-
 Documentation/translations/it_IT/process/coding-style.rst    |  2 +-
 Documentation/translations/it_IT/process/howto.rst           |  2 +-
 Documentation/translations/it_IT/process/magic-number.rst    |  2 +-
 .../translations/it_IT/process/stable-kernel-rules.rst       |  4 ++--
 6 files changed, 18 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/translations/it_IT/admin-guide/kernel-parameters.rst

diff --git a/Documentation/translations/it_IT/admin-guide/kernel-parameters.rst b/Documentation/translations/it_IT/admin-guide/kernel-parameters.rst
new file mode 100644
index 000000000000..0e36d82a92be
--- /dev/null
+++ b/Documentation/translations/it_IT/admin-guide/kernel-parameters.rst
@@ -0,0 +1,12 @@
+.. include:: ../disclaimer-ita.rst
+
+:Original: :ref:`Documentation/admin-guide/kernel-parameters.rst <kernelparameters>`
+
+.. _it_kernelparameters:
+
+I parametri da linea di comando del kernel
+==========================================
+
+.. warning::
+
+    TODO ancora da tradurre
diff --git a/Documentation/translations/it_IT/process/adding-syscalls.rst b/Documentation/translations/it_IT/process/adding-syscalls.rst
index e0a64b0688a7..c3a3439595a6 100644
--- a/Documentation/translations/it_IT/process/adding-syscalls.rst
+++ b/Documentation/translations/it_IT/process/adding-syscalls.rst
@@ -39,7 +39,7 @@ vostra interfaccia.
        un qualche modo opaca.
 
  - Se dovete esporre solo delle informazioni sul sistema, un nuovo nodo in
-   sysfs (vedere ``Documentation/translations/it_IT/filesystems/sysfs.txt``) o
+   sysfs (vedere ``Documentation/filesystems/sysfs.txt``) o
    in procfs potrebbe essere sufficiente.  Tuttavia, l'accesso a questi
    meccanismi richiede che il filesystem sia montato, il che potrebbe non
    essere sempre vero (per esempio, in ambienti come namespace/sandbox/chroot).
diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst
index 5ef534c95e69..a6559d25a23d 100644
--- a/Documentation/translations/it_IT/process/coding-style.rst
+++ b/Documentation/translations/it_IT/process/coding-style.rst
@@ -696,7 +696,7 @@ nella stringa di titolo::
 	...
 
 Per la documentazione completa sui file di configurazione, consultate
-il documento Documentation/translations/it_IT/kbuild/kconfig-language.txt
+il documento Documentation/kbuild/kconfig-language.txt
 
 
 11) Strutture dati
diff --git a/Documentation/translations/it_IT/process/howto.rst b/Documentation/translations/it_IT/process/howto.rst
index 9903ac7c566b..44e6077730e8 100644
--- a/Documentation/translations/it_IT/process/howto.rst
+++ b/Documentation/translations/it_IT/process/howto.rst
@@ -131,7 +131,7 @@ Di seguito una lista di file che sono presenti nei sorgente del kernel e che
 	"Linux kernel patch submission format"
 		http://linux.yyz.us/patch-format.html
 
-  :ref:`Documentation/process/translations/it_IT/stable-api-nonsense.rst <it_stable_api_nonsense>`
+  :ref:`Documentation/translations/it_IT/process/stable-api-nonsense.rst <it_stable_api_nonsense>`
 
     Questo file descrive la motivazioni sottostanti la conscia decisione di
     non avere un API stabile all'interno del kernel, incluso cose come:
diff --git a/Documentation/translations/it_IT/process/magic-number.rst b/Documentation/translations/it_IT/process/magic-number.rst
index 5281d53e57ee..ed1121d0ba84 100644
--- a/Documentation/translations/it_IT/process/magic-number.rst
+++ b/Documentation/translations/it_IT/process/magic-number.rst
@@ -1,6 +1,6 @@
 .. include:: ../disclaimer-ita.rst
 
-:Original: :ref:`Documentation/process/magic-numbers.rst <magicnumbers>`
+:Original: :ref:`Documentation/process/magic-number.rst <magicnumbers>`
 :Translator: Federico Vaga <federico.vaga@vaga.pv.it>
 
 .. _it_magicnumbers:
diff --git a/Documentation/translations/it_IT/process/stable-kernel-rules.rst b/Documentation/translations/it_IT/process/stable-kernel-rules.rst
index 48e88e5ad2c5..4f206cee31a7 100644
--- a/Documentation/translations/it_IT/process/stable-kernel-rules.rst
+++ b/Documentation/translations/it_IT/process/stable-kernel-rules.rst
@@ -33,7 +33,7 @@ Regole sul tipo di patch che vengono o non vengono accettate nei sorgenti
  - Non deve includere alcuna correzione "banale" (correzioni grammaticali,
    pulizia dagli spazi bianchi, eccetera).
  - Deve rispettare le regole scritte in
-   :ref:`Documentation/translation/it_IT/process/submitting-patches.rst <it_submittingpatches>`
+   :ref:`Documentation/translations/it_IT/process/submitting-patches.rst <it_submittingpatches>`
  - Questa patch o una equivalente deve esistere già nei sorgenti principali di
    Linux
 
@@ -43,7 +43,7 @@ Procedura per sottomettere patch per i sorgenti -stable
 
  - Se la patch contiene modifiche a dei file nelle cartelle net/ o drivers/net,
    allora seguite le linee guida descritte in
-   :ref:`Documentation/translation/it_IT/networking/netdev-FAQ.rst <it_netdev-FAQ>`;
+   :ref:`Documentation/translations/it_IT/networking/netdev-FAQ.rst <it_netdev-FAQ>`;
    ma solo dopo aver verificato al seguente indirizzo che la patch non sia
    già in coda:
    https://patchwork.ozlabs.org/bundle/davem/stable/?series=&submitter=&state=*&q=&archive=
-- 
cgit v1.2.3-59-g8ed1b


From bed0918d64ca28169d55bd138ed20f09e288303e Mon Sep 17 00:00:00 2001
From: Federico Vaga <federico.vaga@vaga.pv.it>
Date: Thu, 30 May 2019 22:14:55 +0200
Subject: doc:it_IT: documentation alignment

Documentation alignment for the following changes:
a700767a7682 (doc/docs-next) docs: requirements.txt: recommend Sphinx 1.7.9

Signed-off-by: Federico Vaga <federico.vaga@vaga.pv.it>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/translations/it_IT/doc-guide/sphinx.rst | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/Documentation/translations/it_IT/doc-guide/sphinx.rst b/Documentation/translations/it_IT/doc-guide/sphinx.rst
index 793b5cc33403..1739cba8863e 100644
--- a/Documentation/translations/it_IT/doc-guide/sphinx.rst
+++ b/Documentation/translations/it_IT/doc-guide/sphinx.rst
@@ -35,8 +35,7 @@ Installazione Sphinx
 ====================
 
 I marcatori ReST utilizzati nei file in Documentation/ sono pensati per essere
-processati da ``Sphinx`` nella versione 1.3 o superiore. Se desiderate produrre
-un documento PDF è raccomandato l'utilizzo di una versione superiore alle 1.4.6.
+processati da ``Sphinx`` nella versione 1.3 o superiore.
 
 Esiste uno script che verifica i requisiti Sphinx. Per ulteriori dettagli
 consultate :ref:`it_sphinx-pre-install`.
@@ -68,13 +67,13 @@ pacchettizzato dalla vostra distribuzione.
       utilizzando LaTeX. Per una corretta interpretazione, è necessario aver
       installato texlive con i pacchetti amdfonts e amsmath.
 
-Riassumendo, se volete installare la versione 1.4.9 di Sphinx dovete eseguire::
+Riassumendo, se volete installare la versione 1.7.9 di Sphinx dovete eseguire::
 
-       $ virtualenv sphinx_1.4
-       $ . sphinx_1.4/bin/activate
-       (sphinx_1.4) $ pip install -r Documentation/sphinx/requirements.txt
+       $ virtualenv sphinx_1.7.9
+       $ . sphinx_1.7.9/bin/activate
+       (sphinx_1.7.9) $ pip install -r Documentation/sphinx/requirements.txt
 
-Dopo aver eseguito ``. sphinx_1.4/bin/activate``, il prompt cambierà per
+Dopo aver eseguito ``. sphinx_1.7.9/bin/activate``, il prompt cambierà per
 indicare che state usando il nuovo ambiente. Se aprite un nuova sessione,
 prima di generare la documentazione, dovrete rieseguire questo comando per
 rientrare nell'ambiente virtuale.
@@ -120,8 +119,8 @@ l'installazione::
 	You should run:
 
 		sudo dnf install -y texlive-luatex85
-		/usr/bin/virtualenv sphinx_1.4
-		. sphinx_1.4/bin/activate
+		/usr/bin/virtualenv sphinx_1.7.9
+		. sphinx_1.7.9/bin/activate
 		pip install -r Documentation/sphinx/requirements.txt
 
 	Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468.
-- 
cgit v1.2.3-59-g8ed1b


From 3d9cf48b2ca257f1a249b347236098c3cf9d54f1 Mon Sep 17 00:00:00 2001
From: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
Date: Thu, 9 May 2019 15:40:49 +0800
Subject: Documentation: nvdimm: Fix typo

Remove the extra 'we '.

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/nvdimm/nvdimm.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/nvdimm/nvdimm.txt b/Documentation/nvdimm/nvdimm.txt
index e894de69915a..1669f626b037 100644
--- a/Documentation/nvdimm/nvdimm.txt
+++ b/Documentation/nvdimm/nvdimm.txt
@@ -284,8 +284,8 @@ A bus has a 1:1 relationship with an NFIT.  The current expectation for
 ACPI based systems is that there is only ever one platform-global NFIT.
 That said, it is trivial to register multiple NFITs, the specification
 does not preclude it.  The infrastructure supports multiple busses and
-we we use this capability to test multiple NFIT configurations in the
-unit test.
+we use this capability to test multiple NFIT configurations in the unit
+test.
 
 LIBNVDIMM: control class device in /sys/class
 
-- 
cgit v1.2.3-59-g8ed1b


From 9d61944356590c40b13f6b1f99df84260e4db0c1 Mon Sep 17 00:00:00 2001
From: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
Date: Thu, 9 May 2019 11:05:49 +0800
Subject: Documentation: xfs: Fix typo
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In the "Y+p" entry of this line, there are two non-ASCII bytes (0xd9 0x8d)
following the 'Y'.  They are shown as a small '=' under the '+' in VIM
and as a '賺' on the web page [1].

I think it's a mistake, so remove these strange characters.

[1]: https://www.kernel.org/doc/Documentation/filesystems/xfs-delayed-logging-design.txt
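
As a quick check of the two bytes quoted above, the sketch below decodes
them. It assumes the file is meant to be read as UTF-8, which this commit
does not state; under that assumption the bytes form a single combining
mark rather than two separate characters, which would be consistent with
VIM drawing a small mark under the neighbouring glyph.

```python
import unicodedata

# The two stray bytes reported in the commit message.
stray = bytes([0xD9, 0x8D])

# Assumption: the file is UTF-8. Decoded that way, the two bytes
# form one code point, not two characters.
char = stray.decode("utf-8")
codepoint = f"U+{ord(char):04X}"
name = unicodedata.name(char)

# A non-zero canonical combining class means the code point is a
# mark that attaches to the preceding base character.
is_combining = unicodedata.combining(char) != 0

print(len(char), codepoint, name, is_combining)
```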

Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/xfs-delayed-logging-design.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/filesystems/xfs-delayed-logging-design.txt b/Documentation/filesystems/xfs-delayed-logging-design.txt
index 2ce36439c09f..9a6dd289b17b 100644
--- a/Documentation/filesystems/xfs-delayed-logging-design.txt
+++ b/Documentation/filesystems/xfs-delayed-logging-design.txt
@@ -34,7 +34,7 @@ transaction:
 	   D			A+B+C+D		X+n+m+o
 	    <object written to disk>
 	   E			   E		   Y (> X+n+m+o)
-	   F			  E+F		  Yٍ+p
+	   F			  E+F		  Y+p
 
 In other words, each time an object is relogged, the new transaction contains
 the aggregation of all the previous changes currently held only in the log.
-- 
cgit v1.2.3-59-g8ed1b


From 462e5a521ab73f7762583add73cbab1662612beb Mon Sep 17 00:00:00 2001
From: "George G. Davis" <george_davis@mentor.com>
Date: Wed, 5 Jun 2019 16:30:10 -0400
Subject: treewide: trivial: fix s/poped/popped/ typo

Fix a couple of s/poped/popped/ typos.

Signed-off-by: George G. Davis <george_davis@mentor.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/arm/mem_alignment | 2 +-
 arch/x86/kernel/kprobes/core.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/arm/mem_alignment b/Documentation/arm/mem_alignment
index 6335fcacbba9..e110e2781039 100644
--- a/Documentation/arm/mem_alignment
+++ b/Documentation/arm/mem_alignment
@@ -1,4 +1,4 @@
-Too many problems poped up because of unnoticed misaligned memory access in
+Too many problems popped up because of unnoticed misaligned memory access in
 kernel code lately.  Therefore the alignment fixup is now unconditionally
 configured in for SA11x0 based targets.  According to Alan Cox, this is a
 bad idea to configure it out, but Russell King has some good reasons for
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 9e4fa2484d10..1de809afaf65 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -826,7 +826,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 			continue;
 		/*
 		 * Return probes must be pushed on this hash list correct
-		 * order (same as return order) so that it can be poped
+		 * order (same as return order) so that it can be popped
 		 * correctly. However, if we find it is pushed it incorrect
 		 * order, this means we find a function which should not be
 		 * probed, because the wrong order entry is pushed on the
-- 
cgit v1.2.3-59-g8ed1b


From 78a89463a31ce463a4b968553f57ff9932a0697f Mon Sep 17 00:00:00 2001
From: Lecopzer Chen <lecopzer.chen@mediatek.com>
Date: Thu, 9 May 2019 18:31:16 +0800
Subject: Documentation: {u,k}probes: add tracing_on before tracing

When following the document step by step, `cat trace` does not work
unless tracing_on is enabled, which might mislead newcomers about
the functionality.
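
The sequence this patch documents (write 1 to tracing_on, run a workload,
write 0) can be sketched as a small helper. This is only an illustration:
`tracing_interval` is a hypothetical name, and the demonstration runs
against a scratch directory standing in for the tracing directory, since
writing to the real one requires root.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def tracing_interval(tracefs_dir):
    """Write 1 to tracing_on on entry and 0 on exit (hypothetical helper)."""
    path = os.path.join(tracefs_dir, "tracing_on")
    with open(path, "w") as f:
        f.write("1\n")
    try:
        yield path
    finally:
        # Always switch tracing off again, even if the workload fails.
        with open(path, "w") as f:
            f.write("0\n")


# Assumption: a scratch directory replaces /sys/kernel/debug/tracing
# so the sketch can run without root privileges.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "tracing_on"), "w").close()

with tracing_interval(demo_dir) as path:
    # The workload to be traced would run here.
    with open(path) as f:
        state_during = f.read().strip()

with open(os.path.join(demo_dir, "tracing_on")) as f:
    state_after = f.read().strip()
```

On a real system the directory argument would be the tracing directory
used elsewhere in the document, /sys/kernel/debug/tracing.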

Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/trace/kprobetrace.rst  | 6 ++++++
 Documentation/trace/uprobetracer.rst | 7 ++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 235ce2ab131a..baa3c42ba2f4 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -189,6 +189,12 @@ events, you need to enable it.
   echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
   echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
 
+Use the following command to start tracing in an interval.
+::
+    # echo 1 > tracing_on
+    Open something...
+    # echo 0 > tracing_on
+
 And you can see the traced information via /sys/kernel/debug/tracing/trace.
 ::
 
diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
index 4346e23e3ae7..0b21305fabdc 100644
--- a/Documentation/trace/uprobetracer.rst
+++ b/Documentation/trace/uprobetracer.rst
@@ -152,10 +152,15 @@ events, you need to enable it by::
 
     # echo 1 > events/uprobes/enable
 
-Lets disable the event after sleeping for some time.
+Lets start tracing, sleep for some time and stop tracing.
 ::
 
+    # echo 1 > tracing_on
     # sleep 20
+    # echo 0 > tracing_on
+
+Also, you can disable the event by::
+
     # echo 0 > events/uprobes/enable
 
 And you can see the traced information via /sys/kernel/debug/tracing/trace.
-- 
cgit v1.2.3-59-g8ed1b


From 671c30957e78a822917cf0b04c4592e9813f7f9b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:17 -0300
Subject: ABI: sysfs-devices-system-cpu: point to the right docs

The cpuidle doc was split in two, one part in the admin guide
and the other in the driver API guide. Instead of pointing
to a non-existent file, point to both (admin guide being
the first one).

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1528239f69b2..87478ac6c2af 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -137,7 +137,8 @@ Description:	Discover cpuidle policy and mechanism
 		current_governor: (RW) displays current idle policy. Users can
 		switch the governor at runtime by writing to this file.
 
-		See files in Documentation/cpuidle/ for more information.
+		See Documentation/admin-guide/pm/cpuidle.rst and
+		Documentation/driver-api/pm/cpuidle.rst for more information.
 
 
 What:		/sys/devices/system/cpu/cpuX/cpuidle/stateN/name
-- 
cgit v1.2.3-59-g8ed1b


From 8b01caee99fb07218908c0ac9be8c758878f33f9 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:18 -0300
Subject: isdn: mISDN: remove a bogus reference to a non-existing doc

The mISDN driver was added by these commits:

	960366cf8dbb ("Add mISDN DSP")
	1b2b03f8e514 ("Add mISDN core files")
	04578dd330f1 ("Define AF_ISDN and PF_ISDN")
	e4ac9bc1f668 ("Add mISDN driver")

None of them added a Documentation/isdn/mISDN.cert file.
Also, whatever was supposed to be written there at that time
probably doesn't make any sense nowadays, as I doubt isdn will
see any massive changes.

So, let's just get rid of the broken reference, in order to
shut up a warning produced by ./scripts/documentation-file-ref-check.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 drivers/isdn/mISDN/dsp_core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/isdn/mISDN/dsp_core.c b/drivers/isdn/mISDN/dsp_core.c
index cd036e87335a..038e72a84b33 100644
--- a/drivers/isdn/mISDN/dsp_core.c
+++ b/drivers/isdn/mISDN/dsp_core.c
@@ -4,8 +4,6 @@
  *		Karsten Keil (keil@isdn4linux.de)
  *
  *		This file is (c) under GNU PUBLIC LICENSE
- *		For changes and modifications please read
- *		../../../Documentation/isdn/mISDN.cert
  *
  * Thanks to    Karsten Keil (great drivers)
  *              Cologne Chip (great chips)
-- 
cgit v1.2.3-59-g8ed1b


From 065efe27872ca942b53b9f11d5b3f534a9c33857 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:19 -0300
Subject: docs: zh_CN: get rid of basic_profiling.txt

Changeset 5700d1974818 ("docs: Get rid of the "basic profiling" guide")
removed an old basic-profiling.txt file that had not been updated in
the last 11 years and does not reflect the post-perf era.

It makes no sense to keep its translation, so get rid of it too.

Fixes: 5700d1974818 ("docs: Get rid of the "basic profiling" guide")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 .../translations/zh_CN/basic_profiling.txt         | 71 ----------------------
 1 file changed, 71 deletions(-)
 delete mode 100644 Documentation/translations/zh_CN/basic_profiling.txt

diff --git a/Documentation/translations/zh_CN/basic_profiling.txt b/Documentation/translations/zh_CN/basic_profiling.txt
deleted file mode 100644
index 1e6bf0bdf8f5..000000000000
--- a/Documentation/translations/zh_CN/basic_profiling.txt
+++ /dev/null
@@ -1,71 +0,0 @@
-Chinese translated version of Documentation/basic_profiling
-
-If you have any comment or update to the content, please post to LKML directly.
-However, if you have problem communicating in English you can also ask the
-Chinese maintainer for help.  Contact the Chinese maintainer, if this
-translation is outdated or there is problem with translation.
-
-Chinese maintainer: Liang Xie <xieliang@xiaomi.com>
----------------------------------------------------------------------
-Documentation/basic_profiling的中文翻译
-
-如果想评论或更新本文的内容，请直接发信到LKML。如果你使用英文交流有困难的话，也可
-以向中文版维护者求助。如果本翻译更新不及时或者翻译存在问题，请联系中文版维护者。
-
-中文版维护者： 谢良 Liang Xie <xieliang007@gmail.com>
-中文版翻译者： 谢良 Liang Xie <xieliang007@gmail.com>
-中文版校译者：
-以下为正文
----------------------------------------------------------------------
-
-下面这些说明指令都是非常基础的，如果你想进一步了解请阅读相关专业文档：）
-请不要再在本文档增加新的内容，但可以修复文档中的错误：）(mbligh@aracnet.com)
-感谢John Levon，Dave Hansen等在撰写时的帮助
-
-<test> 用于表示要测量的目标
-请先确保您已经有正确的System.map / vmlinux配置！
-
-对于linux系统来说，配置vmlinuz最容易的方法可能就是使用“make install”，然后修改
-/sbin/installkernel将vmlinux拷贝到/boot目录，而System.map通常是默认安装好的
-
-Readprofile
------------
-2.6系列内核需要版本相对较新的readprofile，比如util-linux 2.12a中包含的，可以从:
-
-http://www.kernel.org/pub/linux/utils/util-linux/ 下载
-
-大部分linux发行版已经包含了.
-
-启用readprofile需要在kernel启动命令行增加”profile=2“
-
-clear		readprofile -r
-		<test>
-dump output	readprofile -m /boot/System.map > captured_profile
-
-Oprofile
---------
-
-从http://oprofile.sourceforge.net/获取源代码（请参考Changes以获取匹配的版本）
-在kernel启动命令行增加“idle=poll”
-
-配置CONFIG_PROFILING=y和CONFIG_OPROFILE=y然后重启进入新kernel
-
-./configure --with-kernel-support
-make install
-
-想得到好的测量结果，请确保启用了本地APIC特性。如果opreport显示有0Hz CPU，
-说明APIC特性没有开启。另外注意idle=poll选项可能有损性能。
-
-One time setup:
-		opcontrol --setup --vmlinux=/boot/vmlinux
-
-clear		opcontrol --reset
-start		opcontrol --start
-		<test>
-stop		opcontrol --stop
-dump output	opreport >  output_file
-
-如果只看kernel相关的报告结果，请运行命令 opreport -l /boot/vmlinux > output_file
-
-通过reset选项可以清理过期统计数据，相当于重启的效果。
-
-- 
cgit v1.2.3-59-g8ed1b


From 2e03e3a42c961b709926ba5f7c42c09ea6bfb8c1 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:20 -0300
Subject: docs: mm: numaperf.rst: get rid of a build warning

Building the documentation produces this warning:

	Documentation/admin-guide/mm/numaperf.rst:168: WARNING: Footnote [1] is not referenced.

The problem is that this is not really a reference, as it is not
mentioned within the documentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/mm/numaperf.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst
index c067ed145158..a80c3c37226e 100644
--- a/Documentation/admin-guide/mm/numaperf.rst
+++ b/Documentation/admin-guide/mm/numaperf.rst
@@ -165,5 +165,6 @@ write-through caching.
 ========
 See Also
 ========
-.. [1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
-       Section 5.2.27
+
+[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
+- Section 5.2.27
-- 
cgit v1.2.3-59-g8ed1b


From d857a3ffd3d609d1c822b255d4fe4db8b3464e34 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:21 -0300
Subject: docs: bpf: get rid of two warnings

Documentation/bpf/btf.rst:154: WARNING: Unexpected indentation.
Documentation/bpf/btf.rst:163: WARNING: Unexpected indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/bpf/btf.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/bpf/btf.rst b/Documentation/bpf/btf.rst
index 8820360d00da..4ae022d274ab 100644
--- a/Documentation/bpf/btf.rst
+++ b/Documentation/bpf/btf.rst
@@ -151,6 +151,7 @@ for the type. The maximum value of ``BTF_INT_BITS()`` is 128.
 
 The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values
 for this int. For example, a bitfield struct member has:
+
  * btf member bit offset 100 from the start of the structure,
  * btf member pointing to an int type,
  * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``
@@ -160,6 +161,7 @@ from bits ``100 + 2 = 102``.
 
 Alternatively, the bitfield struct member can be the following to access the
 same bits as the above:
+
  * btf member bit offset 102,
  * btf member pointing to an int type,
  * the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``
-- 
cgit v1.2.3-59-g8ed1b


From 27c054d2939f1a46a4da62732e71c140e664afb9 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:22 -0300
Subject: docs: mark orphan documents as such

Sphinx doesn't like orphan documents:

    Documentation/accelerators/ocxl.rst: WARNING: document isn't included in any toctree
    Documentation/arm/stm32/overview.rst: WARNING: document isn't included in any toctree
    Documentation/arm/stm32/stm32f429-overview.rst: WARNING: document isn't included in any toctree
    Documentation/arm/stm32/stm32f746-overview.rst: WARNING: document isn't included in any toctree
    Documentation/arm/stm32/stm32f769-overview.rst: WARNING: document isn't included in any toctree
    Documentation/arm/stm32/stm32h743-overview.rst: WARNING: document isn't included in any toctree
    Documentation/arm/stm32/stm32mp157-overview.rst: WARNING: document isn't included in any toctree
    Documentation/gpu/msm-crash-dump.rst: WARNING: document isn't included in any toctree
    Documentation/interconnect/interconnect.rst: WARNING: document isn't included in any toctree
    Documentation/laptops/lg-laptop.rst: WARNING: document isn't included in any toctree
    Documentation/powerpc/isa-versions.rst: WARNING: document isn't included in any toctree
    Documentation/virtual/kvm/amd-memory-encryption.rst: WARNING: document isn't included in any toctree
    Documentation/virtual/kvm/vcpu-requests.rst: WARNING: document isn't included in any toctree

So, while they aren't in any toctree, add :orphan: to them in order
to silence this warning.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/accelerators/ocxl.rst             | 2 ++
 Documentation/arm/stm32/overview.rst            | 2 ++
 Documentation/arm/stm32/stm32f429-overview.rst  | 2 ++
 Documentation/arm/stm32/stm32f746-overview.rst  | 2 ++
 Documentation/arm/stm32/stm32f769-overview.rst  | 2 ++
 Documentation/arm/stm32/stm32h743-overview.rst  | 2 ++
 Documentation/arm/stm32/stm32mp157-overview.rst | 2 ++
 Documentation/gpu/msm-crash-dump.rst            | 2 ++
 Documentation/interconnect/interconnect.rst     | 2 ++
 Documentation/laptops/lg-laptop.rst             | 2 ++
 Documentation/powerpc/isa-versions.rst          | 2 ++
 11 files changed, 22 insertions(+)

diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
index 14cefc020e2d..b1cea19a90f5 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 ========================================================
 OpenCAPI (Open Coherent Accelerator Processor Interface)
 ========================================================
diff --git a/Documentation/arm/stm32/overview.rst b/Documentation/arm/stm32/overview.rst
index 85cfc8410798..f7e734153860 100644
--- a/Documentation/arm/stm32/overview.rst
+++ b/Documentation/arm/stm32/overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 ========================
 STM32 ARM Linux Overview
 ========================
diff --git a/Documentation/arm/stm32/stm32f429-overview.rst b/Documentation/arm/stm32/stm32f429-overview.rst
index 18feda97f483..65bbb1c3b423 100644
--- a/Documentation/arm/stm32/stm32f429-overview.rst
+++ b/Documentation/arm/stm32/stm32f429-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 STM32F429 Overview
 ==================
 
diff --git a/Documentation/arm/stm32/stm32f746-overview.rst b/Documentation/arm/stm32/stm32f746-overview.rst
index b5f4b6ce7656..42d593085015 100644
--- a/Documentation/arm/stm32/stm32f746-overview.rst
+++ b/Documentation/arm/stm32/stm32f746-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 STM32F746 Overview
 ==================
 
diff --git a/Documentation/arm/stm32/stm32f769-overview.rst b/Documentation/arm/stm32/stm32f769-overview.rst
index 228656ced2fe..f6adac862b17 100644
--- a/Documentation/arm/stm32/stm32f769-overview.rst
+++ b/Documentation/arm/stm32/stm32f769-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 STM32F769 Overview
 ==================
 
diff --git a/Documentation/arm/stm32/stm32h743-overview.rst b/Documentation/arm/stm32/stm32h743-overview.rst
index 3458dc00095d..c525835e7473 100644
--- a/Documentation/arm/stm32/stm32h743-overview.rst
+++ b/Documentation/arm/stm32/stm32h743-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 STM32H743 Overview
 ==================
 
diff --git a/Documentation/arm/stm32/stm32mp157-overview.rst b/Documentation/arm/stm32/stm32mp157-overview.rst
index 62e176d47ca7..2c52cd020601 100644
--- a/Documentation/arm/stm32/stm32mp157-overview.rst
+++ b/Documentation/arm/stm32/stm32mp157-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 STM32MP157 Overview
 ===================
 
diff --git a/Documentation/gpu/msm-crash-dump.rst b/Documentation/gpu/msm-crash-dump.rst
index 757cd257e0d8..240ef200f76c 100644
--- a/Documentation/gpu/msm-crash-dump.rst
+++ b/Documentation/gpu/msm-crash-dump.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 =====================
 MSM Crash Dump Format
 =====================
diff --git a/Documentation/interconnect/interconnect.rst b/Documentation/interconnect/interconnect.rst
index c3e004893796..56e331dab70e 100644
--- a/Documentation/interconnect/interconnect.rst
+++ b/Documentation/interconnect/interconnect.rst
@@ -1,5 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
+:orphan:
+
 =====================================
 GENERIC SYSTEM INTERCONNECT SUBSYSTEM
 =====================================
diff --git a/Documentation/laptops/lg-laptop.rst b/Documentation/laptops/lg-laptop.rst
index aa503ee9b3bc..f2c2ffe31101 100644
--- a/Documentation/laptops/lg-laptop.rst
+++ b/Documentation/laptops/lg-laptop.rst
@@ -1,5 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0+
 
+:orphan:
+
 LG Gram laptop extra features
 =============================
 
diff --git a/Documentation/powerpc/isa-versions.rst b/Documentation/powerpc/isa-versions.rst
index 812e20cc898c..66c24140ebf1 100644
--- a/Documentation/powerpc/isa-versions.rst
+++ b/Documentation/powerpc/isa-versions.rst
@@ -1,3 +1,5 @@
+:orphan:
+
 CPU to ISA Version Mapping
 ==========================
 
-- 
cgit v1.2.3-59-g8ed1b


From f672febc3d132ea0487c63367455124dfa39e30f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:23 -0300
Subject: docs: amd-memory-encryption.rst get rid of warnings

Get rid of those warnings:

    Documentation/virtual/kvm/amd-memory-encryption.rst:244: WARNING: Citation [white-paper] is not referenced.
    Documentation/virtual/kvm/amd-memory-encryption.rst:246: WARNING: Citation [amd-apm] is not referenced.
    Documentation/virtual/kvm/amd-memory-encryption.rst:247: WARNING: Citation [kvm-forum] is not referenced.

Fix the references that aren't mentioned in the text by adding an
explicit reference to them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/virtual/kvm/amd-memory-encryption.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/virtual/kvm/amd-memory-encryption.rst b/Documentation/virtual/kvm/amd-memory-encryption.rst
index 659bbc093b52..d18c97b4e140 100644
--- a/Documentation/virtual/kvm/amd-memory-encryption.rst
+++ b/Documentation/virtual/kvm/amd-memory-encryption.rst
@@ -241,6 +241,9 @@ Returns: 0 on success, -negative on error
 References
 ==========
 
+
+See [white-paper]_, [api-spec]_, [amd-apm]_ and [kvm-forum]_ for more info.
+
 .. [white-paper] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
 .. [api-spec] http://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf
 .. [amd-apm] http://support.amd.com/TechDocs/24593.pdf (section 15.34)
-- 
cgit v1.2.3-59-g8ed1b


From d0727cc650f38243c0ac63fd8c91bfd63e3e2578 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:24 -0300
Subject: docs: zh_CN: avoid duplicate citation references

    Documentation/process/management-style.rst:35: WARNING: duplicate label decisions, other instance in     Documentation/translations/zh_CN/process/management-style.rst
    Documentation/process/programming-language.rst:37: WARNING: duplicate citation c-language, other instance in     Documentation/translations/zh_CN/process/programming-language.rst
    Documentation/process/programming-language.rst:38: WARNING: duplicate citation gcc, other instance in     Documentation/translations/zh_CN/process/programming-language.rst
    Documentation/process/programming-language.rst:39: WARNING: duplicate citation clang, other instance in     Documentation/translations/zh_CN/process/programming-language.rst
    Documentation/process/programming-language.rst:40: WARNING: duplicate citation icc, other instance in     Documentation/translations/zh_CN/process/programming-language.rst
    Documentation/process/programming-language.rst:41: WARNING: duplicate citation gcc-c-dialect-options, other instance in     Documentation/translations/zh_CN/process/programming-language.rst
    Documentation/process/programming-language.rst:42: WARNING: duplicate citation gnu-extensions, other instance in     Documentation/translations/zh_CN/process/programming-language.rst
    Documentation/process/programming-language.rst:43: WARNING: duplicate citation gcc-attribute-syntax, other instance in     Documentation/translations/zh_CN/process/programming-language.rst
    Documentation/process/programming-language.rst:44: WARNING: duplicate citation n2049, other instance in     Documentation/translations/zh_CN/process/programming-language.rst

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 .../zh_CN/process/management-style.rst             |  4 +-
 .../zh_CN/process/programming-language.rst         | 59 +++++++++++++++++-----
 2 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/Documentation/translations/zh_CN/process/management-style.rst b/Documentation/translations/zh_CN/process/management-style.rst
index a181fa56d19e..c6a5bb285797 100644
--- a/Documentation/translations/zh_CN/process/management-style.rst
+++ b/Documentation/translations/zh_CN/process/management-style.rst
@@ -28,7 +28,7 @@ Linux内核管理风格
 
 不管怎样，这里是：
 
-.. _decisions:
+.. _cn_decisions:
 
 1）决策
 -------
@@ -108,7 +108,7 @@ Linux内核管理风格
 但是，为了做好作为内核管理者的准备，最好记住不要烧掉任何桥梁，不要轰炸任何
 无辜的村民，也不要疏远太多的内核开发人员。事实证明，疏远人是相当容易的，而
 亲近一个疏远的人是很难的。因此，“疏远”立即属于“不可逆”的范畴，并根据
-:ref:`decisions` 成为绝不可以做的事情。
+:ref:`cn_decisions` 成为绝不可以做的事情。
 
 这里只有几个简单的规则：
 
diff --git a/Documentation/translations/zh_CN/process/programming-language.rst b/Documentation/translations/zh_CN/process/programming-language.rst
index 51fd4ef48ea1..2a47a1d2ec20 100644
--- a/Documentation/translations/zh_CN/process/programming-language.rst
+++ b/Documentation/translations/zh_CN/process/programming-language.rst
@@ -8,21 +8,21 @@
 程序设计语言
 ============
 
-内核是用C语言 [c-language]_ 编写的。更准确地说，内核通常是用 ``gcc`` [gcc]_
-在 ``-std=gnu89`` [gcc-c-dialect-options]_ 下编译的：ISO C90的 GNU 方言（
+内核是用C语言 :ref:`c-language <cn_c-language>` 编写的。更准确地说，内核通常是用 :ref:`gcc <cn_gcc>`
+在 ``-std=gnu89`` :ref:`gcc-c-dialect-options <cn_gcc-c-dialect-options>` 下编译的：ISO C90的 GNU 方言（
 包括一些C99特性）
 
-这种方言包含对语言 [gnu-extensions]_ 的许多扩展，当然，它们许多都在内核中使用。
+这种方言包含对语言 :ref:`gnu-extensions <cn_gnu-extensions>` 的许多扩展，当然，它们许多都在内核中使用。
 
-对于一些体系结构，有一些使用 ``clang`` [clang]_ 和 ``icc`` [icc]_ 编译内核
+对于一些体系结构，有一些使用 :ref:`clang <cn_clang>` 和 :ref:`icc <cn_icc>` 编译内核
 的支持，尽管在编写此文档时还没有完成，仍需要第三方补丁。
 
 属性
 ----
 
-在整个内核中使用的一个常见扩展是属性（attributes） [gcc-attribute-syntax]_
+在整个内核中使用的一个常见扩展是属性（attributes） :ref:`gcc-attribute-syntax <cn_gcc-attribute-syntax>`
 属性允许将实现定义的语义引入语言实体（如变量、函数或类型），而无需对语言进行
-重大的语法更改（例如添加新关键字） [n2049]_
+重大的语法更改（例如添加新关键字） :ref:`n2049 <cn_n2049>`
 
 在某些情况下，属性是可选的（即不支持这些属性的编译器仍然应该生成正确的代码，
 即使其速度较慢或执行的编译时检查/诊断次数不够）
@@ -31,11 +31,42 @@
 ``__attribute__((__pure__))`` ），以检测可以使用哪些关键字和/或缩短代码, 具体
 请参阅 ``include/linux/compiler_attributes.h``
 
-.. [c-language] http://www.open-std.org/jtc1/sc22/wg14/www/standards
-.. [gcc] https://gcc.gnu.org
-.. [clang] https://clang.llvm.org
-.. [icc] https://software.intel.com/en-us/c-compilers
-.. [gcc-c-dialect-options] https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
-.. [gnu-extensions] https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
-.. [gcc-attribute-syntax] https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
-.. [n2049] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2049.pdf
+.. _cn_c-language:
+
+c-language
+   http://www.open-std.org/jtc1/sc22/wg14/www/standards
+
+.. _cn_gcc:
+
+gcc
+   https://gcc.gnu.org
+
+.. _cn_clang:
+
+clang
+   https://clang.llvm.org
+
+.. _cn_icc:
+
+icc
+   https://software.intel.com/en-us/c-compilers
+
+.. _cn_gcc-c-dialect-options:
+
+c-dialect-options
+   https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
+
+.. _cn_gnu-extensions:
+
+gnu-extensions
+   https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
+
+.. _cn_gcc-attribute-syntax:
+
+gcc-attribute-syntax
+   https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
+
+.. _cn_n2049:
+
+n2049
+   http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2049.pdf
-- 
cgit v1.2.3-59-g8ed1b


From ea0ad8763b17395fc611f6d91d1de389ec0cc584 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:25 -0300
Subject: docs: it: license-rules.rst: get rid of warnings

The indentation of a code block is wrong, and the text uses
a reference that is not defined in the Italian translation.

    Documentation/translations/it_IT/process/license-rules.rst:329: WARNING: Literal block expected; none found.
    Documentation/translations/it_IT/process/license-rules.rst:332: WARNING: Unexpected indentation.
    Documentation/translations/it_IT/process/license-rules.rst:339: WARNING: Block quote ends without a blank line; unexpected unindent.
    Documentation/translations/it_IT/process/license-rules.rst:341: WARNING: Unexpected indentation.
    Documentation/translations/it_IT/process/license-rules.rst:305: WARNING: Unknown target name: "metatags".

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Federico Vaga <federico.vaga@vaga.pv.it>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 .../translations/it_IT/process/license-rules.rst   | 28 +++++++++++-----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/Documentation/translations/it_IT/process/license-rules.rst b/Documentation/translations/it_IT/process/license-rules.rst
index f058e06996dc..4cd87a3a7bf9 100644
--- a/Documentation/translations/it_IT/process/license-rules.rst
+++ b/Documentation/translations/it_IT/process/license-rules.rst
@@ -303,7 +303,7 @@ essere categorizzate in:
      LICENSES/dual
 
    I file in questa cartella contengono il testo completo della rispettiva
-   licenza e i suoi `Metatags`_.  I nomi dei file sono identici agli
+   licenza e i suoi `Metatag`_.  I nomi dei file sono identici agli
    identificatori di licenza SPDX che dovrebbero essere usati nei file
    sorgenti.
 
@@ -326,19 +326,19 @@ essere categorizzate in:
 
    Esempio del formato del file::
 
-   Valid-License-Identifier: MPL-1.1
-   SPDX-URL: https://spdx.org/licenses/MPL-1.1.html
-   Usage-Guide:
-     Do NOT use. The MPL-1.1 is not GPL2 compatible. It may only be used for
-     dual-licensed files where the other license is GPL2 compatible.
-     If you end up using this it MUST be used together with a GPL2 compatible
-     license using "OR".
-     To use the Mozilla Public License version 1.1 put the following SPDX
-     tag/value pair into a comment according to the placement guidelines in
-     the licensing rules documentation:
-   SPDX-License-Identifier: MPL-1.1
-   License-Text:
-     Full license text
+    Valid-License-Identifier: MPL-1.1
+    SPDX-URL: https://spdx.org/licenses/MPL-1.1.html
+    Usage-Guide:
+      Do NOT use. The MPL-1.1 is not GPL2 compatible. It may only be used for
+      dual-licensed files where the other license is GPL2 compatible.
+      If you end up using this it MUST be used together with a GPL2 compatible
+      license using "OR".
+      To use the Mozilla Public License version 1.1 put the following SPDX
+      tag/value pair into a comment according to the placement guidelines in
+      the licensing rules documentation:
+    SPDX-License-Identifier: MPL-1.1
+    License-Text:
+      Full license text
 
 |
 
-- 
cgit v1.2.3-59-g8ed1b


From 6ad8b21652ec26a5ad51ffc91470e15c19156548 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:27 -0300
Subject: docs: security: trusted-encrypted.rst: fix code-block tag

The code-block tag is in the wrong place, causing these
warnings:

    Documentation/security/keys/trusted-encrypted.rst:112: WARNING: Literal block expected; none found.
    Documentation/security/keys/trusted-encrypted.rst:121: WARNING: Unexpected indentation.
    Documentation/security/keys/trusted-encrypted.rst:122: WARNING: Block quote ends without a blank line; unexpected unindent.
    Documentation/security/keys/trusted-encrypted.rst:123: WARNING: Block quote ends without a blank line; unexpected unindent.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/security/keys/trusted-encrypted.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/security/keys/trusted-encrypted.rst b/Documentation/security/keys/trusted-encrypted.rst
index 7b35fcb58933..50ac8bcd6970 100644
--- a/Documentation/security/keys/trusted-encrypted.rst
+++ b/Documentation/security/keys/trusted-encrypted.rst
@@ -107,12 +107,14 @@ Where::
 
 Examples of trusted and encrypted key usage:
 
-Create and save a trusted key named "kmk" of length 32 bytes::
+Create and save a trusted key named "kmk" of length 32 bytes.
 
 Note: When using a TPM 2.0 with a persistent key with handle 0x81000001,
 append 'keyhandle=0x81000001' to statements between quotes, such as
 "new 32 keyhandle=0x81000001".
 
+::
+
     $ keyctl add trusted kmk "new 32" @u
     440502848
 
-- 
cgit v1.2.3-59-g8ed1b


From 43415f13276f09623b1b61376c6f2e43f71bedbb Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:28 -0300
Subject: docs: security: core.rst: Fix several warnings

Multi-line literal markup only works when its lines are indented at
the same level, which is not the case here:

   Documentation/security/keys/core.rst:1597: WARNING: Inline literal start-string without end-string.
   Documentation/security/keys/core.rst:1597: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1597: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1598: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1598: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1600: WARNING: Inline literal start-string without end-string.
   Documentation/security/keys/core.rst:1600: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1600: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1600: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1600: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1666: WARNING: Inline literal start-string without end-string.
   Documentation/security/keys/core.rst:1666: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1666: WARNING: Inline emphasis start-string without end-string.
   Documentation/security/keys/core.rst:1666: WARNING: Inline emphasis start-string without end-string.

Fix it by using a code-block instead.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/security/keys/core.rst | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst
index 9521c4207f01..3fd60dcb2dc6 100644
--- a/Documentation/security/keys/core.rst
+++ b/Documentation/security/keys/core.rst
@@ -1594,10 +1594,12 @@ The structure has a number of fields, some of which are mandatory:
      attempted key link operation. If there is no match, -EINVAL is returned.
 
 
-  *  ``int (*asym_eds_op)(struct kernel_pkey_params *params,
-			  const void *in, void *out);``
-     ``int (*asym_verify_signature)(struct kernel_pkey_params *params,
-				    const void *in, const void *in2);``
+  *  ``asym_eds_op`` and ``asym_verify_signature``::
+
+       int (*asym_eds_op)(struct kernel_pkey_params *params,
+			  const void *in, void *out);
+       int (*asym_verify_signature)(struct kernel_pkey_params *params,
+				    const void *in, const void *in2);
 
      These methods are optional.  If provided the first allows a key to be
      used to encrypt, decrypt or sign a blob of data, and the second allows a
@@ -1662,8 +1664,10 @@ The structure has a number of fields, some of which are mandatory:
      required crypto isn't available.
 
 
-  *  ``int (*asym_query)(const struct kernel_pkey_params *params,
-			 struct kernel_pkey_query *info);``
+  *  ``asym_query``::
+
+       int (*asym_query)(const struct kernel_pkey_params *params,
+			 struct kernel_pkey_query *info);
 
      This method is optional.  If provided it allows information about the
      public or asymmetric key held in the key to be determined.
-- 
cgit v1.2.3-59-g8ed1b


From c6fff4d3b2f467dd62ee8c69e49c8a8795fe7400 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:30 -0300
Subject: docs: net: sja1105.rst: fix table format

There's a table there which produces two warnings when built
with Sphinx:

    Documentation/networking/dsa/sja1105.rst:91: WARNING: Block quote ends without a blank line; unexpected unindent.
    Documentation/networking/dsa/sja1105.rst:91: WARNING: Block quote ends without a blank line; unexpected unindent.

It will still produce a table, but the HTML output is wrong: the
second line is not interpreted as a continuation of the first,
because the indentation doesn't match.

After the change, the output looks much better and the two warnings
are gone.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/networking/dsa/sja1105.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/dsa/sja1105.rst b/Documentation/networking/dsa/sja1105.rst
index ea7bac438cfd..cb2858dece93 100644
--- a/Documentation/networking/dsa/sja1105.rst
+++ b/Documentation/networking/dsa/sja1105.rst
@@ -86,13 +86,13 @@ functionality.
 The following traffic modes are supported over the switch netdevices:
 
 +--------------------+------------+------------------+------------------+
-|                    | Standalone |   Bridged with   |   Bridged with   |
-|                    |    ports   | vlan_filtering 0 | vlan_filtering 1 |
+|                    | Standalone | Bridged with     | Bridged with     |
+|                    | ports      | vlan_filtering 0 | vlan_filtering 1 |
 +====================+============+==================+==================+
 | Regular traffic    |     Yes    |       Yes        |  No (use master) |
 +--------------------+------------+------------------+------------------+
 | Management traffic |     Yes    |       Yes        |       Yes        |
-|    (BPDU, PTP)     |            |                  |                  |
+| (BPDU, PTP)        |            |                  |                  |
 +--------------------+------------+------------------+------------------+
 
 Switching features
-- 
cgit v1.2.3-59-g8ed1b


From 14b767430a58046bfef8ff9b9f12854e20343092 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:29 -0300
Subject: docs: net: dpio-driver.rst: fix two codeblock warnings

    Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:43: WARNING: Definition list ends without a blank line; unexpected unindent.
    Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:63: WARNING: Unexpected indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 .../networking/device_drivers/freescale/dpaa2/dpio-driver.rst         | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
index 5045df990a4c..17dbee1ac53e 100644
--- a/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
+++ b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst
@@ -39,8 +39,7 @@ The Linux DPIO driver consists of 3 primary components--
 
    DPIO service-- provides APIs to other Linux drivers for services
 
-   QBman portal interface-- sends portal commands, gets responses
-::
+   QBman portal interface-- sends portal commands, gets responses::
 
           fsl-mc          other
            bus           drivers
@@ -60,6 +59,7 @@ The Linux DPIO driver consists of 3 primary components--
 
 The diagram below shows how the DPIO driver components fit with the other
 DPAA2 Linux driver components::
+
                                                    +------------+
                                                    | OS Network |
                                                    |   Stack    |
-- 
cgit v1.2.3-59-g8ed1b


From 1eecbcdca2bd8d96881cace19ad105dc0f0263f5 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:31 -0300
Subject: docs: move protection-keys.rst to the core-api book

This document is used by multiple architectures:

	$ echo $(git grep -l  pkey_mprotect arch|cut -d'/' -f 2|sort|uniq)
	alpha arm arm64 ia64 m68k microblaze mips parisc powerpc s390 sh sparc x86 xtensa

So, let's move it to the core book and adjust the links to it
accordingly.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/index.rst              |  1 +
 Documentation/core-api/protection-keys.rst    | 99 +++++++++++++++++++++++++++
 Documentation/x86/index.rst                   |  1 -
 Documentation/x86/protection-keys.rst         | 99 ---------------------------
 arch/powerpc/Kconfig                          |  2 +-
 arch/x86/Kconfig                              |  2 +-
 tools/testing/selftests/x86/protection_keys.c |  2 +-
 7 files changed, 103 insertions(+), 103 deletions(-)
 create mode 100644 Documentation/core-api/protection-keys.rst
 delete mode 100644 Documentation/x86/protection-keys.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index ee1bb8983a88..2466a4c51031 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -34,6 +34,7 @@ Core utilities
    timekeeping
    boot-time-mm
    memory-hotplug
+   protection-keys
 
 
 Interfaces for kernel debugging
diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
new file mode 100644
index 000000000000..49d9833af871
--- /dev/null
+++ b/Documentation/core-api/protection-keys.rst
@@ -0,0 +1,99 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Memory Protection Keys
+======================
+
+Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
+which is found on Intel's Skylake "Scalable Processor" Server CPUs.
+It will be avalable in future non-server parts.
+
+For anyone wishing to test or use this feature, it is available in
+Amazon's EC2 C5 instances and is known to work there using an Ubuntu
+17.04 image.
+
+Memory Protection Keys provides a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables
+when an application changes protection domains.  It works by
+dedicating 4 previously ignored bits in each page table entry to a
+"protection key", giving 16 possible keys.
+
+There is also a new user-accessible register (PKRU) with two separate
+bits (Access Disable and Write Disable) for each key.  Being a CPU
+register, PKRU is inherently thread-local, potentially giving each
+thread a different set of protections from every other thread.
+
+There are two new instructions (RDPKRU/WRPKRU) for reading and writing
+to the new register.  The feature is only available in 64-bit mode,
+even though there is theoretically space in the PAE PTEs.  These
+permissions are enforced on data access only and have no effect on
+instruction fetches.
+
+Syscalls
+========
+
+There are 3 system calls which directly interact with pkeys::
+
+	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
+	int pkey_free(int pkey);
+	int pkey_mprotect(unsigned long start, size_t len,
+			  unsigned long prot, int pkey);
+
+Before a pkey can be used, it must first be allocated with
+pkey_alloc().  An application calls the WRPKRU instruction
+directly in order to change access permissions to memory covered
+with a key.  In this example WRPKRU is wrapped by a C function
+called pkey_set().
+::
+
+	int real_prot = PROT_READ|PROT_WRITE;
+	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
+	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
+	... application runs here
+
+Now, if the application needs to update the data at 'ptr', it can
+gain access, do the update, then remove its write access::
+
+	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
+	*ptr = foo; // assign something
+	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
+
+Now when it frees the memory, it will also free the pkey since it
+is no longer in use::
+
+	munmap(ptr, PAGE_SIZE);
+	pkey_free(pkey);
+
+.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
+          An example implementation can be found in
+          tools/testing/selftests/x86/protection_keys.c.
+
+Behavior
+========
+
+The kernel attempts to make protection keys consistent with the
+behavior of a plain mprotect().  For instance if you do this::
+
+	mprotect(ptr, size, PROT_NONE);
+	something(ptr);
+
+you can expect the same effects with protection keys when doing this::
+
+	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
+	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
+	something(ptr);
+
+That should be true whether something() is a direct access to 'ptr'
+like::
+
+	*ptr = foo;
+
+or when the kernel does the access on the application's behalf like
+with a read()::
+
+	read(fd, ptr, 1);
+
+The kernel will send a SIGSEGV in both cases, but si_code will be set
+to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
+the plain mprotect() permissions are violated.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index ae36fc5fc649..f2de1b2d3ac7 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -19,7 +19,6 @@ x86-specific Documentation
    tlb
    mtrr
    pat
-   protection-keys
    intel_mpx
    amd-memory-encryption
    pti
diff --git a/Documentation/x86/protection-keys.rst b/Documentation/x86/protection-keys.rst
deleted file mode 100644
index 49d9833af871..000000000000
--- a/Documentation/x86/protection-keys.rst
+++ /dev/null
@@ -1,99 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-======================
-Memory Protection Keys
-======================
-
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake "Scalable Processor" Server CPUs.
-It will be avalable in future non-server parts.
-
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
-
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains.  It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
-
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key.  Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
-
-There are two new instructions (RDPKRU/WRPKRU) for reading and writing
-to the new register.  The feature is only available in 64-bit mode,
-even though there is theoretically space in the PAE PTEs.  These
-permissions are enforced on data access only and have no effect on
-instruction fetches.
-
-Syscalls
-========
-
-There are 3 system calls which directly interact with pkeys::
-
-	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
-	int pkey_free(int pkey);
-	int pkey_mprotect(unsigned long start, size_t len,
-			  unsigned long prot, int pkey);
-
-Before a pkey can be used, it must first be allocated with
-pkey_alloc().  An application calls the WRPKRU instruction
-directly in order to change access permissions to memory covered
-with a key.  In this example WRPKRU is wrapped by a C function
-called pkey_set().
-::
-
-	int real_prot = PROT_READ|PROT_WRITE;
-	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
-	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
-	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
-	... application runs here
-
-Now, if the application needs to update the data at 'ptr', it can
-gain access, do the update, then remove its write access::
-
-	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
-	*ptr = foo; // assign something
-	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
-
-Now when it frees the memory, it will also free the pkey since it
-is no longer in use::
-
-	munmap(ptr, PAGE_SIZE);
-	pkey_free(pkey);
-
-.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
-          An example implementation can be found in
-          tools/testing/selftests/x86/protection_keys.c.
-
-Behavior
-========
-
-The kernel attempts to make protection keys consistent with the
-behavior of a plain mprotect().  For instance if you do this::
-
-	mprotect(ptr, size, PROT_NONE);
-	something(ptr);
-
-you can expect the same effects with protection keys when doing this::
-
-	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
-	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
-	something(ptr);
-
-That should be true whether something() is a direct access to 'ptr'
-like::
-
-	*ptr = foo;
-
-or when the kernel does the access on the application's behalf like
-with a read()::
-
-	read(fd, ptr, 1);
-
-The kernel will send a SIGSEGV in both cases, but si_code will be set
-to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
-the plain mprotect() permissions are violated.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8c1c636308c8..3b795a0cab62 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -898,7 +898,7 @@ config PPC_MEM_KEYS
 	  page-based protections, but without requiring modification of the
 	  page tables when an application changes protection domains.
 
-	  For details, see Documentation/vm/protection-keys.rst
+	  For details, see Documentation/core-api/protection-keys.rst
 
 	  If unsure, say y.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2bbbd4d1ba31..d87d53fcd261 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1911,7 +1911,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 	  page-based protections, but without requiring modification of the
 	  page tables when an application changes protection domains.
 
-	  For details, see Documentation/x86/protection-keys.txt
+	  For details, see Documentation/core-api/protection-keys.rst
 
 	  If unsure, say y.
 
diff --git a/tools/testing/selftests/x86/protection_keys.c b/tools/testing/selftests/x86/protection_keys.c
index 5d546dcdbc80..480995bceefa 100644
--- a/tools/testing/selftests/x86/protection_keys.c
+++ b/tools/testing/selftests/x86/protection_keys.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Tests x86 Memory Protection Keys (see Documentation/x86/protection-keys.txt)
+ * Tests x86 Memory Protection Keys (see Documentation/core-api/protection-keys.rst)
  *
  * There are examples in here of:
  *  * how to set protection keys on memory
-- 
cgit v1.2.3-59-g8ed1b


From cb1aaebea8d79860181559d7b5d482aea63db113 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:32 -0300
Subject: docs: fix broken documentation links

Mostly due to the x86 and ACPI conversions, several documentation
links still point to the old files. Fix them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/acpi/dsd/leds.txt                          |  2 +-
 Documentation/admin-guide/kernel-parameters.rst          |  6 +++---
 Documentation/admin-guide/kernel-parameters.txt          | 16 ++++++++--------
 Documentation/admin-guide/ras.rst                        |  2 +-
 Documentation/devicetree/bindings/net/fsl-enetc.txt      |  7 +++----
 .../devicetree/bindings/pci/amlogic,meson-pcie.txt       |  2 +-
 .../bindings/regulator/qcom,rpmh-regulator.txt           |  2 +-
 Documentation/devicetree/booting-without-of.txt          |  2 +-
 Documentation/driver-api/gpio/board.rst                  |  2 +-
 Documentation/driver-api/gpio/consumer.rst               |  2 +-
 Documentation/firmware-guide/acpi/enumeration.rst        |  2 +-
 Documentation/firmware-guide/acpi/method-tracing.rst     |  2 +-
 Documentation/i2c/instantiating-devices                  |  2 +-
 Documentation/sysctl/kernel.txt                          |  4 ++--
 Documentation/translations/zh_CN/process/4.Coding.rst    |  2 +-
 Documentation/x86/x86_64/5level-paging.rst               |  2 +-
 Documentation/x86/x86_64/boot-options.rst                |  4 ++--
 Documentation/x86/x86_64/fake-numa-for-cpusets.rst       |  2 +-
 MAINTAINERS                                              |  4 ++--
 arch/arm/Kconfig                                         |  2 +-
 arch/arm64/kernel/kexec_image.c                          |  2 +-
 arch/x86/Kconfig                                         | 14 +++++++-------
 arch/x86/Kconfig.debug                                   |  2 +-
 arch/x86/boot/header.S                                   |  2 +-
 arch/x86/entry/entry_64.S                                |  2 +-
 arch/x86/include/asm/bootparam_utils.h                   |  2 +-
 arch/x86/include/asm/page_64_types.h                     |  2 +-
 arch/x86/include/asm/pgtable_64_types.h                  |  2 +-
 arch/x86/kernel/cpu/microcode/amd.c                      |  2 +-
 arch/x86/kernel/kexec-bzimage64.c                        |  2 +-
 arch/x86/kernel/pci-dma.c                                |  2 +-
 arch/x86/mm/tlb.c                                        |  2 +-
 arch/x86/platform/pvh/enlighten.c                        |  2 +-
 drivers/acpi/Kconfig                                     | 10 +++++-----
 drivers/net/ethernet/faraday/ftgmac100.c                 |  2 +-
 drivers/staging/fieldbus/Documentation/fieldbus_dev.txt  |  4 ++--
 drivers/vhost/vhost.c                                    |  2 +-
 include/acpi/acpi_drivers.h                              |  2 +-
 include/linux/fs_context.h                               |  2 +-
 include/linux/lsm_hooks.h                                |  2 +-
 mm/Kconfig                                               |  2 +-
 security/Kconfig                                         |  2 +-
 tools/include/linux/err.h                                |  2 +-
 tools/objtool/Documentation/stack-validation.txt         |  4 ++--
 44 files changed, 70 insertions(+), 71 deletions(-)

diff --git a/Documentation/acpi/dsd/leds.txt b/Documentation/acpi/dsd/leds.txt
index 81a63af42ed2..cc58b1a574c5 100644
--- a/Documentation/acpi/dsd/leds.txt
+++ b/Documentation/acpi/dsd/leds.txt
@@ -96,4 +96,4 @@ where
     <URL:http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
     referenced 2019-02-21.
 
-[7] Documentation/acpi/dsd/data-node-reference.txt
+[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst
diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 0124980dca2d..8d3273e32eb1 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -167,7 +167,7 @@ parameter is applicable::
 	X86-32	X86-32, aka i386 architecture is enabled.
 	X86-64	X86-64 architecture is enabled.
 			More X86-64 boot options can be found in
-			Documentation/x86/x86_64/boot-options.txt .
+			Documentation/x86/x86_64/boot-options.rst.
 	X86	Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
 	X86_UV	SGI UV support is enabled.
 	XEN	Xen support is enabled
@@ -181,10 +181,10 @@ In addition, the following text indicates that the option::
 Parameters denoted with BOOT are actually interpreted by the boot
 loader, and have no meaning to the kernel directly.
 Do not modify the syntax of boot loader parameters without extreme
-need or coordination with <Documentation/x86/boot.txt>.
+need or coordination with <Documentation/x86/boot.rst>.
 
 There are also arch-specific kernel-parameters not documented here.
-See for example <Documentation/x86/x86_64/boot-options.txt>.
+See for example <Documentation/x86/x86_64/boot-options.rst>.
 
 Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
 a trailing = on the name of any parameter states that that parameter will
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 79d043b8850d..1abd7e145357 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -53,7 +53,7 @@
 			ACPI_DEBUG_PRINT statements, e.g.,
 			    ACPI_DEBUG_PRINT((ACPI_DB_INFO, ...
 			The debug_level mask defaults to "info".  See
-			Documentation/acpi/debug.txt for more information about
+			Documentation/firmware-guide/acpi/debug.rst for more information about
 			debug layers and levels.
 
 			Enable processor driver info messages:
@@ -963,7 +963,7 @@
 			for details.
 
 	nompx		[X86] Disables Intel Memory Protection Extensions.
-			See Documentation/x86/intel_mpx.txt for more
+			See Documentation/x86/intel_mpx.rst for more
 			information about the feature.
 
 	nopku		[X86] Disable Memory Protection Keys CPU feature found
@@ -1189,7 +1189,7 @@
 			that is to be dynamically loaded by Linux. If there are
 			multiple variables with the same name but with different
 			vendor GUIDs, all of them will be loaded. See
-			Documentation/acpi/ssdt-overlays.txt for details.
+			Documentation/admin-guide/acpi/ssdt-overlays.rst for details.
 
 
 	eisa_irq_edge=	[PARISC,HW]
@@ -2383,7 +2383,7 @@
 
 	mce		[X86-32] Machine Check Exception
 
-	mce=option	[X86-64] See Documentation/x86/x86_64/boot-options.txt
+	mce=option	[X86-64] See Documentation/x86/x86_64/boot-options.rst
 
 	md=		[HW] RAID subsystems devices and level
 			See Documentation/admin-guide/md.rst.
@@ -2439,7 +2439,7 @@
 			set according to the
 			CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
 			option.
-			See Documentation/memory-hotplug.txt.
+			See Documentation/admin-guide/mm/memory-hotplug.rst.
 
 	memmap=exactmap	[KNL,X86] Enable setting of an exact
 			E820 memory map, as specified by the user.
@@ -2528,7 +2528,7 @@
 			mem_encrypt=on:		Activate SME
 			mem_encrypt=off:	Do not activate SME
 
-			Refer to Documentation/x86/amd-memory-encryption.txt
+			Refer to Documentation/virtual/kvm/amd-memory-encryption.rst
 			for details on when memory encryption can be activated.
 
 	mem_sleep_default=	[SUSPEND] Default system suspend mode:
@@ -3529,7 +3529,7 @@
 			See Documentation/blockdev/paride.txt.
 
 	pirq=		[SMP,APIC] Manual mp-table setup
-			See Documentation/x86/i386/IO-APIC.txt.
+			See Documentation/x86/i386/IO-APIC.rst.
 
 	plip=		[PPT,NET] Parallel port network link
 			Format: { parport<nr> | timid | 0 }
@@ -5055,7 +5055,7 @@
 			Can be used multiple times for multiple devices.
 
 	vga=		[BOOT,X86-32] Select a particular video mode
-			See Documentation/x86/boot.txt and
+			See Documentation/x86/boot.rst and
 			Documentation/svga.txt.
 			Use vga=ask for menu.
 			This is actually a boot loader parameter; the value is
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index c7495e42e6f4..2b20f5f7380d 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -199,7 +199,7 @@ Architecture (MCA)\ [#f3]_.
   mode).
 
 .. [#f3] For more details about the Machine Check Architecture (MCA),
-  please read Documentation/x86/x86_64/machinecheck at the Kernel tree.
+  please read Documentation/x86/x86_64/machinecheck.rst at the Kernel tree.
 
 EDAC - Error Detection And Correction
 *************************************
diff --git a/Documentation/devicetree/bindings/net/fsl-enetc.txt b/Documentation/devicetree/bindings/net/fsl-enetc.txt
index c812e25ae90f..25fc687419db 100644
--- a/Documentation/devicetree/bindings/net/fsl-enetc.txt
+++ b/Documentation/devicetree/bindings/net/fsl-enetc.txt
@@ -16,8 +16,8 @@ Required properties:
 In this case, the ENETC node should include a "mdio" sub-node
 that in turn should contain the "ethernet-phy" node describing the
 external phy.  Below properties are required, their bindings
-already defined in ethernet.txt or phy.txt, under
-Documentation/devicetree/bindings/net/*.
+already defined in Documentation/devicetree/bindings/net/ethernet.txt or
+Documentation/devicetree/bindings/net/phy.txt.
 
 Required:
 
@@ -51,8 +51,7 @@ Example:
 connection:
 
 In this case, the ENETC port node defines a fixed link connection,
-as specified by "fixed-link.txt", under
-Documentation/devicetree/bindings/net/*.
+as specified by Documentation/devicetree/bindings/net/fixed-link.txt.
 
 Required:
 
diff --git a/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt b/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt
index 12b18f82d441..efa2c8b9b85a 100644
--- a/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt
+++ b/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt
@@ -3,7 +3,7 @@ Amlogic Meson AXG DWC PCIE SoC controller
 Amlogic Meson PCIe host controller is based on the Synopsys DesignWare PCI core.
 It shares common functions with the PCIe DesignWare core driver and
 inherits common properties defined in
-Documentation/devicetree/bindings/pci/designware-pci.txt.
+Documentation/devicetree/bindings/pci/designware-pcie.txt.
 
 Additional properties are described here:
 
diff --git a/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.txt b/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.txt
index 7ef2dbe48e8a..14d2eee96b3d 100644
--- a/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.txt
@@ -97,7 +97,7 @@ Second Level Nodes - Regulators
 		    sent for this regulator including those which are for a
 		    strictly lower power state.
 
-Other properties defined in Documentation/devicetree/bindings/regulator.txt
+Other properties defined in Documentation/devicetree/bindings/regulator/regulator.txt
 may also be used.  regulator-initial-mode and regulator-allowed-modes may be
 specified for VRM regulators using mode values from
 include/dt-bindings/regulator/qcom,rpmh-regulator.h.  regulator-allow-bypass
diff --git a/Documentation/devicetree/booting-without-of.txt b/Documentation/devicetree/booting-without-of.txt
index e86bd2f64117..60f8640f2b2f 100644
--- a/Documentation/devicetree/booting-without-of.txt
+++ b/Documentation/devicetree/booting-without-of.txt
@@ -277,7 +277,7 @@ it with special cases.
   the decompressor (the real mode entry point goes to the same  32bit
   entry point once it switched into protected mode). That entry point
   supports one calling convention which is documented in
-  Documentation/x86/boot.txt
+  Documentation/x86/boot.rst
   The physical pointer to the device-tree block (defined in chapter II)
   is passed via setup_data which requires at least boot protocol 2.09.
   The type filed is defined as
diff --git a/Documentation/driver-api/gpio/board.rst b/Documentation/driver-api/gpio/board.rst
index b37f3f7b8926..ce91518bf9f4 100644
--- a/Documentation/driver-api/gpio/board.rst
+++ b/Documentation/driver-api/gpio/board.rst
@@ -101,7 +101,7 @@ with the help of _DSD (Device Specific Data), introduced in ACPI 5.1::
 	}
 
 For more information about the ACPI GPIO bindings see
-Documentation/acpi/gpio-properties.txt.
+Documentation/firmware-guide/acpi/gpio-properties.rst.
 
 Platform Data
 -------------
diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst
index 5e4d8aa68913..fdecb6d711db 100644
--- a/Documentation/driver-api/gpio/consumer.rst
+++ b/Documentation/driver-api/gpio/consumer.rst
@@ -437,7 +437,7 @@ case, it will be handled by the GPIO subsystem automatically.  However, if the
 _DSD is not present, the mappings between GpioIo()/GpioInt() resources and GPIO
 connection IDs need to be provided by device drivers.
 
-For details refer to Documentation/acpi/gpio-properties.txt
+For details refer to Documentation/firmware-guide/acpi/gpio-properties.rst
 
 
 Interacting With the Legacy GPIO Subsystem
diff --git a/Documentation/firmware-guide/acpi/enumeration.rst b/Documentation/firmware-guide/acpi/enumeration.rst
index 850be9696931..1252617b520f 100644
--- a/Documentation/firmware-guide/acpi/enumeration.rst
+++ b/Documentation/firmware-guide/acpi/enumeration.rst
@@ -339,7 +339,7 @@ a code like this::
 There are also devm_* versions of these functions which release the
 descriptors once the device is released.
 
-See Documentation/acpi/gpio-properties.txt for more information about the
+See Documentation/firmware-guide/acpi/gpio-properties.rst for more information about the
 _DSD binding related to GPIOs.
 
 MFD devices
diff --git a/Documentation/firmware-guide/acpi/method-tracing.rst b/Documentation/firmware-guide/acpi/method-tracing.rst
index d0b077b73f5f..0aa7e2c5d32a 100644
--- a/Documentation/firmware-guide/acpi/method-tracing.rst
+++ b/Documentation/firmware-guide/acpi/method-tracing.rst
@@ -68,7 +68,7 @@ c. Filter out the debug layer/level matched logs when the specified
 
 Where:
    0xXXXXXXXX/0xYYYYYYYY
-     Refer to Documentation/acpi/debug.txt for possible debug layer/level
+     Refer to Documentation/firmware-guide/acpi/debug.rst for possible debug layer/level
      masking values.
    \PPPP.AAAA.TTTT.HHHH
      Full path of a control method that can be found in the ACPI namespace.
diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices
index 0d85ac1935b7..5a3e2f331e8c 100644
--- a/Documentation/i2c/instantiating-devices
+++ b/Documentation/i2c/instantiating-devices
@@ -85,7 +85,7 @@ Method 1c: Declare the I2C devices via ACPI
 -------------------------------------------
 
 ACPI can also describe I2C devices. There is special documentation for this
-which is currently located at Documentation/acpi/enumeration.txt.
+which is currently located at Documentation/firmware-guide/acpi/enumeration.rst.
 
 
 Method 2: Instantiate the devices explicitly
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index f0c86fbb3b48..92f7f34b021a 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -155,7 +155,7 @@ is 0x15 and the full version number is 0x234, this file will contain
 the value 340 = 0x154.
 
 See the type_of_loader and ext_loader_type fields in
-Documentation/x86/boot.txt for additional information.
+Documentation/x86/boot.rst for additional information.
 
 ==============================================================
 
@@ -167,7 +167,7 @@ The complete bootloader version number.  In the example above, this
 file will contain the value 564 = 0x234.
 
 See the type_of_loader and ext_loader_ver fields in
-Documentation/x86/boot.txt for additional information.
+Documentation/x86/boot.rst for additional information.
 
 ==============================================================
 
diff --git a/Documentation/translations/zh_CN/process/4.Coding.rst b/Documentation/translations/zh_CN/process/4.Coding.rst
index 5301e9d55255..8bb777941394 100644
--- a/Documentation/translations/zh_CN/process/4.Coding.rst
+++ b/Documentation/translations/zh_CN/process/4.Coding.rst
@@ -241,7 +241,7 @@ scripts/coccinelle目录下已经打包了相当多的内核“语义补丁”
 
 任何添加新用户空间界面的代码（包括新的sysfs或/proc文件）都应该包含该界面的
 文档，该文档使用户空间开发人员能够知道他们在使用什么。请参阅
-Documentation/abi/readme，了解如何格式化此文档以及需要提供哪些信息。
+Documentation/ABI/README，了解如何格式化此文档以及需要提供哪些信息。
 
 文件 :ref:`Documentation/admin-guide/kernel-parameters.rst <kernelparameters>`
 描述了内核的所有引导时间参数。任何添加新参数的补丁都应该向该文件添加适当的
diff --git a/Documentation/x86/x86_64/5level-paging.rst b/Documentation/x86/x86_64/5level-paging.rst
index ab88a4514163..44856417e6a5 100644
--- a/Documentation/x86/x86_64/5level-paging.rst
+++ b/Documentation/x86/x86_64/5level-paging.rst
@@ -20,7 +20,7 @@ physical address space. This "ought to be enough for anybody" ©.
 QEMU 2.9 and later support 5-level paging.
 
 Virtual memory layout for 5-level paging is described in
-Documentation/x86/x86_64/mm.txt
+Documentation/x86/x86_64/mm.rst
 
 
 Enabling 5-level paging
diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
index 2f69836b8445..6a4285a3c7a4 100644
--- a/Documentation/x86/x86_64/boot-options.rst
+++ b/Documentation/x86/x86_64/boot-options.rst
@@ -9,7 +9,7 @@ only the AMD64 specific ones are listed here.
 
 Machine check
 =============
-Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables.
+Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables.
 
    mce=off
 		Disable machine check
@@ -89,7 +89,7 @@ APICs
      Don't use the local APIC (alias for i386 compatibility)
 
    pirq=...
-	See Documentation/x86/i386/IO-APIC.txt
+	See Documentation/x86/i386/IO-APIC.rst
 
    noapictimer
 	Don't set up the APIC timer
diff --git a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
index 74fbb78b3c67..04df57b9aa3f 100644
--- a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
+++ b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
@@ -18,7 +18,7 @@ For more information on the features of cpusets, see
 Documentation/cgroup-v1/cpusets.txt.
 There are a number of different configurations you can use for your needs.  For
 more information on the numa=fake command line option and its various ways of
-configuring fake nodes, see Documentation/x86/x86_64/boot-options.txt.
+configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst.
 
 For the purposes of this introduction, we'll assume a very primitive NUMA
 emulation setup of "numa=fake=4*512,".  This will split our system memory into
diff --git a/MAINTAINERS b/MAINTAINERS
index 5cfbea4ce575..26e0369c1641 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3874,7 +3874,7 @@ F:	Documentation/devicetree/bindings/hwmon/cirrus,lochnagar.txt
 F:	Documentation/devicetree/bindings/pinctrl/cirrus,lochnagar.txt
 F:	Documentation/devicetree/bindings/regulator/cirrus,lochnagar.txt
 F:	Documentation/devicetree/bindings/sound/cirrus,lochnagar.txt
-F:	Documentation/hwmon/lochnagar
+F:	Documentation/hwmon/lochnagar.rst
 
 CISCO FCOE HBA DRIVER
 M:	Satish Kharat <satishkh@cisco.com>
@@ -11272,7 +11272,7 @@ NXP FXAS21002C DRIVER
 M:	Rui Miguel Silva <rmfrfs@gmail.com>
 L:	linux-iio@vger.kernel.org
 S:	Maintained
-F:	Documentation/devicetree/bindings/iio/gyroscope/fxas21002c.txt
+F:	Documentation/devicetree/bindings/iio/gyroscope/nxp,fxas21002c.txt
 F:	drivers/iio/gyro/fxas21002c_core.c
 F:	drivers/iio/gyro/fxas21002c.h
 F:	drivers/iio/gyro/fxas21002c_i2c.c
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 8869742a85df..0f220264cc23 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1263,7 +1263,7 @@ config SMP
 	  uniprocessor machines. On a uniprocessor machine, the kernel
 	  will run faster if you say N here.
 
-	  See also <file:Documentation/x86/i386/IO-APIC.txt>,
+	  See also <file:Documentation/x86/i386/IO-APIC.rst>,
 	  <file:Documentation/lockup-watchdogs.txt> and the SMP-HOWTO available at
 	  <http://tldp.org/HOWTO/SMP-HOWTO.html>.
 
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 07bf740bea91..31cc2f423aa8 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -53,7 +53,7 @@ static void *image_load(struct kimage *image,
 
 	/*
 	 * We require a kernel with an unambiguous Image header. Per
-	 * Documentation/booting.txt, this is the case when image_size
+	 * Documentation/arm64/booting.txt, this is the case when image_size
 	 * is non-zero (practically speaking, since v3.17).
 	 */
 	h = (struct arm64_image_header *)kernel;
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d87d53fcd261..9f1f7b47621c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -395,7 +395,7 @@ config SMP
 	  Y to "Enhanced Real Time Clock Support", below. The "Advanced Power
 	  Management" code will be disabled if you say Y here.
 
-	  See also <file:Documentation/x86/i386/IO-APIC.txt>,
+	  See also <file:Documentation/x86/i386/IO-APIC.rst>,
 	  <file:Documentation/lockup-watchdogs.txt> and the SMP-HOWTO available at
 	  <http://www.tldp.org/docs.html#howto>.
 
@@ -1290,7 +1290,7 @@ config MICROCODE
 	  the Linux kernel.
 
 	  The preferred method to load microcode from a detached initrd is described
-	  in Documentation/x86/microcode.txt. For that you need to enable
+	  in Documentation/x86/microcode.rst. For that you need to enable
 	  CONFIG_BLK_DEV_INITRD in order for the loader to be able to scan the
 	  initrd for microcode blobs.
 
@@ -1329,7 +1329,7 @@ config MICROCODE_OLD_INTERFACE
 	  It is inadequate because it runs too late to be able to properly
 	  load microcode on a machine and it needs special tools. Instead, you
 	  should've switched to the early loading method with the initrd or
-	  builtin microcode by now: Documentation/x86/microcode.txt
+	  builtin microcode by now: Documentation/x86/microcode.rst
 
 config X86_MSR
 	tristate "/dev/cpu/*/msr - Model-specific register support"
@@ -1478,7 +1478,7 @@ config X86_5LEVEL
 	  A kernel with the option enabled can be booted on machines that
 	  support 4- or 5-level paging.
 
-	  See Documentation/x86/x86_64/5level-paging.txt for more
+	  See Documentation/x86/x86_64/5level-paging.rst for more
 	  information.
 
 	  Say N if unsure.
@@ -1626,7 +1626,7 @@ config ARCH_MEMORY_PROBE
 	depends on X86_64 && MEMORY_HOTPLUG
 	help
 	  This option enables a sysfs memory/probe interface for testing.
-	  See Documentation/memory-hotplug.txt for more information.
+	  See Documentation/admin-guide/mm/memory-hotplug.rst for more information.
 	  If you are unsure how to answer this question, answer N.
 
 config ARCH_PROC_KCORE_TEXT
@@ -1783,7 +1783,7 @@ config MTRR
 	  You can safely say Y even if your machine doesn't have MTRRs, you'll
 	  just add about 9 KB to your kernel.
 
-	  See <file:Documentation/x86/mtrr.txt> for more information.
+	  See <file:Documentation/x86/mtrr.rst> for more information.
 
 config MTRR_SANITIZER
 	def_bool y
@@ -1895,7 +1895,7 @@ config X86_INTEL_MPX
 	  process and adds some branches to paths used during
 	  exec() and munmap().
 
-	  For details, see Documentation/x86/intel_mpx.txt
+	  For details, see Documentation/x86/intel_mpx.rst
 
 	  If unsure, say N.
 
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index f730680dc818..59f598543203 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -156,7 +156,7 @@ config IOMMU_DEBUG
 	  code. When you use it make sure you have a big enough
 	  IOMMU/AGP aperture.  Most of the options enabled by this can
 	  be set more finegrained using the iommu= command line
-	  options. See Documentation/x86/x86_64/boot-options.txt for more
+	  options. See Documentation/x86/x86_64/boot-options.rst for more
 	  details.
 
 config IOMMU_LEAK
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 850b8762e889..90d791ca1a95 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -313,7 +313,7 @@ start_sys_seg:	.word	SYSSEG		# obsolete and meaningless, but just
 
 type_of_loader:	.byte	0		# 0 means ancient bootloader, newer
 					# bootloaders know to change this.
-					# See Documentation/x86/boot.txt for
+					# See Documentation/x86/boot.rst for
 					# assigned ids
 
 # flags, unused bits must be zero (RFU) bit within loadflags
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 11aa3b2afa4d..33f9fc38d014 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -8,7 +8,7 @@
  *
  * entry.S contains the system-call and fault low-level handling routines.
  *
- * Some of this is documented in Documentation/x86/entry_64.txt
+ * Some of this is documented in Documentation/x86/entry_64.rst
  *
  * A note on terminology:
  * - iret frame:	Architecture defined interrupt frame from SS to RIP
diff --git a/arch/x86/include/asm/bootparam_utils.h b/arch/x86/include/asm/bootparam_utils.h
index f6f6ef436599..101eb944f13c 100644
--- a/arch/x86/include/asm/bootparam_utils.h
+++ b/arch/x86/include/asm/bootparam_utils.h
@@ -24,7 +24,7 @@ static void sanitize_boot_params(struct boot_params *boot_params)
 	 * IMPORTANT NOTE TO BOOTLOADER AUTHORS: do not simply clear
 	 * this field.  The purpose of this field is to guarantee
 	 * compliance with the x86 boot spec located in
-	 * Documentation/x86/boot.txt .  That spec says that the
+	 * Documentation/x86/boot.rst .  That spec says that the
 	 * *whole* structure should be cleared, after which only the
 	 * portion defined by struct setup_header (boot_params->hdr)
 	 * should be copied in.
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 793c14c372cb..288b065955b7 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -48,7 +48,7 @@
 
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
 
-/* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
+/* See Documentation/x86/x86_64/mm.rst for a description of the memory map. */
 
 #define __PHYSICAL_MASK_SHIFT	52
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 88bca456da99..52e5f5f2240d 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -103,7 +103,7 @@ extern unsigned int ptrs_per_p4d;
 #define PGDIR_MASK	(~(PGDIR_SIZE - 1))
 
 /*
- * See Documentation/x86/x86_64/mm.txt for a description of the memory map.
+ * See Documentation/x86/x86_64/mm.rst for a description of the memory map.
  *
  * Be very careful vs. KASLR when changing anything here. The KASLR address
  * range must not overlap with anything except the KASAN shadow area, which
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index e1f3ba19ba54..06d4e67f31ab 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -61,7 +61,7 @@ static u8 amd_ucode_patch[PATCH_MAX_SIZE];
 
 /*
  * Microcode patch container file is prepended to the initrd in cpio
- * format. See Documentation/x86/microcode.txt
+ * format. See Documentation/x86/microcode.rst
  */
 static const char
 ucode_path[] __maybe_unused = "kernel/x86/microcode/AuthenticAMD.bin";
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 22f60dd26460..b07e7069b09e 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -416,7 +416,7 @@ static void *bzImage64_load(struct kimage *image, char *kernel,
 	efi_map_offset = params_cmdline_sz;
 	efi_setup_data_offset = efi_map_offset + ALIGN(efi_map_sz, 16);
 
-	/* Copy setup header onto bootparams. Documentation/x86/boot.txt */
+	/* Copy setup header onto bootparams. Documentation/x86/boot.rst */
 	setup_header_size = 0x0202 + kernel[0x0201] - setup_hdr_offset;
 
 	/* Is there a limit on setup header size? */
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index dcd272dbd0a9..f62b498b18fb 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -70,7 +70,7 @@ void __init pci_iommu_alloc(void)
 }
 
 /*
- * See <Documentation/x86/x86_64/boot-options.txt> for the iommu kernel
+ * See <Documentation/x86/x86_64/boot-options.rst> for the iommu kernel
  * parameter documentation.
  */
 static __init int iommu_setup(char *p)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 7f61431c75fb..400c1ba033aa 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -711,7 +711,7 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 }
 
 /*
- * See Documentation/x86/tlb.txt for details.  We choose 33
+ * See Documentation/x86/tlb.rst for details.  We choose 33
  * because it is large enough to cover the vast majority (at
  * least 95%) of allocations, and is small enough that we are
  * confident it will not cause too much overhead.  Each single
diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c
index 1861a2ba0f2b..c0a502f7e3a7 100644
--- a/arch/x86/platform/pvh/enlighten.c
+++ b/arch/x86/platform/pvh/enlighten.c
@@ -86,7 +86,7 @@ static void __init init_pvh_bootparams(bool xen_guest)
 	}
 
 	/*
-	 * See Documentation/x86/boot.txt.
+	 * See Documentation/x86/boot.rst.
 	 *
 	 * Version 2.12 supports Xen entry point but we will use default x86/PC
 	 * environment (i.e. hardware_subarch 0).
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 283ee94224c6..2438f37f2ca1 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -333,7 +333,7 @@ config ACPI_CUSTOM_DSDT_FILE
 	depends on !STANDALONE
 	help
 	  This option supports a custom DSDT by linking it into the kernel.
-	  See Documentation/acpi/dsdt-override.txt
+	  See Documentation/admin-guide/acpi/dsdt-override.rst
 
 	  Enter the full path name to the file which includes the AmlCode
 	  or dsdt_aml_code declaration.
@@ -355,7 +355,7 @@ config ACPI_TABLE_UPGRADE
 	  This option provides functionality to upgrade arbitrary ACPI tables
 	  via initrd. No functional change if no ACPI tables are passed via
 	  initrd, therefore it's safe to say Y.
-	  See Documentation/acpi/initrd_table_override.txt for details
+	  See Documentation/admin-guide/acpi/initrd_table_override.rst for details
 
 config ACPI_TABLE_OVERRIDE_VIA_BUILTIN_INITRD
 	bool "Override ACPI tables from built-in initrd"
@@ -365,7 +365,7 @@ config ACPI_TABLE_OVERRIDE_VIA_BUILTIN_INITRD
 	  This option provides functionality to override arbitrary ACPI tables
 	  from built-in uncompressed initrd.
 
-	  See Documentation/acpi/initrd_table_override.txt for details
+	  See Documentation/admin-guide/acpi/initrd_table_override.rst for details
 
 config ACPI_DEBUG
 	bool "Debug Statements"
@@ -374,7 +374,7 @@ config ACPI_DEBUG
 	  output and increases the kernel size by around 50K.
 
 	  Use the acpi.debug_layer and acpi.debug_level kernel command-line
-	  parameters documented in Documentation/acpi/debug.txt and
+	  parameters documented in Documentation/firmware-guide/acpi/debug.rst and
 	  Documentation/admin-guide/kernel-parameters.rst to control the type and
 	  amount of debug output.
 
@@ -445,7 +445,7 @@ config ACPI_CUSTOM_METHOD
 	help
 	  This debug facility allows ACPI AML methods to be inserted and/or
 	  replaced without rebooting the system. For details refer to:
-	  Documentation/acpi/method-customizing.txt.
+	  Documentation/firmware-guide/acpi/method-customizing.rst.
 
 	  NOTE: This option is security sensitive, because it allows arbitrary
 	  kernel memory to be written to by root (uid=0) users, allowing them
diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index b17b79e612a3..ac6280ad43a1 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1075,7 +1075,7 @@ static int ftgmac100_mii_probe(struct ftgmac100 *priv, phy_interface_t intf)
 	}
 
 	/* Indicate that we support PAUSE frames (see comment in
-	 * Documentation/networking/phy.txt)
+	 * Documentation/networking/phy.rst)
 	 */
 	phy_support_asym_pause(phydev);
 
diff --git a/drivers/staging/fieldbus/Documentation/fieldbus_dev.txt b/drivers/staging/fieldbus/Documentation/fieldbus_dev.txt
index 56af3f650fa3..89fb8e14676f 100644
--- a/drivers/staging/fieldbus/Documentation/fieldbus_dev.txt
+++ b/drivers/staging/fieldbus/Documentation/fieldbus_dev.txt
@@ -54,8 +54,8 @@ a limited few common behaviours and properties. This allows us to define
 a simple interface consisting of a character device and a set of sysfs files:
 
 See:
-Documentation/ABI/testing/sysfs-class-fieldbus-dev
-Documentation/ABI/testing/fieldbus-dev-cdev
+drivers/staging/fieldbus/Documentation/ABI/sysfs-class-fieldbus-dev
+drivers/staging/fieldbus/Documentation/ABI/fieldbus-dev-cdev
 
 Note that this simple interface does not provide a way to modify adapter
 configuration settings. It is therefore useful only for adapters that get their
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 1e3ed41ae1f3..69938dbae2d0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1694,7 +1694,7 @@ EXPORT_SYMBOL_GPL(vhost_dev_ioctl);
 
 /* TODO: This is really inefficient.  We need something like get_user()
  * (instruction directly accesses the data, with an exception table entry
- * returning -EFAULT). See Documentation/x86/exception-tables.txt.
+ * returning -EFAULT). See Documentation/x86/exception-tables.rst.
  */
 static int set_bit_to_user(int nr, void __user *addr)
 {
diff --git a/include/acpi/acpi_drivers.h b/include/acpi/acpi_drivers.h
index de1804aeaf69..98e3db7a89cd 100644
--- a/include/acpi/acpi_drivers.h
+++ b/include/acpi/acpi_drivers.h
@@ -25,7 +25,7 @@
 #define ACPI_MAX_STRING			80
 
 /*
- * Please update drivers/acpi/debug.c and Documentation/acpi/debug.txt
+ * Please update drivers/acpi/debug.c and Documentation/firmware-guide/acpi/debug.rst
  * if you add to this list.
  */
 #define ACPI_BUS_COMPONENT		0x00010000
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index 1f966670c8dc..623eb58560b9 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -85,7 +85,7 @@ struct fs_parameter {
  * Superblock creation fills in ->root whereas reconfiguration begins with this
  * already set.
  *
- * See Documentation/filesystems/mounting.txt
+ * See Documentation/filesystems/mount_api.txt
  */
 struct fs_context {
 	const struct fs_context_operations *ops;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 47f58cfb6a19..df1318d85f7d 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -77,7 +77,7 @@
  *	state.  This is called immediately after commit_creds().
  *
  * Security hooks for mount using fs_context.
- *	[See also Documentation/filesystems/mounting.txt]
+ *	[See also Documentation/filesystems/mount_api.txt]
  *
  * @fs_context_dup:
  *	Allocate and attach a security structure to sc->security.  This pointer
diff --git a/mm/Kconfig b/mm/Kconfig
index ee8d1f311858..6e5fb81bde4b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -165,7 +165,7 @@ config MEMORY_HOTPLUG_DEFAULT_ONLINE
 	  onlining policy (/sys/devices/system/memory/auto_online_blocks) which
 	  determines what happens to newly added memory regions. Policy setting
 	  can always be changed at runtime.
-	  See Documentation/memory-hotplug.txt for more information.
+	  See Documentation/admin-guide/mm/memory-hotplug.rst for more information.
 
 	  Say Y here if you want all hot-plugged memory blocks to appear in
 	  'online' state by default.
diff --git a/security/Kconfig b/security/Kconfig
index aeac3676dd4d..6d75ed71970c 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -62,7 +62,7 @@ config PAGE_TABLE_ISOLATION
 	  ensuring that the majority of kernel addresses are not mapped
 	  into userspace.
 
-	  See Documentation/x86/pti.txt for more details.
+	  See Documentation/x86/pti.rst for more details.
 
 config SECURITY_INFINIBAND
 	bool "Infiniband Security Hooks"
diff --git a/tools/include/linux/err.h b/tools/include/linux/err.h
index 2f5a12b88a86..25f2bb3a991d 100644
--- a/tools/include/linux/err.h
+++ b/tools/include/linux/err.h
@@ -20,7 +20,7 @@
  * Userspace note:
  * The same principle works for userspace, because 'error' pointers
  * fall down to the unused hole far from user space, as described
- * in Documentation/x86/x86_64/mm.txt for x86_64 arch:
+ * in Documentation/x86/x86_64/mm.rst for x86_64 arch:
  *
  * 0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm hole caused by [48:63] sign extension
  * ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 4dd11a554b9b..de094670050b 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -21,7 +21,7 @@ instructions).  Similarly, it knows how to follow switch statements, for
 which gcc sometimes uses jump tables.
 
 (Objtool also has an 'orc generate' subcommand which generates debuginfo
-for the ORC unwinder.  See Documentation/x86/orc-unwinder.txt in the
+for the ORC unwinder.  See Documentation/x86/orc-unwinder.rst in the
 kernel tree for more details.)
 
 
@@ -101,7 +101,7 @@ b) ORC (Oops Rewind Capability) unwind table generation
    band.  So it doesn't affect runtime performance and it can be
    reliable even when interrupts or exceptions are involved.
 
-   For more details, see Documentation/x86/orc-unwinder.txt.
+   For more details, see Documentation/x86/orc-unwinder.rst.
 
 c) Higher live patching compatibility rate
 
-- 
cgit v1.2.3-59-g8ed1b


From 9915ec28ec7fc79f0f30ebbba5d19bfa17eb7f03 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:34 -0300
Subject: docs: isdn: remove hisax references from kernel-parameters.txt

The hisax driver was removed in 85993b8c9786 ("isdn: remove hisax driver"),
but a leftover reference remained in kernel-parameters.txt.

Fixes: 85993b8c9786 ("isdn: remove hisax driver")

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1abd7e145357..9b16b640ce48 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1388,9 +1388,6 @@
 			Valid parameters: "on", "off"
 			Default: "on"
 
-	hisax=		[HW,ISDN]
-			See Documentation/isdn/README.HiSax.
-
 	hlt		[BUGS=ARM,SH]
 
 	hpet=		[X86-32,HPET] option to control HPET usage
-- 
cgit v1.2.3-59-g8ed1b


From 5c437fa29561f5809ef114ba3a5e80556cc43fb3 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:35 -0300
Subject: docs: fs: fix broken links to vfs.txt which was renamed to vfs.rst

A recent documentation conversion renamed this file but forgot
to update the links.
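
A minimal sketch of how such leftover references can be located, assuming
a grep that supports -r (GNU or BSD). The stale_refs helper name is
hypothetical and not part of the kernel tree:

```shell
# Hypothetical helper: list files that still mention a renamed doc path.
# Usage: stale_refs OLD_PATH [DIR]
stale_refs() {
    old=$1
    dir=${2:-.}
    # -r: recurse, -l: print only the names of matching files,
    # -F: treat the path as a fixed string rather than a regex
    grep -rlF -- "$old" "$dir"
}
```

Run from the top of the tree, for example as
"stale_refs Documentation/filesystems/vfs.txt", it prints every file that
still needs its reference updated.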

Fixes: af96c1e304f7 ("docs: filesystems: vfs: Convert vfs.txt to RST")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/porting | 10 +++++-----
 include/linux/dcache.h            |  4 ++--
 include/linux/fs.h                |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 3bd1148d8bb6..2813a19389fe 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -330,14 +330,14 @@ unreferenced dentries, and is now only called when the dentry refcount goes to
 [mandatory]
 
 	.d_compare() calling convention and locking rules are significantly
-changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
+changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
 look at examples of other filesystems) for guidance.
 
 ---
 [mandatory]
 
 	.d_hash() calling convention and locking rules are significantly
-changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
+changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
 look at examples of other filesystems) for guidance.
 
 ---
@@ -377,12 +377,12 @@ where possible.
 the filesystem provides it), which requires dropping out of rcu-walk mode. This
 may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be
 returned if the filesystem cannot handle rcu-walk. See
-Documentation/filesystems/vfs.txt for more details.
+Documentation/filesystems/vfs.rst for more details.
 
 	permission is an inode permission check that is called on many or all
 directory inodes on the way down a path walk (to check for exec permission). It
 must now be rcu-walk aware (mask & MAY_NOT_BLOCK).  See
-Documentation/filesystems/vfs.txt for more details.
+Documentation/filesystems/vfs.rst for more details.
  
 --
 [mandatory]
@@ -625,7 +625,7 @@ in your dentry operations instead.
 --
 [mandatory]
 	->clone_file_range() and ->dedupe_file_range have been replaced with
-	->remap_file_range().  See Documentation/filesystems/vfs.txt for more
+	->remap_file_range().  See Documentation/filesystems/vfs.rst for more
 	information.
 --
 [recommended]
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index f14e587c5d5d..5e0eadf7de55 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -153,7 +153,7 @@ struct dentry_operations {
  * Locking rules for dentry_operations callbacks are to be found in
  * Documentation/filesystems/Locking. Keep it updated!
  *
- * FUrther descriptions are found in Documentation/filesystems/vfs.txt.
+ * Further descriptions are found in Documentation/filesystems/vfs.rst.
  * Keep it updated too!
  */
 
@@ -568,7 +568,7 @@ static inline struct dentry *d_backing_dentry(struct dentry *upper)
  * If dentry is on a union/overlay, then return the underlying, real dentry.
  * Otherwise return the dentry itself.
  *
- * See also: Documentation/filesystems/vfs.txt
+ * See also: Documentation/filesystems/vfs.rst
  */
 static inline struct dentry *d_real(struct dentry *dentry,
 				    const struct inode *inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f7fdfe93e25d..c564cf3f48d9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1769,7 +1769,7 @@ struct block_device_operations;
 /*
  * These flags control the behavior of the remap_file_range function pointer.
  * If it is called with len == 0 that means "remap to end of source file".
- * See Documentation/filesystems/vfs.txt for more details about this call.
+ * See Documentation/filesystems/vfs.rst for more details about this call.
  *
  * REMAP_FILE_DEDUP: only remap if contents identical (i.e. deduplicate)
  * REMAP_FILE_CAN_SHORTEN: caller can handle a shortened request
-- 
cgit v1.2.3-59-g8ed1b


From b640fbad2d8fe120c761f61eb6c96f05047100cd Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 7 Jun 2019 15:54:36 -0300
Subject: docs: pci: fix broken links due to conversion from pci.txt to pci.rst

Some files were still pointing to the old location.

Fixes: 229b4e0728e0 ("Documentation: PCI: convert pci.txt to reST")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/memory-barriers.txt                    | 2 +-
 Documentation/translations/ko_KR/memory-barriers.txt | 2 +-
 drivers/scsi/hpsa.c                                  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f70ebcdfe592..f4170aae1d75 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -548,7 +548,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
 
 	[*] For information on bus mastering DMA and coherency please read:
 
-	    Documentation/PCI/pci.txt
+	    Documentation/PCI/pci.rst
 	    Documentation/DMA-API-HOWTO.txt
 	    Documentation/DMA-API.txt
 
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index db0b9d8619f1..07725b1df002 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -569,7 +569,7 @@ ACQUIRE 는 해당 오퍼레이션의 로드 부분에만 적용되고 RELEASE 
 
 	[*] 버스 마스터링 DMA 와 일관성에 대해서는 다음을 참고하시기 바랍니다:
 
-	    Documentation/PCI/pci.txt
+	    Documentation/PCI/pci.rst
 	    Documentation/DMA-API-HOWTO.txt
 	    Documentation/DMA-API.txt
 
diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 1bef1da273c2..53df6f7dd3f9 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -7760,7 +7760,7 @@ static void hpsa_free_pci_init(struct ctlr_info *h)
 	hpsa_disable_interrupt_mode(h);		/* pci_init 2 */
 	/*
 	 * call pci_disable_device before pci_release_regions per
-	 * Documentation/PCI/pci.txt
+	 * Documentation/PCI/pci.rst
 	 */
 	pci_disable_device(h->pdev);		/* pci_init 1 */
 	pci_release_regions(h->pdev);		/* pci_init 2 */
@@ -7843,7 +7843,7 @@ clean2:	/* intmode+region, pci */
 clean1:
 	/*
 	 * call pci_disable_device before pci_release_regions per
-	 * Documentation/PCI/pci.txt
+	 * Documentation/PCI/pci.rst
 	 */
 	pci_disable_device(h->pdev);
 	pci_release_regions(h->pdev);
-- 
cgit v1.2.3-59-g8ed1b


From ce1a5ea18ef9bf4c62c75abe7c540a29264ec988 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 14 Jun 2019 09:02:49 +0200
Subject: Documentation: Remove duplicate x86 index entry

x86 got added twice to the index via the RST conversion and the MDS
documentation changes. Remove one instance.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/index.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/index.rst b/Documentation/index.rst
index a7566ef62411..781042b4579d 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -112,7 +112,6 @@ implementation.
 .. toctree::
    :maxdepth: 2
 
-   x86/index
    sh/index
    x86/index
 
-- 
cgit v1.2.3-59-g8ed1b


From 305a99eb98af22996e9771078b7a19978732ed41 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:37 -0300
Subject: docs: aoe: convert docs to ReST and rename to *.rst

Only two files within the Documentation/aoe directory are
documentation. The remaining ones are examples and shell
scripts.

Convert the two AoE files to ReST format, and include the others
as literal blocks, as they're part of the documentation.

In the new index.rst, add an :orphan: marker for as long as the file is
not linked from the main index.rst, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/aoe/aoe.rst      | 150 +++++++++++++++++++++++++++++++++++++++++
 Documentation/aoe/aoe.txt      | 143 ---------------------------------------
 Documentation/aoe/examples.rst |  23 +++++++
 Documentation/aoe/index.rst    |  19 ++++++
 Documentation/aoe/todo.rst     |  17 +++++
 Documentation/aoe/todo.txt     |  14 ----
 Documentation/aoe/udev.txt     |   2 +-
 7 files changed, 210 insertions(+), 158 deletions(-)
 create mode 100644 Documentation/aoe/aoe.rst
 delete mode 100644 Documentation/aoe/aoe.txt
 create mode 100644 Documentation/aoe/examples.rst
 create mode 100644 Documentation/aoe/index.rst
 create mode 100644 Documentation/aoe/todo.rst
 delete mode 100644 Documentation/aoe/todo.txt

diff --git a/Documentation/aoe/aoe.rst b/Documentation/aoe/aoe.rst
new file mode 100644
index 000000000000..58747ecec71d
--- /dev/null
+++ b/Documentation/aoe/aoe.rst
@@ -0,0 +1,150 @@
+Introduction
+============
+
+ATA over Ethernet is a network protocol that provides simple access to
+block storage on the LAN.
+
+  http://support.coraid.com/documents/AoEr11.txt
+
+The EtherDrive (R) HOWTO for 2.6 and 3.x kernels is found at ...
+
+  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html
+
+It has many tips and hints!  Please see, especially, recommended
+tunings for virtual memory:
+
+  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.19
+
+The aoetools are userland programs that are designed to work with this
+driver.  The aoetools are on sourceforge.
+
+  http://aoetools.sourceforge.net/
+
+The scripts in this Documentation/aoe directory are intended to
+document the use of the driver and are not necessary if you install
+the aoetools.
+
+
+Creating Device Nodes
+=====================
+
+  Users of udev should find the block device nodes created
+  automatically, but to create all the necessary device nodes, use the
+  udev configuration rules provided in udev.txt (in this directory).
+
+  There is a udev-install.sh script that shows how to install these
+  rules on your system.
+
+  There is also an autoload script that shows how to edit
+  /etc/modprobe.d/aoe.conf to ensure that the aoe module is loaded when
+  necessary.  Preloading the aoe module is preferable to autoloading,
+  however, because AoE discovery takes a few seconds.  It can be
+  confusing when an AoE device is not present the first time a
+  command is run but appears a second later.
+
+Using Device Nodes
+==================
+
+  "cat /dev/etherd/err" blocks, waiting for error diagnostic output,
+  like any retransmitted packets.
+
+  "echo eth2 eth4 > /dev/etherd/interfaces" tells the aoe driver to
+  limit ATA over Ethernet traffic to eth2 and eth4.  AoE traffic from
+  untrusted networks should be ignored as a matter of security.  See
+  also the aoe_iflist driver option described below.
+
+  "echo > /dev/etherd/discover" tells the driver to find out what AoE
+  devices are available.
+
+  In the future these character devices may disappear and be replaced
+  by sysfs counterparts.  Using the commands in aoetools insulates
+  users from these implementation details.
+
+  The block devices are named like this::
+
+	e{shelf}.{slot}
+	e{shelf}.{slot}p{part}
+
+  ... so that "e0.2" is the third blade from the left (slot 2) in the
+  first shelf (shelf address zero).  That's the whole disk.  The first
+  partition on that disk would be "e0.2p1".
+
+Using sysfs
+===========
+
+  Each aoe block device in /sys/block has the extra attributes of
+  state, mac, and netif.  The state attribute is "up" when the device
+  is ready for I/O and "down" if detected but unusable.  The
+  "down,closewait" state shows that the device is still open and
+  cannot come up again until it has been closed.
+
+  The mac attribute is the Ethernet address of the remote AoE device.
+  The netif attribute is the network interface on the localhost
+  through which we are communicating with the remote AoE device.
+
+  There is a script in this directory that formats this information in
+  a convenient way.  Users with aoetools should use the aoe-stat
+  command::
+
+    root@makki root# sh Documentation/aoe/status.sh
+       e10.0            eth3              up
+       e10.1            eth3              up
+       e10.2            eth3              up
+       e10.3            eth3              up
+       e10.4            eth3              up
+       e10.5            eth3              up
+       e10.6            eth3              up
+       e10.7            eth3              up
+       e10.8            eth3              up
+       e10.9            eth3              up
+        e4.0            eth1              up
+        e4.1            eth1              up
+        e4.2            eth1              up
+        e4.3            eth1              up
+        e4.4            eth1              up
+        e4.5            eth1              up
+        e4.6            eth1              up
+        e4.7            eth1              up
+        e4.8            eth1              up
+        e4.9            eth1              up
+
+  Use /sys/module/aoe/parameters/aoe_iflist (or better, the driver
+  option discussed below) instead of /dev/etherd/interfaces to limit
+  AoE traffic to the network interfaces in the given
+  whitespace-separated list.  Unlike the old character device, the
+  sysfs entry can be read from as well as written to.
+
+  It's helpful to trigger discovery after setting the list of allowed
+  interfaces.  The aoetools package provides an aoe-discover script
+  for this purpose.  You can also directly use the
+  /dev/etherd/discover special file described above.
+
+Driver Options
+==============
+
+  There is a boot option for the built-in aoe driver and a
+  corresponding module parameter, aoe_iflist.  Without this option,
+  all network interfaces may be used for ATA over Ethernet.  Here is a
+  usage example for the module parameter::
+
+    modprobe aoe aoe_iflist="eth1 eth3"
+
+  The aoe_deadsecs module parameter determines the maximum number of
+  seconds that the driver will wait for an AoE device to provide a
+  response to an AoE command.  After aoe_deadsecs seconds have
+  elapsed, the AoE device will be marked as "down".  A value of zero
+  is supported for testing purposes and makes the aoe driver keep
+  trying AoE commands forever.
+
+  The aoe_maxout module parameter has a default of 128.  This is the
+  maximum number of packets that may be outstanding (sent but not yet
+  answered) to an AoE target at one time.
+
+  The aoe_dyndevs module parameter defaults to 1, meaning that the
+  driver will assign a block device minor number to a discovered AoE
+  target based on the order of its discovery.  With dynamic minor
+  device numbers in use, a greater range of AoE shelf and slot
+  addresses can be supported.  Users with udev will never have to
+  think about minor numbers.  Using aoe_dyndevs=0 allows device nodes
+  to be pre-created using a static minor-number scheme with the
+  aoe-mkshelf script in the aoetools.
diff --git a/Documentation/aoe/aoe.txt b/Documentation/aoe/aoe.txt
deleted file mode 100644
index c71487d399d1..000000000000
--- a/Documentation/aoe/aoe.txt
+++ /dev/null
@@ -1,143 +0,0 @@
-ATA over Ethernet is a network protocol that provides simple access to
-block storage on the LAN.
-
-  http://support.coraid.com/documents/AoEr11.txt
-
-The EtherDrive (R) HOWTO for 2.6 and 3.x kernels is found at ...
-
-  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html
-
-It has many tips and hints!  Please see, especially, recommended
-tunings for virtual memory:
-
-  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.19
-
-The aoetools are userland programs that are designed to work with this
-driver.  The aoetools are on sourceforge.
-
-  http://aoetools.sourceforge.net/
-
-The scripts in this Documentation/aoe directory are intended to
-document the use of the driver and are not necessary if you install
-the aoetools.
-
-
-CREATING DEVICE NODES
-
-  Users of udev should find the block device nodes created
-  automatically, but to create all the necessary device nodes, use the
-  udev configuration rules provided in udev.txt (in this directory).
-
-  There is a udev-install.sh script that shows how to install these
-  rules on your system.
-
-  There is also an autoload script that shows how to edit
-  /etc/modprobe.d/aoe.conf to ensure that the aoe module is loaded when
-  necessary.  Preloading the aoe module is preferable to autoloading,
-  however, because AoE discovery takes a few seconds.  It can be
-  confusing when an AoE device is not present the first time the a
-  command is run but appears a second later.
-
-USING DEVICE NODES
-
-  "cat /dev/etherd/err" blocks, waiting for error diagnostic output,
-  like any retransmitted packets.
-
-  "echo eth2 eth4 > /dev/etherd/interfaces" tells the aoe driver to
-  limit ATA over Ethernet traffic to eth2 and eth4.  AoE traffic from
-  untrusted networks should be ignored as a matter of security.  See
-  also the aoe_iflist driver option described below.
-
-  "echo > /dev/etherd/discover" tells the driver to find out what AoE
-  devices are available.
-
-  In the future these character devices may disappear and be replaced
-  by sysfs counterparts.  Using the commands in aoetools insulates
-  users from these implementation details.
-
-  The block devices are named like this:
-
-	e{shelf}.{slot}
-	e{shelf}.{slot}p{part}
-
-  ... so that "e0.2" is the third blade from the left (slot 2) in the
-  first shelf (shelf address zero).  That's the whole disk.  The first
-  partition on that disk would be "e0.2p1".
-
-USING SYSFS
-
-  Each aoe block device in /sys/block has the extra attributes of
-  state, mac, and netif.  The state attribute is "up" when the device
-  is ready for I/O and "down" if detected but unusable.  The
-  "down,closewait" state shows that the device is still open and
-  cannot come up again until it has been closed.
-
-  The mac attribute is the ethernet address of the remote AoE device.
-  The netif attribute is the network interface on the localhost
-  through which we are communicating with the remote AoE device.
-
-  There is a script in this directory that formats this information in
-  a convenient way.  Users with aoetools should use the aoe-stat
-  command.
-
-  root@makki root# sh Documentation/aoe/status.sh 
-     e10.0            eth3              up
-     e10.1            eth3              up
-     e10.2            eth3              up
-     e10.3            eth3              up
-     e10.4            eth3              up
-     e10.5            eth3              up
-     e10.6            eth3              up
-     e10.7            eth3              up
-     e10.8            eth3              up
-     e10.9            eth3              up
-      e4.0            eth1              up
-      e4.1            eth1              up
-      e4.2            eth1              up
-      e4.3            eth1              up
-      e4.4            eth1              up
-      e4.5            eth1              up
-      e4.6            eth1              up
-      e4.7            eth1              up
-      e4.8            eth1              up
-      e4.9            eth1              up
-
-  Use /sys/module/aoe/parameters/aoe_iflist (or better, the driver
-  option discussed below) instead of /dev/etherd/interfaces to limit
-  AoE traffic to the network interfaces in the given
-  whitespace-separated list.  Unlike the old character device, the
-  sysfs entry can be read from as well as written to.
-
-  It's helpful to trigger discovery after setting the list of allowed
-  interfaces.  The aoetools package provides an aoe-discover script
-  for this purpose.  You can also directly use the
-  /dev/etherd/discover special file described above.
-
-DRIVER OPTIONS
-
-  There is a boot option for the built-in aoe driver and a
-  corresponding module parameter, aoe_iflist.  Without this option,
-  all network interfaces may be used for ATA over Ethernet.  Here is a
-  usage example for the module parameter.
-
-    modprobe aoe_iflist="eth1 eth3"
-
-  The aoe_deadsecs module parameter determines the maximum number of
-  seconds that the driver will wait for an AoE device to provide a
-  response to an AoE command.  After aoe_deadsecs seconds have
-  elapsed, the AoE device will be marked as "down".  A value of zero
-  is supported for testing purposes and makes the aoe driver keep
-  trying AoE commands forever.
-
-  The aoe_maxout module parameter has a default of 128.  This is the
-  maximum number of unresponded packets that will be sent to an AoE
-  target at one time.
-
-  The aoe_dyndevs module parameter defaults to 1, meaning that the
-  driver will assign a block device minor number to a discovered AoE
-  target based on the order of its discovery.  With dynamic minor
-  device numbers in use, a greater range of AoE shelf and slot
-  addresses can be supported.  Users with udev will never have to
-  think about minor numbers.  Using aoe_dyndevs=0 allows device nodes
-  to be pre-created using a static minor-number scheme with the
-  aoe-mkshelf script in the aoetools.
diff --git a/Documentation/aoe/examples.rst b/Documentation/aoe/examples.rst
new file mode 100644
index 000000000000..91f3198e52c1
--- /dev/null
+++ b/Documentation/aoe/examples.rst
@@ -0,0 +1,23 @@
+Example of udev rules
+---------------------
+
+ .. include:: udev.txt
+    :literal:
+
+Example of udev install rules script
+------------------------------------
+
+ .. literalinclude:: udev-install.sh
+    :language: shell
+
+Example script to get status
+----------------------------
+
+ .. literalinclude:: status.sh
+    :language: shell
+
+Example of AoE autoload script
+------------------------------
+
+ .. literalinclude:: autoload.sh
+    :language: shell
diff --git a/Documentation/aoe/index.rst b/Documentation/aoe/index.rst
new file mode 100644
index 000000000000..4394b9b7913c
--- /dev/null
+++ b/Documentation/aoe/index.rst
@@ -0,0 +1,19 @@
+:orphan:
+
+=======================
+ATA over Ethernet (AoE)
+=======================
+
+.. toctree::
+    :maxdepth: 1
+
+    aoe
+    todo
+    examples
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/aoe/todo.rst b/Documentation/aoe/todo.rst
new file mode 100644
index 000000000000..dea8db5a33e1
--- /dev/null
+++ b/Documentation/aoe/todo.rst
@@ -0,0 +1,17 @@
+TODO
+====
+
+There is a potential for deadlock when allocating a struct sk_buff for
+data that needs to be written out to aoe storage.  If the data is
+being written from a dirty page in order to free that page, and if
+there are no other pages available, then deadlock may occur when a
+free page is needed for the sk_buff allocation.  This situation has
+not been observed, but it would be nice to eliminate any potential for
+deadlock under memory pressure.
+
+Because ATA over Ethernet is not fragmented by the kernel's IP code,
+the destructor member of the struct sk_buff is available to the aoe
+driver.  By using a mempool for allocating all but the first few
+sk_buffs, and by registering a destructor, we should be able to
+efficiently allocate sk_buffs without introducing any potential for
+deadlock.
diff --git a/Documentation/aoe/todo.txt b/Documentation/aoe/todo.txt
deleted file mode 100644
index c09dfad4aed8..000000000000
--- a/Documentation/aoe/todo.txt
+++ /dev/null
@@ -1,14 +0,0 @@
-There is a potential for deadlock when allocating a struct sk_buff for
-data that needs to be written out to aoe storage.  If the data is
-being written from a dirty page in order to free that page, and if
-there are no other pages available, then deadlock may occur when a
-free page is needed for the sk_buff allocation.  This situation has
-not been observed, but it would be nice to eliminate any potential for
-deadlock under memory pressure.
-
-Because ATA over Ethernet is not fragmented by the kernel's IP code,
-the destructor member of the struct sk_buff is available to the aoe
-driver.  By using a mempool for allocating all but the first few
-sk_buffs, and by registering a destructor, we should be able to
-efficiently allocate sk_buffs without introducing any potential for
-deadlock.
diff --git a/Documentation/aoe/udev.txt b/Documentation/aoe/udev.txt
index 1f06daf03f5b..54feda5a0772 100644
--- a/Documentation/aoe/udev.txt
+++ b/Documentation/aoe/udev.txt
@@ -11,7 +11,7 @@
 #   udev_rules="/etc/udev/rules.d/"
 #   bash# ls /etc/udev/rules.d/
 #   10-wacom.rules  50-udev.rules
-#   bash# cp /path/to/linux-2.6.xx/Documentation/aoe/udev.txt \
+#   bash# cp /path/to/linux/Documentation/aoe/udev.txt \
 #           /etc/udev/rules.d/60-aoe.rules
 #  
 
-- 
cgit v1.2.3-59-g8ed1b


From b693d0b372afb39432e1c49ad7b3454855bc6bed Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:38 -0300
Subject: docs: arm64: convert docs to ReST and rename to .rst

The documentation is in a format that is very close to ReST.

The conversion consists of:
  - adding blank lines in order to identify paragraphs;
  - fixing table markup;
  - adding some list markup;
  - marking literal blocks;
  - adjusting some title markup.

In the new index.rst, add an :orphan: marker for as long as the file is
not linked from the main index.rst, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/arm64/acpi_object_usage.rst          | 738 +++++++++++++++++++++
 Documentation/arm64/acpi_object_usage.txt          | 622 -----------------
 Documentation/arm64/arm-acpi.rst                   | 528 +++++++++++++++
 Documentation/arm64/arm-acpi.txt                   | 519 ---------------
 Documentation/arm64/booting.rst                    | 293 ++++++++
 Documentation/arm64/booting.txt                    | 266 --------
 Documentation/arm64/cpu-feature-registers.rst      | 304 +++++++++
 Documentation/arm64/cpu-feature-registers.txt      | 296 ---------
 Documentation/arm64/elf_hwcaps.rst                 | 201 ++++++
 Documentation/arm64/elf_hwcaps.txt                 | 231 -------
 Documentation/arm64/hugetlbpage.rst                |  41 ++
 Documentation/arm64/hugetlbpage.txt                |  38 --
 Documentation/arm64/index.rst                      |  28 +
 Documentation/arm64/legacy_instructions.rst        |  68 ++
 Documentation/arm64/legacy_instructions.txt        |  57 --
 Documentation/arm64/memory.rst                     |  98 +++
 Documentation/arm64/memory.txt                     |  97 ---
 Documentation/arm64/pointer-authentication.rst     | 109 +++
 Documentation/arm64/pointer-authentication.txt     | 107 ---
 Documentation/arm64/silicon-errata.rst             | 131 ++++
 Documentation/arm64/silicon-errata.txt             |  88 ---
 Documentation/arm64/sve.rst                        | 529 +++++++++++++++
 Documentation/arm64/sve.txt                        | 525 ---------------
 Documentation/arm64/tagged-pointers.rst            |  68 ++
 Documentation/arm64/tagged-pointers.txt            |  66 --
 Documentation/translations/zh_CN/arm64/booting.txt |   4 +-
 .../zh_CN/arm64/legacy_instructions.txt            |   4 +-
 Documentation/translations/zh_CN/arm64/memory.txt  |   4 +-
 .../translations/zh_CN/arm64/silicon-errata.txt    |   4 +-
 .../translations/zh_CN/arm64/tagged-pointers.txt   |   4 +-
 Documentation/virtual/kvm/api.txt                  |   2 +-
 arch/arm64/include/asm/efi.h                       |   2 +-
 arch/arm64/include/asm/image.h                     |   2 +-
 arch/arm64/include/uapi/asm/sigcontext.h           |   2 +-
 arch/arm64/kernel/kexec_image.c                    |   2 +-
 35 files changed, 3151 insertions(+), 2927 deletions(-)
 create mode 100644 Documentation/arm64/acpi_object_usage.rst
 delete mode 100644 Documentation/arm64/acpi_object_usage.txt
 create mode 100644 Documentation/arm64/arm-acpi.rst
 delete mode 100644 Documentation/arm64/arm-acpi.txt
 create mode 100644 Documentation/arm64/booting.rst
 delete mode 100644 Documentation/arm64/booting.txt
 create mode 100644 Documentation/arm64/cpu-feature-registers.rst
 delete mode 100644 Documentation/arm64/cpu-feature-registers.txt
 create mode 100644 Documentation/arm64/elf_hwcaps.rst
 delete mode 100644 Documentation/arm64/elf_hwcaps.txt
 create mode 100644 Documentation/arm64/hugetlbpage.rst
 delete mode 100644 Documentation/arm64/hugetlbpage.txt
 create mode 100644 Documentation/arm64/index.rst
 create mode 100644 Documentation/arm64/legacy_instructions.rst
 delete mode 100644 Documentation/arm64/legacy_instructions.txt
 create mode 100644 Documentation/arm64/memory.rst
 delete mode 100644 Documentation/arm64/memory.txt
 create mode 100644 Documentation/arm64/pointer-authentication.rst
 delete mode 100644 Documentation/arm64/pointer-authentication.txt
 create mode 100644 Documentation/arm64/silicon-errata.rst
 delete mode 100644 Documentation/arm64/silicon-errata.txt
 create mode 100644 Documentation/arm64/sve.rst
 delete mode 100644 Documentation/arm64/sve.txt
 create mode 100644 Documentation/arm64/tagged-pointers.rst
 delete mode 100644 Documentation/arm64/tagged-pointers.txt

diff --git a/Documentation/arm64/acpi_object_usage.rst b/Documentation/arm64/acpi_object_usage.rst
new file mode 100644
index 000000000000..d51b69dc624d
--- /dev/null
+++ b/Documentation/arm64/acpi_object_usage.rst
@@ -0,0 +1,738 @@
+===========
+ACPI Tables
+===========
+
+The expectations of individual ACPI tables are discussed in the list that
+follows.
+
+If a section number is used, it refers to a section number in the ACPI
+specification where the object is defined.  If "Signature Reserved" is used,
+the table signature (the first four bytes of the table) is the only portion
+of the table recognized by the specification, and the actual table is defined
+outside of the UEFI Forum (see Section 5.2.6 of the specification).
+
+For ACPI on arm64, tables also fall into the following categories:
+
+       -  Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT
+
+       -  Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT
+
+       -  Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IORT,
+          MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, STAO,
+	  TCPA, TPM2, UEFI, XENV
+
+       -  Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, LPIT,
+          MSDM, OEMx, PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
+
+====== ========================================================================
+Table  Usage for ARMv8 Linux
+====== ========================================================================
+BERT   Section 18.3 (signature == "BERT")
+
+       **Boot Error Record Table**
+
+       Must be supplied if RAS support is provided by the platform.  It
+       is recommended this table be supplied.
+
+BOOT   Signature Reserved (signature == "BOOT")
+
+       **simple BOOT flag table**
+
+       Microsoft only table, will not be supported.
+
+BGRT   Section 5.2.22 (signature == "BGRT")
+
+       **Boot Graphics Resource Table**
+
+       Optional, not currently supported, with no real use-case for an
+       ARM server.
+
+CPEP   Section 5.2.18 (signature == "CPEP")
+
+       **Corrected Platform Error Polling table**
+
+       Optional, not currently supported, and not recommended until such
+       time as ARM-compatible hardware is available, and the specification
+       suitably modified.
+
+CSRT   Signature Reserved (signature == "CSRT")
+
+       **Core System Resources Table**
+
+       Optional, not currently supported.
+
+DBG2   Signature Reserved (signature == "DBG2")
+
+       **DeBuG port table 2**
+
+       License has changed and should be usable.  Optional if used instead
+       of earlycon=<device> on the command line.
+
+DBGP   Signature Reserved (signature == "DBGP")
+
+       **DeBuG Port table**
+
+       Microsoft only table, will not be supported.
+
+DSDT   Section 5.2.11.1 (signature == "DSDT")
+
+       **Differentiated System Description Table**
+
+       A DSDT is required; see also SSDT.
+
+       ACPI tables contain only one DSDT but can contain one or more SSDTs,
+       which are optional.  Each SSDT can only add to the ACPI namespace,
+       but cannot modify or replace anything in the DSDT.
+
+DMAR   Signature Reserved (signature == "DMAR")
+
+       **DMA Remapping table**
+
+       x86 only table, will not be supported.
+
+DRTM   Signature Reserved (signature == "DRTM")
+
+       **Dynamic Root of Trust for Measurement table**
+
+       Optional, not currently supported.
+
+ECDT   Section 5.2.16 (signature == "ECDT")
+
+       **Embedded Controller Description Table**
+
+       Optional, not currently supported, but could be used on ARM if and
+       only if one uses the GPE_BIT field to represent an IRQ number, since
+       there are no GPE blocks defined in hardware reduced mode.  This would
+       need to be modified in the ACPI specification.
+
+EINJ   Section 18.6 (signature == "EINJ")
+
+       **Error Injection table**
+
+       This table is very useful for testing platform response to error
+       conditions; it allows one to inject an error into the system as
+       if it had actually occurred.  However, this table should not be
+       shipped with a production system; it should be dynamically loaded
+       and executed with the ACPICA tools only during testing.
+
+ERST   Section 18.5 (signature == "ERST")
+
+       **Error Record Serialization Table**
+
+       On a platform that supports RAS, this table must be supplied if it is
+       not UEFI-based; if it is UEFI-based, this table may be supplied.  When
+       this table is not present, UEFI runtime services are used to save and
+       retrieve hardware error information to and from a persistent store.
+
+ETDT   Signature Reserved (signature == "ETDT")
+
+       **Event Timer Description Table**
+
+       Obsolete table, will not be supported.
+
+FACS   Section 5.2.10 (signature == "FACS")
+
+       **Firmware ACPI Control Structure**
+
+       It is unlikely that this table will be terribly useful.  If it is
+       provided, the Global Lock will NOT be used since it is not part of
+       the hardware reduced profile, and only 64-bit address fields will
+       be considered valid.
+
+FADT   Section 5.2.9 (signature == "FACP")
+
+       **Fixed ACPI Description Table**
+
+       Required for arm64.
+
+       The HW_REDUCED_ACPI flag must be set.  All of the fields that are
+       to be ignored when HW_REDUCED_ACPI is set are expected to be set to
+       zero.
+
+       If an FACS table is provided, the X_FIRMWARE_CTRL field is to be
+       used, not FIRMWARE_CTRL.
+
+       If PSCI is used (as is recommended), make sure that ARM_BOOT_ARCH is
+       filled in properly - that the PSCI_COMPLIANT flag is set and that
+       PSCI_USE_HVC is set or unset as needed (see table 5-37).
+
+       For the DSDT that is also required, the X_DSDT field is to be used,
+       not the DSDT field.
+
+FPDT   Section 5.2.23 (signature == "FPDT")
+
+       **Firmware Performance Data Table**
+
+       Optional, not currently supported.
+
+GTDT   Section 5.2.24 (signature == "GTDT")
+
+       **Generic Timer Description Table**
+
+       Required for arm64.
+
+HEST   Section 18.3.2 (signature == "HEST")
+
+       **Hardware Error Source Table**
+
+       ARM-specific error sources have been defined; please use those or the
+       PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 (AER
+       Bridge), or use type 9 (Generic Hardware Error Source).  Firmware first
+       error handling is possible if and only if Trusted Firmware is being
+       used on arm64.
+
+       Must be supplied if RAS support is provided by the platform.  It
+       is recommended this table be supplied.
+
+HPET   Signature Reserved (signature == "HPET")
+
+       **High Precision Event Timer table**
+
+       x86 only table, will not be supported.
+
+IBFT   Signature Reserved (signature == "IBFT")
+
+       **iSCSI Boot Firmware Table**
+
+       Microsoft defined table, support TBD.
+
+IORT   Signature Reserved (signature == "IORT")
+
+       **Input Output Remapping Table**
+
+       arm64 only table, required in order to describe IO topology, SMMUs,
+       and GIC ITSs, and how those various components are connected together,
+       such as identifying which components are behind which SMMUs/ITSs.
+       This table will only be required on certain SBSA platforms (e.g.,
+       when using GICv3-ITS and an SMMU); on SBSA Level 0 platforms, it
+       remains optional.
+
+IVRS   Signature Reserved (signature == "IVRS")
+
+       **I/O Virtualization Reporting Structure**
+
+       x86_64 (AMD) only table, will not be supported.
+
+LPIT   Signature Reserved (signature == "LPIT")
+
+       **Low Power Idle Table**
+
+       x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor
+       descriptions and power states on ARM platforms should use the DSDT
+       and define processor container devices (_HID ACPI0010, Section 8.4,
+       and more specifically 8.4.3 and 8.4.4).
+
+MADT   Section 5.2.12 (signature == "APIC")
+
+       **Multiple APIC Description Table**
+
+       Required for arm64.  Only the GIC interrupt controller structures
+       should be used (types 0xA - 0xF).
+
+MCFG   Signature Reserved (signature == "MCFG")
+
+       **Memory-mapped ConFiGuration space**
+
+       If the platform supports PCI/PCIe, an MCFG table is required.
+
+MCHI   Signature Reserved (signature == "MCHI")
+
+       **Management Controller Host Interface table**
+
+       Optional, not currently supported.
+
+MPST   Section 5.2.21 (signature == "MPST")
+
+       **Memory Power State Table**
+
+       Optional, not currently supported.
+
+MSCT   Section 5.2.19 (signature == "MSCT")
+
+       **Maximum System Characteristic Table**
+
+       Optional, not currently supported.
+
+MSDM   Signature Reserved (signature == "MSDM")
+
+       **Microsoft Data Management table**
+
+       Microsoft only table, will not be supported.
+
+NFIT   Section 5.2.25 (signature == "NFIT")
+
+       **NVDIMM Firmware Interface Table**
+
+       Optional, not currently supported.
+
+OEMx   Signature of "OEMx" only
+
+       **OEM Specific Tables**
+
+       All tables starting with a signature of "OEM" are reserved for OEM
+       use.  Since these are not meant to be of general use but are limited
+       to very specific end users, they are not recommended for use and are
+       not supported by the kernel for arm64.
+
+PCCT   Section 14.1 (signature == "PCCT")
+
+       **Platform Communications Channel Table**
+
+       Recommended for use on arm64; use of PCC is recommended when using CPPC
+       to control performance and power for platform processors.
+
+PMTT   Section 5.2.21.12 (signature == "PMTT")
+
+       **Platform Memory Topology Table**
+
+       Optional, not currently supported.
+
+PSDT   Section 5.2.11.3 (signature == "PSDT")
+
+       **Persistent System Description Table**
+
+       Obsolete table, will not be supported.
+
+RASF   Section 5.2.20 (signature == "RASF")
+
+       **RAS Feature table**
+
+       Optional, not currently supported.
+
+RSDP   Section 5.2.5 (signature == "RSD PTR")
+
+       **Root System Description PoinTeR**
+
+       Required for arm64.
+
+RSDT   Section 5.2.7 (signature == "RSDT")
+
+       **Root System Description Table**
+
+       Since this table can only provide 32-bit addresses, it is deprecated
+       on arm64, and will not be used.  If provided, it will be ignored.
+
+SBST   Section 5.2.14 (signature == "SBST")
+
+       **Smart Battery Subsystem Table**
+
+       Optional, not currently supported.
+
+SLIC   Signature Reserved (signature == "SLIC")
+
+       **Software LIcensing table**
+
+       Microsoft only table, will not be supported.
+
+SLIT   Section 5.2.17 (signature == "SLIT")
+
+       **System Locality distance Information Table**
+
+       Optional in general, but required for NUMA systems.
+
+SPCR   Signature Reserved (signature == "SPCR")
+
+       **Serial Port Console Redirection table**
+
+       Required for arm64.
+
+SPMI   Signature Reserved (signature == "SPMI")
+
+       **Server Platform Management Interface table**
+
+       Optional, not currently supported.
+
+SRAT   Section 5.2.16 (signature == "SRAT")
+
+       **System Resource Affinity Table**
+
+       Optional, but if used, only the GICC Affinity structures are read.
+       To support arm64 NUMA, this table is required.
+
+SSDT   Section 5.2.11.2 (signature == "SSDT")
+
+       **Secondary System Description Table**
+
+       These tables are a continuation of the DSDT; these are recommended
+       for use with devices that can be added to a running system, but can
+       also serve the purpose of dividing up device descriptions into more
+       manageable pieces.
+
+       An SSDT can only ADD to the ACPI namespace.  It cannot modify or
+       replace existing device descriptions already in the namespace.
+
+       These tables are optional, however.  ACPI tables should contain only
+       one DSDT but can contain many SSDTs.
+
+STAO   Signature Reserved (signature == "STAO")
+
+       **_STA Override table**
+
+       Optional, but only necessary in virtualized environments in order to
+       hide devices from guest OSs.
+
+TCPA   Signature Reserved (signature == "TCPA")
+
+       **Trusted Computing Platform Alliance table**
+
+       Optional, not currently supported, and may need changes to fully
+       interoperate with arm64.
+
+TPM2   Signature Reserved (signature == "TPM2")
+
+       **Trusted Platform Module 2 table**
+
+       Optional, not currently supported, and may need changes to fully
+       interoperate with arm64.
+
+UEFI   Signature Reserved (signature == "UEFI")
+
+       **UEFI ACPI data table**
+
+       Optional, not currently supported.  No known use case for arm64,
+       at present.
+
+WAET   Signature Reserved (signature == "WAET")
+
+       **Windows ACPI Emulated devices Table**
+
+       Microsoft only table, will not be supported.
+
+WDAT   Signature Reserved (signature == "WDAT")
+
+       **Watch Dog Action Table**
+
+       Microsoft only table, will not be supported.
+
+WDRT   Signature Reserved (signature == "WDRT")
+
+       **Watch Dog Resource Table**
+
+       Microsoft only table, will not be supported.
+
+WPBT   Signature Reserved (signature == "WPBT")
+
+       **Windows Platform Binary Table**
+
+       Microsoft only table, will not be supported.
+
+XENV   Signature Reserved (signature == "XENV")
+
+       **Xen project table**
+
+       Optional, used only by Xen at present.
+
+XSDT   Section 5.2.8 (signature == "XSDT")
+
+       **eXtended System Description Table**
+
+       Required for arm64.
+====== ========================================================================
+
+ACPI Objects
+------------
+The expectations on individual ACPI objects that are likely to be used are
+shown in the list that follows; any object not explicitly mentioned below
+should be used as needed for a particular platform or particular subsystem,
+such as power management or PCI.
+
+===== ================ ========================================================
+Name   Section         Usage for ARMv8 Linux
+===== ================ ========================================================
+_CCA   6.2.17          This method must be defined for all bus masters
+                       on arm64 - there are no assumptions made about
+                       whether such devices are cache coherent or not.
+                       The _CCA value is inherited by all descendants of
+                       these devices so it does not need to be repeated.
+                       Without _CCA on arm64, the kernel does not know what
+                       to do about setting up DMA for the device.
+
+                       NB: this method provides default cache coherency
+                       attributes; the presence of an SMMU can be used to
+                       modify that, however.  For example, a master could
+                       default to non-coherent, but be made coherent with
+                       the appropriate SMMU configuration (see Table 17 of
+                       the IORT specification, ARM Document DEN 0049B).
+
+_CID   6.1.2           Use as needed, see also _HID.
+
+_CLS   6.1.3           Use as needed, see also _HID.
+
+_CPC   8.4.7.1         Use as needed, power management specific.  CPPC is
+                       recommended on arm64.
+
+_CRS   6.2.2           Required on arm64.
+
+_CSD   8.4.2.2         Use as needed, used only in conjunction with _CST.
+
+_CST   8.4.2.1         Low power idle states (8.4.4) are recommended instead
+                       of C-states.
+
+_DDN   6.1.4           This field can be used for a device name.  However,
+                       it is meant for DOS device names (e.g., COM1), so be
+                       careful of its use across OSes.
+
+_DSD   6.2.5           To be used with caution.  If this object is used, try
+                       to use it within the constraints already defined by the
+                       Device Properties UUID.  Only in rare circumstances
+                       should it be necessary to create a new _DSD UUID.
+
+                       In either case, submit the _DSD definition along with
+                       any driver patches for discussion, especially when
+                       device properties are used.  A driver will not be
+                       considered complete without a corresponding _DSD
+                       description.  Once approved by kernel maintainers,
+                       the UUID or device properties must then be registered
+                       with the UEFI Forum; this may cause some iteration as
+                       more than one OS will be registering entries.
+
+_DSM   9.1.1           Do not use this method.  It is not standardized, the
+                       return values are not well documented, and it is
+                       currently a frequent source of error.
+
+\_GL   5.7.1           This object is not to be used in hardware reduced
+                       mode, and therefore should not be used on arm64.
+
+_GLK   6.5.7           This object requires a global lock be defined; there
+                       is no global lock on arm64 since it runs in hardware
+                       reduced mode.  Hence, do not use this object on arm64.
+
+\_GPE  5.3.1           This namespace is for x86 use only.  Do not use it
+                       on arm64.
+
+_HID   6.1.5           This is the primary object to use in device probing,
+		       though _CID and _CLS may also be used.
+
+_INI   6.5.1           Not required, but can be useful in setting up devices
+                       when UEFI leaves them in a state that may not be what
+                       the driver expects before it starts probing.
+
+_LPI   8.4.4.3         Recommended for use with processor definitions (_HID
+		       ACPI0010) on arm64.  See also _RDI.
+
+_MLS   6.1.7           Highly recommended for use in internationalization.
+
+_OFF   7.2.2           It is recommended to define this method for any device
+                       that can be turned on or off.
+
+_ON    7.2.3           It is recommended to define this method for any device
+                       that can be turned on or off.
+
+\_OS   5.7.3           This method will return "Linux" by default (this is
+                       the value of the macro ACPI_OS_NAME on Linux).  The
+                       command line parameter acpi_os=<string> can be used
+                       to set it to some other value.
+
+_OSC   6.2.11          This method can be a global method in ACPI (i.e.,
+                       \_SB._OSC), or it may be associated with a specific
+                       device (e.g., \_SB.DEV0._OSC), or both.  When used
+                       as a global method, only capabilities published in
+                       the ACPI specification are allowed.  When used as
+                       a device-specific method, the process described for
+                       using _DSD MUST be used to create an _OSC definition;
+                       out-of-process use of _OSC is not allowed.  That is,
+                       submit the device-specific _OSC usage description as
+                       part of the kernel driver submission, get it approved
+                       by the kernel community, then register it with the
+                       UEFI Forum.
+
+\_OSI  5.7.2           Deprecated on ARM64.  As far as ACPI firmware is
+		       concerned, _OSI is not to be used to determine what
+		       sort of system is being used or what functionality
+		       is provided.  The _OSC method is to be used instead.
+
+_PDC   8.4.1           Deprecated, do not use on arm64.
+
+\_PIC  5.8.1           The method should not be used.  On arm64, the only
+                       interrupt model available is GIC.
+
+\_PR   5.3.1           This namespace is for x86 use only on legacy systems.
+                       Do not use it on arm64.
+
+_PRT   6.2.13          Required as part of the definition of all PCI root
+                       devices.
+
+_PRx   7.3.8-11        Use as needed; power management specific.  If _PR0 is
+                       defined, _PR3 must also be defined.
+
+_PSx   7.3.2-5         Use as needed; power management specific.  If _PS0 is
+                       defined, _PS3 must also be defined.  If clocks or
+                       regulators need adjusting to be consistent with power
+                       usage, change them in these methods.
+
+_RDI   8.4.4.4         Recommended for use with processor definitions (_HID
+		       ACPI0010) on arm64.  This should only be used in
+		       conjunction with _LPI.
+
+\_REV  5.7.4           Always returns the latest version of ACPI supported.
+
+\_SB   5.3.1           Required on arm64; all devices must be defined in this
+                       namespace.
+
+_SLI   6.2.15          Use is recommended when SLIT table is in use.
+
+_STA   6.3.7,          It is recommended to define this method for any device
+       7.2.4           that can be turned on or off.  See also the STAO table
+                       that provides overrides to hide devices in virtualized
+                       environments.
+
+_SRS   6.2.16          Use as needed; see also _PRS.
+
+_STR   6.1.10          Recommended for conveying device names to end users;
+                       this is preferred over using _DDN.
+
+_SUB   6.1.9           Use as needed; _HID or _CID are preferred.
+
+_SUN   6.1.11          Use as needed, but recommended.
+
+_SWS   7.4.3           Use as needed; power management specific; this may
+                       require specification changes for use on arm64.
+
+_UID   6.1.12          Recommended for distinguishing devices of the same
+                       class; define it if at all possible.
+===== ================ ========================================================
+
+
+
+
+ACPI Event Model
+----------------
+Do not use GPE block devices; these are not supported in the hardware reduced
+profile used by arm64.  Since there are no GPE blocks defined for use on ARM
+platforms, ACPI events must be signaled differently.
+
+There are two options: GPIO-signaled interrupts (Section 5.6.5), and
+interrupt-signaled events (Section 5.6.9).  Interrupt-signaled events are a
+new feature in the ACPI 6.1 specification.  Either - or both - can be used
+on a given platform, and which to use may be dependent on limitations in any
+given SoC.  If possible, interrupt-signaled events are recommended.
+
+
+ACPI Processor Control
+----------------------
+Section 8 of the ACPI specification changed significantly in version 6.0.
+Processors should now be defined as Device objects with _HID ACPI0007; do
+not use the deprecated Processor statement in ASL.  All multiprocessor systems
+should also define a hierarchy of processors, done with Processor Container
+Devices (see Section 8.4.3.1, _HID ACPI0010); do not use processor aggregator
+devices (Section 8.5) to describe processor topology.  Section 8.4 of the
+specification describes the semantics of these object definitions and how
+they interrelate.
+
+Most importantly, the processor hierarchy defined also defines the low power
+idle states that are available to the platform, along with the rules for
+determining which processors can be turned on or off and the circumstances
+that control that.  Without this information, the processors will run in
+whatever power state they were left in by UEFI.
+
+Note too, that the processor Device objects defined and the entries in the
+MADT for GICs are expected to be in synchronization.  The _UID of the Device
+object must correspond to processor IDs used in the MADT.
+
+It is recommended that CPPC (8.4.5) be used as the primary model for processor
+performance control on arm64.  C-states and P-states may become available at
+some point in the future, but most current design work appears to favor CPPC.
+
+Further, it is essential that the ARMv8 SoC provide a fully functional
+implementation of PSCI; this will be the only mechanism supported by ACPI
+to control CPU power state.  Booting of secondary CPUs using the ACPI
+parking protocol is possible, but discouraged, since only PSCI is supported
+for ARM servers.
+
+
+ACPI System Address Map Interfaces
+----------------------------------
+In Section 15 of the ACPI specification, several methods are mentioned as
+possible mechanisms for conveying memory resource information to the kernel.
+For arm64, we will only support UEFI for booting with ACPI, hence the UEFI
+GetMemoryMap() boot service is the only mechanism that will be used.
+
+
+ACPI Platform Error Interfaces (APEI)
+-------------------------------------
+The APEI tables supported are described above.
+
+APEI requires the equivalent of an SCI and an NMI on ARMv8.  The SCI is used
+to notify the OSPM of errors that have occurred but can be corrected, so that
+the system can continue to operate correctly, even if possibly degraded.  The
+NMI is used to indicate fatal errors that cannot be corrected and that require
+immediate attention.
+
+Since there is no direct equivalent of the x86 SCI or NMI, arm64 handles
+these slightly differently.  The SCI is handled as a high priority interrupt;
+given that these are corrected (or correctable) errors being reported, this
+is sufficient.  The NMI is emulated as the highest priority interrupt
+possible.  This implies some caution must be used since there could be
+interrupts at higher privilege levels or even interrupts at the same priority
+as the emulated NMI.  In Linux, this should not be the case but one should
+be aware it could happen.
+
+
+ACPI Objects Not Supported on ARM64
+-----------------------------------
+While this may change in the future, there are several classes of objects
+that can be defined, but are not currently of general interest to ARM servers.
+Some of these objects have x86 equivalents, and may actually make sense in ARM
+servers.  However, there is either no hardware available at present, or there
+may not even be a non-ARM implementation yet.  Hence, they are not currently
+supported.
+
+The following classes of objects are not supported:
+
+       -  Section 9.2: ambient light sensor devices
+
+       -  Section 9.3: battery devices
+
+       -  Section 9.4: lids (e.g., laptop lids)
+
+       -  Section 9.8.2: IDE controllers
+
+       -  Section 9.9: floppy controllers
+
+       -  Section 9.10: GPE block devices
+
+       -  Section 9.15: PC/AT RTC/CMOS devices
+
+       -  Section 9.16: user presence detection devices
+
+       -  Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT
+
+       -  Section 9.18: time and alarm devices (see 9.15)
+
+       -  Section 10: power source and power meter devices
+
+       -  Section 11: thermal management
+
+       -  Section 12: embedded controllers interface
+
+       -  Section 13: SMBus interfaces
+
+
+This also means that there is no support for the following objects:
+
+====   =========================== ====   ==========
+Name   Section                     Name   Section
+====   =========================== ====   ==========
+_ALC   9.3.4                       _FDM   9.10.3
+_ALI   9.3.2                       _FIX   6.2.7
+_ALP   9.3.6                       _GAI   10.4.5
+_ALR   9.3.5                       _GHL   10.4.7
+_ALT   9.3.3                       _GTM   9.9.2.1.1
+_BCT   10.2.2.10                   _LID   9.5.1
+_BDN   6.5.3                       _PAI   10.4.4
+_BIF   10.2.2.1                    _PCL   10.3.2
+_BIX   10.2.2.1                    _PIF   10.3.3
+_BLT   9.2.3                       _PMC   10.4.1
+_BMA   10.2.2.4                    _PMD   10.4.8
+_BMC   10.2.2.12                   _PMM   10.4.3
+_BMD   10.2.2.11                   _PRL   10.3.4
+_BMS   10.2.2.5                    _PSR   10.3.1
+_BST   10.2.2.6                    _PTP   10.4.2
+_BTH   10.2.2.7                    _SBS   10.1.3
+_BTM   10.2.2.9                    _SHL   10.4.6
+_BTP   10.2.2.8                    _STM   9.9.2.1.1
+_DCK   6.5.2                       _UPD   9.16.1
+_EC    12.12                       _UPP   9.16.2
+_FDE   9.10.1                      _WPC   10.5.2
+_FDI   9.10.2                      _WPP   10.5.3
+====   =========================== ====   ==========
diff --git a/Documentation/arm64/acpi_object_usage.txt b/Documentation/arm64/acpi_object_usage.txt
deleted file mode 100644
index c77010c5c1f0..000000000000
--- a/Documentation/arm64/acpi_object_usage.txt
+++ /dev/null
@@ -1,622 +0,0 @@
-ACPI Tables
------------
-The expectations of individual ACPI tables are discussed in the list that
-follows.
-
-If a section number is used, it refers to a section number in the ACPI
-specification where the object is defined.  If "Signature Reserved" is used,
-the table signature (the first four bytes of the table) is the only portion
-of the table recognized by the specification, and the actual table is defined
-outside of the UEFI Forum (see Section 5.2.6 of the specification).
-
-For ACPI on arm64, tables also fall into the following categories:
-
-       -- Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT
-
-       -- Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT
-
-       -- Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IORT,
-          MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, STAO,
-	  TCPA, TPM2, UEFI, XENV
-
-       -- Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, LPIT,
-          MSDM, OEMx, PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT
-
-Table  Usage for ARMv8 Linux
------  ----------------------------------------------------------------
-BERT   Section 18.3 (signature == "BERT")
-       == Boot Error Record Table ==
-       Must be supplied if RAS support is provided by the platform.  It
-       is recommended this table be supplied.
-
-BOOT   Signature Reserved (signature == "BOOT")
-       == simple BOOT flag table ==
-       Microsoft only table, will not be supported.
-
-BGRT   Section 5.2.22 (signature == "BGRT")
-       == Boot Graphics Resource Table ==
-       Optional, not currently supported, with no real use-case for an
-       ARM server.
-
-CPEP   Section 5.2.18 (signature == "CPEP")
-       == Corrected Platform Error Polling table ==
-       Optional, not currently supported, and not recommended until such
-       time as ARM-compatible hardware is available, and the specification
-       suitably modified.
-
-CSRT   Signature Reserved (signature == "CSRT")
-       == Core System Resources Table ==
-       Optional, not currently supported.
-
-DBG2   Signature Reserved (signature == "DBG2")
-       == DeBuG port table 2 ==
-       License has changed and should be usable.  Optional if used instead
-       of earlycon=<device> on the command line.
-
-DBGP   Signature Reserved (signature == "DBGP")
-       == DeBuG Port table ==
-       Microsoft only table, will not be supported.
-
-DSDT   Section 5.2.11.1 (signature == "DSDT")
-       == Differentiated System Description Table ==
-       A DSDT is required; see also SSDT.
-
-       ACPI tables contain only one DSDT but can contain one or more SSDTs,
-       which are optional.  Each SSDT can only add to the ACPI namespace,
-       but cannot modify or replace anything in the DSDT.
-
-DMAR   Signature Reserved (signature == "DMAR")
-       == DMA Remapping table ==
-       x86 only table, will not be supported.
-
-DRTM   Signature Reserved (signature == "DRTM")
-       == Dynamic Root of Trust for Measurement table ==
-       Optional, not currently supported.
-
-ECDT   Section 5.2.16 (signature == "ECDT")
-       == Embedded Controller Description Table ==
-       Optional, not currently supported, but could be used on ARM if and
-       only if one uses the GPE_BIT field to represent an IRQ number, since
-       there are no GPE blocks defined in hardware reduced mode.  This would
-       need to be modified in the ACPI specification.
-
-EINJ   Section 18.6 (signature == "EINJ")
-       == Error Injection table ==
-       This table is very useful for testing platform response to error
-       conditions; it allows one to inject an error into the system as
-       if it had actually occurred.  However, this table should not be
-       shipped with a production system; it should be dynamically loaded
-       and executed with the ACPICA tools only during testing.
-
-ERST   Section 18.5 (signature == "ERST")
-       == Error Record Serialization Table ==
-       On a platform supports RAS, this table must be supplied if it is not
-       UEFI-based; if it is UEFI-based, this table may be supplied. When this
-       table is not present, UEFI run time service will be utilized to save
-       and retrieve hardware error information to and from a persistent store.
-
-ETDT   Signature Reserved (signature == "ETDT")
-       == Event Timer Description Table ==
-       Obsolete table, will not be supported.
-
-FACS   Section 5.2.10 (signature == "FACS")
-       == Firmware ACPI Control Structure ==
-       It is unlikely that this table will be terribly useful.  If it is
-       provided, the Global Lock will NOT be used since it is not part of
-       the hardware reduced profile, and only 64-bit address fields will
-       be considered valid.
-
-FADT   Section 5.2.9 (signature == "FACP")
-       == Fixed ACPI Description Table ==
-       Required for arm64.
-
-       The HW_REDUCED_ACPI flag must be set.  All of the fields that are
-       to be ignored when HW_REDUCED_ACPI is set are expected to be set to
-       zero.
-
-       If an FACS table is provided, the X_FIRMWARE_CTRL field is to be
-       used, not FIRMWARE_CTRL.
-
-       If PSCI is used (as is recommended), make sure that ARM_BOOT_ARCH is
-       filled in properly -- that the PSCI_COMPLIANT flag is set and that
-       PSCI_USE_HVC is set or unset as needed (see table 5-37).
-
-       For the DSDT that is also required, the X_DSDT field is to be used,
-       not the DSDT field.
-
-FPDT   Section 5.2.23 (signature == "FPDT")
-       == Firmware Performance Data Table ==
-       Optional, not currently supported.
-
-GTDT   Section 5.2.24 (signature == "GTDT")
-       == Generic Timer Description Table ==
-       Required for arm64.
-
-HEST   Section 18.3.2 (signature == "HEST")
-       == Hardware Error Source Table ==
-       ARM-specific error sources have been defined; please use those or the
-       PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 (AER
-       Bridge), or use type 9 (Generic Hardware Error Source).  Firmware first
-       error handling is possible if and only if Trusted Firmware is being
-       used on arm64.
-
-       Must be supplied if RAS support is provided by the platform.  It
-       is recommended this table be supplied.
-
-HPET   Signature Reserved (signature == "HPET")
-       == High Precision Event timer Table ==
-       x86 only table, will not be supported.
-
-IBFT   Signature Reserved (signature == "IBFT")
-       == iSCSI Boot Firmware Table ==
-       Microsoft defined table, support TBD.
-
-IORT   Signature Reserved (signature == "IORT")
-       == Input Output Remapping Table ==
-       arm64 only table, required in order to describe IO topology, SMMUs,
-       and GIC ITSs, and how those various components are connected together,
-       such as identifying which components are behind which SMMUs/ITSs.
-       This table will only be required on certain SBSA platforms (e.g.,
-       when using GICv3-ITS and an SMMU); on SBSA Level 0 platforms, it 
-       remains optional.
-
-IVRS   Signature Reserved (signature == "IVRS")
-       == I/O Virtualization Reporting Structure ==
-       x86_64 (AMD) only table, will not be supported.
-
-LPIT   Signature Reserved (signature == "LPIT")
-       == Low Power Idle Table ==
-       x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor
-       descriptions and power states on ARM platforms should use the DSDT
-       and define processor container devices (_HID ACPI0010, Section 8.4,
-       and more specifically 8.4.3 and and 8.4.4).
-
-MADT   Section 5.2.12 (signature == "APIC")
-       == Multiple APIC Description Table ==
-       Required for arm64.  Only the GIC interrupt controller structures
-       should be used (types 0xA - 0xF).
-
-MCFG   Signature Reserved (signature == "MCFG")
-       == Memory-mapped ConFiGuration space ==
-       If the platform supports PCI/PCIe, an MCFG table is required.
-
-MCHI   Signature Reserved (signature == "MCHI")
-       == Management Controller Host Interface table ==
-       Optional, not currently supported.
-
-MPST   Section 5.2.21 (signature == "MPST")
-       == Memory Power State Table ==
-       Optional, not currently supported.
-
-MSCT   Section 5.2.19 (signature == "MSCT")
-       == Maximum System Characteristic Table ==
-       Optional, not currently supported.
-
-MSDM   Signature Reserved (signature == "MSDM")
-       == Microsoft Data Management table ==
-       Microsoft only table, will not be supported.
-
-NFIT   Section 5.2.25 (signature == "NFIT")
-       == NVDIMM Firmware Interface Table ==
-       Optional, not currently supported.
-
-OEMx   Signature of "OEMx" only
-       == OEM Specific Tables ==
-       All tables starting with a signature of "OEM" are reserved for OEM
-       use.  Since these are not meant to be of general use but are limited
-       to very specific end users, they are not recommended for use and are
-       not supported by the kernel for arm64.
-
-PCCT   Section 14.1 (signature == "PCCT)
-       == Platform Communications Channel Table ==
-       Recommend for use on arm64; use of PCC is recommended when using CPPC
-       to control performance and power for platform processors.
-
-PMTT   Section 5.2.21.12 (signature == "PMTT")
-       == Platform Memory Topology Table ==
-       Optional, not currently supported.
-
-PSDT   Section 5.2.11.3 (signature == "PSDT")
-       == Persistent System Description Table ==
-       Obsolete table, will not be supported.
-
-RASF   Section 5.2.20 (signature == "RASF")
-       == RAS Feature table ==
-       Optional, not currently supported.
-
-RSDP   Section 5.2.5 (signature == "RSD PTR")
-       == Root System Description PoinTeR ==
-       Required for arm64.
-
-RSDT   Section 5.2.7 (signature == "RSDT")
-       == Root System Description Table ==
-       Since this table can only provide 32-bit addresses, it is deprecated
-       on arm64, and will not be used.  If provided, it will be ignored.
-
-SBST   Section 5.2.14 (signature == "SBST")
-       == Smart Battery Subsystem Table ==
-       Optional, not currently supported.
-
-SLIC   Signature Reserved (signature == "SLIC")
-       == Software LIcensing table ==
-       Microsoft only table, will not be supported.
-
-SLIT   Section 5.2.17 (signature == "SLIT")
-       == System Locality distance Information Table ==
-       Optional in general, but required for NUMA systems.
-
-SPCR   Signature Reserved (signature == "SPCR")
-       == Serial Port Console Redirection table ==
-       Required for arm64.
-
-SPMI   Signature Reserved (signature == "SPMI")
-       == Server Platform Management Interface table ==
-       Optional, not currently supported.
-
-SRAT   Section 5.2.16 (signature == "SRAT")
-       == System Resource Affinity Table ==
-       Optional, but if used, only the GICC Affinity structures are read.
-       To support arm64 NUMA, this table is required.
-
-SSDT   Section 5.2.11.2 (signature == "SSDT")
-       == Secondary System Description Table ==
-       These tables are a continuation of the DSDT; these are recommended
-       for use with devices that can be added to a running system, but can
-       also serve the purpose of dividing up device descriptions into more
-       manageable pieces.
-
-       An SSDT can only ADD to the ACPI namespace.  It cannot modify or
-       replace existing device descriptions already in the namespace.
-
-       These tables are optional, however.  ACPI tables should contain only
-       one DSDT but can contain many SSDTs.
-
-STAO   Signature Reserved (signature == "STAO")
-       == _STA Override table ==
-       Optional, but only necessary in virtualized environments in order to
-       hide devices from guest OSs.
-
-TCPA   Signature Reserved (signature == "TCPA")
-       == Trusted Computing Platform Alliance table ==
-       Optional, not currently supported, and may need changes to fully
-       interoperate with arm64.
-
-TPM2   Signature Reserved (signature == "TPM2")
-       == Trusted Platform Module 2 table ==
-       Optional, not currently supported, and may need changes to fully
-       interoperate with arm64.
-
-UEFI   Signature Reserved (signature == "UEFI")
-       == UEFI ACPI data table ==
-       Optional, not currently supported.  No known use case for arm64,
-       at present.
-
-WAET   Signature Reserved (signature == "WAET")
-       == Windows ACPI Emulated devices Table ==
-       Microsoft only table, will not be supported.
-
-WDAT   Signature Reserved (signature == "WDAT")
-       == Watch Dog Action Table ==
-       Microsoft only table, will not be supported.
-
-WDRT   Signature Reserved (signature == "WDRT")
-       == Watch Dog Resource Table ==
-       Microsoft only table, will not be supported.
-
-WPBT   Signature Reserved (signature == "WPBT")
-       == Windows Platform Binary Table ==
-       Microsoft only table, will not be supported.
-
-XENV   Signature Reserved (signature == "XENV")
-       == Xen project table ==
-       Optional, used only by Xen at present.
-
-XSDT   Section 5.2.8 (signature == "XSDT")
-       == eXtended System Description Table ==
-       Required for arm64.
-
-
-ACPI Objects
-------------
-The expectations on individual ACPI objects that are likely to be used are
-shown in the list that follows; any object not explicitly mentioned below
-should be used as needed for a particular platform or particular subsystem,
-such as power management or PCI.
-
-Name   Section         Usage for ARMv8 Linux
-----   ------------    -------------------------------------------------
-_CCA   6.2.17          This method must be defined for all bus masters
-                       on arm64 -- there are no assumptions made about
-                       whether such devices are cache coherent or not.
-                       The _CCA value is inherited by all descendants of
-                       these devices so it does not need to be repeated.
-                       Without _CCA on arm64, the kernel does not know what
-                       to do about setting up DMA for the device.
-
-                       NB: this method provides default cache coherency
-                       attributes; the presence of an SMMU can be used to
-                       modify that, however.  For example, a master could
-                       default to non-coherent, but be made coherent with
-                       the appropriate SMMU configuration (see Table 17 of
-                       the IORT specification, ARM Document DEN 0049B).
-
-_CID   6.1.2           Use as needed, see also _HID.
-
-_CLS   6.1.3           Use as needed, see also _HID.
-
-_CPC   8.4.7.1         Use as needed, power management specific.  CPPC is
-                       recommended on arm64.
-
-_CRS   6.2.2           Required on arm64.
-
-_CSD   8.4.2.2         Use as needed, used only in conjunction with _CST.
-
-_CST   8.4.2.1         Low power idle states (8.4.4) are recommended instead
-                       of C-states.
-
-_DDN   6.1.4           This field can be used for a device name.  However,
-                       it is meant for DOS device names (e.g., COM1), so be
-                       careful of its use across OSes.
-
-_DSD   6.2.5           To be used with caution.  If this object is used, try
-                       to use it within the constraints already defined by the
-                       Device Properties UUID.  Only in rare circumstances
-                       should it be necessary to create a new _DSD UUID.
-
-                       In either case, submit the _DSD definition along with
-                       any driver patches for discussion, especially when
-                       device properties are used.  A driver will not be
-                       considered complete without a corresponding _DSD
-                       description.  Once approved by kernel maintainers,
-                       the UUID or device properties must then be registered
-                       with the UEFI Forum; this may cause some iteration as
-                       more than one OS will be registering entries.
-
-_DSM   9.1.1           Do not use this method.  It is not standardized, the
-                       return values are not well documented, and it is
-                       currently a frequent source of error.
-
-\_GL   5.7.1           This object is not to be used in hardware reduced
-                       mode, and therefore should not be used on arm64.
-
-_GLK   6.5.7           This object requires a global lock be defined; there
-                       is no global lock on arm64 since it runs in hardware
-                       reduced mode.  Hence, do not use this object on arm64.
-
-\_GPE  5.3.1           This namespace is for x86 use only.  Do not use it
-                       on arm64.
-
-_HID   6.1.5           This is the primary object to use in device probing,
-		       though _CID and _CLS may also be used.
-
-_INI   6.5.1           Not required, but can be useful in setting up devices
-                       when UEFI leaves them in a state that may not be what
-                       the driver expects before it starts probing.
-
-_LPI   8.4.4.3         Recommended for use with processor definitions (_HID
-		       ACPI0010) on arm64.  See also _RDI.
-
-_MLS   6.1.7           Highly recommended for use in internationalization.
-
-_OFF   7.2.2           It is recommended to define this method for any device
-                       that can be turned on or off.
-
-_ON    7.2.3           It is recommended to define this method for any device
-                       that can be turned on or off.
-
-\_OS   5.7.3           This method will return "Linux" by default (this is
-                       the value of the macro ACPI_OS_NAME on Linux).  The
-                       command line parameter acpi_os=<string> can be used
-                       to set it to some other value.
-
-_OSC   6.2.11          This method can be a global method in ACPI (i.e.,
-                       \_SB._OSC), or it may be associated with a specific
-                       device (e.g., \_SB.DEV0._OSC), or both.  When used
-                       as a global method, only capabilities published in
-                       the ACPI specification are allowed.  When used as
-                       a device-specific method, the process described for
-                       using _DSD MUST be used to create an _OSC definition;
-                       out-of-process use of _OSC is not allowed.  That is,
-                       submit the device-specific _OSC usage description as
-                       part of the kernel driver submission, get it approved
-                       by the kernel community, then register it with the
-                       UEFI Forum.
-
-\_OSI  5.7.2           Deprecated on ARM64.  As far as ACPI firmware is 
-		       concerned, _OSI is not to be used to determine what 
-		       sort of system is being used or what functionality
-		       is provided.  The _OSC method is to be used instead.
-
-_PDC   8.4.1           Deprecated, do not use on arm64.
-
-\_PIC  5.8.1           The method should not be used.  On arm64, the only
-                       interrupt model available is GIC.
-
-\_PR   5.3.1           This namespace is for x86 use only on legacy systems.
-                       Do not use it on arm64.
-
-_PRT   6.2.13          Required as part of the definition of all PCI root
-                       devices.
-
-_PRx   7.3.8-11        Use as needed; power management specific.  If _PR0 is
-                       defined, _PR3 must also be defined.
-
-_PSx   7.3.2-5         Use as needed; power management specific.  If _PS0 is
-                       defined, _PS3 must also be defined.  If clocks or
-                       regulators need adjusting to be consistent with power
-                       usage, change them in these methods.
-
-_RDI   8.4.4.4         Recommended for use with processor definitions (_HID
-		       ACPI0010) on arm64.  This should only be used in 
-		       conjunction with _LPI.
-
-\_REV  5.7.4           Always returns the latest version of ACPI supported.
-
-\_SB   5.3.1           Required on arm64; all devices must be defined in this
-                       namespace.
-
-_SLI   6.2.15          Use is recommended when SLIT table is in use.
-
-_STA   6.3.7,          It is recommended to define this method for any device
-       7.2.4           that can be turned on or off.  See also the STAO table
-                       that provides overrides to hide devices in virtualized
-                       environments.
-
-_SRS   6.2.16          Use as needed; see also _PRS.
-
-_STR   6.1.10          Recommended for conveying device names to end users;
-                       this is preferred over using _DDN.
-
-_SUB   6.1.9           Use as needed; _HID or _CID are preferred.
-
-_SUN   6.1.11          Use as needed, but recommended.
-
-_SWS   7.4.3           Use as needed; power management specific; this may
-                       require specification changes for use on arm64.
-
-_UID   6.1.12          Recommended for distinguishing devices of the same
-                       class; define it if at all possible.
-
-
-
-
-ACPI Event Model
-----------------
-Do not use GPE block devices; these are not supported in the hardware reduced
-profile used by arm64.  Since there are no GPE blocks defined for use on ARM
-platforms, ACPI events must be signaled differently.
-
-There are two options: GPIO-signaled interrupts (Section 5.6.5), and
-interrupt-signaled events (Section 5.6.9).  Interrupt-signaled events are a
-new feature in the ACPI 6.1 specification.  Either -- or both -- can be used
-on a given platform, and which to use may be dependent of limitations in any
-given SoC.  If possible, interrupt-signaled events are recommended.
-
-
-ACPI Processor Control
-----------------------
-Section 8 of the ACPI specification changed significantly in version 6.0.
-Processors should now be defined as Device objects with _HID ACPI0007; do
-not use the deprecated Processor statement in ASL.  All multiprocessor systems
-should also define a hierarchy of processors, done with Processor Container
-Devices (see Section 8.4.3.1, _HID ACPI0010); do not use processor aggregator
-devices (Section 8.5) to describe processor topology.  Section 8.4 of the
-specification describes the semantics of these object definitions and how
-they interrelate.
-
-Most importantly, the processor hierarchy defined also defines the low power
-idle states that are available to the platform, along with the rules for
-determining which processors can be turned on or off and the circumstances
-that control that.  Without this information, the processors will run in
-whatever power state they were left in by UEFI.
-
-Note too, that the processor Device objects defined and the entries in the
-MADT for GICs are expected to be in synchronization.  The _UID of the Device
-object must correspond to processor IDs used in the MADT.
-
-It is recommended that CPPC (8.4.5) be used as the primary model for processor
-performance control on arm64.  C-states and P-states may become available at
-some point in the future, but most current design work appears to favor CPPC.
-
-Further, it is essential that the ARMv8 SoC provide a fully functional
-implementation of PSCI; this will be the only mechanism supported by ACPI
-to control CPU power state.  Booting of secondary CPUs using the ACPI
-parking protocol is possible, but discouraged, since only PSCI is supported
-for ARM servers.
-
-
-ACPI System Address Map Interfaces
-----------------------------------
-In Section 15 of the ACPI specification, several methods are mentioned as
-possible mechanisms for conveying memory resource information to the kernel.
-For arm64, we will only support UEFI for booting with ACPI, hence the UEFI
-GetMemoryMap() boot service is the only mechanism that will be used.
-
-
-ACPI Platform Error Interfaces (APEI)
--------------------------------------
-The APEI tables supported are described above.
-
-APEI requires the equivalent of an SCI and an NMI on ARMv8.  The SCI is used
-to notify the OSPM of errors that have occurred but can be corrected and the
-system can continue correct operation, even if possibly degraded.  The NMI is
-used to indicate fatal errors that cannot be corrected, and require immediate
-attention.
-
-Since there is no direct equivalent of the x86 SCI or NMI, arm64 handles
-these slightly differently.  The SCI is handled as a high priority interrupt;
-given that these are corrected (or correctable) errors being reported, this
-is sufficient.  The NMI is emulated as the highest priority interrupt
-possible.  This implies some caution must be used since there could be
-interrupts at higher privilege levels or even interrupts at the same priority
-as the emulated NMI.  In Linux, this should not be the case but one should
-be aware it could happen.
-
-
-ACPI Objects Not Supported on ARM64
------------------------------------
-While this may change in the future, there are several classes of objects
-that can be defined, but are not currently of general interest to ARM servers.
-Some of these objects have x86 equivalents, and may actually make sense in ARM
-servers.  However, there is either no hardware available at present, or there
-may not even be a non-ARM implementation yet.  Hence, they are not currently
-supported.
-
-The following classes of objects are not supported:
-
-       -- Section 9.2: ambient light sensor devices
-
-       -- Section 9.3: battery devices
-
-       -- Section 9.4: lids (e.g., laptop lids)
-
-       -- Section 9.8.2: IDE controllers
-
-       -- Section 9.9: floppy controllers
-
-       -- Section 9.10: GPE block devices
-
-       -- Section 9.15: PC/AT RTC/CMOS devices
-
-       -- Section 9.16: user presence detection devices
-
-       -- Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT
-
-       -- Section 9.18: time and alarm devices (see 9.15)
-
-       -- Section 10: power source and power meter devices
-
-       -- Section 11: thermal management
-
-       -- Section 12: embedded controllers interface
-
-       -- Section 13: SMBus interfaces
-
-
-This also means that there is no support for the following objects:
-
-Name   Section                     Name   Section
-----   ------------                ----   ------------
-_ALC   9.3.4                       _FDM   9.10.3
-_ALI   9.3.2                       _FIX   6.2.7
-_ALP   9.3.6                       _GAI   10.4.5
-_ALR   9.3.5                       _GHL   10.4.7
-_ALT   9.3.3                       _GTM   9.9.2.1.1
-_BCT   10.2.2.10                   _LID   9.5.1
-_BDN   6.5.3                       _PAI   10.4.4
-_BIF   10.2.2.1                    _PCL   10.3.2
-_BIX   10.2.2.1                    _PIF   10.3.3
-_BLT   9.2.3                       _PMC   10.4.1
-_BMA   10.2.2.4                    _PMD   10.4.8
-_BMC   10.2.2.12                   _PMM   10.4.3
-_BMD   10.2.2.11                   _PRL   10.3.4
-_BMS   10.2.2.5                    _PSR   10.3.1
-_BST   10.2.2.6                    _PTP   10.4.2
-_BTH   10.2.2.7                    _SBS   10.1.3
-_BTM   10.2.2.9                    _SHL   10.4.6
-_BTP   10.2.2.8                    _STM   9.9.2.1.1
-_DCK   6.5.2                       _UPD   9.16.1
-_EC    12.12                       _UPP   9.16.2
-_FDE   9.10.1                      _WPC   10.5.2
-_FDI   9.10.2                      _WPP   10.5.3
-
diff --git a/Documentation/arm64/arm-acpi.rst b/Documentation/arm64/arm-acpi.rst
new file mode 100644
index 000000000000..872dbbc73d4a
--- /dev/null
+++ b/Documentation/arm64/arm-acpi.rst
@@ -0,0 +1,528 @@
+=====================
+ACPI on ARMv8 Servers
+=====================
+
+ACPI can be used for ARMv8 general purpose servers designed to follow
+the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server
+Base Boot Requirements) [1] specifications.  Please note that the SBBR
+can be retrieved simply by visiting [1], but the SBSA is currently only
+available to those with an ARM login due to ARM IP licensing concerns.
+
+The ARMv8 kernel implements the reduced hardware model of ACPI version
+5.1 or later.  Links to the specification and all external documents
+it refers to are managed by the UEFI Forum.  The specification is
+available at http://www.uefi.org/specifications and documents referenced
+by the specification can be found via http://www.uefi.org/acpi.
+
+If an ARMv8 system does not meet the requirements of the SBSA and SBBR,
+or cannot be described using the mechanisms defined in the required ACPI
+specifications, then ACPI may not be a good fit for the hardware.
+
+While the documents mentioned above set out the requirements for building
+industry-standard ARMv8 servers, they also apply to more than one operating
+system.  The purpose of this document is to describe the interaction between
+ACPI and Linux only, on an ARMv8 system -- that is, what Linux expects of
+ACPI and what ACPI can expect of Linux.
+
+
+Why ACPI on ARM?
+----------------
+Before examining the details of the interface between ACPI and Linux, it is
+useful to understand why ACPI is being used.  Several technologies already
+exist in Linux for describing non-enumerable hardware, after all.  In this
+section we summarize a blog post [2] from Grant Likely that outlines the
+reasoning behind ACPI on ARMv8 servers.  Actually, we snitch a good portion
+of the summary text almost directly, to be honest.
+
+The short form of the rationale for ACPI on ARM is:
+
+-  ACPI’s byte code (AML) allows the platform to encode hardware behavior,
+   while DT explicitly does not support this.  For hardware vendors, being
+   able to encode behavior is a key tool used in supporting operating
+   system releases on new hardware.
+
+-  ACPI’s OSPM defines a power management model that constrains what the
+   platform is allowed to do into a specific model, while still providing
+   flexibility in hardware design.
+
+-  In the enterprise server environment, ACPI has established bindings (such
+   as for RAS) which are currently used in production systems.  DT does not.
+   Such bindings could be defined in DT at some point, but doing so means ARM
+   and x86 would end up using completely different code paths in both firmware
+   and the kernel.
+
+-  Choosing a single interface to describe the abstraction between a platform
+   and an OS is important.  Hardware vendors would not be required to implement
+   both DT and ACPI if they want to support multiple operating systems.  And,
+   agreeing on a single interface instead of being fragmented into per OS
+   interfaces makes for better interoperability overall.
+
+-  The new ACPI governance process works well and Linux is now at the same
+   table as hardware vendors and other OS vendors.  In fact, there is no
+   longer any reason to feel that ACPI only belongs to Windows or that
+   Linux is in any way secondary to Microsoft in this arena.  The move of
+   ACPI governance into the UEFI Forum has significantly opened up the
+   specification development process, and currently, a large portion of the
+   changes being made to ACPI are being driven by Linux.
+
+Key to the use of ACPI is the support model.  For servers in general, the
+responsibility for hardware behaviour cannot solely be the domain of the
+kernel, but rather must be split between the platform and the kernel, in
+order to allow for orderly change over time.  ACPI frees the OS from needing
+to understand all the minute details of the hardware so that the OS doesn’t
+need to be ported to each and every device individually.  It allows the
+hardware vendors to take responsibility for power management behaviour without
+depending on an OS release cycle which is not under their control.
+
+ACPI is also important because hardware and OS vendors have already worked
+out the mechanisms for supporting a general purpose computing ecosystem.  The
+infrastructure is in place, the bindings are in place, and the processes are
+in place.  DT does exactly what Linux needs it to when working with vertically
+integrated devices, but there are no good processes for supporting what the
+server vendors need.  Linux could potentially get there with DT, but doing so
+really just duplicates something that already works.  ACPI already does what
+the hardware vendors need, Microsoft won’t collaborate on DT, and hardware
+vendors would still end up providing two completely separate firmware
+interfaces -- one for Linux and one for Windows.
+
+
+Kernel Compatibility
+--------------------
+One of the primary motivations for ACPI is standardization, and using that
+to provide backward compatibility for Linux kernels.  In the server market,
+software and hardware are often used for long periods.  ACPI allows the
+kernel and firmware to agree on a consistent abstraction that can be
+maintained over time, even as hardware or software change.  As long as the
+abstraction is supported, systems can be updated without necessarily having
+to replace the kernel.
+
+When a Linux driver or subsystem is first implemented using ACPI, it by
+definition ends up requiring a specific version of the ACPI specification
+-- its baseline.  ACPI firmware must continue to work, even though it may
+not be optimal, with the earliest kernel version that first provides support
+for that baseline version of ACPI.  There may be a need for additional drivers,
+but adding new functionality (e.g., CPU power management) should not break
+older kernel versions.  Further, ACPI firmware must also work with the most
+recent version of the kernel.
+
+
+Relationship with Device Tree
+-----------------------------
+ACPI support in drivers and subsystems for ARMv8 should never be mutually
+exclusive with DT support at compile time.
+
+At boot time the kernel will only use one description method depending on
+parameters passed from the boot loader (including kernel bootargs).
+
+Regardless of whether DT or ACPI is used, the kernel must always be capable
+of booting with either scheme (in kernels with both schemes enabled at compile
+time).
+
+
+Booting using ACPI tables
+-------------------------
+The only defined method for passing ACPI tables to the kernel on ARMv8
+is via the UEFI system configuration table.  Just so it is explicit, this
+means that ACPI is only supported on platforms that boot via UEFI.
+
+When an ARMv8 system boots, it can either have DT information, ACPI tables,
+or in some very unusual cases, both.  If no command line parameters are used,
+the kernel will try to use DT for device enumeration; if there is no DT
+present, the kernel will try to use ACPI tables, but only if they are present.
+If neither is available, the kernel will not boot.  If acpi=force is used
+on the command line, the kernel will attempt to use ACPI tables first, but
+fall back to DT if there are no ACPI tables present.  The basic idea is that
+the kernel will not fail to boot unless it absolutely has no other choice.
+
+Processing of ACPI tables may be disabled by passing acpi=off on the kernel
+command line; this is the default behavior.
+
+In order for the kernel to load and use ACPI tables, the UEFI implementation
+MUST set the ACPI_20_TABLE_GUID to point to the RSDP table (the table with
+the ACPI signature "RSD PTR ").  If this pointer is incorrect and acpi=force
+is used, the kernel will disable ACPI and try to use DT to boot instead; the
+kernel has, in effect, determined that ACPI tables are not present at that
+point.
+
+If the pointer to the RSDP table is correct, the table will be mapped into
+the kernel by the ACPI core, using the address provided by UEFI.
+
+The ACPI core will then locate and map in all other ACPI tables by
+using the addresses in the RSDP table to find the XSDT (eXtended System
+Description Table).  The XSDT in turn provides the addresses to all other
+ACPI tables provided by the system firmware; the ACPI core will then traverse
+this table and map in the tables listed.
+
+The ACPI core will ignore any provided RSDT (Root System Description Table).
+RSDTs have been deprecated and are ignored on arm64 since they only allow
+for 32-bit addresses.
+
+Further, the ACPI core will only use the 64-bit address fields in the FADT
+(Fixed ACPI Description Table).  Any 32-bit address fields in the FADT will
+be ignored on arm64.
+
+Hardware reduced mode (see Section 4.1 of the ACPI 6.1 specification) will
+be enforced by the ACPI core on arm64.  Doing so allows the ACPI core to
+run less complex code since it no longer has to provide support for legacy
+hardware from other architectures.  Any fields that are not to be used for
+hardware reduced mode must be set to zero.
+
+For the ACPI core to operate properly, and in turn provide the information
+the kernel needs to configure devices, it expects to find the following
+tables (all section numbers refer to the ACPI 6.1 specification):
+
+    -  RSDP (Root System Description Pointer), section 5.2.5
+
+    -  XSDT (eXtended System Description Table), section 5.2.8
+
+    -  FADT (Fixed ACPI Description Table), section 5.2.9
+
+    -  DSDT (Differentiated System Description Table), section
+       5.2.11.1
+
+    -  MADT (Multiple APIC Description Table), section 5.2.12
+
+    -  GTDT (Generic Timer Description Table), section 5.2.24
+
+    -  If PCI is supported, the MCFG (Memory mapped ConFiGuration
+       Table), section 5.2.6, specifically Table 5-31.
+
+    -  If booting without a console=<device> kernel parameter is
+       supported, the SPCR (Serial Port Console Redirection table),
+       section 5.2.6, specifically Table 5-31.
+
+    -  If necessary to describe the I/O topology, SMMUs and GIC ITSs,
+       the IORT (Input Output Remapping Table), section 5.2.6, specifically
+       Table 5-31.
+
+    -  If NUMA is supported, the SRAT (System Resource Affinity Table)
+       and SLIT (System Locality distance Information Table), sections
+       5.2.16 and 5.2.17, respectively.
+
+If the above tables are not all present, the kernel may or may not be
+able to boot properly since it may not be able to configure all of the
+devices available.  This list of tables is not meant to be all inclusive;
+in some environments other tables may be needed (e.g., any of the APEI
+tables from section 18) to support specific functionality.
+
+
+ACPI Detection
+--------------
+Drivers should determine their probe() type by checking for a null
+value for ACPI_HANDLE, or checking .of_node, or other information in
+the device structure.  This is detailed further in the "Driver
+Recommendations" section.
+
+In non-driver code, if the presence of ACPI needs to be detected at
+run time, then check the value of acpi_disabled. If CONFIG_ACPI is not
+set, acpi_disabled will always be 1.
+
+
+Device Enumeration
+------------------
+Device descriptions in ACPI should use standard recognized ACPI interfaces.
+These may contain less information than is typically provided via a Device
+Tree description for the same device.  This is also one of the reasons that
+ACPI can be useful -- the driver takes into account that it may have less
+detailed information about the device and uses sensible defaults instead.
+If done properly in the driver, the hardware can change and improve over
+time without the driver having to change at all.
+
+Clocks provide an excellent example.  In DT, clocks need to be specified
+and the drivers need to take them into account.  In ACPI, the assumption
+is that UEFI will leave the device in a reasonable default state, including
+any clock settings.  If for some reason the driver needs to change a clock
+value, this can be done in an ACPI method; all the driver needs to do is
+invoke the method and not concern itself with what the method needs to do
+to change the clock.  Changing the hardware can then take place over time
+by changing what the ACPI method does, and not the driver.
+
+In DT, the parameters needed by the driver to set up clocks as in the example
+above are known as "bindings"; in ACPI, these are known as "Device Properties"
+and provided to a driver via the _DSD object.
+
+ACPI tables are described with a formal language called ASL, the ACPI
+Source Language (section 19 of the specification).  This means that there
+are always multiple ways to describe the same thing -- including device
+properties.  For example, device properties could use an ASL construct
+that looks like this: Name(KEY0, "value0").  An ACPI device driver would
+then retrieve the value of the property by evaluating the KEY0 object.
+However, using Name() this way has multiple problems: (1) ACPI limits
+names ("KEY0") to four characters, unlike DT; (2) there is no
+industry-wide registry that maintains a list of names, minimizing re-use; (3)
+there is also no registry for the definition of property values ("value0"),
+again making re-use difficult; and (4) how does one maintain backward
+compatibility as new hardware comes out?  The _DSD method was created
+to solve precisely these sorts of problems; Linux drivers should ALWAYS
+use the _DSD method for device properties and nothing else.
+
+The _DSM object (ACPI Section 9.14.1) could also be used for conveying
+device properties to a driver.  Linux drivers should only expect it to
+be used if _DSD cannot represent the data required, and there is no way
+to create a new UUID for the _DSD object.  Note that there is even less
+regulation of the use of _DSM than there is of _DSD.  Drivers that depend
+on the contents of _DSM objects will be more difficult to maintain over
+time because of this; as of this writing, the use of _DSM is the cause
+of quite a few firmware problems and is not recommended.
+
+Drivers should look for device properties in the _DSD object ONLY; the _DSD
+object is described in the ACPI specification section 6.2.5, but this only
+describes how to define the structure of an object returned via _DSD, and
+how specific data structures are defined by specific UUIDs.  Linux should
+only use the _DSD Device Properties UUID [5]:
+
+   - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
+
+   - http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
+
+The UEFI Forum provides a mechanism for registering device properties [4]
+so that they may be used across all operating systems supporting ACPI.
+Device properties that have not been registered with the UEFI Forum should
+not be used.
+
+Before creating new device properties, check to be sure that they have not
+been defined before and either registered in the Linux kernel documentation
+as DT bindings, or with the UEFI Forum as device properties.  While we do not
+want to simply move all DT bindings into ACPI device properties, we can learn
+from what has been previously defined.
+
+If it is necessary to define a new device property, or if it makes sense to
+synthesize the definition of a binding so it can be used in any firmware,
+both DT bindings and ACPI device properties for device drivers have review
+processes.  Use them both.  When the driver itself is submitted for review
+to the Linux mailing lists, the device property definitions needed must be
+submitted at the same time.  A driver that supports ACPI and uses device
+properties will not be considered complete without their definitions.  Once
+the device property has been accepted by the Linux community, it must be
+registered with the UEFI Forum [4], which will review it again for consistency
+within the registry.  This may require iteration.  The UEFI Forum, though,
+will always be the canonical site for device property definitions.
+
+It may make sense to provide notice to the UEFI Forum that there is the
+intent to register a previously unused device property name as a means of
+reserving the name for later use.  Other operating system vendors will
+also be submitting registration requests and this may help smooth the
+process.
+
+Once registration and review have been completed, the kernel provides an
+interface for looking up device properties in a manner independent of
+whether DT or ACPI is being used.  This API should be used [6]; it can
+eliminate some duplication of code paths in driver probing functions and
+discourage divergence between DT bindings and ACPI device properties.
+
+
+Programmable Power Control Resources
+------------------------------------
+Programmable power control resources include such resources as voltage/current
+providers (regulators) and clock sources.
+
+With ACPI, the kernel clock and regulator framework is not expected to be used
+at all.
+
+The kernel assumes that power control of these resources is represented with
+Power Resource Objects (ACPI section 7.1).  The ACPI core will then handle
+correctly enabling and disabling resources as they are needed.  In order to
+get that to work, ACPI assumes each device has defined D-states and that these
+can be controlled through the optional ACPI methods _PS0, _PS1, _PS2, and _PS3;
+in ACPI, _PS0 is the method to invoke to turn a device full on, and _PS3 is for
+turning a device full off.
+
+There are two options for using those Power Resources.  They can:
+
+   -  be managed in a _PSx method which gets called on entry to power
+      state Dx.
+
+   -  be declared separately as power resources with their own _ON and _OFF
+      methods.  They are then tied back to D-states for a particular device
+      via _PRx which specifies which power resources a device needs to be on
+      while in Dx.  The kernel then tracks the number of devices using a power
+      resource and calls _ON/_OFF as needed.
+
+The kernel ACPI code will also assume that the _PSx methods follow the normal
+ACPI rules for such methods:
+
+   -  If either _PS0 or _PS3 is implemented, then the other method must also
+      be implemented.
+
+   -  If a device requires usage or setup of a power resource when on, the ASL
+      should arrange for it to be allocated/enabled using the _PS0 method.
+
+   -  Resources allocated or enabled in the _PS0 method should be disabled
+      or de-allocated in the _PS3 method.
+
+   -  Firmware will leave the resources in a reasonable state before handing
+      over control to the kernel.
+
+Such code in _PSx methods will of course be very platform specific.  But,
+this allows the driver to abstract out the interface for operating the device
+and avoid having to read special non-standard values from ACPI tables. Further,
+abstracting the use of these resources allows the hardware to change over time
+without requiring updates to the driver.
+
+
+Clocks
+------
+ACPI makes the assumption that clocks are initialized by the firmware --
+UEFI, in this case -- to some working value before control is handed over
+to the kernel.  This has implications for devices such as UARTs, or SoC-driven
+LCD displays, for example.
+
+When the kernel boots, the clocks are assumed to be set to reasonable
+working values.  If for some reason the frequency needs to change -- e.g.,
+throttling for power management -- the device driver should expect that
+process to be abstracted out into some ACPI method that can be invoked
+(please see the ACPI specification for further recommendations on standard
+methods to be expected).  The only exceptions to this are CPU clocks where
+CPPC provides a much richer interface than ACPI methods.  If the clocks
+are not set, there is no direct way for Linux to control them.
+
+If an SoC vendor wants to provide fine-grained control of the system clocks,
+they could do so by providing ACPI methods that could be invoked by Linux
+drivers.  However, this is NOT recommended and Linux drivers should NOT use
+such methods, even if they are provided.  Such methods are not currently
+standardized in the ACPI specification, and using them could tie a kernel
+to a very specific SoC, or tie an SoC to a very specific version of the
+kernel, both of which we are trying to avoid.
+
+
+Driver Recommendations
+----------------------
+DO NOT remove any DT handling when adding ACPI support for a driver.  The
+same device may be used on many different systems.
+
+DO try to structure the driver so that it is data-driven.  That is, set up
+a struct containing internal per-device state based on defaults and whatever
+else must be discovered by the driver probe function.  Then, have the rest
+of the driver operate off of the contents of that struct.  Doing so should
+allow most divergence between ACPI and DT functionality to be kept local to
+the probe function instead of being scattered throughout the driver.  For
+example::
+
+  static int device_probe_dt(struct platform_device *pdev)
+  {
+         /* DT specific functionality */
+         ...
+  }
+
+  static int device_probe_acpi(struct platform_device *pdev)
+  {
+         /* ACPI specific functionality */
+         ...
+  }
+
+  static int device_probe(struct platform_device *pdev)
+  {
+         ...
+         struct device_node *node = pdev->dev.of_node;
+         ...
+
+         if (node)
+                 ret = device_probe_dt(pdev);
+         else if (ACPI_HANDLE(&pdev->dev))
+                 ret = device_probe_acpi(pdev);
+         else
+                 /* other initialization */
+                 ...
+         /* Continue with any generic probe operations */
+         ...
+  }
+
+DO keep the MODULE_DEVICE_TABLE entries together in the driver to make it
+clear the different names the driver is probed for, both from DT and from
+ACPI::
+
+  static const struct of_device_id virtio_mmio_match[] = {
+          { .compatible = "virtio,mmio", },
+          { }
+  };
+  MODULE_DEVICE_TABLE(of, virtio_mmio_match);
+
+  static const struct acpi_device_id virtio_mmio_acpi_match[] = {
+          { "LNRO0005", },
+          { }
+  };
+  MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match);
+
+
+ASWG
+----
+The ACPI specification changes regularly.  During the year 2014, for instance,
+version 5.1 was released and version 6.0 substantially completed, with most of
+the changes being driven by ARM-specific requirements.  Proposed changes are
+presented and discussed in the ASWG (ACPI Specification Working Group) which
+is a part of the UEFI Forum.  The current version of the ACPI specification
+is 6.1, released in January 2016.
+
+Participation in this group is open to all UEFI members.  Please see
+http://www.uefi.org/workinggroup for details on group membership.
+
+It is the intent of the ARMv8 ACPI kernel code to follow the ACPI specification
+as closely as possible, and to only implement functionality that complies with
+the released standards from UEFI ASWG.  As a practical matter, there will be
+vendors that provide bad ACPI tables or violate the standards in some way.
+If this is because of errors, quirks and fix-ups may be necessary, but will
+be avoided if possible.  If there are features missing from ACPI that preclude
+it from being used on a platform, ECRs (Engineering Change Requests) should be
+submitted to ASWG and go through the normal approval process; for those that
+are not UEFI members, many other members of the Linux community are and would
+likely be willing to assist in submitting ECRs.
+
+
+Linux Code
+----------
+Individual items specific to Linux on ARM, contained in the Linux
+source code, are in the list that follows:
+
+ACPI_OS_NAME
+                       This macro defines the string to be returned when
+                       an ACPI method invokes the _OS method.  On ARM64
+                       systems, this macro will be "Linux" by default.
+                       The command line parameter acpi_os=<string>
+                       can be used to set it to some other value.  The
+                       default value for other architectures is "Microsoft
+                       Windows NT", for example.
+
+ACPI Objects
+------------
+Detailed expectations for ACPI tables and objects are listed in the file
+Documentation/arm64/acpi_object_usage.rst.
+
+
+References
+----------
+[0] http://silver.arm.com
+    document ARM-DEN-0029, or newer:
+    "Server Base System Architecture", version 2.3, dated 27 Mar 2014
+
+[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf
+    Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System
+    Software on ARM Platforms", dated 16 Aug 2014
+
+[2] http://www.secretlab.ca/archives/151,
+    10 Jan 2015, Copyright (c) 2015,
+    Linaro Ltd., written by Grant Likely.
+
+[3] AMD ACPI for Seattle platform documentation
+    http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf
+
+
+[4] http://www.uefi.org/acpi
+    please see the link for the "ACPI _DSD Device
+    Property Registry Instructions"
+
+[5] http://www.uefi.org/acpi
+    please see the link for the "_DSD (Device
+    Specific Data) Implementation Guide"
+
+[6] Kernel code for the unified device
+    property interface can be found in
+    include/linux/property.h and drivers/base/property.c.
+
+
+Authors
+-------
+- Al Stone <al.stone@linaro.org>
+- Graeme Gregory <graeme.gregory@linaro.org>
+- Hanjun Guo <hanjun.guo@linaro.org>
+
+- Grant Likely <grant.likely@linaro.org>, for the "Why ACPI on ARM?" section
diff --git a/Documentation/arm64/arm-acpi.txt b/Documentation/arm64/arm-acpi.txt
deleted file mode 100644
index 1a74a041a443..000000000000
--- a/Documentation/arm64/arm-acpi.txt
+++ /dev/null
@@ -1,519 +0,0 @@
-ACPI on ARMv8 Servers
----------------------
-ACPI can be used for ARMv8 general purpose servers designed to follow
-the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server
-Base Boot Requirements) [1] specifications.  Please note that the SBBR
-can be retrieved simply by visiting [1], but the SBSA is currently only
-available to those with an ARM login due to ARM IP licensing concerns.
-
-The ARMv8 kernel implements the reduced hardware model of ACPI version
-5.1 or later.  Links to the specification and all external documents
-it refers to are managed by the UEFI Forum.  The specification is
-available at http://www.uefi.org/specifications and documents referenced
-by the specification can be found via http://www.uefi.org/acpi.
-
-If an ARMv8 system does not meet the requirements of the SBSA and SBBR,
-or cannot be described using the mechanisms defined in the required ACPI
-specifications, then ACPI may not be a good fit for the hardware.
-
-While the documents mentioned above set out the requirements for building
-industry-standard ARMv8 servers, they also apply to more than one operating
-system.  The purpose of this document is to describe the interaction between
-ACPI and Linux only, on an ARMv8 system -- that is, what Linux expects of
-ACPI and what ACPI can expect of Linux.
-
-
-Why ACPI on ARM?
-----------------
-Before examining the details of the interface between ACPI and Linux, it is
-useful to understand why ACPI is being used.  Several technologies already
-exist in Linux for describing non-enumerable hardware, after all.  In this
-section we summarize a blog post [2] from Grant Likely that outlines the
-reasoning behind ACPI on ARMv8 servers.  Actually, we snitch a good portion
-of the summary text almost directly, to be honest.
-
-The short form of the rationale for ACPI on ARM is:
-
--- ACPI’s byte code (AML) allows the platform to encode hardware behavior,
-   while DT explicitly does not support this.  For hardware vendors, being
-   able to encode behavior is a key tool used in supporting operating
-   system releases on new hardware.
-
--- ACPI’s OSPM defines a power management model that constrains what the
-   platform is allowed to do into a specific model, while still providing
-   flexibility in hardware design.
-
--- In the enterprise server environment, ACPI has established bindings (such
-   as for RAS) which are currently used in production systems.  DT does not.
-   Such bindings could be defined in DT at some point, but doing so means ARM
-   and x86 would end up using completely different code paths in both firmware
-   and the kernel.
-
--- Choosing a single interface to describe the abstraction between a platform
-   and an OS is important.  Hardware vendors would not be required to implement
-   both DT and ACPI if they want to support multiple operating systems.  And,
-   agreeing on a single interface instead of being fragmented into per OS
-   interfaces makes for better interoperability overall.
-
--- The new ACPI governance process works well and Linux is now at the same
-   table as hardware vendors and other OS vendors.  In fact, there is no
-   longer any reason to feel that ACPI only belongs to Windows or that
-   Linux is in any way secondary to Microsoft in this arena.  The move of
-   ACPI governance into the UEFI forum has significantly opened up the
-   specification development process, and currently, a large portion of the
-   changes being made to ACPI are being driven by Linux.
-
-Key to the use of ACPI is the support model.  For servers in general, the
-responsibility for hardware behaviour cannot solely be the domain of the
-kernel, but rather must be split between the platform and the kernel, in
-order to allow for orderly change over time.  ACPI frees the OS from needing
-to understand all the minute details of the hardware so that the OS doesn’t
-need to be ported to each and every device individually.  It allows the
-hardware vendors to take responsibility for power management behaviour without
-depending on an OS release cycle which is not under their control.
-
-ACPI is also important because hardware and OS vendors have already worked
-out the mechanisms for supporting a general purpose computing ecosystem.  The
-infrastructure is in place, the bindings are in place, and the processes are
-in place.  DT does exactly what Linux needs it to when working with vertically
-integrated devices, but there are no good processes for supporting what the
-server vendors need.  Linux could potentially get there with DT, but doing so
-really just duplicates something that already works.  ACPI already does what
-the hardware vendors need, Microsoft won’t collaborate on DT, and hardware
-vendors would still end up providing two completely separate firmware
-interfaces -- one for Linux and one for Windows.
-
-
-Kernel Compatibility
---------------------
-One of the primary motivations for ACPI is standardization, and using that
-to provide backward compatibility for Linux kernels.  In the server market,
-software and hardware are often used for long periods.  ACPI allows the
-kernel and firmware to agree on a consistent abstraction that can be
-maintained over time, even as hardware or software change.  As long as the
-abstraction is supported, systems can be updated without necessarily having
-to replace the kernel.
-
-When a Linux driver or subsystem is first implemented using ACPI, it by
-definition ends up requiring a specific version of the ACPI specification
--- it's baseline.  ACPI firmware must continue to work, even though it may
-not be optimal, with the earliest kernel version that first provides support
-for that baseline version of ACPI.  There may be a need for additional drivers,
-but adding new functionality (e.g., CPU power management) should not break
-older kernel versions.  Further, ACPI firmware must also work with the most
-recent version of the kernel.
-
-
-Relationship with Device Tree
------------------------------
-ACPI support in drivers and subsystems for ARMv8 should never be mutually
-exclusive with DT support at compile time.
-
-At boot time the kernel will only use one description method depending on
-parameters passed from the boot loader (including kernel bootargs).
-
-Regardless of whether DT or ACPI is used, the kernel must always be capable
-of booting with either scheme (in kernels with both schemes enabled at compile
-time).
-
-
-Booting using ACPI tables
--------------------------
-The only defined method for passing ACPI tables to the kernel on ARMv8
-is via the UEFI system configuration table.  Just so it is explicit, this
-means that ACPI is only supported on platforms that boot via UEFI.
-
-When an ARMv8 system boots, it can either have DT information, ACPI tables,
-or in some very unusual cases, both.  If no command line parameters are used,
-the kernel will try to use DT for device enumeration; if there is no DT
-present, the kernel will try to use ACPI tables, but only if they are present.
-In neither is available, the kernel will not boot.  If acpi=force is used
-on the command line, the kernel will attempt to use ACPI tables first, but
-fall back to DT if there are no ACPI tables present.  The basic idea is that
-the kernel will not fail to boot unless it absolutely has no other choice.
-
-Processing of ACPI tables may be disabled by passing acpi=off on the kernel
-command line; this is the default behavior.
-
-In order for the kernel to load and use ACPI tables, the UEFI implementation
-MUST set the ACPI_20_TABLE_GUID to point to the RSDP table (the table with
-the ACPI signature "RSD PTR ").  If this pointer is incorrect and acpi=force
-is used, the kernel will disable ACPI and try to use DT to boot instead; the
-kernel has, in effect, determined that ACPI tables are not present at that
-point.
-
-If the pointer to the RSDP table is correct, the table will be mapped into
-the kernel by the ACPI core, using the address provided by UEFI.
-
-The ACPI core will then locate and map in all other ACPI tables provided by
-using the addresses in the RSDP table to find the XSDT (eXtended System
-Description Table).  The XSDT in turn provides the addresses to all other
-ACPI tables provided by the system firmware; the ACPI core will then traverse
-this table and map in the tables listed.
-
-The ACPI core will ignore any provided RSDT (Root System Description Table).
-RSDTs have been deprecated and are ignored on arm64 since they only allow
-for 32-bit addresses.
-
-Further, the ACPI core will only use the 64-bit address fields in the FADT
-(Fixed ACPI Description Table).  Any 32-bit address fields in the FADT will
-be ignored on arm64.
-
-Hardware reduced mode (see Section 4.1 of the ACPI 6.1 specification) will
-be enforced by the ACPI core on arm64.  Doing so allows the ACPI core to
-run less complex code since it no longer has to provide support for legacy
-hardware from other architectures.  Any fields that are not to be used for
-hardware reduced mode must be set to zero.
-
-For the ACPI core to operate properly, and in turn provide the information
-the kernel needs to configure devices, it expects to find the following
-tables (all section numbers refer to the ACPI 6.1 specification):
-
-    -- RSDP (Root System Description Pointer), section 5.2.5
-
-    -- XSDT (eXtended System Description Table), section 5.2.8
-
-    -- FADT (Fixed ACPI Description Table), section 5.2.9
-
-    -- DSDT (Differentiated System Description Table), section
-       5.2.11.1
-
-    -- MADT (Multiple APIC Description Table), section 5.2.12
-
-    -- GTDT (Generic Timer Description Table), section 5.2.24
-
-    -- If PCI is supported, the MCFG (Memory mapped ConFiGuration
-       Table), section 5.2.6, specifically Table 5-31.
-
-    -- If booting without a console=<device> kernel parameter is
-       supported, the SPCR (Serial Port Console Redirection table),
-       section 5.2.6, specifically Table 5-31.
-
-    -- If necessary to describe the I/O topology, SMMUs and GIC ITSs,
-       the IORT (Input Output Remapping Table, section 5.2.6, specifically
-       Table 5-31).
-
-    -- If NUMA is supported, the SRAT (System Resource Affinity Table)
-       and SLIT (System Locality distance Information Table), sections
-       5.2.16 and 5.2.17, respectively.
-
-If the above tables are not all present, the kernel may or may not be
-able to boot properly since it may not be able to configure all of the
-devices available.  This list of tables is not meant to be all inclusive;
-in some environments other tables may be needed (e.g., any of the APEI
-tables from section 18) to support specific functionality.
-
-
-ACPI Detection
---------------
-Drivers should determine their probe() type by checking for a null
-value for ACPI_HANDLE, or checking .of_node, or other information in
-the device structure.  This is detailed further in the "Driver
-Recommendations" section.
-
-In non-driver code, if the presence of ACPI needs to be detected at
-run time, then check the value of acpi_disabled. If CONFIG_ACPI is not
-set, acpi_disabled will always be 1.
-
-
-Device Enumeration
-------------------
-Device descriptions in ACPI should use standard recognized ACPI interfaces.
-These may contain less information than is typically provided via a Device
-Tree description for the same device.  This is also one of the reasons that
-ACPI can be useful -- the driver takes into account that it may have less
-detailed information about the device and uses sensible defaults instead.
-If done properly in the driver, the hardware can change and improve over
-time without the driver having to change at all.
-
-Clocks provide an excellent example.  In DT, clocks need to be specified
-and the drivers need to take them into account.  In ACPI, the assumption
-is that UEFI will leave the device in a reasonable default state, including
-any clock settings.  If for some reason the driver needs to change a clock
-value, this can be done in an ACPI method; all the driver needs to do is
-invoke the method and not concern itself with what the method needs to do
-to change the clock.  Changing the hardware can then take place over time
-by changing what the ACPI method does, and not the driver.
-
-In DT, the parameters needed by the driver to set up clocks as in the example
-above are known as "bindings"; in ACPI, these are known as "Device Properties"
-and provided to a driver via the _DSD object.
-
-ACPI tables are described with a formal language called ASL, the ACPI
-Source Language (section 19 of the specification).  This means that there
-are always multiple ways to describe the same thing -- including device
-properties.  For example, device properties could use an ASL construct
-that looks like this: Name(KEY0, "value0").  An ACPI device driver would
-then retrieve the value of the property by evaluating the KEY0 object.
-However, using Name() this way has multiple problems: (1) ACPI limits
-names ("KEY0") to four characters unlike DT; (2) there is no industry
-wide registry that maintains a list of names, minimizing re-use; (3)
-there is also no registry for the definition of property values ("value0"),
-again making re-use difficult; and (4) how does one maintain backward
-compatibility as new hardware comes out?  The _DSD method was created
-to solve precisely these sorts of problems; Linux drivers should ALWAYS
-use the _DSD method for device properties and nothing else.
-
-The _DSM object (ACPI Section 9.14.1) could also be used for conveying
-device properties to a driver.  Linux drivers should only expect it to
-be used if _DSD cannot represent the data required, and there is no way
-to create a new UUID for the _DSD object.  Note that there is even less
-regulation of the use of _DSM than there is of _DSD.  Drivers that depend
-on the contents of _DSM objects will be more difficult to maintain over
-time because of this; as of this writing, the use of _DSM is the cause
-of quite a few firmware problems and is not recommended.
-
-Drivers should look for device properties in the _DSD object ONLY; the _DSD
-object is described in the ACPI specification section 6.2.5, but this only
-describes how to define the structure of an object returned via _DSD, and
-how specific data structures are defined by specific UUIDs.  Linux should
-only use the _DSD Device Properties UUID [5]:
-
-   -- UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
-
-   -- http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
-
-The UEFI Forum provides a mechanism for registering device properties [4]
-so that they may be used across all operating systems supporting ACPI.
-Device properties that have not been registered with the UEFI Forum should
-not be used.
-
-Before creating new device properties, check to be sure that they have not
-been defined before and either registered in the Linux kernel documentation
-as DT bindings, or the UEFI Forum as device properties.  While we do not want
-to simply move all DT bindings into ACPI device properties, we can learn from
-what has been previously defined.
-
-If it is necessary to define a new device property, or if it makes sense to
-synthesize the definition of a binding so it can be used in any firmware,
-both DT bindings and ACPI device properties for device drivers have review
-processes.  Use them both.  When the driver itself is submitted for review
-to the Linux mailing lists, the device property definitions needed must be
-submitted at the same time.  A driver that supports ACPI and uses device
-properties will not be considered complete without their definitions.  Once
-the device property has been accepted by the Linux community, it must be
-registered with the UEFI Forum [4], which will review it again for consistency
-within the registry.  This may require iteration.  The UEFI Forum, though,
-will always be the canonical site for device property definitions.
-
-It may make sense to provide notice to the UEFI Forum that there is the
-intent to register a previously unused device property name as a means of
-reserving the name for later use.  Other operating system vendors will
-also be submitting registration requests and this may help smooth the
-process.
-
-Once registration and review have been completed, the kernel provides an
-interface for looking up device properties in a manner independent of
-whether DT or ACPI is being used.  This API should be used [6]; it can
-eliminate some duplication of code paths in driver probing functions and
-discourage divergence between DT bindings and ACPI device properties.
-
-
-Programmable Power Control Resources
-------------------------------------
-Programmable power control resources include such resources as voltage/current
-providers (regulators) and clock sources.
-
-With ACPI, the kernel clock and regulator framework is not expected to be used
-at all.
-
-The kernel assumes that power control of these resources is represented with
-Power Resource Objects (ACPI section 7.1).  The ACPI core will then handle
-correctly enabling and disabling resources as they are needed.  In order to
-get that to work, ACPI assumes each device has defined D-states and that these
-can be controlled through the optional ACPI methods _PS0, _PS1, _PS2, and _PS3;
-in ACPI, _PS0 is the method to invoke to turn a device full on, and _PS3 is for
-turning a device full off.
-
-There are two options for using those Power Resources.  They can:
-
-   -- be managed in a _PSx method which gets called on entry to power
-      state Dx.
-
-   -- be declared separately as power resources with their own _ON and _OFF
-      methods.  They are then tied back to D-states for a particular device
-      via _PRx which specifies which power resources a device needs to be on
-      while in Dx.  Kernel then tracks number of devices using a power resource
-      and calls _ON/_OFF as needed.
-
-The kernel ACPI code will also assume that the _PSx methods follow the normal
-ACPI rules for such methods:
-
-   -- If either _PS0 or _PS3 is implemented, then the other method must also
-      be implemented.
-
-   -- If a device requires usage or setup of a power resource when on, the ASL
-      should organize that it is allocated/enabled using the _PS0 method.
-
-   -- Resources allocated or enabled in the _PS0 method should be disabled
-      or de-allocated in the _PS3 method.
-
-   -- Firmware will leave the resources in a reasonable state before handing
-      over control to the kernel.
-
-Such code in _PSx methods will of course be very platform specific.  But,
-this allows the driver to abstract out the interface for operating the device
-and avoid having to read special non-standard values from ACPI tables. Further,
-abstracting the use of these resources allows the hardware to change over time
-without requiring updates to the driver.
-
-
-Clocks
-------
-ACPI makes the assumption that clocks are initialized by the firmware --
-UEFI, in this case -- to some working value before control is handed over
-to the kernel.  This has implications for devices such as UARTs, or SoC-driven
-LCD displays, for example.
-
-When the kernel boots, the clocks are assumed to be set to reasonable
-working values.  If for some reason the frequency needs to change -- e.g.,
-throttling for power management -- the device driver should expect that
-process to be abstracted out into some ACPI method that can be invoked
-(please see the ACPI specification for further recommendations on standard
-methods to be expected).  The only exceptions to this are CPU clocks where
-CPPC provides a much richer interface than ACPI methods.  If the clocks
-are not set, there is no direct way for Linux to control them.
-
-If an SoC vendor wants to provide fine-grained control of the system clocks,
-they could do so by providing ACPI methods that could be invoked by Linux
-drivers.  However, this is NOT recommended and Linux drivers should NOT use
-such methods, even if they are provided.  Such methods are not currently
-standardized in the ACPI specification, and using them could tie a kernel
-to a very specific SoC, or tie an SoC to a very specific version of the
-kernel, both of which we are trying to avoid.
-
-
-Driver Recommendations
-----------------------
-DO NOT remove any DT handling when adding ACPI support for a driver.  The
-same device may be used on many different systems.
-
-DO try to structure the driver so that it is data-driven.  That is, set up
-a struct containing internal per-device state based on defaults and whatever
-else must be discovered by the driver probe function.  Then, have the rest
-of the driver operate off of the contents of that struct.  Doing so should
-allow most divergence between ACPI and DT functionality to be kept local to
-the probe function instead of being scattered throughout the driver.  For
-example:
-
-static int device_probe_dt(struct platform_device *pdev)
-{
-       /* DT specific functionality */
-       ...
-}
-
-static int device_probe_acpi(struct platform_device *pdev)
-{
-       /* ACPI specific functionality */
-       ...
-}
-
-static int device_probe(struct platform_device *pdev)
-{
-       ...
-       struct device_node node = pdev->dev.of_node;
-       ...
-
-       if (node)
-               ret = device_probe_dt(pdev);
-       else if (ACPI_HANDLE(&pdev->dev))
-               ret = device_probe_acpi(pdev);
-       else
-               /* other initialization */
-               ...
-       /* Continue with any generic probe operations */
-       ...
-}
-
-DO keep the MODULE_DEVICE_TABLE entries together in the driver to make it
-clear the different names the driver is probed for, both from DT and from
-ACPI:
-
-static struct of_device_id virtio_mmio_match[] = {
-        { .compatible = "virtio,mmio", },
-        { }
-};
-MODULE_DEVICE_TABLE(of, virtio_mmio_match);
-
-static const struct acpi_device_id virtio_mmio_acpi_match[] = {
-        { "LNRO0005", },
-        { }
-};
-MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match);
-
-
-ASWG
-----
-The ACPI specification changes regularly.  During the year 2014, for instance,
-version 5.1 was released and version 6.0 substantially completed, with most of
-the changes being driven by ARM-specific requirements.  Proposed changes are
-presented and discussed in the ASWG (ACPI Specification Working Group) which
-is a part of the UEFI Forum.  The current version of the ACPI specification
-is 6.1 release in January 2016.
-
-Participation in this group is open to all UEFI members.  Please see
-http://www.uefi.org/workinggroup for details on group membership.
-
-It is the intent of the ARMv8 ACPI kernel code to follow the ACPI specification
-as closely as possible, and to only implement functionality that complies with
-the released standards from UEFI ASWG.  As a practical matter, there will be
-vendors that provide bad ACPI tables or violate the standards in some way.
-If this is because of errors, quirks and fix-ups may be necessary, but will
-be avoided if possible.  If there are features missing from ACPI that preclude
-it from being used on a platform, ECRs (Engineering Change Requests) should be
-submitted to ASWG and go through the normal approval process; for those that
-are not UEFI members, many other members of the Linux community are and would
-likely be willing to assist in submitting ECRs.
-
-
-Linux Code
-----------
-Individual items specific to Linux on ARM, contained in the the Linux
-source code, are in the list that follows:
-
-ACPI_OS_NAME           This macro defines the string to be returned when
-                       an ACPI method invokes the _OS method.  On ARM64
-                       systems, this macro will be "Linux" by default.
-                       The command line parameter acpi_os=<string>
-                       can be used to set it to some other value.  The
-                       default value for other architectures is "Microsoft
-                       Windows NT", for example.
-
-ACPI Objects
-------------
-Detailed expectations for ACPI tables and object are listed in the file
-Documentation/arm64/acpi_object_usage.txt.
-
-
-References
-----------
-[0] http://silver.arm.com -- document ARM-DEN-0029, or newer
-    "Server Base System Architecture", version 2.3, dated 27 Mar 2014
-
-[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf
-    Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System
-    Software on ARM Platforms", dated 16 Aug 2014
-
-[2] http://www.secretlab.ca/archives/151, 10 Jan 2015, Copyright (c) 2015,
-    Linaro Ltd., written by Grant Likely.
-
-[3] AMD ACPI for Seattle platform documentation:
-    http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf
-
-[4] http://www.uefi.org/acpi -- please see the link for the "ACPI _DSD Device
-    Property Registry Instructions"
-
-[5] http://www.uefi.org/acpi -- please see the link for the "_DSD (Device
-    Specific Data) Implementation Guide"
-
-[6] Kernel code for the unified device property interface can be found in
-    include/linux/property.h and drivers/base/property.c.
-
-
-Authors
--------
-Al Stone <al.stone@linaro.org>
-Graeme Gregory <graeme.gregory@linaro.org>
-Hanjun Guo <hanjun.guo@linaro.org>
-
-Grant Likely <grant.likely@linaro.org>, for the "Why ACPI on ARM?" section
diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
new file mode 100644
index 000000000000..3d041d0d16e8
--- /dev/null
+++ b/Documentation/arm64/booting.rst
@@ -0,0 +1,293 @@
+=====================
+Booting AArch64 Linux
+=====================
+
+Author: Will Deacon <will.deacon@arm.com>
+
+Date  : 07 September 2012
+
+This document is based on the ARM booting document by Russell King and
+is relevant to all public releases of the AArch64 Linux kernel.
+
+The AArch64 exception model is made up of a number of exception levels
+(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
+counterpart.  EL2 is the hypervisor level and exists only in non-secure
+mode. EL3 is the highest priority level and exists only in secure mode.
+
+For the purposes of this document, we will use the term `boot loader`
+simply to define all software that executes on the CPU(s) before control
+is passed to the Linux kernel.  This may include secure monitor and
+hypervisor code, or it may just be a handful of instructions for
+preparing a minimal boot environment.
+
+Essentially, the boot loader should provide (as a minimum) the
+following:
+
+1. Setup and initialise the RAM
+2. Setup the device tree
+3. Decompress the kernel image
+4. Call the kernel image
+
+
+1. Setup and initialise RAM
+---------------------------
+
+Requirement: MANDATORY
+
+The boot loader is expected to find and initialise all RAM that the
+kernel will use for volatile data storage in the system.  It performs
+this in a machine dependent manner.  (It may use internal algorithms
+to automatically locate and size all RAM, or it may use knowledge of
+the RAM in the machine, or any other method the boot loader designer
+sees fit.)
+
+
+2. Setup the device tree
+-------------------------
+
+Requirement: MANDATORY
+
+The device tree blob (dtb) must be placed on an 8-byte boundary and must
+not exceed 2 megabytes in size. Since the dtb will be mapped cacheable
+using blocks of up to 2 megabytes in size, it must not be placed within
+any 2M region which must be mapped with any specific attributes.
+
+NOTE: versions prior to v4.2 also require that the DTB be placed within
+the 512 MB region starting at text_offset bytes below the kernel Image.
+
+3. Decompress the kernel image
+------------------------------
+
+Requirement: OPTIONAL
+
+The AArch64 kernel does not currently provide a decompressor and
+therefore requires decompression (gzip etc.) to be performed by the boot
+loader if a compressed Image target (e.g. Image.gz) is used.  For
+bootloaders that do not implement this requirement, the uncompressed
+Image target is available instead.
+
+
+4. Call the kernel image
+------------------------
+
+Requirement: MANDATORY
+
+The decompressed kernel image contains a 64-byte header as follows::
+
+  u32 code0;			/* Executable code */
+  u32 code1;			/* Executable code */
+  u64 text_offset;		/* Image load offset, little endian */
+  u64 image_size;		/* Effective Image size, little endian */
+  u64 flags;			/* kernel flags, little endian */
+  u64 res2	= 0;		/* reserved */
+  u64 res3	= 0;		/* reserved */
+  u64 res4	= 0;		/* reserved */
+  u32 magic	= 0x644d5241;	/* Magic number, little endian, "ARM\x64" */
+  u32 res5;			/* reserved (used for PE COFF offset) */
+
+
+Header notes:
+
+- As of v3.17, all fields are little endian unless stated otherwise.
+
+- code0/code1 are responsible for branching to stext.
+
+- When booting through EFI, code0/code1 are initially skipped.
+  res5 is an offset to the PE header and the PE header has the EFI
+  entry point (efi_stub_entry).  When the stub has done its work, it
+  jumps to code0 to resume the normal boot process.
+
+- Prior to v3.17, the endianness of text_offset was not specified.  In
+  these cases image_size is zero and text_offset is 0x80000 in the
+  endianness of the kernel.  Where image_size is non-zero, image_size is
+  little-endian and must be respected.  Where image_size is zero,
+  text_offset can be assumed to be 0x80000.
+
+- The flags field (introduced in v3.17) is a little-endian 64-bit field
+  composed as follows:
+
+  ============= ===============================================================
+  Bit 0		Kernel endianness.  1 if BE, 0 if LE.
+  Bit 1-2	Kernel Page size.
+
+			* 0 - Unspecified.
+			* 1 - 4K
+			* 2 - 16K
+			* 3 - 64K
+  Bit 3		Kernel physical placement
+
+			0
+			  2MB aligned base should be as close as possible
+			  to the base of DRAM, since memory below it is not
+			  accessible via the linear mapping
+			1
+			  2MB aligned base may be anywhere in physical
+			  memory
+  Bits 4-63	Reserved.
+  ============= ===============================================================
+
+- When image_size is zero, a bootloader should attempt to keep as much
+  memory as possible free for use by the kernel immediately after the
+  end of the kernel image. The amount of space required will vary
+  depending on selected features, and is effectively unbounded.
+
+The Image must be placed text_offset bytes from a 2MB aligned base
+address anywhere in usable system RAM and called there. The region
+between the 2 MB aligned base address and the start of the image has no
+special significance to the kernel, and may be used for other purposes.
+At least image_size bytes from the start of the image must be free for
+use by the kernel.
+NOTE: versions prior to v4.6 cannot make use of memory below the
+physical offset of the Image so it is recommended that the Image be
+placed as close as possible to the start of system RAM.
+
+If an initrd/initramfs is passed to the kernel at boot, it must reside
+entirely within a 1 GB aligned physical memory window of up to 32 GB in
+size that fully covers the kernel Image as well.
+
+Any memory described to the kernel (even that below the start of the
+image) which is not marked as reserved from the kernel (e.g., with a
+memreserve region in the device tree) will be considered as available to
+the kernel.
+
+Before jumping into the kernel, the following conditions must be met:
+
+- Quiesce all DMA capable devices so that memory does not get
+  corrupted by bogus network packets or disk data.  This will save
+  you many hours of debugging.
+
+- Primary CPU general-purpose register settings:
+
+    - x0 = physical address of device tree blob (dtb) in system RAM.
+    - x1 = 0 (reserved for future use)
+    - x2 = 0 (reserved for future use)
+    - x3 = 0 (reserved for future use)
+
+- CPU mode
+
+  All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
+  IRQ and FIQ).
+  The CPU must be in either EL2 (RECOMMENDED in order to have access to
+  the virtualisation extensions) or non-secure EL1.
+
+- Caches, MMUs
+
+  The MMU must be off.
+  Instruction cache may be on or off.
+  The address range corresponding to the loaded kernel image must be
+  cleaned to the PoC. In the presence of a system cache or other
+  coherent masters with caches enabled, this will typically require
+  cache maintenance by VA rather than set/way operations.
+  System caches which respect the architected cache maintenance by VA
+  operations must be configured and may be enabled.
+  System caches which do not respect architected cache maintenance by VA
+  operations (not recommended) must be configured and disabled.
+
+- Architected timers
+
+  CNTFRQ must be programmed with the timer frequency and CNTVOFF must
+  be programmed with a consistent value on all CPUs.  If entering the
+  kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where
+  available.
+
+- Coherency
+
+  All CPUs to be booted by the kernel must be part of the same coherency
+  domain on entry to the kernel.  This may require IMPLEMENTATION DEFINED
+  initialisation to enable the receiving of maintenance operations on
+  each CPU.
+
+- System registers
+
+  All writable architected system registers at the exception level where
+  the kernel image will be entered must be initialised by software at a
+  higher exception level to prevent execution in an UNKNOWN state.
+
+  - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
+    executing on.
+  - The value of SCR_EL3.FIQ must be the same as the one present at boot
+    time whenever the kernel is executing.
+
+  For systems with a GICv3 interrupt controller to be used in v3 mode:
+  - If EL3 is present:
+
+      - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1.
+      - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1.
+
+  - If the kernel is entered at EL1:
+
+      - ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1.
+      - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.
+
+  - The DT or ACPI tables must describe a GICv3 interrupt controller.
+
+  For systems with a GICv3 interrupt controller to be used in
+  compatibility (v2) mode:
+
+  - If EL3 is present:
+
+      ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0.
+
+  - If the kernel is entered at EL1:
+
+      ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0.
+
+  - The DT or ACPI tables must describe a GICv2 interrupt controller.
+
+  For CPUs with pointer authentication functionality:
+  - If EL3 is present:
+
+    - SCR_EL3.APK (bit 16) must be initialised to 0b1
+    - SCR_EL3.API (bit 17) must be initialised to 0b1
+
+  - If the kernel is entered at EL1:
+
+    - HCR_EL2.APK (bit 40) must be initialised to 0b1
+    - HCR_EL2.API (bit 41) must be initialised to 0b1
+
+The requirements described above for CPU mode, caches, MMUs, architected
+timers, coherency and system registers apply to all CPUs.  All CPUs must
+enter the kernel in the same exception level.
+
+The boot loader is expected to enter the kernel on each CPU in the
+following manner:
+
+- The primary CPU must jump directly to the first instruction of the
+  kernel image.  The device tree blob passed by this CPU must contain
+  an 'enable-method' property for each cpu node.  The supported
+  enable-methods are described below.
+
+  It is expected that the bootloader will generate these device tree
+  properties and insert them into the blob prior to kernel entry.
+
+- CPUs with a "spin-table" enable-method must have a 'cpu-release-addr'
+  property in their cpu node.  This property identifies a
+  naturally-aligned 64-bit zero-initialised memory location.
+
+  These CPUs should spin outside of the kernel in a reserved area of
+  memory (communicated to the kernel by a /memreserve/ region in the
+  device tree) polling their cpu-release-addr location, which must be
+  contained in the reserved region.  A wfe instruction may be inserted
+  to reduce the overhead of the busy-loop and a sev will be issued by
+  the primary CPU.  When a read of the location pointed to by the
+  cpu-release-addr returns a non-zero value, the CPU must jump to this
+  value.  The value will be written as a single 64-bit little-endian
+  value, so CPUs must convert the read value to their native endianness
+  before jumping to it.
+
+- CPUs with a "psci" enable-method should remain outside of
+  the kernel (i.e. outside of the regions of memory described to the
+  kernel in the memory node, or in a reserved area of memory described
+  to the kernel by a /memreserve/ region in the device tree).  The
+  kernel will issue CPU_ON calls as described in ARM document number ARM
+  DEN 0022A ("Power State Coordination Interface System Software on ARM
+  processors") to bring CPUs into the kernel.
+
+  The device tree should contain a 'psci' node, as described in
+  Documentation/devicetree/bindings/arm/psci.txt.
+
+- Secondary CPU general-purpose register settings
+  x0 = 0 (reserved for future use)
+  x1 = 0 (reserved for future use)
+  x2 = 0 (reserved for future use)
+  x3 = 0 (reserved for future use)
diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
deleted file mode 100644
index fbab7e21d116..000000000000
--- a/Documentation/arm64/booting.txt
+++ /dev/null
@@ -1,266 +0,0 @@
-			Booting AArch64 Linux
-			=====================
-
-Author: Will Deacon <will.deacon@arm.com>
-Date  : 07 September 2012
-
-This document is based on the ARM booting document by Russell King and
-is relevant to all public releases of the AArch64 Linux kernel.
-
-The AArch64 exception model is made up of a number of exception levels
-(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
-counterpart.  EL2 is the hypervisor level and exists only in non-secure
-mode. EL3 is the highest priority level and exists only in secure mode.
-
-For the purposes of this document, we will use the term `boot loader'
-simply to define all software that executes on the CPU(s) before control
-is passed to the Linux kernel.  This may include secure monitor and
-hypervisor code, or it may just be a handful of instructions for
-preparing a minimal boot environment.
-
-Essentially, the boot loader should provide (as a minimum) the
-following:
-
-1. Setup and initialise the RAM
-2. Setup the device tree
-3. Decompress the kernel image
-4. Call the kernel image
-
-
-1. Setup and initialise RAM
----------------------------
-
-Requirement: MANDATORY
-
-The boot loader is expected to find and initialise all RAM that the
-kernel will use for volatile data storage in the system.  It performs
-this in a machine dependent manner.  (It may use internal algorithms
-to automatically locate and size all RAM, or it may use knowledge of
-the RAM in the machine, or any other method the boot loader designer
-sees fit.)
-
-
-2. Setup the device tree
--------------------------
-
-Requirement: MANDATORY
-
-The device tree blob (dtb) must be placed on an 8-byte boundary and must
-not exceed 2 megabytes in size. Since the dtb will be mapped cacheable
-using blocks of up to 2 megabytes in size, it must not be placed within
-any 2M region which must be mapped with any specific attributes.
-
-NOTE: versions prior to v4.2 also require that the DTB be placed within
-the 512 MB region starting at text_offset bytes below the kernel Image.
-
-3. Decompress the kernel image
-------------------------------
-
-Requirement: OPTIONAL
-
-The AArch64 kernel does not currently provide a decompressor and
-therefore requires decompression (gzip etc.) to be performed by the boot
-loader if a compressed Image target (e.g. Image.gz) is used.  For
-bootloaders that do not implement this requirement, the uncompressed
-Image target is available instead.
-
-
-4. Call the kernel image
-------------------------
-
-Requirement: MANDATORY
-
-The decompressed kernel image contains a 64-byte header as follows:
-
-  u32 code0;			/* Executable code */
-  u32 code1;			/* Executable code */
-  u64 text_offset;		/* Image load offset, little endian */
-  u64 image_size;		/* Effective Image size, little endian */
-  u64 flags;			/* kernel flags, little endian */
-  u64 res2	= 0;		/* reserved */
-  u64 res3	= 0;		/* reserved */
-  u64 res4	= 0;		/* reserved */
-  u32 magic	= 0x644d5241;	/* Magic number, little endian, "ARM\x64" */
-  u32 res5;			/* reserved (used for PE COFF offset) */
-
-
-Header notes:
-
-- As of v3.17, all fields are little endian unless stated otherwise.
-
-- code0/code1 are responsible for branching to stext.
-
-- when booting through EFI, code0/code1 are initially skipped.
-  res5 is an offset to the PE header and the PE header has the EFI
-  entry point (efi_stub_entry).  When the stub has done its work, it
-  jumps to code0 to resume the normal boot process.
-
-- Prior to v3.17, the endianness of text_offset was not specified.  In
-  these cases image_size is zero and text_offset is 0x80000 in the
-  endianness of the kernel.  Where image_size is non-zero image_size is
-  little-endian and must be respected.  Where image_size is zero,
-  text_offset can be assumed to be 0x80000.
-
-- The flags field (introduced in v3.17) is a little-endian 64-bit field
-  composed as follows:
-  Bit 0:	Kernel endianness.  1 if BE, 0 if LE.
-  Bit 1-2:	Kernel Page size.
-			0 - Unspecified.
-			1 - 4K
-			2 - 16K
-			3 - 64K
-  Bit 3:	Kernel physical placement
-			0 - 2MB aligned base should be as close as possible
-			    to the base of DRAM, since memory below it is not
-			    accessible via the linear mapping
-			1 - 2MB aligned base may be anywhere in physical
-			    memory
-  Bits 4-63:	Reserved.
-
-- When image_size is zero, a bootloader should attempt to keep as much
-  memory as possible free for use by the kernel immediately after the
-  end of the kernel image. The amount of space required will vary
-  depending on selected features, and is effectively unbound.
-
-The Image must be placed text_offset bytes from a 2MB aligned base
-address anywhere in usable system RAM and called there. The region
-between the 2 MB aligned base address and the start of the image has no
-special significance to the kernel, and may be used for other purposes.
-At least image_size bytes from the start of the image must be free for
-use by the kernel.
-NOTE: versions prior to v4.6 cannot make use of memory below the
-physical offset of the Image so it is recommended that the Image be
-placed as close as possible to the start of system RAM.
-
-If an initrd/initramfs is passed to the kernel at boot, it must reside
-entirely within a 1 GB aligned physical memory window of up to 32 GB in
-size that fully covers the kernel Image as well.
-
-Any memory described to the kernel (even that below the start of the
-image) which is not marked as reserved from the kernel (e.g., with a
-memreserve region in the device tree) will be considered as available to
-the kernel.
-
-Before jumping into the kernel, the following conditions must be met:
-
-- Quiesce all DMA capable devices so that memory does not get
-  corrupted by bogus network packets or disk data.  This will save
-  you many hours of debug.
-
-- Primary CPU general-purpose register settings
-  x0 = physical address of device tree blob (dtb) in system RAM.
-  x1 = 0 (reserved for future use)
-  x2 = 0 (reserved for future use)
-  x3 = 0 (reserved for future use)
-
-- CPU mode
-  All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
-  IRQ and FIQ).
-  The CPU must be in either EL2 (RECOMMENDED in order to have access to
-  the virtualisation extensions) or non-secure EL1.
-
-- Caches, MMUs
-  The MMU must be off.
-  Instruction cache may be on or off.
-  The address range corresponding to the loaded kernel image must be
-  cleaned to the PoC. In the presence of a system cache or other
-  coherent masters with caches enabled, this will typically require
-  cache maintenance by VA rather than set/way operations.
-  System caches which respect the architected cache maintenance by VA
-  operations must be configured and may be enabled.
-  System caches which do not respect architected cache maintenance by VA
-  operations (not recommended) must be configured and disabled.
-
-- Architected timers
-  CNTFRQ must be programmed with the timer frequency and CNTVOFF must
-  be programmed with a consistent value on all CPUs.  If entering the
-  kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where
-  available.
-
-- Coherency
-  All CPUs to be booted by the kernel must be part of the same coherency
-  domain on entry to the kernel.  This may require IMPLEMENTATION DEFINED
-  initialisation to enable the receiving of maintenance operations on
-  each CPU.
-
-- System registers
-  All writable architected system registers at the exception level where
-  the kernel image will be entered must be initialised by software at a
-  higher exception level to prevent execution in an UNKNOWN state.
-
-  - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
-    executing on.
-  - The value of SCR_EL3.FIQ must be the same as the one present at boot
-    time whenever the kernel is executing.
-
-  For systems with a GICv3 interrupt controller to be used in v3 mode:
-  - If EL3 is present:
-    ICC_SRE_EL3.Enable (bit 3) must be initialiased to 0b1.
-    ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1.
-  - If the kernel is entered at EL1:
-    ICC.SRE_EL2.Enable (bit 3) must be initialised to 0b1
-    ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.
-  - The DT or ACPI tables must describe a GICv3 interrupt controller.
-
-  For systems with a GICv3 interrupt controller to be used in
-  compatibility (v2) mode:
-  - If EL3 is present:
-    ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0.
-  - If the kernel is entered at EL1:
-    ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0.
-  - The DT or ACPI tables must describe a GICv2 interrupt controller.
-
-  For CPUs with pointer authentication functionality:
-  - If EL3 is present:
-    SCR_EL3.APK (bit 16) must be initialised to 0b1
-    SCR_EL3.API (bit 17) must be initialised to 0b1
-  - If the kernel is entered at EL1:
-    HCR_EL2.APK (bit 40) must be initialised to 0b1
-    HCR_EL2.API (bit 41) must be initialised to 0b1
-
-The requirements described above for CPU mode, caches, MMUs, architected
-timers, coherency and system registers apply to all CPUs.  All CPUs must
-enter the kernel in the same exception level.
-
-The boot loader is expected to enter the kernel on each CPU in the
-following manner:
-
-- The primary CPU must jump directly to the first instruction of the
-  kernel image.  The device tree blob passed by this CPU must contain
-  an 'enable-method' property for each cpu node.  The supported
-  enable-methods are described below.
-
-  It is expected that the bootloader will generate these device tree
-  properties and insert them into the blob prior to kernel entry.
-
-- CPUs with a "spin-table" enable-method must have a 'cpu-release-addr'
-  property in their cpu node.  This property identifies a
-  naturally-aligned 64-bit zero-initalised memory location.
-
-  These CPUs should spin outside of the kernel in a reserved area of
-  memory (communicated to the kernel by a /memreserve/ region in the
-  device tree) polling their cpu-release-addr location, which must be
-  contained in the reserved region.  A wfe instruction may be inserted
-  to reduce the overhead of the busy-loop and a sev will be issued by
-  the primary CPU.  When a read of the location pointed to by the
-  cpu-release-addr returns a non-zero value, the CPU must jump to this
-  value.  The value will be written as a single 64-bit little-endian
-  value, so CPUs must convert the read value to their native endianness
-  before jumping to it.
-
-- CPUs with a "psci" enable method should remain outside of
-  the kernel (i.e. outside of the regions of memory described to the
-  kernel in the memory node, or in a reserved area of memory described
-  to the kernel by a /memreserve/ region in the device tree).  The
-  kernel will issue CPU_ON calls as described in ARM document number ARM
-  DEN 0022A ("Power State Coordination Interface System Software on ARM
-  processors") to bring CPUs into the kernel.
-
-  The device tree should contain a 'psci' node, as described in
-  Documentation/devicetree/bindings/arm/psci.txt.
-
-- Secondary CPU general-purpose register settings
-  x0 = 0 (reserved for future use)
-  x1 = 0 (reserved for future use)
-  x2 = 0 (reserved for future use)
-  x3 = 0 (reserved for future use)
diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
new file mode 100644
index 000000000000..2955287e9acc
--- /dev/null
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -0,0 +1,304 @@
+===========================
+ARM64 CPU Feature Registers
+===========================
+
+Author: Suzuki K Poulose <suzuki.poulose@arm.com>
+
+
+This file describes the ABI for exporting the AArch64 CPU ID/feature
+registers to userspace. The availability of this ABI is advertised
+via the HWCAP_CPUID in HWCAPs.
+
+1. Motivation
+-------------
+
+The ARM architecture defines a set of feature registers, which describe
+the capabilities of the CPU/system. Access to these system registers is
+restricted from EL0 and there is no reliable way for an application to
+extract this information to make better decisions at runtime. There is
+limited information available to the application via HWCAPs, however
+there are some issues with their usage.
+
+ a) Any change to the HWCAPs requires an update to userspace (e.g. libc)
+    to detect the new changes, which can take a long time to appear in
+    distributions. Exposing the registers allows applications to get the
+    information without requiring updates to the toolchains.
+
+ b) Access to HWCAPs is sometimes limited (e.g. prior to libc, or
+    when ld is initialised at startup time).
+
+ c) HWCAPs cannot represent non-boolean information effectively. The
+    architecture defines a canonical format for representing features
+    in the ID registers; this is well defined and is capable of
+    representing all valid architecture variations.
+
+
+2. Requirements
+---------------
+
+ a) Safety:
+
+    Applications should be able to use the information provided by the
+    infrastructure to run safely across the system. This has greater
+    implications on a system with heterogeneous CPUs.
+    The infrastructure exports a value that is safe across all the
+    available CPUs on the system.
+
+    E.g., if at least one CPU doesn't implement CRC32 instructions, while
+    others do, we should report that CRC32 is not implemented.
+    Otherwise an application could crash when scheduled on the CPU
+    which doesn't support CRC32.
+
+ b) Security:
+
+    Applications should only be able to receive information that is
+    relevant to the normal operation in userspace. Hence, some of the
+    fields are masked out (i.e., made invisible) and their values are set to
+    indicate the feature is 'not supported'. See Section 4 for the list
+    of visible features. Also, the kernel may manipulate the fields
+    based on what it supports. E.g., if FP is not supported by the
+    kernel, the values could indicate that FP is not available
+    (even when the CPU provides it).
+
+ c) Implementation Defined Features
+
+    The infrastructure doesn't expose any register which is
+    IMPLEMENTATION DEFINED as per ARMv8-A Architecture.
+
+ d) CPU Identification:
+
+    MIDR_EL1 is exposed to help identify the processor. On a
+    heterogeneous system, this could be racy (just like getcpu()). The
+    process could be migrated to another CPU by the time it uses the
+    register value, unless the CPU affinity is set. Hence, there is no
+    guarantee that the value reflects the processor that it is
+    currently executing on. The REVIDR is not exposed due to this
+    constraint, as REVIDR makes sense only in conjunction with the
+    MIDR. Alternatively, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
+    at::
+
+	/sys/devices/system/cpu/cpu$ID/regs/identification/
+	                                              \- midr
+	                                              \- revidr
+
+3. Implementation
+-----------------
+
+The infrastructure is built on the emulation of the 'MRS' instruction.
+Accessing a restricted system register from an application generates an
+exception and results in SIGILL being delivered to the process.
+The infrastructure hooks into the exception handler and emulates the
+operation if the source belongs to the supported system register space.
+
+The infrastructure emulates only the following system register space::
+
+	Op0=3, Op1=0, CRn=0, CRm=0,4,5,6,7
+
+(See Table C5-6 'System instruction encodings for non-Debug System
+register accesses' in ARMv8 ARM DDI 0487A.h, for the list of
+registers).
+
+The following rules are applied to the value returned by the
+infrastructure:
+
+ a) The value of an 'IMPLEMENTATION DEFINED' field is set to 0.
+ b) The value of a reserved field is populated with the reserved
+    value as defined by the architecture.
+ c) The value of a 'visible' field holds the system-wide safe value
+    for the particular feature (except for MIDR_EL1, see section 4).
+ d) All other fields (i.e., invisible fields) are set to indicate
+    the feature is missing (as defined by the architecture).
+
+4. List of registers with visible features
+-------------------------------------------
+
+  1) ID_AA64ISAR0_EL1 - Instruction Set Attribute Register 0
+
+     +------------------------------+---------+---------+
+     | Name                         |  bits   | visible |
+     +------------------------------+---------+---------+
+     | TS                           | [55-52] |    y    |
+     +------------------------------+---------+---------+
+     | FHM                          | [51-48] |    y    |
+     +------------------------------+---------+---------+
+     | DP                           | [47-44] |    y    |
+     +------------------------------+---------+---------+
+     | SM4                          | [43-40] |    y    |
+     +------------------------------+---------+---------+
+     | SM3                          | [39-36] |    y    |
+     +------------------------------+---------+---------+
+     | SHA3                         | [35-32] |    y    |
+     +------------------------------+---------+---------+
+     | RDM                          | [31-28] |    y    |
+     +------------------------------+---------+---------+
+     | ATOMICS                      | [23-20] |    y    |
+     +------------------------------+---------+---------+
+     | CRC32                        | [19-16] |    y    |
+     +------------------------------+---------+---------+
+     | SHA2                         | [15-12] |    y    |
+     +------------------------------+---------+---------+
+     | SHA1                         | [11-8]  |    y    |
+     +------------------------------+---------+---------+
+     | AES                          | [7-4]   |    y    |
+     +------------------------------+---------+---------+
+
+
+  2) ID_AA64PFR0_EL1 - Processor Feature Register 0
+
+     +------------------------------+---------+---------+
+     | Name                         |  bits   | visible |
+     +------------------------------+---------+---------+
+     | DIT                          | [51-48] |    y    |
+     +------------------------------+---------+---------+
+     | SVE                          | [35-32] |    y    |
+     +------------------------------+---------+---------+
+     | GIC                          | [27-24] |    n    |
+     +------------------------------+---------+---------+
+     | AdvSIMD                      | [23-20] |    y    |
+     +------------------------------+---------+---------+
+     | FP                           | [19-16] |    y    |
+     +------------------------------+---------+---------+
+     | EL3                          | [15-12] |    n    |
+     +------------------------------+---------+---------+
+     | EL2                          | [11-8]  |    n    |
+     +------------------------------+---------+---------+
+     | EL1                          | [7-4]   |    n    |
+     +------------------------------+---------+---------+
+     | EL0                          | [3-0]   |    n    |
+     +------------------------------+---------+---------+
+
+
+  3) MIDR_EL1 - Main ID Register
+
+     +------------------------------+---------+---------+
+     | Name                         |  bits   | visible |
+     +------------------------------+---------+---------+
+     | Implementer                  | [31-24] |    y    |
+     +------------------------------+---------+---------+
+     | Variant                      | [23-20] |    y    |
+     +------------------------------+---------+---------+
+     | Architecture                 | [19-16] |    y    |
+     +------------------------------+---------+---------+
+     | PartNum                      | [15-4]  |    y    |
+     +------------------------------+---------+---------+
+     | Revision                     | [3-0]   |    y    |
+     +------------------------------+---------+---------+
+
+   NOTE: The 'visible' fields of MIDR_EL1 contain the value
+   as available on the CPU where it is fetched, which is not a
+   system-wide safe value.
+
+  4) ID_AA64ISAR1_EL1 - Instruction Set Attribute Register 1
+
+     +------------------------------+---------+---------+
+     | Name                         |  bits   | visible |
+     +------------------------------+---------+---------+
+     | GPI                          | [31-28] |    y    |
+     +------------------------------+---------+---------+
+     | GPA                          | [27-24] |    y    |
+     +------------------------------+---------+---------+
+     | LRCPC                        | [23-20] |    y    |
+     +------------------------------+---------+---------+
+     | FCMA                         | [19-16] |    y    |
+     +------------------------------+---------+---------+
+     | JSCVT                        | [15-12] |    y    |
+     +------------------------------+---------+---------+
+     | API                          | [11-8]  |    y    |
+     +------------------------------+---------+---------+
+     | APA                          | [7-4]   |    y    |
+     +------------------------------+---------+---------+
+     | DPB                          | [3-0]   |    y    |
+     +------------------------------+---------+---------+
+
+  5) ID_AA64MMFR2_EL1 - Memory Model Feature Register 2
+
+     +------------------------------+---------+---------+
+     | Name                         |  bits   | visible |
+     +------------------------------+---------+---------+
+     | AT                           | [35-32] |    y    |
+     +------------------------------+---------+---------+
+
+  6) ID_AA64ZFR0_EL1 - SVE Feature ID Register 0
+
+     +------------------------------+---------+---------+
+     | Name                         |  bits   | visible |
+     +------------------------------+---------+---------+
+     | SM4                          | [43-40] |    y    |
+     +------------------------------+---------+---------+
+     | SHA3                         | [35-32] |    y    |
+     +------------------------------+---------+---------+
+     | BitPerm                      | [19-16] |    y    |
+     +------------------------------+---------+---------+
+     | AES                          | [7-4]   |    y    |
+     +------------------------------+---------+---------+
+     | SVEVer                       | [3-0]   |    y    |
+     +------------------------------+---------+---------+
+
+Appendix I: Example
+-------------------
+
+::
+
+  /*
+   * Sample program to demonstrate the MRS emulation ABI.
+   *
+   * Copyright (C) 2015-2016, ARM Ltd
+   *
+   * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
+   *
+   * This program is free software; you can redistribute it and/or modify
+   * it under the terms of the GNU General Public License version 2 as
+   * published by the Free Software Foundation.
+   *
+   * This program is distributed in the hope that it will be useful,
+   * but WITHOUT ANY WARRANTY; without even the implied warranty of
+   * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   * GNU General Public License for more details.
+   * This program is free software; you can redistribute it and/or modify
+   * it under the terms of the GNU General Public License version 2 as
+   * published by the Free Software Foundation.
+   *
+   * This program is distributed in the hope that it will be useful,
+   * but WITHOUT ANY WARRANTY; without even the implied warranty of
+   * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   * GNU General Public License for more details.
+   */
+
+  #include <asm/hwcap.h>
+  #include <stdio.h>
+  #include <sys/auxv.h>
+
+  #define get_cpu_ftr(id) ({					\
+		unsigned long __val;				\
+		asm("mrs %0, "#id : "=r" (__val));		\
+		printf("%-20s: 0x%016lx\n", #id, __val);	\
+	})
+
+  int main(void)
+  {
+
+	if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
+		fputs("CPUID registers unavailable\n", stderr);
+		return 1;
+	}
+
+	get_cpu_ftr(ID_AA64ISAR0_EL1);
+	get_cpu_ftr(ID_AA64ISAR1_EL1);
+	get_cpu_ftr(ID_AA64MMFR0_EL1);
+	get_cpu_ftr(ID_AA64MMFR1_EL1);
+	get_cpu_ftr(ID_AA64PFR0_EL1);
+	get_cpu_ftr(ID_AA64PFR1_EL1);
+	get_cpu_ftr(ID_AA64DFR0_EL1);
+	get_cpu_ftr(ID_AA64DFR1_EL1);
+
+	get_cpu_ftr(MIDR_EL1);
+	get_cpu_ftr(MPIDR_EL1);
+	get_cpu_ftr(REVIDR_EL1);
+
+  #if 0
+	/* Unexposed register access causes SIGILL */
+	get_cpu_ftr(ID_MMFR0_EL1);
+  #endif
+
+	return 0;
+  }
diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt
deleted file mode 100644
index 684a0da39378..000000000000
--- a/Documentation/arm64/cpu-feature-registers.txt
+++ /dev/null
@@ -1,296 +0,0 @@
-		ARM64 CPU Feature Registers
-		===========================
-
-Author: Suzuki K Poulose <suzuki.poulose@arm.com>
-
-
-This file describes the ABI for exporting the AArch64 CPU ID/feature
-registers to userspace. The availability of this ABI is advertised
-via the HWCAP_CPUID in HWCAPs.
-
-1. Motivation
----------------
-
-The ARM architecture defines a set of feature registers, which describe
-the capabilities of the CPU/system. Access to these system registers is
-restricted from EL0 and there is no reliable way for an application to
-extract this information to make better decisions at runtime. There is
-limited information available to the application via HWCAPs, however
-there are some issues with their usage.
-
- a) Any change to the HWCAPs requires an update to userspace (e.g libc)
-    to detect the new changes, which can take a long time to appear in
-    distributions. Exposing the registers allows applications to get the
-    information without requiring updates to the toolchains.
-
- b) Access to HWCAPs is sometimes limited (e.g prior to libc, or
-    when ld is initialised at startup time).
-
- c) HWCAPs cannot represent non-boolean information effectively. The
-    architecture defines a canonical format for representing features
-    in the ID registers; this is well defined and is capable of
-    representing all valid architecture variations.
-
-
-2. Requirements
------------------
-
- a) Safety :
-    Applications should be able to use the information provided by the
-    infrastructure to run safely across the system. This has greater
-    implications on a system with heterogeneous CPUs.
-    The infrastructure exports a value that is safe across all the
-    available CPU on the system.
-
-    e.g, If at least one CPU doesn't implement CRC32 instructions, while
-    others do, we should report that the CRC32 is not implemented.
-    Otherwise an application could crash when scheduled on the CPU
-    which doesn't support CRC32.
-
- b) Security :
-    Applications should only be able to receive information that is
-    relevant to the normal operation in userspace. Hence, some of the
-    fields are masked out(i.e, made invisible) and their values are set to
-    indicate the feature is 'not supported'. See Section 4 for the list
-    of visible features. Also, the kernel may manipulate the fields
-    based on what it supports. e.g, If FP is not supported by the
-    kernel, the values could indicate that the FP is not available
-    (even when the CPU provides it).
-
- c) Implementation Defined Features
-    The infrastructure doesn't expose any register which is
-    IMPLEMENTATION DEFINED as per ARMv8-A Architecture.
-
- d) CPU Identification :
-    MIDR_EL1 is exposed to help identify the processor. On a
-    heterogeneous system, this could be racy (just like getcpu()). The
-    process could be migrated to another CPU by the time it uses the
-    register value, unless the CPU affinity is set. Hence, there is no
-    guarantee that the value reflects the processor that it is
-    currently executing on. The REVIDR is not exposed due to this
-    constraint, as REVIDR makes sense only in conjunction with the
-    MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
-    at:
-
-	/sys/devices/system/cpu/cpu$ID/regs/identification/
-	                                              \- midr
-	                                              \- revidr
-
-3. Implementation
---------------------
-
-The infrastructure is built on the emulation of the 'MRS' instruction.
-Accessing a restricted system register from an application generates an
-exception and ends up in SIGILL being delivered to the process.
-The infrastructure hooks into the exception handler and emulates the
-operation if the source belongs to the supported system register space.
-
-The infrastructure emulates only the following system register space:
-	Op0=3, Op1=0, CRn=0, CRm=0,4,5,6,7
-
-(See Table C5-6 'System instruction encodings for non-Debug System
-register accesses' in ARMv8 ARM DDI 0487A.h, for the list of
-registers).
-
-The following rules are applied to the value returned by the
-infrastructure:
-
- a) The value of an 'IMPLEMENTATION DEFINED' field is set to 0.
- b) The value of a reserved field is populated with the reserved
-    value as defined by the architecture.
- c) The value of a 'visible' field holds the system wide safe value
-    for the particular feature (except for MIDR_EL1, see section 4).
- d) All other fields (i.e, invisible fields) are set to indicate
-    the feature is missing (as defined by the architecture).
-
-4. List of registers with visible features
--------------------------------------------
-
-  1) ID_AA64ISAR0_EL1 - Instruction Set Attribute Register 0
-     x--------------------------------------------------x
-     | Name                         |  bits   | visible |
-     |--------------------------------------------------|
-     | TS                           | [55-52] |    y    |
-     |--------------------------------------------------|
-     | FHM                          | [51-48] |    y    |
-     |--------------------------------------------------|
-     | DP                           | [47-44] |    y    |
-     |--------------------------------------------------|
-     | SM4                          | [43-40] |    y    |
-     |--------------------------------------------------|
-     | SM3                          | [39-36] |    y    |
-     |--------------------------------------------------|
-     | SHA3                         | [35-32] |    y    |
-     |--------------------------------------------------|
-     | RDM                          | [31-28] |    y    |
-     |--------------------------------------------------|
-     | ATOMICS                      | [23-20] |    y    |
-     |--------------------------------------------------|
-     | CRC32                        | [19-16] |    y    |
-     |--------------------------------------------------|
-     | SHA2                         | [15-12] |    y    |
-     |--------------------------------------------------|
-     | SHA1                         | [11-8]  |    y    |
-     |--------------------------------------------------|
-     | AES                          | [7-4]   |    y    |
-     x--------------------------------------------------x
-
-
-  2) ID_AA64PFR0_EL1 - Processor Feature Register 0
-     x--------------------------------------------------x
-     | Name                         |  bits   | visible |
-     |--------------------------------------------------|
-     | DIT                          | [51-48] |    y    |
-     |--------------------------------------------------|
-     | SVE                          | [35-32] |    y    |
-     |--------------------------------------------------|
-     | GIC                          | [27-24] |    n    |
-     |--------------------------------------------------|
-     | AdvSIMD                      | [23-20] |    y    |
-     |--------------------------------------------------|
-     | FP                           | [19-16] |    y    |
-     |--------------------------------------------------|
-     | EL3                          | [15-12] |    n    |
-     |--------------------------------------------------|
-     | EL2                          | [11-8]  |    n    |
-     |--------------------------------------------------|
-     | EL1                          | [7-4]   |    n    |
-     |--------------------------------------------------|
-     | EL0                          | [3-0]   |    n    |
-     x--------------------------------------------------x
-
-
-  3) MIDR_EL1 - Main ID Register
-     x--------------------------------------------------x
-     | Name                         |  bits   | visible |
-     |--------------------------------------------------|
-     | Implementer                  | [31-24] |    y    |
-     |--------------------------------------------------|
-     | Variant                      | [23-20] |    y    |
-     |--------------------------------------------------|
-     | Architecture                 | [19-16] |    y    |
-     |--------------------------------------------------|
-     | PartNum                      | [15-4]  |    y    |
-     |--------------------------------------------------|
-     | Revision                     | [3-0]   |    y    |
-     x--------------------------------------------------x
-
-   NOTE: The 'visible' fields of MIDR_EL1 will contain the value
-   as available on the CPU where it is fetched and is not a system
-   wide safe value.
-
-  4) ID_AA64ISAR1_EL1 - Instruction set attribute register 1
-
-     x--------------------------------------------------x
-     | Name                         |  bits   | visible |
-     |--------------------------------------------------|
-     | GPI                          | [31-28] |    y    |
-     |--------------------------------------------------|
-     | GPA                          | [27-24] |    y    |
-     |--------------------------------------------------|
-     | LRCPC                        | [23-20] |    y    |
-     |--------------------------------------------------|
-     | FCMA                         | [19-16] |    y    |
-     |--------------------------------------------------|
-     | JSCVT                        | [15-12] |    y    |
-     |--------------------------------------------------|
-     | API                          | [11-8]  |    y    |
-     |--------------------------------------------------|
-     | APA                          | [7-4]   |    y    |
-     |--------------------------------------------------|
-     | DPB                          | [3-0]   |    y    |
-     x--------------------------------------------------x
-
-  5) ID_AA64MMFR2_EL1 - Memory model feature register 2
-
-     x--------------------------------------------------x
-     | Name                         |  bits   | visible |
-     |--------------------------------------------------|
-     | AT                           | [35-32] |    y    |
-     x--------------------------------------------------x
-
-  6) ID_AA64ZFR0_EL1 - SVE feature ID register 0
-
-     x--------------------------------------------------x
-     | Name                         |  bits   | visible |
-     |--------------------------------------------------|
-     | SM4                          | [43-40] |    y    |
-     |--------------------------------------------------|
-     | SHA3                         | [35-32] |    y    |
-     |--------------------------------------------------|
-     | BitPerm                      | [19-16] |    y    |
-     |--------------------------------------------------|
-     | AES                          | [7-4]   |    y    |
-     |--------------------------------------------------|
-     | SVEVer                       | [3-0]   |    y    |
-     x--------------------------------------------------x
-
-Appendix I: Example
----------------------------
-
-/*
- * Sample program to demonstrate the MRS emulation ABI.
- *
- * Copyright (C) 2015-2016, ARM Ltd
- *
- * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- */
-
-#include <asm/hwcap.h>
-#include <stdio.h>
-#include <sys/auxv.h>
-
-#define get_cpu_ftr(id) ({					\
-		unsigned long __val;				\
-		asm("mrs %0, "#id : "=r" (__val));		\
-		printf("%-20s: 0x%016lx\n", #id, __val);	\
-	})
-
-int main(void)
-{
-
-	if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
-		fputs("CPUID registers unavailable\n", stderr);
-		return 1;
-	}
-
-	get_cpu_ftr(ID_AA64ISAR0_EL1);
-	get_cpu_ftr(ID_AA64ISAR1_EL1);
-	get_cpu_ftr(ID_AA64MMFR0_EL1);
-	get_cpu_ftr(ID_AA64MMFR1_EL1);
-	get_cpu_ftr(ID_AA64PFR0_EL1);
-	get_cpu_ftr(ID_AA64PFR1_EL1);
-	get_cpu_ftr(ID_AA64DFR0_EL1);
-	get_cpu_ftr(ID_AA64DFR1_EL1);
-
-	get_cpu_ftr(MIDR_EL1);
-	get_cpu_ftr(MPIDR_EL1);
-	get_cpu_ftr(REVIDR_EL1);
-
-#if 0
-	/* Unexposed register access causes SIGILL */
-	get_cpu_ftr(ID_MMFR0_EL1);
-#endif
-
-	return 0;
-}
-
-
-
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
new file mode 100644
index 000000000000..c7cbf4b571c0
--- /dev/null
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -0,0 +1,201 @@
+================
+ARM64 ELF hwcaps
+================
+
+This document describes the usage and semantics of the arm64 ELF hwcaps.
+
+
+1. Introduction
+---------------
+
+Some hardware or software features are only available on some CPU
+implementations, and/or with certain kernel configurations, but have no
+architected discovery mechanism available to userspace code at EL0. The
+kernel exposes the presence of these features to userspace through a set
+of flags called hwcaps, exposed in the auxiliary vector.
+
+Userspace software can test for features by acquiring the AT_HWCAP or
+AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant
+flags are set, e.g.::
+
+	bool floating_point_is_present(void)
+	{
+		unsigned long hwcaps = getauxval(AT_HWCAP);
+		if (hwcaps & HWCAP_FP)
+			return true;
+
+		return false;
+	}
+
+Where software relies on a feature described by a hwcap, it should check
+the relevant hwcap flag to verify that the feature is present before
+attempting to make use of the feature.
+
+Features cannot be probed reliably through other means. When a feature
+is not available, attempting to use it may result in unpredictable
+behaviour, and is not guaranteed to result in any reliable indication
+that the feature is unavailable, such as a SIGILL.
+
+
+2. Interpretation of hwcaps
+---------------------------
+
+The majority of hwcaps are intended to indicate the presence of features
+which are described by architected ID registers inaccessible to
+userspace code at EL0. These hwcaps are defined in terms of ID register
+fields, and should be interpreted with reference to the definition of
+these fields in the ARM Architecture Reference Manual (ARM ARM).
+
+Such hwcaps are described below in the form::
+
+    Functionality implied by idreg.field == val.
+
+Such hwcaps indicate the availability of functionality that the ARM ARM
+defines as being present when idreg.field has value val, but do not
+indicate that idreg.field is precisely equal to val, nor do they
+indicate the absence of functionality implied by other values of
+idreg.field.
+
+Other hwcaps may indicate the presence of features which cannot be
+described by ID registers alone. These may be described without
+reference to ID registers, and may refer to other documentation.
+
+
+3. The hwcaps exposed in AT_HWCAP
+---------------------------------
+
+HWCAP_FP
+    Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000.
+
+HWCAP_ASIMD
+    Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000.
+
+HWCAP_EVTSTRM
+    The generic timer is configured to generate events at a frequency of
+    approximately 100 kHz.
+
+HWCAP_AES
+    Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001.
+
+HWCAP_PMULL
+    Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010.
+
+HWCAP_SHA1
+    Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001.
+
+HWCAP_SHA2
+    Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001.
+
+HWCAP_CRC32
+    Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001.
+
+HWCAP_ATOMICS
+    Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010.
+
+HWCAP_FPHP
+    Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001.
+
+HWCAP_ASIMDHP
+    Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001.
+
+HWCAP_CPUID
+    EL0 access to certain ID registers is available, to the extent
+    described by Documentation/arm64/cpu-feature-registers.rst.
+
+    These ID registers may imply the availability of features.
+
+HWCAP_ASIMDRDM
+    Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001.
+
+HWCAP_JSCVT
+    Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001.
+
+HWCAP_FCMA
+    Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001.
+
+HWCAP_LRCPC
+    Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001.
+
+HWCAP_DCPOP
+    Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001.
+
+HWCAP2_DCPODP
+
+    Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010.
+
+HWCAP_SHA3
+    Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001.
+
+HWCAP_SM3
+    Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001.
+
+HWCAP_SM4
+    Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001.
+
+HWCAP_ASIMDDP
+    Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001.
+
+HWCAP_SHA512
+    Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010.
+
+HWCAP_SVE
+    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001.
+
+HWCAP2_SVE2
+
+    Functionality implied by ID_AA64ZFR0_EL1.SVEVer == 0b0001.
+
+HWCAP2_SVEAES
+
+    Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0001.
+
+HWCAP2_SVEPMULL
+
+    Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0010.
+
+HWCAP2_SVEBITPERM
+
+    Functionality implied by ID_AA64ZFR0_EL1.BitPerm == 0b0001.
+
+HWCAP2_SVESHA3
+
+    Functionality implied by ID_AA64ZFR0_EL1.SHA3 == 0b0001.
+
+HWCAP2_SVESM4
+
+    Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001.
+
+HWCAP_ASIMDFHM
+    Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0001.
+
+HWCAP_DIT
+    Functionality implied by ID_AA64PFR0_EL1.DIT == 0b0001.
+
+HWCAP_USCAT
+    Functionality implied by ID_AA64MMFR2_EL1.AT == 0b0001.
+
+HWCAP_ILRCPC
+    Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010.
+
+HWCAP_FLAGM
+    Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001.
+
+HWCAP_SSBS
+    Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010.
+
+HWCAP_PACA
+    Functionality implied by ID_AA64ISAR1_EL1.APA == 0b0001 or
+    ID_AA64ISAR1_EL1.API == 0b0001, as described by
+    Documentation/arm64/pointer-authentication.rst.
+
+HWCAP_PACG
+    Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or
+    ID_AA64ISAR1_EL1.GPI == 0b0001, as described by
+    Documentation/arm64/pointer-authentication.rst.
+
+
+4. Unused AT_HWCAP bits
+-----------------------
+
+For interoperation with userspace, the kernel guarantees that bits 62
+and 63 of AT_HWCAP will always be returned as 0.
diff --git a/Documentation/arm64/elf_hwcaps.txt b/Documentation/arm64/elf_hwcaps.txt
deleted file mode 100644
index b73a2519ecf2..000000000000
--- a/Documentation/arm64/elf_hwcaps.txt
+++ /dev/null
@@ -1,231 +0,0 @@
-ARM64 ELF hwcaps
-================
-
-This document describes the usage and semantics of the arm64 ELF hwcaps.
-
-
-1. Introduction
----------------
-
-Some hardware or software features are only available on some CPU
-implementations, and/or with certain kernel configurations, but have no
-architected discovery mechanism available to userspace code at EL0. The
-kernel exposes the presence of these features to userspace through a set
-of flags called hwcaps, exposed in the auxilliary vector.
-
-Userspace software can test for features by acquiring the AT_HWCAP or
-AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant
-flags are set, e.g.
-
-bool floating_point_is_present(void)
-{
-	unsigned long hwcaps = getauxval(AT_HWCAP);
-	if (hwcaps & HWCAP_FP)
-		return true;
-
-	return false;
-}
-
-Where software relies on a feature described by a hwcap, it should check
-the relevant hwcap flag to verify that the feature is present before
-attempting to make use of the feature.
-
-Features cannot be probed reliably through other means. When a feature
-is not available, attempting to use it may result in unpredictable
-behaviour, and is not guaranteed to result in any reliable indication
-that the feature is unavailable, such as a SIGILL.
-
-
-2. Interpretation of hwcaps
----------------------------
-
-The majority of hwcaps are intended to indicate the presence of features
-which are described by architected ID registers inaccessible to
-userspace code at EL0. These hwcaps are defined in terms of ID register
-fields, and should be interpreted with reference to the definition of
-these fields in the ARM Architecture Reference Manual (ARM ARM).
-
-Such hwcaps are described below in the form:
-
-    Functionality implied by idreg.field == val.
-
-Such hwcaps indicate the availability of functionality that the ARM ARM
-defines as being present when idreg.field has value val, but do not
-indicate that idreg.field is precisely equal to val, nor do they
-indicate the absence of functionality implied by other values of
-idreg.field.
-
-Other hwcaps may indicate the presence of features which cannot be
-described by ID registers alone. These may be described without
-reference to ID registers, and may refer to other documentation.
-
-
-3. The hwcaps exposed in AT_HWCAP
----------------------------------
-
-HWCAP_FP
-
-    Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000.
-
-HWCAP_ASIMD
-
-    Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000.
-
-HWCAP_EVTSTRM
-
-    The generic timer is configured to generate events at a frequency of
-    approximately 100KHz.
-
-HWCAP_AES
-
-    Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001.
-
-HWCAP_PMULL
-
-    Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010.
-
-HWCAP_SHA1
-
-    Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001.
-
-HWCAP_SHA2
-
-    Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001.
-
-HWCAP_CRC32
-
-    Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001.
-
-HWCAP_ATOMICS
-
-    Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010.
-
-HWCAP_FPHP
-
-    Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001.
-
-HWCAP_ASIMDHP
-
-    Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001.
-
-HWCAP_CPUID
-
-    EL0 access to certain ID registers is available, to the extent
-    described by Documentation/arm64/cpu-feature-registers.txt.
-
-    These ID registers may imply the availability of features.
-
-HWCAP_ASIMDRDM
-
-    Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001.
-
-HWCAP_JSCVT
-
-    Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001.
-
-HWCAP_FCMA
-
-    Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001.
-
-HWCAP_LRCPC
-
-    Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001.
-
-HWCAP_DCPOP
-
-    Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001.
-
-HWCAP2_DCPODP
-
-    Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010.
-
-HWCAP_SHA3
-
-    Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001.
-
-HWCAP_SM3
-
-    Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001.
-
-HWCAP_SM4
-
-    Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001.
-
-HWCAP_ASIMDDP
-
-    Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001.
-
-HWCAP_SHA512
-
-    Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010.
-
-HWCAP_SVE
-
-    Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001.
-
-HWCAP2_SVE2
-
-    Functionality implied by ID_AA64ZFR0_EL1.SVEVer == 0b0001.
-
-HWCAP2_SVEAES
-
-    Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0001.
-
-HWCAP2_SVEPMULL
-
-    Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0010.
-
-HWCAP2_SVEBITPERM
-
-    Functionality implied by ID_AA64ZFR0_EL1.BitPerm == 0b0001.
-
-HWCAP2_SVESHA3
-
-    Functionality implied by ID_AA64ZFR0_EL1.SHA3 == 0b0001.
-
-HWCAP2_SVESM4
-
-    Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001.
-
-HWCAP_ASIMDFHM
-
-   Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0001.
-
-HWCAP_DIT
-
-    Functionality implied by ID_AA64PFR0_EL1.DIT == 0b0001.
-
-HWCAP_USCAT
-
-    Functionality implied by ID_AA64MMFR2_EL1.AT == 0b0001.
-
-HWCAP_ILRCPC
-
-    Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010.
-
-HWCAP_FLAGM
-
-    Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001.
-
-HWCAP_SSBS
-
-    Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010.
-
-HWCAP_PACA
-
-    Functionality implied by ID_AA64ISAR1_EL1.APA == 0b0001 or
-    ID_AA64ISAR1_EL1.API == 0b0001, as described by
-    Documentation/arm64/pointer-authentication.txt.
-
-HWCAP_PACG
-
-    Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or
-    ID_AA64ISAR1_EL1.GPI == 0b0001, as described by
-    Documentation/arm64/pointer-authentication.txt.
-
-
-4. Unused AT_HWCAP bits
------------------------
-
-For interoperation with userspace, the kernel guarantees that bits 62
-and 63 of AT_HWCAP will always be returned as 0.
diff --git a/Documentation/arm64/hugetlbpage.rst b/Documentation/arm64/hugetlbpage.rst
new file mode 100644
index 000000000000..b44f939e5210
--- /dev/null
+++ b/Documentation/arm64/hugetlbpage.rst
@@ -0,0 +1,41 @@
+====================
+HugeTLBpage on ARM64
+====================
+
+Hugepages rely on making efficient use of TLBs to improve the performance of
+address translations. The benefit depends on both:
+
+  - the size of hugepages
+  - the size of entries supported by the TLBs
+
+The ARM64 port supports two flavours of hugepages.
+
+1) Block mappings at the pud/pmd level
+--------------------------------------
+
+These are regular hugepages where a pmd or a pud page table entry points to a
+block of memory. Regardless of the supported size of entries in TLB, block
+mappings reduce the depth of page table walk needed to translate hugepage
+addresses.
+
+2) Using the Contiguous bit
+---------------------------
+
+The architecture provides a contiguous bit in the translation table entries
+(D4.5.3, ARM DDI 0487C.a) that hints to the MMU that the entry is one of a
+contiguous set of entries that can be cached in a single TLB entry.
+
+The contiguous bit is used in Linux to increase the mapping size at the pmd and
+pte (last) level. The number of supported contiguous entries varies by page size
+and level of the page table.
+
+
+The following hugepage sizes are supported:
+
+  ====== ========   ====    ========    ===
+  -      CONT PTE    PMD    CONT PMD    PUD
+  ====== ========   ====    ========    ===
+  4K:         64K     2M         32M     1G
+  16K:         2M    32M          1G
+  64K:         2M   512M         16G
+  ====== ========   ====    ========    ===
diff --git a/Documentation/arm64/hugetlbpage.txt b/Documentation/arm64/hugetlbpage.txt
deleted file mode 100644
index cfae87dc653b..000000000000
--- a/Documentation/arm64/hugetlbpage.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-HugeTLBpage on ARM64
-====================
-
-Hugepage relies on making efficient use of TLBs to improve performance of
-address translations. The benefit depends on both -
-
-  - the size of hugepages
-  - size of entries supported by the TLBs
-
-The ARM64 port supports two flavours of hugepages.
-
-1) Block mappings at the pud/pmd level
---------------------------------------
-
-These are regular hugepages where a pmd or a pud page table entry points to a
-block of memory. Regardless of the supported size of entries in TLB, block
-mappings reduce the depth of page table walk needed to translate hugepage
-addresses.
-
-2) Using the Contiguous bit
----------------------------
-
-The architecture provides a contiguous bit in the translation table entries
-(D4.5.3, ARM DDI 0487C.a) that hints to the MMU to indicate that it is one of a
-contiguous set of entries that can be cached in a single TLB entry.
-
-The contiguous bit is used in Linux to increase the mapping size at the pmd and
-pte (last) level. The number of supported contiguous entries varies by page size
-and level of the page table.
-
-
-The following hugepage sizes are supported -
-
-         CONT PTE    PMD    CONT PMD    PUD
-         --------    ---    --------    ---
-  4K:         64K     2M         32M     1G
-  16K:         2M    32M          1G
-  64K:         2M   512M         16G
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
new file mode 100644
index 000000000000..018b7836ecb7
--- /dev/null
+++ b/Documentation/arm64/index.rst
@@ -0,0 +1,28 @@
+:orphan:
+
+==================
+ARM64 Architecture
+==================
+
+.. toctree::
+    :maxdepth: 1
+
+    acpi_object_usage
+    arm-acpi
+    booting
+    cpu-feature-registers
+    elf_hwcaps
+    hugetlbpage
+    legacy_instructions
+    memory
+    pointer-authentication
+    silicon-errata
+    sve
+    tagged-pointers
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/arm64/legacy_instructions.rst b/Documentation/arm64/legacy_instructions.rst
new file mode 100644
index 000000000000..54401b22cb8f
--- /dev/null
+++ b/Documentation/arm64/legacy_instructions.rst
@@ -0,0 +1,68 @@
+===================
+Legacy instructions
+===================
+
+The arm64 port of the Linux kernel provides infrastructure to support
+emulation of instructions which have been deprecated, or obsoleted in
+the architecture. The infrastructure code uses undefined instruction
+hooks to support emulation. Where available it also allows turning on
+the instruction execution in hardware.
+
+The emulation mode can be controlled by writing to sysctl nodes
+(/proc/sys/abi). The following explains the different execution
+behaviours and the corresponding values of the sysctl nodes:
+
+* Undef
+    Value: 0
+
+  Generates an undefined instruction abort. Default for instructions that
+  have been obsoleted in the architecture, e.g., SWP.
+
+* Emulate
+    Value: 1
+
+  Uses software emulation. To aid migration of software, in this mode
+  usage of emulated instructions is traced and rate-limited warnings
+  are issued. This is the default for deprecated instructions, e.g.,
+  CP15 barriers.
+
+* Hardware Execution
+    Value: 2
+
+  Although marked as deprecated, some implementations may support the
+  enabling/disabling of hardware support for the execution of these
+  instructions. Using hardware execution generally provides better
+  performance, but at the cost of the ability to gather runtime statistics
+  about the use of the deprecated instructions.
+
+The default mode depends on the status of the instruction in the
+architecture. Deprecated instructions should default to emulation
+while obsolete instructions must be undefined by default.
+
+Note: Instruction emulation may not be possible in all cases. See
+individual instruction notes for further information.
+
+Supported legacy instructions
+-----------------------------
+* SWP{B}
+
+:Node: /proc/sys/abi/swp
+:Status: Obsolete
+:Default: Undef (0)
+
+* CP15 Barriers
+
+:Node: /proc/sys/abi/cp15_barrier
+:Status: Deprecated
+:Default: Emulate (1)
+
+* SETEND
+
+:Node: /proc/sys/abi/setend
+:Status: Deprecated
+:Default: Emulate (1)*
+
+  Note: All the CPUs on the system must have mixed endian support at EL0
+  for this feature to be enabled. If a new CPU - which doesn't support mixed
+  endian - is hotplugged in after this feature has been enabled, there could
+  be unexpected results in the application.
diff --git a/Documentation/arm64/legacy_instructions.txt b/Documentation/arm64/legacy_instructions.txt
deleted file mode 100644
index 01bf3d9fac85..000000000000
--- a/Documentation/arm64/legacy_instructions.txt
+++ /dev/null
@@ -1,57 +0,0 @@
-The arm64 port of the Linux kernel provides infrastructure to support
-emulation of instructions which have been deprecated, or obsoleted in
-the architecture. The infrastructure code uses undefined instruction
-hooks to support emulation. Where available it also allows turning on
-the instruction execution in hardware.
-
-The emulation mode can be controlled by writing to sysctl nodes
-(/proc/sys/abi). The following explains the different execution
-behaviours and the corresponding values of the sysctl nodes -
-
-* Undef
-  Value: 0
-  Generates undefined instruction abort. Default for instructions that
-  have been obsoleted in the architecture, e.g., SWP
-
-* Emulate
-  Value: 1
-  Uses software emulation. To aid migration of software, in this mode
-  usage of emulated instruction is traced as well as rate limited
-  warnings are issued. This is the default for deprecated
-  instructions, .e.g., CP15 barriers
-
-* Hardware Execution
-  Value: 2
-  Although marked as deprecated, some implementations may support the
-  enabling/disabling of hardware support for the execution of these
-  instructions. Using hardware execution generally provides better
-  performance, but at the loss of ability to gather runtime statistics
-  about the use of the deprecated instructions.
-
-The default mode depends on the status of the instruction in the
-architecture. Deprecated instructions should default to emulation
-while obsolete instructions must be undefined by default.
-
-Note: Instruction emulation may not be possible in all cases. See
-individual instruction notes for further information.
-
-Supported legacy instructions
------------------------------
-* SWP{B}
-Node: /proc/sys/abi/swp
-Status: Obsolete
-Default: Undef (0)
-
-* CP15 Barriers
-Node: /proc/sys/abi/cp15_barrier
-Status: Deprecated
-Default: Emulate (1)
-
-* SETEND
-Node: /proc/sys/abi/setend
-Status: Deprecated
-Default: Emulate (1)*
-Note: All the cpus on the system must have mixed endian support at EL0
-for this feature to be enabled. If a new CPU - which doesn't support mixed
-endian - is hotplugged in after this feature has been enabled, there could
-be unexpected results in the application.
diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst
new file mode 100644
index 000000000000..464b880fc4b7
--- /dev/null
+++ b/Documentation/arm64/memory.rst
@@ -0,0 +1,98 @@
+==============================
+Memory Layout on AArch64 Linux
+==============================
+
+Author: Catalin Marinas <catalin.marinas@arm.com>
+
+This document describes the virtual memory layout used by the AArch64
+Linux kernel. The architecture allows up to 4 levels of translation
+tables with a 4KB page size and up to 3 levels with a 64KB page size.
+
+AArch64 Linux uses either 3 levels or 4 levels of translation tables
+with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
+(256TB) virtual addresses, respectively, for both user and kernel. With
+64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
+virtual addresses, are used but the memory layout is the same.
+
+User addresses have bits 63:48 set to 0 while the kernel addresses have
+the same bits set to 1. TTBRx selection is given by bit 63 of the
+virtual address. The swapper_pg_dir contains only kernel (global)
+mappings while the user pgd contains only user (non-global) mappings.
+The swapper_pg_dir address is written to TTBR1 and never written to
+TTBR0.
+
+
+AArch64 Linux memory layout with 4KB pages + 3 levels::
+
+  Start			End			Size		Use
+  -----------------------------------------------------------------------
+  0000000000000000	0000007fffffffff	 512GB		user
+  ffffff8000000000	ffffffffffffffff	 512GB		kernel
+
+
+AArch64 Linux memory layout with 4KB pages + 4 levels::
+
+  Start			End			Size		Use
+  -----------------------------------------------------------------------
+  0000000000000000	0000ffffffffffff	 256TB		user
+  ffff000000000000	ffffffffffffffff	 256TB		kernel
+
+
+AArch64 Linux memory layout with 64KB pages + 2 levels::
+
+  Start			End			Size		Use
+  -----------------------------------------------------------------------
+  0000000000000000	000003ffffffffff	   4TB		user
+  fffffc0000000000	ffffffffffffffff	   4TB		kernel
+
+
+AArch64 Linux memory layout with 64KB pages + 3 levels::
+
+  Start			End			Size		Use
+  -----------------------------------------------------------------------
+  0000000000000000	0000ffffffffffff	 256TB		user
+  ffff000000000000	ffffffffffffffff	 256TB		kernel
+
+
+For details of the virtual kernel memory layout please see the kernel
+booting log.
+
+
+Translation table lookup with 4KB pages::
+
+  +--------+--------+--------+--------+--------+--------+--------+--------+
+  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
+  +--------+--------+--------+--------+--------+--------+--------+--------+
+   |                 |         |         |         |         |
+   |                 |         |         |         |         v
+   |                 |         |         |         |   [11:0]  in-page offset
+   |                 |         |         |         +-> [20:12] L3 index
+   |                 |         |         +-----------> [29:21] L2 index
+   |                 |         +---------------------> [38:30] L1 index
+   |                 +-------------------------------> [47:39] L0 index
+   +-------------------------------------------------> [63] TTBR0/1
+
+
+Translation table lookup with 64KB pages::
+
+  +--------+--------+--------+--------+--------+--------+--------+--------+
+  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
+  +--------+--------+--------+--------+--------+--------+--------+--------+
+   |                 |    |               |              |
+   |                 |    |               |              v
+   |                 |    |               |            [15:0]  in-page offset
+   |                 |    |               +----------> [28:16] L3 index
+   |                 |    +--------------------------> [41:29] L2 index
+   |                 +-------------------------------> [47:42] L1 index
+   +-------------------------------------------------> [63] TTBR0/1
+
+
+When using KVM without the Virtualization Host Extensions, the
+hypervisor maps kernel pages in EL2 at a fixed (and potentially
+random) offset from the linear mapping. See the kern_hyp_va macro and
+kvm_update_va_mask function for more details. MMIO devices such as
+GICv2 get mapped next to the HYP idmap page, as do vectors when
+ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
+
+When using KVM with the Virtualization Host Extensions, no additional
+mappings are created, since the host kernel runs directly in EL2.
diff --git a/Documentation/arm64/memory.txt b/Documentation/arm64/memory.txt
deleted file mode 100644
index c5dab30d3389..000000000000
--- a/Documentation/arm64/memory.txt
+++ /dev/null
@@ -1,97 +0,0 @@
-		     Memory Layout on AArch64 Linux
-		     ==============================
-
-Author: Catalin Marinas <catalin.marinas@arm.com>
-
-This document describes the virtual memory layout used by the AArch64
-Linux kernel. The architecture allows up to 4 levels of translation
-tables with a 4KB page size and up to 3 levels with a 64KB page size.
-
-AArch64 Linux uses either 3 levels or 4 levels of translation tables
-with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
-(256TB) virtual addresses, respectively, for both user and kernel. With
-64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
-virtual address, are used but the memory layout is the same.
-
-User addresses have bits 63:48 set to 0 while the kernel addresses have
-the same bits set to 1. TTBRx selection is given by bit 63 of the
-virtual address. The swapper_pg_dir contains only kernel (global)
-mappings while the user pgd contains only user (non-global) mappings.
-The swapper_pg_dir address is written to TTBR1 and never written to
-TTBR0.
-
-
-AArch64 Linux memory layout with 4KB pages + 3 levels:
-
-Start			End			Size		Use
------------------------------------------------------------------------
-0000000000000000	0000007fffffffff	 512GB		user
-ffffff8000000000	ffffffffffffffff	 512GB		kernel
-
-
-AArch64 Linux memory layout with 4KB pages + 4 levels:
-
-Start			End			Size		Use
------------------------------------------------------------------------
-0000000000000000	0000ffffffffffff	 256TB		user
-ffff000000000000	ffffffffffffffff	 256TB		kernel
-
-
-AArch64 Linux memory layout with 64KB pages + 2 levels:
-
-Start			End			Size		Use
------------------------------------------------------------------------
-0000000000000000	000003ffffffffff	   4TB		user
-fffffc0000000000	ffffffffffffffff	   4TB		kernel
-
-
-AArch64 Linux memory layout with 64KB pages + 3 levels:
-
-Start			End			Size		Use
------------------------------------------------------------------------
-0000000000000000	0000ffffffffffff	 256TB		user
-ffff000000000000	ffffffffffffffff	 256TB		kernel
-
-
-For details of the virtual kernel memory layout please see the kernel
-booting log.
-
-
-Translation table lookup with 4KB pages:
-
-+--------+--------+--------+--------+--------+--------+--------+--------+
-|63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
-+--------+--------+--------+--------+--------+--------+--------+--------+
- |                 |         |         |         |         |
- |                 |         |         |         |         v
- |                 |         |         |         |   [11:0]  in-page offset
- |                 |         |         |         +-> [20:12] L3 index
- |                 |         |         +-----------> [29:21] L2 index
- |                 |         +---------------------> [38:30] L1 index
- |                 +-------------------------------> [47:39] L0 index
- +-------------------------------------------------> [63] TTBR0/1
-
-
-Translation table lookup with 64KB pages:
-
-+--------+--------+--------+--------+--------+--------+--------+--------+
-|63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
-+--------+--------+--------+--------+--------+--------+--------+--------+
- |                 |    |               |              |
- |                 |    |               |              v
- |                 |    |               |            [15:0]  in-page offset
- |                 |    |               +----------> [28:16] L3 index
- |                 |    +--------------------------> [41:29] L2 index
- |                 +-------------------------------> [47:42] L1 index
- +-------------------------------------------------> [63] TTBR0/1
-
-
-When using KVM without the Virtualization Host Extensions, the
-hypervisor maps kernel pages in EL2 at a fixed (and potentially
-random) offset from the linear mapping. See the kern_hyp_va macro and
-kvm_update_va_mask function for more details. MMIO devices such as
-GICv2 gets mapped next to the HYP idmap page, as do vectors when
-ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
-
-When using KVM with the Virtualization Host Extensions, no additional
-mappings are created, since the host kernel runs directly in EL2.
diff --git a/Documentation/arm64/pointer-authentication.rst b/Documentation/arm64/pointer-authentication.rst
new file mode 100644
index 000000000000..30b2ab06526b
--- /dev/null
+++ b/Documentation/arm64/pointer-authentication.rst
@@ -0,0 +1,109 @@
+=======================================
+Pointer authentication in AArch64 Linux
+=======================================
+
+Author: Mark Rutland <mark.rutland@arm.com>
+
+Date: 2017-07-19
+
+This document briefly describes the provision of pointer authentication
+functionality in AArch64 Linux.
+
+
+Architecture overview
+---------------------
+
+The ARMv8.3 Pointer Authentication extension adds primitives that can be
+used to mitigate certain classes of attack where an attacker can corrupt
+the contents of some memory (e.g. the stack).
+
+The extension uses a Pointer Authentication Code (PAC) to determine
+whether pointers have been modified unexpectedly. A PAC is derived from
+a pointer, another value (such as the stack pointer), and a secret key
+held in system registers.
+
+The extension adds instructions to insert a valid PAC into a pointer,
+and to verify/remove the PAC from a pointer. The PAC occupies a number
+of high-order bits of the pointer, which varies depending on the
+configured virtual address size and whether pointer tagging is in use.
+
+A subset of these instructions have been allocated from the HINT
+encoding space. In the absence of the extension (or when disabled),
+these instructions behave as NOPs. Applications and libraries using
+these instructions operate correctly regardless of the presence of the
+extension.
+
+The extension provides five separate keys to generate PACs - two for
+instruction addresses (APIAKey, APIBKey), two for data addresses
+(APDAKey, APDBKey), and one for generic authentication (APGAKey).
+
+
+Basic support
+-------------
+
+When CONFIG_ARM64_PTR_AUTH is selected, and relevant HW support is
+present, the kernel will assign random key values to each process at
+exec*() time. The keys are shared by all threads within the process, and
+are preserved across fork().
+
+Presence of address authentication functionality is advertised via
+HWCAP_PACA, and generic authentication functionality via HWCAP_PACG.
+
+The number of bits that the PAC occupies in a pointer is 55 minus the
+virtual address size configured by the kernel. For example, with a
+virtual address size of 48, the PAC is 7 bits wide.
+
+Recent versions of GCC can compile code with APIAKey-based return
+address protection when passed the -msign-return-address option. This
+uses instructions in the HINT space (unless -march=armv8.3-a or higher
+is also passed), and such code can run on systems without the pointer
+authentication extension.
+
+In addition to exec(), keys can also be reinitialized to random values
+using the PR_PAC_RESET_KEYS prctl. A bitmask of PR_PAC_APIAKEY,
+PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY and PR_PAC_APGAKEY
+specifies which keys are to be reinitialized; specifying 0 means "all
+keys".
+
+
+Debugging
+---------
+
+When CONFIG_ARM64_PTR_AUTH is selected, and HW support for address
+authentication is present, the kernel will expose the position of TTBR0
+PAC bits in the NT_ARM_PAC_MASK regset (struct user_pac_mask), which
+userspace can acquire via PTRACE_GETREGSET.
+
+The regset is exposed only when HWCAP_PACA is set. Separate masks are
+exposed for data pointers and instruction pointers, as the set of PAC
+bits can vary between the two. Note that the masks apply to TTBR0
+addresses, and are not valid to apply to TTBR1 addresses (e.g. kernel
+pointers).
+
+Additionally, when CONFIG_CHECKPOINT_RESTORE is also set, the kernel
+will expose the NT_ARM_PACA_KEYS and NT_ARM_PACG_KEYS regsets (struct
+user_pac_address_keys and struct user_pac_generic_keys). These can be
+used to get and set the keys for a thread.
+
+
+Virtualization
+--------------
+
+Pointer authentication is enabled in a KVM guest when each virtual CPU is
+initialised by passing the flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] and
+requesting that these two separate CPU features be enabled. The current KVM
+guest implementation works by enabling both features together, so both
+userspace flags are checked before enabling pointer authentication.
+Keeping the flags separate means that no userspace ABI changes are needed
+if support is added in the future to allow these two features to be
+enabled independently of one another.
+
+The Arm architecture specifies that the Pointer Authentication feature is
+implemented along with the VHE feature, so the KVM arm64 ptrauth code
+relies on VHE mode being present.
+
+Additionally, when these vcpu feature flags are not set, KVM will
+filter out the Pointer Authentication system key registers from the
+KVM_GET/SET_REG_* ioctls and mask those features from the cpufeature ID
+register. Any attempt to use the Pointer Authentication instructions will
+result in an UNDEFINED exception being injected into the guest.
diff --git a/Documentation/arm64/pointer-authentication.txt b/Documentation/arm64/pointer-authentication.txt
deleted file mode 100644
index fc71b33de87e..000000000000
--- a/Documentation/arm64/pointer-authentication.txt
+++ /dev/null
@@ -1,107 +0,0 @@
-Pointer authentication in AArch64 Linux
-=======================================
-
-Author: Mark Rutland <mark.rutland@arm.com>
-Date: 2017-07-19
-
-This document briefly describes the provision of pointer authentication
-functionality in AArch64 Linux.
-
-
-Architecture overview
----------------------
-
-The ARMv8.3 Pointer Authentication extension adds primitives that can be
-used to mitigate certain classes of attack where an attacker can corrupt
-the contents of some memory (e.g. the stack).
-
-The extension uses a Pointer Authentication Code (PAC) to determine
-whether pointers have been modified unexpectedly. A PAC is derived from
-a pointer, another value (such as the stack pointer), and a secret key
-held in system registers.
-
-The extension adds instructions to insert a valid PAC into a pointer,
-and to verify/remove the PAC from a pointer. The PAC occupies a number
-of high-order bits of the pointer, which varies dependent on the
-configured virtual address size and whether pointer tagging is in use.
-
-A subset of these instructions have been allocated from the HINT
-encoding space. In the absence of the extension (or when disabled),
-these instructions behave as NOPs. Applications and libraries using
-these instructions operate correctly regardless of the presence of the
-extension.
-
-The extension provides five separate keys to generate PACs - two for
-instruction addresses (APIAKey, APIBKey), two for data addresses
-(APDAKey, APDBKey), and one for generic authentication (APGAKey).
-
-
-Basic support
--------------
-
-When CONFIG_ARM64_PTR_AUTH is selected, and relevant HW support is
-present, the kernel will assign random key values to each process at
-exec*() time. The keys are shared by all threads within the process, and
-are preserved across fork().
-
-Presence of address authentication functionality is advertised via
-HWCAP_PACA, and generic authentication functionality via HWCAP_PACG.
-
-The number of bits that the PAC occupies in a pointer is 55 minus the
-virtual address size configured by the kernel. For example, with a
-virtual address size of 48, the PAC is 7 bits wide.
-
-Recent versions of GCC can compile code with APIAKey-based return
-address protection when passed the -msign-return-address option. This
-uses instructions in the HINT space (unless -march=armv8.3-a or higher
-is also passed), and such code can run on systems without the pointer
-authentication extension.
-
-In addition to exec(), keys can also be reinitialized to random values
-using the PR_PAC_RESET_KEYS prctl. A bitmask of PR_PAC_APIAKEY,
-PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY and PR_PAC_APGAKEY
-specifies which keys are to be reinitialized; specifying 0 means "all
-keys".
-
-
-Debugging
----------
-
-When CONFIG_ARM64_PTR_AUTH is selected, and HW support for address
-authentication is present, the kernel will expose the position of TTBR0
-PAC bits in the NT_ARM_PAC_MASK regset (struct user_pac_mask), which
-userspace can acquire via PTRACE_GETREGSET.
-
-The regset is exposed only when HWCAP_PACA is set. Separate masks are
-exposed for data pointers and instruction pointers, as the set of PAC
-bits can vary between the two. Note that the masks apply to TTBR0
-addresses, and are not valid to apply to TTBR1 addresses (e.g. kernel
-pointers).
-
-Additionally, when CONFIG_CHECKPOINT_RESTORE is also set, the kernel
-will expose the NT_ARM_PACA_KEYS and NT_ARM_PACG_KEYS regsets (struct
-user_pac_address_keys and struct user_pac_generic_keys). These can be
-used to get and set the keys for a thread.
-
-
-Virtualization
---------------
-
-Pointer authentication is enabled in KVM guest when each virtual cpu is
-initialised by passing flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] and
-requesting these two separate cpu features to be enabled. The current KVM
-guest implementation works by enabling both features together, so both
-these userspace flags are checked before enabling pointer authentication.
-The separate userspace flag will allow to have no userspace ABI changes
-if support is added in the future to allow these two features to be
-enabled independently of one another.
-
-As Arm Architecture specifies that Pointer Authentication feature is
-implemented along with the VHE feature so KVM arm64 ptrauth code relies
-on VHE mode to be present.
-
-Additionally, when these vcpu feature flags are not set then KVM will
-filter out the Pointer Authentication system key registers from
-KVM_GET/SET_REG_* ioctls and mask those features from cpufeature ID
-register. Any attempt to use the Pointer Authentication instructions will
-result in an UNDEFINED exception being injected into the guest.
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
new file mode 100644
index 000000000000..c792774be59e
--- /dev/null
+++ b/Documentation/arm64/silicon-errata.rst
@@ -0,0 +1,131 @@
+=======================================
+Silicon Errata and Software Workarounds
+=======================================
+
+Author: Will Deacon <will.deacon@arm.com>
+
+Date  : 27 November 2015
+
+It is an unfortunate fact of life that hardware is often produced with
+so-called "errata", which can cause it to deviate from the architecture
+under specific circumstances.  For hardware produced by ARM, these
+errata are broadly classified into the following categories:
+
+  ==========  ========================================================
+  Category A  A critical error without a viable workaround.
+  Category B  A significant or critical error with an acceptable
+              workaround.
+  Category C  A minor error that is not expected to occur under normal
+              operation.
+  ==========  ========================================================
+
+For more information, consult one of the "Software Developers Errata
+Notice" documents available on infocenter.arm.com (registration
+required).
+
+As far as Linux is concerned, Category B errata may require some special
+treatment in the operating system. For example, avoiding a particular
+sequence of code, or configuring the processor in a particular way. A
+less common situation may require similar actions in order to declassify
+a Category A erratum into a Category C erratum. These are collectively
+known as "software workarounds" and are only required in the minority of
+cases (e.g. those cases that both require a non-secure workaround *and*
+can be triggered by Linux).
+
+For software workarounds that may adversely impact systems unaffected by
+the erratum in question, a Kconfig entry is added under "Kernel
+Features" -> "ARM errata workarounds via the alternatives framework".
+These are enabled by default and patched in at runtime when an affected
+CPU is detected. For less-intrusive workarounds, a Kconfig option is not
+available and the code is structured (preferably with a comment) in such
+a way that the erratum will not be hit.
+
+This approach can make it slightly onerous to determine exactly which
+errata are worked around in an arbitrary kernel source tree, so this
+file acts as a registry of software workarounds in the Linux Kernel and
+will be updated when new workarounds are committed and backported to
+stable kernels.
+
++----------------+-----------------+-----------------+-----------------------------+
+| Implementor    | Component       | Erratum ID      | Kconfig                     |
++================+=================+=================+=============================+
+| Allwinner      | A64/R18         | UNKNOWN1        | SUN50I_ERRATUM_UNKNOWN1     |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319        |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319        |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069        |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472        |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719        |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419        |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075        |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A57      | #852523         | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220        |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A72      | #853709         | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A73      | #858921         | ARM64_ERRATUM_858921        |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A55      | #1024718        | ARM64_ERRATUM_1024718       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A76      | #1188873,1418040| ARM64_ERRATUM_1418040       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A76      | #1165522        | ARM64_ERRATUM_1165522       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A76      | #1286807        | ARM64_ERRATUM_1286807       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A76      | #1463225        | ARM64_ERRATUM_1463225       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM            | MMU-500         | #841119,826419  | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
+| Cavium         | ThunderX ITS    | #22375,24313    | CAVIUM_ERRATUM_22375        |
++----------------+-----------------+-----------------+-----------------------------+
+| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144        |
++----------------+-----------------+-----------------+-----------------------------+
+| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154        |
++----------------+-----------------+-----------------+-----------------------------+
+| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
++----------------+-----------------+-----------------+-----------------------------+
+| Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
++----------------+-----------------+-----------------+-----------------------------+
+| Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
+| Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
+| Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
+| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
+| Hisilicon      | Hip0{5,6,7}     | #161010101      | HISILICON_ERRATUM_161010101 |
++----------------+-----------------+-----------------+-----------------------------+
+| Hisilicon      | Hip0{6,7}       | #161010701      | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
+| Hisilicon      | Hip07           | #161600802      | HISILICON_ERRATUM_161600802 |
++----------------+-----------------+-----------------+-----------------------------+
+| Hisilicon      | Hip08 SMMU PMCG | #162001800      | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
+| Qualcomm Tech. | Kryo/Falkor v1  | E1003           | QCOM_FALKOR_ERRATUM_1003    |
++----------------+-----------------+-----------------+-----------------------------+
+| Qualcomm Tech. | Falkor v1       | E1009           | QCOM_FALKOR_ERRATUM_1009    |
++----------------+-----------------+-----------------+-----------------------------+
+| Qualcomm Tech. | QDF2400 ITS     | E0065           | QCOM_QDF2400_ERRATUM_0065   |
++----------------+-----------------+-----------------+-----------------------------+
+| Qualcomm Tech. | Falkor v{1,2}   | E1041           | QCOM_FALKOR_ERRATUM_1041    |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
+| Fujitsu        | A64FX           | E#010001        | FUJITSU_ERRATUM_010001      |
++----------------+-----------------+-----------------+-----------------------------+
diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
deleted file mode 100644
index 2735462d5958..000000000000
--- a/Documentation/arm64/silicon-errata.txt
+++ /dev/null
@@ -1,88 +0,0 @@
-                Silicon Errata and Software Workarounds
-                =======================================
-
-Author: Will Deacon <will.deacon@arm.com>
-Date  : 27 November 2015
-
-It is an unfortunate fact of life that hardware is often produced with
-so-called "errata", which can cause it to deviate from the architecture
-under specific circumstances.  For hardware produced by ARM, these
-errata are broadly classified into the following categories:
-
-  Category A: A critical error without a viable workaround.
-  Category B: A significant or critical error with an acceptable
-              workaround.
-  Category C: A minor error that is not expected to occur under normal
-              operation.
-
-For more information, consult one of the "Software Developers Errata
-Notice" documents available on infocenter.arm.com (registration
-required).
-
-As far as Linux is concerned, Category B errata may require some special
-treatment in the operating system. For example, avoiding a particular
-sequence of code, or configuring the processor in a particular way. A
-less common situation may require similar actions in order to declassify
-a Category A erratum into a Category C erratum. These are collectively
-known as "software workarounds" and are only required in the minority of
-cases (e.g. those cases that both require a non-secure workaround *and*
-can be triggered by Linux).
-
-For software workarounds that may adversely impact systems unaffected by
-the erratum in question, a Kconfig entry is added under "Kernel
-Features" -> "ARM errata workarounds via the alternatives framework".
-These are enabled by default and patched in at runtime when an affected
-CPU is detected. For less-intrusive workarounds, a Kconfig option is not
-available and the code is structured (preferably with a comment) in such
-a way that the erratum will not be hit.
-
-This approach can make it slightly onerous to determine exactly which
-errata are worked around in an arbitrary kernel source tree, so this
-file acts as a registry of software workarounds in the Linux Kernel and
-will be updated when new workarounds are committed and backported to
-stable kernels.
-
-| Implementor    | Component       | Erratum ID      | Kconfig                     |
-+----------------+-----------------+-----------------+-----------------------------+
-| Allwinner      | A64/R18         | UNKNOWN1        | SUN50I_ERRATUM_UNKNOWN1     |
-|                |                 |                 |                             |
-| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319        |
-| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319        |
-| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069        |
-| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472        |
-| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719        |
-| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419        |
-| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075        |
-| ARM            | Cortex-A57      | #852523         | N/A                         |
-| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220        |
-| ARM            | Cortex-A72      | #853709         | N/A                         |
-| ARM            | Cortex-A73      | #858921         | ARM64_ERRATUM_858921        |
-| ARM            | Cortex-A55      | #1024718        | ARM64_ERRATUM_1024718       |
-| ARM            | Cortex-A76      | #1188873,1418040| ARM64_ERRATUM_1418040       |
-| ARM            | Cortex-A76      | #1165522        | ARM64_ERRATUM_1165522       |
-| ARM            | Cortex-A76      | #1286807        | ARM64_ERRATUM_1286807       |
-| ARM            | Cortex-A76      | #1463225        | ARM64_ERRATUM_1463225       |
-| ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
-| ARM            | MMU-500         | #841119,826419  | N/A                         |
-|                |                 |                 |                             |
-| Cavium         | ThunderX ITS    | #22375,24313    | CAVIUM_ERRATUM_22375        |
-| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144        |
-| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154        |
-| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
-| Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
-| Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
-| Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
-| Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
-|                |                 |                 |                             |
-| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
-|                |                 |                 |                             |
-| Hisilicon      | Hip0{5,6,7}     | #161010101      | HISILICON_ERRATUM_161010101 |
-| Hisilicon      | Hip0{6,7}       | #161010701      | N/A                         |
-| Hisilicon      | Hip07           | #161600802      | HISILICON_ERRATUM_161600802 |
-| Hisilicon      | Hip08 SMMU PMCG | #162001800      | N/A                         |
-|                |                 |                 |                             |
-| Qualcomm Tech. | Kryo/Falkor v1  | E1003           | QCOM_FALKOR_ERRATUM_1003    |
-| Qualcomm Tech. | Falkor v1       | E1009           | QCOM_FALKOR_ERRATUM_1009    |
-| Qualcomm Tech. | QDF2400 ITS     | E0065           | QCOM_QDF2400_ERRATUM_0065   |
-| Qualcomm Tech. | Falkor v{1,2}   | E1041           | QCOM_FALKOR_ERRATUM_1041    |
-| Fujitsu        | A64FX           | E#010001        | FUJITSU_ERRATUM_010001      |
diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
new file mode 100644
index 000000000000..38422ab249dd
--- /dev/null
+++ b/Documentation/arm64/sve.rst
@@ -0,0 +1,529 @@
+===================================================
+Scalable Vector Extension support for AArch64 Linux
+===================================================
+
+Author: Dave Martin <Dave.Martin@arm.com>
+
+Date:   4 August 2017
+
+This document outlines briefly the interface provided to userspace by Linux in
+order to support use of the ARM Scalable Vector Extension (SVE).
+
+This is an outline of the most important features and issues only and is
+not intended to be exhaustive.
+
+This document does not aim to describe the SVE architecture or programmer's
+model.  To aid understanding, a minimal description of relevant programmer's
+model features for SVE is included in Appendix A.
+
+
+1.  General
+-----------
+
+* SVE registers Z0..Z31, P0..P15 and FFR, and the current vector length VL,
+  are tracked per-thread.
+
+* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
+  AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
+  instructions and registers, and the Linux-specific system interfaces
+  described in this document.  SVE is reported in /proc/cpuinfo as "sve".
+
+* Support for the execution of SVE instructions in userspace can also be
+  detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
+  instruction, and checking that the value of the SVE field is nonzero. [3]
+
+  It does not guarantee the presence of the system interfaces described in the
+  following sections: software that needs to verify that those interfaces are
+  present must check for HWCAP_SVE instead.
+
+* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also
+  be reported in the AT_HWCAP2 aux vector entry.  In addition to this,
+  optional extensions to SVE2 may be reported by the presence of:
+
+	HWCAP2_SVE2
+	HWCAP2_SVEAES
+	HWCAP2_SVEPMULL
+	HWCAP2_SVEBITPERM
+	HWCAP2_SVESHA3
+	HWCAP2_SVESM4
+
+  This list may be extended over time as the SVE architecture evolves.
+
+  These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1,
+  which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
+  cpu-feature-registers.txt for details.
+
+* Debuggers should restrict themselves to interacting with the target via the
+  NT_ARM_SVE regset.  The recommended way of detecting support for this regset
+  is to connect to a target process first and then attempt a
+  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
+
+
+2.  Vector length terminology
+-----------------------------
+
+The size of an SVE vector (Z) register is referred to as the "vector length".
+
+To avoid confusion about the units used to express vector length, the kernel
+adopts the following conventions:
+
+* Vector length (VL) = size of a Z-register in bytes
+
+* Vector quadwords (VQ) = size of a Z-register in units of 128 bits
+
+(So, VL = 16 * VQ.)
+
+The VQ convention is used where the underlying granularity is important, such
+as in data structure definitions.  In most other situations, the VL convention
+is used.  This is consistent with the meaning of the "VL" pseudo-register in
+the SVE instruction set architecture.
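The relationship between the two units can be sketched in C. The helper names below are hypothetical and are not the kernel's own macros; in particular, `vl_is_vq_multiple()` only checks that a length is a nonzero multiple of 16 bytes and makes no claim about the range checks a kernel-side validity test might apply.

```c
#include <stdbool.h>

/* One vector quadword is 128 bits, i.e. 16 bytes (name is illustrative). */
#define VQ_BYTES 16u

/* Convert a vector length in bytes (VL) to vector quadwords (VQ). */
static unsigned int vq_from_vl(unsigned int vl)
{
	return vl / VQ_BYTES;
}

/* Convert vector quadwords (VQ) to a vector length in bytes (VL). */
static unsigned int vl_from_vq(unsigned int vq)
{
	return vq * VQ_BYTES;
}

/* True if vl is a nonzero whole number of quadwords. */
static bool vl_is_vq_multiple(unsigned int vl)
{
	return vl != 0 && vl % VQ_BYTES == 0;
}
```

For example, a 256-bit Z-register has VL = 32 and VQ = 2.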
+
+
+3.  System call behaviour
+-------------------------
+
+* On syscall, V0..V31 are preserved (as without SVE).  Thus, bits [127:0] of
+  Z0..Z31 are preserved.  All other bits of Z0..Z31, and all of P0..P15 and FFR
+  become unspecified on return from a syscall.
+
+* The SVE registers are not used to pass arguments to or receive results from
+  any syscall.
+
+* In practice the affected registers/bits will be preserved or will be replaced
+  with zeros on return from a syscall, but userspace should not make
+  assumptions about this.  The kernel behaviour may vary on a case-by-case
+  basis.
+
+* All other SVE state of a thread, including the currently configured vector
+  length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
+  length (if any), is preserved across all syscalls, subject to the specific
+  exceptions for execve() described in section 6.
+
+  In particular, on return from a fork() or clone(), the parent and new child
+  process or thread share identical SVE configuration, matching that of the
+  parent before the call.
+
+
+4.  Signal handling
+-------------------
+
+* A new signal frame record sve_context encodes the SVE registers on signal
+  delivery. [1]
+
+* This record is supplementary to fpsimd_context.  The FPSR and FPCR registers
+  are only present in fpsimd_context.  For convenience, the content of V0..V31
+  is duplicated between sve_context and fpsimd_context.
+
+* The signal frame record for SVE always contains basic metadata, in particular
+  the thread's vector length (in sve_context.vl).
+
+* The SVE registers may or may not be included in the record, depending on
+  whether the registers are live for the thread.  The registers are present if
+  and only if:
+  sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).
+
+* If the registers are present, the remainder of the record has a vl-dependent
+  size and layout.  Macros SVE_SIG_* are defined [1] to facilitate access to
+  the members.
+
+* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
+  space is allocated on the stack and an extra_context record is written in
+  __reserved[] referencing this space.  sve_context is then written in the
+  extra space.  Refer to [1] for further details about this mechanism.
+
+
+5.  Signal return
+-----------------
+
+When returning from a signal handler:
+
+* If there is no sve_context record in the signal frame, or if the record is
+  present but contains no register data as described in the previous section,
+  then the SVE registers/bits become non-live and take unspecified values.
+
+* If sve_context is present in the signal frame and contains full register
+  data, the SVE registers become live and are populated with the specified
+  data.  However, for backward compatibility reasons, bits [127:0] of Z0..Z31
+  are always restored from the corresponding members of fpsimd_context.vregs[]
+  and not from sve_context.  The remaining bits are restored from sve_context.
+
+* Inclusion of fpsimd_context in the signal frame remains mandatory,
+  irrespective of whether sve_context is present or not.
+
+* The vector length cannot be changed via signal return.  If sve_context.vl in
+  the signal frame does not match the current vector length, the signal return
+  attempt is treated as illegal, resulting in a forced SIGSEGV.
+
+
+6.  prctl extensions
+--------------------
+
+Some new prctl() calls are added to allow programs to manage the SVE vector
+length:
+
+prctl(PR_SVE_SET_VL, unsigned long arg)
+
+    Sets the vector length of the calling thread and related flags, where
+    arg == vl | flags.  Other threads of the calling process are unaffected.
+
+    vl is the desired vector length, where sve_vl_valid(vl) must be true.
+
+    flags:
+
+	PR_SVE_VL_INHERIT
+
+	    Inherit the current vector length across execve().  Otherwise, the
+	    vector length is reset to the system default at execve().  (See
+	    Section 9.)
+
+	PR_SVE_SET_VL_ONEXEC
+
+	    Defer the requested vector length change until the next execve()
+	    performed by this thread.
+
+	    The effect is equivalent to implicit execution of the following
+	    call immediately after the next execve() (if any) by the thread:
+
+		prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)
+
+	    This allows launching of a new program with a different vector
+	    length, while avoiding runtime side effects in the caller.
+
+
+	    Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
+	    immediately.
+
+
+    Return value: a nonnegative value on success, or a negative value on error:
+	EINVAL: SVE not supported, invalid vector length requested, or
+	    invalid flags.
+
+
+    On success:
+
+    * Either the calling thread's vector length or the deferred vector length
+      to be applied at the next execve() by the thread (dependent on whether
+      PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
+      supported by the system that is less than or equal to vl.  If vl ==
+      SVE_VL_MAX, the value set will be the largest value supported by the
+      system.
+
+    * Any previously outstanding deferred vector length change in the calling
+      thread is cancelled.
+
+    * The returned value describes the resulting configuration, encoded as for
+      PR_SVE_GET_VL.  The vector length reported in this value is the new
+      current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
+      present in arg; otherwise, the reported vector length is the deferred
+      vector length that will be applied at the next execve() by the calling
+      thread.
+
+    * Changing the vector length causes all of P0..P15, FFR and all bits of
+      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
+      unspecified.  Calling PR_SVE_SET_VL with vl equal to the thread's current
+      vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC
+      flag, does not constitute a change to the vector length for this purpose.
+
+
+prctl(PR_SVE_GET_VL)
+
+    Gets the vector length of the calling thread.
+
+    The following flag may be OR-ed into the result:
+
+	PR_SVE_VL_INHERIT
+
+	    Vector length will be inherited across execve().
+
+    There is no way to determine whether there is an outstanding deferred
+    vector length change (which would only normally be the case between a
+    fork() or vfork() and the corresponding execve() in typical use).
+
+    To extract the vector length from the result, bitwise-AND it with
+    PR_SVE_VL_LEN_MASK.
+
+    Return value: a nonnegative value on success, or a negative value on error:
+	EINVAL: SVE not supported.
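Decoding a successful PR_SVE_GET_VL (or PR_SVE_SET_VL) result can be sketched in C as follows. The struct and helper names are hypothetical, and the fallback constants are assumed to match `<linux/prctl.h>`.

```c
#include <sys/prctl.h>

/* Fallback definitions, assumed to match <linux/prctl.h>. */
#ifndef PR_SVE_GET_VL
#define PR_SVE_GET_VL      51
#define PR_SVE_VL_LEN_MASK 0xffff
#define PR_SVE_VL_INHERIT  (1 << 17)
#endif

/* Hypothetical container for a decoded result. */
struct sve_vl_config {
	unsigned int vl;   /* vector length in bytes */
	int inherit;       /* nonzero if VL is inherited across execve() */
};

/*
 * Hypothetical helper: split a nonnegative prctl() result into the
 * vector length and the inherit flag.
 */
static struct sve_vl_config sve_decode_vl(int ret)
{
	struct sve_vl_config cfg;

	cfg.vl = (unsigned int)ret & PR_SVE_VL_LEN_MASK;
	cfg.inherit = (ret & PR_SVE_VL_INHERIT) != 0;
	return cfg;
}

/*
 * Hypothetical helper: query the calling thread's configuration.
 * Returns 0 and fills *cfg on success, or -1 with errno set (EINVAL
 * when SVE is not supported).
 */
static int sve_get_vl_config(struct sve_vl_config *cfg)
{
	int ret = prctl(PR_SVE_GET_VL);

	if (ret < 0)
		return -1;
	*cfg = sve_decode_vl(ret);
	return 0;
}
```

The decoding is kept separate from the system call so that the same helper can be applied to the value returned by PR_SVE_SET_VL, which uses the same encoding.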
+
+
+7.  ptrace extensions
+---------------------
+
+* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
+  PTRACE_SETREGSET.
+
+  Refer to [2] for definitions.
+
+The regset data starts with struct user_sve_header, containing:
+
+    size
+
+	Size of the complete regset, in bytes.
+	This depends on vl and possibly on other things in the future.
+
+	If a call to PTRACE_GETREGSET requests less data than the value of
+	size, the caller can allocate a larger buffer and retry in order to
+	read the complete regset.
+
+    max_size
+
+	Maximum size in bytes that the regset can grow to for the target
+	thread.  The regset won't grow bigger than this even if the target
+	thread changes its vector length etc.
+
+    vl
+
+	Target thread's current vector length, in bytes.
+
+    max_vl
+
+	Maximum possible vector length for the target thread.
+
+    flags
+
+	either
+
+	    SVE_PT_REGS_FPSIMD
+
+		SVE registers are not live (GETREGSET) or are to be made
+		non-live (SETREGSET).
+
+		The payload is of type struct user_fpsimd_state, with the same
+		meaning as for NT_PRFPREG, starting at offset
+		SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.
+
+		Extra data might be appended in the future: the size of the
+		payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).
+
+		vq should be obtained using sve_vq_from_vl(vl).
+
+		or
+
+	    SVE_PT_REGS_SVE
+
+		SVE registers are live (GETREGSET) or are to be made live
+		(SETREGSET).
+
+		The payload contains the SVE register data, starting at offset
+		SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
+		size SVE_PT_SVE_SIZE(vq, flags).
+
+	... OR-ed with zero or more of the following flags, which have the same
+	meaning and behaviour as the corresponding PR_SVE_* flags:
+
+	    SVE_PT_VL_INHERIT
+
+	    SVE_PT_VL_ONEXEC (SETREGSET only).
+
+* The effects of changing the vector length and/or flags are equivalent to
+  those documented for PR_SVE_SET_VL.
+
+  The caller must make a further GETREGSET call if it needs to know what VL is
+  actually set by SETREGSET, unless it is known in advance that the requested
+  VL is supported.
+
+* In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on
+  the header fields.  The SVE_PT_SVE_*() macros are provided to facilitate
+  access to the members.
+
+* In either case, for SETREGSET it is permissible to omit the payload, in which
+  case only the vector length and flags are changed (along with any
+  consequences of those changes).
+
+* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
+  requested VL is not supported, the effect will be the same as if the
+  payload were omitted, except that an EIO error is reported.  No
+  attempt is made to translate the payload data to the correct layout
+  for the vector length actually set.  The thread's FPSIMD state is
+  preserved, but the remaining bits of the SVE registers become
+  unspecified.  It is up to the caller to translate the payload layout
+  for the actual VL and retry.
+
+* The effect of writing a partial, incomplete payload is unspecified.
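+
+The header-then-retry pattern described for the size field can be sketched as
+below.  This is an illustration under stated assumptions rather than a
+reference implementation: read_sve_regset() is a hypothetical helper name, the
+tracee identified by pid is assumed to be already attached and stopped, and
+the fallback NT_ARM_SVE value is the one defined in include/uapi/linux/elf.h.
+The code builds only on arm64, since struct user_sve_header comes from the
+arm64 uapi headers [2]::
+
+    #include <elf.h>
+    #include <stdlib.h>
+    #include <sys/ptrace.h>
+    #include <sys/types.h>
+    #include <sys/uio.h>
+    #include <asm/ptrace.h>	/* struct user_sve_header */
+
+    #ifndef NT_ARM_SVE
+    #define NT_ARM_SVE 0x405
+    #endif
+
+    /* Returns a malloc()ed copy of the regset, or NULL on failure. */
+    static void *read_sve_regset(pid_t pid, size_t *len)
+    {
+	struct user_sve_header hdr;
+	struct iovec iov = { .iov_base = &hdr, .iov_len = sizeof(hdr) };
+	void *type = (void *)(unsigned long)NT_ARM_SVE;
+	void *buf;
+
+	/* Read only the header first, to learn the full regset size. */
+	if (ptrace(PTRACE_GETREGSET, pid, type, &iov) < 0 ||
+	    iov.iov_len < sizeof(hdr))
+		return NULL;
+
+	buf = malloc(hdr.size);
+	if (!buf)
+		return NULL;
+
+	/* Retry with a buffer large enough for the complete regset. */
+	iov.iov_base = buf;
+	iov.iov_len = hdr.size;
+	if (ptrace(PTRACE_GETREGSET, pid, type, &iov) < 0) {
+		free(buf);
+		return NULL;
+	}
+
+	*len = iov.iov_len;	/* bytes actually written by the kernel */
+	return buf;
+    }
+
+The caller would then inspect the flags field of the returned header to decide
+whether the payload is in SVE_PT_REGS_FPSIMD or SVE_PT_REGS_SVE format before
+using the SVE_PT_*() macros.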
+
+
+8.  ELF coredump extensions
+---------------------------
+
+* An NT_ARM_SVE note will be added to each coredump for each thread of the
+  dumped process.  The contents will be equivalent to the data that would have
+  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
+  when the coredump was generated.
+
+
+9.  System runtime configuration
+--------------------------------
+
+* To mitigate the ABI impact of expansion of the signal frame, a policy
+  mechanism is provided for administrators, distro maintainers and developers
+  to set the default vector length for userspace processes:
+
+/proc/sys/abi/sve_default_vector_length
+
+    Writing the text representation of an integer to this file sets the system
+    default vector length to the specified value, unless the value is greater
+    than the maximum vector length supported by the system in which case the
+    default vector length is set to that maximum.
+
+    The result can be determined by reopening the file and reading its
+    contents.
+
+    At boot, the default vector length is initially set to 64 or the maximum
+    supported vector length, whichever is smaller.  This determines the initial
+    vector length of the init process (PID 1).
+
+    Reading this file returns the current system default vector length.
+
+* At every execve() call, the new vector length of the new process is set to
+  the system default vector length, unless
+
+    * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
+      calling thread, or
+
+    * a deferred vector length change is pending, established via the
+      PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).
+
+* Modifying the system default vector length does not affect the vector length
+  of any existing process or thread that does not make an execve() call.
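+
+As a small sketch of reading the policy file described in this section (an
+illustration only; the error handling and output format are arbitrary
+choices, and the file may be absent on systems without SVE support)::
+
+    #include <stdio.h>
+
+    int main(void)
+    {
+	const char *path = "/proc/sys/abi/sve_default_vector_length";
+	FILE *f = fopen(path, "r");
+	int vl;
+
+	if (!f) {
+		perror(path);
+		return 1;
+	}
+
+	/* The file holds the text representation of one integer. */
+	if (fscanf(f, "%d", &vl) != 1) {
+		fprintf(stderr, "%s: unexpected contents\n", path);
+		fclose(f);
+		return 1;
+	}
+	fclose(f);
+
+	/* VL is in bytes; VQ counts 128-bit units, so VQ = VL / 16. */
+	printf("default VL: %d bytes (VQ %d)\n", vl, vl / 16);
+	return 0;
+    }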
+
+
+Appendix A.  SVE programmer's model (informative)
+=================================================
+
+This section provides a minimal description of the additions made by SVE to the
+ARMv8-A programmer's model that are relevant to this document.
+
+Note: This section is for information only and not intended to be complete or
+to replace any architectural specification.
+
+A.1.  Registers
+---------------
+
+In A64 state, SVE adds the following:
+
+* 32 8VL-bit vector registers Z0..Z31
+  For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.
+
+  A register write using a Vn register name zeros all bits of the corresponding
+  Zn except for bits [127:0].
+
+* 16 VL-bit predicate registers P0..P15
+
+* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")
+
+* a VL "pseudo-register" that determines the size of each vector register
+
+  The SVE instruction set architecture provides no way to write VL directly.
+  Instead, it can be modified only by EL1 and above, by writing appropriate
+  system registers.
+
+* The value of VL can be configured at runtime by EL1 and above:
+  16 <= VL <= VLmax, where VL must be a multiple of 16.
+
+* The maximum vector length is determined by the hardware:
+  16 <= VLmax <= 256.
+
+  (The SVE architecture specifies 256, but permits future architecture
+  revisions to raise this limit.)
+
+* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
+  operations in a similar way to the way in which they interact with ARMv8
+  floating-point operations::
+
+         8VL-1                       128               0  bit index
+        +----          ////            -----------------+
+     Z0 |                               :       V0      |
+      :                                          :
+     Z7 |                               :       V7      |
+     Z8 |                               :     * V8      |
+      :                                       :  :
+    Z15 |                               :     *V15      |
+    Z16 |                               :      V16      |
+      :                                          :
+    Z31 |                               :      V31      |
+        +----          ////            -----------------+
+                                                 31    0
+         VL-1                  0                +-------+
+        +----       ////      --+          FPSR |       |
+     P0 |                       |               +-------+
+      : |                       |         *FPCR |       |
+    P15 |                       |               +-------+
+        +----       ////      --+
+    FFR |                       |               +-----+
+        +----       ////      --+            VL |     |
+                                                +-----+
+
+(*) callee-save:
+    This only applies to bits [63:0] of Z-/V-registers.
+    FPCR contains callee-save and caller-save bits.  See [4] for details.
+
+
+A.2.  Procedure call standard
+-----------------------------
+
+The ARMv8-A base procedure call standard is extended as follows with respect to
+the additional SVE register state:
+
+* All SVE register bits that are not shared with FP/SIMD are caller-save.
+
+* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.
+
+  This follows from the way these bits are mapped to V8..V15, whose bits
+  [63:0] are callee-save in the base procedure call standard.
+
+
+Appendix B.  ARMv8-A FP/SIMD programmer's model
+===============================================
+
+Note: This section is for information only and not intended to be complete or
+to replace any architectural specification.
+
+Refer to [4] for more information.
+
+ARMv8-A defines the following floating-point / SIMD register state:
+
+* 32 128-bit vector registers V0..V31
+* 2 32-bit status/control registers FPSR, FPCR
+
+::
+
+         127           0  bit index
+        +---------------+
+     V0 |               |
+      : :               :
+     V7 |               |
+   * V8 |               |
+   :  : :               :
+   *V15 |               |
+    V16 |               |
+      : :               :
+    V31 |               |
+        +---------------+
+
+                 31    0
+                +-------+
+           FPSR |       |
+                +-------+
+          *FPCR |       |
+                +-------+
+
+(*) callee-save:
+    This only applies to bits [63:0] of V-registers.
+    FPCR contains a mixture of callee-save and caller-save bits.
+
+
+References
+==========
+
+[1] arch/arm64/include/uapi/asm/sigcontext.h
+    AArch64 Linux signal ABI definitions
+
+[2] arch/arm64/include/uapi/asm/ptrace.h
+    AArch64 Linux ptrace ABI definitions
+
+[3] Documentation/arm64/cpu-feature-registers.rst
+
+[4] ARM IHI0055C
+    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
+    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
+    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
diff --git a/Documentation/arm64/sve.txt b/Documentation/arm64/sve.txt
deleted file mode 100644
index 9940e924a47e..000000000000
--- a/Documentation/arm64/sve.txt
+++ /dev/null
@@ -1,525 +0,0 @@
-            Scalable Vector Extension support for AArch64 Linux
-            ===================================================
-
-Author: Dave Martin <Dave.Martin@arm.com>
-Date:   4 August 2017
-
-This document outlines briefly the interface provided to userspace by Linux in
-order to support use of the ARM Scalable Vector Extension (SVE).
-
-This is an outline of the most important features and issues only and not
-intended to be exhaustive.
-
-This document does not aim to describe the SVE architecture or programmer's
-model.  To aid understanding, a minimal description of relevant programmer's
-model features for SVE is included in Appendix A.
-
-
-1.  General
------------
-
-* SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
-  tracked per-thread.
-
-* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
-  AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
-  instructions and registers, and the Linux-specific system interfaces
-  described in this document.  SVE is reported in /proc/cpuinfo as "sve".
-
-* Support for the execution of SVE instructions in userspace can also be
-  detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
-  instruction, and checking that the value of the SVE field is nonzero. [3]
-
-  It does not guarantee the presence of the system interfaces described in the
-  following sections: software that needs to verify that those interfaces are
-  present must check for HWCAP_SVE instead.
-
-* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also
-  be reported in the AT_HWCAP2 aux vector entry.  In addition to this,
-  optional extensions to SVE2 may be reported by the presence of:
-
-	HWCAP2_SVE2
-	HWCAP2_SVEAES
-	HWCAP2_SVEPMULL
-	HWCAP2_SVEBITPERM
-	HWCAP2_SVESHA3
-	HWCAP2_SVESM4
-
-  This list may be extended over time as the SVE architecture evolves.
-
-  These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1,
-  which userspace can read using an MRS instruction.  See elf_hwcaps.txt and
-  cpu-feature-registers.txt for details.
-
-* Debuggers should restrict themselves to interacting with the target via the
-  NT_ARM_SVE regset.  The recommended way of detecting support for this regset
-  is to connect to a target process first and then attempt a
-  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
-
-
-2.  Vector length terminology
------------------------------
-
-The size of an SVE vector (Z) register is referred to as the "vector length".
-
-To avoid confusion about the units used to express vector length, the kernel
-adopts the following conventions:
-
-* Vector length (VL) = size of a Z-register in bytes
-
-* Vector quadwords (VQ) = size of a Z-register in units of 128 bits
-
-(So, VL = 16 * VQ.)
-
-The VQ convention is used where the underlying granularity is important, such
-as in data structure definitions.  In most other situations, the VL convention
-is used.  This is consistent with the meaning of the "VL" pseudo-register in
-the SVE instruction set architecture.
-
-
-3.  System call behaviour
--------------------------
-
-* On syscall, V0..V31 are preserved (as without SVE).  Thus, bits [127:0] of
-  Z0..Z31 are preserved.  All other bits of Z0..Z31, and all of P0..P15 and FFR
-  become unspecified on return from a syscall.
-
-* The SVE registers are not used to pass arguments to or receive results from
-  any syscall.
-
-* In practice the affected registers/bits will be preserved or will be replaced
-  with zeros on return from a syscall, but userspace should not make
-  assumptions about this.  The kernel behaviour may vary on a case-by-case
-  basis.
-
-* All other SVE state of a thread, including the currently configured vector
-  length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
-  length (if any), is preserved across all syscalls, subject to the specific
-  exceptions for execve() described in section 6.
-
-  In particular, on return from a fork() or clone(), the parent and new child
-  process or thread share identical SVE configuration, matching that of the
-  parent before the call.
-
-
-4.  Signal handling
--------------------
-
-* A new signal frame record sve_context encodes the SVE registers on signal
-  delivery. [1]
-
-* This record is supplementary to fpsimd_context.  The FPSR and FPCR registers
-  are only present in fpsimd_context.  For convenience, the content of V0..V31
-  is duplicated between sve_context and fpsimd_context.
-
-* The signal frame record for SVE always contains basic metadata, in particular
-  the thread's vector length (in sve_context.vl).
-
-* The SVE registers may or may not be included in the record, depending on
-  whether the registers are live for the thread.  The registers are present if
-  and only if:
-  sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).
-
-* If the registers are present, the remainder of the record has a vl-dependent
-  size and layout.  Macros SVE_SIG_* are defined [1] to facilitate access to
-  the members.
-
-* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
-  space is allocated on the stack, an extra_context record is written in
-  __reserved[] referencing this space.  sve_context is then written in the
-  extra space.  Refer to [1] for further details about this mechanism.
-
-
-5.  Signal return
------------------
-
-When returning from a signal handler:
-
-* If there is no sve_context record in the signal frame, or if the record is
-  present but contains no register data as desribed in the previous section,
-  then the SVE registers/bits become non-live and take unspecified values.
-
-* If sve_context is present in the signal frame and contains full register
-  data, the SVE registers become live and are populated with the specified
-  data.  However, for backward compatibility reasons, bits [127:0] of Z0..Z31
-  are always restored from the corresponding members of fpsimd_context.vregs[]
-  and not from sve_context.  The remaining bits are restored from sve_context.
-
-* Inclusion of fpsimd_context in the signal frame remains mandatory,
-  irrespective of whether sve_context is present or not.
-
-* The vector length cannot be changed via signal return.  If sve_context.vl in
-  the signal frame does not match the current vector length, the signal return
-  attempt is treated as illegal, resulting in a forced SIGSEGV.
-
-
-6.  prctl extensions
---------------------
-
-Some new prctl() calls are added to allow programs to manage the SVE vector
-length:
-
-prctl(PR_SVE_SET_VL, unsigned long arg)
-
-    Sets the vector length of the calling thread and related flags, where
-    arg == vl | flags.  Other threads of the calling process are unaffected.
-
-    vl is the desired vector length, where sve_vl_valid(vl) must be true.
-
-    flags:
-
-	PR_SVE_SET_VL_INHERIT
-
-	    Inherit the current vector length across execve().  Otherwise, the
-	    vector length is reset to the system default at execve().  (See
-	    Section 9.)
-
-	PR_SVE_SET_VL_ONEXEC
-
-	    Defer the requested vector length change until the next execve()
-	    performed by this thread.
-
-	    The effect is equivalent to implicit exceution of the following
-	    call immediately after the next execve() (if any) by the thread:
-
-		prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)
-
-	    This allows launching of a new program with a different vector
-	    length, while avoiding runtime side effects in the caller.
-
-
-	    Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
-	    immediately.
-
-
-    Return value: a nonnegative on success, or a negative value on error:
-	EINVAL: SVE not supported, invalid vector length requested, or
-	    invalid flags.
-
-
-    On success:
-
-    * Either the calling thread's vector length or the deferred vector length
-      to be applied at the next execve() by the thread (dependent on whether
-      PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
-      supported by the system that is less than or equal to vl.  If vl ==
-      SVE_VL_MAX, the value set will be the largest value supported by the
-      system.
-
-    * Any previously outstanding deferred vector length change in the calling
-      thread is cancelled.
-
-    * The returned value describes the resulting configuration, encoded as for
-      PR_SVE_GET_VL.  The vector length reported in this value is the new
-      current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
-      present in arg; otherwise, the reported vector length is the deferred
-      vector length that will be applied at the next execve() by the calling
-      thread.
-
-    * Changing the vector length causes all of P0..P15, FFR and all bits of
-      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
-      unspecified.  Calling PR_SVE_SET_VL with vl equal to the thread's current
-      vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC
-      flag, does not constitute a change to the vector length for this purpose.
-
-
-prctl(PR_SVE_GET_VL)
-
-    Gets the vector length of the calling thread.
-
-    The following flag may be OR-ed into the result:
-
-	PR_SVE_SET_VL_INHERIT
-
-	    Vector length will be inherited across execve().
-
-    There is no way to determine whether there is an outstanding deferred
-    vector length change (which would only normally be the case between a
-    fork() or vfork() and the corresponding execve() in typical use).
-
-    To extract the vector length from the result, and it with
-    PR_SVE_VL_LEN_MASK.
-
-    Return value: a nonnegative value on success, or a negative value on error:
-	EINVAL: SVE not supported.
-
-
-7.  ptrace extensions
----------------------
-
-* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
-  PTRACE_SETREGSET.
-
-  Refer to [2] for definitions.
-
-The regset data starts with struct user_sve_header, containing:
-
-    size
-
-	Size of the complete regset, in bytes.
-	This depends on vl and possibly on other things in the future.
-
-	If a call to PTRACE_GETREGSET requests less data than the value of
-	size, the caller can allocate a larger buffer and retry in order to
-	read the complete regset.
-
-    max_size
-
-	Maximum size in bytes that the regset can grow to for the target
-	thread.  The regset won't grow bigger than this even if the target
-	thread changes its vector length etc.
-
-    vl
-
-	Target thread's current vector length, in bytes.
-
-    max_vl
-
-	Maximum possible vector length for the target thread.
-
-    flags
-
-	either
-
-	    SVE_PT_REGS_FPSIMD
-
-		SVE registers are not live (GETREGSET) or are to be made
-		non-live (SETREGSET).
-
-		The payload is of type struct user_fpsimd_state, with the same
-		meaning as for NT_PRFPREG, starting at offset
-		SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.
-
-		Extra data might be appended in the future: the size of the
-		payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).
-
-		vq should be obtained using sve_vq_from_vl(vl).
-
-		or
-
-	    SVE_PT_REGS_SVE
-
-		SVE registers are live (GETREGSET) or are to be made live
-		(SETREGSET).
-
-		The payload contains the SVE register data, starting at offset
-		SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
-		size SVE_PT_SVE_SIZE(vq, flags);
-
-	... OR-ed with zero or more of the following flags, which have the same
-	meaning and behaviour as the corresponding PR_SET_VL_* flags:
-
-	    SVE_PT_VL_INHERIT
-
-	    SVE_PT_VL_ONEXEC (SETREGSET only).
-
-* The effects of changing the vector length and/or flags are equivalent to
-  those documented for PR_SVE_SET_VL.
-
-  The caller must make a further GETREGSET call if it needs to know what VL is
-  actually set by SETREGSET, unless is it known in advance that the requested
-  VL is supported.
-
-* In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on
-  the header fields.  The SVE_PT_SVE_*() macros are provided to facilitate
-  access to the members.
-
-* In either case, for SETREGSET it is permissible to omit the payload, in which
-  case only the vector length and flags are changed (along with any
-  consequences of those changes).
-
-* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
-  requested VL is not supported, the effect will be the same as if the
-  payload were omitted, except that an EIO error is reported.  No
-  attempt is made to translate the payload data to the correct layout
-  for the vector length actually set.  The thread's FPSIMD state is
-  preserved, but the remaining bits of the SVE registers become
-  unspecified.  It is up to the caller to translate the payload layout
-  for the actual VL and retry.
-
-* The effect of writing a partial, incomplete payload is unspecified.
-
-
-8.  ELF coredump extensions
----------------------------
-
-* A NT_ARM_SVE note will be added to each coredump for each thread of the
-  dumped process.  The contents will be equivalent to the data that would have
-  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
-  when the coredump was generated.
-
-
-9.  System runtime configuration
---------------------------------
-
-* To mitigate the ABI impact of expansion of the signal frame, a policy
-  mechanism is provided for administrators, distro maintainers and developers
-  to set the default vector length for userspace processes:
-
-/proc/sys/abi/sve_default_vector_length
-
-    Writing the text representation of an integer to this file sets the system
-    default vector length to the specified value, unless the value is greater
-    than the maximum vector length supported by the system in which case the
-    default vector length is set to that maximum.
-
-    The result can be determined by reopening the file and reading its
-    contents.
-
-    At boot, the default vector length is initially set to 64 or the maximum
-    supported vector length, whichever is smaller.  This determines the initial
-    vector length of the init process (PID 1).
-
-    Reading this file returns the current system default vector length.
-
-* At every execve() call, the new vector length of the new process is set to
-  the system default vector length, unless
-
-    * PR_SVE_SET_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
-      calling thread, or
-
-    * a deferred vector length change is pending, established via the
-      PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).
-
-* Modifying the system default vector length does not affect the vector length
-  of any existing process or thread that does not make an execve() call.
-
-
-Appendix A.  SVE programmer's model (informative)
-=================================================
-
-This section provides a minimal description of the additions made by SVE to the
-ARMv8-A programmer's model that are relevant to this document.
-
-Note: This section is for information only and not intended to be complete or
-to replace any architectural specification.
-
-A.1.  Registers
----------------
-
-In A64 state, SVE adds the following:
-
-* 32 8VL-bit vector registers Z0..Z31
-  For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.
-
-  A register write using a Vn register name zeros all bits of the corresponding
-  Zn except for bits [127:0].
-
-* 16 VL-bit predicate registers P0..P15
-
-* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")
-
-* a VL "pseudo-register" that determines the size of each vector register
-
-  The SVE instruction set architecture provides no way to write VL directly.
-  Instead, it can be modified only by EL1 and above, by writing appropriate
-  system registers.
-
-* The value of VL can be configured at runtime by EL1 and above:
-  16 <= VL <= VLmax, where VL must be a multiple of 16.
-
-* The maximum vector length is determined by the hardware:
-  16 <= VLmax <= 256.
-
-  (The SVE architecture specifies 256, but permits future architecture
-  revisions to raise this limit.)
-
-* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
-  operations in a similar way to the way in which they interact with ARMv8
-  floating-point operations.
-
-         8VL-1                       128               0  bit index
-        +----          ////            -----------------+
-     Z0 |                               :       V0      |
-      :                                          :
-     Z7 |                               :       V7      |
-     Z8 |                               :     * V8      |
-      :                                       :  :
-    Z15 |                               :     *V15      |
-    Z16 |                               :      V16      |
-      :                                          :
-    Z31 |                               :      V31      |
-        +----          ////            -----------------+
-                                                 31    0
-         VL-1                  0                +-------+
-        +----       ////      --+          FPSR |       |
-     P0 |                       |               +-------+
-      : |                       |         *FPCR |       |
-    P15 |                       |               +-------+
-        +----       ////      --+
-    FFR |                       |               +-----+
-        +----       ////      --+            VL |     |
-                                                +-----+
-
-(*) callee-save:
-    This only applies to bits [63:0] of Z-/V-registers.
-    FPCR contains callee-save and caller-save bits.  See [4] for details.
-
-
-A.2.  Procedure call standard
------------------------------
-
-The ARMv8-A base procedure call standard is extended as follows with respect to
-the additional SVE register state:
-
-* All SVE register bits that are not shared with FP/SIMD are caller-save.
-
-* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.
-
-  This follows from the way these bits are mapped to V8..V15, which are caller-
-  save in the base procedure call standard.
-
-
-Appendix B.  ARMv8-A FP/SIMD programmer's model
-===============================================
-
-Note: This section is for information only and not intended to be complete or
-to replace any architectural specification.
-
-Refer to [4] for for more information.
-
-ARMv8-A defines the following floating-point / SIMD register state:
-
-* 32 128-bit vector registers V0..V31
-* 2 32-bit status/control registers FPSR, FPCR
-
-         127           0  bit index
-        +---------------+
-     V0 |               |
-      : :               :
-     V7 |               |
-   * V8 |               |
-   :  : :               :
-   *V15 |               |
-    V16 |               |
-      : :               :
-    V31 |               |
-        +---------------+
-
-                 31    0
-                +-------+
-           FPSR |       |
-                +-------+
-          *FPCR |       |
-                +-------+
-
-(*) callee-save:
-    This only applies to bits [63:0] of V-registers.
-    FPCR contains a mixture of callee-save and caller-save bits.
-
-
-References
-==========
-
-[1] arch/arm64/include/uapi/asm/sigcontext.h
-    AArch64 Linux signal ABI definitions
-
-[2] arch/arm64/include/uapi/asm/ptrace.h
-    AArch64 Linux ptrace ABI definitions
-
-[3] Documentation/arm64/cpu-feature-registers.txt
-
-[4] ARM IHI0055C
-    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
-    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
-    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst
new file mode 100644
index 000000000000..2acdec3ebbeb
--- /dev/null
+++ b/Documentation/arm64/tagged-pointers.rst
@@ -0,0 +1,68 @@
+=========================================
+Tagged virtual addresses in AArch64 Linux
+=========================================
+
+Author: Will Deacon <will.deacon@arm.com>
+
+Date  : 12 June 2013
+
+This document briefly describes the provision of tagged virtual
+addresses in the AArch64 translation system and their potential uses
+in AArch64 Linux.
+
+The kernel configures the translation tables so that translations made
+via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of
+the virtual address ignored by the translation hardware. This frees up
+this byte for application use.
+
+
+Passing tagged addresses to the kernel
+--------------------------------------
+
+All interpretation of userspace memory addresses by the kernel assumes
+an address tag of 0x00.
+
+This includes, but is not limited to, addresses found in:
+
+ - pointer arguments to system calls, including pointers in structures
+   passed to system calls,
+
+ - the stack pointer (sp), e.g. when interpreting it to deliver a
+   signal,
+
+ - the frame pointer (x29) and frame records, e.g. when interpreting
+   them to generate a backtrace or call graph.
+
+Using non-zero address tags in any of these locations may result in an
+error code being returned, a (fatal) signal being raised, or other modes
+of failure.
+
+For these reasons, passing non-zero address tags to the kernel via
+system calls is forbidden, and using a non-zero address tag for sp is
+strongly discouraged.
+
+Programs maintaining a frame pointer and frame records that use non-zero
+address tags may suffer impaired or inaccurate debug and profiling
+visibility.
+
+
+Preserving tags
+---------------
+
+Non-zero tags are not preserved when delivering signals. This means that
+signal handlers in applications making use of tags cannot rely on the
+tag information for user virtual addresses being maintained for fields
+inside siginfo_t. One exception to this rule is for signals raised in
+response to watchpoint debug exceptions, where the tag information will
+be preserved.
+
+The architecture prevents the use of a tagged PC, so the upper byte will
+be set to a sign-extension of bit 55 on exception return.
+
+
+Other considerations
+--------------------
+
+Special care should be taken when using tagged pointers, since C
+compilers are unlikely to treat two virtual addresses that differ only
+in the upper byte as referring to the same memory location.
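+
+A minimal sketch of how an application might manage the tag byte is shown
+below.  The helper and macro names are hypothetical and not part of any
+kernel or libc interface, and a 64-bit uintptr_t is assumed.  Clearing the
+tag before a system call, and before comparing two pointers, follows from
+the rules described in this document::
+
+    #include <stdint.h>
+
+    #define TAG_SHIFT	56
+    #define TAG_MASK	((uintptr_t)0xff << TAG_SHIFT)
+
+    /* Store an application-defined tag in bits 63:56 of a pointer. */
+    static inline void *ptr_set_tag(void *p, uint8_t tag)
+    {
+	return (void *)(((uintptr_t)p & ~TAG_MASK) |
+			((uintptr_t)tag << TAG_SHIFT));
+    }
+
+    /* Recover the tag previously stored in the pointer. */
+    static inline uint8_t ptr_get_tag(const void *p)
+    {
+	return (uint8_t)((uintptr_t)p >> TAG_SHIFT);
+    }
+
+    /* Clear the tag, e.g. before passing the pointer to a system call. */
+    static inline void *ptr_clear_tag(void *p)
+    {
+	return (void *)((uintptr_t)p & ~TAG_MASK);
+    }
+
+    /* Compare addresses while ignoring any tags they carry. */
+    static inline int ptr_same_address(void *a, void *b)
+    {
+	return ptr_clear_tag(a) == ptr_clear_tag(b);
+    }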
diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.txt
deleted file mode 100644
index a25a99e82bb1..000000000000
--- a/Documentation/arm64/tagged-pointers.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-		Tagged virtual addresses in AArch64 Linux
-		=========================================
-
-Author: Will Deacon <will.deacon@arm.com>
-Date  : 12 June 2013
-
-This document briefly describes the provision of tagged virtual
-addresses in the AArch64 translation system and their potential uses
-in AArch64 Linux.
-
-The kernel configures the translation tables so that translations made
-via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of
-the virtual address ignored by the translation hardware. This frees up
-this byte for application use.
-
-
-Passing tagged addresses to the kernel
---------------------------------------
-
-All interpretation of userspace memory addresses by the kernel assumes
-an address tag of 0x00.
-
-This includes, but is not limited to, addresses found in:
-
- - pointer arguments to system calls, including pointers in structures
-   passed to system calls,
-
- - the stack pointer (sp), e.g. when interpreting it to deliver a
-   signal,
-
- - the frame pointer (x29) and frame records, e.g. when interpreting
-   them to generate a backtrace or call graph.
-
-Using non-zero address tags in any of these locations may result in an
-error code being returned, a (fatal) signal being raised, or other modes
-of failure.
-
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
-strongly discouraged.
-
-Programs maintaining a frame pointer and frame records that use non-zero
-address tags may suffer impaired or inaccurate debug and profiling
-visibility.
-
-
-Preserving tags
----------------
-
-Non-zero tags are not preserved when delivering signals. This means that
-signal handlers in applications making use of tags cannot rely on the
-tag information for user virtual addresses being maintained for fields
-inside siginfo_t. One exception to this rule is for signals raised in
-response to watchpoint debug exceptions, where the tag information will
-be preserved.
-
-The architecture prevents the use of a tagged PC, so the upper byte will
-be set to a sign-extension of bit 55 on exception return.
-
-
-Other considerations
---------------------
-
-Special care should be taken when using tagged pointers, since it is
-likely that C compilers will not hazard two virtual addresses differing
-only in the upper byte.
diff --git a/Documentation/translations/zh_CN/arm64/booting.txt b/Documentation/translations/zh_CN/arm64/booting.txt
index c1dd968c5ee9..3bfbf66e5a5e 100644
--- a/Documentation/translations/zh_CN/arm64/booting.txt
+++ b/Documentation/translations/zh_CN/arm64/booting.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/arm64/booting.txt
+Chinese translated version of Documentation/arm64/booting.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -10,7 +10,7 @@ M:	Will Deacon <will.deacon@arm.com>
 zh_CN:	Fu Wei <wefu@redhat.com>
 C:	55f058e7574c3615dea4615573a19bdb258696c6
 ---------------------------------------------------------------------
-Documentation/arm64/booting.txt 的中文翻译
+Documentation/arm64/booting.rst 的中文翻译
 
 如果想评论或更新本文的内容，请直接联系原文档的维护者。如果你使用英文
 交流有困难的话，也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/Documentation/translations/zh_CN/arm64/legacy_instructions.txt b/Documentation/translations/zh_CN/arm64/legacy_instructions.txt
index 68362a1ab717..e295cf75f606 100644
--- a/Documentation/translations/zh_CN/arm64/legacy_instructions.txt
+++ b/Documentation/translations/zh_CN/arm64/legacy_instructions.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/arm64/legacy_instructions.txt
+Chinese translated version of Documentation/arm64/legacy_instructions.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -10,7 +10,7 @@ Maintainer: Punit Agrawal <punit.agrawal@arm.com>
             Suzuki K. Poulose <suzuki.poulose@arm.com>
 Chinese maintainer: Fu Wei <wefu@redhat.com>
 ---------------------------------------------------------------------
-Documentation/arm64/legacy_instructions.txt 的中文翻译
+Documentation/arm64/legacy_instructions.rst 的中文翻译
 
 如果想评论或更新本文的内容，请直接联系原文档的维护者。如果你使用英文
 交流有困难的话，也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/Documentation/translations/zh_CN/arm64/memory.txt b/Documentation/translations/zh_CN/arm64/memory.txt
index 19b3a52d5d94..be20f8228b91 100644
--- a/Documentation/translations/zh_CN/arm64/memory.txt
+++ b/Documentation/translations/zh_CN/arm64/memory.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/arm64/memory.txt
+Chinese translated version of Documentation/arm64/memory.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -9,7 +9,7 @@ or if there is a problem with the translation.
 Maintainer: Catalin Marinas <catalin.marinas@arm.com>
 Chinese maintainer: Fu Wei <wefu@redhat.com>
 ---------------------------------------------------------------------
-Documentation/arm64/memory.txt 的中文翻译
+Documentation/arm64/memory.rst 的中文翻译
 
 如果想评论或更新本文的内容，请直接联系原文档的维护者。如果你使用英文
 交流有困难的话，也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/Documentation/translations/zh_CN/arm64/silicon-errata.txt b/Documentation/translations/zh_CN/arm64/silicon-errata.txt
index 39477c75c4a4..440c59ac7dce 100644
--- a/Documentation/translations/zh_CN/arm64/silicon-errata.txt
+++ b/Documentation/translations/zh_CN/arm64/silicon-errata.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/arm64/silicon-errata.txt
+Chinese translated version of Documentation/arm64/silicon-errata.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -10,7 +10,7 @@ M:	Will Deacon <will.deacon@arm.com>
 zh_CN:	Fu Wei <wefu@redhat.com>
 C:	1926e54f115725a9248d0c4c65c22acaf94de4c4
 ---------------------------------------------------------------------
-Documentation/arm64/silicon-errata.txt 的中文翻译
+Documentation/arm64/silicon-errata.rst 的中文翻译
 
 如果想评论或更新本文的内容，请直接联系原文档的维护者。如果你使用英文
 交流有困难的话，也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/Documentation/translations/zh_CN/arm64/tagged-pointers.txt b/Documentation/translations/zh_CN/arm64/tagged-pointers.txt
index 2664d1bd5a1c..77ac3548a16d 100644
--- a/Documentation/translations/zh_CN/arm64/tagged-pointers.txt
+++ b/Documentation/translations/zh_CN/arm64/tagged-pointers.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/arm64/tagged-pointers.txt
+Chinese translated version of Documentation/arm64/tagged-pointers.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -9,7 +9,7 @@ or if there is a problem with the translation.
 Maintainer: Will Deacon <will.deacon@arm.com>
 Chinese maintainer: Fu Wei <wefu@redhat.com>
 ---------------------------------------------------------------------
-Documentation/arm64/tagged-pointers.txt 的中文翻译
+Documentation/arm64/tagged-pointers.rst 的中文翻译
 
 如果想评论或更新本文的内容，请直接联系原文档的维护者。如果你使用英文
 交流有困难的话，也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ba6c42c576dd..68984c284c40 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2205,7 +2205,7 @@ max_vq.  This is the maximum vector length available to the guest on
 this vcpu, and determines which register slices are visible through
 this ioctl interface.
 
-(See Documentation/arm64/sve.txt for an explanation of the "vq"
+(See Documentation/arm64/sve.rst for an explanation of the "vq"
 nomenclature.)
 
 KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT.
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index c9e9a6978e73..8e79ce9c3f5c 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -83,7 +83,7 @@ static inline unsigned long efi_get_max_fdt_addr(unsigned long dram_base)
  * guaranteed to cover the kernel Image.
  *
  * Since the EFI stub is part of the kernel Image, we can relax the
- * usual requirements in Documentation/arm64/booting.txt, which still
+ * usual requirements in Documentation/arm64/booting.rst, which still
  * apply to other bootloaders, and are required for some kernel
  * configurations.
  */
diff --git a/arch/arm64/include/asm/image.h b/arch/arm64/include/asm/image.h
index e2c27a2278e9..c2b13213c720 100644
--- a/arch/arm64/include/asm/image.h
+++ b/arch/arm64/include/asm/image.h
@@ -27,7 +27,7 @@
 
 /*
  * struct arm64_image_header - arm64 kernel image header
- * See Documentation/arm64/booting.txt for details
+ * See Documentation/arm64/booting.rst for details
  *
  * @code0:		Executable code, or
  *   @mz_header		  alternatively used for part of MZ header
diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h
index 5f3c0cec5af9..a61f89ddbf34 100644
--- a/arch/arm64/include/uapi/asm/sigcontext.h
+++ b/arch/arm64/include/uapi/asm/sigcontext.h
@@ -137,7 +137,7 @@ struct sve_context {
  * vector length beyond its initial architectural limit of 2048 bits
  * (16 quadwords).
  *
- * See linux/Documentation/arm64/sve.txt for a description of the VL/VQ
+ * See linux/Documentation/arm64/sve.rst for a description of the VL/VQ
  * terminology.
  */
 #define SVE_VQ_BYTES		__SVE_VQ_BYTES	/* bytes per quadword */
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 31cc2f423aa8..2514fd6f12cb 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -53,7 +53,7 @@ static void *image_load(struct kimage *image,
 
 	/*
 	 * We require a kernel with an unambiguous Image header. Per
-	 * Documentation/arm64/booting.txt, this is the case when image_size
+	 * Documentation/arm64/booting.rst, this is the case when image_size
 	 * is non-zero (practically speaking, since v3.17).
 	 */
 	h = (struct arm64_image_header *)kernel;
-- 
cgit v1.2.3-59-g8ed1b


From e327cfcb25422c91f4bb8e8a3488386ac95955f1 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:39 -0300
Subject: docs: cdrom-standard.tex: convert from LaTeX to ReST

This is the only LaTeX documentation file inside the documentation.

Instead of keeping a LaTeX document there, convert it to ReST
format, as this is the format we're using for docs.

For now, let's keep the extension as .txt in order to avoid
warnings when building the documentation with Sphinx.

The next patch will rename it to .rst and add it to the
build system.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/cdrom/Makefile           |   21 -
 Documentation/cdrom/cdrom-standard.tex | 1026 ------------------------------
 Documentation/cdrom/cdrom-standard.txt | 1063 ++++++++++++++++++++++++++++++++
 drivers/cdrom/cdrom.c                  |    2 +-
 4 files changed, 1064 insertions(+), 1048 deletions(-)
 delete mode 100644 Documentation/cdrom/Makefile
 delete mode 100644 Documentation/cdrom/cdrom-standard.tex
 create mode 100644 Documentation/cdrom/cdrom-standard.txt

diff --git a/Documentation/cdrom/Makefile b/Documentation/cdrom/Makefile
deleted file mode 100644
index a19e321928e1..000000000000
--- a/Documentation/cdrom/Makefile
+++ /dev/null
@@ -1,21 +0,0 @@
-LATEXFILE = cdrom-standard
-
-all:
-	make clean
-	latex $(LATEXFILE)
-	latex $(LATEXFILE)
-	@if [ -x `which gv` ]; then \
-		`dvips -q -t letter -o $(LATEXFILE).ps $(LATEXFILE).dvi` ;\
-		`gv -antialias -media letter -nocenter $(LATEXFILE).ps` ;\
-	else \
-		`xdvi $(LATEXFILE).dvi &` ;\
-	fi
-	make sortofclean
-
-clean:
-	rm -f $(LATEXFILE).ps $(LATEXFILE).dvi $(LATEXFILE).aux $(LATEXFILE).log 
-
-sortofclean:
-	rm -f $(LATEXFILE).aux $(LATEXFILE).log 
-
-
diff --git a/Documentation/cdrom/cdrom-standard.tex b/Documentation/cdrom/cdrom-standard.tex
deleted file mode 100644
index f7cd455973f7..000000000000
--- a/Documentation/cdrom/cdrom-standard.tex
+++ /dev/null
@@ -1,1026 +0,0 @@
-\documentclass{article}
-\def\version{$Id: cdrom-standard.tex,v 1.9 1997/12/28 15:42:49 david Exp $}
-\newcommand{\newsection}[1]{\newpage\section{#1}}
-
-\evensidemargin=0pt
-\oddsidemargin=0pt
-\topmargin=-\headheight \advance\topmargin by -\headsep
-\textwidth=15.99cm \textheight=24.62cm % normal A4, 1'' margin
-
-\def\linux{{\sc Linux}}
-\def\cdrom{{\sc cd-rom}}
-\def\UCD{{\sc Uniform cd-rom Driver}}
-\def\cdromc{{\tt {cdrom.c}}}
-\def\cdromh{{\tt {cdrom.h}}}
-\def\fo{\sl}                    % foreign words
-\def\ie{{\fo i.e.}}
-\def\eg{{\fo e.g.}}
-
-\everymath{\it} \everydisplay{\it}
-\catcode `\_=\active \def_{\_\penalty100 }
-\catcode`\<=\active \def<#1>{{\langle\hbox{\rm#1}\rangle}}
-
-\begin{document}
-\title{A \linux\ \cdrom\ standard}
-\author{David van Leeuwen\\{\normalsize\tt david@ElseWare.cistron.nl}
-\\{\footnotesize updated by Erik Andersen {\tt(andersee@debian.org)}}
-\\{\footnotesize updated by Jens Axboe {\tt(axboe@image.dk)}}}
-\date{12 March 1999}
-
-\maketitle
-
-\newsection{Introduction}
-
-\linux\ is probably the Unix-like operating system that supports
-the widest variety of hardware devices. The reasons for this are
-presumably 
-\begin{itemize} 
-\item 
-  The large list of hardware devices available for the many platforms
-  that \linux\ now supports (\ie, i386-PCs, Sparc Suns, etc.)
-\item 
-  The open design of the operating system, such that anybody can write a
-  driver for \linux.
-\item 
-  There is plenty of source code around as examples of how to write a driver.
-\end{itemize}
-The openness of \linux, and the many different types of available
-hardware has allowed \linux\ to support many different hardware devices.
-Unfortunately, the very openness that has allowed \linux\ to support
-all these different devices has also allowed the behavior of each
-device driver to differ significantly from one device to another.
-This divergence of behavior has been very significant for \cdrom\
-devices; the way a particular drive reacts to a `standard' $ioctl()$
-call varies greatly from one device driver to another. To avoid making
-their drivers totally inconsistent, the writers of \linux\ \cdrom\
-drivers generally created new device drivers by understanding, copying,
-and then changing an existing one. Unfortunately, this practice did not
-maintain uniform behavior across all the \linux\ \cdrom\ drivers. 
-
-This document describes an effort to establish Uniform behavior across
-all the different \cdrom\ device drivers for \linux. This document also
-defines the various $ioctl$s, and how the low-level \cdrom\ device
-drivers should implement them. Currently (as of the \linux\ 2.1.$x$
-development kernels) several low-level \cdrom\ device drivers, including
-both IDE/ATAPI and SCSI, now use this Uniform interface.
-
-When the \cdrom\ was developed, the interface between the \cdrom\ drive
-and the computer was not specified in the standards. As a result, many
-different \cdrom\ interfaces were developed. Some of them had their
-own proprietary design (Sony, Mitsumi, Panasonic, Philips), other
-manufacturers adopted an existing electrical interface and changed
-the functionality (CreativeLabs/SoundBlaster, Teac, Funai) or simply
-adapted their drives to one or more of the already existing electrical
-interfaces (Aztech, Sanyo, Funai, Vertos, Longshine, Optics Storage and
-most of the `NoName' manufacturers). In cases where a new drive really
-brought its own interface or used its own command set and flow control
-scheme, either a separate driver had to be written, or an existing
-driver had to be enhanced. History has delivered us \cdrom\ support for
-many of these different interfaces. Nowadays, almost all new \cdrom\
-drives are either IDE/ATAPI or SCSI, and it is very unlikely that any
-manufacturer will create a new interface. Even finding drives for the
-old proprietary interfaces is getting difficult.
-
-When (in the 1.3.70's) I looked at the existing software interface,
-which was expressed through \cdromh, it appeared to be a rather wild
-set of commands and data formats.\footnote{I cannot recollect what
-kernel version I looked at, then, presumably 1.2.13 and 1.3.34---the
-latest kernel that I was indirectly involved in.} It seemed that many
-features of the software interface had been added to accommodate the
-capabilities of a particular drive, in an {\fo ad hoc\/} manner. More
-importantly, it appeared that the behavior of the `standard' commands
-was different for most of the different drivers: \eg, some drivers
-close the tray if an $open()$ call occurs when the tray is open, while
-others do not. Some drivers lock the door upon opening the device, to
-prevent an incoherent file system, but others don't, to allow software
-ejection. Undoubtedly, the capabilities of the different drives vary,
-but even when two drives have the same capability their drivers'
-behavior was usually different.
-
-I decided to start a discussion on how to make all the \linux\ \cdrom\
-drivers behave more uniformly. I began by contacting the developers of
-the many \cdrom\ drivers found in the \linux\ kernel. Their reactions
-encouraged me to write the \UCD\ which this document is intended to
-describe. The implementation of the \UCD\ is in the file \cdromc. This
-driver is intended to be an additional software layer that sits on top
-of the low-level device drivers for each \cdrom\ drive. By adding this
-additional layer, it is possible to have all the different \cdrom\
-devices behave {\em exactly\/} the same (insofar as the underlying
-hardware will allow).
-
-The goal of the \UCD\ is {\em not\/} to alienate driver developers who
-have not yet taken steps to support this effort. The goal of \UCD\ is
-simply to give people writing application programs for \cdrom\ drives
-{\em one\/} \linux\ \cdrom\ interface with consistent behavior for all
-\cdrom\ devices. In addition, this also provides a consistent interface
-between the low-level device driver code and the \linux\ kernel. Care
-is taken that 100\,\% compatibility exists with the data structures and
-programmer's interface defined in \cdromh. This guide was written to
-help \cdrom\ driver developers adapt their code to use the \UCD\ code
-defined in \cdromc.
-
-Personally, I think that the most important hardware interfaces are
-the IDE/ATAPI drives and, of course, the SCSI drives, but as prices
-of hardware drop continuously, it is also likely that people may have
-more than one \cdrom\ drive, possibly of mixed types. It is important
-that these drives behave in the same way. In December 1994, one of the
-cheapest \cdrom\ drives was a Philips cm206, a double-speed proprietary
-drive. In the months that I was busy writing a \linux\ driver for it,
-proprietary drives became obsolete and IDE/ATAPI drives became the
-standard. At the time of the last update to this document (November
-1997) it is becoming difficult to even {\em find} anything less than a
-16 speed \cdrom\ drive, and 24 speed drives are common.
-
-\newsection{Standardizing through another software level}
-\label{cdrom.c}
-
-At the time this document was conceived, all drivers directly
-implemented the \cdrom\ $ioctl()$ calls through their own routines. This
-led to the danger of different drivers forgetting to do important things
-like checking that the user was giving the driver valid data. More
-importantly, this led to the divergence of behavior, which has already
-been discussed.
-
-For this reason, the \UCD\ was created to enforce consistent \cdrom\
-drive behavior, and to provide a common set of services to the various
-low-level \cdrom\ device drivers. The \UCD\ now provides another
-software-level, that separates the $ioctl()$ and $open()$ implementation
-from the actual hardware implementation. Note that this effort has
-made few changes which will affect a user's application programs. The
-greatest change involved moving the contents of the various low-level
-\cdrom\ drivers' header files to the kernel's cdrom directory. This was
-done to help ensure that the user is only presented with only one cdrom
-interface, the interface defined in \cdromh.
-
-\cdrom\ drives are specific enough (\ie, different from other
-block-devices such as floppy or hard disc drives), to define a set
-of common {\em \cdrom\ device operations}, $<cdrom-device>_dops$.
-These operations are different from the classical block-device file
-operations, $<block-device>_fops$.
-
-The routines for the \UCD\ interface level are implemented in the file
-\cdromc. In this file, the \UCD\ interfaces with the kernel as a block
-device by registering the following general $struct\ file_operations$:
-$$
-\halign{$#$\ \hfil&$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr
-struct& file_operations\ cdrom_fops = \{\hidewidth\cr
-        &NULL,                  & lseek \cr
-        &block_read,            & read---general block-dev read \cr
-        &block_write,           & write---general block-dev write \cr
-        &NULL,                  & readdir \cr
-        &NULL,                  & select \cr
-        &cdrom_ioctl,           & ioctl \cr
-        &NULL,                  & mmap \cr
-        &cdrom_open,            & open \cr
-        &cdrom_release,         & release \cr
-        &NULL,                  & fsync \cr
-        &NULL,                  & fasync \cr
-        &cdrom_media_changed,   & media change \cr
-        &NULL                   & revalidate \cr
-\};\cr
-}
-$$ 
-
-Every active \cdrom\ device shares this $struct$. The routines
-declared above are all implemented in \cdromc, since this file is the
-place where the behavior of all \cdrom-devices is defined and
-standardized. The actual interface to the various types of \cdrom\ 
-hardware is still performed by various low-level \cdrom-device
-drivers. These routines simply implement certain {\em capabilities\/}
-that are common to all \cdrom\ (and really, all removable-media
-devices).
-
-Registration of a low-level \cdrom\ device driver is now done through
-the general routines in \cdromc, not through the Virtual File System
-(VFS) any more. The interface implemented in \cdromc\ is carried out
-through two general structures that contain information about the
-capabilities of the driver, and the specific drives on which the
-driver operates. The structures are:
-\begin{description}
-\item[$cdrom_device_ops$] 
-  This structure contains information about the low-level driver for a
-  \cdrom\ device. This structure is conceptually connected to the major
-  number of the device (although some drivers may have different
-  major numbers, as is the case for the IDE driver).
-\item[$cdrom_device_info$] 
-  This structure contains information about a particular \cdrom\ drive,
-  such as its device name, speed, etc. This structure is conceptually
-  connected to the minor number of the device.
-\end{description}
-
-Registering a particular \cdrom\ drive with the \UCD\ is done by the
-low-level device driver though a call to:
-$$register_cdrom(struct\ cdrom_device_info * <device>_info)  
-$$
-The device information structure, $<device>_info$, contains all the
-information needed for the kernel to interface with the low-level
-\cdrom\ device driver. One of the most important entries in this
-structure is a pointer to the $cdrom_device_ops$ structure of the
-low-level driver.
-
-The device operations structure, $cdrom_device_ops$, contains a list
-of pointers to the functions which are implemented in the low-level
-device driver. When \cdromc\ accesses a \cdrom\ device, it does it
-through the functions in this structure. It is impossible to know all
-the capabilities of future \cdrom\ drives, so it is expected that this
-list may need to be expanded from time to time as new technologies are
-developed. For example, CD-R and CD-R/W drives are beginning to become
-popular, and support will soon need to be added for them. For now, the
-current $struct$ is:
-$$
-\halign{$#$\ \hfil&$#$\ \hfil&\hbox to 10em{$#$\hss}&
-  $/*$ \rm# $*/$\hfil\cr
-struct& cdrom_device_ops\ \{ \hidewidth\cr
-  &int& (* open)(struct\ cdrom_device_info *, int)\cr
-  &void& (* release)(struct\ cdrom_device_info *);\cr 
-  &int& (* drive_status)(struct\ cdrom_device_info *, int);\cr     
-  &unsigned\ int& (* check_events)(struct\ cdrom_device_info *, unsigned\ int, int);\cr
-  &int& (* media_changed)(struct\ cdrom_device_info *, int);\cr 
-  &int& (* tray_move)(struct\ cdrom_device_info *, int);\cr
-  &int& (* lock_door)(struct\ cdrom_device_info *, int);\cr
-  &int& (* select_speed)(struct\ cdrom_device_info *, int);\cr
-  &int& (* select_disc)(struct\ cdrom_device_info *, int);\cr
-  &int& (* get_last_session) (struct\ cdrom_device_info *, 
-        struct\ cdrom_multisession *{});\cr
-  &int& (* get_mcn)(struct\ cdrom_device_info *, struct\ cdrom_mcn *{});\cr
-  &int& (* reset)(struct\ cdrom_device_info *);\cr
-  &int& (* audio_ioctl)(struct\ cdrom_device_info *, unsigned\ int, 
-        void *{});\cr 
-\noalign{\medskip}
-  &const\ int& capability;& capability flags \cr
-  &int& (* generic_packet)(struct\ cdrom_device_info *, struct\ packet_command *{});\cr
-\};\cr
-}
-$$
-When a low-level device driver implements one of these capabilities,
-it should add a function pointer to this $struct$. When a particular
-function is not implemented, however, this $struct$ should contain a
-NULL instead. The $capability$ flags specify the capabilities of the
-\cdrom\ hardware and/or low-level \cdrom\ driver when a \cdrom\ drive
-is registered with the \UCD.
-
-Note that most functions have fewer parameters than their
-$blkdev_fops$ counterparts. This is because very little of the
-information in the structures $inode$ and $file$ is used. For most
-drivers, the main parameter is the $struct$ $cdrom_device_info$, from
-which the major and minor number can be extracted. (Most low-level
-\cdrom\ drivers don't even look at the major and minor number though,
-since many of them only support one device.) This will be available
-through $dev$ in $cdrom_device_info$ described below.
-
-The drive-specific, minor-like information that is registered with
-\cdromc, currently contains the following fields:
-$$
-\halign{$#$\ \hfil&$#$\ \hfil&\hbox to 10em{$#$\hss}&
-  $/*$ \rm# $*/$\hfil\cr
-struct& cdrom_device_info\ \{ \hidewidth\cr
-  & const\ struct\ cdrom_device_ops *& ops;& device operations for this major\cr
-  & struct\ list_head& list;& linked list of all device_info\cr
-  & struct\ gendisk *& disk;& matching block layer disk\cr
-  & void *&  handle;& driver-dependent data\cr
-\noalign{\medskip}
-  & int& mask;& mask of capability: disables them \cr
-  & int& speed;& maximum speed for reading data \cr
-  & int& capacity;& number of discs in a jukebox \cr
-\noalign{\medskip}
-  &unsigned\ int& options : 30;& options flags \cr
-  &unsigned& mc_flags : 2;& media-change buffer flags \cr
-  &unsigned\ int& vfs_events;& cached events for vfs path\cr
-  &unsigned\ int& ioctl_events;& cached events for ioctl path\cr
-  & int& use_count;& number of times device is opened\cr
-  & char& name[20];& name of the device type\cr
-\noalign{\medskip}
-  &__u8& sanyo_slot : 2;& Sanyo 3-CD changer support\cr
-  &__u8& keeplocked : 1;& CDROM_LOCKDOOR status\cr
-  &__u8& reserved : 5;& not used yet\cr
-  & int& cdda_method;& see CDDA_* flags\cr
-  &__u8& last_sense;& saves last sense key\cr
-  &__u8& media_written;& dirty flag, DVD+RW bookkeeping\cr
-  &unsigned\ short& mmc3_profile;& current MMC3 profile\cr
-  & int& for_data;& unknown:TBD\cr
-  & int\ (* exit)\ (struct\ cdrom_device_info *);&& unknown:TBD\cr
-  & int& mrw_mode_page;& which MRW mode page is in use\cr
-\}\cr
-}$$
-Using this $struct$, a linked list of the registered minor devices is
-built, using the $next$ field. The device number, the device operations
-struct and specifications of properties of the drive are stored in this
-structure.
-
-The $mask$ flags can be used to mask out some of the capabilities listed
-in $ops\to capability$, if a specific drive doesn't support a feature
-of the driver. The value $speed$ specifies the maximum head-rate of the
-drive, measured in units of normal audio speed (176\,kB/sec raw data or
-150\,kB/sec file system data).  The parameters are declared $const$
-because they describe properties of the drive, which don't change after
-registration.
-
-A few registers contain variables local to the \cdrom\ drive. The
-flags $options$ are used to specify how the general \cdrom\ routines
-should behave. These various flags registers should provide enough
-flexibility to adapt to the different users' wishes (and {\em not\/} the
-`arbitrary' wishes of the author of the low-level device driver, as is
-the case in the old scheme). The register $mc_flags$ is used to buffer
-the information from $media_changed()$ to two separate queues. Other
-data that is specific to a minor drive, can be accessed through $handle$,
-which can point to a data structure specific to the low-level driver.
-The fields $use_count$, $next$, $options$ and $mc_flags$ need not be
-initialized.
-
-The intermediate software layer that \cdromc\ forms will perform some
-additional bookkeeping. The use count of the device (the number of
-processes that have the device opened) is registered in $use_count$. The
-function $cdrom_ioctl()$ will verify the appropriate user-memory regions
-for read and write, and in case a location on the CD is transferred,
-it will `sanitize' the format by making requests to the low-level
-drivers in a standard format, and translating all formats between the
-user-software and low level drivers. This relieves much of the drivers'
-memory checking and format checking and translation. Also, the necessary
-structures will be declared on the program stack.
-
-The implementation of the functions should be as defined in the
-following sections. Two functions {\em must\/} be implemented, namely
-$open()$ and $release()$. Other functions may be omitted, their
-corresponding capability flags will be cleared upon registration.
-Generally, a function returns zero on success and negative on error. A
-function call should return only after the command has completed, but of
-course waiting for the device should not use processor time.
-
-\subsection{$Int\ open(struct\ cdrom_device_info * cdi, int\ purpose)$}
-
-$Open()$ should try to open the device for a specific $purpose$, which
-can be either:
-\begin{itemize}
-\item[0] Open for reading data, as done by {\tt {mount()}} (2), or the
-user commands {\tt {dd}} or {\tt {cat}}.  
-\item[1] Open for $ioctl$ commands, as done by audio-CD playing
-programs.
-\end{itemize}
-Notice that any strategic code (closing tray upon $open()$, etc.)\ is
-done by the calling routine in \cdromc, so the low-level routine
-should only be concerned with proper initialization, such as spinning
-up the disc, etc. % and device-use count
-
-
-\subsection{$Void\ release(struct\ cdrom_device_info * cdi)$}
-
-
-Device-specific actions should be taken such as spinning down the device.
-However, strategic actions such as ejection of the tray, or unlocking
-the door, should be left over to the general routine $cdrom_release()$.
-This is the only function returning type $void$.
-
-\subsection{$Int\ drive_status(struct\ cdrom_device_info * cdi, int\ slot_nr)$}
-\label{drive status}
-
-The function $drive_status$, if implemented, should provide
-information on the status of the drive (not the status of the disc,
-which may or may not be in the drive). If the drive is not a changer,
-$slot_nr$ should be ignored. In \cdromh\ the possibilities are listed: 
-$$
-\halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr
-CDS_NO_INFO& no information available\cr
-CDS_NO_DISC& no disc is inserted, tray is closed\cr
-CDS_TRAY_OPEN& tray is opened\cr
-CDS_DRIVE_NOT_READY& something is wrong, tray is moving?\cr
-CDS_DISC_OK& a disc is loaded and everything is fine\cr
-}
-$$
-
-\subsection{$Int\ media_changed(struct\ cdrom_device_info * cdi, int\ disc_nr)$}
-
-This function is very similar to the original function in $struct\ 
-file_operations$. It returns 1 if the medium of the device $cdi\to
-dev$ has changed since the last call, and 0 otherwise. The parameter
-$disc_nr$ identifies a specific slot in a juke-box, it should be
-ignored for single-disc drives.  Note that by `re-routing' this
-function through $cdrom_media_changed()$, we can implement separate
-queues for the VFS and a new $ioctl()$ function that can report device
-changes to software (\eg, an auto-mounting daemon).
-
-\subsection{$Int\ tray_move(struct\ cdrom_device_info * cdi, int\ position)$}
-
-This function, if implemented, should control the tray movement. (No
-other function should control this.) The parameter $position$ controls
-the desired direction of movement:
-\begin{itemize}
-\item[0] Close tray
-\item[1] Open tray
-\end{itemize}
-This function returns 0 upon success, and a non-zero value upon
-error. Note that if the tray is already in the desired position, no
-action need be taken, and the return value should be 0. 
-
-\subsection{$Int\ lock_door(struct\ cdrom_device_info * cdi, int\ lock)$}
-
-This function (and no other code) controls locking of the door, if the
-drive allows this. The value of $lock$ controls the desired locking
-state:
-\begin{itemize}
-\item[0] Unlock door, manual opening is allowed
-\item[1] Lock door, tray cannot be ejected manually
-\end{itemize}
-This function returns 0 upon success, and a non-zero value upon
-error. Note that if the door is already in the requested state, no
-action need be taken, and the return value should be 0. 
-
-\subsection{$Int\ select_speed(struct\ cdrom_device_info * cdi, int\ speed)$}
-
-Some \cdrom\ drives are capable of changing their head-speed. There
-are several reasons for changing the speed of a \cdrom\ drive. Badly
-pressed \cdrom s may benefit from less-than-maximum head rate. Modern
-\cdrom\ drives can obtain very high head rates (up to $24\times$ is
-common).  It has been reported that these drives can make reading
-errors at these high speeds, reducing the speed can prevent data loss
-in these circumstances.  Finally, some of these drives can
-make an annoyingly loud noise, which a lower speed may reduce. %Finally,
-%although the audio-low-pass filters probably aren't designed for it,
-%more than real-time playback of audio might be used for high-speed
-%copying of audio tracks.
-
-This function specifies the speed at which data is read or audio is
-played back. The value of $speed$ specifies the head-speed of the
-drive, measured in units of standard cdrom speed (176\,kB/sec raw data
-or 150\,kB/sec file system data). So to request that a \cdrom\ drive
-operate at 300\,kB/sec you would call the CDROM_SELECT_SPEED $ioctl$
-with $speed=2$. The special value `0' means `auto-selection', \ie,
-maximum data-rate or real-time audio rate. If the drive doesn't have
-this `auto-selection' capability, the decision should be made on the
-current disc loaded and the return value should be positive. A negative
-return value indicates an error.
-
-\subsection{$Int\ select_disc(struct\ cdrom_device_info * cdi, int\ number)$}
-
-If the drive can store multiple discs (a juke-box) this function
-will perform disc selection. It should return the number of the
-selected disc on success, a negative value on error. Currently, only
-the ide-cd driver supports this functionality.
-
-\subsection{$Int\ get_last_session(struct\ cdrom_device_info * cdi, struct\
-  cdrom_multisession * ms_info)$}
-
-This function should implement the old corresponding $ioctl()$. For
-device $cdi\to dev$, the start of the last session of the current disc
-should be returned in the pointer argument $ms_info$. Note that
-routines in \cdromc\ have sanitized this argument: its requested
-format will {\em always\/} be of the type $CDROM_LBA$ (linear block
-addressing mode), whatever the calling software requested. But
-sanitization goes even further: the low-level implementation may
-return the requested information in $CDROM_MSF$ format if it wishes so
-(setting the $ms_info\rightarrow addr_format$ field appropriately, of
-course) and the routines in \cdromc\ will make the transformation if
-necessary. The return value is 0 upon success.
-
-\subsection{$Int\ get_mcn(struct\ cdrom_device_info * cdi, struct\
-  cdrom_mcn * mcn)$}
-
-Some discs carry a `Media Catalog Number' (MCN), also called
-`Universal Product Code' (UPC). This number should reflect the number
-that is generally found in the bar-code on the product. Unfortunately,
-the few discs that carry such a number on the disc don't even use the
-same format. The return argument to this function is a pointer to a
-pre-declared memory region of type $struct\ cdrom_mcn$. The MCN is
-expected as a 13-character string, terminated by a null-character.
-
-\subsection{$Int\ reset(struct\ cdrom_device_info * cdi)$}
-
-This call should perform a hard-reset on the drive (although in
-circumstances that a hard-reset is necessary, a drive may very well not
-listen to commands anymore). Preferably, control is returned to the
-caller only after the drive has finished resetting. If the drive is no
-longer listening, it may be wise for the underlying low-level cdrom
-driver to time out.
-
-\subsection{$Int\ audio_ioctl(struct\ cdrom_device_info * cdi, unsigned\
-  int\ cmd, void * arg)$}
-
-Some of the \cdrom-$ioctl$s defined in \cdromh\ can be
-implemented by the routines described above, and hence the function
-$cdrom_ioctl$ will use those. However, most $ioctl$s deal with
-audio-control. We have decided to leave these to be accessed through a
-single function, repeating the arguments $cmd$ and $arg$. Note that
-the latter is of type $void*{}$, rather than $unsigned\ long\
-int$. The routine $cdrom_ioctl()$ does do some useful things,
-though. It sanitizes the address format type to $CDROM_MSF$ (Minutes,
-Seconds, Frames) for all audio calls. It also verifies the memory
-location of $arg$, and reserves stack-memory for the argument. This
-makes implementation of the $audio_ioctl()$ much simpler than in the
-old driver scheme. For example, you may look up the function
-$cm206_audio_ioctl()$ in {\tt {cm206.c}} that should be updated with
-this documentation. 
-
-An unimplemented ioctl should return $-ENOSYS$, but a harmless request
-(\eg, $CDROMSTART$) may be ignored by returning 0 (success). Other
-errors should be according to the standards, whatever they are. When
-an error is returned by the low-level driver, the \UCD\ tries whenever
-possible to return the error code to the calling program. (We may decide
-to sanitize the return value in $cdrom_ioctl()$ though, in order to
-guarantee a uniform interface to the audio-player software.)
-
-\subsection{$Int\ dev_ioctl(struct\ cdrom_device_info * cdi, unsigned\ int\
-  cmd, unsigned\ long\ arg)$}
-
-Some $ioctl$s seem to be specific to certain \cdrom\ drives. That is,
-they are introduced to service some capabilities of certain drives. In
-fact, there are 6 different $ioctl$s for reading data, either in some
-particular kind of format, or audio data. Not many drives support
-reading audio tracks as data, I believe this is because of protection
-of copyrights of artists. Moreover, I think that if audio-tracks are
-supported, it should be done through the VFS and not via $ioctl$s. A
-problem here could be the fact that audio-frames are 2352 bytes long,
-so either the audio-file-system should ask for 75264 bytes at once
-(the least common multiple of 512 and 2352), or the drivers should
-bend their backs to cope with this incoherence (to which I would be
-opposed).  Furthermore, it is very difficult for the hardware to find
-the exact frame boundaries, since there are no synchronization headers
-in audio frames.  Once these issues are resolved, this code should be
-standardized in \cdromc.
-
-Because there are so many $ioctl$s that seem to be introduced to
-satisfy certain drivers,\footnote{Is there software around that
-  actually uses these? I'd be interested!} any `non-standard' $ioctl$s
-are routed through the call $dev_ioctl()$. In principle, `private'
-$ioctl$s should be numbered after the device's major number, and not
-the general \cdrom\ $ioctl$ number, {\tt {0x53}}. Currently the
-non-supported $ioctl$s are: {\it CDROMREADMODE1, CDROMREADMODE2,
-  CDROMREADAUDIO, CDROMREADRAW, CDROMREADCOOKED, CDROMSEEK,
-  CDROMPLAY\-BLK and CDROM\-READALL}.
-
-
-\subsection{\cdrom\ capabilities}
-\label{capability}
-
-Instead of just implementing some $ioctl$ calls, the interface in
-\cdromc\ supplies the possibility to indicate the {\em capabilities\/}
-of a \cdrom\ drive. This can be done by ORing any number of
-capability-constants that are defined in \cdromh\ at the registration
-phase. Currently, the capabilities are any of:
-$$
-\halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr
-CDC_CLOSE_TRAY& can close tray by software control\cr
-CDC_OPEN_TRAY& can open tray\cr
-CDC_LOCK& can lock and unlock the door\cr
-CDC_SELECT_SPEED& can select speed, in units of $\sim$150\,kB/s\cr
-CDC_SELECT_DISC& drive is juke-box\cr
-CDC_MULTI_SESSION& can read sessions $>\rm1$\cr
-CDC_MCN& can read Media Catalog Number\cr
-CDC_MEDIA_CHANGED& can report if disc has changed\cr
-CDC_PLAY_AUDIO& can perform audio-functions (play, pause, etc)\cr
-CDC_RESET& hard reset device\cr
-CDC_IOCTLS& driver has non-standard ioctls\cr
-CDC_DRIVE_STATUS& driver implements drive status\cr
-}
-$$
-The capability flag is declared $const$, to prevent drivers from
-accidentally tampering with the contents. The capability flags actually
-inform \cdromc\ of what the driver can do. If the drive found
-by the driver does not have the capability, it can be masked out by
-the $cdrom_device_info$ variable $mask$. For instance, the SCSI \cdrom\
-driver has implemented the code for loading and ejecting \cdrom's, and
-hence its corresponding flags in $capability$ will be set. But a SCSI
-\cdrom\ drive might be a caddy system, which can't load the tray, and
-hence for this drive the $cdrom_device_info$ struct will have set
-the $CDC_CLOSE_TRAY$ bit in $mask$.
-
-In the file \cdromc\ you will encounter many constructions of the type
-$$\it
-if\ (cdo\rightarrow capability \mathrel\& \mathord{\sim} cdi\rightarrow mask 
-   \mathrel{\&} CDC_<capability>) \ldots
-$$
-There is no $ioctl$ to set the mask\dots The reason is that
-I think it is better to control the {\em behavior\/} rather than the
-{\em capabilities}.
-
-\subsection{Options}
-
-A final flag register controls the {\em behavior\/} of the \cdrom\
-drives, in order to satisfy different users' wishes, hopefully
-independently of the ideas of the respective author who happened to
-have made the drive's support available to the \linux\ community. The
-current behavior options are:
-$$
-\halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr
-CDO_AUTO_CLOSE& try to close tray upon device $open()$\cr
-CDO_AUTO_EJECT& try to open tray on last device $close()$\cr
-CDO_USE_FFLAGS& use $file_pointer\rightarrow f_flags$ to indicate
- purpose for $open()$\cr
-CDO_LOCK& try to lock door if device is opened\cr
-CDO_CHECK_TYPE& ensure disc type is data if opened for data\cr
-}
-$$
-
-The initial value of this register is $CDO_AUTO_CLOSE \mathrel|
-CDO_USE_FFLAGS \mathrel| CDO_LOCK$, reflecting my own view on user
-interface and software standards. Before you protest, there are two
-new $ioctl$s implemented in \cdromc, that allow you to control the
-behavior by software. These are:
-$$
-\halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr
-CDROM_SET_OPTIONS& set options specified in $(int)\ arg$\cr
-CDROM_CLEAR_OPTIONS& clear options specified in $(int)\ arg$\cr
-}
-$$
-One option needs some more explanation: $CDO_USE_FFLAGS$. In the next
-section we explain what the need for this option is.
-
-A software package {\tt setcd}, available from the Debian distribution
-and {\tt sunsite.unc.edu}, allows user level control of these flags. 
-
-\newsection{The need to know the purpose of opening the \cdrom\ device}
-
-Traditionally, Unix devices can be used in two different `modes',
-either by reading/writing to the device file, or by issuing
-controlling commands to the device, by the device's $ioctl()$
-call. The problem with \cdrom\ drives, is that they can be used for
-two entirely different purposes. One is to mount removable
-file systems, \cdrom s, the other is to play audio CD's. Audio commands
-are implemented entirely through $ioctl$s, presumably because the
-first implementation (SUN?) has been such. In principle there is
-nothing wrong with this, but a good control of the `CD player' demands
-that the device can {\em always\/} be opened in order to give the
-$ioctl$ commands, regardless of the state the drive is in. 
-
-On the other hand, when used as a removable-media disc drive (what the
-original purpose of \cdrom s is) we would like to make sure that the
-disc drive is ready for operation upon opening the device. In the old
-scheme, some \cdrom\ drivers don't do any integrity checking, resulting
-in a number of i/o errors reported by the VFS to the kernel when an
-attempt for mounting a \cdrom\ on an empty drive occurs. This is not a
-particularly elegant way to find out that there is no \cdrom\ inserted;
-it more-or-less looks like the old IBM-PC trying to read an empty floppy
-drive for a couple of seconds, after which the system complains it
-can't read from it. Nowadays we can {\em sense\/} the existence of a
-removable medium in a drive, and we believe we should exploit that
-fact. An integrity check on opening of the device, that verifies the
-availability of a \cdrom\ and its correct type (data), would be
-desirable.
-
-These two ways of using a \cdrom\ drive, principally for data and
-secondarily for playing audio discs, have different demands for the
-behavior of the $open()$ call. Audio use simply wants to open the
-device in order to get a file handle which is needed for issuing
-$ioctl$ commands, while data use wants to open for correct and
-reliable data transfer. The only way user programs can indicate what
-their {\em purpose\/} of opening the device is, is through the $flags$
-parameter (see {\tt {open(2)}}). For \cdrom\ devices, these flags aren't
-implemented (some drivers implement checking for write-related flags,
-but this is not strictly necessary if the device file has correct
-permission flags). Most option flags simply don't make sense to
-\cdrom\ devices: $O_CREAT$, $O_NOCTTY$, $O_TRUNC$, $O_APPEND$, and
-$O_SYNC$ have no meaning to a \cdrom. 
-
-We therefore propose to use the flag $O_NONBLOCK$ to indicate
-that the device is opened just for issuing $ioctl$
-commands. Strictly, the meaning of $O_NONBLOCK$ is that opening and
-subsequent calls to the device don't cause the calling process to
-wait. We could interpret this as ``don't wait until someone has
-inserted some valid data-\cdrom.'' Thus, our proposal of the
-implementation for the $open()$ call for \cdrom s is:
-\begin{itemize}
-\item If no other flags are set than $O_RDONLY$, the device is opened
-for data transfer, and the return value will be 0 only upon successful
-initialization of the transfer. The call may even induce some actions
-on the \cdrom, such as closing the tray.  
-\item If the option flag $O_NONBLOCK$ is set, opening will always be
-successful, unless the whole device doesn't exist. The drive will take
-no actions whatsoever. 
-\end{itemize}
-
-\subsection{And what about standards?}
-
-You might hesitate to accept this proposal as it comes from the
-\linux\ community, and not from some standardizing institute. What
-about SUN, SGI, HP and all those other Unix and hardware vendors?
-Well, these companies are in the lucky position that they generally
-control both the hardware and software of their supported products,
-and are large enough to set their own standard. They do not have to
-deal with a dozen or more different, competing hardware
-configurations.\footnote{Incidentally, I think that SUN's approach to
-mounting \cdrom s is very good in origin: under Solaris a
-volume-daemon automatically mounts a newly inserted \cdrom\ under {\tt
-{/cdrom/$<volume-name>$/}}. In my opinion they should have pushed this
-further and have {\em every\/} \cdrom\ on the local area network be
-mounted at the similar location, \ie, no matter in which particular
-machine you insert a \cdrom, it will always appear at the same
-position in the directory tree, on every system. When I wanted to
-implement such a user-program for \linux, I came across the
-differences in behavior of the various drivers, and the need for an
-$ioctl$ informing about media changes.}
-
-We believe that using $O_NONBLOCK$ to indicate that a device is being opened
-for $ioctl$ commands only can be easily introduced in the \linux\
-community. All the CD-player authors will have to be informed, we can
-even send in our own patches to the programs. The use of $O_NONBLOCK$
-has most likely no influence on the behavior of the CD-players on
-other operating systems than \linux. Finally, a user can always revert
-to old behavior by a call to $ioctl(file_descriptor, CDROM_CLEAR_OPTIONS,
-CDO_USE_FFLAGS)$. 
-
-\subsection{The preferred strategy of $open()$}
-
-The routines in \cdromc\ are designed in such a way that run-time
-configuration of the behavior of \cdrom\ devices (of {\em any\/} type)
-can be carried out, by the $CDROM_SET/CLEAR_OPTIONS$ $ioctls$. Thus, various
-modes of operation can be set:
-\begin{description}
-\item[$CDO_AUTO_CLOSE \mathrel| CDO_USE_FFLAGS \mathrel| CDO_LOCK$] This
-is the default setting. (With $CDO_CHECK_TYPE$ it will be better, in the
-future.) If the device is not yet opened by any other process, and if
-the device is being opened for data ($O_NONBLOCK$ is not set) and the
-tray is found to be open, an attempt to close the tray is made. Then,
-it is verified that a disc is in the drive and, if $CDO_CHECK_TYPE$ is
-set, that it contains tracks of type `data mode 1.' Only if all tests
-are passed is the return value zero. The door is locked to prevent file
-system corruption. If the drive is opened for audio ($O_NONBLOCK$ is
-set), no actions are taken and a value of 0 will be returned. 
-\item[$CDO_AUTO_CLOSE \mathrel| CDO_AUTO_EJECT \mathrel| CDO_LOCK$] This
-mimics the behavior of the current sbpcd-driver. The option flags are
-ignored, the tray is closed on the first open, if necessary. Similarly,
-the tray is opened on the last release, \ie, if a \cdrom\ is unmounted,
-it is automatically ejected, such that the user can replace it.
-\end{description} 
-We hope that these options can convince everybody (both driver
-maintainers and user program developers) to adopt the new \cdrom\
-driver scheme and option flag interpretation.
-
-\newsection{Description of routines in \cdromc}
-
-Only a few routines in \cdromc\ are exported to the drivers. In this
-new section we will discuss these, as well as the functions that `take
-over' the \cdrom\ interface to the kernel. The header file belonging
-to \cdromc\ is called \cdromh. Formerly, some of the contents of this
-file were placed in the file {\tt {ucdrom.h}}, but this file has now been
-merged back into \cdromh.
-
-\subsection{$Struct\ file_operations\ cdrom_fops$}
-
-The contents of this structure were described in section~\ref{cdrom.c}.
-A pointer to this structure is assigned to the $fops$ field
-of the $struct gendisk$.
-
-\subsection{$Int\ register_cdrom( struct\ cdrom_device_info\ * cdi)$}
-
-This function is used in about the same way one registers $cdrom_fops$
-with the kernel, the device operations and information structures,
-as described in section~\ref{cdrom.c}, should be registered with the
-\UCD:
-$$
-register_cdrom(\&<device>_info);
-$$
-This function returns zero upon success, and non-zero upon
-failure. The structure $<device>_info$ should have a pointer to the
-driver's $<device>_dops$, as in 
-$$
-\vbox{\halign{&$#$\hfil\cr
-struct\ &cdrom_device_info\ <device>_info = \{\cr
-& <device>_dops;\cr
-&\ldots\cr
-\}\cr
-}}$$
-Note that a driver must have one static structure, $<device>_dops$, while
-it may have as many structures $<device>_info$ as there are minor devices
-active. $Register_cdrom()$ builds a linked list from these. 
-
-\subsection{$Void\ unregister_cdrom(struct\ cdrom_device_info * cdi)$}
-
-Unregistering device $cdi$ with minor number $MINOR(cdi\to dev)$ removes
-the minor device from the list. If it was the last registered minor for
-the low-level driver, this disconnects the registered device-operation
-routines from the \cdrom\ interface. This function returns zero upon
-success, and non-zero upon failure.
-
-\subsection{$Int\ cdrom_open(struct\ inode * ip, struct\ file * fp)$}
-
-This function is not called directly by the low-level drivers, it is
-listed in the standard $cdrom_fops$. If the VFS opens a file, this
-function becomes active. A strategy is implemented in this routine,
-taking care of all capabilities and options that are set in the
-$cdrom_device_ops$ connected to the device. Then, the program flow is
-transferred to the device_dependent $open()$ call.
-
-\subsection{$Void\ cdrom_release(struct\ inode *ip, struct\ file
-*fp)$}
-
-This function implements the reverse-logic of $cdrom_open()$, and then
-calls the device-dependent $release()$ routine. When the use-count has
-reached 0, the allocated buffers are flushed by calls to $sync_dev(dev)$
-and $invalidate_buffers(dev)$.
-
-
-\subsection{$Int\ cdrom_ioctl(struct\ inode *ip, struct\ file *fp,
-unsigned\ int\ cmd, unsigned\ long\ arg)$}
-\label{cdrom-ioctl}
-
-This function handles all the standard $ioctl$ requests for \cdrom\
-devices in a uniform way. The different calls fall into three
-categories: $ioctl$s that can be directly implemented by device
-operations, ones that are routed through the call $audio_ioctl()$, and
-the remaining ones, that are presumably device-dependent. Generally, a
-negative return value indicates an error.
-
-\subsubsection{Directly implemented $ioctl$s}
-\label{ioctl-direct}
-
-The following `old' \cdrom-$ioctl$s are implemented by directly
-calling device-operations in $cdrom_device_ops$, if implemented and
-not masked:
-\begin{description}
-\item[CDROMMULTISESSION] Requests the last session on a \cdrom.
-\item[CDROMEJECT] Open tray. 
-\item[CDROMCLOSETRAY] Close tray.
-\item[CDROMEJECT_SW] If $arg\not=0$, set behavior to auto-close (close
-tray on first open) and auto-eject (eject on last release), otherwise
-set behavior to non-moving on $open()$ and $release()$ calls.
-\item[CDROM_GET_MCN] Get the Media Catalog Number from a CD.
-\end{description}
-
-\subsubsection{$Ioctl$s routed through $audio_ioctl()$}
-\label{ioctl-audio}
-
-The following set of $ioctl$s are all implemented through a call to
-the $cdrom_fops$ function $audio_ioctl()$. Memory checks and
-allocation are performed in $cdrom_ioctl()$, and also sanitization of
-address format ($CDROM_LBA$/$CDROM_MSF$) is done.
-\begin{description}
-\item[CDROMSUBCHNL] Get sub-channel data in argument $arg$ of type $struct\
-cdrom_subchnl *{}$.
-\item[CDROMREADTOCHDR] Read Table of Contents header, in $arg$ of type
-$struct\ cdrom_tochdr *{}$. 
-\item[CDROMREADTOCENTRY] Read a Table of Contents entry in $arg$ and
-specified by $arg$ of type $struct\ cdrom_tocentry *{}$.
-\item[CDROMPLAYMSF] Play audio fragment specified in Minute, Second,
-Frame format, delimited by $arg$ of type $struct\ cdrom_msf *{}$.
-\item[CDROMPLAYTRKIND] Play audio fragment in track-index format
-delimited by $arg$ of type $struct\ \penalty-1000 cdrom_ti *{}$.
-\item[CDROMVOLCTRL] Set volume specified by $arg$ of type $struct\
-cdrom_volctrl *{}$.
-\item[CDROMVOLREAD] Read volume into $arg$ of type $struct\
-cdrom_volctrl *{}$.
-\item[CDROMSTART] Spin up disc.
-\item[CDROMSTOP] Stop playback of audio fragment.
-\item[CDROMPAUSE] Pause playback of audio fragment.
-\item[CDROMRESUME] Resume playing.
-\end{description}
-
-\subsubsection{New $ioctl$s in \cdromc}
-
-The following $ioctl$s have been introduced to allow user programs to
-control the behavior of individual \cdrom\ devices. New $ioctl$
-commands can be identified by the underscores in their names.
-\begin{description}
-\item[CDROM_SET_OPTIONS] Set options specified by $arg$. Returns the
-option flag register after modification. Use  $arg = \rm0$ for reading
-the current flags.
-\item[CDROM_CLEAR_OPTIONS] Clear options specified by $arg$. Returns
-  the option flag register after modification.
-\item[CDROM_SELECT_SPEED] Select head-rate speed of disc specified
-  by $arg$ in units of standard cdrom speed (176\,kB/sec raw data or
-  150\,kB/sec file system data). The value 0 means `auto-select', \ie,
-  play audio discs at real time and data discs at maximum speed. The value
-  $arg$ is checked against the maximum head rate of the drive found in the
-  $cdrom_dops$.
-\item[CDROM_SELECT_DISC] Select disc numbered $arg$ from a juke-box.
-  First disc is numbered 0. The number $arg$ is checked against the
-  maximum number of discs in the juke-box found in the $cdrom_dops$.
-\item[CDROM_MEDIA_CHANGED] Returns 1 if a disc has been changed since
-  the last call. Note that calls to $cdrom_media_changed$ by the VFS
-  are treated by an independent queue, so both mechanisms will detect
-  a media change once. For juke-boxes, an extra argument $arg$
-  specifies the slot for which the information is given. The special
-  value $CDSL_CURRENT$ requests that information about the currently
-  selected slot be returned.
-\item[CDROM_DRIVE_STATUS] Returns the status of the drive by a call to
-  $drive_status()$. Return values are defined in section~\ref{drive
-   status}. Note that this call doesn't return information on the
-  current playing activity of the drive; this can be polled through an
-  $ioctl$ call to $CDROMSUBCHNL$. For juke-boxes, an extra argument
-  $arg$ specifies the slot for which (possibly limited) information is
-  given. The special value $CDSL_CURRENT$ requests that information
-  about the currently selected slot be returned.
-\item[CDROM_DISC_STATUS] Returns the type of the disc currently in the
-  drive.  It should be viewed as a complement to $CDROM_DRIVE_STATUS$.
-  This $ioctl$ can provide \emph {some} information about the current
-  disc that is inserted in the drive.  This functionality used to be
-  implemented in the low level drivers, but is now carried out
-  entirely in \UCD.
-  
-  The history of development of the CD's use as a carrier medium for
-  various digital information has led to many different disc types.
-  This $ioctl$ is useful only in the case that CDs have \emph {only
-    one} type of data on them.  While this is often the case, it is
-  also very common for CDs to have some tracks with data, and some
-  tracks with audio.  Because this is an existing interface, rather
-  than fixing this interface by changing the assumptions it was made
-  under, thereby breaking all user applications that use this
-  function, the \UCD\ implements this $ioctl$ as follows: If the CD in
-  question has audio tracks on it, and it has absolutely no CD-I, XA,
-  or data tracks on it, it will be reported as $CDS_AUDIO$.  If it has
-  both audio and data tracks, it will return $CDS_MIXED$.  If there
-  are no audio tracks on the disc, and if the CD in question has any
-  CD-I tracks on it, it will be reported as $CDS_XA_2_2$.  Failing
-  that, if the CD in question has any XA tracks on it, it will be
-  reported as $CDS_XA_2_1$.  Finally, if the CD in question has any
-  data tracks on it, it will be reported as a data CD ($CDS_DATA_1$).
-
-  This $ioctl$ can return:
-  $$
-  \halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr
-    CDS_NO_INFO& no information available\cr
-    CDS_NO_DISC& no disc is inserted, or tray is opened\cr
-    CDS_AUDIO& Audio disc (2352 audio bytes/frame)\cr
-    CDS_DATA_1& data disc, mode 1 (2048 user bytes/frame)\cr
-    CDS_XA_2_1& mixed data (XA), mode 2, form 1 (2048 user bytes)\cr
-    CDS_XA_2_2& mixed data (XA), mode 2, form 2 (2324 user bytes)\cr
-    CDS_MIXED& mixed audio/data disc\cr
-    }
-  $$
-  For some information concerning frame layout of the various disc
-  types, see a recent version of \cdromh.
-
-\item[CDROM_CHANGER_NSLOTS] Returns the number of slots in a
-  juke-box. 
-\item[CDROMRESET] Reset the drive. 
-\item[CDROM_GET_CAPABILITY] Returns the $capability$ flags for the
-  drive. Refer to section \ref{capability} for more information on
-  these flags.
-\item[CDROM_LOCKDOOR] Locks the door of the drive. $arg == \rm0$
-  unlocks the door, any other value locks it.
-\item[CDROM_DEBUG] Turns on debugging info. Only root is allowed
-  to do this. Same semantics as CDROM_LOCKDOOR.
-\end{description}
-
-\subsubsection{Device dependent $ioctl$s}
-
-Finally, all other $ioctl$s are passed to the function $dev_ioctl()$,
-if implemented. No memory allocation or verification is carried out. 
-
-\newsection{How to update your driver}
-
-\begin{enumerate}
-\item Make a backup of your current driver. 
-\item Get hold of the files \cdromc\ and \cdromh, they should be in
-  the directory tree that came with this documentation.
-\item Make sure you include \cdromh.
-\item Change the 3rd argument of $register_blkdev$ from
-$\&<your-drive>_fops$ to $\&cdrom_fops$. 
-\item Just after that line, add the following to register with the \UCD:
-  $$register_cdrom(\&<your-drive>_info);$$
-  Similarly, add a call to $unregister_cdrom()$ at the appropriate place.
-\item Copy an example of the device-operations $struct$ to your
-  source, \eg, from {\tt {cm206.c}} $cm206_dops$, and change all
-  entries to names corresponding to your driver, or names you just
-  happen to like. If your driver doesn't support a certain function,
-  make the entry $NULL$. At the entry $capability$ you should list all
-  capabilities your driver currently supports. If your driver
-  has a capability that is not listed, please send me a message.
-\item Copy the $cdrom_device_info$ declaration from the same example
-  driver, and modify the entries according to your needs. If your
-  driver dynamically determines the capabilities of the hardware, this
-  structure should also be declared dynamically. 
-\item Implement all functions in your $<device>_dops$ structure,
-  according to prototypes listed in \cdromh, and specifications given
-  in section~\ref{cdrom.c}. Most likely you have already implemented
-  the code in a large part, and you will almost certainly need to adapt the
-  prototype and return values.
-\item Rename your $<device>_ioctl()$ function to $audio_ioctl$ and
-  change the prototype a little. Remove entries listed in the first
-  part in section~\ref{cdrom-ioctl}, if your code was OK, these are
-  just calls to the routines you adapted in the previous step.
-\item You may remove all remaining memory checking code in the
-  $audio_ioctl()$ function that deals with audio commands (these are
-  listed in the second part of section~\ref{cdrom-ioctl}). There is no
-  need for memory allocation either, so most $case$s in the $switch$
-  statement look similar to:
-  $$
-  case\ CDROMREADTOCENTRY\colon get_toc_entry\bigl((struct\ 
-  cdrom_tocentry *{})\ arg\bigr);
-  $$
-\item All remaining $ioctl$ cases must be moved to a separate
-  function, $<device>_ioctl$, the device-dependent $ioctl$s. Note that
-  memory checking and allocation must be kept in this code!
-\item Change the prototypes of $<device>_open()$ and
-  $<device>_release()$, and remove any strategic code (\ie, tray
-  movement, door locking, etc.).
-\item Try to recompile the drivers. We advise you to use modules, both
-  for {\tt {cdrom.o}} and your driver, as debugging is much easier this
-  way.
-\end{enumerate} 
-
-\newsection{Thanks}
-
-Thanks to all the people involved.  First, Erik Andersen, who has
-taken over the torch in maintaining \cdromc\ and integrating much
-\cdrom-related code in the 2.1-kernel.  Thanks to Scott Snyder and
-Gerd Knorr, who were the first to implement this interface for SCSI
-and IDE-CD drivers and added many ideas for extension of the data
-structures relative to kernel~2.0.  Further thanks to Heiko Ei{\ss}feldt,
-Thomas Quinot, Jon Tombs, Ken Pizzini, Eberhard M\"onkeberg and Andrew
-Kroll, the \linux\ \cdrom\ device driver developers who were kind
-enough to give suggestions and criticisms during the writing. Finally
-of course, I want to thank Linus Torvalds for making this possible in
-the first place.
-
-\vfill
-$ \version\ $
-\eject
-\end{document}
diff --git a/Documentation/cdrom/cdrom-standard.txt b/Documentation/cdrom/cdrom-standard.txt
new file mode 100644
index 000000000000..dde4f7f7fdbf
--- /dev/null
+++ b/Documentation/cdrom/cdrom-standard.txt
@@ -0,0 +1,1063 @@
+=======================
+A Linux CD-ROM standard
+=======================
+
+:Author: David van Leeuwen <david@ElseWare.cistron.nl>
+:Date: 12 March 1999
+:Updated by: Erik Andersen (andersee@debian.org)
+:Updated by: Jens Axboe (axboe@image.dk)
+
+
+Introduction
+============
+
+Linux is probably the Unix-like operating system that supports
+the widest variety of hardware devices. The reasons for this are
+presumably
+
+- The large list of hardware devices available for the many platforms
+  that Linux now supports (e.g., i386-PCs, Sparc Suns, etc.)
+- The open design of the operating system, such that anybody can write a
+  driver for Linux.
+- There is plenty of source code around as examples of how to write a driver.
+
+The openness of Linux, and the many different types of available
+hardware has allowed Linux to support many different hardware devices.
+Unfortunately, the very openness that has allowed Linux to support
+all these different devices has also allowed the behavior of each
+device driver to differ significantly from one device to another.
+This divergence of behavior has been very significant for CD-ROM
+devices; the way a particular drive reacts to a `standard` *ioctl()*
+call varies greatly from one device driver to another. To avoid making
+their drivers totally inconsistent, the writers of Linux CD-ROM
+drivers generally created new device drivers by understanding, copying,
+and then changing an existing one. Unfortunately, this practice did not
+maintain uniform behavior across all the Linux CD-ROM drivers.
+
+This document describes an effort to establish Uniform behavior across
+all the different CD-ROM device drivers for Linux. This document also
+defines the various *ioctl()'s*, and how the low-level CD-ROM device
+drivers should implement them. Currently (as of the Linux 2.1.\ *x*
+development kernels) several low-level CD-ROM device drivers, including
+both IDE/ATAPI and SCSI, now use this Uniform interface.
+
+When the CD-ROM was developed, the interface between the CD-ROM drive
+and the computer was not specified in the standards. As a result, many
+different CD-ROM interfaces were developed. Some of them had their
+own proprietary design (Sony, Mitsumi, Panasonic, Philips), other
+manufacturers adopted an existing electrical interface and changed
+the functionality (CreativeLabs/SoundBlaster, Teac, Funai) or simply
+adapted their drives to one or more of the already existing electrical
+interfaces (Aztech, Sanyo, Funai, Vertos, Longshine, Optics Storage and
+most of the `NoName` manufacturers). In cases where a new drive really
+brought its own interface or used its own command set and flow control
+scheme, either a separate driver had to be written, or an existing
+driver had to be enhanced. History has delivered us CD-ROM support for
+many of these different interfaces. Nowadays, almost all new CD-ROM
+drives are either IDE/ATAPI or SCSI, and it is very unlikely that any
+manufacturer will create a new interface. Even finding drives for the
+old proprietary interfaces is getting difficult.
+
+When (in the 1.3.70's) I looked at the existing software interface,
+which was expressed through `cdrom.h`, it appeared to be a rather wild
+set of commands and data formats [#f1]_. It seemed that many
+features of the software interface had been added to accommodate the
+capabilities of a particular drive, in an *ad hoc* manner. More
+importantly, it appeared that the behavior of the `standard` commands
+was different for most of the different drivers: e. g., some drivers
+close the tray if an *open()* call occurs when the tray is open, while
+others do not. Some drivers lock the door upon opening the device, to
+prevent an incoherent file system, but others don't, to allow software
+ejection. Undoubtedly, the capabilities of the different drives vary,
+but even when two drives have the same capability their drivers'
+behavior was usually different.
+
+.. [#f1]
+   I cannot recollect what kernel version I looked at, then,
+   presumably 1.2.13 and 1.3.34 --- the latest kernel that I was
+   indirectly involved in.
+
+I decided to start a discussion on how to make all the Linux CD-ROM
+drivers behave more uniformly. I began by contacting the developers of
+the many CD-ROM drivers found in the Linux kernel. Their reactions
+encouraged me to write the Uniform CD-ROM Driver which this document is
+intended to describe. The implementation of the Uniform CD-ROM Driver is
+in the file `cdrom.c`. This driver is intended to be an additional software
+layer that sits on top of the low-level device drivers for each CD-ROM drive.
+By adding this additional layer, it is possible to have all the different
+CD-ROM devices behave **exactly** the same (insofar as the underlying
+hardware will allow).
+
+The goal of the Uniform CD-ROM Driver is **not** to alienate driver developers
+who have not yet taken steps to support this effort. The goal of the Uniform
+CD-ROM Driver is simply to give people writing application programs for CD-ROM drives
+**one** Linux CD-ROM interface with consistent behavior for all
+CD-ROM devices. In addition, this also provides a consistent interface
+between the low-level device driver code and the Linux kernel. Care
+is taken that 100% compatibility exists with the data structures and
+programmer's interface defined in `cdrom.h`. This guide was written to
+help CD-ROM driver developers adapt their code to use the Uniform CD-ROM
+Driver code defined in `cdrom.c`.
+
+Personally, I think that the most important hardware interfaces are
+the IDE/ATAPI drives and, of course, the SCSI drives, but as prices
+of hardware drop continuously, it is also likely that people may have
+more than one CD-ROM drive, possibly of mixed types. It is important
+that these drives behave in the same way. In December 1994, one of the
+cheapest CD-ROM drives was a Philips cm206, a double-speed proprietary
+drive. In the months that I was busy writing a Linux driver for it,
+proprietary drives became obsolete and IDE/ATAPI drives became the
+standard. At the time of the last update to this document (November
+1997) it is becoming difficult to even **find** anything less than a
+16 speed CD-ROM drive, and 24 speed drives are common.
+
+.. _cdrom_api:
+
+Standardizing through another software level
+============================================
+
+At the time this document was conceived, all drivers directly
+implemented the CD-ROM *ioctl()* calls through their own routines. This
+led to the danger of different drivers forgetting to do important things
+like checking that the user was giving the driver valid data. More
+importantly, this led to the divergence of behavior, which has already
+been discussed.
+
+For this reason, the Uniform CD-ROM Driver was created to enforce consistent
+CD-ROM drive behavior, and to provide a common set of services to the various
+low-level CD-ROM device drivers. The Uniform CD-ROM Driver now provides another
+software-level, that separates the *ioctl()* and *open()* implementation
+from the actual hardware implementation. Note that this effort has
+made few changes which will affect a user's application programs. The
+greatest change involved moving the contents of the various low-level
+CD-ROM drivers' header files to the kernel's cdrom directory. This was
+done to help ensure that the user is presented with only one cdrom
+interface, the interface defined in `cdrom.h`.
+
+CD-ROM drives are specific enough (i. e., different from other
+block-devices such as floppy or hard disc drives), to define a set
+of common **CD-ROM device operations**, *<cdrom-device>_dops*.
+These operations are different from the classical block-device file
+operations, *<block-device>_fops*.
+
+The routines for the Uniform CD-ROM Driver interface level are implemented
+in the file `cdrom.c`. In this file, the Uniform CD-ROM Driver interfaces
+with the kernel as a block device by registering the following general
+*struct file_operations*::
+
+	struct file_operations cdrom_fops = {
+		NULL,			/* lseek */
+		block_read,		/* read - general block-dev read */
+		block_write,		/* write - general block-dev write */
+		NULL,			/* readdir */
+		NULL,			/* select */
+		cdrom_ioctl,		/* ioctl */
+		NULL,			/* mmap */
+		cdrom_open,		/* open */
+		cdrom_release,		/* release */
+		NULL,			/* fsync */
+		NULL,			/* fasync */
+		cdrom_media_changed,	/* media change */
+		NULL			/* revalidate */
+	};
+
+Every active CD-ROM device shares this *struct*. The routines
+declared above are all implemented in `cdrom.c`, since this file is the
+place where the behavior of all CD-ROM-devices is defined and
+standardized. The actual interface to the various types of CD-ROM
+hardware is still performed by various low-level CD-ROM-device
+drivers. These routines simply implement certain **capabilities**
+that are common to all CD-ROM (and really, all removable-media
+devices).
+
+Registration of a low-level CD-ROM device driver is now done through
+the general routines in `cdrom.c`, not through the Virtual File System
+(VFS) any more. The interface implemented in `cdrom.c` is carried out
+through two general structures that contain information about the
+capabilities of the driver, and the specific drives on which the
+driver operates. The structures are:
+
+cdrom_device_ops
+  This structure contains information about the low-level driver for a
+  CD-ROM device. This structure is conceptually connected to the major
+  number of the device (although some drivers may have different
+  major numbers, as is the case for the IDE driver).
+
+cdrom_device_info
+  This structure contains information about a particular CD-ROM drive,
+  such as its device name, speed, etc. This structure is conceptually
+  connected to the minor number of the device.
+
+Registering a particular CD-ROM drive with the Uniform CD-ROM Driver
+is done by the low-level device driver through a call to::
+
+	register_cdrom(struct cdrom_device_info * <device>_info)
+
+The device information structure, *<device>_info*, contains all the
+information needed for the kernel to interface with the low-level
+CD-ROM device driver. One of the most important entries in this
+structure is a pointer to the *cdrom_device_ops* structure of the
+low-level driver.
+
+The device operations structure, *cdrom_device_ops*, contains a list
+of pointers to the functions which are implemented in the low-level
+device driver. When `cdrom.c` accesses a CD-ROM device, it does it
+through the functions in this structure. It is impossible to know all
+the capabilities of future CD-ROM drives, so it is expected that this
+list may need to be expanded from time to time as new technologies are
+developed. For example, CD-R and CD-R/W drives are beginning to become
+popular, and support will soon need to be added for them. For now, the
+current *struct* is::
+
+	struct cdrom_device_ops {
+		int (*open)(struct cdrom_device_info *, int);
+		void (*release)(struct cdrom_device_info *);
+		int (*drive_status)(struct cdrom_device_info *, int);
+		unsigned int (*check_events)(struct cdrom_device_info *,
+					     unsigned int, int);
+		int (*media_changed)(struct cdrom_device_info *, int);
+		int (*tray_move)(struct cdrom_device_info *, int);
+		int (*lock_door)(struct cdrom_device_info *, int);
+		int (*select_speed)(struct cdrom_device_info *, int);
+		int (*select_disc)(struct cdrom_device_info *, int);
+		int (*get_last_session) (struct cdrom_device_info *,
+					 struct cdrom_multisession *);
+		int (*get_mcn)(struct cdrom_device_info *, struct cdrom_mcn *);
+		int (*reset)(struct cdrom_device_info *);
+		int (*audio_ioctl)(struct cdrom_device_info *,
+				   unsigned int, void *);
+		const int capability;		/* capability flags */
+		int (*generic_packet)(struct cdrom_device_info *,
+				      struct packet_command *);
+	};
+
+When a low-level device driver implements one of these capabilities,
+it should add a function pointer to this *struct*. When a particular
+function is not implemented, however, this *struct* should contain a
+NULL instead. The *capability* flags specify the capabilities of the
+CD-ROM hardware and/or low-level CD-ROM driver when a CD-ROM drive
+is registered with the Uniform CD-ROM Driver.
+
+Note that most functions have fewer parameters than their
+*blkdev_fops* counterparts. This is because very little of the
+information in the structures *inode* and *file* is used. For most
+drivers, the main parameter is the *struct* *cdrom_device_info*, from
+which the major and minor number can be extracted. (Most low-level
+CD-ROM drivers don't even look at the major and minor number though,
+since many of them only support one device.) This will be available
+through *dev* in *cdrom_device_info* described below.
+
+The drive-specific, minor-like information that is registered with
+`cdrom.c`, currently contains the following fields::
+
+  struct cdrom_device_info {
+	const struct cdrom_device_ops * ops; 	/* device operations for this major */
+	struct list_head list;			/* linked list of all device_info */
+	struct gendisk * disk;			/* matching block layer disk */
+	void *  handle;				/* driver-dependent data */
+
+	int mask; 				/* mask of capability: disables them */
+	int speed;				/* maximum speed for reading data */
+	int capacity;				/* number of discs in a jukebox */
+
+	unsigned int options:30;		/* options flags */
+	unsigned mc_flags:2;			/*  media-change buffer flags */
+	unsigned int vfs_events;		/*  cached events for vfs path */
+	unsigned int ioctl_events;		/*  cached events for ioctl path */
+	int use_count;				/*  number of times device is opened */
+	char name[20];				/*  name of the device type */
+
+	__u8 sanyo_slot : 2;			/*  Sanyo 3-CD changer support */
+	__u8 keeplocked : 1;			/*  CDROM_LOCKDOOR status */
+	__u8 reserved : 5;			/*  not used yet */
+	int cdda_method;			/*  see CDDA_* flags */
+	__u8 last_sense;			/*  saves last sense key */
+	__u8 media_written;			/*  dirty flag, DVD+RW bookkeeping */
+	unsigned short mmc3_profile;		/*  current MMC3 profile */
+	int for_data;				/*  unknown:TBD */
+	int (*exit)(struct cdrom_device_info *);/*  unknown:TBD */
+	int mrw_mode_page;			/*  which MRW mode page is in use */
+  };
+
+Using this *struct*, a linked list of the registered minor devices is
+built, using the *list* field. The device number, the device operations
+struct and specifications of properties of the drive are stored in this
+structure.
+
+The *mask* flags can be used to mask out some of the capabilities listed
+in *ops->capability*, if a specific drive doesn't support a feature
+of the driver. The value *speed* specifies the maximum head-rate of the
+drive, measured in units of normal audio speed (176kB/sec raw data or
+150kB/sec file system data). The parameters are declared *const*
+because they describe properties of the drive, which don't change after
+registration.
+
+A few registers contain variables local to the CD-ROM drive. The
+flags *options* are used to specify how the general CD-ROM routines
+should behave. These various flags registers should provide enough
+flexibility to adapt to the different users' wishes (and **not** the
+`arbitrary` wishes of the author of the low-level device driver, as is
+the case in the old scheme). The register *mc_flags* is used to buffer
+the information from *media_changed()* to two separate queues. Other
+data that is specific to a minor drive, can be accessed through *handle*,
+which can point to a data structure specific to the low-level driver.
+The fields *use_count*, *list*, *options* and *mc_flags* need not be
+initialized.
+
+The intermediate software layer that `cdrom.c` forms will perform some
+additional bookkeeping. The use count of the device (the number of
+processes that have the device opened) is registered in *use_count*. The
+function *cdrom_ioctl()* will verify the appropriate user-memory regions
+for read and write, and in case a location on the CD is transferred,
+it will `sanitize` the format by making requests to the low-level
+drivers in a standard format, and translating all formats between the
+user-software and low level drivers. This relieves much of the drivers'
+memory checking and format checking and translation. Also, the necessary
+structures will be declared on the program stack.
+
+The implementation of the functions should be as defined in the
+following sections. Two functions **must** be implemented, namely
+*open()* and *release()*. Other functions may be omitted; their
+corresponding capability flags will be cleared upon registration.
+Generally, a function returns zero on success and negative on error. A
+function call should return only after the command has completed, but of
+course waiting for the device should not use processor time.
+
+::
+
+	int open(struct cdrom_device_info *cdi, int purpose)
+
+*Open()* should try to open the device for a specific *purpose*, which
+can be either:
+
+- Open for reading data, as done by `mount()` (2), or the
+  user commands `dd` or `cat`.
+- Open for *ioctl* commands, as done by audio-CD playing programs.
+
+Notice that any strategic code (closing tray upon *open()*, etc.) is
+done by the calling routine in `cdrom.c`, so the low-level routine
+should only be concerned with proper initialization, such as spinning
+up the disc, etc.
+
+::
+
+	void release(struct cdrom_device_info *cdi)
+
+Device-specific actions should be taken such as spinning down the device.
+However, strategic actions such as ejection of the tray, or unlocking
+the door, should be left over to the general routine *cdrom_release()*.
+This is the only function returning type *void*.
+
+.. _cdrom_drive_status:
+
+::
+
+	int drive_status(struct cdrom_device_info *cdi, int slot_nr)
+
+The function *drive_status*, if implemented, should provide
+information on the status of the drive (not the status of the disc,
+which may or may not be in the drive). If the drive is not a changer,
+*slot_nr* should be ignored. In `cdrom.h` the possibilities are listed::
+
+
+	CDS_NO_INFO		/* no information available */
+	CDS_NO_DISC		/* no disc is inserted, tray is closed */
+	CDS_TRAY_OPEN		/* tray is opened */
+	CDS_DRIVE_NOT_READY	/* something is wrong, tray is moving? */
+	CDS_DISC_OK		/* a disc is loaded and everything is fine */
+
+::
+
+	int media_changed(struct cdrom_device_info *cdi, int disc_nr)
+
+This function is very similar to the original function in *struct
+file_operations*. It returns 1 if the medium of the device *cdi->dev*
+has changed since the last call, and 0 otherwise. The parameter
+*disc_nr* identifies a specific slot in a juke-box; it should be
+ignored for single-disc drives. Note that by `re-routing` this
+function through *cdrom_media_changed()*, we can implement separate
+queues for the VFS and a new *ioctl()* function that can report device
+changes to software (e.g., an auto-mounting daemon).
+
+::
+
+	int tray_move(struct cdrom_device_info *cdi, int position)
+
+This function, if implemented, should control the tray movement. (No
+other function should control this.) The parameter *position* controls
+the desired direction of movement:
+
+- 0 Close tray
+- 1 Open tray
+
+This function returns 0 upon success, and a non-zero value upon
+error. Note that if the tray is already in the desired position, no
+action need be taken, and the return value should be 0.
+
+::
+
+	int lock_door(struct cdrom_device_info *cdi, int lock)
+
+This function (and no other code) controls locking of the door, if the
+drive allows this. The value of *lock* controls the desired locking
+state:
+
+- 0 Unlock door, manual opening is allowed
+- 1 Lock door, tray cannot be ejected manually
+
+This function returns 0 upon success, and a non-zero value upon
+error. Note that if the door is already in the requested state, no
+action need be taken, and the return value should be 0.
+
+::
+
+	int select_speed(struct cdrom_device_info *cdi, int speed)
+
+Some CD-ROM drives are capable of changing their head-speed. There
+are several reasons for changing the speed of a CD-ROM drive. Badly
+pressed CD-ROMs may benefit from less-than-maximum head rate. Modern
+CD-ROM drives can obtain very high head rates (up to *24x* is
+common). It has been reported that these drives can make reading
+errors at these high speeds; reducing the speed can prevent data loss
+in these circumstances. Finally, some of these drives can
+make an annoyingly loud noise, which a lower speed may reduce.
+
+This function specifies the speed at which data is read or audio is
+played back. The value of *speed* specifies the head-speed of the
+drive, measured in units of standard cdrom speed (176kB/sec raw data
+or 150kB/sec file system data). So to request that a CD-ROM drive
+operate at 300kB/sec you would call the CDROM_SELECT_SPEED *ioctl*
+with *speed=2*. The special value `0` means `auto-selection`, i. e.,
+maximum data-rate or real-time audio rate. If the drive doesn't have
+this `auto-selection` capability, the decision should be made on the
+current disc loaded and the return value should be positive. A negative
+return value indicates an error.
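+
+The speed units above reduce to simple arithmetic. A minimal sketch,
+assuming a hypothetical helper *cd_speed_to_kbps()* that is **not** part
+of `cdrom.c`, and using the nominal 1x rates quoted in this document::
+
+	/* Illustrative only: not a kernel function. */
+	#define EX_RAW_KB_PER_X		176	/* raw data, kB/sec at 1x */
+	#define EX_FS_KB_PER_X		150	/* file system data, kB/sec at 1x */
+
+	/*
+	 * Convert a speed multiplier, as passed to select_speed(), into a
+	 * nominal data rate. Returns -1 for values this sketch cannot map
+	 * (negative speeds, or 0, which means auto-selection).
+	 */
+	static int cd_speed_to_kbps(int speed, int raw)
+	{
+		if (speed <= 0)
+			return -1;
+		return speed * (raw ? EX_RAW_KB_PER_X : EX_FS_KB_PER_X);
+	}
+
+With these assumptions, *cd_speed_to_kbps(2, 0)* gives the 300kB/sec of
+the example above.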
+
+::
+
+	int select_disc(struct cdrom_device_info *cdi, int number)
+
+If the drive can store multiple discs (a juke-box) this function
+will perform disc selection. It should return the number of the
+selected disc on success, a negative value on error. Currently, only
+the ide-cd driver supports this functionality.
+
+::
+
+	int get_last_session(struct cdrom_device_info *cdi,
+			     struct cdrom_multisession *ms_info)
+
+This function should implement the old corresponding *ioctl()*. For
+device *cdi->dev*, the start of the last session of the current disc
+should be returned in the pointer argument *ms_info*. Note that
+routines in `cdrom.c` have sanitized this argument: its requested
+format will **always** be of the type *CDROM_LBA* (linear block
+addressing mode), whatever the calling software requested. But
+sanitization goes even further: the low-level implementation may
+return the requested information in *CDROM_MSF* format if it wishes so
+(setting the *ms_info->addr_format* field appropriately, of
+course) and the routines in `cdrom.c` will make the transformation if
+necessary. The return value is 0 upon success.
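+
+The two address formats are related by fixed arithmetic. A minimal
+sketch, assuming the usual CD layout of 75 frames per second and a
+150-frame (two-second) offset between MSF 00:00:00 and LBA 0; the
+*ex_* names are hypothetical and do not appear in `cdrom.c`::
+
+	#define EX_FRAMES_PER_SEC	75
+	#define EX_SECS_PER_MIN		60
+	#define EX_MSF_OFFSET		150	/* frames before LBA 0 */
+
+	/* Convert minute/second/frame to a linear block address. */
+	static int ex_msf_to_lba(int m, int s, int f)
+	{
+		return (m * EX_SECS_PER_MIN + s) * EX_FRAMES_PER_SEC + f
+			- EX_MSF_OFFSET;
+	}
+
+	/* Convert a linear block address back to minute/second/frame. */
+	static void ex_lba_to_msf(int lba, int *m, int *s, int *f)
+	{
+		lba += EX_MSF_OFFSET;
+		*m = lba / (EX_SECS_PER_MIN * EX_FRAMES_PER_SEC);
+		lba %= EX_SECS_PER_MIN * EX_FRAMES_PER_SEC;
+		*s = lba / EX_FRAMES_PER_SEC;
+		*f = lba % EX_FRAMES_PER_SEC;
+	}
+
+A low-level driver that prefers to answer in *CDROM_MSF* can rely on
+`cdrom.c` to perform this kind of translation on its behalf.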
+
+::
+
+	int get_mcn(struct cdrom_device_info *cdi,
+		    struct cdrom_mcn *mcn)
+
+Some discs carry a `Media Catalog Number` (MCN), also called
+`Universal Product Code` (UPC). This number should reflect the number
+that is generally found in the bar-code on the product. Unfortunately,
+the few discs that carry such a number on the disc don't even use the
+same format. The return argument to this function is a pointer to a
+pre-declared memory region of type *struct cdrom_mcn*. The MCN is
+expected as a 13-character string, terminated by a null-character.
+
+::
+
+	int reset(struct cdrom_device_info *cdi)
+
+This call should perform a hard-reset on the drive (although in
+circumstances that a hard-reset is necessary, a drive may very well not
+listen to commands anymore). Preferably, control is returned to the
+caller only after the drive has finished resetting. If the drive is no
+longer listening, it may be wise for the underlying low-level cdrom
+driver to time out.
+
+::
+
+	int audio_ioctl(struct cdrom_device_info *cdi,
+			unsigned int cmd, void *arg)
+
+Some of the CD-ROM-\ *ioctl()*\ 's defined in `cdrom.h` can be
+implemented by the routines described above, and hence the function
+*cdrom_ioctl* will use those. However, most *ioctl()*\ 's deal with
+audio-control. We have decided to leave these to be accessed through a
+single function, repeating the arguments *cmd* and *arg*. Note that
+the latter is of type ``void *``, rather than *unsigned long int*.
+The routine *cdrom_ioctl()* does do some useful things,
+though. It sanitizes the address format type to *CDROM_MSF* (Minutes,
+Seconds, Frames) for all audio calls. It also verifies the memory
+location of *arg*, and reserves stack-memory for the argument. This
+makes implementation of the *audio_ioctl()* much simpler than in the
+old driver scheme. For example, you may look up the function
+*cm206_audio_ioctl()* in `cm206.c`, which should be updated with
+this documentation.
+
+An unimplemented ioctl should return *-ENOSYS*, but a harmless request
+(e. g., *CDROMSTART*) may be ignored by returning 0 (success). Other
+errors should be according to the standards, whatever they are. When
+an error is returned by the low-level driver, the Uniform CD-ROM Driver
+tries whenever possible to return the error code to the calling program.
+(We may decide to sanitize the return value in *cdrom_ioctl()* though, in
+order to guarantee a uniform interface to the audio-player software.)
+
+::
+
+	int dev_ioctl(struct cdrom_device_info *cdi,
+		      unsigned int cmd, unsigned long arg)
+
+Some *ioctl()'s* seem to be specific to certain CD-ROM drives. That is,
+they are introduced to service some capabilities of certain drives. In
+fact, there are 6 different *ioctl()'s* for reading data, either in some
+particular kind of format, or audio data. Not many drives support
+reading audio tracks as data; I believe this is because of protection
+of copyrights of artists. Moreover, I think that if audio-tracks are
+supported, it should be done through the VFS and not via *ioctl()'s*. A
+problem here could be the fact that audio-frames are 2352 bytes long,
+so either the audio-file-system should ask for 75264 bytes at once
+(the least common multiple of 512 and 2352), or the drivers should
+bend their backs to cope with this incoherence (to which I would be
+opposed). Furthermore, it is very difficult for the hardware to find
+the exact frame boundaries, since there are no synchronization headers
+in audio frames. Once these issues are resolved, this code should be
+standardized in `cdrom.c`.
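+
+The figure of 75264 bytes can be checked directly. A minimal sketch
+(the *ex_* helpers are hypothetical, written only for this check)::
+
+	/* Greatest common divisor by Euclid's algorithm. */
+	static unsigned long ex_gcd(unsigned long a, unsigned long b)
+	{
+		while (b) {
+			unsigned long t = a % b;
+			a = b;
+			b = t;
+		}
+		return a;
+	}
+
+	/* Least common multiple; divide first to limit overflow. */
+	static unsigned long ex_lcm(unsigned long a, unsigned long b)
+	{
+		return a / ex_gcd(a, b) * b;
+	}
+
+Here *ex_lcm(512, 2352)* is 75264, which is 147 blocks of 512 bytes or
+32 audio frames of 2352 bytes.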
+
+Because there are so many *ioctl()'s* that seem to be introduced to
+satisfy certain drivers [#f2]_, any non-standard *ioctl()*\ s
+are routed through the call *dev_ioctl()*. In principle, `private`
+*ioctl()*\ 's should be numbered after the device's major number, and not
+the general CD-ROM *ioctl* number, `0x53`. Currently the
+non-supported *ioctl()'s* are:
+
+	CDROMREADMODE1, CDROMREADMODE2, CDROMREADAUDIO, CDROMREADRAW,
+	CDROMREADCOOKED, CDROMSEEK, CDROMPLAYBLK and CDROMREADALL
+
+.. [#f2]
+
+   Is there software around that actually uses these? I'd be interested!
+
+.. _cdrom_capabilities:
+
+CD-ROM capabilities
+-------------------
+
+Instead of just implementing some *ioctl* calls, the interface in
+`cdrom.c` supplies the possibility to indicate the **capabilities**
+of a CD-ROM drive. This can be done by ORing any number of
+capability-constants that are defined in `cdrom.h` at the registration
+phase. Currently, the capabilities are any of::
+
+	CDC_CLOSE_TRAY		/* can close tray by software control */
+	CDC_OPEN_TRAY		/* can open tray */
+	CDC_LOCK		/* can lock and unlock the door */
+	CDC_SELECT_SPEED	/* can select speed, in units of ~150 kB/s */
+	CDC_SELECT_DISC		/* drive is juke-box */
+	CDC_MULTI_SESSION	/* can read sessions > 1 */
+	CDC_MCN			/* can read Media Catalog Number */
+	CDC_MEDIA_CHANGED	/* can report if disc has changed */
+	CDC_PLAY_AUDIO		/* can perform audio-functions (play, pause, etc) */
+	CDC_RESET		/* hard reset device */
+	CDC_IOCTLS		/* driver has non-standard ioctls */
+	CDC_DRIVE_STATUS	/* driver implements drive status */
+
+The capability flag is declared *const*, to prevent drivers from
+accidentally tampering with the contents. The capability flags actually
+inform `cdrom.c` of what the driver can do. If the drive found
+by the driver does not have the capability, it can be masked out by
+the *cdrom_device_info* variable *mask*. For instance, the SCSI CD-ROM
+driver has implemented the code for loading and ejecting CD-ROM's, and
+hence its corresponding flags in *capability* will be set. But a SCSI
+CD-ROM drive might be a caddy system, which can't load the tray, and
+hence for this drive the *cdrom_device_info* struct will have set
+the *CDC_CLOSE_TRAY* bit in *mask*.
+
+In the file `cdrom.c` you will encounter many constructions of the type::
+
+	if (cdo->capability & ~cdi->mask & CDC_<capability>) ...
+
+There is no *ioctl* to set the mask... The reason is that
+I think it is better to control the **behavior** rather than the
+**capabilities**.
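+
+The caddy example above can be traced through such a test. A minimal
+sketch; the *EX_* constants stand in for the *CDC_* flags and their bit
+values are assumptions made for illustration, not copied from `cdrom.h`::
+
+	#define EX_CDC_CLOSE_TRAY	0x1
+	#define EX_CDC_OPEN_TRAY	0x2
+	#define EX_CDC_LOCK		0x4
+
+	/*
+	 * Returns 1 if the driver advertises the capability and the
+	 * per-drive mask does not disable it, 0 otherwise.
+	 */
+	static int ex_can_do(int capability, int mask, int flag)
+	{
+		return (capability & ~mask & flag) != 0;
+	}
+
+For a driver with capability *EX_CDC_CLOSE_TRAY | EX_CDC_OPEN_TRAY* and
+a caddy drive whose mask is *EX_CDC_CLOSE_TRAY*, *ex_can_do()* is 0 for
+closing the tray and 1 for opening it.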
+
+Options
+-------
+
+A final flag register controls the **behavior** of the CD-ROM
+drives, in order to satisfy different users' wishes, hopefully
+independently of the ideas of the respective author who happened to
+have made the drive's support available to the Linux community. The
+current behavior options are::
+
+	CDO_AUTO_CLOSE	/* try to close tray upon device open() */
+	CDO_AUTO_EJECT	/* try to open tray on last device close() */
+	CDO_USE_FFLAGS	/* use file_pointer->f_flags to indicate purpose for open() */
+	CDO_LOCK	/* try to lock door if device is opened */
+	CDO_CHECK_TYPE	/* ensure disc type is data if opened for data */
+
+The initial value of this register is
+`CDO_AUTO_CLOSE | CDO_USE_FFLAGS | CDO_LOCK`, reflecting my own view on user
+interface and software standards. Before you protest, there are two
+new *ioctl()'s* implemented in `cdrom.c`, that allow you to control the
+behavior by software. These are::
+
+	CDROM_SET_OPTIONS	/* set options specified in (int)arg */
+	CDROM_CLEAR_OPTIONS	/* clear options specified in (int)arg */
+
+One option needs some more explanation: *CDO_USE_FFLAGS*. In the next
+section we explain what the need for this option is.
+
+A software package `setcd`, available from the Debian distribution
+and `sunsite.unc.edu`, allows user level control of these flags.
+
+
+The need to know the purpose of opening the CD-ROM device
+=========================================================
+
+Traditionally, Unix devices can be used in two different `modes`,
+either by reading/writing to the device file, or by issuing
+controlling commands to the device, by the device's *ioctl()*
+call. The problem with CD-ROM drives, is that they can be used for
+two entirely different purposes. One is to mount removable
+file systems, CD-ROM's, the other is to play audio CD's. Audio commands
+are implemented entirely through *ioctl()'s*, presumably because the
+first implementation (SUN?) has been such. In principle there is
+nothing wrong with this, but a good control of the `CD player` demands
+that the device can **always** be opened in order to give the
+*ioctl* commands, regardless of the state the drive is in.
+
+On the other hand, when used as a removable-media disc drive (which is the
+original purpose of CD-ROMs) we would like to make sure that the
+disc drive is ready for operation upon opening the device. In the old
+scheme, some CD-ROM drivers don't do any integrity checking, resulting
+in a number of i/o errors reported by the VFS to the kernel when an
+attempt for mounting a CD-ROM on an empty drive occurs. This is not a
+particularly elegant way to find out that there is no CD-ROM inserted;
+it more-or-less looks like the old IBM-PC trying to read an empty floppy
+drive for a couple of seconds, after which the system complains it
+can't read from it. Nowadays we can **sense** the existence of a
+removable medium in a drive, and we believe we should exploit that
+fact. An integrity check on opening of the device, that verifies the
+availability of a CD-ROM and its correct type (data), would be
+desirable.
+
+These two ways of using a CD-ROM drive, principally for data and
+secondarily for playing audio discs, have different demands for the
+behavior of the *open()* call. Audio use simply wants to open the
+device in order to get a file handle which is needed for issuing
+*ioctl* commands, while data use wants to open for correct and
+reliable data transfer. The only way user programs can indicate what
+their *purpose* of opening the device is, is through the *flags*
+parameter (see `open(2)`). For CD-ROM devices, these flags aren't
+implemented (some drivers implement checking for write-related flags,
+but this is not strictly necessary if the device file has correct
+permission flags). Most option flags simply don't make sense to
+CD-ROM devices: *O_CREAT*, *O_NOCTTY*, *O_TRUNC*, *O_APPEND*, and
+*O_SYNC* have no meaning to a CD-ROM.
+
+We therefore propose to use the flag *O_NONBLOCK* to indicate
+that the device is opened just for issuing *ioctl*
+commands. Strictly, the meaning of *O_NONBLOCK* is that opening and
+subsequent calls to the device don't cause the calling process to
+wait. We could interpret this as don't wait until someone has
+inserted some valid data-CD-ROM. Thus, our proposal of the
+implementation for the *open()* call for CD-ROMs is:
+
+- If no other flags are set than *O_RDONLY*, the device is opened
+  for data transfer, and the return value will be 0 only upon successful
+  initialization of the transfer. The call may even induce some actions
+  on the CD-ROM, such as closing the tray.
+- If the option flag *O_NONBLOCK* is set, opening will always be
+  successful, unless the whole device doesn't exist. The drive will take
+  no actions whatsoever.
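+
+Seen from a user program, the two cases above might look as follows. This
+is a minimal sketch, not code from `cdrom.c`: the helper names
+*cdrom_open_flags()* and *open_cdrom()* are hypothetical, and the device
+path is whatever the caller supplies::
+
+	#include <fcntl.h>
+
+	/* Choose the open(2) flags according to the purpose of the open. */
+	static int cdrom_open_flags(int ioctl_only)
+	{
+		int flags = O_RDONLY;		/* data use: plain read-only open */
+
+		if (ioctl_only)
+			flags |= O_NONBLOCK;	/* control use: no media check */
+		return flags;
+	}
+
+	/* Returns a file descriptor, or -1 with errno set by open(2). */
+	static int open_cdrom(const char *path, int ioctl_only)
+	{
+		return open(path, cdrom_open_flags(ioctl_only));
+	}
+
+A CD player would call *open_cdrom(path, 1)* and expect success even with
+an empty drive, while a program that reads data would call
+*open_cdrom(path, 0)* and treat a failure as "no usable disc".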
+
+And what about standards?
+-------------------------
+
+You might hesitate to accept this proposal as it comes from the
+Linux community, and not from some standardizing institute. What
+about SUN, SGI, HP and all those other Unix and hardware vendors?
+Well, these companies are in the lucky position that they generally
+control both the hardware and software of their supported products,
+and are large enough to set their own standard. They do not have to
+deal with a dozen or more different, competing hardware
+configurations\ [#f3]_.
+
+.. [#f3]
+
+   Incidentally, I think that SUN's approach to mounting CD-ROMs is very
+   good in origin: under Solaris a volume-daemon automatically mounts a
+   newly inserted CD-ROM under `/cdrom/<volume-name>`.
+
+   In my opinion they should have pushed this
+   further and have **every** CD-ROM on the local area network be
+   mounted at the similar location, i. e., no matter in which particular
+   machine you insert a CD-ROM, it will always appear at the same
+   position in the directory tree, on every system. When I wanted to
+   implement such a user-program for Linux, I came across the
+   differences in behavior of the various drivers, and the need for an
+   *ioctl* informing about media changes.
+
+We believe that using *O_NONBLOCK* to indicate that a device is being opened
+for *ioctl* commands only can be easily introduced in the Linux
+community. All the CD-player authors will have to be informed, we can
+even send in our own patches to the programs. The use of *O_NONBLOCK*
+has most likely no influence on the behavior of the CD-players on
+other operating systems than Linux. Finally, a user can always revert
+to old behavior by a call to
+*ioctl(file_descriptor, CDROM_CLEAR_OPTIONS, CDO_USE_FFLAGS)*.
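+
+As a minimal sketch of that last call (the wrapper name
+*revert_to_old_open()* is made up for this example; the request and flag
+names are the ones defined in `cdrom.h`)::
+
+	#include <sys/ioctl.h>
+	#include <linux/cdrom.h>
+
+	/*
+	 * Stop the driver from interpreting the open(2) flags on this
+	 * drive. Returns the option register after modification, or -1
+	 * with errno set if the ioctl fails.
+	 */
+	static int revert_to_old_open(int fd)
+	{
+		return ioctl(fd, CDROM_CLEAR_OPTIONS, CDO_USE_FFLAGS);
+	}
+
+The descriptor would typically come from an *open()* with *O_NONBLOCK*
+set, so that the call can be made even with no disc in the drive.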
+
+The preferred strategy of *open()*
+----------------------------------
+
+The routines in `cdrom.c` are designed in such a way that run-time
+configuration of the behavior of CD-ROM devices (of **any** type)
+can be carried out, by the *CDROM_SET/CLEAR_OPTIONS* *ioctls*. Thus, various
+modes of operation can be set:
+
+`CDO_AUTO_CLOSE | CDO_USE_FFLAGS | CDO_LOCK`
+   This is the default setting. (With *CDO_CHECK_TYPE* it will be better, in
+   the future.) If the device is not yet opened by any other process, and if
+   the device is being opened for data (*O_NONBLOCK* is not set) and the
+   tray is found to be open, an attempt to close the tray is made. Then,
+   it is verified that a disc is in the drive and, if *CDO_CHECK_TYPE* is
+   set, that it contains tracks of type `data mode 1`. Only if all tests
+   are passed is the return value zero. The door is locked to prevent file
+   system corruption. If the drive is opened for audio (*O_NONBLOCK* is
+   set), no actions are taken and a value of 0 will be returned.
+
+`CDO_AUTO_CLOSE | CDO_AUTO_EJECT | CDO_LOCK`
+   This mimics the behavior of the current sbpcd-driver. The option flags are
+   ignored, the tray is closed on the first open, if necessary. Similarly,
+   the tray is opened on the last release, i. e., if a CD-ROM is unmounted,
+   it is automatically ejected, such that the user can replace it.
+
+We hope that these options can convince everybody (both driver
+maintainers and user program developers) to adopt the new CD-ROM
+driver scheme and option flag interpretation.
+
+Description of routines in `cdrom.c`
+====================================
+
+Only a few routines in `cdrom.c` are exported to the drivers. In this
+section we will discuss these, as well as the functions that `take
+over` the CD-ROM interface to the kernel. The header file belonging
+to `cdrom.c` is called `cdrom.h`. Formerly, some of the contents of this
+file were placed in the file `ucdrom.h`, but this file has now been
+merged back into `cdrom.h`.
+
+::
+
+	struct file_operations cdrom_fops
+
+The contents of this structure were described in cdrom_api_.
+A pointer to this structure is assigned to the *fops* field
+of the *struct gendisk*.
+
+::
+
+	int register_cdrom(struct cdrom_device_info *cdi)
+
+In about the same way as one registers *cdrom_fops* with the kernel,
+the device operations and information structures, as described in
+cdrom_api_, should be registered with the Uniform CD-ROM Driver::
+
+	register_cdrom(&<device>_info);
+
+
+This function returns zero upon success, and non-zero upon
+failure. The structure *<device>_info* should have a pointer to the
+driver's *<device>_dops*, as in::
+
+	struct cdrom_device_info <device>_info = {
+		&<device>_dops,		/* pointer to the device operations */
+		...
+	};
+
+Note that a driver must have one static structure, *<device>_dops*, while
+it may have as many structures *<device>_info* as there are minor devices
+active. *register_cdrom()* builds a linked list from these.
+
+
+::
+
+	void unregister_cdrom(struct cdrom_device_info *cdi)
+
+Unregistering device *cdi* with minor number *MINOR(cdi->dev)* removes
+the minor device from the list. If it was the last registered minor for
+the low-level driver, this disconnects the registered device-operation
+routines from the CD-ROM interface. As the prototype shows, the function
+returns no value.
+
+::
+
+	int cdrom_open(struct inode * ip, struct file * fp)
+
+This function is not called directly by the low-level drivers; it is
+listed in the standard *cdrom_fops*. If the VFS opens a file, this
+function becomes active. A strategy is implemented in this routine,
+taking care of all capabilities and options that are set in the
+*cdrom_device_ops* connected to the device. Then, the program flow is
+transferred to the device-dependent *open()* call.
+
+::
+
+	void cdrom_release(struct inode *ip, struct file *fp)
+
+This function implements the reverse-logic of *cdrom_open()*, and then
+calls the device-dependent *release()* routine. When the use-count has
+reached 0, the allocated buffers are flushed by calls to *sync_dev(dev)*
+and *invalidate_buffers(dev)*.
+
+
+.. _cdrom_ioctl:
+
+::
+
+	int cdrom_ioctl(struct inode *ip, struct file *fp,
+			unsigned int cmd, unsigned long arg)
+
+This function handles all the standard *ioctl* requests for CD-ROM
+devices in a uniform way. The different calls fall into three
+categories: *ioctl()'s* that can be directly implemented by device
+operations, ones that are routed through the call *audio_ioctl()*, and
+the remaining ones, that are presumably device-dependent. Generally, a
+negative return value indicates an error.
+
+Directly implemented *ioctl()'s*
+--------------------------------
+
+The following `old` CD-ROM *ioctl()*\ 's are implemented by directly
+calling device-operations in *cdrom_device_ops*, if implemented and
+not masked:
+
+`CDROMMULTISESSION`
+	Requests the last session on a CD-ROM.
+`CDROMEJECT`
+	Open tray.
+`CDROMCLOSETRAY`
+	Close tray.
+`CDROMEJECT_SW`
+	If *arg* is not 0, set behavior to auto-close (close
+	tray on first open) and auto-eject (eject on last release), otherwise
+	set behavior to non-moving on *open()* and *release()* calls.
+`CDROM_GET_MCN`
+	Get the Media Catalog Number from a CD.
+
+*Ioctl*\ s routed through *audio_ioctl()*
+-----------------------------------------
+
+The following set of *ioctl()'s* are all implemented through a call to
+the *cdrom_device_ops* function *audio_ioctl()*. Memory checks and
+allocation are performed in *cdrom_ioctl()*, and also sanitization of
+address format (*CDROM_LBA*/*CDROM_MSF*) is done.
+
+`CDROMSUBCHNL`
+	Get sub-channel data in argument *arg* of type
+	`struct cdrom_subchnl *`.
+`CDROMREADTOCHDR`
+	Read Table of Contents header, in *arg* of type
+	`struct cdrom_tochdr *`.
+`CDROMREADTOCENTRY`
+	Read a Table of Contents entry in *arg* and specified by *arg*
+	of type `struct cdrom_tocentry *`.
+`CDROMPLAYMSF`
+	Play audio fragment specified in Minute, Second, Frame format,
+	delimited by *arg* of type `struct cdrom_msf *`.
+`CDROMPLAYTRKIND`
+	Play audio fragment in track-index format delimited by *arg*
+	of type `struct cdrom_ti *`.
+`CDROMVOLCTRL`
+	Set volume specified by *arg* of type `struct cdrom_volctrl *`.
+`CDROMVOLREAD`
+	Read volume into *arg* of type `struct cdrom_volctrl *`.
+`CDROMSTART`
+	Spin up disc.
+`CDROMSTOP`
+	Stop playback of audio fragment.
+`CDROMPAUSE`
+	Pause playback of audio fragment.
+`CDROMRESUME`
+	Resume playing.
+
+New *ioctl()'s* in `cdrom.c`
+----------------------------
+
+The following *ioctl()'s* have been introduced to allow user programs to
+control the behavior of individual CD-ROM devices. New *ioctl*
+commands can be identified by the underscores in their names.
+
+`CDROM_SET_OPTIONS`
+	Set options specified by *arg*. Returns the option flag register
+	after modification. Use *arg = 0* for reading the current flags.
+`CDROM_CLEAR_OPTIONS`
+	Clear options specified by *arg*. Returns the option flag register
+	after modification.
+`CDROM_SELECT_SPEED`
+	Select head-rate speed of disc specified by *arg* in units
+	of standard cdrom speed (176 kB/sec raw data or
+	150 kB/sec file system data). The value 0 means `auto-select`,
+	i. e., play audio discs at real time and data discs at maximum speed.
+	The value *arg* is checked against the maximum head rate of the
+	drive found in the *cdrom_dops*.
+`CDROM_SELECT_DISC`
+	Select disc numbered *arg* from a juke-box.
+
+	First disc is numbered 0. The number *arg* is checked against the
+	maximum number of discs in the juke-box found in the *cdrom_dops*.
+`CDROM_MEDIA_CHANGED`
+	Returns 1 if a disc has been changed since the last call.
+	Note that calls to *cdrom_media_changed* by the VFS are treated
+	by an independent queue, so both mechanisms will detect a
+	media change once. For juke-boxes, an extra argument *arg*
+	specifies the slot for which the information is given. The special
+	value *CDSL_CURRENT* requests that information about the currently
+	selected slot be returned.
+`CDROM_DRIVE_STATUS`
+	Returns the status of the drive by a call to
+	*drive_status()*. Return values are defined in cdrom_drive_status_.
+	Note that this call doesn't return information on the
+	current playing activity of the drive; this can be polled through
+	an *ioctl* call to *CDROMSUBCHNL*. For juke-boxes, an extra argument
+	*arg* specifies the slot for which (possibly limited) information is
+	given. The special value *CDSL_CURRENT* requests that information
+	about the currently selected slot be returned.
+`CDROM_DISC_STATUS`
+	Returns the type of the disc currently in the drive.
+	It should be viewed as a complement to *CDROM_DRIVE_STATUS*.
+	This *ioctl* can provide *some* information about the current
+	disc that is inserted in the drive. This functionality used to be
+	implemented in the low level drivers, but is now carried out
+	entirely in Uniform CD-ROM Driver.
+
+	The history of development of the CD's use as a carrier medium for
+	various digital information has led to many different disc types.
+	This *ioctl* is useful only in the case that CDs have **only
+	one** type of data on them. While this is often the case, it is
+	also very common for CDs to have some tracks with data, and some
+	tracks with audio. Because this is an existing interface, rather
+	than fixing this interface by changing the assumptions it was made
+	under, thereby breaking all user applications that use this
+	function, the Uniform CD-ROM Driver implements this *ioctl* as
+	follows: If the CD in question has audio tracks on it, and it has
+	absolutely no CD-I, XA, or data tracks on it, it will be reported
+	as *CDS_AUDIO*. If it has both audio and data tracks, it will
+	return *CDS_MIXED*. If there are no audio tracks on the disc, and
+	if the CD in question has any CD-I tracks on it, it will be
+	reported as *CDS_XA_2_2*. Failing that, if the CD in question
+	has any XA tracks on it, it will be reported as *CDS_XA_2_1*.
+	Finally, if the CD in question has any data tracks on it,
+	it will be reported as a data CD (*CDS_DATA_1*).
+
+	This *ioctl* can return::
+
+		CDS_NO_INFO	/* no information available */
+		CDS_NO_DISC	/* no disc is inserted, or tray is opened */
+		CDS_AUDIO	/* Audio disc (2352 audio bytes/frame) */
+		CDS_DATA_1	/* data disc, mode 1 (2048 user bytes/frame) */
+		CDS_XA_2_1	/* mixed data (XA), mode 2, form 1 (2048 user bytes) */
+		CDS_XA_2_2	/* mixed data (XA), mode 2, form 2 (2324 user bytes) */
+		CDS_MIXED	/* mixed audio/data disc */
+
+	For some information concerning frame layout of the various disc
+	types, see a recent version of `cdrom.h`.
+
+`CDROM_CHANGER_NSLOTS`
+	Returns the number of slots in a juke-box.
+`CDROMRESET`
+	Reset the drive.
+`CDROM_GET_CAPABILITY`
+	Returns the *capability* flags for the drive. Refer to section
+	cdrom_capabilities_ for more information on these flags.
+`CDROM_LOCKDOOR`
+	 Locks the door of the drive. `arg == 0` unlocks the door,
+	 any other value locks it.
+`CDROM_DEBUG`
+	 Turns on debugging info. Only root is allowed to do this.
+	 Same semantics as CDROM_LOCKDOOR.
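+
+The classification rules described above for *CDROM_DISC_STATUS* can be
+sketched as a small pure function. This is an illustration only, not the
+code in `cdrom.c`: the type and function names are invented here, and the
+enum values are placeholders that merely mirror the *CDS_* names. Two
+points the description leaves open are filled in as assumptions: an audio
+disc with any non-audio track is treated as mixed, and a disc with no
+tracks at all yields "no information"::
+
+	/* Placeholder results mirroring CDS_NO_INFO, CDS_AUDIO, etc. */
+	enum disc_kind {
+		DISC_NO_INFO,
+		DISC_AUDIO,
+		DISC_DATA_1,
+		DISC_XA_2_1,
+		DISC_XA_2_2,
+		DISC_MIXED
+	};
+
+	/* Number of tracks of each type found on the disc. */
+	struct track_counts {
+		int audio;
+		int data;
+		int cdi;
+		int xa;
+	};
+
+	static enum disc_kind classify_disc(const struct track_counts *t)
+	{
+		if (t->audio > 0) {
+			if (t->data == 0 && t->cdi == 0 && t->xa == 0)
+				return DISC_AUDIO;
+			return DISC_MIXED;	/* assumption: any non-audio track */
+		}
+		if (t->cdi > 0)
+			return DISC_XA_2_2;
+		if (t->xa > 0)
+			return DISC_XA_2_1;
+		if (t->data > 0)
+			return DISC_DATA_1;
+		return DISC_NO_INFO;		/* assumption: nothing recognized */
+	}
+
+The order of the tests matters: CD-I is checked before XA, and XA before
+plain data, so a disc carrying several non-audio track types is reported
+under the first type that matches.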
+
+
+Device dependent *ioctl()'s*
+----------------------------
+
+Finally, all other *ioctl()'s* are passed to the function *dev_ioctl()*,
+if implemented. No memory allocation or verification is carried out.
+
+How to update your driver
+=========================
+
+- Make a backup of your current driver.
+- Get hold of the files `cdrom.c` and `cdrom.h`; they should be in
+  the directory tree that came with this documentation.
+- Make sure you include `cdrom.h`.
+- Change the 3rd argument of *register_blkdev* from `&<your-drive>_fops`
+  to `&cdrom_fops`.
+- Just after that line, add the following to register with the Uniform
+  CD-ROM Driver::
+
+	register_cdrom(&<your-drive>_info);
+
+  Similarly, add a call to *unregister_cdrom()* at the appropriate place.
+- Copy an example of the device-operations *struct* to your
+  source, e. g., from `cm206.c` *cm206_dops*, and change all
+  entries to names corresponding to your driver, or names you just
+  happen to like. If your driver doesn't support a certain function,
+  make the entry *NULL*. At the entry *capability* you should list all
+  capabilities your driver currently supports. If your driver
+  has a capability that is not listed, please send me a message.
+- Copy the *cdrom_device_info* declaration from the same example
+  driver, and modify the entries according to your needs. If your
+  driver dynamically determines the capabilities of the hardware, this
+  structure should also be declared dynamically.
+- Implement all functions in your `<device>_dops` structure,
+  according to prototypes listed in  `cdrom.h`, and specifications given
+  in cdrom_api_. Most likely you have already implemented
+  the code in a large part, and you will almost certainly need to adapt the
+  prototype and return values.
+- Rename your `<device>_ioctl()` function to *audio_ioctl* and
+  change the prototype a little. Remove entries listed in the first
+  part in cdrom_ioctl_; if your code was OK, these are
+  just calls to the routines you adapted in the previous step.
+- You may remove all remaining memory checking code in the
+  *audio_ioctl()* function that deals with audio commands (these are
+  listed in the second part of cdrom_ioctl_). There is no
+  need for memory allocation either, so most *case*\ s in the *switch*
+  statement look similar to::
+
+	case CDROMREADTOCENTRY:
+		get_toc_entry((struct cdrom_tocentry *) arg);
+
+- All remaining *ioctl* cases must be moved to a separate
+  function, *<device>_ioctl*, the device-dependent *ioctl()'s*. Note that
+  memory checking and allocation must be kept in this code!
+- Change the prototypes of *<device>_open()* and
+  *<device>_release()*, and remove any strategic code (i. e., tray
+  movement, door locking, etc.).
+- Try to recompile the drivers. We advise you to use modules, both
+  for `cdrom.o` and your driver, as debugging is much easier this
+  way.
+
+Thanks
+======
+
+Thanks to all the people involved. First, Erik Andersen, who has
+taken over the torch in maintaining `cdrom.c` and integrating much
+CD-ROM-related code in the 2.1-kernel. Thanks to Scott Snyder and
+Gerd Knorr, who were the first to implement this interface for SCSI
+and IDE-CD drivers and added many ideas for extension of the data
+structures relative to kernel 2.0. Further thanks to Heiko Eißfeldt,
+Thomas Quinot, Jon Tombs, Ken Pizzini, Eberhard Mönkeberg and Andrew Kroll,
+the Linux CD-ROM device driver developers who were kind
+enough to give suggestions and criticisms during the writing. Finally
+of course, I want to thank Linus Torvalds for making this possible in
+the first place.
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 933268b8d6a5..5d1e0a4a7d84 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -7,7 +7,7 @@
    License.  See linux/COPYING for more information.
 
    Uniform CD-ROM driver for Linux.
-   See Documentation/cdrom/cdrom-standard.tex for usage information.
+   See Documentation/cdrom/cdrom-standard.txt for usage information.
 
    The routines in the file provide a uniform interface between the
    software that uses CD-ROMs and the various low-level drivers that
-- 
cgit v1.2.3-59-g8ed1b


From 8ea618899b6b4fbe97c8462e7d769867307de011 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:40 -0300
Subject: docs: cdrom: convert docs to ReST and rename to *.rst

The stuff there is almost already in ReST format. A
conversion for them is trivial: just add the missing titles
and fix some escape codes for them to match ReST syntax.

While here, rename cdrom-standard.txt, which was converted
from LaTeX to ReST in the previous patch, and add it to the
index file.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/cdrom/cdrom-standard.rst | 1063 ++++++++++++++++++++++++++++++++
 Documentation/cdrom/cdrom-standard.txt | 1063 --------------------------------
 Documentation/cdrom/ide-cd             |  534 ----------------
 Documentation/cdrom/ide-cd.rst         |  538 ++++++++++++++++
 Documentation/cdrom/index.rst          |   19 +
 Documentation/cdrom/packet-writing.rst |  139 +++++
 Documentation/cdrom/packet-writing.txt |  132 ----
 MAINTAINERS                            |    2 +-
 drivers/block/Kconfig                  |    2 +-
 drivers/cdrom/cdrom.c                  |    2 +-
 drivers/ide/ide-cd.c                   |    2 +-
 11 files changed, 1763 insertions(+), 1733 deletions(-)
 create mode 100644 Documentation/cdrom/cdrom-standard.rst
 delete mode 100644 Documentation/cdrom/cdrom-standard.txt
 delete mode 100644 Documentation/cdrom/ide-cd
 create mode 100644 Documentation/cdrom/ide-cd.rst
 create mode 100644 Documentation/cdrom/index.rst
 create mode 100644 Documentation/cdrom/packet-writing.rst
 delete mode 100644 Documentation/cdrom/packet-writing.txt

diff --git a/Documentation/cdrom/cdrom-standard.rst b/Documentation/cdrom/cdrom-standard.rst
new file mode 100644
index 000000000000..dde4f7f7fdbf
--- /dev/null
+++ b/Documentation/cdrom/cdrom-standard.rst
@@ -0,0 +1,1063 @@
+=======================
+A Linux CD-ROM standard
+=======================
+
+:Author: David van Leeuwen <david@ElseWare.cistron.nl>
+:Date: 12 March 1999
+:Updated by: Erik Andersen (andersee@debian.org)
+:Updated by: Jens Axboe (axboe@image.dk)
+
+
+Introduction
+============
+
+Linux is probably the Unix-like operating system that supports
+the widest variety of hardware devices. The reasons for this are
+presumably
+
+- The large list of hardware devices available for the many platforms
+  that Linux now supports (i.e., i386-PCs, Sparc Suns, etc.)
+- The open design of the operating system, such that anybody can write a
+  driver for Linux.
+- There is plenty of source code around as examples of how to write a driver.
+
+The openness of Linux, and the many different types of available
+hardware has allowed Linux to support many different hardware devices.
+Unfortunately, the very openness that has allowed Linux to support
+all these different devices has also allowed the behavior of each
+device driver to differ significantly from one device to another.
+This divergence of behavior has been very significant for CD-ROM
+devices; the way a particular drive reacts to a `standard` *ioctl()*
+call varies greatly from one device driver to another. To avoid making
+their drivers totally inconsistent, the writers of Linux CD-ROM
+drivers generally created new device drivers by understanding, copying,
+and then changing an existing one. Unfortunately, this practice did not
+maintain uniform behavior across all the Linux CD-ROM drivers.
+
+This document describes an effort to establish Uniform behavior across
+all the different CD-ROM device drivers for Linux. This document also
+defines the various *ioctl()'s*, and how the low-level CD-ROM device
+drivers should implement them. Currently (as of the Linux 2.1.\ *x*
+development kernels) several low-level CD-ROM device drivers, including
+both IDE/ATAPI and SCSI, now use this Uniform interface.
+
+When the CD-ROM was developed, the interface between the CD-ROM drive
+and the computer was not specified in the standards. As a result, many
+different CD-ROM interfaces were developed. Some of them had their
+own proprietary design (Sony, Mitsumi, Panasonic, Philips), other
+manufacturers adopted an existing electrical interface and changed
+the functionality (CreativeLabs/SoundBlaster, Teac, Funai) or simply
+adapted their drives to one or more of the already existing electrical
+interfaces (Aztech, Sanyo, Funai, Vertos, Longshine, Optics Storage and
+most of the `NoName` manufacturers). In cases where a new drive really
+brought its own interface or used its own command set and flow control
+scheme, either a separate driver had to be written, or an existing
+driver had to be enhanced. History has delivered us CD-ROM support for
+many of these different interfaces. Nowadays, almost all new CD-ROM
+drives are either IDE/ATAPI or SCSI, and it is very unlikely that any
+manufacturer will create a new interface. Even finding drives for the
+old proprietary interfaces is getting difficult.
+
+When (in the 1.3.70's) I looked at the existing software interface,
+which was expressed through `cdrom.h`, it appeared to be a rather wild
+set of commands and data formats [#f1]_. It seemed that many
+features of the software interface had been added to accommodate the
+capabilities of a particular drive, in an *ad hoc* manner. More
+importantly, it appeared that the behavior of the `standard` commands
+was different for most of the different drivers: e. g., some drivers
+close the tray if an *open()* call occurs when the tray is open, while
+others do not. Some drivers lock the door upon opening the device, to
+prevent an incoherent file system, but others don't, to allow software
+ejection. Undoubtedly, the capabilities of the different drives vary,
+but even when two drives have the same capability their drivers'
+behavior was usually different.
+
+.. [#f1]
+   I cannot recollect what kernel version I looked at, then,
+   presumably 1.2.13 and 1.3.34 --- the latest kernel that I was
+   indirectly involved in.
+
+I decided to start a discussion on how to make all the Linux CD-ROM
+drivers behave more uniformly. I began by contacting the developers of
+the many CD-ROM drivers found in the Linux kernel. Their reactions
+encouraged me to write the Uniform CD-ROM Driver which this document is
+intended to describe. The implementation of the Uniform CD-ROM Driver is
+in the file `cdrom.c`. This driver is intended to be an additional software
+layer that sits on top of the low-level device drivers for each CD-ROM drive.
+By adding this additional layer, it is possible to have all the different
+CD-ROM devices behave **exactly** the same (insofar as the underlying
+hardware will allow).
+
+The goal of the Uniform CD-ROM Driver is **not** to alienate driver developers
+who have not yet taken steps to support this effort. The goal of Uniform CD-ROM
+Driver is simply to give people writing application programs for CD-ROM drives
+**one** Linux CD-ROM interface with consistent behavior for all
+CD-ROM devices. In addition, this also provides a consistent interface
+between the low-level device driver code and the Linux kernel. Care
+is taken that 100% compatibility exists with the data structures and
+programmer's interface defined in `cdrom.h`. This guide was written to
+help CD-ROM driver developers adapt their code to use the Uniform CD-ROM
+Driver code defined in `cdrom.c`.
+
+Personally, I think that the most important hardware interfaces are
+the IDE/ATAPI drives and, of course, the SCSI drives, but as prices
+of hardware drop continuously, it is also likely that people may have
+more than one CD-ROM drive, possibly of mixed types. It is important
+that these drives behave in the same way. In December 1994, one of the
+cheapest CD-ROM drives was a Philips cm206, a double-speed proprietary
+drive. In the months that I was busy writing a Linux driver for it,
+proprietary drives became obsolete and IDE/ATAPI drives became the
+standard. At the time of the last update to this document (November
+1997) it is becoming difficult to even **find** anything less than a
+16 speed CD-ROM drive, and 24 speed drives are common.
+
+.. _cdrom_api:
+
+Standardizing through another software level
+============================================
+
+At the time this document was conceived, all drivers directly
+implemented the CD-ROM *ioctl()* calls through their own routines. This
+led to the danger of different drivers forgetting to do important things
+like checking that the user was giving the driver valid data. More
+importantly, this led to the divergence of behavior, which has already
+been discussed.
+
+For this reason, the Uniform CD-ROM Driver was created to enforce consistent
+CD-ROM drive behavior, and to provide a common set of services to the various
+low-level CD-ROM device drivers. The Uniform CD-ROM Driver now provides another
+software-level, that separates the *ioctl()* and *open()* implementation
+from the actual hardware implementation. Note that this effort has
+made few changes which will affect a user's application programs. The
+greatest change involved moving the contents of the various low-level
+CD-ROM drivers\' header files to the kernel's cdrom directory. This was
+done to help ensure that the user is presented with only one cdrom
+interface, the interface defined in `cdrom.h`.
+
+CD-ROM drives are specific enough (i. e., different from other
+block-devices such as floppy or hard disc drives), to define a set
+of common **CD-ROM device operations**, *<cdrom-device>_dops*.
+These operations are different from the classical block-device file
+operations, *<block-device>_fops*.
+
+The routines for the Uniform CD-ROM Driver interface level are implemented
+in the file `cdrom.c`. In this file, the Uniform CD-ROM Driver interfaces
+with the kernel as a block device by registering the following general
+*struct file_operations*::
+
+	struct file_operations cdrom_fops = {
+		NULL,			/* lseek */
+		block_read,		/* read - general block-dev read */
+		block_write,		/* write - general block-dev write */
+		NULL,			/* readdir */
+		NULL,			/* select */
+		cdrom_ioctl,		/* ioctl */
+		NULL,			/* mmap */
+		cdrom_open,		/* open */
+		cdrom_release,		/* release */
+		NULL,			/* fsync */
+		NULL,			/* fasync */
+		cdrom_media_changed,	/* media change */
+		NULL			/* revalidate */
+	};
+
+Every active CD-ROM device shares this *struct*. The routines
+declared above are all implemented in `cdrom.c`, since this file is the
+place where the behavior of all CD-ROM-devices is defined and
+standardized. The actual interface to the various types of CD-ROM
+hardware is still performed by various low-level CD-ROM-device
+drivers. These routines simply implement certain **capabilities**
+that are common to all CD-ROM (and really, all removable-media)
+devices.
+
+Registration of a low-level CD-ROM device driver is now done through
+the general routines in `cdrom.c`, not through the Virtual File System
+(VFS) any more. The interface implemented in `cdrom.c` is carried out
+through two general structures that contain information about the
+capabilities of the driver, and the specific drives on which the
+driver operates. The structures are:
+
+cdrom_device_ops
+  This structure contains information about the low-level driver for a
+  CD-ROM device. This structure is conceptually connected to the major
+  number of the device (although some drivers may have different
+  major numbers, as is the case for the IDE driver).
+
+cdrom_device_info
+  This structure contains information about a particular CD-ROM drive,
+  such as its device name, speed, etc. This structure is conceptually
+  connected to the minor number of the device.
+
+Registering a particular CD-ROM drive with the Uniform CD-ROM Driver
+is done by the low-level device driver through a call to::
+
+	register_cdrom(struct cdrom_device_info * <device>_info)
+
+The device information structure, *<device>_info*, contains all the
+information needed for the kernel to interface with the low-level
+CD-ROM device driver. One of the most important entries in this
+structure is a pointer to the *cdrom_device_ops* structure of the
+low-level driver.
+
+The device operations structure, *cdrom_device_ops*, contains a list
+of pointers to the functions which are implemented in the low-level
+device driver. When `cdrom.c` accesses a CD-ROM device, it does it
+through the functions in this structure. It is impossible to know all
+the capabilities of future CD-ROM drives, so it is expected that this
+list may need to be expanded from time to time as new technologies are
+developed. For example, CD-R and CD-R/W drives are beginning to become
+popular, and support will soon need to be added for them. For now, the
+current *struct* is::
+
+	struct cdrom_device_ops {
+		int (*open)(struct cdrom_device_info *, int);
+		void (*release)(struct cdrom_device_info *);
+		int (*drive_status)(struct cdrom_device_info *, int);
+		unsigned int (*check_events)(struct cdrom_device_info *,
+					     unsigned int, int);
+		int (*media_changed)(struct cdrom_device_info *, int);
+		int (*tray_move)(struct cdrom_device_info *, int);
+		int (*lock_door)(struct cdrom_device_info *, int);
+		int (*select_speed)(struct cdrom_device_info *, int);
+		int (*select_disc)(struct cdrom_device_info *, int);
+		int (*get_last_session) (struct cdrom_device_info *,
+					 struct cdrom_multisession *);
+		int (*get_mcn)(struct cdrom_device_info *, struct cdrom_mcn *);
+		int (*reset)(struct cdrom_device_info *);
+		int (*audio_ioctl)(struct cdrom_device_info *,
+				   unsigned int, void *);
+		const int capability;		/* capability flags */
+		int (*generic_packet)(struct cdrom_device_info *,
+				      struct packet_command *);
+	};
+
+When a low-level device driver implements one of these capabilities,
+it should add a function pointer to this *struct*. When a particular
+function is not implemented, however, this *struct* should contain a
+NULL instead. The *capability* flags specify the capabilities of the
+CD-ROM hardware and/or low-level CD-ROM driver when a CD-ROM drive
+is registered with the Uniform CD-ROM Driver.
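+
+As a minimal sketch of this convention (the driver name *mydrv* and its
+functions are hypothetical, not taken from any real driver), a
+low-level driver that implements only *open()*, *release()* and
+*drive_status()* might fill in its operations table like this::
+
+	/* hypothetical low-level routines, defined elsewhere in the driver */
+	static int mydrv_open(struct cdrom_device_info *cdi, int purpose);
+	static void mydrv_release(struct cdrom_device_info *cdi);
+	static int mydrv_drive_status(struct cdrom_device_info *cdi, int slot_nr);
+
+	static const struct cdrom_device_ops mydrv_dops = {
+		.open		= mydrv_open,
+		.release	= mydrv_release,
+		.drive_status	= mydrv_drive_status,
+		.tray_move	= NULL,			/* not implemented */
+		.capability	= CDC_DRIVE_STATUS,	/* only what is implemented */
+	};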
+
+Note that most functions have fewer parameters than their
+*blkdev_fops* counterparts. This is because very little of the
+information in the structures *inode* and *file* is used. For most
+drivers, the main parameter is the *struct* *cdrom_device_info*, from
+which the major and minor number can be extracted. (Most low-level
+CD-ROM drivers don't even look at the major and minor number though,
+since many of them only support one device.) This will be available
+through *dev* in *cdrom_device_info* described below.
+
+The drive-specific, minor-like information that is registered with
+`cdrom.c`, currently contains the following fields::
+
+  struct cdrom_device_info {
+	const struct cdrom_device_ops * ops; 	/* device operations for this major */
+	struct list_head list;			/* linked list of all device_info */
+	struct gendisk * disk;			/* matching block layer disk */
+	void *  handle;				/* driver-dependent data */
+
+	int mask; 				/* mask of capability: disables them */
+	int speed;				/* maximum speed for reading data */
+	int capacity;				/* number of discs in a jukebox */
+
+	unsigned int options:30;		/* options flags */
+	unsigned mc_flags:2;			/*  media-change buffer flags */
+	unsigned int vfs_events;		/*  cached events for vfs path */
+	unsigned int ioctl_events;		/*  cached events for ioctl path */
+	int use_count;				/*  number of times device is opened */
+	char name[20];				/*  name of the device type */
+
+	__u8 sanyo_slot : 2;			/*  Sanyo 3-CD changer support */
+	__u8 keeplocked : 1;			/*  CDROM_LOCKDOOR status */
+	__u8 reserved : 5;			/*  not used yet */
+	int cdda_method;			/*  see CDDA_* flags */
+	__u8 last_sense;			/*  saves last sense key */
+	__u8 media_written;			/*  dirty flag, DVD+RW bookkeeping */
+	unsigned short mmc3_profile;		/*  current MMC3 profile */
+	int for_data;				/*  unknown:TBD */
+	int (*exit)(struct cdrom_device_info *);/*  unknown:TBD */
+	int mrw_mode_page;			/*  which MRW mode page is in use */
+  };
+
+Using this *struct*, a linked list of the registered minor devices is
+built, using the *list* field. The device number, the device operations
+struct and specifications of properties of the drive are stored in this
+structure.
+
+The *mask* flags can be used to mask out some of the capabilities listed
+in *ops->capability*, if a specific drive doesn't support a feature
+of the driver. The value *speed* specifies the maximum head-rate of the
+drive, measured in units of normal audio speed (176kB/sec raw data or
+150kB/sec file system data). The parameters are declared *const*
+because they describe properties of the drive, which don't change after
+registration.
+
+A few registers contain variables local to the CD-ROM drive. The
+flags *options* are used to specify how the general CD-ROM routines
+should behave. These various flags registers should provide enough
+flexibility to adapt to the different users' wishes (and **not** the
+`arbitrary` wishes of the author of the low-level device driver, as is
+the case in the old scheme). The register *mc_flags* is used to buffer
+the information from *media_changed()* to two separate queues. Other
+data that is specific to a minor drive, can be accessed through *handle*,
+which can point to a data structure specific to the low-level driver.
+The fields *use_count*, *list*, *options* and *mc_flags* need not be
+initialized.
+
+The intermediate software layer that `cdrom.c` forms will perform some
+additional bookkeeping. The use count of the device (the number of
+processes that have the device opened) is registered in *use_count*. The
+function *cdrom_ioctl()* will verify the appropriate user-memory regions
+for read and write, and in case a location on the CD is transferred,
+it will `sanitize` the format by making requests to the low-level
+drivers in a standard format, and translating all formats between the
+user-software and low level drivers. This relieves much of the drivers'
+memory checking and format checking and translation. Also, the necessary
+structures will be declared on the program stack.
+
+The implementation of the functions should be as defined in the
+following sections. Two functions **must** be implemented, namely
+*open()* and *release()*. Other functions may be omitted; their
+corresponding capability flags will be cleared upon registration.
+Generally, a function returns zero on success and negative on error. A
+function call should return only after the command has completed, but of
+course waiting for the device should not use processor time.
+
+::
+
+	int open(struct cdrom_device_info *cdi, int purpose)
+
+*Open()* should try to open the device for a specific *purpose*, which
+can be either:
+
+- Open for reading data, as done by `mount()` (2), or the
+  user commands `dd` or `cat`.
+- Open for *ioctl* commands, as done by audio-CD playing programs.
+
+Notice that any strategic code (closing tray upon *open()*, etc.) is
+done by the calling routine in `cdrom.c`, so the low-level routine
+should only be concerned with proper initialization, such as spinning
+up the disc, etc.
+
+::
+
+	void release(struct cdrom_device_info *cdi)
+
+Device-specific actions should be taken such as spinning down the device.
+However, strategic actions such as ejection of the tray, or unlocking
+the door, should be left over to the general routine *cdrom_release()*.
+This is the only function returning type *void*.
+
+.. _cdrom_drive_status:
+
+::
+
+	int drive_status(struct cdrom_device_info *cdi, int slot_nr)
+
+The function *drive_status*, if implemented, should provide
+information on the status of the drive (not the status of the disc,
+which may or may not be in the drive). If the drive is not a changer,
+*slot_nr* should be ignored. In `cdrom.h` the possibilities are listed::
+
+
+	CDS_NO_INFO		/* no information available */
+	CDS_NO_DISC		/* no disc is inserted, tray is closed */
+	CDS_TRAY_OPEN		/* tray is opened */
+	CDS_DRIVE_NOT_READY	/* something is wrong, tray is moving? */
+	CDS_DISC_OK		/* a disc is loaded and everything is fine */
+
+::
+
+	int media_changed(struct cdrom_device_info *cdi, int disc_nr)
+
+This function is very similar to the original function in *struct
+file_operations*. It returns 1 if the medium of the device *cdi->dev*
+has changed since the last call, and 0 otherwise. The parameter
+*disc_nr* identifies a specific slot in a juke-box; it should be
+ignored for single-disc drives. Note that by `re-routing` this
+function through *cdrom_media_changed()*, we can implement separate
+queues for the VFS and a new *ioctl()* function that can report device
+changes to software (e.g., an auto-mounting daemon).
+
+::
+
+	int tray_move(struct cdrom_device_info *cdi, int position)
+
+This function, if implemented, should control the tray movement. (No
+other function should control this.) The parameter *position* controls
+the desired direction of movement:
+
+- 0 Close tray
+- 1 Open tray
+
+This function returns 0 upon success, and a non-zero value upon
+error. Note that if the tray is already in the desired position, no
+action need be taken, and the return value should be 0.
+
+::
+
+	int lock_door(struct cdrom_device_info *cdi, int lock)
+
+This function (and no other code) controls locking of the door, if the
+drive allows this. The value of *lock* controls the desired locking
+state:
+
+- 0 Unlock door, manual opening is allowed
+- 1 Lock door, tray cannot be ejected manually
+
+This function returns 0 upon success, and a non-zero value upon
+error. Note that if the door is already in the requested state, no
+action need be taken, and the return value should be 0.
+
+::
+
+	int select_speed(struct cdrom_device_info *cdi, int speed)
+
+Some CD-ROM drives are capable of changing their head-speed. There
+are several reasons for changing the speed of a CD-ROM drive. Badly
+pressed CD-ROMs may benefit from less-than-maximum head rate. Modern
+CD-ROM drives can obtain very high head rates (up to *24x* is
+common). It has been reported that these drives can make reading
+errors at these high speeds; reducing the speed can prevent data loss
+in these circumstances. Finally, some of these drives can
+make an annoyingly loud noise, which a lower speed may reduce.
+
+This function specifies the speed at which data is read or audio is
+played back. The value of *speed* specifies the head-speed of the
+drive, measured in units of standard cdrom speed (176kB/sec raw data
+or 150kB/sec file system data). So to request that a CD-ROM drive
+operate at 300kB/sec you would call the CDROM_SELECT_SPEED *ioctl*
+with *speed=2*. The special value `0` means `auto-selection`, i.e.,
+maximum data-rate or real-time audio rate. If the drive doesn't have
+this `auto-selection` capability, the decision should be made on the
+current disc loaded and the return value should be positive. A negative
+return value indicates an error.
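+
+From user space, the double-speed request described above might look
+like the following sketch. The helper name *request_double_speed()* is
+an assumption made for illustration; *fd* is expected to be an open
+descriptor for a CD-ROM device::
+
+	#include <stdio.h>
+	#include <sys/ioctl.h>
+	#include <linux/cdrom.h>
+
+	/* hypothetical helper: ask for 2 x 150kB/sec of file system data */
+	static int request_double_speed(int fd)
+	{
+		int ret = ioctl(fd, CDROM_SELECT_SPEED, 2);
+
+		if (ret < 0)
+			perror("CDROM_SELECT_SPEED");
+		return ret;
+	}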
+
+::
+
+	int select_disc(struct cdrom_device_info *cdi, int number)
+
+If the drive can store multiple discs (a juke-box) this function
+will perform disc selection. It should return the number of the
+selected disc on success, a negative value on error. Currently, only
+the ide-cd driver supports this functionality.
+
+::
+
+	int get_last_session(struct cdrom_device_info *cdi,
+			     struct cdrom_multisession *ms_info)
+
+This function should implement the old corresponding *ioctl()*. For
+device *cdi->dev*, the start of the last session of the current disc
+should be returned in the pointer argument *ms_info*. Note that
+routines in `cdrom.c` have sanitized this argument: its requested
+format will **always** be of the type *CDROM_LBA* (linear block
+addressing mode), whatever the calling software requested. But
+sanitization goes even further: the low-level implementation may
+return the requested information in *CDROM_MSF* format if it wishes so
+(setting the *ms_info->addr_format* field appropriately, of
+course) and the routines in `cdrom.c` will make the transformation if
+necessary. The return value is 0 upon success.
+
+::
+
+	int get_mcn(struct cdrom_device_info *cdi,
+		    struct cdrom_mcn *mcn)
+
+Some discs carry a `Media Catalog Number` (MCN), also called
+`Universal Product Code` (UPC). This number should reflect the number
+that is generally found in the bar-code on the product. Unfortunately,
+the few discs that carry such a number on the disc don't even use the
+same format. The return argument to this function is a pointer to a
+pre-declared memory region of type *struct cdrom_mcn*. The MCN is
+expected as a 13-character string, terminated by a null-character.
+
+::
+
+	int reset(struct cdrom_device_info *cdi)
+
+This call should perform a hard-reset on the drive (although in
+circumstances that a hard-reset is necessary, a drive may very well not
+listen to commands anymore). Preferably, control is returned to the
+caller only after the drive has finished resetting. If the drive is no
+longer listening, it may be wise for the underlying low-level cdrom
+driver to time out.
+
+::
+
+	int audio_ioctl(struct cdrom_device_info *cdi,
+			unsigned int cmd, void *arg)
+
+Some of the CD-ROM-\ *ioctl()*\ 's defined in `cdrom.h` can be
+implemented by the routines described above, and hence the function
+*cdrom_ioctl* will use those. However, most *ioctl()*\ 's deal with
+audio-control. We have decided to leave these to be accessed through a
+single function, repeating the arguments *cmd* and *arg*. Note that
+the latter is a pointer to *void*, rather than an *unsigned long int*.
+The routine *cdrom_ioctl()* does do some useful things,
+though. It sanitizes the address format type to *CDROM_MSF* (Minutes,
+Seconds, Frames) for all audio calls. It also verifies the memory
+location of *arg*, and reserves stack-memory for the argument. This
+makes implementation of the *audio_ioctl()* much simpler than in the
+old driver scheme. For example, you may look up the function
+*cm206_audio_ioctl()* in `cm206.c`, which should be updated along
+with this documentation.
+
+An unimplemented ioctl should return *-ENOSYS*, but a harmless request
+(e.g., *CDROMSTART*) may be ignored by returning 0 (success). Other
+errors should be according to the standards, whatever they are. When
+an error is returned by the low-level driver, the Uniform CD-ROM Driver
+tries whenever possible to return the error code to the calling program.
+(We may decide to sanitize the return value in *cdrom_ioctl()* though, in
+order to guarantee a uniform interface to the audio-player software.)
+
+::
+
+	int dev_ioctl(struct cdrom_device_info *cdi,
+		      unsigned int cmd, unsigned long arg)
+
+Some *ioctl()'s* seem to be specific to certain CD-ROM drives. That is,
+they are introduced to service some capabilities of certain drives. In
+fact, there are 6 different *ioctl()'s* for reading data, either in some
+particular kind of format, or audio data. Not many drives support
+reading audio tracks as data; I believe this is because of protection
+of copyrights of artists. Moreover, I think that if audio-tracks are
+supported, it should be done through the VFS and not via *ioctl()'s*. A
+problem here could be the fact that audio-frames are 2352 bytes long,
+so either the audio-file-system should ask for 75264 bytes at once
+(the least common multiple of 512 and 2352), or the drivers should
+bend their backs to cope with this incoherence (to which I would be
+opposed). Furthermore, it is very difficult for the hardware to find
+the exact frame boundaries, since there are no synchronization headers
+in audio frames. Once these issues are resolved, this code should be
+standardized in `cdrom.c`.
+
+Because there are so many *ioctl()'s* that seem to be introduced to
+satisfy certain drivers [#f2]_, any non-standard *ioctl()*\ s
+are routed through the call *dev_ioctl()*. In principle, `private`
+*ioctl()*\ 's should be numbered after the device's major number, and not
+the general CD-ROM *ioctl* number, `0x53`. Currently the
+non-supported *ioctl()'s* are:
+
+	CDROMREADMODE1, CDROMREADMODE2, CDROMREADAUDIO, CDROMREADRAW,
+	CDROMREADCOOKED, CDROMSEEK, CDROMPLAYBLK and CDROMREADALL
+
+.. [#f2]
+
+   Is there software around that actually uses these? I'd be interested!
+
+.. _cdrom_capabilities:
+
+CD-ROM capabilities
+-------------------
+
+Instead of just implementing some *ioctl* calls, the interface in
+`cdrom.c` supplies the possibility to indicate the **capabilities**
+of a CD-ROM drive. This can be done by ORing any number of
+capability-constants that are defined in `cdrom.h` at the registration
+phase. Currently, the capabilities are any of::
+
+	CDC_CLOSE_TRAY		/* can close tray by software control */
+	CDC_OPEN_TRAY		/* can open tray */
+	CDC_LOCK		/* can lock and unlock the door */
+	CDC_SELECT_SPEED	/* can select speed, in units of ~150 kB/s */
+	CDC_SELECT_DISC		/* drive is juke-box */
+	CDC_MULTI_SESSION	/* can read sessions > 1 */
+	CDC_MCN			/* can read Media Catalog Number */
+	CDC_MEDIA_CHANGED	/* can report if disc has changed */
+	CDC_PLAY_AUDIO		/* can perform audio-functions (play, pause, etc) */
+	CDC_RESET		/* hard reset device */
+	CDC_IOCTLS		/* driver has non-standard ioctls */
+	CDC_DRIVE_STATUS	/* driver implements drive status */
+
+The capability flag is declared *const*, to prevent drivers from
+accidentally tampering with the contents. The capability flags actually
+inform `cdrom.c` of what the driver can do. If the drive found
+by the driver does not have the capability, it can be masked out by
+the *cdrom_device_info* variable *mask*. For instance, the SCSI CD-ROM
+driver has implemented the code for loading and ejecting CD-ROM's, and
+hence its corresponding flags in *capability* will be set. But a SCSI
+CD-ROM drive might be a caddy system, which can't load the tray, and
+hence for this drive the *cdrom_device_info* struct will have set
+the *CDC_CLOSE_TRAY* bit in *mask*.
+
+In the file `cdrom.c` you will encounter many constructions of the type::
+
+	if (cdo->capability & ~cdi->mask & CDC_<capability>) ...
+
+There is no *ioctl* to set the mask... The reason is that
+I think it is better to control the **behavior** rather than the
+**capabilities**.
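+
+The test quoted above can be wrapped in a small helper, sketched here
+under the assumption that a function is preferred over an open-coded
+expression; the name *drive_can()* is hypothetical and does not appear
+in `cdrom.c`::
+
+	/* non-zero if the driver offers "cap" and the drive has not masked it */
+	static inline int drive_can(const struct cdrom_device_info *cdi, int cap)
+	{
+		return (cdi->ops->capability & ~cdi->mask & cap) != 0;
+	}
+
+With the caddy example above, *drive_can(cdi, CDC_CLOSE_TRAY)* would
+evaluate to 0 even though the SCSI driver sets the bit in *capability*.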
+
+Options
+-------
+
+A final flag register controls the **behavior** of the CD-ROM
+drives, in order to satisfy different users' wishes, hopefully
+independently of the ideas of the respective author who happened to
+have made the drive's support available to the Linux community. The
+current behavior options are::
+
+	CDO_AUTO_CLOSE	/* try to close tray upon device open() */
+	CDO_AUTO_EJECT	/* try to open tray on last device close() */
+	CDO_USE_FFLAGS	/* use file_pointer->f_flags to indicate purpose for open() */
+	CDO_LOCK	/* try to lock door if device is opened */
+	CDO_CHECK_TYPE	/* ensure disc type is data if opened for data */
+
+The initial value of this register is
+`CDO_AUTO_CLOSE | CDO_USE_FFLAGS | CDO_LOCK`, reflecting my own view on user
+interface and software standards. Before you protest, there are two
+new *ioctl()'s* implemented in `cdrom.c`, that allow you to control the
+behavior by software. These are::
+
+	CDROM_SET_OPTIONS	/* set options specified in (int)arg */
+	CDROM_CLEAR_OPTIONS	/* clear options specified in (int)arg */
+
+One option needs some more explanation: *CDO_USE_FFLAGS*. In the next
+section we explain what the need for this option is.
+
+A software package `setcd`, available from the Debian distribution
+and `sunsite.unc.edu`, allows user level control of these flags.
+
+
+The need to know the purpose of opening the CD-ROM device
+=========================================================
+
+Traditionally, Unix devices can be used in two different `modes`,
+either by reading/writing to the device file, or by issuing
+controlling commands to the device, by the device's *ioctl()*
+call. The problem with CD-ROM drives, is that they can be used for
+two entirely different purposes. One is to mount removable
+file systems, CD-ROM's, the other is to play audio CD's. Audio commands
+are implemented entirely through *ioctl()'s*, presumably because the
+first implementation (SUN?) has been such. In principle there is
+nothing wrong with this, but a good control of the `CD player` demands
+that the device can **always** be opened in order to give the
+*ioctl* commands, regardless of the state the drive is in.
+
+On the other hand, when used as a removable-media disc drive (which is the
+original purpose of CD-ROMs) we would like to make sure that the
+disc drive is ready for operation upon opening the device. In the old
+scheme, some CD-ROM drivers don't do any integrity checking, resulting
+in a number of i/o errors reported by the VFS to the kernel when an
+attempt for mounting a CD-ROM on an empty drive occurs. This is not a
+particularly elegant way to find out that there is no CD-ROM inserted;
+it more-or-less looks like the old IBM-PC trying to read an empty floppy
+drive for a couple of seconds, after which the system complains it
+can't read from it. Nowadays we can **sense** the existence of a
+removable medium in a drive, and we believe we should exploit that
+fact. An integrity check on opening of the device, that verifies the
+availability of a CD-ROM and its correct type (data), would be
+desirable.
+
+These two ways of using a CD-ROM drive, principally for data and
+secondarily for playing audio discs, have different demands for the
+behavior of the *open()* call. Audio use simply wants to open the
+device in order to get a file handle which is needed for issuing
+*ioctl* commands, while data use wants to open for correct and
+reliable data transfer. The only way user programs can indicate what
+their *purpose* of opening the device is, is through the *flags*
+parameter (see `open(2)`). For CD-ROM devices, these flags aren't
+implemented (some drivers implement checking for write-related flags,
+but this is not strictly necessary if the device file has correct
+permission flags). Most option flags simply don't make sense to
+CD-ROM devices: *O_CREAT*, *O_NOCTTY*, *O_TRUNC*, *O_APPEND*, and
+*O_SYNC* have no meaning to a CD-ROM.
+
+We therefore propose to use the flag *O_NONBLOCK* to indicate
+that the device is opened just for issuing *ioctl*
+commands. Strictly, the meaning of *O_NONBLOCK* is that opening and
+subsequent calls to the device don't cause the calling process to
+wait. We could interpret this as don't wait until someone has
+inserted some valid data-CD-ROM. Thus, our proposal of the
+implementation for the *open()* call for CD-ROMs is:
+
+- If no other flags are set than *O_RDONLY*, the device is opened
+  for data transfer, and the return value will be 0 only upon successful
+  initialization of the transfer. The call may even induce some actions
+  on the CD-ROM, such as closing the tray.
+- If the option flag *O_NONBLOCK* is set, opening will always be
+  successful, unless the whole device doesn't exist. The drive will take
+  no actions whatsoever.
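+
+A minimal user-space sketch of the second case follows. It assumes the
+device node is `/dev/cdrom` (the path varies between systems) and
+uses the *CDROM_DRIVE_STATUS* *ioctl* described later in this document
+to show that the descriptor is usable even without a disc::
+
+	#include <fcntl.h>
+	#include <stdio.h>
+	#include <unistd.h>
+	#include <sys/ioctl.h>
+	#include <linux/cdrom.h>
+
+	int main(void)
+	{
+		/* O_NONBLOCK: open for ioctl commands only, no disc required */
+		int fd = open("/dev/cdrom", O_RDONLY | O_NONBLOCK);
+		int status;
+
+		if (fd < 0) {
+			perror("open");
+			return 1;
+		}
+
+		status = ioctl(fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
+		if (status < 0)
+			perror("CDROM_DRIVE_STATUS");
+		else if (status == CDS_DISC_OK)
+			printf("a disc is loaded\n");
+		else
+			printf("drive status code: %d\n", status);
+
+		close(fd);
+		return 0;
+	}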
+
+And what about standards?
+-------------------------
+
+You might hesitate to accept this proposal as it comes from the
+Linux community, and not from some standardizing institute. What
+about SUN, SGI, HP and all those other Unix and hardware vendors?
+Well, these companies are in the lucky position that they generally
+control both the hardware and software of their supported products,
+and are large enough to set their own standard. They do not have to
+deal with a dozen or more different, competing hardware
+configurations\ [#f3]_.
+
+.. [#f3]
+
+   Incidentally, I think that SUN's approach to mounting CD-ROMs is very
+   good in origin: under Solaris a volume-daemon automatically mounts a
+   newly inserted CD-ROM under `/cdrom/*<volume-name>*`.
+
+   In my opinion they should have pushed this
+   further and have **every** CD-ROM on the local area network be
+   mounted at the similar location, i.e., no matter in which particular
+   machine you insert a CD-ROM, it will always appear at the same
+   position in the directory tree, on every system. When I wanted to
+   implement such a user-program for Linux, I came across the
+   differences in behavior of the various drivers, and the need for an
+   *ioctl* informing about media changes.
+
+We believe that using *O_NONBLOCK* to indicate that a device is being opened
+for *ioctl* commands only can be easily introduced in the Linux
+community. All the CD-player authors will have to be informed, we can
+even send in our own patches to the programs. The use of *O_NONBLOCK*
+has most likely no influence on the behavior of the CD-players on
+other operating systems than Linux. Finally, a user can always revert
+to old behavior by a call to
+*ioctl(file_descriptor, CDROM_CLEAR_OPTIONS, CDO_USE_FFLAGS)*.
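+
+Spelled out as a sketch (the helper name *use_old_open_behavior()* is
+hypothetical), the call could be wrapped as follows. According to the
+description of *CDROM_CLEAR_OPTIONS* in this document, a non-negative
+return value is the option flag register after modification::
+
+	#include <stdio.h>
+	#include <sys/ioctl.h>
+	#include <linux/cdrom.h>
+
+	/* fd: descriptor of an already opened CD-ROM device */
+	static int use_old_open_behavior(int fd)
+	{
+		int options = ioctl(fd, CDROM_CLEAR_OPTIONS, CDO_USE_FFLAGS);
+
+		if (options < 0)
+			perror("CDROM_CLEAR_OPTIONS");
+		else
+			printf("option flags now 0x%x\n", options);
+		return options;
+	}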
+
+The preferred strategy of *open()*
+----------------------------------
+
+The routines in `cdrom.c` are designed in such a way that run-time
+configuration of the behavior of CD-ROM devices (of **any** type)
+can be carried out, by the *CDROM_SET/CLEAR_OPTIONS* *ioctls*. Thus, various
+modes of operation can be set:
+
+`CDO_AUTO_CLOSE | CDO_USE_FFLAGS | CDO_LOCK`
+   This is the default setting. (With *CDO_CHECK_TYPE* it will be better, in
+   the future.) If the device is not yet opened by any other process, and if
+   the device is being opened for data (*O_NONBLOCK* is not set) and the
+   tray is found to be open, an attempt to close the tray is made. Then,
+   it is verified that a disc is in the drive and, if *CDO_CHECK_TYPE* is
+   set, that it contains tracks of type `data mode 1`. Only if all tests
+   are passed is the return value zero. The door is locked to prevent file
+   system corruption. If the drive is opened for audio (*O_NONBLOCK* is
+   set), no actions are taken and a value of 0 will be returned.
+
+`CDO_AUTO_CLOSE | CDO_AUTO_EJECT | CDO_LOCK`
+   This mimics the behavior of the current sbpcd-driver. The option flags are
+   ignored, the tray is closed on the first open, if necessary. Similarly,
+   the tray is opened on the last release, i.e., if a CD-ROM is unmounted,
+   it is automatically ejected, such that the user can replace it.
+
+We hope that these options can convince everybody (both driver
+maintainers and user program developers) to adopt the new CD-ROM
+driver scheme and option flag interpretation.
+
+Description of routines in `cdrom.c`
+====================================
+
+Only a few routines in `cdrom.c` are exported to the drivers. In this
+section we will discuss these, as well as the functions that `take
+over` the CD-ROM interface to the kernel. The header file belonging
+to `cdrom.c` is called `cdrom.h`. Formerly, some of the contents of this
+file were placed in the file `ucdrom.h`, but this file has now been
+merged back into `cdrom.h`.
+
+::
+
+	struct file_operations cdrom_fops
+
+The contents of this structure were described in cdrom_api_.
+A pointer to this structure is assigned to the *fops* field
+of the *struct gendisk*.
+
+::
+
+	int register_cdrom(struct cdrom_device_info *cdi)
+
+This function is used in about the same way one registers *cdrom_fops*
+with the kernel: the device operations and information structures,
+as described in cdrom_api_, should be registered with the
+Uniform CD-ROM Driver::
+
+	register_cdrom(&<device>_info);
+
+
+This function returns zero upon success, and non-zero upon
+failure. The structure *<device>_info* should have a pointer to the
+driver's *<device>_dops*, as in::
+
+	struct cdrom_device_info <device>_info = {
+		.ops = &<device>_dops,
+		...
+	};
+
+Note that a driver must have one static structure, *<device>_dops*, while
+it may have as many structures *<device>_info* as there are minor devices
+active. *Register_cdrom()* builds a linked list from these.
+
+
+::
+
+	void unregister_cdrom(struct cdrom_device_info *cdi)
+
+Unregistering device *cdi* with minor number *MINOR(cdi->dev)* removes
+the minor device from the list. If it was the last registered minor for
+the low-level driver, this disconnects the registered device-operation
+routines from the CD-ROM interface. As the prototype shows, this
+function does not return a value.
+
+::
+
+	int cdrom_open(struct inode * ip, struct file * fp)
+
+This function is not called directly by the low-level drivers, it is
+listed in the standard *cdrom_fops*. If the VFS opens a file, this
+function becomes active. A strategy is implemented in this routine,
+taking care of all capabilities and options that are set in the
+*cdrom_device_ops* connected to the device. Then, the program flow is
+transferred to the device-dependent *open()* call.
+
+::
+
+	void cdrom_release(struct inode *ip, struct file *fp)
+
+This function implements the reverse-logic of *cdrom_open()*, and then
+calls the device-dependent *release()* routine. When the use-count has
+reached 0, the allocated buffers are flushed by calls to *sync_dev(dev)*
+and *invalidate_buffers(dev)*.
+
+
+.. _cdrom_ioctl:
+
+::
+
+	int cdrom_ioctl(struct inode *ip, struct file *fp,
+			unsigned int cmd, unsigned long arg)
+
+This function handles all the standard *ioctl* requests for CD-ROM
+devices in a uniform way. The different calls fall into three
+categories: *ioctl()'s* that can be directly implemented by device
+operations, ones that are routed through the call *audio_ioctl()*, and
+the remaining ones, that are presumably device-dependent. Generally, a
+negative return value indicates an error.
+
+Directly implemented *ioctl()'s*
+--------------------------------
+
+The following `old` CD-ROM *ioctl()*\ 's are implemented by directly
+calling device-operations in *cdrom_device_ops*, if implemented and
+not masked:
+
+`CDROMMULTISESSION`
+	Requests the last session on a CD-ROM.
+`CDROMEJECT`
+	Open tray.
+`CDROMCLOSETRAY`
+	Close tray.
+`CDROMEJECT_SW`
+	If *arg != 0*, set behavior to auto-close (close
+	tray on first open) and auto-eject (eject on last release), otherwise
+	set behavior to non-moving on *open()* and *release()* calls.
+`CDROM_GET_MCN`
+	Get the Media Catalog Number from a CD.
+
+*Ioctl*\ s routed through *audio_ioctl()*
+-----------------------------------------
+
+The following set of *ioctl()'s* are all implemented through a call to
+the *cdrom_device_ops* function *audio_ioctl()*. Memory checks and
+allocation are performed in *cdrom_ioctl()*, and also sanitization of
+address format (*CDROM_LBA*/*CDROM_MSF*) is done.
+
+`CDROMSUBCHNL`
+	Get sub-channel data in argument *arg* of type
+	`struct cdrom_subchnl *`.
+`CDROMREADTOCHDR`
+	Read Table of Contents header, in *arg* of type
+	`struct cdrom_tochdr *`.
+`CDROMREADTOCENTRY`
+	Read a Table of Contents entry in *arg* and specified by *arg*
+	of type `struct cdrom_tocentry *`.
+`CDROMPLAYMSF`
+	Play audio fragment specified in Minute, Second, Frame format,
+	delimited by *arg* of type `struct cdrom_msf *`.
+`CDROMPLAYTRKIND`
+	Play audio fragment in track-index format delimited by *arg*
+	of type `struct cdrom_ti *`.
+`CDROMVOLCTRL`
+	Set volume specified by *arg* of type `struct cdrom_volctrl *`.
+`CDROMVOLREAD`
+	Read volume into *arg* of type `struct cdrom_volctrl *`.
+`CDROMSTART`
+	Spin up disc.
+`CDROMSTOP`
+	Stop playback of audio fragment.
+`CDROMPAUSE`
+	Pause playback of audio fragment.
+`CDROMRESUME`
+	Resume playing.
+
+New *ioctl()'s* in `cdrom.c`
+----------------------------
+
+The following *ioctl()'s* have been introduced to allow user programs to
+control the behavior of individual CD-ROM devices. New *ioctl*
+commands can be identified by the underscores in their names.
+
+`CDROM_SET_OPTIONS`
+	Set options specified by *arg*. Returns the option flag register
+	after modification. Use *arg = 0* for reading the current flags.
+`CDROM_CLEAR_OPTIONS`
+	Clear options specified by *arg*. Returns the option flag register
+	after modification.
+`CDROM_SELECT_SPEED`
+	Select head-rate speed of disc specified by *arg* in units
+	of standard cdrom speed (176kB/sec raw data or
+	150kB/sec file system data). The value 0 means `auto-select`,
+	i.e., play audio discs at real time and data discs at maximum speed.
+	The value *arg* is checked against the maximum head rate of the
+	drive found in the *cdrom_dops*.
+`CDROM_SELECT_DISC`
+	Select disc numbered *arg* from a juke-box.
+
+	First disc is numbered 0. The number *arg* is checked against the
+	maximum number of discs in the juke-box found in the *cdrom_dops*.
+`CDROM_MEDIA_CHANGED`
+	Returns 1 if a disc has been changed since the last call.
+	Note that calls to *cdrom_media_changed* by the VFS are treated
+	by an independent queue, so both mechanisms will detect a
+	media change once. For juke-boxes, an extra argument *arg*
+	specifies the slot for which the information is given. The special
+	value *CDSL_CURRENT* requests that information about the currently
+	selected slot be returned.
+`CDROM_DRIVE_STATUS`
+	Returns the status of the drive by a call to
+	*drive_status()*. Return values are defined in cdrom_drive_status_.
+	Note that this call doesn't return information on the
+	current playing activity of the drive; this can be polled through
+	an *ioctl* call to *CDROMSUBCHNL*. For juke-boxes, an extra argument
+	*arg* specifies the slot for which (possibly limited) information is
+	given. The special value *CDSL_CURRENT* requests that information
+	about the currently selected slot be returned.
+`CDROM_DISC_STATUS`
+	Returns the type of the disc currently in the drive.
+	It should be viewed as a complement to *CDROM_DRIVE_STATUS*.
+	This *ioctl* can provide *some* information about the current
+	disc that is inserted in the drive. This functionality used to be
+	implemented in the low level drivers, but is now carried out
+	entirely in the Uniform CD-ROM Driver.
+
+	The history of development of the CD's use as a carrier medium for
+	various digital information has led to many different disc types.
+	This *ioctl* is useful only in the case that CDs have *only one*
+	type of data on them. While this is often the case, it is
+	also very common for CDs to have some tracks with data, and some
+	tracks with audio. Because this is an existing interface, rather
+	than fixing this interface by changing the assumptions it was made
+	under, thereby breaking all user applications that use this
+	function, the Uniform CD-ROM Driver implements this *ioctl* as
+	follows: If the CD in question has audio tracks on it, and it has
+	absolutely no CD-I, XA, or data tracks on it, it will be reported
+	as *CDS_AUDIO*. If it has both audio and data tracks, it will
+	return *CDS_MIXED*. If there are no audio tracks on the disc, and
+	if the CD in question has any CD-I tracks on it, it will be
+	reported as *CDS_XA_2_2*. Failing that, if the CD in question
+	has any XA tracks on it, it will be reported as *CDS_XA_2_1*.
+	Finally, if the CD in question has any data tracks on it,
+	it will be reported as a data CD (*CDS_DATA_1*).
+
+	This *ioctl* can return::
+
+		CDS_NO_INFO	/* no information available */
+		CDS_NO_DISC	/* no disc is inserted, or tray is opened */
+		CDS_AUDIO	/* Audio disc (2352 audio bytes/frame) */
+		CDS_DATA_1	/* data disc, mode 1 (2048 user bytes/frame) */
+		CDS_XA_2_1	/* mixed data (XA), mode 2, form 1 (2048 user bytes) */
+		CDS_XA_2_2	/* mixed data (XA), mode 2, form 2 (2324 user bytes) */
+		CDS_MIXED	/* mixed audio/data disc */
+
+	For some information concerning frame layout of the various disc
+	types, see a recent version of `cdrom.h`.
+
+`CDROM_CHANGER_NSLOTS`
+	Returns the number of slots in a juke-box.
+`CDROMRESET`
+	Reset the drive.
+`CDROM_GET_CAPABILITY`
+	Returns the *capability* flags for the drive. Refer to section
+	cdrom_capabilities_ for more information on these flags.
+`CDROM_LOCKDOOR`
+	 Locks the door of the drive. `arg == 0` unlocks the door,
+	 any other value locks it.
+`CDROM_DEBUG`
+	 Turns on debugging info. Only root is allowed to do this.
+	 Same semantics as CDROM_LOCKDOOR.
+
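+A minimal user-space sketch of how some of these calls might be
+combined is shown below. It is only an illustration under assumptions:
+the device path `/dev/cdrom` is hypothetical, error handling is kept
+short, and the constants are those provided by `<linux/cdrom.h>`::
+
+	#include <fcntl.h>
+	#include <limits.h>
+	#include <stdio.h>
+	#include <unistd.h>
+	#include <sys/ioctl.h>
+	#include <linux/cdrom.h>
+
+	int main(void)
+	{
+		int fd, status;
+
+		fd = open("/dev/cdrom", O_RDONLY | O_NONBLOCK);
+		if (fd < 0) {
+			perror("open");
+			return 1;
+		}
+
+		/* Drive state for the currently selected slot. */
+		status = ioctl(fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
+		if (status != CDS_DISC_OK) {
+			printf("no usable disc (drive status %d)\n", status);
+			close(fd);
+			return 1;
+		}
+
+		/* Disc type, reported according to the rules above. */
+		status = ioctl(fd, CDROM_DISC_STATUS, 0);
+		switch (status) {
+		case CDS_AUDIO:
+			puts("audio disc");
+			break;
+		case CDS_MIXED:
+			puts("mixed audio/data disc");
+			break;
+		case CDS_DATA_1:
+		case CDS_XA_2_1:
+		case CDS_XA_2_2:
+			puts("data disc");
+			break;
+		default:
+			printf("no information (status %d)\n", status);
+			break;
+		}
+
+		/* 0 asks the driver to auto-select the speed. */
+		if (ioctl(fd, CDROM_SELECT_SPEED, 0) < 0)
+			perror("CDROM_SELECT_SPEED");
+
+		close(fd);
+		return 0;
+	}
+
+The drive status is checked first because *CDROM_DISC_STATUS* only
+describes a disc that is actually loaded.
+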
+
+Device dependent *ioctl()'s*
+----------------------------
+
+Finally, all other *ioctl()'s* are passed to the function *dev_ioctl()*,
+if implemented. No memory allocation or verification is carried out.
+
+How to update your driver
+=========================
+
+- Make a backup of your current driver.
+- Get hold of the files `cdrom.c` and `cdrom.h`; they should be in
+  the directory tree that came with this documentation.
+- Make sure you include `cdrom.h`.
+- Change the 3rd argument of *register_blkdev* from `&<your-drive>_fops`
+  to `&cdrom_fops`.
+- Just after that line, add the following to register with the Uniform
+  CD-ROM Driver::
+
+	register_cdrom(&<your-drive>_info);
+
+  Similarly, add a call to *unregister_cdrom()* at the appropriate place.
+- Copy an example of the device-operations *struct* to your
+  source, e. g., from `cm206.c` *cm206_dops*, and change all
+  entries to names corresponding to your driver, or names you just
+  happen to like. If your driver doesn't support a certain function,
+  make the entry *NULL*. At the entry *capability* you should list all
+  capabilities your driver currently supports. If your driver
+  has a capability that is not listed, please send me a message.
+- Copy the *cdrom_device_info* declaration from the same example
+  driver, and modify the entries according to your needs. If your
+  driver dynamically determines the capabilities of the hardware, this
+  structure should also be declared dynamically.
+- Implement all functions in your `<device>_dops` structure,
+  according to prototypes listed in  `cdrom.h`, and specifications given
+  in cdrom_api_. Most likely you have already implemented
+  the code in a large part, and you will almost certainly need to adapt the
+  prototype and return values.
+- Rename your `<device>_ioctl()` function to *audio_ioctl* and
+  change the prototype a little. Remove entries listed in the first
+  part in cdrom_ioctl_; if your code was OK, these are
+  just calls to the routines you adapted in the previous step.
+- You may remove all remaining memory checking code in the
+  *audio_ioctl()* function that deals with audio commands (these are
+  listed in the second part of cdrom_ioctl_). There is no
+  need for memory allocation either, so most *case*s in the *switch*
+  statement look similar to::
+
+	case CDROMREADTOCENTRY:
+		get_toc_entry((struct cdrom_tocentry *) arg);
+
+- All remaining *ioctl* cases must be moved to a separate
+  function, *<device>_ioctl*, the device-dependent *ioctl()'s*. Note that
+  memory checking and allocation must be kept in this code!
+- Change the prototypes of *<device>_open()* and
+  *<device>_release()*, and remove any strategic code (i. e., tray
+  movement, door locking, etc.).
+- Try to recompile the drivers. We advise you to use modules, both
+  for `cdrom.o` and your driver, as debugging is much easier this
+  way.
+
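+The steps above might end up looking roughly like the following
+kernel-side fragment. It is a sketch only and has not been compiled:
+the `mydrv_` names are hypothetical, the member names follow the
+*cdrom_device_ops* and *cdrom_device_info* structures described in this
+document, and the capability flags are those defined in `cdrom.h`::
+
+	#include <linux/cdrom.h>
+
+	/* Low-level routines, adapted to the prototypes in cdrom.h. */
+	static int mydrv_open(struct cdrom_device_info *cdi, int purpose);
+	static void mydrv_release(struct cdrom_device_info *cdi);
+	static int mydrv_drive_status(struct cdrom_device_info *cdi,
+				      int slot_nr);
+	static int mydrv_tray_move(struct cdrom_device_info *cdi,
+				   int position);
+	static int mydrv_lock_door(struct cdrom_device_info *cdi, int lock);
+	static int mydrv_audio_ioctl(struct cdrom_device_info *cdi,
+				     unsigned int cmd, void *arg);
+
+	static const struct cdrom_device_ops mydrv_dops = {
+		.open		= mydrv_open,
+		.release	= mydrv_release,
+		.drive_status	= mydrv_drive_status,
+		.tray_move	= mydrv_tray_move,
+		.lock_door	= mydrv_lock_door,
+		.audio_ioctl	= mydrv_audio_ioctl,
+		/* Members left out stay NULL: function not supported. */
+		.capability	= CDC_OPEN_TRAY | CDC_CLOSE_TRAY | CDC_LOCK |
+				  CDC_DRIVE_STATUS | CDC_PLAY_AUDIO,
+	};
+
+	static struct cdrom_device_info mydrv_info = {
+		.ops		= &mydrv_dops,
+		.speed		= 2,	/* double-speed drive (assumed) */
+		.capacity	= 1,	/* single disc, not a juke-box */
+		.name		= "mydrv",
+	};
+
+Using designated initializers means that unsupported functions need not
+be listed explicitly as *NULL*, and the structure stays valid if new
+members are added to *cdrom_device_ops* later.
+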
+Thanks
+======
+
+Thanks to all the people involved. First, Erik Andersen, who has
+taken over the torch in maintaining `cdrom.c` and integrating much
+CD-ROM-related code in the 2.1-kernel. Thanks to Scott Snyder and
+Gerd Knorr, who were the first to implement this interface for SCSI
+and IDE-CD drivers and added many ideas for extension of the data
+structures relative to kernel 2.0. Further thanks to Heiko Eißfeldt,
+Thomas Quinot, Jon Tombs, Ken Pizzini, Eberhard Mönkeberg and Andrew Kroll,
+the Linux CD-ROM device driver developers who were kind
+enough to give suggestions and criticisms during the writing. Finally
+of course, I want to thank Linus Torvalds for making this possible in
+the first place.
diff --git a/Documentation/cdrom/cdrom-standard.txt b/Documentation/cdrom/cdrom-standard.txt
deleted file mode 100644
index dde4f7f7fdbf..000000000000
--- a/Documentation/cdrom/cdrom-standard.txt
+++ /dev/null
@@ -1,1063 +0,0 @@
-=======================
-A Linux CD-ROM standard
-=======================
-
-:Author: David van Leeuwen <david@ElseWare.cistron.nl>
-:Date: 12 March 1999
-:Updated by: Erik Andersen (andersee@debian.org)
-:Updated by: Jens Axboe (axboe@image.dk)
-
-
-Introduction
-============
-
-Linux is probably the Unix-like operating system that supports
-the widest variety of hardware devices. The reasons for this are
-presumably
-
-- The large list of hardware devices available for the many platforms
-  that Linux now supports (i.e., i386-PCs, Sparc Suns, etc.)
-- The open design of the operating system, such that anybody can write a
-  driver for Linux.
-- There is plenty of source code around as examples of how to write a driver.
-
-The openness of Linux, and the many different types of available
-hardware has allowed Linux to support many different hardware devices.
-Unfortunately, the very openness that has allowed Linux to support
-all these different devices has also allowed the behavior of each
-device driver to differ significantly from one device to another.
-This divergence of behavior has been very significant for CD-ROM
-devices; the way a particular drive reacts to a `standard` *ioctl()*
-call varies greatly from one device driver to another. To avoid making
-their drivers totally inconsistent, the writers of Linux CD-ROM
-drivers generally created new device drivers by understanding, copying,
-and then changing an existing one. Unfortunately, this practice did not
-maintain uniform behavior across all the Linux CD-ROM drivers.
-
-This document describes an effort to establish Uniform behavior across
-all the different CD-ROM device drivers for Linux. This document also
-defines the various *ioctl()'s*, and how the low-level CD-ROM device
-drivers should implement them. Currently (as of the Linux 2.1.\ *x*
-development kernels) several low-level CD-ROM device drivers, including
-both IDE/ATAPI and SCSI, now use this Uniform interface.
-
-When the CD-ROM was developed, the interface between the CD-ROM drive
-and the computer was not specified in the standards. As a result, many
-different CD-ROM interfaces were developed. Some of them had their
-own proprietary design (Sony, Mitsumi, Panasonic, Philips), other
-manufacturers adopted an existing electrical interface and changed
-the functionality (CreativeLabs/SoundBlaster, Teac, Funai) or simply
-adapted their drives to one or more of the already existing electrical
-interfaces (Aztech, Sanyo, Funai, Vertos, Longshine, Optics Storage and
-most of the `NoName` manufacturers). In cases where a new drive really
-brought its own interface or used its own command set and flow control
-scheme, either a separate driver had to be written, or an existing
-driver had to be enhanced. History has delivered us CD-ROM support for
-many of these different interfaces. Nowadays, almost all new CD-ROM
-drives are either IDE/ATAPI or SCSI, and it is very unlikely that any
-manufacturer will create a new interface. Even finding drives for the
-old proprietary interfaces is getting difficult.
-
-When (in the 1.3.70's) I looked at the existing software interface,
-which was expressed through `cdrom.h`, it appeared to be a rather wild
-set of commands and data formats [#f1]_. It seemed that many
-features of the software interface had been added to accommodate the
-capabilities of a particular drive, in an *ad hoc* manner. More
-importantly, it appeared that the behavior of the `standard` commands
-was different for most of the different drivers: e. g., some drivers
-close the tray if an *open()* call occurs when the tray is open, while
-others do not. Some drivers lock the door upon opening the device, to
-prevent an incoherent file system, but others don't, to allow software
-ejection. Undoubtedly, the capabilities of the different drives vary,
-but even when two drives have the same capability their drivers'
-behavior was usually different.
-
-.. [#f1]
-   I cannot recollect what kernel version I looked at, then,
-   presumably 1.2.13 and 1.3.34 --- the latest kernel that I was
-   indirectly involved in.
-
-I decided to start a discussion on how to make all the Linux CD-ROM
-drivers behave more uniformly. I began by contacting the developers of
-the many CD-ROM drivers found in the Linux kernel. Their reactions
-encouraged me to write the Uniform CD-ROM Driver which this document is
-intended to describe. The implementation of the Uniform CD-ROM Driver is
-in the file `cdrom.c`. This driver is intended to be an additional software
-layer that sits on top of the low-level device drivers for each CD-ROM drive.
-By adding this additional layer, it is possible to have all the different
-CD-ROM devices behave **exactly** the same (insofar as the underlying
-hardware will allow).
-
-The goal of the Uniform CD-ROM Driver is **not** to alienate driver developers
-who have not yet taken steps to support this effort. The goal of Uniform CD-ROM
-Driver is simply to give people writing application programs for CD-ROM drives
-**one** Linux CD-ROM interface with consistent behavior for all
-CD-ROM devices. In addition, this also provides a consistent interface
-between the low-level device driver code and the Linux kernel. Care
-is taken that 100% compatibility exists with the data structures and
-programmer's interface defined in `cdrom.h`. This guide was written to
-help CD-ROM driver developers adapt their code to use the Uniform CD-ROM
-Driver code defined in `cdrom.c`.
-
-Personally, I think that the most important hardware interfaces are
-the IDE/ATAPI drives and, of course, the SCSI drives, but as prices
-of hardware drop continuously, it is also likely that people may have
-more than one CD-ROM drive, possibly of mixed types. It is important
-that these drives behave in the same way. In December 1994, one of the
-cheapest CD-ROM drives was a Philips cm206, a double-speed proprietary
-drive. In the months that I was busy writing a Linux driver for it,
-proprietary drives became obsolete and IDE/ATAPI drives became the
-standard. At the time of the last update to this document (November
-1997) it is becoming difficult to even **find** anything less than a
-16 speed CD-ROM drive, and 24 speed drives are common.
-
-.. _cdrom_api:
-
-Standardizing through another software level
-============================================
-
-At the time this document was conceived, all drivers directly
-implemented the CD-ROM *ioctl()* calls through their own routines. This
-led to the danger of different drivers forgetting to do important things
-like checking that the user was giving the driver valid data. More
-importantly, this led to the divergence of behavior, which has already
-been discussed.
-
-For this reason, the Uniform CD-ROM Driver was created to enforce consistent
-CD-ROM drive behavior, and to provide a common set of services to the various
-low-level CD-ROM device drivers. The Uniform CD-ROM Driver now provides another
-software-level, that separates the *ioctl()* and *open()* implementation
-from the actual hardware implementation. Note that this effort has
-made few changes which will affect a user's application programs. The
-greatest change involved moving the contents of the various low-level
-CD-ROM drivers\' header files to the kernel's cdrom directory. This was
-done to help ensure that the user is presented with only one cdrom
-interface, the interface defined in `cdrom.h`.
-
-CD-ROM drives are specific enough (i. e., different from other
-block-devices such as floppy or hard disc drives), to define a set
-of common **CD-ROM device operations**, *<cdrom-device>_dops*.
-These operations are different from the classical block-device file
-operations, *<block-device>_fops*.
-
-The routines for the Uniform CD-ROM Driver interface level are implemented
-in the file `cdrom.c`. In this file, the Uniform CD-ROM Driver interfaces
-with the kernel as a block device by registering the following general
-*struct file_operations*::
-
-	struct file_operations cdrom_fops = {
-		NULL,			/* lseek */
-		block_read,		/* read - general block-dev read */
-		block_write,		/* write - general block-dev write */
-		NULL,			/* readdir */
-		NULL,			/* select */
-		cdrom_ioctl,		/* ioctl */
-		NULL,			/* mmap */
-		cdrom_open,		/* open */
-		cdrom_release,		/* release */
-		NULL,			/* fsync */
-		NULL,			/* fasync */
-		cdrom_media_changed,	/* media change */
-		NULL			/* revalidate */
-	};
-
-Every active CD-ROM device shares this *struct*. The routines
-declared above are all implemented in `cdrom.c`, since this file is the
-place where the behavior of all CD-ROM-devices is defined and
-standardized. The actual interface to the various types of CD-ROM
-hardware is still performed by various low-level CD-ROM-device
-drivers. These routines simply implement certain **capabilities**
-that are common to all CD-ROM (and really, all removable-media
-devices).
-
-Registration of a low-level CD-ROM device driver is now done through
-the general routines in `cdrom.c`, not through the Virtual File System
-(VFS) any more. The interface implemented in `cdrom.c` is carried out
-through two general structures that contain information about the
-capabilities of the driver, and the specific drives on which the
-driver operates. The structures are:
-
-cdrom_device_ops
-  This structure contains information about the low-level driver for a
-  CD-ROM device. This structure is conceptually connected to the major
-  number of the device (although some drivers may have different
-  major numbers, as is the case for the IDE driver).
-
-cdrom_device_info
-  This structure contains information about a particular CD-ROM drive,
-  such as its device name, speed, etc. This structure is conceptually
-  connected to the minor number of the device.
-
-Registering a particular CD-ROM drive with the Uniform CD-ROM Driver
-is done by the low-level device driver through a call to::
-
-	register_cdrom(struct cdrom_device_info * <device>_info)
-
-The device information structure, *<device>_info*, contains all the
-information needed for the kernel to interface with the low-level
-CD-ROM device driver. One of the most important entries in this
-structure is a pointer to the *cdrom_device_ops* structure of the
-low-level driver.
-
-The device operations structure, *cdrom_device_ops*, contains a list
-of pointers to the functions which are implemented in the low-level
-device driver. When `cdrom.c` accesses a CD-ROM device, it does it
-through the functions in this structure. It is impossible to know all
-the capabilities of future CD-ROM drives, so it is expected that this
-list may need to be expanded from time to time as new technologies are
-developed. For example, CD-R and CD-R/W drives are beginning to become
-popular, and support will soon need to be added for them. For now, the
-current *struct* is::
-
-	struct cdrom_device_ops {
-		int (*open)(struct cdrom_device_info *, int);
-		void (*release)(struct cdrom_device_info *);
-		int (*drive_status)(struct cdrom_device_info *, int);
-		unsigned int (*check_events)(struct cdrom_device_info *,
-					     unsigned int, int);
-		int (*media_changed)(struct cdrom_device_info *, int);
-		int (*tray_move)(struct cdrom_device_info *, int);
-		int (*lock_door)(struct cdrom_device_info *, int);
-		int (*select_speed)(struct cdrom_device_info *, int);
-		int (*select_disc)(struct cdrom_device_info *, int);
-		int (*get_last_session) (struct cdrom_device_info *,
-					 struct cdrom_multisession *);
-		int (*get_mcn)(struct cdrom_device_info *, struct cdrom_mcn *);
-		int (*reset)(struct cdrom_device_info *);
-		int (*audio_ioctl)(struct cdrom_device_info *,
-				   unsigned int, void *);
-		const int capability;		/* capability flags */
-		int (*generic_packet)(struct cdrom_device_info *,
-				      struct packet_command *);
-	};
-
-When a low-level device driver implements one of these capabilities,
-it should add a function pointer to this *struct*. When a particular
-function is not implemented, however, this *struct* should contain a
-NULL instead. The *capability* flags specify the capabilities of the
-CD-ROM hardware and/or low-level CD-ROM driver when a CD-ROM drive
-is registered with the Uniform CD-ROM Driver.
-
-Note that most functions have fewer parameters than their
-*blkdev_fops* counterparts. This is because very little of the
-information in the structures *inode* and *file* is used. For most
-drivers, the main parameter is the *struct* *cdrom_device_info*, from
-which the major and minor number can be extracted. (Most low-level
-CD-ROM drivers don't even look at the major and minor number though,
-since many of them only support one device.) This will be available
-through *dev* in *cdrom_device_info* described below.
-
-The drive-specific, minor-like information that is registered with
-`cdrom.c`, currently contains the following fields::
-
-  struct cdrom_device_info {
-	const struct cdrom_device_ops * ops; 	/* device operations for this major */
-	struct list_head list;			/* linked list of all device_info */
-	struct gendisk * disk;			/* matching block layer disk */
-	void *  handle;				/* driver-dependent data */
-
-	int mask; 				/* mask of capability: disables them */
-	int speed;				/* maximum speed for reading data */
-	int capacity;				/* number of discs in a jukebox */
-
-	unsigned int options:30;		/* options flags */
-	unsigned mc_flags:2;			/*  media-change buffer flags */
-	unsigned int vfs_events;		/*  cached events for vfs path */
-	unsigned int ioctl_events;		/*  cached events for ioctl path */
-	int use_count;				/*  number of times device is opened */
-	char name[20];				/*  name of the device type */
-
-	__u8 sanyo_slot : 2;			/*  Sanyo 3-CD changer support */
-	__u8 keeplocked : 1;			/*  CDROM_LOCKDOOR status */
-	__u8 reserved : 5;			/*  not used yet */
-	int cdda_method;			/*  see CDDA_* flags */
-	__u8 last_sense;			/*  saves last sense key */
-	__u8 media_written;			/*  dirty flag, DVD+RW bookkeeping */
-	unsigned short mmc3_profile;		/*  current MMC3 profile */
-	int for_data;				/*  unknown:TBD */
-	int (*exit)(struct cdrom_device_info *);/*  unknown:TBD */
-	int mrw_mode_page;			/*  which MRW mode page is in use */
-  };
-
-Using this *struct*, a linked list of the registered minor devices is
-built, using the *next* field. The device number, the device operations
-struct and specifications of properties of the drive are stored in this
-structure.
-
-The *mask* flags can be used to mask out some of the capabilities listed
-in *ops->capability*, if a specific drive doesn't support a feature
-of the driver. The value *speed* specifies the maximum head-rate of the
-drive, measured in units of normal audio speed (176kB/sec raw data or
-150kB/sec file system data). The parameters are declared *const*
-because they describe properties of the drive, which don't change after
-registration.
-
-A few registers contain variables local to the CD-ROM drive. The
-flags *options* are used to specify how the general CD-ROM routines
-should behave. These various flags registers should provide enough
-flexibility to adapt to the different users' wishes (and **not** the
-`arbitrary` wishes of the author of the low-level device driver, as is
-the case in the old scheme). The register *mc_flags* is used to buffer
-the information from *media_changed()* to two separate queues. Other
-data that is specific to a minor drive, can be accessed through *handle*,
-which can point to a data structure specific to the low-level driver.
-The fields *use_count*, *next*, *options* and *mc_flags* need not be
-initialized.
-
-The intermediate software layer that `cdrom.c` forms will perform some
-additional bookkeeping. The use count of the device (the number of
-processes that have the device opened) is registered in *use_count*. The
-function *cdrom_ioctl()* will verify the appropriate user-memory regions
-for read and write, and in case a location on the CD is transferred,
-it will `sanitize` the format by making requests to the low-level
-drivers in a standard format, and translating all formats between the
-user-software and low level drivers. This relieves much of the drivers'
-memory checking and format checking and translation. Also, the necessary
-structures will be declared on the program stack.
-
-The implementation of the functions should be as defined in the
-following sections. Two functions **must** be implemented, namely
-*open()* and *release()*. Other functions may be omitted, their
-corresponding capability flags will be cleared upon registration.
-Generally, a function returns zero on success and negative on error. A
-function call should return only after the command has completed, but of
-course waiting for the device should not use processor time.
-
-::
-
-	int open(struct cdrom_device_info *cdi, int purpose)
-
-*Open()* should try to open the device for a specific *purpose*, which
-can be either:
-
-- Open for reading data, as done by `mount()` (2), or the
-  user commands `dd` or `cat`.
-- Open for *ioctl* commands, as done by audio-CD playing programs.
-
-Notice that any strategic code (closing tray upon *open()*, etc.) is
-done by the calling routine in `cdrom.c`, so the low-level routine
-should only be concerned with proper initialization, such as spinning
-up the disc, etc.
-
-::
-
-	void release(struct cdrom_device_info *cdi)
-
-Device-specific actions should be taken such as spinning down the device.
-However, strategic actions such as ejection of the tray, or unlocking
-the door, should be left over to the general routine *cdrom_release()*.
-This is the only function returning type *void*.
-
-.. _cdrom_drive_status:
-
-::
-
-	int drive_status(struct cdrom_device_info *cdi, int slot_nr)
-
-The function *drive_status*, if implemented, should provide
-information on the status of the drive (not the status of the disc,
-which may or may not be in the drive). If the drive is not a changer,
-*slot_nr* should be ignored. In `cdrom.h` the possibilities are listed::
-
-
-	CDS_NO_INFO		/* no information available */
-	CDS_NO_DISC		/* no disc is inserted, tray is closed */
-	CDS_TRAY_OPEN		/* tray is opened */
-	CDS_DRIVE_NOT_READY	/* something is wrong, tray is moving? */
-	CDS_DISC_OK		/* a disc is loaded and everything is fine */
-
-::
-
-	int media_changed(struct cdrom_device_info *cdi, int disc_nr)
-
-This function is very similar to the original function in *struct
-file_operations*. It returns 1 if the medium of the device *cdi->dev*
-has changed since the last call, and 0 otherwise. The parameter
-*disc_nr* identifies a specific slot in a juke-box, it should be
-ignored for single-disc drives. Note that by `re-routing` this
-function through *cdrom_media_changed()*, we can implement separate
-queues for the VFS and a new *ioctl()* function that can report device
-changes to software (e. g., an auto-mounting daemon).
-
-::
-
-	int tray_move(struct cdrom_device_info *cdi, int position)
-
-This function, if implemented, should control the tray movement. (No
-other function should control this.) The parameter *position* controls
-the desired direction of movement:
-
-- 0 Close tray
-- 1 Open tray
-
-This function returns 0 upon success, and a non-zero value upon
-error. Note that if the tray is already in the desired position, no
-action need be taken, and the return value should be 0.
-
-::
-
-	int lock_door(struct cdrom_device_info *cdi, int lock)
-
-This function (and no other code) controls locking of the door, if the
-drive allows this. The value of *lock* controls the desired locking
-state:
-
-- 0 Unlock door, manual opening is allowed
-- 1 Lock door, tray cannot be ejected manually
-
-This function returns 0 upon success, and a non-zero value upon
-error. Note that if the door is already in the requested state, no
-action need be taken, and the return value should be 0.
-
-::
-
-	int select_speed(struct cdrom_device_info *cdi, int speed)
-
-Some CD-ROM drives are capable of changing their head-speed. There
-are several reasons for changing the speed of a CD-ROM drive. Badly
-pressed CD-ROMs may benefit from less-than-maximum head rate. Modern
-CD-ROM drives can obtain very high head rates (up to *24x* is
-common). It has been reported that these drives can make reading
-errors at these high speeds, reducing the speed can prevent data loss
-in these circumstances. Finally, some of these drives can
-make an annoyingly loud noise, which a lower speed may reduce.
-
-This function specifies the speed at which data is read or audio is
-played back. The value of *speed* specifies the head-speed of the
-drive, measured in units of standard cdrom speed (176kB/sec raw data
-or 150kB/sec file system data). So to request that a CD-ROM drive
-operate at 300kB/sec you would call the CDROM_SELECT_SPEED *ioctl*
-with *speed=2*. The special value `0` means `auto-selection`, i. e.,
-maximum data-rate or real-time audio rate. If the drive doesn't have
-this `auto-selection` capability, the decision should be made on the
-current disc loaded and the return value should be positive. A negative
-return value indicates an error.
-
-::
-
-	int select_disc(struct cdrom_device_info *cdi, int number)
-
-If the drive can store multiple discs (a juke-box) this function
-will perform disc selection. It should return the number of the
-selected disc on success, a negative value on error. Currently, only
-the ide-cd driver supports this functionality.
-
-::
-
-	int get_last_session(struct cdrom_device_info *cdi,
-			     struct cdrom_multisession *ms_info)
-
-This function should implement the old corresponding *ioctl()*. For
-device *cdi->dev*, the start of the last session of the current disc
-should be returned in the pointer argument *ms_info*. Note that
-routines in `cdrom.c` have sanitized this argument: its requested
-format will **always** be of the type *CDROM_LBA* (linear block
-addressing mode), whatever the calling software requested. But
-sanitization goes even further: the low-level implementation may
-return the requested information in *CDROM_MSF* format if it wishes so
-(setting the *ms_info->addr_format* field appropriately, of
-course) and the routines in `cdrom.c` will make the transformation if
-necessary. The return value is 0 upon success.
-
-::
-
-	int get_mcn(struct cdrom_device_info *cdi,
-		    struct cdrom_mcn *mcn)
-
-Some discs carry a `Media Catalog Number` (MCN), also called
-`Universal Product Code` (UPC). This number should reflect the number
-that is generally found in the bar-code on the product. Unfortunately,
-the few discs that carry such a number on the disc don't even use the
-same format. The return argument to this function is a pointer to a
-pre-declared memory region of type *struct cdrom_mcn*. The MCN is
-expected as a 13-character string, terminated by a null-character.
-
-::
-
-	int reset(struct cdrom_device_info *cdi)
-
-This call should perform a hard-reset on the drive (although in
-circumstances that a hard-reset is necessary, a drive may very well not
-listen to commands anymore). Preferably, control is returned to the
-caller only after the drive has finished resetting. If the drive is no
-longer listening, it may be wise for the underlying low-level cdrom
-driver to time out.
-
-::
-
-	int audio_ioctl(struct cdrom_device_info *cdi,
-			unsigned int cmd, void *arg)
-
-Some of the CD-ROM-\ *ioctl()*\ 's defined in `cdrom.h` can be
-implemented by the routines described above, and hence the function
-*cdrom_ioctl* will use those. However, most *ioctl()*\ 's deal with
-audio-control. We have decided to leave these to be accessed through a
-single function, repeating the arguments *cmd* and *arg*. Note that
-the latter is of type *void*, rather than *unsigned long int*.
-The routine *cdrom_ioctl()* does do some useful things,
-though. It sanitizes the address format type to *CDROM_MSF* (Minutes,
-Seconds, Frames) for all audio calls. It also verifies the memory
-location of *arg*, and reserves stack-memory for the argument. This
-makes implementation of the *audio_ioctl()* much simpler than in the
-old driver scheme. For example, you may look up the function
-*cm206_audio_ioctl()* `cm206.c` that should be updated with
-this documentation.
-
-An unimplemented ioctl should return *-ENOSYS*, but a harmless request
-(e. g., *CDROMSTART*) may be ignored by returning 0 (success). Other
-errors should be according to the standards, whatever they are. When
-an error is returned by the low-level driver, the Uniform CD-ROM Driver
-tries whenever possible to return the error code to the calling program.
-(We may decide to sanitize the return value in *cdrom_ioctl()* though, in
-order to guarantee a uniform interface to the audio-player software.)
-
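The return-value convention described above can be illustrated with a minimal user-space sketch. The name `demo_audio_ioctl` is hypothetical and the body is not taken from any real driver; the sketch assumes only the `CDROMSTART` constant from `<linux/cdrom.h>` and `ENOSYS` from `<errno.h>`.

```c
#include <errno.h>
#include <linux/cdrom.h>

struct cdrom_device_info;	/* opaque in this sketch; the kernel defines it */

/*
 * Hypothetical audio_ioctl() skeleton: a harmless request is accepted
 * by returning 0, anything the driver does not implement yields -ENOSYS.
 */
int demo_audio_ioctl(struct cdrom_device_info *cdi,
		     unsigned int cmd, void *arg)
{
	(void)cdi;	/* unused in this sketch */
	(void)arg;

	switch (cmd) {
	case CDROMSTART:
		return 0;		/* harmless, so simply ignore it */
	default:
		return -ENOSYS;		/* unimplemented ioctl */
	}
}
```

A real driver would add one `case` per audio command it supports before falling through to the default.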
-::
-
-	int dev_ioctl(struct cdrom_device_info *cdi,
-		      unsigned int cmd, unsigned long arg)
-
-Some *ioctl()'s* seem to be specific to certain CD-ROM drives. That is,
-they are introduced to service some capabilities of certain drives. In
-fact, there are 6 different *ioctl()'s* for reading data, either in some
-particular kind of format, or audio data. Not many drives support
-reading audio tracks as data; I believe this is to protect the
-copyrights of artists. Moreover, I think that if audio-tracks are
-supported, it should be done through the VFS and not via *ioctl()'s*. A
-problem here could be the fact that audio-frames are 2352 bytes long,
-so either the audio-file-system should ask for 75264 bytes at once
-(the least common multiple of 512 and 2352), or the drivers should
-bend their backs to cope with this incoherence (to which I would be
-opposed). Furthermore, it is very difficult for the hardware to find
-the exact frame boundaries, since there are no synchronization headers
-in audio frames. Once these issues are resolved, this code should be
-standardized in `cdrom.c`.
-
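The 75264-byte figure quoted above is the least common multiple of the 512-byte block size and the 2352-byte audio frame size. A small sketch of that arithmetic follows; the function names are illustrative only and do not appear in `cdrom.c`.

```c
/* Greatest common divisor by Euclid's algorithm. */
unsigned long demo_gcd(unsigned long a, unsigned long b)
{
	while (b != 0) {
		unsigned long t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Least common multiple; divide first to keep the intermediate small. */
unsigned long demo_lcm(unsigned long a, unsigned long b)
{
	return a / demo_gcd(a, b) * b;
}

/*
 * Smallest request size, in bytes, that is a whole number of blocks
 * and a whole number of audio frames at the same time.
 */
unsigned long demo_aligned_request(unsigned long block_size,
				   unsigned long frame_size)
{
	return demo_lcm(block_size, frame_size);
}
```

With 512 and 2352 this gives 75264 bytes, which is 147 blocks or 32 audio frames.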
-Because there are so many *ioctl()'s* that seem to be introduced to
-satisfy certain drivers [#f2]_, any non-standard *ioctl()*\ s
-are routed through the call *dev_ioctl()*. In principle, `private`
-*ioctl()*\ 's should be numbered after the device's major number, and not
-the general CD-ROM *ioctl* number, `0x53`. Currently the
-non-supported *ioctl()'s* are:
-
-	CDROMREADMODE1, CDROMREADMODE2, CDROMREADAUDIO, CDROMREADRAW,
-	CDROMREADCOOKED, CDROMSEEK, CDROMPLAYBLK and CDROMREADALL
-
-.. [#f2]
-
-   Is there software around that actually uses these? I'd be interested!
-
-.. _cdrom_capabilities:
-
-CD-ROM capabilities
--------------------
-
-Instead of just implementing some *ioctl* calls, the interface in
-`cdrom.c` supplies the possibility to indicate the **capabilities**
-of a CD-ROM drive. This can be done by ORing any number of
-capability-constants that are defined in `cdrom.h` at the registration
-phase. Currently, the capabilities are any of::
-
-	CDC_CLOSE_TRAY		/* can close tray by software control */
-	CDC_OPEN_TRAY		/* can open tray */
-	CDC_LOCK		/* can lock and unlock the door */
-	CDC_SELECT_SPEED	/* can select speed, in units of ~150 kB/s */
-	CDC_SELECT_DISC		/* drive is juke-box */
-	CDC_MULTI_SESSION	/* can read sessions > 1 */
-	CDC_MCN			/* can read Media Catalog Number */
-	CDC_MEDIA_CHANGED	/* can report if disc has changed */
-	CDC_PLAY_AUDIO		/* can perform audio-functions (play, pause, etc) */
-	CDC_RESET		/* hard reset device */
-	CDC_IOCTLS		/* driver has non-standard ioctls */
-	CDC_DRIVE_STATUS	/* driver implements drive status */
-
-The capability flag is declared *const*, to prevent drivers from
-accidentally tampering with the contents. The capability flags actually
-inform `cdrom.c` of what the driver can do. If the drive found
-by the driver does not have the capability, it can be masked out by
-the *cdrom_device_info* variable *mask*. For instance, the SCSI CD-ROM
-driver has implemented the code for loading and ejecting CD-ROM's, and
-hence its corresponding flags in *capability* will be set. But a SCSI
-CD-ROM drive might be a caddy system, which can't load the tray, and
-hence for this drive the *cdrom_device_info* struct will have set
-the *CDC_CLOSE_TRAY* bit in *mask*.
-
-In the file `cdrom.c` you will encounter many constructions of the type::
-
-	if (cdo->capability & ~cdi->mask & CDC_<capability>) ...
-
-There is no *ioctl* to set the mask... The reason is that
-I think it is better to control the **behavior** rather than the
-**capabilities**.
-
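The masking test above can be tried out in isolation. The sketch below uses stand-in structures (`demo_ops`, `demo_info`) and a hypothetical helper `drive_can()`; the constant values are chosen for illustration, and the authoritative definitions are those in `cdrom.h`.

```c
/* Illustrative capability bits; see cdrom.h for the real definitions. */
#define CDC_CLOSE_TRAY	0x1
#define CDC_OPEN_TRAY	0x2
#define CDC_LOCK	0x4

/* Stand-in for the driver-wide operations structure. */
struct demo_ops {
	int capability;		/* what the driver code can do */
};

/* Stand-in for the per-drive information structure. */
struct demo_info {
	const struct demo_ops *ops;
	int mask;		/* capabilities this drive lacks */
};

/* Non-zero if the driver implements cap and the drive has not masked it. */
int drive_can(const struct demo_info *cdi, int cap)
{
	return (cdi->ops->capability & ~cdi->mask & cap) != 0;
}
```

For the caddy drive described earlier, setting `CDC_CLOSE_TRAY` in `mask` makes `drive_can()` report that the tray cannot be closed, even though the driver itself implements that operation.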
-Options
--------
-
-A final flag register controls the **behavior** of the CD-ROM
-drives, in order to satisfy different users' wishes, hopefully
-independently of the ideas of the respective author who happened to
-have made the drive's support available to the Linux community. The
-current behavior options are::
-
-	CDO_AUTO_CLOSE	/* try to close tray upon device open() */
-	CDO_AUTO_EJECT	/* try to open tray on last device close() */
-	CDO_USE_FFLAGS	/* use file_pointer->f_flags to indicate purpose for open() */
-	CDO_LOCK	/* try to lock door if device is opened */
-	CDO_CHECK_TYPE	/* ensure disc type is data if opened for data */
-
-The initial value of this register is
-`CDO_AUTO_CLOSE | CDO_USE_FFLAGS | CDO_LOCK`, reflecting my own view on user
-interface and software standards. Before you protest, there are two
-new *ioctl()'s* implemented in `cdrom.c`, that allow you to control the
-behavior by software. These are::
-
-	CDROM_SET_OPTIONS	/* set options specified in (int)arg */
-	CDROM_CLEAR_OPTIONS	/* clear options specified in (int)arg */
-
-One option needs some more explanation: *CDO_USE_FFLAGS*. The next
-section explains why this option is needed.
-
-A software package `setcd`, available from the Debian distribution
-and `sunsite.unc.edu`, allows user level control of these flags.
-
-
-The need to know the purpose of opening the CD-ROM device
-=========================================================
-
-Traditionally, Unix devices can be used in two different `modes`,
-either by reading/writing to the device file, or by issuing
-controlling commands to the device, by the device's *ioctl()*
-call. The problem with CD-ROM drives, is that they can be used for
-two entirely different purposes. One is to mount removable
-file systems, CD-ROM's, the other is to play audio CD's. Audio commands
-are implemented entirely through *ioctl()\'s*, presumably because the
-first implementation (SUN?) has been such. In principle there is
-nothing wrong with this, but a good control of the `CD player` demands
-that the device can **always** be opened in order to give the
-*ioctl* commands, regardless of the state the drive is in.
-
-On the other hand, when used as a removable-media disc drive (the
-original purpose of CD-ROMs) we would like to make sure that the
-disc drive is ready for operation upon opening the device. In the old
-scheme, some CD-ROM drivers don't do any integrity checking, resulting
-in a number of i/o errors reported by the VFS to the kernel when an
-attempt for mounting a CD-ROM on an empty drive occurs. This is not a
-particularly elegant way to find out that there is no CD-ROM inserted;
-it more-or-less looks like the old IBM-PC trying to read an empty floppy
-drive for a couple of seconds, after which the system complains it
-can't read from it. Nowadays we can **sense** the existence of a
-removable medium in a drive, and we believe we should exploit that
-fact. An integrity check on opening of the device, that verifies the
-availability of a CD-ROM and its correct type (data), would be
-desirable.
-
-These two ways of using a CD-ROM drive, principally for data and
-secondarily for playing audio discs, have different demands for the
-behavior of the *open()* call. Audio use simply wants to open the
-device in order to get a file handle which is needed for issuing
-*ioctl* commands, while data use wants to open for correct and
-reliable data transfer. The only way user programs can indicate what
-their *purpose* of opening the device is, is through the *flags*
-parameter (see `open(2)`). For CD-ROM devices, these flags aren't
-implemented (some drivers implement checking for write-related flags,
-but this is not strictly necessary if the device file has correct
-permission flags). Most option flags simply don't make sense to
-CD-ROM devices: *O_CREAT*, *O_NOCTTY*, *O_TRUNC*, *O_APPEND*, and
-*O_SYNC* have no meaning to a CD-ROM.
-
-We therefore propose to use the flag *O_NONBLOCK* to indicate
-that the device is opened just for issuing *ioctl*
-commands. Strictly, the meaning of *O_NONBLOCK* is that opening and
-subsequent calls to the device don't cause the calling process to
-wait. We could interpret this as don't wait until someone has
-inserted some valid data-CD-ROM. Thus, our proposal of the
-implementation for the *open()* call for CD-ROMs is:
-
-- If no other flags are set than *O_RDONLY*, the device is opened
-  for data transfer, and the return value will be 0 only upon successful
-  initialization of the transfer. The call may even induce some actions
-  on the CD-ROM, such as closing the tray.
-- If the option flag *O_NONBLOCK* is set, opening will always be
-  successful, unless the whole device doesn't exist. The drive will take
-  no actions whatsoever.
-
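From user space, the two cases in the list above differ only in the flags passed to `open(2)`. The sketch below shows both; the helper names `open_for_ioctl()` and `open_for_data()` are hypothetical, and the device path is left to the caller.

```c
#include <fcntl.h>

/*
 * Open a CD-ROM device node only to issue ioctl commands.
 * Under the proposal, O_NONBLOCK asks the driver to take no action
 * on the drive and to succeed as long as the device exists.
 */
int open_for_ioctl(const char *path)
{
	return open(path, O_RDONLY | O_NONBLOCK);
}

/*
 * Open a CD-ROM device node for data transfer. Under the proposal the
 * call succeeds only once the drive is ready, and the driver may close
 * the tray as a side effect.
 */
int open_for_data(const char *path)
{
	return open(path, O_RDONLY);
}
```

Both helpers return a file descriptor, or -1 with `errno` set, exactly as `open(2)` does.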
-And what about standards?
--------------------------
-
-You might hesitate to accept this proposal as it comes from the
-Linux community, and not from some standardizing institute. What
-about SUN, SGI, HP and all those other Unix and hardware vendors?
-Well, these companies are in the lucky position that they generally
-control both the hardware and software of their supported products,
-and are large enough to set their own standard. They do not have to
-deal with a dozen or more different, competing hardware
-configurations\ [#f3]_.
-
-.. [#f3]
-
-   Incidentally, I think that SUN's approach to mounting CD-ROMs is very
-   good in origin: under Solaris a volume-daemon automatically mounts a
-   newly inserted CD-ROM under `/cdrom/*<volume-name>*`.
-
-   In my opinion they should have pushed this
-   further and have **every** CD-ROM on the local area network be
-   mounted at the similar location, i. e., no matter in which particular
-   machine you insert a CD-ROM, it will always appear at the same
-   position in the directory tree, on every system. When I wanted to
-   implement such a user-program for Linux, I came across the
-   differences in behavior of the various drivers, and the need for an
-   *ioctl* informing about media changes.
-
-We believe that using *O_NONBLOCK* to indicate that a device is being opened
-for *ioctl* commands only can be easily introduced in the Linux
-community. All the CD-player authors will have to be informed, we can
-even send in our own patches to the programs. The use of *O_NONBLOCK*
-has most likely no influence on the behavior of the CD-players on
-other operating systems than Linux. Finally, a user can always revert
-to old behavior by a call to
-*ioctl(file_descriptor, CDROM_CLEAR_OPTIONS, CDO_USE_FFLAGS)*.
-
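The call that reverts to the old behavior can be wrapped as follows. This is a sketch only: the wrapper name `cdrom_ignore_open_flags()` is hypothetical, while `CDROM_CLEAR_OPTIONS` and `CDO_USE_FFLAGS` are assumed to come from `<linux/cdrom.h>`.

```c
#include <sys/ioctl.h>
#include <linux/cdrom.h>

/*
 * Ask the Uniform CD-ROM Driver to stop interpreting the open() flags
 * for the device behind fd. Returns the ioctl() result: negative on
 * error, otherwise the option register after modification.
 */
int cdrom_ignore_open_flags(int fd)
{
	return ioctl(fd, CDROM_CLEAR_OPTIONS, CDO_USE_FFLAGS);
}
```

The file descriptor would typically be obtained by opening the device with *O_NONBLOCK* set, so that the open itself does not touch the drive.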
-The preferred strategy of *open()*
-----------------------------------
-
-The routines in `cdrom.c` are designed in such a way that run-time
-configuration of the behavior of CD-ROM devices (of **any** type)
-can be carried out, by the *CDROM_SET/CLEAR_OPTIONS* *ioctls*. Thus, various
-modes of operation can be set:
-
-`CDO_AUTO_CLOSE | CDO_USE_FFLAGS | CDO_LOCK`
-   This is the default setting. (With *CDO_CHECK_TYPE* it will be better, in
-   the future.) If the device is not yet opened by any other process, and if
-   the device is being opened for data (*O_NONBLOCK* is not set) and the
-   tray is found to be open, an attempt to close the tray is made. Then,
-   it is verified that a disc is in the drive and, if *CDO_CHECK_TYPE* is
-   set, that it contains tracks of type `data mode 1`. Only if all tests
-   are passed is the return value zero. The door is locked to prevent file
-   system corruption. If the drive is opened for audio (*O_NONBLOCK* is
-   set), no actions are taken and a value of 0 will be returned.
-
-`CDO_AUTO_CLOSE | CDO_AUTO_EJECT | CDO_LOCK`
-   This mimics the behavior of the current sbpcd-driver. The option flags are
-   ignored, the tray is closed on the first open, if necessary. Similarly,
-   the tray is opened on the last release, i. e., if a CD-ROM is unmounted,
-   it is automatically ejected, such that the user can replace it.
-
-We hope that these options can convince everybody (both driver
-maintainers and user program developers) to adopt the new CD-ROM
-driver scheme and option flag interpretation.
-
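The default strategy described in this section can be modelled as a small decision function. This is a simplified sketch, not the code in `cdrom.c`: the structure `demo_drive`, the function `demo_open()` and the plain -1 error return are assumptions made for illustration, the option values are chosen for the sketch, and the "not yet opened by any other process" condition is left out.

```c
#include <stdbool.h>

/* Illustrative option bits; see cdrom.h for the real definitions. */
#define CDO_AUTO_CLOSE	0x1
#define CDO_AUTO_EJECT	0x2
#define CDO_USE_FFLAGS	0x4
#define CDO_LOCK	0x8
#define CDO_CHECK_TYPE	0x10

/* Simulated drive state, standing in for real hardware queries. */
struct demo_drive {
	int options;
	bool tray_open;
	bool disc_present;
	bool disc_is_data;
	bool door_locked;
};

/* Returns 0 on success and -1 on failure (error code is an assumption). */
int demo_open(struct demo_drive *d, bool nonblock)
{
	/* Opened for ioctl commands only: take no action at all. */
	if ((d->options & CDO_USE_FFLAGS) && nonblock)
		return 0;

	/* Opened for data: close the tray first if we are allowed to. */
	if (d->tray_open) {
		if (!(d->options & CDO_AUTO_CLOSE))
			return -1;
		d->tray_open = false;
	}

	/* A disc must be present, and of type data if that is checked. */
	if (!d->disc_present)
		return -1;
	if ((d->options & CDO_CHECK_TYPE) && !d->disc_is_data)
		return -1;

	/* All tests passed: lock the door to protect the file system. */
	if (d->options & CDO_LOCK)
		d->door_locked = true;
	return 0;
}
```

Clearing *CDO_USE_FFLAGS* in `options` makes the function ignore the `nonblock` argument, which corresponds to the second mode listed above.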
-Description of routines in `cdrom.c`
-====================================
-
-Only a few routines in `cdrom.c` are exported to the drivers. In this
-section we will discuss these, as well as the functions that
-`take over` the CD-ROM interface to the kernel. The header file belonging
-to `cdrom.c` is called `cdrom.h`. Formerly, some of the contents of this
-file were placed in the file `ucdrom.h`, but this file has now been
-merged back into `cdrom.h`.
-
-::
-
-	struct file_operations cdrom_fops
-
-The contents of this structure were described in cdrom_api_.
-A pointer to this structure is assigned to the *fops* field
-of the *struct gendisk*.
-
-::
-
-	int register_cdrom(struct cdrom_device_info *cdi)
-
-This function is used in about the same way as one registers *cdrom_fops*
-with the kernel: the device operations and information structures,
-as described in cdrom_api_, should be registered with the
-Uniform CD-ROM Driver::
-
-	register_cdrom(&<device>_info);
-
-
-This function returns zero upon success, and non-zero upon
-failure. The structure *<device>_info* should have a pointer to the
-driver's *<device>_dops*, as in::
-
-	struct cdrom_device_info <device>_info = {
-		<device>_dops;
-		...
-	}
-
-Note that a driver must have one static structure, *<device>_dops*, while
-it may have as many structures *<device>_info* as there are minor devices
-active. The function *register_cdrom()* builds a linked list from these.
-
-
-::
-
-	void unregister_cdrom(struct cdrom_device_info *cdi)
-
-Unregistering device *cdi* with minor number *MINOR(cdi->dev)* removes
-the minor device from the list. If it was the last registered minor for
-the low-level driver, this disconnects the registered device-operation
-routines from the CD-ROM interface. This function returns zero upon
-success, and non-zero upon failure.
-
-::
-
-	int cdrom_open(struct inode * ip, struct file * fp)
-
-This function is not called directly by the low-level drivers, it is
-listed in the standard *cdrom_fops*. If the VFS opens a file, this
-function becomes active. A strategy is implemented in this routine,
-taking care of all capabilities and options that are set in the
-*cdrom_device_ops* connected to the device. Then, the program flow is
-transferred to the device_dependent *open()* call.
-
-::
-
-	void cdrom_release(struct inode *ip, struct file *fp)
-
-This function implements the reverse-logic of *cdrom_open()*, and then
-calls the device-dependent *release()* routine. When the use-count has
-reached 0, the allocated buffers are flushed by calls to *sync_dev(dev)*
-and *invalidate_buffers(dev)*.
-
-
-.. _cdrom_ioctl:
-
-::
-
-	int cdrom_ioctl(struct inode *ip, struct file *fp,
-			unsigned int cmd, unsigned long arg)
-
-This function handles all the standard *ioctl* requests for CD-ROM
-devices in a uniform way. The different calls fall into three
-categories: *ioctl()'s* that can be directly implemented by device
-operations, ones that are routed through the call *audio_ioctl()*, and
-the remaining ones, that are presumably device-dependent. Generally, a
-negative return value indicates an error.
-
-Directly implemented *ioctl()'s*
---------------------------------
-
-The following `old` CD-ROM *ioctl()*\ 's are implemented by directly
-calling device-operations in *cdrom_device_ops*, if implemented and
-not masked:
-
-`CDROMMULTISESSION`
-	Requests the last session on a CD-ROM.
-`CDROMEJECT`
-	Open tray.
-`CDROMCLOSETRAY`
-	Close tray.
-`CDROMEJECT_SW`
-	If *arg != 0*, set behavior to auto-close (close
-	tray on first open) and auto-eject (eject on last release), otherwise
-	set behavior to non-moving on *open()* and *release()* calls.
-`CDROM_GET_MCN`
-	Get the Media Catalog Number from a CD.
-
-*Ioctl*\ s routed through *audio_ioctl()*
------------------------------------------
-
-The following set of *ioctl()'s* are all implemented through a call to
-the *cdrom_device_ops* function *audio_ioctl()*. Memory checks and
-allocation are performed in *cdrom_ioctl()*, which also sanitizes the
-address format (*CDROM_LBA*/*CDROM_MSF*).
-
-`CDROMSUBCHNL`
-	Get sub-channel data in argument *arg* of type
-	`struct cdrom_subchnl *`.
-`CDROMREADTOCHDR`
-	Read Table of Contents header, in *arg* of type
-	`struct cdrom_tochdr *`.
-`CDROMREADTOCENTRY`
-	Read a Table of Contents entry in *arg* and specified by *arg*
-	of type `struct cdrom_tocentry *`.
-`CDROMPLAYMSF`
-	Play audio fragment specified in Minute, Second, Frame format,
-	delimited by *arg* of type `struct cdrom_msf *`.
-`CDROMPLAYTRKIND`
-	Play audio fragment in track-index format delimited by *arg*
-	of type `struct cdrom_ti *`.
-`CDROMVOLCTRL`
-	Set volume specified by *arg* of type `struct cdrom_volctrl *`.
-`CDROMVOLREAD`
-	Read volume into *arg* of type `struct cdrom_volctrl *`.
-`CDROMSTART`
-	Spin up disc.
-`CDROMSTOP`
-	Stop playback of audio fragment.
-`CDROMPAUSE`
-	Pause playback of audio fragment.
-`CDROMRESUME`
-	Resume playing.
-
-New *ioctl()'s* in `cdrom.c`
-----------------------------
-
-The following *ioctl()'s* have been introduced to allow user programs to
-control the behavior of individual CD-ROM devices. New *ioctl*
-commands can be identified by the underscores in their names.
-
-`CDROM_SET_OPTIONS`
-	Set options specified by *arg*. Returns the option flag register
-	after modification. Use *arg = 0* for reading the current flags.
-`CDROM_CLEAR_OPTIONS`
-	Clear options specified by *arg*. Returns the option flag register
-	after modification.
-`CDROM_SELECT_SPEED`
-	Select head-rate speed of disc specified by *arg* in units
-	of standard cdrom speed (176 kB/sec raw data or
-	150 kB/sec file system data). The value 0 means `auto-select`,
-	i.e., play audio discs at real time and data discs at maximum speed.
-	The value *arg* is checked against the maximum head rate of the
-	drive found in the *cdrom_dops*.
-`CDROM_SELECT_DISC`
-	Select disc numbered *arg* from a juke-box.
-
-	First disc is numbered 0. The number *arg* is checked against the
-	maximum number of discs in the juke-box found in the *cdrom_dops*.
-`CDROM_MEDIA_CHANGED`
-	Returns 1 if a disc has been changed since the last call.
-	Note that calls to *cdrom_media_changed* by the VFS are treated
-	by an independent queue, so both mechanisms will detect a
-	media change once. For juke-boxes, an extra argument *arg*
-	specifies the slot for which the information is given. The special
-	value *CDSL_CURRENT* requests that information about the currently
-	selected slot be returned.
-`CDROM_DRIVE_STATUS`
-	Returns the status of the drive by a call to
-	*drive_status()*. Return values are defined in cdrom_drive_status_.
-	Note that this call doesn't return information on the
-	current playing activity of the drive; this can be polled through
-	an *ioctl* call to *CDROMSUBCHNL*. For juke-boxes, an extra argument
-	*arg* specifies the slot for which (possibly limited) information is
-	given. The special value *CDSL_CURRENT* requests that information
-	about the currently selected slot be returned.
-`CDROM_DISC_STATUS`
-	Returns the type of the disc currently in the drive.
-	It should be viewed as a complement to *CDROM_DRIVE_STATUS*.
-	This *ioctl* can provide *some* information about the current
-	disc that is inserted in the drive. This functionality used to be
-	implemented in the low level drivers, but is now carried out
-	entirely in Uniform CD-ROM Driver.
-
-	The history of development of the CD's use as a carrier medium for
-	various digital information has led to many different disc types.
-	This *ioctl* is useful only in the case that CDs have *only one*
-	type of data on them. While this is often the case, it is
-	also very common for CDs to have some tracks with data, and some
-	tracks with audio. Because this is an existing interface, rather
-	than fixing this interface by changing the assumptions it was made
-	under, thereby breaking all user applications that use this
-	function, the Uniform CD-ROM Driver implements this *ioctl* as
-	follows: If the CD in question has audio tracks on it, and it has
-	absolutely no CD-I, XA, or data tracks on it, it will be reported
-	as *CDS_AUDIO*. If it has both audio and data tracks, it will
-	return *CDS_MIXED*. If there are no audio tracks on the disc, and
-	if the CD in question has any CD-I tracks on it, it will be
-	reported as *CDS_XA_2_2*. Failing that, if the CD in question
-	has any XA tracks on it, it will be reported as *CDS_XA_2_1*.
-	Finally, if the CD in question has any data tracks on it,
-	it will be reported as a data CD (*CDS_DATA_1*).
-
-	This *ioctl* can return::
-
-		CDS_NO_INFO	/* no information available */
-		CDS_NO_DISC	/* no disc is inserted, or tray is opened */
-		CDS_AUDIO	/* Audio disc (2352 audio bytes/frame) */
-		CDS_DATA_1	/* data disc, mode 1 (2048 user bytes/frame) */
-		CDS_XA_2_1	/* mixed data (XA), mode 2, form 1 (2048 user bytes) */
-		CDS_XA_2_2	/* mixed data (XA), mode 2, form 2 (2324 user bytes) */
-		CDS_MIXED	/* mixed audio/data disc */
-
-	For some information concerning frame layout of the various disc
-	types, see a recent version of `cdrom.h`.
-
-`CDROM_CHANGER_NSLOTS`
-	Returns the number of slots in a juke-box.
-`CDROMRESET`
-	Reset the drive.
-`CDROM_GET_CAPABILITY`
-	Returns the *capability* flags for the drive. Refer to section
-	cdrom_capabilities_ for more information on these flags.
-`CDROM_LOCKDOOR`
-	 Locks the door of the drive. `arg == 0` unlocks the door,
-	 any other value locks it.
-`CDROM_DEBUG`
-	 Turns on debugging info. Only root is allowed to do this.
-	 Same semantics as CDROM_LOCKDOOR.
-
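The classification rules given for *CDROM_DISC_STATUS* above can be written as a short decision chain. This is a sketch under stated assumptions: the enum, the `demo_track_counts` structure and `classify_disc()` are hypothetical names, and returning "no info" when no track of any kind is counted is an assumption, since the text does not spell out that case.

```c
/* Stand-ins for the CDS_* return values discussed above. */
enum demo_disc_status {
	DEMO_NO_INFO,
	DEMO_AUDIO,
	DEMO_MIXED,
	DEMO_XA_2_2,
	DEMO_XA_2_1,
	DEMO_DATA_1
};

/* Number of tracks of each kind found on the disc. */
struct demo_track_counts {
	int audio;
	int data;
	int cdi;
	int xa;
};

enum demo_disc_status classify_disc(const struct demo_track_counts *t)
{
	if (t->audio > 0) {
		/* Audio only, or audio together with anything else. */
		if (t->cdi == 0 && t->xa == 0 && t->data == 0)
			return DEMO_AUDIO;
		return DEMO_MIXED;
	}
	/* No audio tracks: CD-I takes precedence, then XA, then data. */
	if (t->cdi > 0)
		return DEMO_XA_2_2;
	if (t->xa > 0)
		return DEMO_XA_2_1;
	if (t->data > 0)
		return DEMO_DATA_1;
	return DEMO_NO_INFO;	/* assumption: nothing recognisable found */
}
```

The order of the tests matters: a disc with both XA and plain data tracks but no audio is reported as XA, not as a data disc.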
-
-Device dependent *ioctl()'s*
-----------------------------
-
-Finally, all other *ioctl()'s* are passed to the function *dev_ioctl()*,
-if implemented. No memory allocation or verification is carried out.
-
-How to update your driver
-=========================
-
-- Make a backup of your current driver.
-- Get hold of the files `cdrom.c` and `cdrom.h`, they should be in
-  the directory tree that came with this documentation.
-- Make sure you include `cdrom.h`.
-- Change the 3rd argument of *register_blkdev* from `&<your-drive>_fops`
-  to `&cdrom_fops`.
-- Just after that line, add the following to register with the Uniform
-  CD-ROM Driver::
-
-	register_cdrom(&<your-drive>_info);
-
-  Similarly, add a call to *unregister_cdrom()* at the appropriate place.
-- Copy an example of the device-operations *struct* to your
-  source, e. g., from `cm206.c` *cm206_dops*, and change all
-  entries to names corresponding to your driver, or names you just
-  happen to like. If your driver doesn't support a certain function,
-  make the entry *NULL*. At the entry *capability* you should list all
-  capabilities your driver currently supports. If your driver
-  has a capability that is not listed, please send me a message.
-- Copy the *cdrom_device_info* declaration from the same example
-  driver, and modify the entries according to your needs. If your
-  driver dynamically determines the capabilities of the hardware, this
-  structure should also be declared dynamically.
-- Implement all functions in your `<device>_dops` structure,
-  according to prototypes listed in  `cdrom.h`, and specifications given
-  in cdrom_api_. Most likely you have already implemented
-  the code in a large part, and you will almost certainly need to adapt the
-  prototype and return values.
-- Rename your `<device>_ioctl()` function to *audio_ioctl* and
-  change the prototype a little. Remove entries listed in the first
-  part in cdrom_ioctl_, if your code was OK, these are
-  just calls to the routines you adapted in the previous step.
-- You may remove all remaining memory checking code in the
-  *audio_ioctl()* function that deals with audio commands (these are
-  listed in the second part of cdrom_ioctl_). There is no
-  need for memory allocation either, so most *case*\ s in the *switch*
-  statement look similar to::
-
-	case CDROMREADTOCENTRY:
-		get_toc_entry((struct cdrom_tocentry *) arg);
-
-- All remaining *ioctl* cases must be moved to a separate
-  function, *<device>_ioctl*, the device-dependent *ioctl()'s*. Note that
-  memory checking and allocation must be kept in this code!
-- Change the prototypes of *<device>_open()* and
-  *<device>_release()*, and remove any strategic code (i. e., tray
-  movement, door locking, etc.).
-- Try to recompile the drivers. We advise you to use modules, both
-  for `cdrom.o` and your driver, as debugging is much easier this
-  way.
-
-Thanks
-======
-
-Thanks to all the people involved. First, Erik Andersen, who has
-taken over the torch in maintaining `cdrom.c` and integrating much
-CD-ROM-related code in the 2.1-kernel. Thanks to Scott Snyder and
-Gerd Knorr, who were the first to implement this interface for SCSI
-and IDE-CD drivers and added many ideas for extension of the data
-structures relative to kernel 2.0. Further thanks to Heiko Eißfeldt,
-Thomas Quinot, Jon Tombs, Ken Pizzini, Eberhard Mönkeberg and Andrew Kroll,
-the Linux CD-ROM device driver developers who were kind
-enough to give suggestions and criticisms during the writing. Finally
-of course, I want to thank Linus Torvalds for making this possible in
-the first place.
diff --git a/Documentation/cdrom/ide-cd b/Documentation/cdrom/ide-cd
deleted file mode 100644
index a5f2a7f1ff46..000000000000
--- a/Documentation/cdrom/ide-cd
+++ /dev/null
@@ -1,534 +0,0 @@
-IDE-CD driver documentation
-Originally by scott snyder  <snyder@fnald0.fnal.gov> (19 May 1996)
-Carrying on the torch is: Erik Andersen <andersee@debian.org>
-New maintainers (19 Oct 1998): Jens Axboe <axboe@image.dk>
-
-1. Introduction
----------------
-
-The ide-cd driver should work with all ATAPI ver 1.2 to ATAPI 2.6 compliant 
-CDROM drives which attach to an IDE interface.  Note that some CDROM vendors
-(including Mitsumi, Sony, Creative, Aztech, and Goldstar) have made
-both ATAPI-compliant drives and drives which use a proprietary
-interface.  If your drive uses one of those proprietary interfaces,
-this driver will not work with it (but one of the other CDROM drivers
-probably will).  This driver will not work with `ATAPI' drives which
-attach to the parallel port.  In addition, there is at least one drive
-(CyCDROM CR520ie) which attaches to the IDE port but is not ATAPI;
-this driver will not work with drives like that either (but see the
-aztcd driver).
-
-This driver provides the following features:
-
- - Reading from data tracks, and mounting ISO 9660 filesystems.
-
- - Playing audio tracks.  Most of the CDROM player programs floating
-   around should work; I usually use Workman.
-
- - Multisession support.
-
- - On drives which support it, reading digital audio data directly
-   from audio tracks.  The program cdda2wav can be used for this.
-   Note, however, that only some drives actually support this.
-
- - There is now support for CDROM changers which comply with the 
-   ATAPI 2.6 draft standard (such as the NEC CDR-251).  This additional
-   functionality includes a function call to query which slot is the
-   currently selected slot, a function call to query which slots contain
-   CDs, etc. A sample program which demonstrates this functionality is
-   appended to the end of this file.  The Sanyo 3-disc changer
-   (which does not conform to the standard) is also now supported.
-   Please note the driver refers to the first CD as slot # 0.
-
-
-2. Installation
----------------
-
-0. The ide-cd relies on the ide disk driver.  See
-   Documentation/ide/ide.txt for up-to-date information on the ide
-   driver.
-
-1. Make sure that the ide and ide-cd drivers are compiled into the
-   kernel you're using.  When configuring the kernel, in the section 
-   entitled "Floppy, IDE, and other block devices", say either `Y' 
-   (which will compile the support directly into the kernel) or `M'
-   (to compile support as a module which can be loaded and unloaded)
-   to the options: 
-
-      ATA/ATAPI/MFM/RLL support
-      Include IDE/ATAPI CDROM support
-
-   Depending on what type of IDE interface you have, you may need to
-   specify additional configuration options.  See
-   Documentation/ide/ide.txt.
-
-2. You should also ensure that the iso9660 filesystem is either
-   compiled into the kernel or available as a loadable module.  You
-   can see if a filesystem is known to the kernel by catting
-   /proc/filesystems.
-
-3. The CDROM drive should be connected to the host on an IDE
-   interface.  Each interface on a system is defined by an I/O port
-   address and an IRQ number, the standard assignments being
-   0x1f0 and 14 for the primary interface and 0x170 and 15 for the
-   secondary interface.  Each interface can control up to two devices,
-   where each device can be a hard drive, a CDROM drive, a floppy drive, 
-   or a tape drive.  The two devices on an interface are called `master'
-   and `slave'; this is usually selectable via a jumper on the drive.
-
-   Linux names these devices as follows.  The master and slave devices
-   on the primary IDE interface are called `hda' and `hdb',
-   respectively.  The drives on the secondary interface are called
-   `hdc' and `hdd'.  (Interfaces at other locations get other letters
-   in the third position; see Documentation/ide/ide.txt.)
-
-   If you want your CDROM drive to be found automatically by the
-   driver, you should make sure your IDE interface uses either the
-   primary or secondary addresses mentioned above.  In addition, if
-   the CDROM drive is the only device on the IDE interface, it should
-   be jumpered as `master'.  (If for some reason you cannot configure
-   your system in this manner, you can probably still use the driver.
-   You may have to pass extra configuration information to the kernel
-   when you boot, however.  See Documentation/ide/ide.txt for more
-   information.)
-
-4. Boot the system.  If the drive is recognized, you should see a
-   message which looks like
-
-     hdb: NEC CD-ROM DRIVE:260, ATAPI CDROM drive
-
-   If you do not see this, see section 4 below.
-
-5. You may want to create a symbolic link /dev/cdrom pointing to the
-   actual device.  You can do this with the command
-
-     ln -s  /dev/hdX  /dev/cdrom
-
-   where X should be replaced by the letter indicating where your
-   drive is installed.
-
-6. You should be able to see any error messages from the driver with
-   the `dmesg' command.
-
-
-3. Basic usage
---------------
-
-An ISO 9660 CDROM can be mounted by putting the disc in the drive and 
-typing (as root)
-
-  mount -t iso9660 /dev/cdrom /mnt/cdrom
-
-where it is assumed that /dev/cdrom is a link pointing to the actual
-device (as described in step 5 of the last section) and /mnt/cdrom is
-an empty directory.  You should now be able to see the contents of the
-CDROM under the /mnt/cdrom directory.  If you want to eject the CDROM,
-you must first dismount it with a command like
-
-  umount /mnt/cdrom
-
-Note that audio CDs cannot be mounted.
-
-Some distributions set up /etc/fstab to always try to mount a CDROM
-filesystem on bootup.  It is not required to mount the CDROM in this
-manner, though, and it may be a nuisance if you change CDROMs often.
-You should feel free to remove the cdrom line from /etc/fstab and
-mount CDROMs manually if that suits you better.
-
-Multisession and photocd discs should work with no special handling.
-The hpcdtoppm package (ftp.gwdg.de:/pub/linux/hpcdtoppm/) may be
-useful for reading photocds.
-
-To play an audio CD, you should first unmount and remove any data
-CDROM.  Any of the CDROM player programs should then work (workman,
-workbone, cdplayer, etc.).
-
-On a few drives, you can read digital audio directly using a program
-such as cdda2wav.  The only types of drive which I've heard support
-this are Sony and Toshiba drives.  You will get errors if you try to
-use this function on a drive which does not support it.
-
-For supported changers, you can use the `cdchange' program (appended to
-the end of this file) to switch between changer slots.  Note that the
-drive should be unmounted before attempting this.  The program takes
-two arguments:  the CDROM device, and the slot number to which you wish
-to change.  If the slot number is -1, the drive is unloaded.
-
-
-4. Common problems
-------------------
-
-This section discusses some common problems encountered when trying to
-use the driver, and some possible solutions.  Note that if you are
-experiencing problems, you should probably also review
-Documentation/ide/ide.txt for current information about the underlying
-IDE support code.  Some of these items apply only to earlier versions
-of the driver, but are mentioned here for completeness.
-
-In most cases, you should probably check with `dmesg' for any errors
-from the driver.
-
-a. Drive is not detected during booting.
-
-   - Review the configuration instructions above and in
-     Documentation/ide/ide.txt, and check how your hardware is
-     configured.
-
-   - If your drive is the only device on an IDE interface, it should
-     be jumpered as master, if at all possible.
-
-   - If your IDE interface is not at the standard addresses of 0x170
-     or 0x1f0, you'll need to explicitly inform the driver using a
-     lilo option.  See Documentation/ide/ide.txt.  (This feature was
-     added around kernel version 1.3.30.)
-
-   - If the autoprobing is not finding your drive, you can tell the
-     driver to assume that one exists by using a lilo option of the
-     form `hdX=cdrom', where X is the drive letter corresponding to
-     where your drive is installed.  Note that if you do this and you 
-     see a boot message like
-
-       hdX: ATAPI cdrom (?)
-
-     this does _not_ mean that the driver has successfully detected
-     the drive; rather, it means that the driver has not detected a
-     drive, but is assuming there's one there anyway because you told
-     it so.  If you actually try to do I/O to a drive defined at a
-     nonexistent or nonresponding I/O address, you'll probably get
-     errors with a status value of 0xff.
-
-   - Some IDE adapters require a nonstandard initialization sequence
-     before they'll function properly.  (If this is the case, there
-     will often be a separate MS-DOS driver just for the controller.)
-     IDE interfaces on sound cards often fall into this category.
-
-     Support for some interfaces needing extra initialization is
-     provided in later 1.3.x kernels.  You may need to turn on
-     additional kernel configuration options to get them to work;
-     see Documentation/ide/ide.txt.
-
-     Even if support is not available for your interface, you may be
-     able to get it to work with the following procedure.  First boot
-     MS-DOS and load the appropriate drivers.  Then warm-boot linux
-     (i.e., without powering off).  If this works, it can be automated
-     by running loadlin from the MS-DOS autoexec.
-
-
-b. Timeout/IRQ errors.
-
-  - If you always get timeout errors, interrupts from the drive are
-    probably not making it to the host.
-
-  - IRQ problems may also be indicated by the message
-    `IRQ probe failed (<n>)' while booting.  If <n> is zero, that
-    means that the system did not see an interrupt from the drive when
-    it was expecting one (on any feasible IRQ).  If <n> is negative,
-    that means the system saw interrupts on multiple IRQ lines, when
-    it was expecting to receive just one from the CDROM drive.
-
-  - Double-check your hardware configuration to make sure that the IRQ
-    number of your IDE interface matches what the driver expects.
-    (The usual assignments are 14 for the primary (0x1f0) interface
-    and 15 for the secondary (0x170) interface.)  Also be sure that
-    you don't have some other hardware which might be conflicting with
-    the IRQ you're using.  Also check the BIOS setup for your system;
-    some have the ability to disable individual IRQ levels, and I've
-    had one report of a system which was shipped with IRQ 15 disabled
-    by default.
-
-  - Note that many MS-DOS CDROM drivers will still function even if
-    there are hardware problems with the interrupt setup; they
-    apparently don't use interrupts.
-
-  - If you own a Pioneer DR-A24X, you _will_ get nasty error messages 
-    on boot such as "irq timeout: status=0x50 { DriveReady SeekComplete }"
-    The Pioneer DR-A24X CDROM drives are fairly popular these days.
-    Unfortunately, these drives seem to become very confused when we perform
-    the standard Linux ATA disk drive probe. If you own one of these drives,
-    you can bypass the ATA probing which confuses these CDROM drives, by 
-    adding `append="hdX=noprobe hdX=cdrom"' to your lilo.conf file and running 
-    lilo (again where X is the drive letter corresponding to where your drive 
-    is installed.)
-    
-c. System hangups.
-
-  - If the system locks up when you try to access the CDROM, the most
-    likely cause is that you have a buggy IDE adapter which doesn't
-    properly handle simultaneous transactions on multiple interfaces.
-    The most notorious of these is the CMD640B chip.  This problem can
-    be worked around by specifying the `serialize' option when
-    booting.  Recent kernels should be able to detect the need for
-    this automatically in most cases, but the detection is not
-    foolproof.  See Documentation/ide/ide.txt for more information
-    about the `serialize' option and the CMD640B.
-
-  - Note that many MS-DOS CDROM drivers will work with such buggy
-    hardware, apparently because they never attempt to overlap CDROM
-    operations with other disk activity.
-
-
-d. Can't mount a CDROM.
-
-  - If you get errors from mount, it may help to check `dmesg' to see
-    if there are any more specific errors from the driver or from the
-    filesystem.
-
-  - Make sure there's a CDROM loaded in the drive, and that's it's an
-    ISO 9660 disc.  You can't mount an audio CD.
-
-  - With the CDROM in the drive and unmounted, try something like
-
-      cat /dev/cdrom | od | more
-
-    If you see a dump, then the drive and driver are probably working
-    OK, and the problem is at the filesystem level (i.e., the CDROM is
-    not ISO 9660 or has errors in the filesystem structure).
-
-  - If you see `not a block device' errors, check that the definitions
-    of the device special files are correct.  They should be as
-    follows:
-
-      brw-rw----   1 root     disk       3,   0 Nov 11 18:48 /dev/hda
-      brw-rw----   1 root     disk       3,  64 Nov 11 18:48 /dev/hdb
-      brw-rw----   1 root     disk      22,   0 Nov 11 18:48 /dev/hdc
-      brw-rw----   1 root     disk      22,  64 Nov 11 18:48 /dev/hdd
-
-    Some early Slackware releases had these defined incorrectly.  If
-    these are wrong, you can remake them by running the script
-    scripts/MAKEDEV.ide.  (You may have to make it executable
-    with chmod first.)
-
-    If you have a /dev/cdrom symbolic link, check that it is pointing
-    to the correct device file.
-
-    If you hear people talking of the devices `hd1a' and `hd1b', these
-    were old names for what are now called hdc and hdd.  Those names
-    should be considered obsolete.
-
-  - If mount is complaining that the iso9660 filesystem is not
-    available, but you know it is (check /proc/filesystems), you
-    probably need a newer version of mount.  Early versions would not
-    always give meaningful error messages.
-
-
-e. Directory listings are unpredictably truncated, and `dmesg' shows
-   `buffer botch' error messages from the driver.
-
-  - There was a bug in the version of the driver in 1.2.x kernels
-    which could cause this.  It was fixed in 1.3.0.  If you can't
-    upgrade, you can probably work around the problem by specifying a
-    blocksize of 2048 when mounting.  (Note that you won't be able to
-    directly execute binaries off the CDROM in that case.)
-
-    If you see this in kernels later than 1.3.0, please report it as a
-    bug.
-
-
-f. Data corruption.
-
-  - Random data corruption was occasionally observed with the Hitachi
-    CDR-7730 CDROM. If you experience data corruption, using "hdx=slow"
-    as a command line parameter may work around the problem, at the
-    expense of low system performance.
-
-
-5. cdchange.c
--------------
-
-/*
- * cdchange.c  [-v]  <device>  [<slot>]
- *
- * This loads a CDROM from a specified slot in a changer, and displays 
- * information about the changer status.  The drive should be unmounted before 
- * using this program.
- *
- * Changer information is displayed if either the -v flag is specified
- * or no slot was specified.
- *
- * Based on code originally from Gerhard Zuber <zuber@berlin.snafu.de>.
- * Changer status information, and rewrite for the new Uniform CDROM driver
- * interface by Erik Andersen <andersee@debian.org>.
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <errno.h>
-#include <string.h>
-#include <unistd.h>
-#include <fcntl.h>
-#include <sys/ioctl.h>
-#include <linux/cdrom.h>
-
-
-int
-main (int argc, char **argv)
-{
-	char *program;
-	char *device;
-	int fd;           /* file descriptor for CD-ROM device */
-	int status;       /* return status for system calls */
-	int verbose = 0;
-	int slot=-1, x_slot;
-	int total_slots_available;
-
-	program = argv[0];
-
-	++argv;
-	--argc;
-
-	if (argc < 1 || argc > 3) {
-		fprintf (stderr, "usage: %s [-v] <device> [<slot>]\n",
-			 program);
-		fprintf (stderr, "       Slots are numbered 1 -- n.\n");
-		exit (1);
-	}
- 
-       if (strcmp (argv[0], "-v") == 0) {
-                verbose = 1;
-                ++argv;
-                --argc;
-        }
- 
-	device = argv[0];
- 
-	if (argc == 2)
-		slot = atoi (argv[1]) - 1;
-
-	/* open device */ 
-	fd = open(device, O_RDONLY | O_NONBLOCK);
-	if (fd < 0) {
-		fprintf (stderr, "%s: open failed for `%s': %s\n",
-			 program, device, strerror (errno));
-		exit (1);
-	}
-
-	/* Check CD player status */ 
-	total_slots_available = ioctl (fd, CDROM_CHANGER_NSLOTS);
-	if (total_slots_available <= 1 ) {
-		fprintf (stderr, "%s: Device `%s' is not an ATAPI "
-			"compliant CD changer.\n", program, device);
-		exit (1);
-	}
-
-	if (slot >= 0) {
-		if (slot >= total_slots_available) {
-			fprintf (stderr, "Bad slot number.  "
-				 "Should be 1 -- %d.\n",
-				 total_slots_available);
-			exit (1);
-		}
-
-		/* load */ 
-		slot=ioctl (fd, CDROM_SELECT_DISC, slot);
-		if (slot<0) {
-			fflush(stdout);
-				perror ("CDROM_SELECT_DISC ");
-			exit(1);
-		}
-	}
-
-	if (slot < 0 || verbose) {
-
-		status=ioctl (fd, CDROM_SELECT_DISC, CDSL_CURRENT);
-		if (status<0) {
-			fflush(stdout);
-			perror (" CDROM_SELECT_DISC");
-			exit(1);
-		}
-		slot=status;
-
-		printf ("Current slot: %d\n", slot+1);
-		printf ("Total slots available: %d\n",
-			total_slots_available);
-
-		printf ("Drive status: ");
-                status = ioctl (fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
-                if (status<0) {
-                  perror(" CDROM_DRIVE_STATUS");
-                } else switch(status) {
-		case CDS_DISC_OK:
-			printf ("Ready.\n");
-			break;
-		case CDS_TRAY_OPEN:
-			printf ("Tray Open.\n");
-			break;
-		case CDS_DRIVE_NOT_READY:
-			printf ("Drive Not Ready.\n");
-			break;
-		default:
-			printf ("This Should not happen!\n");
-			break;
-		}
-
-		for (x_slot=0; x_slot<total_slots_available; x_slot++) {
-			printf ("Slot %2d: ", x_slot+1);
-             		status = ioctl (fd, CDROM_DRIVE_STATUS, x_slot);
-             		if (status<0) {
-             		     perror(" CDROM_DRIVE_STATUS");
-             		} else switch(status) {
-			case CDS_DISC_OK:
-				printf ("Disc present.");
-				break;
-			case CDS_NO_DISC: 
-				printf ("Empty slot.");
-				break;
-			case CDS_TRAY_OPEN:
-				printf ("CD-ROM tray open.\n");
-				break;
-			case CDS_DRIVE_NOT_READY:
-				printf ("CD-ROM drive not ready.\n");
-				break;
-			case CDS_NO_INFO:
-				printf ("No Information available.");
-				break;
-			default:
-				printf ("This Should not happen!\n");
-				break;
-			}
-		  if (slot == x_slot) {
-                  status = ioctl (fd, CDROM_DISC_STATUS);
-                  if (status<0) {
-			perror(" CDROM_DISC_STATUS");
-                  }
-		  switch (status) {
-			case CDS_AUDIO:
-				printf ("\tAudio disc.\t");
-				break;
-			case CDS_DATA_1:
-			case CDS_DATA_2:
-				printf ("\tData disc type %d.\t", status-CDS_DATA_1+1);
-				break;
-			case CDS_XA_2_1:
-			case CDS_XA_2_2:
-				printf ("\tXA data disc type %d.\t", status-CDS_XA_2_1+1);
-				break;
-			default:
-				printf ("\tUnknown disc type 0x%x!\t", status);
-				break;
-			}
-			}
-                  	status = ioctl (fd, CDROM_MEDIA_CHANGED, x_slot);
-                  	if (status<0) {
-				perror(" CDROM_MEDIA_CHANGED");
-                  	}
-		  	switch (status) {
-			case 1:
-				printf ("Changed.\n");
-				break;
-			default:
-				printf ("\n");
-				break;
-			}
-		}
-	}
-
-	/* close device */
-	status = close (fd);
-	if (status != 0) {
-		fprintf (stderr, "%s: close failed for `%s': %s\n",
-			 program, device, strerror (errno));
-		exit (1);
-	}
- 
-	exit (0);
-}
diff --git a/Documentation/cdrom/ide-cd.rst b/Documentation/cdrom/ide-cd.rst
new file mode 100644
index 000000000000..dadc94ef6b6c
--- /dev/null
+++ b/Documentation/cdrom/ide-cd.rst
@@ -0,0 +1,538 @@
+IDE-CD driver documentation
+===========================
+
+:Originally by: scott snyder  <snyder@fnald0.fnal.gov> (19 May 1996)
+:Carrying on the torch is: Erik Andersen <andersee@debian.org>
+:New maintainers (19 Oct 1998): Jens Axboe <axboe@image.dk>
+
+1. Introduction
+---------------
+
+The ide-cd driver should work with all ATAPI ver 1.2 to ATAPI 2.6 compliant
+CDROM drives which attach to an IDE interface.  Note that some CDROM vendors
+(including Mitsumi, Sony, Creative, Aztech, and Goldstar) have made
+both ATAPI-compliant drives and drives which use a proprietary
+interface.  If your drive uses one of those proprietary interfaces,
+this driver will not work with it (but one of the other CDROM drivers
+probably will).  This driver will not work with `ATAPI` drives which
+attach to the parallel port.  In addition, there is at least one drive
+(CyCDROM CR520ie) which attaches to the IDE port but is not ATAPI;
+this driver will not work with drives like that either (but see the
+aztcd driver).
+
+This driver provides the following features:
+
+ - Reading from data tracks, and mounting ISO 9660 filesystems.
+
+ - Playing audio tracks.  Most of the CDROM player programs floating
+   around should work; I usually use Workman.
+
+ - Multisession support.
+
+ - On drives which support it, reading digital audio data directly
+   from audio tracks.  The program cdda2wav can be used for this.
+   Note, however, that only some drives actually support this.
+
+ - There is now support for CDROM changers which comply with the
+   ATAPI 2.6 draft standard (such as the NEC CDR-251).  This additional
+   functionality includes a function call to query which slot is the
+   currently selected slot, a function call to query which slots contain
+   CDs, etc. A sample program which demonstrates this functionality is
+   appended to the end of this file.  The Sanyo 3-disc changer
+   (which does not conform to the standard) is also now supported.
+   Please note the driver refers to the first CD as slot # 0.
+
+
+2. Installation
+---------------
+
+0. The ide-cd driver relies on the ide disk driver.  See
+   Documentation/ide/ide.txt for up-to-date information on the ide
+   driver.
+
+1. Make sure that the ide and ide-cd drivers are compiled into the
+   kernel you're using.  When configuring the kernel, in the section
+   entitled "Floppy, IDE, and other block devices", say either `Y`
+   (which will compile the support directly into the kernel) or `M`
+   (to compile support as a module which can be loaded and unloaded)
+   to the options::
+
+      ATA/ATAPI/MFM/RLL support
+      Include IDE/ATAPI CDROM support
+
+   Depending on what type of IDE interface you have, you may need to
+   specify additional configuration options.  See
+   Documentation/ide/ide.txt.
+
+2. You should also ensure that the iso9660 filesystem is either
+   compiled into the kernel or available as a loadable module.  You
+   can see if a filesystem is known to the kernel by catting
+   /proc/filesystems.
+
+3. The CDROM drive should be connected to the host on an IDE
+   interface.  Each interface on a system is defined by an I/O port
+   address and an IRQ number, the standard assignments being
+   0x1f0 and 14 for the primary interface and 0x170 and 15 for the
+   secondary interface.  Each interface can control up to two devices,
+   where each device can be a hard drive, a CDROM drive, a floppy drive,
+   or a tape drive.  The two devices on an interface are called `master`
+   and `slave`; this is usually selectable via a jumper on the drive.
+
+   Linux names these devices as follows.  The master and slave devices
+   on the primary IDE interface are called `hda` and `hdb`,
+   respectively.  The drives on the secondary interface are called
+   `hdc` and `hdd`.  (Interfaces at other locations get other letters
+   in the third position; see Documentation/ide/ide.txt.)
+
+   If you want your CDROM drive to be found automatically by the
+   driver, you should make sure your IDE interface uses either the
+   primary or secondary addresses mentioned above.  In addition, if
+   the CDROM drive is the only device on the IDE interface, it should
+   be jumpered as `master`.  (If for some reason you cannot configure
+   your system in this manner, you can probably still use the driver.
+   You may have to pass extra configuration information to the kernel
+   when you boot, however.  See Documentation/ide/ide.txt for more
+   information.)
+
+4. Boot the system.  If the drive is recognized, you should see a
+   message which looks like::
+
+     hdb: NEC CD-ROM DRIVE:260, ATAPI CDROM drive
+
+   If you do not see this, see section 4 (Common problems) below.
+
+5. You may want to create a symbolic link /dev/cdrom pointing to the
+   actual device.  You can do this with the command::
+
+     ln -s  /dev/hdX  /dev/cdrom
+
+   where X should be replaced by the letter indicating where your
+   drive is installed.
+
+6. You should be able to see any error messages from the driver with
+   the `dmesg` command.
+
+
+3. Basic usage
+--------------
+
+An ISO 9660 CDROM can be mounted by putting the disc in the drive and
+typing (as root)::
+
+  mount -t iso9660 /dev/cdrom /mnt/cdrom
+
+where it is assumed that /dev/cdrom is a link pointing to the actual
+device (as described in step 5 of the last section) and /mnt/cdrom is
+an empty directory.  You should now be able to see the contents of the
+CDROM under the /mnt/cdrom directory.  If you want to eject the CDROM,
+you must first dismount it with a command like::
+
+  umount /mnt/cdrom
+
+Note that audio CDs cannot be mounted.
+
+Some distributions set up /etc/fstab to always try to mount a CDROM
+filesystem on bootup.  It is not required to mount the CDROM in this
+manner, though, and it may be a nuisance if you change CDROMs often.
+You should feel free to remove the cdrom line from /etc/fstab and
+mount CDROMs manually if that suits you better.
+
+Multisession and photocd discs should work with no special handling.
+The hpcdtoppm package (ftp.gwdg.de:/pub/linux/hpcdtoppm/) may be
+useful for reading photocds.
+
+To play an audio CD, you should first unmount and remove any data
+CDROM.  Any of the CDROM player programs should then work (workman,
+workbone, cdplayer, etc.).
+
+On a few drives, you can read digital audio directly using a program
+such as cdda2wav.  The only types of drive which I've heard support
+this are Sony and Toshiba drives.  You will get errors if you try to
+use this function on a drive which does not support it.
+
+For supported changers, you can use the `cdchange` program (appended to
+the end of this file) to switch between changer slots.  Note that the
+drive should be unmounted before attempting this.  The program takes
+two arguments:  the CDROM device, and the slot number to which you wish
+to change.  If the slot number is -1, the drive is unloaded.
+
+
+4. Common problems
+------------------
+
+This section discusses some common problems encountered when trying to
+use the driver, and some possible solutions.  Note that if you are
+experiencing problems, you should probably also review
+Documentation/ide/ide.txt for current information about the underlying
+IDE support code.  Some of these items apply only to earlier versions
+of the driver, but are mentioned here for completeness.
+
+In most cases, you should probably check with `dmesg` for any errors
+from the driver.
+
+a. Drive is not detected during booting.
+
+   - Review the configuration instructions above and in
+     Documentation/ide/ide.txt, and check how your hardware is
+     configured.
+
+   - If your drive is the only device on an IDE interface, it should
+     be jumpered as master, if at all possible.
+
+   - If your IDE interface is not at the standard addresses of 0x170
+     or 0x1f0, you'll need to explicitly inform the driver using a
+     lilo option.  See Documentation/ide/ide.txt.  (This feature was
+     added around kernel version 1.3.30.)
+
+   - If the autoprobing is not finding your drive, you can tell the
+     driver to assume that one exists by using a lilo option of the
+     form `hdX=cdrom`, where X is the drive letter corresponding to
+     where your drive is installed.  Note that if you do this and you
+     see a boot message like::
+
+       hdX: ATAPI cdrom (?)
+
+     this does _not_ mean that the driver has successfully detected
+     the drive; rather, it means that the driver has not detected a
+     drive, but is assuming there's one there anyway because you told
+     it so.  If you actually try to do I/O to a drive defined at a
+     nonexistent or nonresponding I/O address, you'll probably get
+     errors with a status value of 0xff.
+
+   - Some IDE adapters require a nonstandard initialization sequence
+     before they'll function properly.  (If this is the case, there
+     will often be a separate MS-DOS driver just for the controller.)
+     IDE interfaces on sound cards often fall into this category.
+
+     Support for some interfaces needing extra initialization is
+     provided in later 1.3.x kernels.  You may need to turn on
+     additional kernel configuration options to get them to work;
+     see Documentation/ide/ide.txt.
+
+     Even if support is not available for your interface, you may be
+     able to get it to work with the following procedure.  First boot
+     MS-DOS and load the appropriate drivers.  Then warm-boot linux
+     (i.e., without powering off).  If this works, it can be automated
+     by running loadlin from the MS-DOS autoexec.
+
+
+b. Timeout/IRQ errors.
+
+  - If you always get timeout errors, interrupts from the drive are
+    probably not making it to the host.
+
+  - IRQ problems may also be indicated by the message
+    `IRQ probe failed (<n>)` while booting.  If <n> is zero, that
+    means that the system did not see an interrupt from the drive when
+    it was expecting one (on any feasible IRQ).  If <n> is negative,
+    that means the system saw interrupts on multiple IRQ lines, when
+    it was expecting to receive just one from the CDROM drive.
+
+  - Double-check your hardware configuration to make sure that the IRQ
+    number of your IDE interface matches what the driver expects.
+    (The usual assignments are 14 for the primary (0x1f0) interface
+    and 15 for the secondary (0x170) interface.)  Also be sure that
+    you don't have some other hardware which might be conflicting with
+    the IRQ you're using.  Also check the BIOS setup for your system;
+    some have the ability to disable individual IRQ levels, and I've
+    had one report of a system which was shipped with IRQ 15 disabled
+    by default.
+
+  - Note that many MS-DOS CDROM drivers will still function even if
+    there are hardware problems with the interrupt setup; they
+    apparently don't use interrupts.
+
+  - If you own a Pioneer DR-A24X, you _will_ get nasty error messages
+    on boot such as "irq timeout: status=0x50 { DriveReady SeekComplete }".
+    The Pioneer DR-A24X CDROM drives are fairly popular these days.
+    Unfortunately, these drives seem to become very confused when we perform
+    the standard Linux ATA disk drive probe. If you own one of these drives,
+    you can bypass the ATA probing which confuses these CDROM drives, by
+    adding `append="hdX=noprobe hdX=cdrom"` to your lilo.conf file and running
+    lilo (again where X is the drive letter corresponding to where your drive
+    is installed.)
+
+c. System hangups.
+
+  - If the system locks up when you try to access the CDROM, the most
+    likely cause is that you have a buggy IDE adapter which doesn't
+    properly handle simultaneous transactions on multiple interfaces.
+    The most notorious of these is the CMD640B chip.  This problem can
+    be worked around by specifying the `serialize` option when
+    booting.  Recent kernels should be able to detect the need for
+    this automatically in most cases, but the detection is not
+    foolproof.  See Documentation/ide/ide.txt for more information
+    about the `serialize` option and the CMD640B.
+
+  - Note that many MS-DOS CDROM drivers will work with such buggy
+    hardware, apparently because they never attempt to overlap CDROM
+    operations with other disk activity.
+
+
+d. Can't mount a CDROM.
+
+  - If you get errors from mount, it may help to check `dmesg` to see
+    if there are any more specific errors from the driver or from the
+    filesystem.
+
+  - Make sure there's a CDROM loaded in the drive, and that it's an
+    ISO 9660 disc.  You can't mount an audio CD.
+
+  - With the CDROM in the drive and unmounted, try something like::
+
+      cat /dev/cdrom | od | more
+
+    If you see a dump, then the drive and driver are probably working
+    OK, and the problem is at the filesystem level (i.e., the CDROM is
+    not ISO 9660 or has errors in the filesystem structure).
+
+  - If you see `not a block device` errors, check that the definitions
+    of the device special files are correct.  They should be as
+    follows::
+
+      brw-rw----   1 root     disk       3,   0 Nov 11 18:48 /dev/hda
+      brw-rw----   1 root     disk       3,  64 Nov 11 18:48 /dev/hdb
+      brw-rw----   1 root     disk      22,   0 Nov 11 18:48 /dev/hdc
+      brw-rw----   1 root     disk      22,  64 Nov 11 18:48 /dev/hdd
+
+    Some early Slackware releases had these defined incorrectly.  If
+    these are wrong, you can remake them by running the script
+    scripts/MAKEDEV.ide.  (You may have to make it executable
+    with chmod first.)
+
+    If you have a /dev/cdrom symbolic link, check that it is pointing
+    to the correct device file.
+
+    If you hear people talking of the devices `hd1a` and `hd1b`, these
+    were old names for what are now called hdc and hdd.  Those names
+    should be considered obsolete.
+
+  - If mount is complaining that the iso9660 filesystem is not
+    available, but you know it is (check /proc/filesystems), you
+    probably need a newer version of mount.  Early versions would not
+    always give meaningful error messages.
+
+
+e. Directory listings are unpredictably truncated, and `dmesg` shows
+   `buffer botch` error messages from the driver.
+
+  - There was a bug in the version of the driver in 1.2.x kernels
+    which could cause this.  It was fixed in 1.3.0.  If you can't
+    upgrade, you can probably work around the problem by specifying a
+    blocksize of 2048 when mounting.  (Note that you won't be able to
+    directly execute binaries off the CDROM in that case.)
+
+    If you see this in kernels later than 1.3.0, please report it as a
+    bug.
+
+
+f. Data corruption.
+
+  - Random data corruption was occasionally observed with the Hitachi
+    CDR-7730 CDROM. If you experience data corruption, using "hdx=slow"
+    as a command line parameter may work around the problem, at the
+    expense of reduced system performance.
+
+
+5. cdchange.c
+-------------
+
+::
+
+  /*
+   * cdchange.c  [-v]  <device>  [<slot>]
+   *
+   * This loads a CDROM from a specified slot in a changer, and displays
+   * information about the changer status.  The drive should be unmounted before
+   * using this program.
+   *
+   * Changer information is displayed if either the -v flag is specified
+   * or no slot was specified.
+   *
+   * Based on code originally from Gerhard Zuber <zuber@berlin.snafu.de>.
+   * Changer status information, and rewrite for the new Uniform CDROM driver
+   * interface by Erik Andersen <andersee@debian.org>.
+   */
+
+  #include <stdio.h>
+  #include <stdlib.h>
+  #include <errno.h>
+  #include <string.h>
+  #include <unistd.h>
+  #include <fcntl.h>
+  #include <sys/ioctl.h>
+  #include <linux/cdrom.h>
+
+
+  int
+  main (int argc, char **argv)
+  {
+	char *program;
+	char *device;
+	int fd;           /* file descriptor for CD-ROM device */
+	int status;       /* return status for system calls */
+	int verbose = 0;
+	int slot=-1, x_slot;
+	int total_slots_available;
+
+	program = argv[0];
+
+	++argv;
+	--argc;
+
+	/* consume the optional -v flag before validating the argument count */
+	if (argc > 0 && strcmp (argv[0], "-v") == 0) {
+		verbose = 1;
+		++argv;
+		--argc;
+	}
+	if (argc < 1 || argc > 2) {
+		fprintf (stderr, "usage: %s [-v] <device> [<slot>]\n",
+			 program);
+		fprintf (stderr, "       Slots are numbered 1 -- n.\n");
+		exit (1);
+	}
+
+	device = argv[0];
+
+	if (argc == 2)
+		slot = atoi (argv[1]) - 1;
+
+	/* open device */
+	fd = open(device, O_RDONLY | O_NONBLOCK);
+	if (fd < 0) {
+		fprintf (stderr, "%s: open failed for `%s`: %s\n",
+			 program, device, strerror (errno));
+		exit (1);
+	}
+
+	/* Check CD player status */
+	total_slots_available = ioctl (fd, CDROM_CHANGER_NSLOTS);
+	if (total_slots_available <= 1 ) {
+		fprintf (stderr, "%s: Device `%s` is not an ATAPI "
+			"compliant CD changer.\n", program, device);
+		exit (1);
+	}
+
+	if (slot >= 0) {
+		if (slot >= total_slots_available) {
+			fprintf (stderr, "Bad slot number.  "
+				 "Should be 1 -- %d.\n",
+				 total_slots_available);
+			exit (1);
+		}
+
+		/* load */
+		slot=ioctl (fd, CDROM_SELECT_DISC, slot);
+		if (slot<0) {
+			fflush(stdout);
+			perror ("CDROM_SELECT_DISC ");
+			exit(1);
+		}
+	}
+
+	if (slot < 0 || verbose) {
+
+		status=ioctl (fd, CDROM_SELECT_DISC, CDSL_CURRENT);
+		if (status<0) {
+			fflush(stdout);
+			perror (" CDROM_SELECT_DISC");
+			exit(1);
+		}
+		slot=status;
+
+		printf ("Current slot: %d\n", slot+1);
+		printf ("Total slots available: %d\n",
+			total_slots_available);
+
+		printf ("Drive status: ");
+                status = ioctl (fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
+                if (status<0) {
+                  perror(" CDROM_DRIVE_STATUS");
+                } else switch(status) {
+		case CDS_DISC_OK:
+			printf ("Ready.\n");
+			break;
+		case CDS_TRAY_OPEN:
+			printf ("Tray Open.\n");
+			break;
+		case CDS_DRIVE_NOT_READY:
+			printf ("Drive Not Ready.\n");
+			break;
+		default:
+			printf ("This Should not happen!\n");
+			break;
+		}
+
+		for (x_slot=0; x_slot<total_slots_available; x_slot++) {
+			printf ("Slot %2d: ", x_slot+1);
+			status = ioctl (fd, CDROM_DRIVE_STATUS, x_slot);
+			if (status<0) {
+			     perror(" CDROM_DRIVE_STATUS");
+			} else switch(status) {
+			case CDS_DISC_OK:
+				printf ("Disc present.");
+				break;
+			case CDS_NO_DISC:
+				printf ("Empty slot.");
+				break;
+			case CDS_TRAY_OPEN:
+				printf ("CD-ROM tray open.\n");
+				break;
+			case CDS_DRIVE_NOT_READY:
+				printf ("CD-ROM drive not ready.\n");
+				break;
+			case CDS_NO_INFO:
+				printf ("No Information available.");
+				break;
+			default:
+				printf ("This Should not happen!\n");
+				break;
+			}
+			if (slot == x_slot) {
+				status = ioctl (fd, CDROM_DISC_STATUS);
+				if (status<0) {
+					perror(" CDROM_DISC_STATUS");
+				}
+				switch (status) {
+				case CDS_AUDIO:
+					printf ("\tAudio disc.\t");
+					break;
+				case CDS_DATA_1:
+				case CDS_DATA_2:
+					printf ("\tData disc type %d.\t", status-CDS_DATA_1+1);
+					break;
+				case CDS_XA_2_1:
+				case CDS_XA_2_2:
+					printf ("\tXA data disc type %d.\t", status-CDS_XA_2_1+1);
+					break;
+				default:
+					printf ("\tUnknown disc type 0x%x!\t", status);
+					break;
+				}
+			}
+			status = ioctl (fd, CDROM_MEDIA_CHANGED, x_slot);
+			if (status<0) {
+				perror(" CDROM_MEDIA_CHANGED");
+			}
+			switch (status) {
+			case 1:
+				printf ("Changed.\n");
+				break;
+			default:
+				printf ("\n");
+				break;
+			}
+		}
+	}
+
+	/* close device */
+	status = close (fd);
+	if (status != 0) {
+		fprintf (stderr, "%s: close failed for `%s`: %s\n",
+			 program, device, strerror (errno));
+		exit (1);
+	}
+
+	exit (0);
+  }
diff --git a/Documentation/cdrom/index.rst b/Documentation/cdrom/index.rst
new file mode 100644
index 000000000000..efbd5d111825
--- /dev/null
+++ b/Documentation/cdrom/index.rst
@@ -0,0 +1,19 @@
+:orphan:
+
+=====
+cdrom
+=====
+
+.. toctree::
+    :maxdepth: 1
+
+    cdrom-standard
+    ide-cd
+    packet-writing
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/cdrom/packet-writing.rst b/Documentation/cdrom/packet-writing.rst
new file mode 100644
index 000000000000..c5c957195a5a
--- /dev/null
+++ b/Documentation/cdrom/packet-writing.rst
@@ -0,0 +1,139 @@
+==============
+Packet writing
+==============
+
+Getting started quick
+---------------------
+
+- Select packet support in the block device section and UDF support in
+  the file system section.
+
+- Compile and install kernel and modules, reboot.
+
+- You need the udftools package (pktsetup, mkudffs, cdrwtool).
+  Download from http://sourceforge.net/projects/linux-udf/
+
+- Grab a new CD-RW disc and format it (assuming CD-RW is hdc, substitute
+  as appropriate)::
+
+	# cdrwtool -d /dev/hdc -q
+
+- Set up your writer::
+
+	# pktsetup dev_name /dev/hdc
+
+- Now you can mount /dev/pktcdvd/dev_name and copy files to it. Enjoy::
+
+	# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
+
+
+Packet writing for DVD-RW media
+-------------------------------
+
+DVD-RW discs can be written to much like CD-RW discs if they are in
+the so-called "restricted overwrite" mode. To put a disc in restricted
+overwrite mode, run::
+
+	# dvd+rw-format /dev/hdc
+
+You can then use the disc the same way you would use a CD-RW disc::
+
+	# pktsetup dev_name /dev/hdc
+	# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
+
+
+Packet writing for DVD+RW media
+-------------------------------
+
+According to the DVD+RW specification, a drive supporting DVD+RW discs
+shall implement "true random writes with 2KB granularity", which means
+that it should be possible to put any filesystem with a block size >=
+2KB on such a disc. For example, it should be possible to do::
+
+	# dvd+rw-format /dev/hdc   (only needed if the disc has never
+	                            been formatted)
+	# mkudffs /dev/hdc
+	# mount /dev/hdc /cdrom -t udf -o rw,noatime
+
+However, some drives don't follow the specification and expect the
+host to perform aligned writes at 32KB boundaries. Other drives do
+follow the specification, but suffer bad performance problems if the
+writes are not 32KB aligned.
+
+Both problems can be solved by using the pktcdvd driver, which always
+generates aligned writes::
+
+	# dvd+rw-format /dev/hdc
+	# pktsetup dev_name /dev/hdc
+	# mkudffs /dev/pktcdvd/dev_name
+	# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
+
+
+Packet writing for DVD-RAM media
+--------------------------------
+
+DVD-RAM discs are random writable, so using the pktcdvd driver is not
+necessary. However, using the pktcdvd driver can improve performance
+in the same way it does for DVD+RW media.
+
+
+Notes
+-----
+
+- CD-RW media can usually not be overwritten more than about 1000
+  times, so to avoid unnecessary wear on the media, you should always
+  use the noatime mount option.
+
+- Defect management (i.e. automatic remapping of bad sectors) has not
+  been implemented yet, so you are likely to get at least some
+  filesystem corruption if the disc wears out.
+
+- Since the pktcdvd driver makes the disc appear as a regular block
+  device with a 2KB block size, you can put any filesystem you like on
+  the disc. For example, run::
+
+	# /sbin/mke2fs /dev/pktcdvd/dev_name
+
+  to create an ext2 filesystem on the disc.
+
+
+Using the pktcdvd sysfs interface
+---------------------------------
+
+Since Linux 2.6.20, the pktcdvd module has a sysfs interface
+and can be controlled by it. For example, the "pktcdvd" tool uses
+this interface (see http://tom.ist-im-web.de/download/pktcdvd ).
+
+"pktcdvd" works similarly to "pktsetup", e.g.::
+
+	# pktcdvd -a dev_name /dev/hdc
+	# mkudffs /dev/pktcdvd/dev_name
+	# mount -t udf -o rw,noatime /dev/pktcdvd/dev_name /dvdram
+	# cp files /dvdram
+	# umount /dvdram
+	# pktcdvd -r dev_name
+
+
+For a description of the sysfs interface look into the file:
+
+  Documentation/ABI/testing/sysfs-class-pktcdvd
+
+
+Using the pktcdvd debugfs interface
+-----------------------------------
+
+To read pktcdvd device information in human-readable form, run::
+
+	# cat /sys/kernel/debug/pktcdvd/pktcdvd[0-7]/info
+
+For a description of the debugfs interface look into the file:
+
+  Documentation/ABI/testing/debugfs-pktcdvd
+
+
+
+Links
+-----
+
+See http://fy.chalmers.se/~appro/linux/DVD+RW/ for more information
+about DVD writing.
diff --git a/Documentation/cdrom/packet-writing.txt b/Documentation/cdrom/packet-writing.txt
deleted file mode 100644
index 2834170d821e..000000000000
--- a/Documentation/cdrom/packet-writing.txt
+++ /dev/null
@@ -1,132 +0,0 @@
-Getting started quick
----------------------
-
-- Select packet support in the block device section and UDF support in
-  the file system section.
-
-- Compile and install kernel and modules, reboot.
-
-- You need the udftools package (pktsetup, mkudffs, cdrwtool).
-  Download from http://sourceforge.net/projects/linux-udf/
-
-- Grab a new CD-RW disc and format it (assuming CD-RW is hdc, substitute
-  as appropriate):
-	# cdrwtool -d /dev/hdc -q
-
-- Setup your writer
-	# pktsetup dev_name /dev/hdc
-
-- Now you can mount /dev/pktcdvd/dev_name and copy files to it. Enjoy!
-	# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
-
-
-Packet writing for DVD-RW media
--------------------------------
-
-DVD-RW discs can be written to much like CD-RW discs if they are in
-the so called "restricted overwrite" mode. To put a disc in restricted
-overwrite mode, run:
-
-	# dvd+rw-format /dev/hdc
-
-You can then use the disc the same way you would use a CD-RW disc:
-
-	# pktsetup dev_name /dev/hdc
-	# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
-
-
-Packet writing for DVD+RW media
--------------------------------
-
-According to the DVD+RW specification, a drive supporting DVD+RW discs
-shall implement "true random writes with 2KB granularity", which means
-that it should be possible to put any filesystem with a block size >=
-2KB on such a disc. For example, it should be possible to do:
-
-	# dvd+rw-format /dev/hdc   (only needed if the disc has never
-	                            been formatted)
-	# mkudffs /dev/hdc
-	# mount /dev/hdc /cdrom -t udf -o rw,noatime
-
-However, some drives don't follow the specification and expect the
-host to perform aligned writes at 32KB boundaries. Other drives do
-follow the specification, but suffer bad performance problems if the
-writes are not 32KB aligned.
-
-Both problems can be solved by using the pktcdvd driver, which always
-generates aligned writes.
-
-	# dvd+rw-format /dev/hdc
-	# pktsetup dev_name /dev/hdc
-	# mkudffs /dev/pktcdvd/dev_name
-	# mount /dev/pktcdvd/dev_name /cdrom -t udf -o rw,noatime
-
-
-Packet writing for DVD-RAM media
---------------------------------
-
-DVD-RAM discs are random writable, so using the pktcdvd driver is not
-necessary. However, using the pktcdvd driver can improve performance
-in the same way it does for DVD+RW media.
-
-
-Notes
------
-
-- CD-RW media can usually not be overwritten more than about 1000
-  times, so to avoid unnecessary wear on the media, you should always
-  use the noatime mount option.
-
-- Defect management (ie automatic remapping of bad sectors) has not
-  been implemented yet, so you are likely to get at least some
-  filesystem corruption if the disc wears out.
-
-- Since the pktcdvd driver makes the disc appear as a regular block
-  device with a 2KB block size, you can put any filesystem you like on
-  the disc. For example, run:
-
-	# /sbin/mke2fs /dev/pktcdvd/dev_name
-
-  to create an ext2 filesystem on the disc.
-
-
-Using the pktcdvd sysfs interface
----------------------------------
-
-Since Linux 2.6.20, the pktcdvd module has a sysfs interface
-and can be controlled by it. For example the "pktcdvd" tool uses
-this interface. (see http://tom.ist-im-web.de/download/pktcdvd )
-
-"pktcdvd" works similar to "pktsetup", e.g.:
-
-	# pktcdvd -a dev_name /dev/hdc
-	# mkudffs /dev/pktcdvd/dev_name
-	# mount -t udf -o rw,noatime /dev/pktcdvd/dev_name /dvdram
-	# cp files /dvdram
-	# umount /dvdram
-	# pktcdvd -r dev_name
-
-
-For a description of the sysfs interface look into the file:
-
-  Documentation/ABI/testing/sysfs-class-pktcdvd
-
-
-Using the pktcdvd debugfs interface
------------------------------------
-
-To read pktcdvd device infos in human readable form, do:
-
-	# cat /sys/kernel/debug/pktcdvd/pktcdvd[0-7]/info
-
-For a description of the debugfs interface look into the file:
-
-  Documentation/ABI/testing/debugfs-pktcdvd
-
-
-
-Links
------
-
-See http://fy.chalmers.se/~appro/linux/DVD+RW/ for more information
-about DVD writing.
diff --git a/MAINTAINERS b/MAINTAINERS
index 92eb34679b26..c95c29735327 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7610,7 +7610,7 @@ IDE/ATAPI DRIVERS
 M:	Borislav Petkov <bp@alien8.de>
 L:	linux-ide@vger.kernel.org
 S:	Maintained
-F:	Documentation/cdrom/ide-cd
+F:	Documentation/cdrom/ide-cd.rst
 F:	drivers/ide/ide-cd*
 
 IDEAPAD LAPTOP EXTRAS DRIVER
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 20bb4bfa4be6..96ec7e0fc1ea 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -347,7 +347,7 @@ config CDROM_PKTCDVD
 	  is possible.
 	  DVD-RW disks must be in restricted overwrite mode.
 
-	  See the file <file:Documentation/cdrom/packet-writing.txt>
+	  See the file <file:Documentation/cdrom/packet-writing.rst>
 	  for further information on the use of this driver.
 
 	  To compile this driver as a module, choose M here: the
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 5d1e0a4a7d84..ac42ae4651ce 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -7,7 +7,7 @@
    License.  See linux/COPYING for more information.
 
    Uniform CD-ROM driver for Linux.
-   See Documentation/cdrom/cdrom-standard.txt for usage information.
+   See Documentation/cdrom/cdrom-standard.rst for usage information.
 
    The routines in the file provide a uniform interface between the
    software that uses CD-ROMs and the various low-level drivers that
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 3b15adc6ce98..9d117936bee1 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -9,7 +9,7 @@
  * May be copied or modified under the terms of the GNU General Public
  * License.  See linux/COPYING for more information.
  *
- * See Documentation/cdrom/ide-cd for usage information.
+ * See Documentation/cdrom/ide-cd.rst for usage information.
  *
  * Suggestions are welcome. Patches that work are more welcome though. ;-)
  *
-- 
cgit v1.2.3-59-g8ed1b


From f0ba43774cea3fc14732bb9243ce7238ae8a3202 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:43 -0300
Subject: docs: convert docs to ReST and rename to *.rst

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/device-mapper/cache-policies.rst    | 131 +++++++
 Documentation/device-mapper/cache-policies.txt    | 121 ------
 Documentation/device-mapper/cache.rst             | 337 +++++++++++++++++
 Documentation/device-mapper/cache.txt             | 311 ----------------
 Documentation/device-mapper/delay.rst             |  31 ++
 Documentation/device-mapper/delay.txt             |  28 --
 Documentation/device-mapper/dm-crypt.rst          | 173 +++++++++
 Documentation/device-mapper/dm-crypt.txt          | 162 --------
 Documentation/device-mapper/dm-flakey.rst         |  74 ++++
 Documentation/device-mapper/dm-flakey.txt         |  57 ---
 Documentation/device-mapper/dm-init.rst           | 125 +++++++
 Documentation/device-mapper/dm-init.txt           | 114 ------
 Documentation/device-mapper/dm-integrity.rst      | 259 +++++++++++++
 Documentation/device-mapper/dm-integrity.txt      | 233 ------------
 Documentation/device-mapper/dm-io.rst             |  75 ++++
 Documentation/device-mapper/dm-io.txt             |  75 ----
 Documentation/device-mapper/dm-log.rst            |  57 +++
 Documentation/device-mapper/dm-log.txt            |  54 ---
 Documentation/device-mapper/dm-queue-length.rst   |  48 +++
 Documentation/device-mapper/dm-queue-length.txt   |  39 --
 Documentation/device-mapper/dm-raid.rst           | 419 +++++++++++++++++++++
 Documentation/device-mapper/dm-raid.txt           | 354 ------------------
 Documentation/device-mapper/dm-service-time.rst   | 101 +++++
 Documentation/device-mapper/dm-service-time.txt   |  91 -----
 Documentation/device-mapper/dm-uevent.rst         | 110 ++++++
 Documentation/device-mapper/dm-uevent.txt         |  97 -----
 Documentation/device-mapper/dm-zoned.rst          | 146 ++++++++
 Documentation/device-mapper/dm-zoned.txt          | 144 --------
 Documentation/device-mapper/era.rst               | 116 ++++++
 Documentation/device-mapper/era.txt               | 108 ------
 Documentation/device-mapper/index.rst             |  44 +++
 Documentation/device-mapper/kcopyd.rst            |  47 +++
 Documentation/device-mapper/kcopyd.txt            |  47 ---
 Documentation/device-mapper/linear.rst            |  63 ++++
 Documentation/device-mapper/linear.txt            |  61 ----
 Documentation/device-mapper/log-writes.rst        | 145 ++++++++
 Documentation/device-mapper/log-writes.txt        | 140 -------
 Documentation/device-mapper/persistent-data.rst   |  88 +++++
 Documentation/device-mapper/persistent-data.txt   |  84 -----
 Documentation/device-mapper/snapshot.rst          | 180 +++++++++
 Documentation/device-mapper/snapshot.txt          | 176 ---------
 Documentation/device-mapper/statistics.rst        | 225 ++++++++++++
 Documentation/device-mapper/statistics.txt        | 223 -----------
 Documentation/device-mapper/striped.rst           |  61 ++++
 Documentation/device-mapper/striped.txt           |  57 ---
 Documentation/device-mapper/switch.rst            | 141 +++++++
 Documentation/device-mapper/switch.txt            | 138 -------
 Documentation/device-mapper/thin-provisioning.rst | 427 ++++++++++++++++++++++
 Documentation/device-mapper/thin-provisioning.txt | 411 ---------------------
 Documentation/device-mapper/unstriped.rst         | 135 +++++++
 Documentation/device-mapper/unstriped.txt         | 124 -------
 Documentation/device-mapper/verity.rst            | 229 ++++++++++++
 Documentation/device-mapper/verity.txt            | 219 -----------
 Documentation/device-mapper/writecache.rst        |  79 ++++
 Documentation/device-mapper/writecache.txt        |  70 ----
 Documentation/device-mapper/zero.rst              |  37 ++
 Documentation/device-mapper/zero.txt              |  37 --
 Documentation/filesystems/ubifs-authentication.md |   4 +-
 drivers/md/Kconfig                                |   2 +-
 drivers/md/dm-init.c                              |   2 +-
 drivers/md/dm-raid.c                              |   2 +-
 61 files changed, 4108 insertions(+), 3780 deletions(-)
 create mode 100644 Documentation/device-mapper/cache-policies.rst
 delete mode 100644 Documentation/device-mapper/cache-policies.txt
 create mode 100644 Documentation/device-mapper/cache.rst
 delete mode 100644 Documentation/device-mapper/cache.txt
 create mode 100644 Documentation/device-mapper/delay.rst
 delete mode 100644 Documentation/device-mapper/delay.txt
 create mode 100644 Documentation/device-mapper/dm-crypt.rst
 delete mode 100644 Documentation/device-mapper/dm-crypt.txt
 create mode 100644 Documentation/device-mapper/dm-flakey.rst
 delete mode 100644 Documentation/device-mapper/dm-flakey.txt
 create mode 100644 Documentation/device-mapper/dm-init.rst
 delete mode 100644 Documentation/device-mapper/dm-init.txt
 create mode 100644 Documentation/device-mapper/dm-integrity.rst
 delete mode 100644 Documentation/device-mapper/dm-integrity.txt
 create mode 100644 Documentation/device-mapper/dm-io.rst
 delete mode 100644 Documentation/device-mapper/dm-io.txt
 create mode 100644 Documentation/device-mapper/dm-log.rst
 delete mode 100644 Documentation/device-mapper/dm-log.txt
 create mode 100644 Documentation/device-mapper/dm-queue-length.rst
 delete mode 100644 Documentation/device-mapper/dm-queue-length.txt
 create mode 100644 Documentation/device-mapper/dm-raid.rst
 delete mode 100644 Documentation/device-mapper/dm-raid.txt
 create mode 100644 Documentation/device-mapper/dm-service-time.rst
 delete mode 100644 Documentation/device-mapper/dm-service-time.txt
 create mode 100644 Documentation/device-mapper/dm-uevent.rst
 delete mode 100644 Documentation/device-mapper/dm-uevent.txt
 create mode 100644 Documentation/device-mapper/dm-zoned.rst
 delete mode 100644 Documentation/device-mapper/dm-zoned.txt
 create mode 100644 Documentation/device-mapper/era.rst
 delete mode 100644 Documentation/device-mapper/era.txt
 create mode 100644 Documentation/device-mapper/index.rst
 create mode 100644 Documentation/device-mapper/kcopyd.rst
 delete mode 100644 Documentation/device-mapper/kcopyd.txt
 create mode 100644 Documentation/device-mapper/linear.rst
 delete mode 100644 Documentation/device-mapper/linear.txt
 create mode 100644 Documentation/device-mapper/log-writes.rst
 delete mode 100644 Documentation/device-mapper/log-writes.txt
 create mode 100644 Documentation/device-mapper/persistent-data.rst
 delete mode 100644 Documentation/device-mapper/persistent-data.txt
 create mode 100644 Documentation/device-mapper/snapshot.rst
 delete mode 100644 Documentation/device-mapper/snapshot.txt
 create mode 100644 Documentation/device-mapper/statistics.rst
 delete mode 100644 Documentation/device-mapper/statistics.txt
 create mode 100644 Documentation/device-mapper/striped.rst
 delete mode 100644 Documentation/device-mapper/striped.txt
 create mode 100644 Documentation/device-mapper/switch.rst
 delete mode 100644 Documentation/device-mapper/switch.txt
 create mode 100644 Documentation/device-mapper/thin-provisioning.rst
 delete mode 100644 Documentation/device-mapper/thin-provisioning.txt
 create mode 100644 Documentation/device-mapper/unstriped.rst
 delete mode 100644 Documentation/device-mapper/unstriped.txt
 create mode 100644 Documentation/device-mapper/verity.rst
 delete mode 100644 Documentation/device-mapper/verity.txt
 create mode 100644 Documentation/device-mapper/writecache.rst
 delete mode 100644 Documentation/device-mapper/writecache.txt
 create mode 100644 Documentation/device-mapper/zero.rst
 delete mode 100644 Documentation/device-mapper/zero.txt

diff --git a/Documentation/device-mapper/cache-policies.rst b/Documentation/device-mapper/cache-policies.rst
new file mode 100644
index 000000000000..b17fe352fc41
--- /dev/null
+++ b/Documentation/device-mapper/cache-policies.rst
@@ -0,0 +1,131 @@
+=============================
+Guidance for writing policies
+=============================
+
+Try to keep transactionality out of it.  The core is careful to
+avoid asking about anything that is migrating.  This is a pain, but
+makes it easier to write the policies.
+
+Mappings are loaded into the policy at construction time.
+
+Every bio that is mapped by the target is referred to the policy.
+The policy can return a simple HIT or MISS or issue a migration.
+
+Currently there's no way for the policy to issue background work,
+e.g. to start writing back dirty blocks that are going to be evicted
+soon.
+
+Because we map bios, rather than requests, it's easy for the policy
+to get fooled by many small bios.  For this reason the core target
+issues periodic ticks to the policy.  It's suggested that the policy
+doesn't update states (e.g. hit counts) for a block more than once
+for each tick.  The core ticks by watching bios complete, and so
+tries to see when the io scheduler has let the ios run.
+
+
+Overview of supplied cache replacement policies
+===============================================
+
+multiqueue (mq)
+---------------
+
+This policy is now an alias for smq (see below).
+
+The following tunables are accepted, but have no effect::
+
+	'sequential_threshold <#nr_sequential_ios>'
+	'random_threshold <#nr_random_ios>'
+	'read_promote_adjustment <value>'
+	'write_promote_adjustment <value>'
+	'discard_promote_adjustment <value>'
+
+Stochastic multiqueue (smq)
+---------------------------
+
+This policy is the default.
+
+The stochastic multi-queue (smq) policy addresses some of the problems
+with the multiqueue (mq) policy.
+
+The smq policy (vs mq) offers the promise of less memory utilization,
+improved performance and increased adaptability in the face of changing
+workloads.  smq also does not have any cumbersome tuning knobs.
+
+Users may switch from "mq" to "smq" simply by appropriately reloading a
+DM table that is using the cache target.  Doing so will cause all of the
+mq policy's hints to be dropped.  Also, performance of the cache may
+degrade slightly until smq recalculates the origin device's hotspots
+that should be cached.
+
+Memory usage
+^^^^^^^^^^^^
+
+The mq policy used a lot of memory; 88 bytes per cache block on a 64
+bit machine.
+
+smq uses 28-bit indexes to implement its data structures rather than
+pointers.  It avoids storing an explicit hit count for each block.  It
+has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
+the entries (each hotspot block covers a larger area than a single
+cache block).
+
+All this means smq uses ~25 bytes per cache block.  Still a lot of
+memory, but a substantial improvement nonetheless.
+
+Level balancing
+^^^^^^^^^^^^^^^
+
+mq placed entries in different levels of the multiqueue structures
+based on their hit count (~ln(hit count)).  This meant the bottom
+levels generally had the most entries, and the top ones had very
+few.  Having unbalanced levels like this reduced the efficacy of the
+multiqueue.
+
+smq does not maintain a hit count; instead it swaps hit entries with
+the least recently used entry from the level above.  The overall
+ordering is a side effect of this stochastic process.  With this
+scheme we can decide how many entries occupy each multiqueue level,
+resulting in better promotion/demotion decisions.
+
+Adaptability:
+The mq policy maintained a hit count for each cache block.  For a
+different block to get promoted to the cache its hit count has to
+exceed the lowest currently in the cache.  This meant it could take a
+long time for the cache to adapt between varying IO patterns.
+
+smq doesn't maintain hit counts, so a lot of this problem just goes
+away.  In addition it tracks performance of the hotspot queue, which
+is used to decide which blocks to promote.  If the hotspot queue is
+performing badly then it starts moving entries more quickly between
+levels.  This lets it adapt to new IO patterns very quickly.
+
+Performance
+^^^^^^^^^^^
+
+Testing smq shows substantially better performance than mq.
+
+cleaner
+-------
+
+The cleaner writes back all dirty blocks in a cache to decommission it.
+
+Examples
+========
+
+The syntax for a table is::
+
+	cache <metadata dev> <cache dev> <origin dev> <block size>
+	<#feature_args> [<feature arg>]*
+	<policy> <#policy_args> [<policy arg>]*
+
+The syntax to send a message using the dmsetup command is::
+
+	dmsetup message <mapped device> 0 sequential_threshold 1024
+	dmsetup message <mapped device> 0 random_threshold 8
+
+Using dmsetup::
+
+	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
+	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
+	# creates a 128GB large mapped device named 'blah' with the
+	# sequential threshold set to 1024 and the random_threshold set to 8.
diff --git a/Documentation/device-mapper/cache-policies.txt b/Documentation/device-mapper/cache-policies.txt
deleted file mode 100644
index 86786d87d9a8..000000000000
--- a/Documentation/device-mapper/cache-policies.txt
+++ /dev/null
@@ -1,121 +0,0 @@
-Guidance for writing policies
-=============================
-
-Try to keep transactionality out of it.  The core is careful to
-avoid asking about anything that is migrating.  This is a pain, but
-makes it easier to write the policies.
-
-Mappings are loaded into the policy at construction time.
-
-Every bio that is mapped by the target is referred to the policy.
-The policy can return a simple HIT or MISS or issue a migration.
-
-Currently there's no way for the policy to issue background work,
-e.g. to start writing back dirty blocks that are going to be evicted
-soon.
-
-Because we map bios, rather than requests it's easy for the policy
-to get fooled by many small bios.  For this reason the core target
-issues periodic ticks to the policy.  It's suggested that the policy
-doesn't update states (eg, hit counts) for a block more than once
-for each tick.  The core ticks by watching bios complete, and so
-trying to see when the io scheduler has let the ios run.
-
-
-Overview of supplied cache replacement policies
-===============================================
-
-multiqueue (mq)
----------------
-
-This policy is now an alias for smq (see below).
-
-The following tunables are accepted, but have no effect:
-
-	'sequential_threshold <#nr_sequential_ios>'
-	'random_threshold <#nr_random_ios>'
-	'read_promote_adjustment <value>'
-	'write_promote_adjustment <value>'
-	'discard_promote_adjustment <value>'
-
-Stochastic multiqueue (smq)
----------------------------
-
-This policy is the default.
-
-The stochastic multi-queue (smq) policy addresses some of the problems
-with the multiqueue (mq) policy.
-
-The smq policy (vs mq) offers the promise of less memory utilization,
-improved performance and increased adaptability in the face of changing
-workloads.  smq also does not have any cumbersome tuning knobs.
-
-Users may switch from "mq" to "smq" simply by appropriately reloading a
-DM table that is using the cache target.  Doing so will cause all of the
-mq policy's hints to be dropped.  Also, performance of the cache may
-degrade slightly until smq recalculates the origin device's hotspots
-that should be cached.
-
-Memory usage:
-The mq policy used a lot of memory; 88 bytes per cache block on a 64
-bit machine.
-
-smq uses 28bit indexes to implement its data structures rather than
-pointers.  It avoids storing an explicit hit count for each block.  It
-has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
-the entries (each hotspot block covers a larger area than a single
-cache block).
-
-All this means smq uses ~25bytes per cache block.  Still a lot of
-memory, but a substantial improvement nontheless.
-
-Level balancing:
-mq placed entries in different levels of the multiqueue structures
-based on their hit count (~ln(hit count)).  This meant the bottom
-levels generally had the most entries, and the top ones had very
-few.  Having unbalanced levels like this reduced the efficacy of the
-multiqueue.
-
-smq does not maintain a hit count, instead it swaps hit entries with
-the least recently used entry from the level above.  The overall
-ordering being a side effect of this stochastic process.  With this
-scheme we can decide how many entries occupy each multiqueue level,
-resulting in better promotion/demotion decisions.
-
-Adaptability:
-The mq policy maintained a hit count for each cache block.  For a
-different block to get promoted to the cache its hit count has to
-exceed the lowest currently in the cache.  This meant it could take a
-long time for the cache to adapt between varying IO patterns.
-
-smq doesn't maintain hit counts, so a lot of this problem just goes
-away.  In addition it tracks performance of the hotspot queue, which
-is used to decide which blocks to promote.  If the hotspot queue is
-performing badly then it starts moving entries more quickly between
-levels.  This lets it adapt to new IO patterns very quickly.
-
-Performance:
-Testing smq shows substantially better performance than mq.
-
-cleaner
--------
-
-The cleaner writes back all dirty blocks in a cache to decommission it.
-
-Examples
-========
-
-The syntax for a table is:
-	cache <metadata dev> <cache dev> <origin dev> <block size>
-	<#feature_args> [<feature arg>]*
-	<policy> <#policy_args> [<policy arg>]*
-
-The syntax to send a message using the dmsetup command is:
-	dmsetup message <mapped device> 0 sequential_threshold 1024
-	dmsetup message <mapped device> 0 random_threshold 8
-
-Using dmsetup:
-	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
-	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
-	creates a 128GB large mapped device named 'blah' with the
-	sequential threshold set to 1024 and the random_threshold set to 8.
diff --git a/Documentation/device-mapper/cache.rst b/Documentation/device-mapper/cache.rst
new file mode 100644
index 000000000000..f15e5254d05b
--- /dev/null
+++ b/Documentation/device-mapper/cache.rst
@@ -0,0 +1,337 @@
+=====
+Cache
+=====
+
+Introduction
+============
+
+dm-cache is a device mapper target written by Joe Thornber, Heinz
+Mauelshagen, and Mike Snitzer.
+
+It aims to improve performance of a block device (e.g. a spindle) by
+dynamically migrating some of its data to a faster, smaller device
+(e.g. an SSD).
+
+This device-mapper solution allows us to insert this caching at
+different levels of the dm stack, for instance above the data device for
+a thin-provisioning pool.  Caching solutions that are integrated more
+closely with the virtual memory system should give better performance.
+
+The target reuses the metadata library used by the thin-provisioning
+target.
+
+The decision as to what data to migrate and when is left to a plug-in
+policy module.  Several of these have been written as we experiment,
+and we hope other people will contribute others for specific io
+scenarios (e.g. a VM image server).
+
+Glossary
+========
+
+  Migration
+	       Movement of the primary copy of a logical block from one
+	       device to the other.
+  Promotion
+	       Migration from slow device to fast device.
+  Demotion
+	       Migration from fast device to slow device.
+
+The origin device always contains a copy of the logical block, which
+may be out of date or kept in sync with the copy on the cache device
+(depending on policy).
+
+Design
+======
+
+Sub-devices
+-----------
+
+The target is constructed by passing three devices to it (along with
+other parameters detailed later):
+
+1. An origin device - the big, slow one.
+
+2. A cache device - the small, fast one.
+
+3. A small metadata device - records which blocks are in the cache,
+   which are dirty, and extra hints for use by the policy object.
+   This information could be put on the cache device, but having it
+   separate allows the volume manager to configure it differently,
+   e.g. as a mirror for extra robustness.  This metadata device may only
+   be used by a single cache device.
+
+Fixed block size
+----------------
+
+The origin is divided up into blocks of a fixed size.  This block size
+is configurable when you first create the cache.  Typically we've been
+using block sizes of 256KB - 1024KB.  The block size must be between 64
+sectors (32KB) and 2097152 sectors (1GB) and a multiple of 64 sectors (32KB).
+
+Having a fixed block size simplifies the target a lot.  But it is
+something of a compromise.  For instance, a small part of a block may be
+getting hit a lot, yet the whole block will be promoted to the cache.
+So large block sizes are bad because they waste cache space.  And small
+block sizes are bad because they increase the amount of metadata (both
+in core and on disk).
+
+Cache operating modes
+---------------------
+
+The cache has three operating modes: writeback, writethrough and
+passthrough.
+
+If writeback, the default, is selected then a write to a block that is
+cached will go only to the cache and the block will be marked dirty in
+the metadata.
+
+If writethrough is selected then a write to a cached block will not
+complete until it has hit both the origin and cache devices.  Clean
+blocks should remain clean.
+
+If passthrough is selected, useful when the cache contents are not known
+to be coherent with the origin device, then all reads are served from
+the origin device (all reads miss the cache) and all writes are
+forwarded to the origin device; additionally, write hits cause cache
+block invalidates.  To enable passthrough mode the cache must be clean.
+Passthrough mode allows a cache device to be activated without having to
+worry about coherency.  Coherency that exists is maintained, although
+the cache will gradually cool as writes take place.  If the coherency of
+the cache can later be verified, or established through use of the
+"invalidate_cblocks" message, the cache device can be transitioned to
+writethrough or writeback mode while still warm.  Otherwise, the cache
+contents can be discarded prior to transitioning to the desired
+operating mode.
+
+A simple cleaner policy is provided, which will clean (write back) all
+dirty blocks in a cache.  This is useful when decommissioning or
+shrinking a cache.  Shrinking the cache's fast device requires all cache
+blocks, in the area of the cache being removed, to be clean.  If the
+area being removed from the cache still contains dirty blocks the resize
+will fail.  Care must be taken to never reduce the volume used for the
+cache's fast device until the cache is clean.  This is of particular
+importance if writeback mode is used.  Writethrough and passthrough
+modes already maintain a clean cache.  Future support to partially clean
+the cache, above a specified threshold, will allow for keeping the cache
+warm and in writeback mode during resize.
+
+Migration throttling
+--------------------
+
+Migrating data between the origin and cache device uses bandwidth.
+The user can set a throttle to prevent more than a certain amount of
+migration occurring at any one time.  Currently we're not taking any
+account of normal io traffic going to the devices.  More work needs
+doing here to avoid migrating during those peak io moments.
+
+For the time being, a message "migration_threshold <#sectors>"
+can be used to set the maximum number of sectors being migrated,
+the default being 2048 sectors (1MB).
+
+Updating on-disk metadata
+-------------------------
+
+On-disk metadata is committed every time a FLUSH or FUA bio is written.
+If no such requests are made then commits will occur every second.  This
+means the cache behaves like a physical disk that has a volatile write
+cache.  If power is lost you may lose some recent writes.  The metadata
+should always be consistent in spite of any crash.
+
+The 'dirty' state for a cache block changes far too frequently for us
+to keep updating it on the fly.  So we treat it as a hint.  In normal
+operation it will be written when the dm device is suspended.  If the
+system crashes all cache blocks will be assumed dirty when restarted.
+
+Per-block policy hints
+----------------------
+
+Policy plug-ins can store a chunk of data per cache block.  It's up to
+the policy how big this chunk is, but it should be kept small.  Like the
+dirty flags this data is lost if there's a crash so a safe fallback
+value should always be possible.
+
+Policy hints affect performance, not correctness.
+
+Policy messaging
+----------------
+
+Policies will have different tunables, specific to each one, so we
+need a generic way of getting and setting these.  Device-mapper
+messages are used.  Refer to cache-policies.rst.
+
+Discard bitset resolution
+-------------------------
+
+We can avoid copying data during migration if we know the block has
+been discarded.  A prime example of this is when mkfs discards the
+whole block device.  We store a bitset tracking the discard state of
+blocks.  However, we allow this bitset to have a different block size
+from the cache blocks.  This is because we need to track the discard
+state for all of the origin device (compare with the dirty bitset
+which is just for the smaller cache device).
+
+Target interface
+================
+
+Constructor
+-----------
+
+  ::
+
+   cache <metadata dev> <cache dev> <origin dev> <block size>
+         <#feature args> [<feature arg>]*
+         <policy> <#policy args> [policy args]*
+
+ ================ =======================================================
+ metadata dev     fast device holding the persistent metadata
+ cache dev	  fast device holding cached data blocks
+ origin dev	  slow device holding original data blocks
+ block size       cache unit size in sectors
+
+ #feature args    number of feature arguments passed
+ feature args     writethrough or passthrough (The default is writeback.)
+
+ policy           the replacement policy to use
+ #policy args     an even number of arguments corresponding to
+                  key/value pairs passed to the policy
+ policy args      key/value pairs passed to the policy
+		  E.g. 'sequential_threshold 1024'
+		  See cache-policies.rst for details.
+ ================ =======================================================
+
+Optional feature arguments are:
+
+
+   ==================== ========================================================
+   writethrough		write through caching that prohibits cache block
+			content from being different from origin block content.
+			Without this argument, the default behaviour is to write
+			back cache block contents later for performance reasons,
+			so they may differ from the corresponding origin blocks.
+
+   passthrough		a degraded mode useful for various cache coherency
+			situations (e.g., rolling back snapshots of
+			underlying storage).	 Reads and writes always go to
+			the origin.	If a write goes to a cached origin
+			block, then the cache block is invalidated.
+			To enable passthrough mode the cache must be clean.
+
+   metadata2		use version 2 of the metadata.  This stores the dirty
+			bits in a separate btree, which improves speed of
+			shutting down the cache.
+
+   no_discard_passdown	disable passing down discards from the cache
+			to the origin's data device.
+   ==================== ========================================================
+
+A policy called 'default' is always registered.  This is an alias for
+the policy we currently think is giving best all round performance.
+
+As the default policy could vary between kernels, if you are relying on
+the characteristics of a specific policy, always request it by name.
+
+Status
+------
+
+::
+
+  <metadata block size> <#used metadata blocks>/<#total metadata blocks>
+  <cache block size> <#used cache blocks>/<#total cache blocks>
+  <#read hits> <#read misses> <#write hits> <#write misses>
+  <#demotions> <#promotions> <#dirty> <#features> <features>*
+  <#core args> <core args>* <policy name> <#policy args> <policy args>*
+  <cache metadata mode> <needs_check>
+
+
+========================= =====================================================
+metadata block size	  Fixed block size for each metadata block in
+			  sectors
+#used metadata blocks	  Number of metadata blocks used
+#total metadata blocks	  Total number of metadata blocks
+cache block size	  Configurable block size for the cache device
+			  in sectors
+#used cache blocks	  Number of blocks resident in the cache
+#total cache blocks	  Total number of cache blocks
+#read hits		  Number of times a READ bio has been mapped
+			  to the cache
+#read misses		  Number of times a READ bio has been mapped
+			  to the origin
+#write hits		  Number of times a WRITE bio has been mapped
+			  to the cache
+#write misses		  Number of times a WRITE bio has been
+			  mapped to the origin
+#demotions		  Number of times a block has been removed
+			  from the cache
+#promotions		  Number of times a block has been moved to
+			  the cache
+#dirty			  Number of blocks in the cache that differ
+			  from the origin
+#feature args		  Number of feature args to follow
+feature args		  'writethrough' (optional)
+#core args		  Number of core arguments (must be even)
+core args		  Key/value pairs for tuning the core
+			  e.g. migration_threshold
+policy name		  Name of the policy
+#policy args		  Number of policy arguments to follow (must be even)
+policy args		  Key/value pairs e.g. sequential_threshold
+cache metadata mode       ro if read-only, rw if read-write
+
+			  In serious cases where even a read-only mode is
+			  deemed unsafe no further I/O will be permitted and
+			  the status will just contain the string 'Fail'.
+			  The userspace recovery tools should then be used.
+needs_check		  'needs_check' if set, '-' if not set
+			  A metadata operation has failed, resulting in the
+			  needs_check flag being set in the metadata's
+			  superblock.  The metadata device must be
+			  deactivated and checked/repaired before the
+			  cache can be made fully operational again.
+			  '-' indicates	needs_check is not set.
+========================= =====================================================
+
+Messages
+--------
+
+Policies will have different tunables, specific to each one, so we
+need a generic way of getting and setting these.  Device-mapper
+messages are used.  (A sysfs interface would also be possible.)
+
+The message format is::
+
+   <key> <value>
+
+E.g.::
+
+   dmsetup message my_cache 0 sequential_threshold 1024
+
+
+Invalidation is removing an entry from the cache without writing it
+back.  Cache blocks can be invalidated via the invalidate_cblocks
+message, which takes an arbitrary number of cblock ranges.  Each cblock
+range's end value is "one past the end", meaning 5-10 expresses a range
+of values from 5 to 9.  Each cblock must be expressed as a decimal
+value; in the future a variant message that takes cblock ranges
+expressed in hexadecimal may be needed to better support efficient
+invalidation of larger caches.  The cache must be in passthrough mode
+when invalidate_cblocks is used::
+
+   invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
+
+E.g.::
+
+   dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
+
+Examples
+========
+
+The test suite can be found here:
+
+https://github.com/jthornber/device-mapper-test-suite
+
+::
+
+  dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
+	  /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
+  dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
+	  /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
+	  mq 4 sequential_threshold 1024 random_threshold 8'
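Note on the cache.rst examples above: the block-size rule (64 to 2097152
sectors, in multiples of 64) and the constructor argument order can be
checked before dmsetup is called. The following is a minimal sketch only.
The function names cache_block_size_ok and cache_table are hypothetical
helpers, not part of dmsetup or the kernel, and the table line assumes the
default writeback mode (zero feature arguments) with the 'default' policy
and no policy arguments.

```shell
#!/bin/sh
# Illustrative helpers only; these names are assumptions, not dmsetup commands.

# Succeed if $1 is a valid dm-cache block size in 512-byte sectors:
# between 64 and 2097152 inclusive, and a multiple of 64.
cache_block_size_ok() {
	[ "$1" -ge 64 ] && [ "$1" -le 2097152 ] && [ $(( $1 % 64 )) -eq 0 ]
}

# Print a cache table line using writeback (no feature args) and the
# 'default' policy with no policy args.
# Arguments: <length in sectors> <metadata dev> <cache dev> <origin dev> <block size>
cache_table() {
	cache_block_size_ok "$5" || return 1
	echo "0 $1 cache $2 $3 $4 $5 0 default 0"
}
```

The output could then be passed to dmsetup, for example
dmsetup create my_cache --table "$(cache_table 41943040 /dev/mapper/metadata /dev/mapper/ssd /dev/mapper/origin 512)",
where 41943040 sectors of 512 bytes correspond to a 20GB mapped device.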
diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt
deleted file mode 100644
index 8ae1cf8e94da..000000000000
--- a/Documentation/device-mapper/cache.txt
+++ /dev/null
@@ -1,311 +0,0 @@
-Introduction
-============
-
-dm-cache is a device mapper target written by Joe Thornber, Heinz
-Mauelshagen, and Mike Snitzer.
-
-It aims to improve performance of a block device (eg, a spindle) by
-dynamically migrating some of its data to a faster, smaller device
-(eg, an SSD).
-
-This device-mapper solution allows us to insert this caching at
-different levels of the dm stack, for instance above the data device for
-a thin-provisioning pool.  Caching solutions that are integrated more
-closely with the virtual memory system should give better performance.
-
-The target reuses the metadata library used in the thin-provisioning
-library.
-
-The decision as to what data to migrate and when is left to a plug-in
-policy module.  Several of these have been written as we experiment,
-and we hope other people will contribute others for specific io
-scenarios (eg. a vm image server).
-
-Glossary
-========
-
-  Migration -  Movement of the primary copy of a logical block from one
-	       device to the other.
-  Promotion -  Migration from slow device to fast device.
-  Demotion  -  Migration from fast device to slow device.
-
-The origin device always contains a copy of the logical block, which
-may be out of date or kept in sync with the copy on the cache device
-(depending on policy).
-
-Design
-======
-
-Sub-devices
------------
-
-The target is constructed by passing three devices to it (along with
-other parameters detailed later):
-
-1. An origin device - the big, slow one.
-
-2. A cache device - the small, fast one.
-
-3. A small metadata device - records which blocks are in the cache,
-   which are dirty, and extra hints for use by the policy object.
-   This information could be put on the cache device, but having it
-   separate allows the volume manager to configure it differently,
-   e.g. as a mirror for extra robustness.  This metadata device may only
-   be used by a single cache device.
-
-Fixed block size
-----------------
-
-The origin is divided up into blocks of a fixed size.  This block size
-is configurable when you first create the cache.  Typically we've been
-using block sizes of 256KB - 1024KB.  The block size must be between 64
-sectors (32KB) and 2097152 sectors (1GB) and a multiple of 64 sectors (32KB).
-
-Having a fixed block size simplifies the target a lot.  But it is
-something of a compromise.  For instance, a small part of a block may be
-getting hit a lot, yet the whole block will be promoted to the cache.
-So large block sizes are bad because they waste cache space.  And small
-block sizes are bad because they increase the amount of metadata (both
-in core and on disk).
-
-Cache operating modes
----------------------
-
-The cache has three operating modes: writeback, writethrough and
-passthrough.
-
-If writeback, the default, is selected then a write to a block that is
-cached will go only to the cache and the block will be marked dirty in
-the metadata.
-
-If writethrough is selected then a write to a cached block will not
-complete until it has hit both the origin and cache devices.  Clean
-blocks should remain clean.
-
-If passthrough is selected, useful when the cache contents are not known
-to be coherent with the origin device, then all reads are served from
-the origin device (all reads miss the cache) and all writes are
-forwarded to the origin device; additionally, write hits cause cache
-block invalidates.  To enable passthrough mode the cache must be clean.
-Passthrough mode allows a cache device to be activated without having to
-worry about coherency.  Coherency that exists is maintained, although
-the cache will gradually cool as writes take place.  If the coherency of
-the cache can later be verified, or established through use of the
-"invalidate_cblocks" message, the cache device can be transitioned to
-writethrough or writeback mode while still warm.  Otherwise, the cache
-contents can be discarded prior to transitioning to the desired
-operating mode.
-
-A simple cleaner policy is provided, which will clean (write back) all
-dirty blocks in a cache.  Useful for decommissioning a cache or when
-shrinking a cache.  Shrinking the cache's fast device requires all cache
-blocks, in the area of the cache being removed, to be clean.  If the
-area being removed from the cache still contains dirty blocks the resize
-will fail.  Care must be taken to never reduce the volume used for the
-cache's fast device until the cache is clean.  This is of particular
-importance if writeback mode is used.  Writethrough and passthrough
-modes already maintain a clean cache.  Future support to partially clean
-the cache, above a specified threshold, will allow for keeping the cache
-warm and in writeback mode during resize.
-
-Migration throttling
---------------------
-
-Migrating data between the origin and cache device uses bandwidth.
-The user can set a throttle to prevent more than a certain amount of
-migration occurring at any one time.  Currently we're not taking any
-account of normal io traffic going to the devices.  More work needs
-doing here to avoid migrating during those peak io moments.
-
-For the time being, a message "migration_threshold <#sectors>"
-can be used to set the maximum number of sectors being migrated,
-the default being 2048 sectors (1MB).
-
-Updating on-disk metadata
--------------------------
-
-On-disk metadata is committed every time a FLUSH or FUA bio is written.
-If no such requests are made then commits will occur every second.  This
-means the cache behaves like a physical disk that has a volatile write
-cache.  If power is lost you may lose some recent writes.  The metadata
-should always be consistent in spite of any crash.
-
-The 'dirty' state for a cache block changes far too frequently for us
-to keep updating it on the fly.  So we treat it as a hint.  In normal
-operation it will be written when the dm device is suspended.  If the
-system crashes all cache blocks will be assumed dirty when restarted.
-
-Per-block policy hints
-----------------------
-
-Policy plug-ins can store a chunk of data per cache block.  It's up to
-the policy how big this chunk is, but it should be kept small.  Like the
-dirty flags this data is lost if there's a crash so a safe fallback
-value should always be possible.
-
-Policy hints affect performance, not correctness.
-
-Policy messaging
-----------------
-
-Policies will have different tunables, specific to each one, so we
-need a generic way of getting and setting these.  Device-mapper
-messages are used.  Refer to cache-policies.txt.
-
-Discard bitset resolution
--------------------------
-
-We can avoid copying data during migration if we know the block has
-been discarded.  A prime example of this is when mkfs discards the
-whole block device.  We store a bitset tracking the discard state of
-blocks.  However, we allow this bitset to have a different block size
-from the cache blocks.  This is because we need to track the discard
-state for all of the origin device (compare with the dirty bitset
-which is just for the smaller cache device).
-
-Target interface
-================
-
-Constructor
------------
-
- cache <metadata dev> <cache dev> <origin dev> <block size>
-       <#feature args> [<feature arg>]*
-       <policy> <#policy args> [policy args]*
-
- metadata dev    : fast device holding the persistent metadata
- cache dev	 : fast device holding cached data blocks
- origin dev	 : slow device holding original data blocks
- block size      : cache unit size in sectors
-
- #feature args   : number of feature arguments passed
- feature args    : writethrough or passthrough (The default is writeback.)
-
- policy          : the replacement policy to use
- #policy args    : an even number of arguments corresponding to
-                   key/value pairs passed to the policy
- policy args     : key/value pairs passed to the policy
-		   E.g. 'sequential_threshold 1024'
-		   See cache-policies.txt for details.
-
-Optional feature arguments are:
-   writethrough  : write through caching that prohibits cache block
-		   content from being different from origin block content.
-		   Without this argument, the default behaviour is to write
-		   back cache block contents later for performance reasons,
-		   so they may differ from the corresponding origin blocks.
-
-   passthrough	 : a degraded mode useful for various cache coherency
-		   situations (e.g., rolling back snapshots of
-		   underlying storage).	 Reads and writes always go to
-		   the origin.	If a write goes to a cached origin
-		   block, then the cache block is invalidated.
-		   To enable passthrough mode the cache must be clean.
-
-   metadata2	: use version 2 of the metadata.  This stores the dirty bits
-                  in a separate btree, which improves speed of shutting
-		  down the cache.
-
-   no_discard_passdown	: disable passing down discards from the cache
-			  to the origin's data device.
-
-A policy called 'default' is always registered.  This is an alias for
-the policy we currently think is giving best all round performance.
-
-As the default policy could vary between kernels, if you are relying on
-the characteristics of a specific policy, always request it by name.
-
-Status
-------
-
-<metadata block size> <#used metadata blocks>/<#total metadata blocks>
-<cache block size> <#used cache blocks>/<#total cache blocks>
-<#read hits> <#read misses> <#write hits> <#write misses>
-<#demotions> <#promotions> <#dirty> <#features> <features>*
-<#core args> <core args>* <policy name> <#policy args> <policy args>*
-<cache metadata mode>
-
-metadata block size	 : Fixed block size for each metadata block in
-			     sectors
-#used metadata blocks	 : Number of metadata blocks used
-#total metadata blocks	 : Total number of metadata blocks
-cache block size	 : Configurable block size for the cache device
-			     in sectors
-#used cache blocks	 : Number of blocks resident in the cache
-#total cache blocks	 : Total number of cache blocks
-#read hits		 : Number of times a READ bio has been mapped
-			     to the cache
-#read misses		 : Number of times a READ bio has been mapped
-			     to the origin
-#write hits		 : Number of times a WRITE bio has been mapped
-			     to the cache
-#write misses		 : Number of times a WRITE bio has been
-			     mapped to the origin
-#demotions		 : Number of times a block has been removed
-			     from the cache
-#promotions		 : Number of times a block has been moved to
-			     the cache
-#dirty			 : Number of blocks in the cache that differ
-			     from the origin
-#feature args		 : Number of feature args to follow
-feature args		 : 'writethrough' (optional)
-#core args		 : Number of core arguments (must be even)
-core args		 : Key/value pairs for tuning the core
-			     e.g. migration_threshold
-policy name		 : Name of the policy
-#policy args		 : Number of policy arguments to follow (must be even)
-policy args		 : Key/value pairs e.g. sequential_threshold
-cache metadata mode      : ro if read-only, rw if read-write
-	In serious cases where even a read-only mode is deemed unsafe
-	no further I/O will be permitted and the status will just
-	contain the string 'Fail'.  The userspace recovery tools
-	should then be used.
-needs_check		 : 'needs_check' if set, '-' if not set
-	A metadata operation has failed, resulting in the needs_check
-	flag being set in the metadata's superblock.  The metadata
-	device must be deactivated and checked/repaired before the
-	cache can be made fully operational again.  '-' indicates
-	needs_check is not set.
-
-Messages
---------
-
-Policies will have different tunables, specific to each one, so we
-need a generic way of getting and setting these.  Device-mapper
-messages are used.  (A sysfs interface would also be possible.)
-
-The message format is:
-
-   <key> <value>
-
-E.g.
-   dmsetup message my_cache 0 sequential_threshold 1024
-
-
-Invalidation is removing an entry from the cache without writing it
-back.  Cache blocks can be invalidated via the invalidate_cblocks
-message, which takes an arbitrary number of cblock ranges.  Each cblock
-range's end value is "one past the end", meaning 5-10 expresses a range
-of values from 5 to 9.  Each cblock must be expressed as a decimal
-value, in the future a variant message that takes cblock ranges
-expressed in hexadecimal may be needed to better support efficient
-invalidation of larger caches.  The cache must be in passthrough mode
-when invalidate_cblocks is used.
-
-   invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
-
-E.g.
-   dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
-
-Examples
-========
-
-The test suite can be found here:
-
-https://github.com/jthornber/device-mapper-test-suite
-
-dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
-	/dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
-dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
-	/dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
-	mq 4 sequential_threshold 1024 random_threshold 8'
diff --git a/Documentation/device-mapper/delay.rst b/Documentation/device-mapper/delay.rst
new file mode 100644
index 000000000000..917ba8c33359
--- /dev/null
+++ b/Documentation/device-mapper/delay.rst
@@ -0,0 +1,31 @@
+========
+dm-delay
+========
+
+Device-Mapper's "delay" target delays reads and/or writes
+and maps them to different devices.
+
+Parameters::
+
+    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>
+			       [<flush_device> <flush_offset> <flush_delay>]]
+
+With separate write parameters, the first set is only used for reads.
+Offsets are specified in sectors.
+Delays are specified in milliseconds.
+
+Example scripts
+===============
+
+::
+
+	#!/bin/sh
+	# Create device delaying rw operation for 500ms
+	echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
+
+::
+
+	#!/bin/sh
+	# Create device delaying only write operation for 500ms and
+	# splitting reads and writes to different devices $1 $2
+	echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
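Note on the delay.rst scripts above: both build the same table line, with
or without the optional write parameters. As a rough sketch, the line can
be produced by one helper and inspected before a device is created. The
function name delay_table is hypothetical, and the offsets are fixed at 0
as in the scripts above; the flush parameters are not covered.

```shell
#!/bin/sh
# Hypothetical helper: print a delay table line without creating a device.
# Arguments: <length in sectors> <device> <delay ms> [<write_device> <write_delay ms>]
delay_table() {
	if [ "$#" -eq 3 ]; then
		# One parameter set: the delay applies to reads and writes.
		echo "0 $1 delay $2 0 $3"
	elif [ "$#" -eq 5 ]; then
		# Separate write parameters: the first set is used for reads only.
		echo "0 $1 delay $2 0 $3 $4 0 $5"
	else
		echo "usage: delay_table <sectors> <dev> <ms> [<write_dev> <write_ms>]" >&2
		return 1
	fi
}
```

A device could then be created with, for example,
delay_table "$(blockdev --getsz $1)" $1 500 | dmsetup create delayed.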
diff --git a/Documentation/device-mapper/delay.txt b/Documentation/device-mapper/delay.txt
deleted file mode 100644
index 6426c45273cb..000000000000
--- a/Documentation/device-mapper/delay.txt
+++ /dev/null
@@ -1,28 +0,0 @@
-dm-delay
-========
-
-Device-Mapper's "delay" target delays reads and/or writes
-and maps them to different devices.
-
-Parameters:
-    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>
-			       [<flush_device> <flush_offset> <flush_delay>]]
-
-With separate write parameters, the first set is only used for reads.
-Offsets are specified in sectors.
-Delays are specified in milliseconds.
-
-Example scripts
-===============
-[[
-#!/bin/sh
-# Create device delaying rw operation for 500ms
-echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
-]]
-
-[[
-#!/bin/sh
-# Create device delaying only write operation for 500ms and
-# splitting reads and writes to different devices $1 $2
-echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
-]]
diff --git a/Documentation/device-mapper/dm-crypt.rst b/Documentation/device-mapper/dm-crypt.rst
new file mode 100644
index 000000000000..8f4a3f889d43
--- /dev/null
+++ b/Documentation/device-mapper/dm-crypt.rst
@@ -0,0 +1,173 @@
+========
+dm-crypt
+========
+
+Device-Mapper's "crypt" target provides transparent encryption of block devices
+using the kernel crypto API.
+
+For a more detailed description of supported parameters see:
+https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt
+
+Parameters::
+
+	      <cipher> <key> <iv_offset> <device path> \
+	      <offset> [<#opt_params> <opt_params>]
+
+<cipher>
+    Encryption cipher, encryption mode and Initial Vector (IV) generator.
+
+    The cipher specifications format is::
+
+       cipher[:keycount]-chainmode-ivmode[:ivopts]
+
+    Examples::
+
+       aes-cbc-essiv:sha256
+       aes-xts-plain64
+       serpent-xts-plain64
+
+    Cipher format also supports direct specification with kernel crypto API
+    format (selected by capi: prefix). The IV specification is the same
+    as for the first format type.
+    This format is mainly used for specification of authenticated modes.
+
+    The crypto API cipher specifications format is::
+
+        capi:cipher_api_spec-ivmode[:ivopts]
+
+    Examples::
+
+        capi:cbc(aes)-essiv:sha256
+        capi:xts(aes)-plain64
+
+    Examples of authenticated modes::
+
+        capi:gcm(aes)-random
+        capi:authenc(hmac(sha256),xts(aes))-random
+        capi:rfc7539(chacha20,poly1305)-random
+
+    /proc/crypto contains a list of currently loaded crypto modes.
+
+<key>
+    Key used for encryption. It is encoded either as a hexadecimal number
+    or it can be passed as <key_string> prefixed with single colon
+    character (':') for keys residing in kernel keyring service.
+    You can only use key sizes that are valid for the selected cipher
+    in combination with the selected iv mode.
+    Note that for some iv modes the key string can contain additional
+    keys (for example IV seed) so the key contains more parts concatenated
+    into a single string.
+
+<key_string>
+    The kernel keyring key is identified by a string in the following format:
+    <key_size>:<key_type>:<key_description>.
+
+<key_size>
+    The encryption key size in bytes. The kernel key payload size must match
+    the value passed in <key_size>.
+
+<key_type>
+    Either 'logon' or 'user' kernel key type.
+
+<key_description>
+    The kernel keyring key description that the crypt target should look
+    for when loading a key of <key_type>.
+
+<keycount>
+    Multi-key compatibility mode. You can define <keycount> keys and
+    then sectors are encrypted according to their offsets (sector 0 uses key0;
+    sector 1 uses key1 etc.).  <keycount> must be a power of two.
+
+<iv_offset>
+    The IV offset is a sector count that is added to the sector number
+    before creating the IV.
+
+<device path>
+    This is the device that is going to be used as backend and contains the
+    encrypted data.  You can specify it as a path like /dev/xxx or a device
+    number <major>:<minor>.
+
+<offset>
+    Starting sector within the device where the encrypted data begins.
+
+<#opt_params>
+    Number of optional parameters. If there are no optional parameters,
+    the optional parameters section can be skipped or #opt_params can be zero.
+    Otherwise #opt_params is the number of following arguments.
+
+    Example of optional parameters section:
+        3 allow_discards same_cpu_crypt submit_from_crypt_cpus
+
+allow_discards
+    Block discard requests (a.k.a. TRIM) are passed through the crypt device.
+    The default is to ignore discard requests.
+
+    WARNING: Assess the specific security risks carefully before enabling this
+    option.  For example, allowing discards on encrypted devices may lead to
+    the leak of information about the ciphertext device (filesystem type,
+    used space etc.) if the discarded blocks can be located easily on the
+    device later.
+
+same_cpu_crypt
+    Perform encryption using the same cpu that IO was submitted on.
+    The default is to use an unbound workqueue so that encryption work
+    is automatically balanced between available CPUs.
+
+submit_from_crypt_cpus
+    Disable offloading writes to a separate thread after encryption.
+    There are some situations where offloading write bios from the
+    encryption threads to a single thread degrades performance
+    significantly.  The default is to offload write bios to the same
+    thread because it benefits CFQ to have writes submitted using the
+    same context.
+
+integrity:<bytes>:<type>
+    The device requires additional <bytes> metadata per-sector stored
+    in per-bio integrity structure. This metadata must be provided
+    by underlying dm-integrity target.
+
+    The <type> can be "none" if metadata is used only for persistent IV.
+
+    For Authenticated Encryption with Additional Data (AEAD)
+    the <type> is "aead". An AEAD mode additionally calculates and verifies
+    integrity for the encrypted device. The additional space is then
+    used for storing authentication tag (and persistent IV if needed).
+
+sector_size:<bytes>
+    Use <bytes> as the encryption unit instead of 512 bytes sectors.
+    This option can be in range 512 - 4096 bytes and must be power of two.
+    Virtual device will announce this size as a minimal IO and logical sector.
+
+iv_large_sectors
+   IV generators will use sector number counted in <sector_size> units
+   instead of default 512 bytes sectors.
+
+   For example, if <sector_size> is 4096 bytes, plain64 IV for the second
+   sector will be 8 (without flag) and 1 if iv_large_sectors is present.
+   The <iv_offset> must be multiple of <sector_size> (in 512 bytes units)
+   if this flag is specified.
+
+Example scripts
+===============
+LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
+encryption with dm-crypt using the 'cryptsetup' utility, see
+https://gitlab.com/cryptsetup/cryptsetup
+
+::
+
+	#!/bin/sh
+	# Create a crypt device using dmsetup
+	dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
+
+::
+
+	#!/bin/sh
+	# Create a crypt device using dmsetup when encryption key is stored in keyring service
+	dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
+
+::
+
+	#!/bin/sh
+	# Create a crypt device using cryptsetup and LUKS header with default cipher
+	cryptsetup luksFormat $1
+	cryptsetup luksOpen $1 crypt1
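The iv_large_sectors arithmetic described in dm-crypt.rst above can be sketched
in shell. This is only an illustration of the documented example, assuming
<iv_offset> is 0 and the plain64 generator; the function name plain64_iv is
hypothetical and not part of any tool.

```shell
#!/bin/sh
# plain64_iv SECTOR_512 SECTOR_SIZE LARGE   (hypothetical helper)
#   SECTOR_512   start of the encryption unit, counted in 512-byte sectors
#   SECTOR_SIZE  value of the sector_size:<bytes> option
#   LARGE        1 if iv_large_sectors is set, 0 otherwise
# Assumes <iv_offset> is 0.
plain64_iv() {
	sector=$1
	sector_size=$2
	large=$3
	if [ "$large" = 1 ]; then
		# Count in <sector_size> units instead of 512-byte units.
		echo $(( sector / ( sector_size / 512 ) ))
	else
		echo "$sector"
	fi
}
```

With sector_size:4096, the second encryption unit starts at 512-byte sector 8,
so `plain64_iv 8 4096 0` prints 8 and `plain64_iv 8 4096 1` prints 1, matching
the values given in the text.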
diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt
deleted file mode 100644
index 3b3e1de21c9c..000000000000
--- a/Documentation/device-mapper/dm-crypt.txt
+++ /dev/null
@@ -1,162 +0,0 @@
-dm-crypt
-=========
-
-Device-Mapper's "crypt" target provides transparent encryption of block devices
-using the kernel crypto API.
-
-For a more detailed description of supported parameters see:
-https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt
-
-Parameters: <cipher> <key> <iv_offset> <device path> \
-	      <offset> [<#opt_params> <opt_params>]
-
-<cipher>
-    Encryption cipher, encryption mode and Initial Vector (IV) generator.
-
-    The cipher specifications format is:
-       cipher[:keycount]-chainmode-ivmode[:ivopts]
-    Examples:
-       aes-cbc-essiv:sha256
-       aes-xts-plain64
-       serpent-xts-plain64
-
-    Cipher format also supports direct specification with kernel crypt API
-    format (selected by capi: prefix). The IV specification is the same
-    as for the first format type.
-    This format is mainly used for specification of authenticated modes.
-
-    The crypto API cipher specifications format is:
-        capi:cipher_api_spec-ivmode[:ivopts]
-    Examples:
-        capi:cbc(aes)-essiv:sha256
-        capi:xts(aes)-plain64
-    Examples of authenticated modes:
-        capi:gcm(aes)-random
-        capi:authenc(hmac(sha256),xts(aes))-random
-        capi:rfc7539(chacha20,poly1305)-random
-
-    The /proc/crypto contains a list of curently loaded crypto modes.
-
-<key>
-    Key used for encryption. It is encoded either as a hexadecimal number
-    or it can be passed as <key_string> prefixed with single colon
-    character (':') for keys residing in kernel keyring service.
-    You can only use key sizes that are valid for the selected cipher
-    in combination with the selected iv mode.
-    Note that for some iv modes the key string can contain additional
-    keys (for example IV seed) so the key contains more parts concatenated
-    into a single string.
-
-<key_string>
-    The kernel keyring key is identified by string in following format:
-    <key_size>:<key_type>:<key_description>.
-
-<key_size>
-    The encryption key size in bytes. The kernel key payload size must match
-    the value passed in <key_size>.
-
-<key_type>
-    Either 'logon' or 'user' kernel key type.
-
-<key_description>
-    The kernel keyring key description crypt target should look for
-    when loading key of <key_type>.
-
-<keycount>
-    Multi-key compatibility mode. You can define <keycount> keys and
-    then sectors are encrypted according to their offsets (sector 0 uses key0;
-    sector 1 uses key1 etc.).  <keycount> must be a power of two.
-
-<iv_offset>
-    The IV offset is a sector count that is added to the sector number
-    before creating the IV.
-
-<device path>
-    This is the device that is going to be used as backend and contains the
-    encrypted data.  You can specify it as a path like /dev/xxx or a device
-    number <major>:<minor>.
-
-<offset>
-    Starting sector within the device where the encrypted data begins.
-
-<#opt_params>
-    Number of optional parameters. If there are no optional parameters,
-    the optional paramaters section can be skipped or #opt_params can be zero.
-    Otherwise #opt_params is the number of following arguments.
-
-    Example of optional parameters section:
-        3 allow_discards same_cpu_crypt submit_from_crypt_cpus
-
-allow_discards
-    Block discard requests (a.k.a. TRIM) are passed through the crypt device.
-    The default is to ignore discard requests.
-
-    WARNING: Assess the specific security risks carefully before enabling this
-    option.  For example, allowing discards on encrypted devices may lead to
-    the leak of information about the ciphertext device (filesystem type,
-    used space etc.) if the discarded blocks can be located easily on the
-    device later.
-
-same_cpu_crypt
-    Perform encryption using the same cpu that IO was submitted on.
-    The default is to use an unbound workqueue so that encryption work
-    is automatically balanced between available CPUs.
-
-submit_from_crypt_cpus
-    Disable offloading writes to a separate thread after encryption.
-    There are some situations where offloading write bios from the
-    encryption threads to a single thread degrades performance
-    significantly.  The default is to offload write bios to the same
-    thread because it benefits CFQ to have writes submitted using the
-    same context.
-
-integrity:<bytes>:<type>
-    The device requires additional <bytes> metadata per-sector stored
-    in per-bio integrity structure. This metadata must by provided
-    by underlying dm-integrity target.
-
-    The <type> can be "none" if metadata is used only for persistent IV.
-
-    For Authenticated Encryption with Additional Data (AEAD)
-    the <type> is "aead". An AEAD mode additionally calculates and verifies
-    integrity for the encrypted device. The additional space is then
-    used for storing authentication tag (and persistent IV if needed).
-
-sector_size:<bytes>
-    Use <bytes> as the encryption unit instead of 512 bytes sectors.
-    This option can be in range 512 - 4096 bytes and must be power of two.
-    Virtual device will announce this size as a minimal IO and logical sector.
-
-iv_large_sectors
-   IV generators will use sector number counted in <sector_size> units
-   instead of default 512 bytes sectors.
-
-   For example, if <sector_size> is 4096 bytes, plain64 IV for the second
-   sector will be 8 (without flag) and 1 if iv_large_sectors is present.
-   The <iv_offset> must be multiple of <sector_size> (in 512 bytes units)
-   if this flag is specified.
-
-Example scripts
-===============
-LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
-encryption with dm-crypt using the 'cryptsetup' utility, see
-https://gitlab.com/cryptsetup/cryptsetup
-
-[[
-#!/bin/sh
-# Create a crypt device using dmsetup
-dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
-]]
-
-[[
-#!/bin/sh
-# Create a crypt device using dmsetup when encryption key is stored in keyring service
-dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
-]]
-
-[[
-#!/bin/sh
-# Create a crypt device using cryptsetup and LUKS header with default cipher
-cryptsetup luksFormat $1
-cryptsetup luksOpen $1 crypt1
-]]
diff --git a/Documentation/device-mapper/dm-flakey.rst b/Documentation/device-mapper/dm-flakey.rst
new file mode 100644
index 000000000000..86138735879d
--- /dev/null
+++ b/Documentation/device-mapper/dm-flakey.rst
@@ -0,0 +1,74 @@
+=========
+dm-flakey
+=========
+
+This target is the same as the linear target except that it exhibits
+unreliable behaviour periodically.  It's been found useful in simulating
+failing devices for testing purposes.
+
+Starting from the time the table is loaded, the device is available for
+<up interval> seconds, then exhibits unreliable behaviour for <down
+interval> seconds, and then this cycle repeats.
+
+Also, consider using this in combination with the dm-delay target too,
+which can delay reads and writes and/or send them to different
+underlying devices.
+
+Table parameters
+----------------
+
+::
+
+  <dev path> <offset> <up interval> <down interval> \
+    [<num_features> [<feature arguments>]]
+
+Mandatory parameters:
+
+    <dev path>:
+        Full pathname to the underlying block-device, or a
+        "major:minor" device-number.
+    <offset>:
+        Starting sector within the device.
+    <up interval>:
+        Number of seconds device is available.
+    <down interval>:
+        Number of seconds device returns errors.
+
+Optional feature parameters:
+
+  If no feature parameters are present, during the periods of
+  unreliability, all I/O returns errors.
+
+  drop_writes:
+	All write I/O is silently ignored.
+	Read I/O is handled correctly.
+
+  error_writes:
+	All write I/O is failed with an error signalled.
+	Read I/O is handled correctly.
+
+  corrupt_bio_byte <Nth_byte> <direction> <value> <flags>:
+	During <down interval>, replace <Nth_byte> of the data of
+	each matching bio with <value>.
+
+    <Nth_byte>:
+	The offset of the byte to replace.
+	Counting starts at 1, to replace the first byte.
+    <direction>:
+	Either 'r' to corrupt reads or 'w' to corrupt writes.
+	'w' is incompatible with drop_writes.
+    <value>:
+	The value (from 0-255) to write.
+    <flags>:
+	Perform the replacement only if bio->bi_opf has all the
+	selected flags set.
+
+Examples:
+
+Replaces the 32nd byte of READ bios with the value 1::
+
+  corrupt_bio_byte 32 r 1 0
+
+Replaces the 224th byte of REQ_META (=32) bios with the value 0::
+
+  corrupt_bio_byte 224 w 0 32
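The feature examples in dm-flakey.rst above show only the feature arguments. A
minimal sketch of assembling a complete table line follows. It assumes that
<num_features> counts every word that follows it (so "corrupt_bio_byte 32 r 1 0"
counts as 5); the function name flakey_table and the device path are
hypothetical.

```shell
#!/bin/sh
# flakey_table SIZE DEV OFFSET UP DOWN [FEATURE_ARG...]   (hypothetical helper)
# Prints a dm-flakey table line; SIZE is the mapped length in 512-byte sectors.
flakey_table() {
	size=$1
	dev=$2
	offset=$3
	up=$4
	down=$5
	shift 5
	table="0 $size flakey $dev $offset $up $down"
	if [ "$#" -gt 0 ]; then
		# Assumption: <num_features> is the number of words that follow.
		table="$table $# $*"
	fi
	printf '%s\n' "$table"
}
```

A possible use, on a scratch device only:
`dmsetup create flakey1 --table "$(flakey_table 409600 /dev/sdb 0 30 5 corrupt_bio_byte 32 r 1 0)"`.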
diff --git a/Documentation/device-mapper/dm-flakey.txt b/Documentation/device-mapper/dm-flakey.txt
deleted file mode 100644
index 9f0e247d0877..000000000000
--- a/Documentation/device-mapper/dm-flakey.txt
+++ /dev/null
@@ -1,57 +0,0 @@
-dm-flakey
-=========
-
-This target is the same as the linear target except that it exhibits
-unreliable behaviour periodically.  It's been found useful in simulating
-failing devices for testing purposes.
-
-Starting from the time the table is loaded, the device is available for
-<up interval> seconds, then exhibits unreliable behaviour for <down
-interval> seconds, and then this cycle repeats.
-
-Also, consider using this in combination with the dm-delay target too,
-which can delay reads and writes and/or send them to different
-underlying devices.
-
-Table parameters
-----------------
-  <dev path> <offset> <up interval> <down interval> \
-    [<num_features> [<feature arguments>]]
-
-Mandatory parameters:
-    <dev path>: Full pathname to the underlying block-device, or a
-                "major:minor" device-number.
-    <offset>: Starting sector within the device.
-    <up interval>: Number of seconds device is available.
-    <down interval>: Number of seconds device returns errors.
-
-Optional feature parameters:
-  If no feature parameters are present, during the periods of
-  unreliability, all I/O returns errors.
-
-  drop_writes:
-	All write I/O is silently ignored.
-	Read I/O is handled correctly.
-
-  error_writes:
-	All write I/O is failed with an error signalled.
-	Read I/O is handled correctly.
-
-  corrupt_bio_byte <Nth_byte> <direction> <value> <flags>:
-	During <down interval>, replace <Nth_byte> of the data of
-	each matching bio with <value>.
-
-    <Nth_byte>: The offset of the byte to replace.
-		Counting starts at 1, to replace the first byte.
-    <direction>: Either 'r' to corrupt reads or 'w' to corrupt writes.
-		 'w' is incompatible with drop_writes.
-    <value>: The value (from 0-255) to write.
-    <flags>: Perform the replacement only if bio->bi_opf has all the
-	     selected flags set.
-
-Examples:
-  corrupt_bio_byte 32 r 1 0
-	- replaces the 32nd byte of READ bios with the value 1
-
-  corrupt_bio_byte 224 w 0 32
-	- replaces the 224th byte of REQ_META (=32) bios with the value 0
diff --git a/Documentation/device-mapper/dm-init.rst b/Documentation/device-mapper/dm-init.rst
new file mode 100644
index 000000000000..e5242ff17e9b
--- /dev/null
+++ b/Documentation/device-mapper/dm-init.rst
@@ -0,0 +1,125 @@
+================================
+Early creation of mapped devices
+================================
+
+It is possible to configure a device-mapper device to act as the root device for
+your system in two ways.
+
+The first is to build an initial ramdisk which boots to a minimal userspace
+which configures the device, then pivot_root(8) in to it.
+
+The second is to create one or more device-mappers using the module parameter
+"dm-mod.create=" through the kernel boot command line argument.
+
+The format is specified as a string of data separated by commas and optionally
+semi-colons, where:
+
+ - a comma is used to separate fields like name, uuid, flags and table
+   (specifies one device)
+ - a semi-colon is used to separate devices.
+
+So the format will look like this::
+
+ dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
+
+Where::
+
+	<name>		::= The device name.
+	<uuid>		::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
+	<minor>		::= The device minor number | ""
+	<flags>		::= "ro" | "rw"
+	<table>		::= <start_sector> <num_sectors> <target_type> <target_args>
+	<target_type>	::= "verity" | "linear" | ... (see list below)
+
+The dm line should be equivalent to the one used by the dmsetup tool with the
+`--concise` argument.
+
+Target types
+============
+
+Not all target types are available as there are serious risks in allowing
+activation of certain DM targets without first using userspace tools to check
+the validity of associated metadata.
+
+======================= =======================================================
+`cache`			constrained, userspace should verify cache device
+`crypt`			allowed
+`delay`			allowed
+`era`			constrained, userspace should verify metadata device
+`flakey`		constrained, meant for test
+`linear`		allowed
+`log-writes`		constrained, userspace should verify metadata device
+`mirror`		constrained, userspace should verify main/mirror device
+`raid`			constrained, userspace should verify metadata device
+`snapshot`		constrained, userspace should verify src/dst device
+`snapshot-origin`	allowed
+`snapshot-merge`	constrained, userspace should verify src/dst device
+`striped`		allowed
+`switch`		constrained, userspace should verify dev path
+`thin`			constrained, requires dm target message from userspace
+`thin-pool`		constrained, requires dm target message from userspace
+`verity`		allowed
+`writecache`		constrained, userspace should verify cache device
+`zero`			constrained, not meant for rootfs
+======================= =======================================================
+
+If the target is not listed above, it is constrained by default (not tested).
+
+Examples
+========
+An example of booting to a linear array made up of user-mode linux block
+devices::
+
+  dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
+
+This will boot to a rw dm-linear target of 8192 sectors split across two block
+devices identified by their major:minor numbers.  After boot, udev will rename
+this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned.
+
+An example of multiple device-mappers, with the dm-mod.create="..." contents
+is shown here split on multiple lines for readability::
+
+  dm-linear,,1,rw,
+    0 32768 linear 8:1 0,
+    32768 1024000 linear 8:2 0;
+  dm-verity,,3,ro,
+    0 1638400 verity 1 /dev/sdc1 /dev/sdc2 4096 4096 204800 1 sha256
+    ac87db56303c9c1da433d7209b5a6ef3e4779df141200cbd7c157dcb8dd89c42
+    5ebfe87f7df3235b80a117ebc4078e44f55045487ad4a96581d1adb564615b51
+
+Other examples (per target):
+
+"crypt"::
+
+  dm-crypt,,8,ro,
+    0 1048576 crypt aes-xts-plain64
+    babebabebabebabebabebabebabebabebabebabebabebabebabebabebabebabe 0
+    /dev/sda 0 1 allow_discards
+
+"delay"::
+
+  dm-delay,,4,ro,0 409600 delay /dev/sda1 0 500
+
+"linear"::
+
+  dm-linear,,,rw,
+    0 32768 linear /dev/sda1 0,
+    32768 1024000 linear /dev/sda2 0,
+    1056768 204800 linear /dev/sda3 0,
+    1261568 512000 linear /dev/sda4 0
+
+"snapshot-origin"::
+
+  dm-snap-orig,,4,ro,0 409600 snapshot-origin 8:2
+
+"striped"::
+
+  dm-striped,,4,ro,0 1638400 striped 4 4096
+  /dev/sda1 0 /dev/sda2 0 /dev/sda3 0 /dev/sda4 0
+
+"verity"::
+
+  dm-verity,,4,ro,
+    0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256
+    fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd
+    51934789604d1b92399c52e7cb149d1b3a1b74bbbcb103b2a0aaacbed5c08584
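The multi-line examples in dm-init.rst above have to be collapsed into a single
string before they can be passed as dm-mod.create="...". A small sketch of that
step follows; the function name dm_create_arg is hypothetical, and joining the
lines with single spaces is an assumption based on the one-line "lroot" example,
which already has a space after each comma.

```shell
#!/bin/sh
# dm_create_arg   (hypothetical helper)
# Reads a multi-line device specification on stdin, trims the indentation
# of each line and joins the lines with single spaces.
dm_create_arg() {
	sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | paste -s -d ' ' -
}
```

The result can then be quoted on the kernel command line, for example
`dm-mod.create="$(dm_create_arg < spec.txt)"`, where spec.txt is a placeholder
file holding one of the examples.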
diff --git a/Documentation/device-mapper/dm-init.txt b/Documentation/device-mapper/dm-init.txt
deleted file mode 100644
index 130b3c3679c5..000000000000
--- a/Documentation/device-mapper/dm-init.txt
+++ /dev/null
@@ -1,114 +0,0 @@
-Early creation of mapped devices
-====================================
-
-It is possible to configure a device-mapper device to act as the root device for
-your system in two ways.
-
-The first is to build an initial ramdisk which boots to a minimal userspace
-which configures the device, then pivot_root(8) in to it.
-
-The second is to create one or more device-mappers using the module parameter
-"dm-mod.create=" through the kernel boot command line argument.
-
-The format is specified as a string of data separated by commas and optionally
-semi-colons, where:
- - a comma is used to separate fields like name, uuid, flags and table
-   (specifies one device)
- - a semi-colon is used to separate devices.
-
-So the format will look like this:
-
- dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
-
-Where,
-	<name>		::= The device name.
-	<uuid>		::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
-	<minor>		::= The device minor number | ""
-	<flags>		::= "ro" | "rw"
-	<table>		::= <start_sector> <num_sectors> <target_type> <target_args>
-	<target_type>	::= "verity" | "linear" | ... (see list below)
-
-The dm line should be equivalent to the one used by the dmsetup tool with the
---concise argument.
-
-Target types
-============
-
-Not all target types are available as there are serious risks in allowing
-activation of certain DM targets without first using userspace tools to check
-the validity of associated metadata.
-
-	"cache":		constrained, userspace should verify cache device
-	"crypt":		allowed
-	"delay":		allowed
-	"era":			constrained, userspace should verify metadata device
-	"flakey":		constrained, meant for test
-	"linear":		allowed
-	"log-writes":		constrained, userspace should verify metadata device
-	"mirror":		constrained, userspace should verify main/mirror device
-	"raid":			constrained, userspace should verify metadata device
-	"snapshot":		constrained, userspace should verify src/dst device
-	"snapshot-origin":	allowed
-	"snapshot-merge":	constrained, userspace should verify src/dst device
-	"striped":		allowed
-	"switch":		constrained, userspace should verify dev path
-	"thin":			constrained, requires dm target message from userspace
-	"thin-pool":		constrained, requires dm target message from userspace
-	"verity":		allowed
-	"writecache":		constrained, userspace should verify cache device
-	"zero":			constrained, not meant for rootfs
-
-If the target is not listed above, it is constrained by default (not tested).
-
-Examples
-========
-An example of booting to a linear array made up of user-mode linux block
-devices:
-
-  dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
-
-This will boot to a rw dm-linear target of 8192 sectors split across two block
-devices identified by their major:minor numbers.  After boot, udev will rename
-this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned.
-
-An example of multiple device-mappers, with the dm-mod.create="..." contents is shown here
-split on multiple lines for readability:
-
-  dm-linear,,1,rw,
-    0 32768 linear 8:1 0,
-    32768 1024000 linear 8:2 0;
-  dm-verity,,3,ro,
-    0 1638400 verity 1 /dev/sdc1 /dev/sdc2 4096 4096 204800 1 sha256
-    ac87db56303c9c1da433d7209b5a6ef3e4779df141200cbd7c157dcb8dd89c42
-    5ebfe87f7df3235b80a117ebc4078e44f55045487ad4a96581d1adb564615b51
-
-Other examples (per target):
-
-"crypt":
-  dm-crypt,,8,ro,
-    0 1048576 crypt aes-xts-plain64
-    babebabebabebabebabebabebabebabebabebabebabebabebabebabebabebabe 0
-    /dev/sda 0 1 allow_discards
-
-"delay":
-  dm-delay,,4,ro,0 409600 delay /dev/sda1 0 500
-
-"linear":
-  dm-linear,,,rw,
-    0 32768 linear /dev/sda1 0,
-    32768 1024000 linear /dev/sda2 0,
-    1056768 204800 linear /dev/sda3 0,
-    1261568 512000 linear /dev/sda4 0
-
-"snapshot-origin":
-  dm-snap-orig,,4,ro,0 409600 snapshot-origin 8:2
-
-"striped":
-  dm-striped,,4,ro,0 1638400 striped 4 4096
-  /dev/sda1 0 /dev/sda2 0 /dev/sda3 0 /dev/sda4 0
-
-"verity":
-  dm-verity,,4,ro,
-    0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256
-    fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd
-    51934789604d1b92399c52e7cb149d1b3a1b74bbbcb103b2a0aaacbed5c08584
diff --git a/Documentation/device-mapper/dm-integrity.rst b/Documentation/device-mapper/dm-integrity.rst
new file mode 100644
index 000000000000..a30aa91b5fbe
--- /dev/null
+++ b/Documentation/device-mapper/dm-integrity.rst
@@ -0,0 +1,259 @@
+============
+dm-integrity
+============
+
+The dm-integrity target emulates a block device that has additional
+per-sector tags that can be used for storing integrity information.
+
+A general problem with storing integrity tags with every sector is that
+writing the sector and the integrity tag must be atomic - i.e. in case of
+crash, either both sector and integrity tag or none of them is written.
+
+To guarantee write atomicity, the dm-integrity target uses a journal: it
+writes sector data and integrity tags into a journal, commits the journal
+and then copies the data and integrity tags to their respective location.
+
+The dm-integrity target can be used with the dm-crypt target - in this
+situation the dm-crypt target creates the integrity data and passes them
+to the dm-integrity target via bio_integrity_payload attached to the bio.
+In this mode, the dm-crypt and dm-integrity targets provide authenticated
+disk encryption - if the attacker modifies the encrypted device, an I/O
+error is returned instead of random data.
+
+The dm-integrity target can also be used as a standalone target, in this
+mode it calculates and verifies the integrity tag internally. In this
+mode, the dm-integrity target can be used to detect silent data
+corruption on the disk or in the I/O path.
+
+There's an alternate mode of operation where dm-integrity uses a bitmap
+instead of a journal. If a bit in the bitmap is 1, the corresponding
+region's data and integrity tags are not synchronized - if the machine
+crashes, the unsynchronized regions will be recalculated. The bitmap mode
+is faster than the journal mode, because we don't have to write the data
+twice, but it is also less reliable, because if data corruption happens
+when the machine crashes, it may not be detected.
+
+When loading the target for the first time, the kernel driver will format
+the device. But it will only format the device if the superblock contains
+zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
+target can't be loaded.
+
+To use the target for the first time:
+
+1. overwrite the superblock with zeroes
+2. load the dm-integrity target with one-sector size, the kernel driver
+   will format the device
+3. unload the dm-integrity target
+4. read the "provided_data_sectors" value from the superblock
+5. load the dm-integrity target with the target size
+   "provided_data_sectors"
+6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
+   with the size "provided_data_sectors"
+
+
+Target arguments:
+
+1. the underlying block device
+
+2. the number of reserved sectors at the beginning of the device - the
+   dm-integrity won't read or write these sectors
+
+3. the size of the integrity tag (if "-" is used, the size is taken from
+   the internal-hash algorithm)
+
+4. mode:
+
+	D - direct writes (without journal)
+		in this mode, journaling is
+		not used and data sectors and integrity tags are written
+		separately. In case of crash, it is possible that the data
+		and integrity tag don't match.
+	J - journaled writes
+		data and integrity tags are written to the
+		journal and atomicity is guaranteed. In case of crash,
+		either both data and tag or none of them are written. The
+		journaled mode halves write throughput because the
+		data have to be written twice.
+	B - bitmap mode - data and metadata are written without any
+		synchronization, the driver maintains a bitmap of dirty
+		regions where data and metadata don't match. This mode can
+		only be used with internal hash.
+	R - recovery mode - in this mode, journal is not replayed,
+		checksums are not checked and writes to the device are not
+		allowed. This mode is useful for data recovery if the
+		device cannot be activated in any of the other standard
+		modes.
+
+5. the number of additional arguments
+
+Additional arguments:
+
+journal_sectors:number
+	The size of journal, this argument is used only if formatting the
+	device. If the device is already formatted, the value from the
+	superblock is used.
+
+interleave_sectors:number
+	The number of interleaved sectors. This value is rounded down to
+	a power of two. If the device is already formatted, the value from
+	the superblock is used.
+
+meta_device:device
+	Don't interleave the data and metadata on one device. Use a
+	separate device for metadata.
+
+buffer_sectors:number
+	The number of sectors in one buffer. The value is rounded down to
+	a power of two.
+
+	The tag area is accessed using buffers, the buffer size is
+	configurable. The large buffer size means that the I/O size will
+	be larger, but there could be fewer I/Os issued.
+
+journal_watermark:number
+	The journal watermark in percents. When the size of the journal
+	exceeds this watermark, the thread that flushes the journal will
+	be started.
+
+commit_time:number
+	Commit time in milliseconds. When this time passes, the journal is
+	written. The journal is also written immediately if the FLUSH
+	request is received.
+
+internal_hash:algorithm(:key)	(the key is optional)
+	Use internal hash or crc.
+	When this argument is used, the dm-integrity target won't accept
+	integrity tags from the upper target, but it will automatically
+	generate and verify the integrity tags.
+
+	You can use a crc algorithm (such as crc32), then integrity target
+	will protect the data against accidental corruption.
+	You can also use a hmac algorithm (for example
+	"hmac(sha256):0123456789abcdef"), in this mode it will provide
+	cryptographic authentication of the data without encryption.
+
+	When this argument is not used, the integrity tags are accepted
+	from an upper layer target, such as dm-crypt. The upper layer
+	target should check the validity of the integrity tags.
+
+recalculate
+	Recalculate the integrity tags automatically. It is only valid
+	when using internal hash.
+
+journal_crypt:algorithm(:key)	(the key is optional)
+	Encrypt the journal using given algorithm to make sure that the
+	attacker can't read the journal. You can use a block cipher here
+	(such as "cbc(aes)") or a stream cipher (for example "chacha20",
+	"salsa20", "ctr(aes)" or "ecb(arc4)").
+
+	The journal contains history of last writes to the block device,
+	an attacker reading the journal could see the last sector numbers
+	that were written. From the sector numbers, the attacker can infer
+	the size of files that were written. To protect against this
+	situation, you can encrypt the journal.
+
+journal_mac:algorithm(:key)	(the key is optional)
+	Protect sector numbers in the journal from accidental or malicious
+	modification. To protect against accidental modification, use a
+	crc algorithm, to protect against malicious modification, use a
+	hmac algorithm with a key.
+
+	This option is not needed when using internal-hash because in this
+	mode, the integrity of journal entries is checked when replaying
+	the journal. Thus, modified sector number would be detected at
+	this stage.
+
+block_size:number
+	The size of a data block in bytes.  The larger the block size the
+	less overhead there is for per-block integrity metadata.
+	Supported values are 512, 1024, 2048 and 4096 bytes.  If not
+	specified the default block size is 512 bytes.
+
+sectors_per_bit:number
+	In the bitmap mode, this parameter specifies the number of
+	512-byte sectors that corresponds to one bitmap bit.
+
+bitmap_flush_interval:number
+	The bitmap flush interval in milliseconds. The metadata buffers
+	are synchronized when this interval expires.
+
+
+The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
+be changed when reloading the target (load an inactive table and swap the
+tables with suspend and resume). The other arguments should not be changed
+when reloading the target because the layout of disk data depends on them
+and the reloaded target would be non-functional.
+
+
+The layout of the formatted block device:
+
+* reserved sectors
+    (they are not used by this target, they can be used for
+    storing LUKS metadata or for other purpose), the size of the reserved
+    area is specified in the target arguments
+
+* superblock (4kiB)
+	* magic string - identifies that the device was formatted
+	* version
+	* log2(interleave sectors)
+	* integrity tag size
+	* the number of journal sections
+	* provided data sectors - the number of sectors that this target
+	  provides (i.e. the size of the device minus the size of all
+	  metadata and padding). The user of this target should not send
+	  bios that access data beyond the "provided data sectors" limit.
+	* flags
+	    SB_FLAG_HAVE_JOURNAL_MAC
+		- a flag is set if journal_mac is used
+	    SB_FLAG_RECALCULATING
+		- recalculating is in progress
+	    SB_FLAG_DIRTY_BITMAP
+		- journal area contains the bitmap of dirty
+		  blocks
+	* log2(sectors per block)
+	* a position where recalculating finished
+* journal
+	The journal is divided into sections, each section contains:
+
+	* metadata area (4kiB), it contains journal entries
+
+	  - every journal entry contains:
+
+		* logical sector (specifies where the data and tag should
+		  be written)
+		* last 8 bytes of data
+		* integrity tag (the size is specified in the superblock)
+
+	  - every metadata sector ends with
+
+		* mac (8 bytes), all the macs in 8 metadata sectors form a
+		  64-byte value. It is used to store hmac of sector
+		  numbers in the journal section, to protect against a
+		  possibility that the attacker tampers with sector
+		  numbers in the journal.
+		* commit id
+
+	* data area (the size is variable; it depends on how many journal
+	  entries fit into the metadata area)
+
+	    - every sector in the data area contains:
+
+		* data (504 bytes of data, the last 8 bytes are stored in
+		  the journal entry)
+		* commit id
+
+	To test if the whole journal section was written correctly, every
+	512-byte sector of the journal ends with an 8-byte commit id. If the
+	commit id matches on all sectors in a journal section, then it is
+	assumed that the section was written correctly. If the commit id
+	doesn't match, the section was written partially and it should not
+	be replayed.
+
+* one or more runs of interleaved tags and data.
+    Each run contains:
+
+	* tag area - it contains integrity tags. There is one tag for each
+	  sector in the data area
+	* data area - it contains data sectors. The number of data sectors
+	  in one run must be a power of two. log2 of this value is stored
+	  in the superblock.
diff --git a/Documentation/device-mapper/dm-integrity.txt b/Documentation/device-mapper/dm-integrity.txt
deleted file mode 100644
index d63d78ffeb73..000000000000
--- a/Documentation/device-mapper/dm-integrity.txt
+++ /dev/null
@@ -1,233 +0,0 @@
-The dm-integrity target emulates a block device that has additional
-per-sector tags that can be used for storing integrity information.
-
-A general problem with storing integrity tags with every sector is that
-writing the sector and the integrity tag must be atomic - i.e. in case of
-crash, either both sector and integrity tag or none of them is written.
-
-To guarantee write atomicity, the dm-integrity target uses journal, it
-writes sector data and integrity tags into a journal, commits the journal
-and then copies the data and integrity tags to their respective location.
-
-The dm-integrity target can be used with the dm-crypt target - in this
-situation the dm-crypt target creates the integrity data and passes them
-to the dm-integrity target via bio_integrity_payload attached to the bio.
-In this mode, the dm-crypt and dm-integrity targets provide authenticated
-disk encryption - if the attacker modifies the encrypted device, an I/O
-error is returned instead of random data.
-
-The dm-integrity target can also be used as a standalone target, in this
-mode it calculates and verifies the integrity tag internally. In this
-mode, the dm-integrity target can be used to detect silent data
-corruption on the disk or in the I/O path.
-
-There's an alternate mode of operation where dm-integrity uses bitmap
-instead of a journal. If a bit in the bitmap is 1, the corresponding
-region's data and integrity tags are not synchronized - if the machine
-crashes, the unsynchronized regions will be recalculated. The bitmap mode
-is faster than the journal mode, because we don't have to write the data
-twice, but it is also less reliable, because if data corruption happens
-when the machine crashes, it may not be detected.
-
-When loading the target for the first time, the kernel driver will format
-the device. But it will only format the device if the superblock contains
-zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
-target can't be loaded.
-
-To use the target for the first time:
-1. overwrite the superblock with zeroes
-2. load the dm-integrity target with one-sector size, the kernel driver
-	will format the device
-3. unload the dm-integrity target
-4. read the "provided_data_sectors" value from the superblock
-5. load the dm-integrity target with the the target size
-	"provided_data_sectors"
-6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
-	with the size "provided_data_sectors"
-
-
-Target arguments:
-
-1. the underlying block device
-
-2. the number of reserved sector at the beginning of the device - the
-	dm-integrity won't read of write these sectors
-
-3. the size of the integrity tag (if "-" is used, the size is taken from
-	the internal-hash algorithm)
-
-4. mode:
-	D - direct writes (without journal) - in this mode, journaling is
-		not used and data sectors and integrity tags are written
-		separately. In case of crash, it is possible that the data
-		and integrity tag doesn't match.
-	J - journaled writes - data and integrity tags are written to the
-		journal and atomicity is guaranteed. In case of crash,
-		either both data and tag or none of them are written. The
-		journaled mode degrades write throughput twice because the
-		data have to be written twice.
-	B - bitmap mode - data and metadata are written without any
-		synchronization, the driver maintains a bitmap of dirty
-		regions where data and metadata don't match. This mode can
-		only be used with internal hash.
-	R - recovery mode - in this mode, journal is not replayed,
-		checksums are not checked and writes to the device are not
-		allowed. This mode is useful for data recovery if the
-		device cannot be activated in any of the other standard
-		modes.
-
-5. the number of additional arguments
-
-Additional arguments:
-
-journal_sectors:number
-	The size of journal, this argument is used only if formatting the
-	device. If the device is already formatted, the value from the
-	superblock is used.
-
-interleave_sectors:number
-	The number of interleaved sectors. This values is rounded down to
-	a power of two. If the device is already formatted, the value from
-	the superblock is used.
-
-meta_device:device
-	Don't interleave the data and metadata on on device. Use a
-	separate device for metadata.
-
-buffer_sectors:number
-	The number of sectors in one buffer. The value is rounded down to
-	a power of two.
-
-	The tag area is accessed using buffers, the buffer size is
-	configurable. The large buffer size means that the I/O size will
-	be larger, but there could be less I/Os issued.
-
-journal_watermark:number
-	The journal watermark in percents. When the size of the journal
-	exceeds this watermark, the thread that flushes the journal will
-	be started.
-
-commit_time:number
-	Commit time in milliseconds. When this time passes, the journal is
-	written. The journal is also written immediatelly if the FLUSH
-	request is received.
-
-internal_hash:algorithm(:key)	(the key is optional)
-	Use internal hash or crc.
-	When this argument is used, the dm-integrity target won't accept
-	integrity tags from the upper target, but it will automatically
-	generate and verify the integrity tags.
-
-	You can use a crc algorithm (such as crc32), then integrity target
-	will protect the data against accidental corruption.
-	You can also use a hmac algorithm (for example
-	"hmac(sha256):0123456789abcdef"), in this mode it will provide
-	cryptographic authentication of the data without encryption.
-
-	When this argument is not used, the integrity tags are accepted
-	from an upper layer target, such as dm-crypt. The upper layer
-	target should check the validity of the integrity tags.
-
-recalculate
-	Recalculate the integrity tags automatically. It is only valid
-	when using internal hash.
-
-journal_crypt:algorithm(:key)	(the key is optional)
-	Encrypt the journal using given algorithm to make sure that the
-	attacker can't read the journal. You can use a block cipher here
-	(such as "cbc(aes)") or a stream cipher (for example "chacha20",
-	"salsa20", "ctr(aes)" or "ecb(arc4)").
-
-	The journal contains history of last writes to the block device,
-	an attacker reading the journal could see the last sector nubmers
-	that were written. From the sector numbers, the attacker can infer
-	the size of files that were written. To protect against this
-	situation, you can encrypt the journal.
-
-journal_mac:algorithm(:key)	(the key is optional)
-	Protect sector numbers in the journal from accidental or malicious
-	modification. To protect against accidental modification, use a
-	crc algorithm, to protect against malicious modification, use a
-	hmac algorithm with a key.
-
-	This option is not needed when using internal-hash because in this
-	mode, the integrity of journal entries is checked when replaying
-	the journal. Thus, modified sector number would be detected at
-	this stage.
-
-block_size:number
-	The size of a data block in bytes.  The larger the block size the
-	less overhead there is for per-block integrity metadata.
-	Supported values are 512, 1024, 2048 and 4096 bytes.  If not
-	specified the default block size is 512 bytes.
-
-sectors_per_bit:number
-	In the bitmap mode, this parameter specifies the number of
-	512-byte sectors that corresponds to one bitmap bit.
-
-bitmap_flush_interval:number
-	The bitmap flush interval in milliseconds. The metadata buffers
-	are synchronized when this interval expires.
-
-
-The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
-be changed when reloading the target (load an inactive table and swap the
-tables with suspend and resume). The other arguments should not be changed
-when reloading the target because the layout of disk data depend on them
-and the reloaded target would be non-functional.
-
-
-The layout of the formatted block device:
-* reserved sectors (they are not used by this target, they can be used for
-  storing LUKS metadata or for other purpose), the size of the reserved
-  area is specified in the target arguments
-* superblock (4kiB)
-	* magic string - identifies that the device was formatted
-	* version
-	* log2(interleave sectors)
-	* integrity tag size
-	* the number of journal sections
-	* provided data sectors - the number of sectors that this target
-	  provides (i.e. the size of the device minus the size of all
-	  metadata and padding). The user of this target should not send
-	  bios that access data beyond the "provided data sectors" limit.
-	* flags
-	  SB_FLAG_HAVE_JOURNAL_MAC - a flag is set if journal_mac is used
-	  SB_FLAG_RECALCULATING - recalculating is in progress
-	  SB_FLAG_DIRTY_BITMAP - journal area contains the bitmap of dirty
-		blocks
-	* log2(sectors per block)
-	* a position where recalculating finished
-* journal
-	The journal is divided into sections, each section contains:
-	* metadata area (4kiB), it contains journal entries
-	  every journal entry contains:
-		* logical sector (specifies where the data and tag should
-		  be written)
-		* last 8 bytes of data
-		* integrity tag (the size is specified in the superblock)
-	    every metadata sector ends with
-		* mac (8-bytes), all the macs in 8 metadata sectors form a
-		  64-byte value. It is used to store hmac of sector
-		  numbers in the journal section, to protect against a
-		  possibility that the attacker tampers with sector
-		  numbers in the journal.
-		* commit id
-	* data area (the size is variable; it depends on how many journal
-	  entries fit into the metadata area)
-	    every sector in the data area contains:
-		* data (504 bytes of data, the last 8 bytes are stored in
-		  the journal entry)
-		* commit id
-	To test if the whole journal section was written correctly, every
-	512-byte sector of the journal ends with 8-byte commit id. If the
-	commit id matches on all sectors in a journal section, then it is
-	assumed that the section was written correctly. If the commit id
-	doesn't match, the section was written partially and it should not
-	be replayed.
-* one or more runs of interleaved tags and data. Each run contains:
-	* tag area - it contains integrity tags. There is one tag for each
-	  sector in the data area
-	* data area - it contains data sectors. The number of data sectors
-	  in one run must be a power of two. log2 of this value is stored
-	  in the superblock.
diff --git a/Documentation/device-mapper/dm-io.rst b/Documentation/device-mapper/dm-io.rst
new file mode 100644
index 000000000000..d2492917a1f5
--- /dev/null
+++ b/Documentation/device-mapper/dm-io.rst
@@ -0,0 +1,75 @@
+=====
+dm-io
+=====
+
+Dm-io provides synchronous and asynchronous I/O services. There are three
+types of I/O services available, and each type has a sync and an async
+version.
+
+The user must set up an io_region structure to describe the desired location
+of the I/O. Each io_region indicates a block-device along with the starting
+sector and size of the region::
+
+   struct io_region {
+      struct block_device *bdev;
+      sector_t sector;
+      sector_t count;
+   };
+
+Dm-io can read from one io_region or write to one or more io_regions. Writes
+to multiple regions are specified by an array of io_region structures.
+
+The first I/O service type takes a list of memory pages as the data buffer for
+the I/O, along with an offset into the first page::
+
+   struct page_list {
+      struct page_list *next;
+      struct page *page;
+   };
+
+   int dm_io_sync(unsigned int num_regions, struct io_region *where, int rw,
+                  struct page_list *pl, unsigned int offset,
+                  unsigned long *error_bits);
+   int dm_io_async(unsigned int num_regions, struct io_region *where, int rw,
+                   struct page_list *pl, unsigned int offset,
+                   io_notify_fn fn, void *context);
+
+The second I/O service type takes an array of bio vectors as the data buffer
+for the I/O. This service can be handy if the caller has a pre-assembled bio,
+but wants to direct different portions of the bio to different devices::
+
+   int dm_io_sync_bvec(unsigned int num_regions, struct io_region *where,
+                       int rw, struct bio_vec *bvec,
+                       unsigned long *error_bits);
+   int dm_io_async_bvec(unsigned int num_regions, struct io_region *where,
+                        int rw, struct bio_vec *bvec,
+                        io_notify_fn fn, void *context);
+
+The third I/O service type takes a pointer to a vmalloc'd memory buffer as the
+data buffer for the I/O. This service can be handy if the caller needs to do
+I/O to a large region but doesn't want to allocate a large number of individual
+memory pages::
+
+   int dm_io_sync_vm(unsigned int num_regions, struct io_region *where, int rw,
+                     void *data, unsigned long *error_bits);
+   int dm_io_async_vm(unsigned int num_regions, struct io_region *where, int rw,
+                      void *data, io_notify_fn fn, void *context);
+
+Callers of the asynchronous I/O services must include the name of a completion
+callback routine and a pointer to some context data for the I/O::
+
+   typedef void (*io_notify_fn)(unsigned long error, void *context);
+
+The "error" parameter in this callback, as well as the ``*error_bits`` parameter
+in all of the synchronous versions, is a bitset (instead of a simple error value).
+In the case of a write-I/O to multiple regions, this bitset allows dm-io to
+indicate success or failure on each individual region.
+
+Before using any of the dm-io services, the user should call dm_io_get()
+and specify the number of pages they expect to perform I/O on concurrently.
+Dm-io will attempt to resize its mempool to make sure enough pages are
+always available in order to avoid unnecessary waiting while performing I/O.
+
+When the user is finished using the dm-io services, they should call
+dm_io_put() and specify the same number of pages that were given on the
+dm_io_get() call.
diff --git a/Documentation/device-mapper/dm-io.txt b/Documentation/device-mapper/dm-io.txt
deleted file mode 100644
index 3b5d9a52cdcf..000000000000
--- a/Documentation/device-mapper/dm-io.txt
+++ /dev/null
@@ -1,75 +0,0 @@
-dm-io
-=====
-
-Dm-io provides synchronous and asynchronous I/O services. There are three
-types of I/O services available, and each type has a sync and an async
-version.
-
-The user must set up an io_region structure to describe the desired location
-of the I/O. Each io_region indicates a block-device along with the starting
-sector and size of the region.
-
-   struct io_region {
-      struct block_device *bdev;
-      sector_t sector;
-      sector_t count;
-   };
-
-Dm-io can read from one io_region or write to one or more io_regions. Writes
-to multiple regions are specified by an array of io_region structures.
-
-The first I/O service type takes a list of memory pages as the data buffer for
-the I/O, along with an offset into the first page.
-
-   struct page_list {
-      struct page_list *next;
-      struct page *page;
-   };
-
-   int dm_io_sync(unsigned int num_regions, struct io_region *where, int rw,
-                  struct page_list *pl, unsigned int offset,
-                  unsigned long *error_bits);
-   int dm_io_async(unsigned int num_regions, struct io_region *where, int rw,
-                   struct page_list *pl, unsigned int offset,
-                   io_notify_fn fn, void *context);
-
-The second I/O service type takes an array of bio vectors as the data buffer
-for the I/O. This service can be handy if the caller has a pre-assembled bio,
-but wants to direct different portions of the bio to different devices.
-
-   int dm_io_sync_bvec(unsigned int num_regions, struct io_region *where,
-                       int rw, struct bio_vec *bvec,
-                       unsigned long *error_bits);
-   int dm_io_async_bvec(unsigned int num_regions, struct io_region *where,
-                        int rw, struct bio_vec *bvec,
-                        io_notify_fn fn, void *context);
-
-The third I/O service type takes a pointer to a vmalloc'd memory buffer as the
-data buffer for the I/O. This service can be handy if the caller needs to do
-I/O to a large region but doesn't want to allocate a large number of individual
-memory pages.
-
-   int dm_io_sync_vm(unsigned int num_regions, struct io_region *where, int rw,
-                     void *data, unsigned long *error_bits);
-   int dm_io_async_vm(unsigned int num_regions, struct io_region *where, int rw,
-                      void *data, io_notify_fn fn, void *context);
-
-Callers of the asynchronous I/O services must include the name of a completion
-callback routine and a pointer to some context data for the I/O.
-
-   typedef void (*io_notify_fn)(unsigned long error, void *context);
-
-The "error" parameter in this callback, as well as the "*error" parameter in
-all of the synchronous versions, is a bitset (instead of a simple error value).
-In the case of an write-I/O to multiple regions, this bitset allows dm-io to
-indicate success or failure on each individual region.
-
-Before using any of the dm-io services, the user should call dm_io_get()
-and specify the number of pages they expect to perform I/O on concurrently.
-Dm-io will attempt to resize its mempool to make sure enough pages are
-always available in order to avoid unnecessary waiting while performing I/O.
-
-When the user is finished using the dm-io services, they should call
-dm_io_put() and specify the same number of pages that were given on the
-dm_io_get() call.
-
diff --git a/Documentation/device-mapper/dm-log.rst b/Documentation/device-mapper/dm-log.rst
new file mode 100644
index 000000000000..ba4fce39bc27
--- /dev/null
+++ b/Documentation/device-mapper/dm-log.rst
@@ -0,0 +1,57 @@
+=====================
+Device-Mapper Logging
+=====================
+The device-mapper logging code is used by some of the device-mapper
+RAID targets to track regions of the disk that are not consistent.
+A region (or portion of the address space) of the disk may be
+inconsistent because a RAID stripe is currently being operated on or
+a machine died while the region was being altered.  In the case of
+mirrors, a region would be considered dirty/inconsistent while you
+are writing to it because the writes need to be replicated for all
+the legs of the mirror and may not reach the legs at the same time.
+Once all writes are complete, the region is considered clean again.
+
+There is a generic logging interface that the device-mapper RAID
+implementations use to perform logging operations (see
+dm_dirty_log_type in include/linux/dm-dirty-log.h).  Various different
+logging implementations are available and provide different
+capabilities.  The list includes:
+
+==============	==============================================================
+Type		Files
+==============	==============================================================
+disk		drivers/md/dm-log.c
+core		drivers/md/dm-log.c
+userspace	drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
+==============	==============================================================
+
+The "disk" log type
+-------------------
+This log implementation commits the log state to disk.  This way, the
+logging state survives reboots/crashes.
+
+The "core" log type
+-------------------
+This log implementation keeps the log state in memory.  The log state
+will not survive a reboot or crash, but there may be a small boost in
+performance.  This method can also be used if no storage device is
+available for storing log state.
+
+The "userspace" log type
+------------------------
+This log type simply provides a way to export the log API to userspace,
+so log implementations can be done there.  This is done by forwarding most
+logging requests to userspace, where a daemon receives and processes the
+request.
+
+The structures used for communication between kernel and userspace are
+located in include/linux/dm-log-userspace.h.  Due to the frequency,
+diversity, and 2-way communication nature of the exchanges between
+kernel and userspace, 'connector' is used as the interface for
+communication.
+
+There are currently two userspace log implementations that leverage this
+framework - "clustered-disk" and "clustered-core".  These implementations
+provide a cluster-coherent log for shared-storage.  Device-mapper mirroring
+can be used in a shared-storage environment when the cluster log implementations
+are employed.
diff --git a/Documentation/device-mapper/dm-log.txt b/Documentation/device-mapper/dm-log.txt
deleted file mode 100644
index c155ac569c44..000000000000
--- a/Documentation/device-mapper/dm-log.txt
+++ /dev/null
@@ -1,54 +0,0 @@
-Device-Mapper Logging
-=====================
-The device-mapper logging code is used by some of the device-mapper
-RAID targets to track regions of the disk that are not consistent.
-A region (or portion of the address space) of the disk may be
-inconsistent because a RAID stripe is currently being operated on or
-a machine died while the region was being altered.  In the case of
-mirrors, a region would be considered dirty/inconsistent while you
-are writing to it because the writes need to be replicated for all
-the legs of the mirror and may not reach the legs at the same time.
-Once all writes are complete, the region is considered clean again.
-
-There is a generic logging interface that the device-mapper RAID
-implementations use to perform logging operations (see
-dm_dirty_log_type in include/linux/dm-dirty-log.h).  Various different
-logging implementations are available and provide different
-capabilities.  The list includes:
-
-Type		Files
-====		=====
-disk		drivers/md/dm-log.c
-core		drivers/md/dm-log.c
-userspace	drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
-
-The "disk" log type
--------------------
-This log implementation commits the log state to disk.  This way, the
-logging state survives reboots/crashes.
-
-The "core" log type
--------------------
-This log implementation keeps the log state in memory.  The log state
-will not survive a reboot or crash, but there may be a small boost in
-performance.  This method can also be used if no storage device is
-available for storing log state.
-
-The "userspace" log type
-------------------------
-This log type simply provides a way to export the log API to userspace,
-so log implementations can be done there.  This is done by forwarding most
-logging requests to userspace, where a daemon receives and processes the
-request.
-
-The structure used for communication between kernel and userspace are
-located in include/linux/dm-log-userspace.h.  Due to the frequency,
-diversity, and 2-way communication nature of the exchanges between
-kernel and userspace, 'connector' is used as the interface for
-communication.
-
-There are currently two userspace log implementations that leverage this
-framework - "clustered-disk" and "clustered-core".  These implementations
-provide a cluster-coherent log for shared-storage.  Device-mapper mirroring
-can be used in a shared-storage environment when the cluster log implementations
-are employed.
diff --git a/Documentation/device-mapper/dm-queue-length.rst b/Documentation/device-mapper/dm-queue-length.rst
new file mode 100644
index 000000000000..d8e381c1cb02
--- /dev/null
+++ b/Documentation/device-mapper/dm-queue-length.rst
@@ -0,0 +1,48 @@
+===============
+dm-queue-length
+===============
+
+dm-queue-length is a path selector module for device-mapper targets,
+which selects a path with the least number of in-flight I/Os.
+The path selector name is 'queue-length'.
+
+Table parameters for each path: [<repeat_count>]
+
+::
+
+	<repeat_count>: The number of I/Os to dispatch using the selected
+			path before switching to the next path.
+			If not given, the internal default is used. To check
+			the default value, see the activated table.
+
+Status for each path: <status> <fail-count> <in-flight>
+
+::
+
+	<status>: 'A' if the path is active, 'F' if the path is failed.
+	<fail-count>: The number of path failures.
+	<in-flight>: The number of in-flight I/Os on the path.
+
+
+Algorithm
+=========
+
+dm-queue-length increments/decrements 'in-flight' when an I/O is
+dispatched/completed respectively.
+dm-queue-length selects a path with the minimum 'in-flight'.
+
+
+Examples
+========
+The following example uses 2 paths (sda and sdb) with repeat_count == 128.
+
+::
+
+  # echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
+    | dmsetup create test
+  #
+  # dmsetup table
+  test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
+  #
+  # dmsetup status
+  test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
diff --git a/Documentation/device-mapper/dm-queue-length.txt b/Documentation/device-mapper/dm-queue-length.txt
deleted file mode 100644
index f4db2562175c..000000000000
--- a/Documentation/device-mapper/dm-queue-length.txt
+++ /dev/null
@@ -1,39 +0,0 @@
-dm-queue-length
-===============
-
-dm-queue-length is a path selector module for device-mapper targets,
-which selects a path with the least number of in-flight I/Os.
-The path selector name is 'queue-length'.
-
-Table parameters for each path: [<repeat_count>]
-	<repeat_count>: The number of I/Os to dispatch using the selected
-			path before switching to the next path.
-			If not given, internal default is used. To check
-			the default value, see the activated table.
-
-Status for each path: <status> <fail-count> <in-flight>
-	<status>: 'A' if the path is active, 'F' if the path is failed.
-	<fail-count>: The number of path failures.
-	<in-flight>: The number of in-flight I/Os on the path.
-
-
-Algorithm
-=========
-
-dm-queue-length increments/decrements 'in-flight' when an I/O is
-dispatched/completed respectively.
-dm-queue-length selects a path with the minimum 'in-flight'.
-
-
-Examples
-========
-In case that 2 paths (sda and sdb) are used with repeat_count == 128.
-
-# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
-  dmsetup create test
-#
-# dmsetup table
-test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
-#
-# dmsetup status
-test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
diff --git a/Documentation/device-mapper/dm-raid.rst b/Documentation/device-mapper/dm-raid.rst
new file mode 100644
index 000000000000..2fe255b130fb
--- /dev/null
+++ b/Documentation/device-mapper/dm-raid.rst
@@ -0,0 +1,419 @@
+=======
+dm-raid
+=======
+
+The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
+It allows the MD RAID drivers to be accessed using a device-mapper
+interface.
+
+
+Mapping Table Interface
+-----------------------
+The target is named "raid" and it accepts the following parameters::
+
+  <raid_type> <#raid_params> <raid_params> \
+    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]
+
+<raid_type>:
+
+  ============= ===============================================================
+  raid0		RAID0 striping (no resilience)
+  raid1		RAID1 mirroring
+  raid4		RAID4 with dedicated last parity disk
+  raid5_n 	RAID5 with dedicated last parity disk supporting takeover
+		Same as raid4
+
+		- Transitory layout
+  raid5_la	RAID5 left asymmetric
+
+		- rotating parity 0 with data continuation
+  raid5_ra	RAID5 right asymmetric
+
+		- rotating parity N with data continuation
+  raid5_ls	RAID5 left symmetric
+
+		- rotating parity 0 with data restart
+  raid5_rs 	RAID5 right symmetric
+
+		- rotating parity N with data restart
+  raid6_zr	RAID6 zero restart
+
+		- rotating parity zero (left-to-right) with data restart
+  raid6_nr	RAID6 N restart
+
+		- rotating parity N (right-to-left) with data restart
+  raid6_nc	RAID6 N continue
+
+		- rotating parity N (right-to-left) with data continuation
+  raid6_n_6	RAID6 with dedicated parity disks
+
+		- parity and Q-syndrome on the last 2 disks;
+		  layout for takeover from/to raid4/raid5_n
+  raid6_la_6	Same as "raid5_la" plus dedicated last Q-syndrome disk
+
+		- layout for takeover from raid5_la from/to raid6
+  raid6_ra_6	Same as "raid5_ra" plus dedicated last Q-syndrome disk
+
+		- layout for takeover from raid5_ra from/to raid6
+  raid6_ls_6	Same as "raid5_ls" plus dedicated last Q-syndrome disk
+
+		- layout for takeover from raid5_ls from/to raid6
+  raid6_rs_6	Same as "raid5_rs" plus dedicated last Q-syndrome disk
+
+		- layout for takeover from raid5_rs from/to raid6
+  raid10        Various RAID10 inspired algorithms chosen by additional params
+		(see raid10_format and raid10_copies below)
+
+		- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
+		- RAID1E: Integrated Adjacent Stripe Mirroring
+		- RAID1E: Integrated Offset Stripe Mirroring
+		- and other similar RAID10 variants
+  ============= ===============================================================
+
+  Reference: Chapter 4 of
+  http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
+
+<#raid_params>: The number of parameters that follow.
+
+<raid_params> consists of
+
+    Mandatory parameters:
+        <chunk_size>:
+		      Chunk size in sectors.  This parameter is often known as
+		      "stripe size".  It is the only mandatory parameter and
+		      is placed first.
+
+    followed by optional parameters (in any order):
+	[sync|nosync]
+		Force or prevent RAID initialization.
+
+	[rebuild <idx>]
+		Rebuild drive number 'idx' (first drive is 0).
+
+	[daemon_sleep <ms>]
+		Interval between runs of the bitmap daemon that
+		clear bits.  A longer interval means less bitmap I/O but
+		resyncing after a failure is likely to take longer.
+
+	[min_recovery_rate <kB/sec/disk>]
+		Throttle RAID initialization
+	[max_recovery_rate <kB/sec/disk>]
+		Throttle RAID initialization
+	[write_mostly <idx>]
+		Mark drive index 'idx' write-mostly.
+	[max_write_behind <sectors>]
+		See '--write-behind=' (man mdadm)
+	[stripe_cache <sectors>]
+		Stripe cache size (RAID 4/5/6 only)
+	[region_size <sectors>]
+		The region_size multiplied by the number of regions is the
+		logical size of the array.  The bitmap records the device
+		synchronisation state for each region.
+
+        [raid10_copies   <# copies>], [raid10_format   <near|far|offset>]
+		These two options are used to alter the default layout of
+		a RAID10 configuration.  The number of copies can be
+		specified, but the default is 2.  There are also three
+		variations to how the copies are laid down - the default
+		is "near".  Near copies are what most people think of with
+		respect to mirroring.  If these options are left unspecified,
+		or 'raid10_copies 2' and/or 'raid10_format near' are given,
+		then the layouts for 2, 3 and 4 devices	are:
+
+		========	 ==========	   ==============
+		2 drives         3 drives          4 drives
+		========	 ==========	   ==============
+		A1  A1           A1  A1  A2        A1  A1  A2  A2
+		A2  A2           A2  A3  A3        A3  A3  A4  A4
+		A3  A3           A4  A4  A5        A5  A5  A6  A6
+		A4  A4           A5  A6  A6        A7  A7  A8  A8
+		..  ..           ..  ..  ..        ..  ..  ..  ..
+		========	 ==========	   ==============
+
+		The 2-device layout is equivalent to 2-way RAID1.  The 4-device
+		layout is what a traditional RAID10 would look like.  The
+		3-device layout is what might be called a 'RAID1E - Integrated
+		Adjacent Stripe Mirroring'.
+
+		If 'raid10_copies 2' and 'raid10_format far', then the layouts
+		for 2, 3 and 4 devices are:
+
+		========	     ============	  ===================
+		2 drives             3 drives             4 drives
+		========	     ============	  ===================
+		A1  A2               A1   A2   A3         A1   A2   A3   A4
+		A3  A4               A4   A5   A6         A5   A6   A7   A8
+		A5  A6               A7   A8   A9         A9   A10  A11  A12
+		..  ..               ..   ..   ..         ..   ..   ..   ..
+		A2  A1               A3   A1   A2         A2   A1   A4   A3
+		A4  A3               A6   A4   A5         A6   A5   A8   A7
+		A6  A5               A9   A7   A8         A10  A9   A12  A11
+		..  ..               ..   ..   ..         ..   ..   ..   ..
+		========	     ============	  ===================
+
+		If 'raid10_copies 2' and 'raid10_format offset', then the
+		layouts for 2, 3 and 4 devices are:
+
+		========       ==========         ================
+		2 drives       3 drives           4 drives
+		========       ==========         ================
+		A1  A2         A1  A2  A3         A1  A2  A3  A4
+		A2  A1         A3  A1  A2         A2  A1  A4  A3
+		A3  A4         A4  A5  A6         A5  A6  A7  A8
+		A4  A3         A6  A4  A5         A6  A5  A8  A7
+		A5  A6         A7  A8  A9         A9  A10 A11 A12
+		A6  A5         A9  A7  A8         A10 A9  A12 A11
+		..  ..         ..  ..  ..         ..  ..  ..  ..
+		========       ==========         ================
+
+		Here we see layouts closely akin to 'RAID1E - Integrated
+		Offset Stripe Mirroring'.
+
+        [delta_disks <N>]
+		The delta_disks option value (-251 < N < +251) triggers
+		device removal (negative value) or device addition (positive
+		value) to any reshape supporting raid levels 4/5/6 and 10.
+		RAID levels 4/5/6 allow for addition of devices (metadata
+		and data device tuple), raid10_near and raid10_offset only
+		allow for device addition. raid10_far does not support any
+		reshaping at all.
+		A minimum number of devices has to be kept to enforce resilience,
+		which is 3 devices for raid4/5 and 4 devices for raid6.
+
+        [data_offset <sectors>]
+		This option value defines the offset into each data device
+		where the data starts. This is used to provide out-of-place
+		reshaping space to avoid writing over data while
+		changing the layout of stripes, hence an interruption/crash
+		may happen at any time without the risk of losing data.
+		E.g. when adding devices to an existing raid set during
+		forward reshaping, the out-of-place space will be allocated
+		at the beginning of each raid device. The kernel raid4/5/6/10
+		MD personalities supporting such device addition will read the data from
+		the existing first stripes (those spanning the smaller number of devices)
+		starting at data_offset to fill up a new stripe spanning the larger
+		number of devices, calculate the redundancy blocks (parity/Q-syndrome)
+		and write that new stripe to offset 0. The same is applied to all
+		N-1 other new stripes. This out-of-place scheme is used to change
+		the RAID type (i.e. the allocation algorithm) as well, e.g.
+		changing from raid5_ls to raid5_n.
+
+	[journal_dev <dev>]
+		This option adds a journal device to raid4/5/6 raid sets and
+		uses it to close the 'write hole' caused by the non-atomic updates
+		to the component devices which can cause data loss during recovery.
+		The journal device is used as writethrough thus causing writes to
+		be throttled versus non-journaled raid4/5/6 sets.
+		Takeover/reshape is not possible with a raid4/5/6 journal device;
+		it has to be deconfigured before requesting these.
+
+	[journal_mode <mode>]
+		This option sets the caching mode on journaled raid4/5/6 raid sets
+		(see 'journal_dev <dev>' above) to 'writethrough' or 'writeback'.
+		If 'writeback' is selected the journal device has to be resilient
+		and must not suffer from the 'write hole' problem itself (e.g. use
+		raid1 or raid10) to avoid a single point of failure.
+
+<#raid_devs>: The number of devices composing the array.
+	Each device consists of two entries.  The first is the device
+	containing the metadata (if any); the second is the one containing the
+	data. A maximum of 64 metadata/data device entries are supported
+	up to target version 1.8.0.
+	1.9.0 supports up to 253, which is enforced by the MD kernel runtime in use.
+
+	If a drive has failed or is missing at creation time, a '-' can be
+	given for both the metadata and data drives for a given position.
+
+
+Example Tables
+--------------
+
+::
+
+  # RAID4 - 4 data drives, 1 parity (no metadata devices)
+  # No metadata devices specified to hold superblock/bitmap info
+  # Chunk size of 1MiB
+  # (Lines separated for easy reading)
+
+  0 1960893648 raid \
+          raid4 1 2048 \
+          5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
+
+  # RAID4 - 4 data drives, 1 parity (with metadata devices)
+  # Chunk size of 1MiB, force RAID initialization,
+  #       min recovery rate at 20 kiB/sec/disk
+
+  0 1960893648 raid \
+          raid4 4 2048 sync min_recovery_rate 20 \
+          5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
+
+
+Status Output
+-------------
+'dmsetup table' displays the table used to construct the mapping.
+The optional parameters are always printed in the order listed
+above with "sync" or "nosync" always output ahead of the other
+arguments, regardless of the order used when originally loading the table.
+Arguments that can be repeated are ordered by value.
+
+
+'dmsetup status' yields information on the state and health of the array.
+The output is as follows (normally a single line, but expanded here for
+clarity)::
+
+  1: <s> <l> raid \
+  2:      <raid_type> <#devices> <health_chars> \
+  3:      <sync_ratio> <sync_action> <mismatch_cnt>
+
+Line 1 is the standard output produced by device-mapper.
+
+Lines 2 and 3 are produced by the raid target and are best explained by example::
+
+        0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0
+
+Here we can see the RAID type is raid4, there are 5 devices - all of
+which are 'A'live, and the array is 2/490221568 complete with its initial
+recovery.  Here is a fuller description of the individual fields:
+
+	=============== =========================================================
+	<raid_type>     Same as the <raid_type> used to create the array.
+	<health_chars>  One char for each device, indicating:
+
+			- 'A' = alive and in-sync
+			- 'a' = alive but not in-sync
+			- 'D' = dead/failed.
+	<sync_ratio>    The ratio indicating how much of the array has undergone
+			the process described by 'sync_action'.  If the
+			'sync_action' is "check" or "repair", then the process
+			of "resync" or "recover" can be considered complete.
+	<sync_action>   One of the following possible states:
+
+			idle
+				- No synchronization action is being performed.
+			frozen
+				- The current action has been halted.
+			resync
+				- Array is undergoing its initial synchronization
+				  or is resynchronizing after an unclean shutdown
+				  (possibly aided by a bitmap).
+			recover
+				- A device in the array is being rebuilt or
+				  replaced.
+			check
+				- A user-initiated full check of the array is
+				  being performed.  All blocks are read and
+				  checked for consistency.  The number of
+				  discrepancies found is recorded in
+				  <mismatch_cnt>.  No changes are made to the
+				  array by this action.
+			repair
+				- The same as "check", but discrepancies are
+				  corrected.
+			reshape
+				- The array is undergoing a reshape.
+	<mismatch_cnt>  The number of discrepancies found between mirror copies
+			in RAID1/10 or wrong parity values found in RAID4/5/6.
+			This value is valid only after a "check" of the array
+			is performed.  A healthy array has a 'mismatch_cnt' of 0.
+	<data_offset>   The current data offset to the start of the user data on
+			each component device of a raid set (see the respective
+			raid parameter to support out-of-place reshaping).
+	<journal_char>	- 'A' - active write-through journal device.
+			- 'a' - active write-back journal device.
+			- 'D' - dead journal device.
+			- '-' - no journal device.
+	=============== =========================================================
+
+
+Message Interface
+-----------------
+The dm-raid target will accept certain actions through the 'message' interface.
+(See 'man dmsetup' for more information on the message interface.)  These actions
+include:
+
+	========= ================================================
+	"idle"    Halt the current sync action.
+	"frozen"  Freeze the current sync action.
+	"resync"  Initiate/continue a resync.
+	"recover" Initiate/continue a recover process.
+	"check"   Initiate a check (i.e. a "scrub") of the array.
+	"repair"  Initiate a repair of the array.
+	========= ================================================
+
+
+Discard Support
+---------------
+The implementation of discard support among hardware vendors varies.
+When a block is discarded, some storage devices will return zeroes when
+the block is read.  These devices set the 'discard_zeroes_data'
+attribute.  Other devices will return random data.  Confusingly, some
+devices that advertise 'discard_zeroes_data' will not reliably return
+zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
+from a number of devices to calculate parity blocks and (for performance
+reasons) relies on 'discard_zeroes_data' being reliable, it is important
+that the devices be consistent.  Blocks may be discarded in the middle
+of a RAID 4/5/6 stripe and if subsequent read results are not
+consistent, the parity blocks may be calculated differently at any time,
+making the parity blocks useless for redundancy.  It is important to
+understand how your hardware behaves with discards if you are going to
+enable discards with RAID 4/5/6.
+
+Since the behavior of storage devices is unreliable in this respect,
+even when reporting 'discard_zeroes_data', by default RAID 4/5/6
+discard support is disabled -- this ensures data integrity at the
+expense of losing some performance.
+
+Storage devices that properly support 'discard_zeroes_data' are
+increasingly whitelisted in the kernel and can thus be trusted.
+
+For trusted devices, the following dm-raid module parameter can be set
+to safely enable discard support for RAID 4/5/6:
+
+    'devices_handle_discard_safely'
+
+
+Version History
+---------------
+
+::
+
+ 1.0.0	Initial version.  Support for RAID 4/5/6
+ 1.1.0	Added support for RAID 1
+ 1.2.0	Handle creation of arrays that contain failed devices.
+ 1.3.0	Added support for RAID 10
+ 1.3.1	Allow device replacement/rebuild for RAID 10
+ 1.3.2	Fix/improve redundancy checking for RAID10
+ 1.4.0	Non-functional change.  Removes arg from mapping function.
+ 1.4.1	RAID10 fix redundancy validation checks (commit 55ebbb5).
+ 1.4.2	Add RAID10 "far" and "offset" algorithm support.
+ 1.5.0	Add message interface to allow manipulation of the sync_action.
+	New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
+ 1.5.1	Add ability to restore transiently failed devices on resume.
+ 1.5.2	'mismatch_cnt' is zero unless [last_]sync_action is "check".
+ 1.6.0	Add discard support (and devices_handle_discard_safely module param).
+ 1.7.0	Add support for MD RAID0 mappings.
+ 1.8.0	Explicitly check for compatible flags in the superblock metadata
+	and refuse to start the raid set if any are set by a newer
+	target version, thus avoiding data corruption on a raid set
+	with a reshape in progress.
+ 1.9.0	Add support for RAID level takeover/reshape/region size
+	and set size reduction.
+ 1.9.1	Fix activation of existing RAID 4/10 mapped devices
+ 1.9.2	Don't emit '- -' on the status table line in case the constructor
+	fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
+	'D' on the status line.  If '- -' is passed into the constructor, emit
+	'- -' on the table line and '-' as the status line health character.
+ 1.10.0	Add support for raid4/5/6 journal device
+ 1.10.1	Fix data corruption on reshape request
+ 1.11.0	Fix table line argument order
+	(wrong raid10_copies/raid10_format sequence)
+ 1.11.1	Add raid4/5/6 journal write-back support via journal_mode option
+ 1.12.1	Fix for MD deadlock between mddev_suspend() and md_write_start() available
+ 1.13.0	Fix dev_health status at end of "recover" (was 'a', now 'A')
+ 1.13.1	Fix deadlock caused by early md_stop_writes().  Also fix size and
+	state races.
+ 1.13.2	Fix raid redundancy validation and avoid keeping raid set frozen
+ 1.14.0	Fix reshape race on small devices.  Fix stripe adding reshape
+	deadlock/potential data corruption.  Update superblock when
+	specific devices are requested via rebuild.  Fix RAID leg
+	rebuild errors.
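
As an illustration of the 'Status Output' field layout documented in
dm-raid.rst above, here is a minimal Python sketch that splits one
'dmsetup status' line for the raid target into the documented fields.
RaidStatus, parse_raid_status and failed_devices are hypothetical helper
names, not part of dmsetup or the kernel.  The sketch assumes the line has
no leading device-name prefix and ignores any trailing fields such as
<data_offset> and <journal_char>.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RaidStatus:
    """Fields of one dm-raid status line, in documented order."""
    start: int          # <s>: start sector of the mapping
    length: int         # <l>: length of the mapping in sectors
    raid_type: str      # <raid_type>
    num_devices: int    # <#devices>
    health_chars: str   # one of 'A', 'a', 'D' per device
    synced: int         # numerator of <sync_ratio>
    total: int          # denominator of <sync_ratio>
    sync_action: str    # <sync_action>
    mismatch_cnt: int   # <mismatch_cnt>


def parse_raid_status(line: str) -> RaidStatus:
    """Parse a status line; raise ValueError if it is not a raid target."""
    fields = line.split()
    if len(fields) < 9 or fields[2] != "raid":
        raise ValueError("not a dm-raid status line: %r" % line)
    synced, total = fields[6].split("/")
    status = RaidStatus(
        start=int(fields[0]),
        length=int(fields[1]),
        raid_type=fields[3],
        num_devices=int(fields[4]),
        health_chars=fields[5],
        synced=int(synced),
        total=int(total),
        sync_action=fields[7],
        mismatch_cnt=int(fields[8]),
    )
    # The documentation promises one health character per device.
    if len(status.health_chars) != status.num_devices:
        raise ValueError("health_chars does not match device count")
    return status


def failed_devices(status: RaidStatus) -> List[int]:
    """Return the indexes of devices marked 'D' (dead/failed)."""
    return [i for i, c in enumerate(status.health_chars) if c == "D"]
```

Feeding the documented example line
"0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0" to
parse_raid_status() yields raid_type "raid4", five devices, and a sync
ratio of 2 out of 490221568; failed_devices() returns an empty list for it.
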
diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt
deleted file mode 100644
index 2355bef14653..000000000000
--- a/Documentation/device-mapper/dm-raid.txt
+++ /dev/null
@@ -1,354 +0,0 @@
-dm-raid
-=======
-
-The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
-It allows the MD RAID drivers to be accessed using a device-mapper
-interface.
-
-
-Mapping Table Interface
------------------------
-The target is named "raid" and it accepts the following parameters:
-
-  <raid_type> <#raid_params> <raid_params> \
-    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]
-
-<raid_type>:
-  raid0		RAID0 striping (no resilience)
-  raid1		RAID1 mirroring
-  raid4		RAID4 with dedicated last parity disk
-  raid5_n 	RAID5 with dedicated last parity disk supporting takeover
-		Same as raid4
-		-Transitory layout
-  raid5_la	RAID5 left asymmetric
-		- rotating parity 0 with data continuation
-  raid5_ra	RAID5 right asymmetric
-		- rotating parity N with data continuation
-  raid5_ls	RAID5 left symmetric
-		- rotating parity 0 with data restart
-  raid5_rs 	RAID5 right symmetric
-		- rotating parity N with data restart
-  raid6_zr	RAID6 zero restart
-		- rotating parity zero (left-to-right) with data restart
-  raid6_nr	RAID6 N restart
-		- rotating parity N (right-to-left) with data restart
-  raid6_nc	RAID6 N continue
-		- rotating parity N (right-to-left) with data continuation
-  raid6_n_6	RAID6 with dedicate parity disks
-		- parity and Q-syndrome on the last 2 disks;
-		  layout for takeover from/to raid4/raid5_n
-  raid6_la_6	Same as "raid_la" plus dedicated last Q-syndrome disk
-		- layout for takeover from raid5_la from/to raid6
-  raid6_ra_6	Same as "raid5_ra" dedicated last Q-syndrome disk
-		- layout for takeover from raid5_ra from/to raid6
-  raid6_ls_6	Same as "raid5_ls" dedicated last Q-syndrome disk
-		- layout for takeover from raid5_ls from/to raid6
-  raid6_rs_6	Same as "raid5_rs" dedicated last Q-syndrome disk
-		- layout for takeover from raid5_rs from/to raid6
-  raid10        Various RAID10 inspired algorithms chosen by additional params
-		(see raid10_format and raid10_copies below)
-		- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
-		- RAID1E: Integrated Adjacent Stripe Mirroring
-		- RAID1E: Integrated Offset Stripe Mirroring
-		-  and other similar RAID10 variants
-
-  Reference: Chapter 4 of
-  http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
-
-<#raid_params>: The number of parameters that follow.
-
-<raid_params> consists of
-    Mandatory parameters:
-        <chunk_size>: Chunk size in sectors.  This parameter is often known as
-		      "stripe size".  It is the only mandatory parameter and
-		      is placed first.
-
-    followed by optional parameters (in any order):
-	[sync|nosync]   Force or prevent RAID initialization.
-
-	[rebuild <idx>]	Rebuild drive number 'idx' (first drive is 0).
-
-	[daemon_sleep <ms>]
-		Interval between runs of the bitmap daemon that
-		clear bits.  A longer interval means less bitmap I/O but
-		resyncing after a failure is likely to take longer.
-
-	[min_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
-	[max_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
-	[write_mostly <idx>]		   Mark drive index 'idx' write-mostly.
-	[max_write_behind <sectors>]       See '--write-behind=' (man mdadm)
-	[stripe_cache <sectors>]           Stripe cache size (RAID 4/5/6 only)
-	[region_size <sectors>]
-		The region_size multiplied by the number of regions is the
-		logical size of the array.  The bitmap records the device
-		synchronisation state for each region.
-
-        [raid10_copies   <# copies>]
-        [raid10_format   <near|far|offset>]
-		These two options are used to alter the default layout of
-		a RAID10 configuration.  The number of copies is can be
-		specified, but the default is 2.  There are also three
-		variations to how the copies are laid down - the default
-		is "near".  Near copies are what most people think of with
-		respect to mirroring.  If these options are left unspecified,
-		or 'raid10_copies 2' and/or 'raid10_format near' are given,
-		then the layouts for 2, 3 and 4 devices	are:
-		2 drives         3 drives          4 drives
-		--------         ----------        --------------
-		A1  A1           A1  A1  A2        A1  A1  A2  A2
-		A2  A2           A2  A3  A3        A3  A3  A4  A4
-		A3  A3           A4  A4  A5        A5  A5  A6  A6
-		A4  A4           A5  A6  A6        A7  A7  A8  A8
-		..  ..           ..  ..  ..        ..  ..  ..  ..
-		The 2-device layout is equivalent 2-way RAID1.  The 4-device
-		layout is what a traditional RAID10 would look like.  The
-		3-device layout is what might be called a 'RAID1E - Integrated
-		Adjacent Stripe Mirroring'.
-
-		If 'raid10_copies 2' and 'raid10_format far', then the layouts
-		for 2, 3 and 4 devices are:
-		2 drives             3 drives             4 drives
-		--------             --------------       --------------------
-		A1  A2               A1   A2   A3         A1   A2   A3   A4
-		A3  A4               A4   A5   A6         A5   A6   A7   A8
-		A5  A6               A7   A8   A9         A9   A10  A11  A12
-		..  ..               ..   ..   ..         ..   ..   ..   ..
-		A2  A1               A3   A1   A2         A2   A1   A4   A3
-		A4  A3               A6   A4   A5         A6   A5   A8   A7
-		A6  A5               A9   A7   A8         A10  A9   A12  A11
-		..  ..               ..   ..   ..         ..   ..   ..   ..
-
-		If 'raid10_copies 2' and 'raid10_format offset', then the
-		layouts for 2, 3 and 4 devices are:
-		2 drives       3 drives           4 drives
-		--------       ------------       -----------------
-		A1  A2         A1  A2  A3         A1  A2  A3  A4
-		A2  A1         A3  A1  A2         A2  A1  A4  A3
-		A3  A4         A4  A5  A6         A5  A6  A7  A8
-		A4  A3         A6  A4  A5         A6  A5  A8  A7
-		A5  A6         A7  A8  A9         A9  A10 A11 A12
-		A6  A5         A9  A7  A8         A10 A9  A12 A11
-		..  ..         ..  ..  ..         ..  ..  ..  ..
-		Here we see layouts closely akin to 'RAID1E - Integrated
-		Offset Stripe Mirroring'.
-
-        [delta_disks <N>]
-		The delta_disks option value (-251 < N < +251) triggers
-		device removal (negative value) or device addition (positive
-		value) to any reshape supporting raid levels 4/5/6 and 10.
-		RAID levels 4/5/6 allow for addition of devices (metadata
-		and data device tuple), raid10_near and raid10_offset only
-		allow for device addition. raid10_far does not support any
-		reshaping at all.
-		A minimum of devices have to be kept to enforce resilience,
-		which is 3 devices for raid4/5 and 4 devices for raid6.
-
-        [data_offset <sectors>]
-		This option value defines the offset into each data device
-		where the data starts. This is used to provide out-of-place
-		reshaping space to avoid writing over data while
-		changing the layout of stripes, hence an interruption/crash
-		may happen at any time without the risk of losing data.
-		E.g. when adding devices to an existing raid set during
-		forward reshaping, the out-of-place space will be allocated
-		at the beginning of each raid device. The kernel raid4/5/6/10
-		MD personalities supporting such device addition will read the data from
-		the existing first stripes (those with smaller number of stripes)
-		starting at data_offset to fill up a new stripe with the larger
-		number of stripes, calculate the redundancy blocks (CRC/Q-syndrome)
-		and write that new stripe to offset 0. Same will be applied to all
-		N-1 other new stripes. This out-of-place scheme is used to change
-		the RAID type (i.e. the allocation algorithm) as well, e.g.
-		changing from raid5_ls to raid5_n.
-
-	[journal_dev <dev>]
-		This option adds a journal device to raid4/5/6 raid sets and
-		uses it to close the 'write hole' caused by the non-atomic updates
-		to the component devices which can cause data loss during recovery.
-		The journal device is used as writethrough thus causing writes to
-		be throttled versus non-journaled raid4/5/6 sets.
-		Takeover/reshape is not possible with a raid4/5/6 journal device;
-		it has to be deconfigured before requesting these.
-
-	[journal_mode <mode>]
-		This option sets the caching mode on journaled raid4/5/6 raid sets
-		(see 'journal_dev <dev>' above) to 'writethrough' or 'writeback'.
-		If 'writeback' is selected the journal device has to be resilient
-		and must not suffer from the 'write hole' problem itself (e.g. use
-		raid1 or raid10) to avoid a single point of failure.
-
-<#raid_devs>: The number of devices composing the array.
-	Each device consists of two entries.  The first is the device
-	containing the metadata (if any); the second is the one containing the
-	data. A Maximum of 64 metadata/data device entries are supported
-	up to target version 1.8.0.
-	1.9.0 supports up to 253 which is enforced by the used MD kernel runtime.
-
-	If a drive has failed or is missing at creation time, a '-' can be
-	given for both the metadata and data drives for a given position.
-
-
-Example Tables
---------------
-# RAID4 - 4 data drives, 1 parity (no metadata devices)
-# No metadata devices specified to hold superblock/bitmap info
-# Chunk size of 1MiB
-# (Lines separated for easy reading)
-
-0 1960893648 raid \
-        raid4 1 2048 \
-        5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
-
-# RAID4 - 4 data drives, 1 parity (with metadata devices)
-# Chunk size of 1MiB, force RAID initialization,
-#       min recovery rate at 20 kiB/sec/disk
-
-0 1960893648 raid \
-        raid4 4 2048 sync min_recovery_rate 20 \
-        5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
-
-
-Status Output
--------------
-'dmsetup table' displays the table used to construct the mapping.
-The optional parameters are always printed in the order listed
-above with "sync" or "nosync" always output ahead of the other
-arguments, regardless of the order used when originally loading the table.
-Arguments that can be repeated are ordered by value.
-
-
-'dmsetup status' yields information on the state and health of the array.
-The output is as follows (normally a single line, but expanded here for
-clarity):
-1: <s> <l> raid \
-2:      <raid_type> <#devices> <health_chars> \
-3:      <sync_ratio> <sync_action> <mismatch_cnt>
-
-Line 1 is the standard output produced by device-mapper.
-Line 2 & 3 are produced by the raid target and are best explained by example:
-        0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0
-Here we can see the RAID type is raid4, there are 5 devices - all of
-which are 'A'live, and the array is 2/490221568 complete with its initial
-recovery.  Here is a fuller description of the individual fields:
-	<raid_type>     Same as the <raid_type> used to create the array.
-	<health_chars>  One char for each device, indicating: 'A' = alive and
-			in-sync, 'a' = alive but not in-sync, 'D' = dead/failed.
-	<sync_ratio>    The ratio indicating how much of the array has undergone
-			the process described by 'sync_action'.  If the
-			'sync_action' is "check" or "repair", then the process
-			of "resync" or "recover" can be considered complete.
-	<sync_action>   One of the following possible states:
-			idle    - No synchronization action is being performed.
-			frozen  - The current action has been halted.
-			resync  - Array is undergoing its initial synchronization
-				  or is resynchronizing after an unclean shutdown
-				  (possibly aided by a bitmap).
-			recover - A device in the array is being rebuilt or
-				  replaced.
-			check   - A user-initiated full check of the array is
-				  being performed.  All blocks are read and
-				  checked for consistency.  The number of
-				  discrepancies found are recorded in
-				  <mismatch_cnt>.  No changes are made to the
-				  array by this action.
-			repair  - The same as "check", but discrepancies are
-				  corrected.
-			reshape - The array is undergoing a reshape.
-	<mismatch_cnt>  The number of discrepancies found between mirror copies
-			in RAID1/10 or wrong parity values found in RAID4/5/6.
-			This value is valid only after a "check" of the array
-			is performed.  A healthy array has a 'mismatch_cnt' of 0.
-	<data_offset>   The current data offset to the start of the user data on
-			each component device of a raid set (see the respective
-			raid parameter to support out-of-place reshaping).
-	<journal_char>	'A' - active write-through journal device.
-			'a' - active write-back journal device.
-			'D' - dead journal device.
-			'-' - no journal device.
-
-
-Message Interface
------------------
-The dm-raid target will accept certain actions through the 'message' interface.
-('man dmsetup' for more information on the message interface.)  These actions
-include:
-	"idle"   - Halt the current sync action.
-	"frozen" - Freeze the current sync action.
-	"resync" - Initiate/continue a resync.
-	"recover"- Initiate/continue a recover process.
-	"check"  - Initiate a check (i.e. a "scrub") of the array.
-	"repair" - Initiate a repair of the array.
-
-
-Discard Support
----------------
-The implementation of discard support among hardware vendors varies.
-When a block is discarded, some storage devices will return zeroes when
-the block is read.  These devices set the 'discard_zeroes_data'
-attribute.  Other devices will return random data.  Confusingly, some
-devices that advertise 'discard_zeroes_data' will not reliably return
-zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
-from a number of devices to calculate parity blocks and (for performance
-reasons) relies on 'discard_zeroes_data' being reliable, it is important
-that the devices be consistent.  Blocks may be discarded in the middle
-of a RAID 4/5/6 stripe and if subsequent read results are not
-consistent, the parity blocks may be calculated differently at any time;
-making the parity blocks useless for redundancy.  It is important to
-understand how your hardware behaves with discards if you are going to
-enable discards with RAID 4/5/6.
-
-Since the behavior of storage devices is unreliable in this respect,
-even when reporting 'discard_zeroes_data', by default RAID 4/5/6
-discard support is disabled -- this ensures data integrity at the
-expense of losing some performance.
-
-Storage devices that properly support 'discard_zeroes_data' are
-increasingly whitelisted in the kernel and can thus be trusted.
-
-For trusted devices, the following dm-raid module parameter can be set
-to safely enable discard support for RAID 4/5/6:
-    'devices_handle_discards_safely'
-
-
-Version History
----------------
-1.0.0	Initial version.  Support for RAID 4/5/6
-1.1.0	Added support for RAID 1
-1.2.0	Handle creation of arrays that contain failed devices.
-1.3.0	Added support for RAID 10
-1.3.1	Allow device replacement/rebuild for RAID 10
-1.3.2   Fix/improve redundancy checking for RAID10
-1.4.0	Non-functional change.  Removes arg from mapping function.
-1.4.1   RAID10 fix redundancy validation checks (commit 55ebbb5).
-1.4.2   Add RAID10 "far" and "offset" algorithm support.
-1.5.0   Add message interface to allow manipulation of the sync_action.
-	New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
-1.5.1   Add ability to restore transiently failed devices on resume.
-1.5.2   'mismatch_cnt' is zero unless [last_]sync_action is "check".
-1.6.0   Add discard support (and devices_handle_discard_safely module param).
-1.7.0   Add support for MD RAID0 mappings.
-1.8.0   Explicitly check for compatible flags in the superblock metadata
-	and reject to start the raid set if any are set by a newer
-	target version, thus avoiding data corruption on a raid set
-	with a reshape in progress.
-1.9.0   Add support for RAID level takeover/reshape/region size
-	and set size reduction.
-1.9.1   Fix activation of existing RAID 4/10 mapped devices
-1.9.2   Don't emit '- -' on the status table line in case the constructor
-	fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
-	'D' on the status line.  If '- -' is passed into the constructor, emit
-	'- -' on the table line and '-' as the status line health character.
-1.10.0  Add support for raid4/5/6 journal device
-1.10.1  Fix data corruption on reshape request
-1.11.0  Fix table line argument order
-	(wrong raid10_copies/raid10_format sequence)
-1.11.1  Add raid4/5/6 journal write-back support via journal_mode option
-1.12.1  Fix for MD deadlock between mddev_suspend() and md_write_start() available
-1.13.0  Fix dev_health status at end of "recover" (was 'a', now 'A')
-1.13.1  Fix deadlock caused by early md_stop_writes().  Also fix size an
-	state races.
-1.13.2  Fix raid redundancy validation and avoid keeping raid set frozen
-1.14.0  Fix reshape race on small devices.  Fix stripe adding reshape
-	deadlock/potential data corruption.  Update superblock when
-	specific devices are requested via rebuild.  Fix RAID leg
-	rebuild errors.
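
The 'near' layout tables in the dm-raid text above follow a simple rule:
reading the devices row by row, each chunk is repeated 'raid10_copies'
times in consecutive positions.  The Python sketch below illustrates that
rule only.  raid10_near_layout is a hypothetical helper, not kernel code,
and it has been checked only against the documented tables for 2 copies.
It does not cover the 'far' and 'offset' formats.

```python
from typing import List


def raid10_near_layout(num_devices: int, rows: int,
                       copies: int = 2) -> List[List[str]]:
    """Return `rows` rows of chunk labels for the raid10 'near' format.

    Position p (counted row by row across the devices) holds chunk
    number p // copies, so each chunk occupies `copies` adjacent slots.
    """
    layout = []
    for row in range(rows):
        cells = []
        for dev in range(num_devices):
            position = row * num_devices + dev
            cells.append("A%d" % (position // copies + 1))
        layout.append(cells)
    return layout


if __name__ == "__main__":
    # Print the 3-drive table from the documentation.
    for cells in raid10_near_layout(3, 4):
        print("  ".join(cells))
```

With 3 devices the first two rows come out as "A1 A1 A2" and "A2 A3 A3",
which is the 'RAID1E - Integrated Adjacent Stripe Mirroring' pattern shown
in the table; with 4 devices each row holds two mirrored pairs.
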
diff --git a/Documentation/device-mapper/dm-service-time.rst b/Documentation/device-mapper/dm-service-time.rst
new file mode 100644
index 000000000000..facf277fc13c
--- /dev/null
+++ b/Documentation/device-mapper/dm-service-time.rst
@@ -0,0 +1,101 @@
+===============
+dm-service-time
+===============
+
+dm-service-time is a path selector module for device-mapper targets,
+which selects a path with the shortest estimated service time for
+the incoming I/O.
+
+The service time for each path is estimated by dividing the total size
+of in-flight I/Os on a path by the performance value of the path.
+The performance value is a relative throughput value among all paths
+in a path-group, and it can be specified as a table argument.
+
+The path selector name is 'service-time'.
+
+Table parameters for each path:
+
+    [<repeat_count> [<relative_throughput>]]
+	<repeat_count>:
+			The number of I/Os to dispatch using the selected
+			path before switching to the next path.
+			If not given, internal default is used.  To check
+			the default value, see the activated table.
+	<relative_throughput>:
+			The relative throughput value of the path
+			among all paths in the path-group.
+			The valid range is 0-100.
+			If not given, minimum value '1' is used.
+			If '0' is given, the path isn't selected while
+			other paths having a positive value are available.
+
+Status for each path:
+
+    <status> <fail-count> <in-flight-size> <relative_throughput>
+	<status>:
+		'A' if the path is active, 'F' if the path is failed.
+	<fail-count>:
+		The number of path failures.
+	<in-flight-size>:
+		The size of in-flight I/Os on the path.
+	<relative_throughput>:
+		The relative throughput value of the path
+		among all paths in the path-group.
+
+
+Algorithm
+=========
+
+dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
+dispatched and subtracts it when the I/O completes.
+Basically, dm-service-time selects a path having minimum service time
+which is calculated by::
+
+	('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
+
+However, some optimizations below are used to reduce the calculation
+as much as possible.
+
+	1. If the paths have the same 'relative_throughput', skip
+	   the division and just compare the 'in-flight-size'.
+
+	2. If the paths have the same 'in-flight-size', skip the division
+	   and just compare the 'relative_throughput'.
+
+	3. If some paths have non-zero 'relative_throughput' and others
+	   have zero 'relative_throughput', ignore those paths with zero
+	   'relative_throughput'.
+
+If none of these optimizations apply, calculate the service time of each
+path and compare the results.
+If the calculated service times are equal, the path with the larger
+'relative_throughput' may be better, so compare 'relative_throughput'
+as a tie-breaker.
+
+
+Examples
+========
+If 2 paths (sda and sdb) are used with repeat_count == 128, and sda has
+an average throughput of 1GB/s while sdb has 4GB/s, the
+'relative_throughput' value may be '1' for sda and '4' for sdb::
+
+  # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" | \
+    dmsetup create test
+  #
+  # dmsetup table
+  test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
+  #
+  # dmsetup status
+  test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
+
+
+Using '2' for sda and '8' for sdb would be equally valid::
+
+  # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" | \
+    dmsetup create test
+  #
+  # dmsetup table
+  test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
+  #
+  # dmsetup status
+  test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
diff --git a/Documentation/device-mapper/dm-service-time.txt b/Documentation/device-mapper/dm-service-time.txt
deleted file mode 100644
index fb1d4a0cf122..000000000000
--- a/Documentation/device-mapper/dm-service-time.txt
+++ /dev/null
@@ -1,91 +0,0 @@
-dm-service-time
-===============
-
-dm-service-time is a path selector module for device-mapper targets,
-which selects a path with the shortest estimated service time for
-the incoming I/O.
-
-The service time for each path is estimated by dividing the total size
-of in-flight I/Os on a path with the performance value of the path.
-The performance value is a relative throughput value among all paths
-in a path-group, and it can be specified as a table argument.
-
-The path selector name is 'service-time'.
-
-Table parameters for each path: [<repeat_count> [<relative_throughput>]]
-	<repeat_count>: The number of I/Os to dispatch using the selected
-			path before switching to the next path.
-			If not given, internal default is used.  To check
-			the default value, see the activated table.
-	<relative_throughput>: The relative throughput value of the path
-			among all paths in the path-group.
-			The valid range is 0-100.
-			If not given, minimum value '1' is used.
-			If '0' is given, the path isn't selected while
-			other paths having a positive value are available.
-
-Status for each path: <status> <fail-count> <in-flight-size> \
-		      <relative_throughput>
-	<status>: 'A' if the path is active, 'F' if the path is failed.
-	<fail-count>: The number of path failures.
-	<in-flight-size>: The size of in-flight I/Os on the path.
-	<relative_throughput>: The relative throughput value of the path
-			among all paths in the path-group.
-
-
-Algorithm
-=========
-
-dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
-dispatched and subtracts when completed.
-Basically, dm-service-time selects a path having minimum service time
-which is calculated by:
-
-	('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
-
-However, some optimizations below are used to reduce the calculation
-as much as possible.
-
-	1. If the paths have the same 'relative_throughput', skip
-	   the division and just compare the 'in-flight-size'.
-
-	2. If the paths have the same 'in-flight-size', skip the division
-	   and just compare the 'relative_throughput'.
-
-	3. If some paths have non-zero 'relative_throughput' and others
-	   have zero 'relative_throughput', ignore those paths with zero
-	   'relative_throughput'.
-
-If such optimizations can't be applied, calculate service time, and
-compare service time.
-If calculated service time is equal, the path having maximum
-'relative_throughput' may be better.  So compare 'relative_throughput'
-then.
-
-
-Examples
-========
-In case that 2 paths (sda and sdb) are used with repeat_count == 128
-and sda has an average throughput 1GB/s and sdb has 4GB/s,
-'relative_throughput' value may be '1' for sda and '4' for sdb.
-
-# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
-  dmsetup create test
-#
-# dmsetup table
-test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
-#
-# dmsetup status
-test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
-
-
-Or '2' for sda and '8' for sdb would be also true.
-
-# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
-  dmsetup create test
-#
-# dmsetup table
-test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
-#
-# dmsetup status
-test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
diff --git a/Documentation/device-mapper/dm-uevent.rst b/Documentation/device-mapper/dm-uevent.rst
new file mode 100644
index 000000000000..4a8ee8d069c9
--- /dev/null
+++ b/Documentation/device-mapper/dm-uevent.rst
@@ -0,0 +1,110 @@
+====================
+device-mapper uevent
+====================
+
+The device-mapper uevent code adds the capability to device-mapper to create
+and send kobject uevents (uevents).  Previously device-mapper events were only
+available through the ioctl interface.  The advantage of the uevents interface
+is that the event contains environment attributes providing increased context
+for the event, avoiding the need to query the state of the device-mapper device
+after the event is received.
+
+There are currently two functions for device-mapper events.  The first function
+listed creates the event and the second function sends the event(s)::
+
+  void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
+                      const char *path, unsigned nr_valid_paths)
+
+  void dm_send_uevents(struct list_head *events, struct kobject *kobj)
+
+
+The variables added to the uevent environment are:
+
+Variable Name: DM_TARGET
+------------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description:
+:Value: Name of device-mapper target that generated the event.
+
+Variable Name: DM_ACTION
+------------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description:
+:Value: Device-mapper specific action that caused the uevent action.
+	PATH_FAILED - A path has failed;
+	PATH_REINSTATED - A path has been reinstated.
+
+Variable Name: DM_SEQNUM
+------------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: unsigned integer
+:Description: A sequence number for this specific device-mapper device.
+:Value: Valid unsigned integer range.
+
+Variable Name: DM_PATH
+----------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description: Major and minor number of the path device pertaining to this
+	      event.
+:Value: Path name in the form of "Major:Minor"
+
+Variable Name: DM_NR_VALID_PATHS
+--------------------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: unsigned integer
+:Description:
+:Value: Valid unsigned integer range.
+
+Variable Name: DM_NAME
+----------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description: Name of the device-mapper device.
+:Value: Name
+
+Variable Name: DM_UUID
+----------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description: UUID of the device-mapper device.
+:Value: UUID. (Empty string if there isn't one.)
+
+An example of the uevents generated as captured by udevmonitor is shown
+below:
+
+1.) Path failure::
+
+	UEVENT[1192521009.711215] change@/block/dm-3
+	ACTION=change
+	DEVPATH=/block/dm-3
+	SUBSYSTEM=block
+	DM_TARGET=multipath
+	DM_ACTION=PATH_FAILED
+	DM_SEQNUM=1
+	DM_PATH=8:32
+	DM_NR_VALID_PATHS=0
+	DM_NAME=mpath2
+	DM_UUID=mpath-35333333000002328
+	MINOR=3
+	MAJOR=253
+	SEQNUM=1130
+
+2.) Path reinstate::
+
+	UEVENT[1192521132.989927] change@/block/dm-3
+	ACTION=change
+	DEVPATH=/block/dm-3
+	SUBSYSTEM=block
+	DM_TARGET=multipath
+	DM_ACTION=PATH_REINSTATED
+	DM_SEQNUM=2
+	DM_PATH=8:32
+	DM_NR_VALID_PATHS=1
+	DM_NAME=mpath2
+	DM_UUID=mpath-35333333000002328
+	MINOR=3
+	MAJOR=253
+	SEQNUM=1131
diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt
deleted file mode 100644
index 07edbd85c714..000000000000
--- a/Documentation/device-mapper/dm-uevent.txt
+++ /dev/null
@@ -1,97 +0,0 @@
-The device-mapper uevent code adds the capability to device-mapper to create
-and send kobject uevents (uevents).  Previously device-mapper events were only
-available through the ioctl interface.  The advantage of the uevents interface
-is the event contains environment attributes providing increased context for
-the event avoiding the need to query the state of the device-mapper device after
-the event is received.
-
-There are two functions currently for device-mapper events.  The first function
-listed creates the event and the second function sends the event(s).
-
-void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
-                    const char *path, unsigned nr_valid_paths)
-
-void dm_send_uevents(struct list_head *events, struct kobject *kobj)
-
-
-The variables added to the uevent environment are:
-
-Variable Name: DM_TARGET
-Uevent Action(s): KOBJ_CHANGE
-Type: string
-Description:
-Value: Name of device-mapper target that generated the event.
-
-Variable Name: DM_ACTION
-Uevent Action(s): KOBJ_CHANGE
-Type: string
-Description:
-Value: Device-mapper specific action that caused the uevent action.
-	PATH_FAILED - A path has failed.
-	PATH_REINSTATED - A path has been reinstated.
-
-Variable Name: DM_SEQNUM
-Uevent Action(s): KOBJ_CHANGE
-Type: unsigned integer
-Description: A sequence number for this specific device-mapper device.
-Value: Valid unsigned integer range.
-
-Variable Name: DM_PATH
-Uevent Action(s): KOBJ_CHANGE
-Type: string
-Description: Major and minor number of the path device pertaining to this
-event.
-Value: Path name in the form of "Major:Minor"
-
-Variable Name: DM_NR_VALID_PATHS
-Uevent Action(s): KOBJ_CHANGE
-Type: unsigned integer
-Description:
-Value: Valid unsigned integer range.
-
-Variable Name: DM_NAME
-Uevent Action(s): KOBJ_CHANGE
-Type: string
-Description: Name of the device-mapper device.
-Value: Name
-
-Variable Name: DM_UUID
-Uevent Action(s): KOBJ_CHANGE
-Type: string
-Description: UUID of the device-mapper device.
-Value: UUID. (Empty string if there isn't one.)
-
-An example of the uevents generated as captured by udevmonitor is shown
-below.
-
-1.) Path failure.
-UEVENT[1192521009.711215] change@/block/dm-3
-ACTION=change
-DEVPATH=/block/dm-3
-SUBSYSTEM=block
-DM_TARGET=multipath
-DM_ACTION=PATH_FAILED
-DM_SEQNUM=1
-DM_PATH=8:32
-DM_NR_VALID_PATHS=0
-DM_NAME=mpath2
-DM_UUID=mpath-35333333000002328
-MINOR=3
-MAJOR=253
-SEQNUM=1130
-
-2.) Path reinstate.
-UEVENT[1192521132.989927] change@/block/dm-3
-ACTION=change
-DEVPATH=/block/dm-3
-SUBSYSTEM=block
-DM_TARGET=multipath
-DM_ACTION=PATH_REINSTATED
-DM_SEQNUM=2
-DM_PATH=8:32
-DM_NR_VALID_PATHS=1
-DM_NAME=mpath2
-DM_UUID=mpath-35333333000002328
-MINOR=3
-MAJOR=253
-SEQNUM=1131
diff --git a/Documentation/device-mapper/dm-zoned.rst b/Documentation/device-mapper/dm-zoned.rst
new file mode 100644
index 000000000000..07f56ebc1730
--- /dev/null
+++ b/Documentation/device-mapper/dm-zoned.rst
@@ -0,0 +1,146 @@
+========
+dm-zoned
+========
+
+The dm-zoned device mapper target exposes a zoned block device (ZBC and
+ZAC compliant devices) as a regular block device without any write
+pattern constraints. In effect, it implements a drive-managed zoned
+block device which hides from the user (a file system or an application
+doing raw block device accesses) the sequential write constraints of
+host-managed zoned block devices and can mitigate the potential
+device-side performance degradation due to excessive random writes on
+host-aware zoned block devices.
+
+For a more detailed description of the zoned block device models and
+their constraints see (for SCSI devices):
+
+http://www.t10.org/drafts.htm#ZBC_Family
+
+and (for ATA devices):
+
+http://www.t13.org/Documents/UploadedDocuments/docs2015/di537r05-Zoned_Device_ATA_Command_Set_ZAC.pdf
+
+The dm-zoned implementation is simple and minimizes system overhead (CPU
+and memory usage as well as storage capacity loss). For a 10TB
+host-managed disk with 256 MB zones, dm-zoned memory usage per disk
+instance is at most 4.5 MB and as little as 5 zones will be used
+internally for storing metadata and performing reclaim operations.
+
+dm-zoned target devices are formatted and checked using the dmzadm
+utility available at:
+
+https://github.com/hgst/dm-zoned-tools
+
+Algorithm
+=========
+
+dm-zoned implements an on-disk buffering scheme to handle non-sequential
+write accesses to the sequential zones of a zoned block device.
+Conventional zones are used for caching as well as for storing internal
+metadata.
+
+The zones of the device are separated into 2 types:
+
+1) Metadata zones: these are conventional zones used to store metadata.
+Metadata zones are not reported as useable capacity to the user.
+
+2) Data zones: all remaining zones, the vast majority of which will be
+sequential zones used exclusively to store user data. The conventional
+zones of the device may also be used for buffering user random writes.
+Data in these zones may be directly mapped to the conventional zone, but
+later moved to a sequential zone so that the conventional zone can be
+reused for buffering incoming random writes.
+
+dm-zoned exposes a logical device with a sector size of 4096 bytes,
+irrespective of the physical sector size of the backend zoned block
+device being used. This allows reducing the amount of metadata needed to
+manage valid blocks (blocks written).
+
+The on-disk metadata format is as follows:
+
+1) The first block of the first conventional zone found contains the
+super block which describes the on disk amount and position of metadata
+blocks.
+
+2) Following the super block, a set of blocks is used to describe the
+mapping of the logical device blocks. The mapping is done per chunk of
+blocks, with the chunk size equal to the zone size of the zoned block
+device. The mapping table is indexed by chunk number and each mapping
+entry indicates the zone number of the device storing the chunk of data.
+Each mapping entry may also indicate the zone number of a conventional
+zone used to buffer random modifications to the data zone.
+
+3) A set of blocks used to store bitmaps indicating the validity of
+blocks in the data zones follows the mapping table. A valid block is
+defined as a block that was written and not discarded. For a buffered
+data chunk, a block is always valid only in the data zone mapping the
+chunk or in the buffer zone of the chunk.
+
+For a logical chunk mapped to a conventional zone, all write operations
+are processed by directly writing to the zone. If the mapping zone is a
+sequential zone, the write operation is processed directly only if the
+write offset within the logical chunk is equal to the write pointer
+offset within the sequential data zone (i.e. the write operation is
+aligned on the zone write pointer). Otherwise, write operations are
+processed indirectly using a buffer zone. In that case, an unused
+conventional zone is allocated and assigned to the chunk being
+accessed. Writing a block to the buffer zone of a chunk will
+automatically invalidate the same block in the sequential zone mapping
+the chunk. If all blocks of the sequential zone become invalid, the zone
+is freed and the chunk buffer zone becomes the primary zone mapping the
+chunk, resulting in native random write performance similar to a regular
+block device.
+
+Read operations are processed according to the block validity
+information provided by the bitmaps. Valid blocks are read either from
+the sequential zone mapping a chunk, or if the chunk is buffered, from
+the buffer zone assigned. If the accessed chunk has no mapping, or the
+accessed blocks are invalid, the read buffer is zeroed and the read
+operation terminated.
+
+After some time, the limited number of conventional zones available may
+be exhausted (all used to map chunks or buffer sequential zones) and
+unaligned writes to unbuffered chunks become impossible. To avoid this
+situation, a reclaim process regularly scans used conventional zones and
+tries to reclaim the least recently used zones by copying the valid
+blocks of the buffer zone to a free sequential zone. Once the copy
+completes, the chunk mapping is updated to point to the sequential zone
+and the buffer zone freed for reuse.
+
+Metadata Protection
+===================
+
+To protect metadata against corruption in case of sudden power loss or
+system crash, 2 sets of metadata zones are used. One set, the primary
+set, is used as the main metadata region, while the secondary set is
+used as a staging area. Modified metadata is first written to the
+secondary set and validated by updating the super block in the secondary
+set; a generation counter is used to indicate that this set contains the
+newest metadata. Once this operation completes, in-place metadata
+block updates can be done in the primary metadata set. This ensures that
+one of the sets is always consistent (all modifications committed or none
+at all). Flush operations are used as a commit point. Upon reception of
+a flush request, metadata modification activity is temporarily blocked
+(for both incoming BIO processing and reclaim process) and all dirty
+metadata blocks are staged and updated. Normal operation is then
+resumed. Flushing metadata thus only temporarily delays write and
+discard requests. Read requests can be processed concurrently while
+metadata flush is being executed.
+
+Usage
+=====
+
+A zoned block device must first be formatted using the dmzadm tool. This
+will analyze the device zone configuration, determine where to place the
+metadata sets on the device and initialize the metadata sets.
+
+Ex::
+
+	dmzadm --format /dev/sdxx
+
+For a formatted device, the target can be created normally with the
+dmsetup utility. The only parameter that dm-zoned requires is the
+underlying zoned block device name. Ex::
+
+	echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \
+	dmsetup create dmz-`basename ${dev}`
diff --git a/Documentation/device-mapper/dm-zoned.txt b/Documentation/device-mapper/dm-zoned.txt
deleted file mode 100644
index 736fcc78d193..000000000000
--- a/Documentation/device-mapper/dm-zoned.txt
+++ /dev/null
@@ -1,144 +0,0 @@
-dm-zoned
-========
-
-The dm-zoned device mapper target exposes a zoned block device (ZBC and
-ZAC compliant devices) as a regular block device without any write
-pattern constraints. In effect, it implements a drive-managed zoned
-block device which hides from the user (a file system or an application
-doing raw block device accesses) the sequential write constraints of
-host-managed zoned block devices and can mitigate the potential
-device-side performance degradation due to excessive random writes on
-host-aware zoned block devices.
-
-For a more detailed description of the zoned block device models and
-their constraints see (for SCSI devices):
-
-http://www.t10.org/drafts.htm#ZBC_Family
-
-and (for ATA devices):
-
-http://www.t13.org/Documents/UploadedDocuments/docs2015/di537r05-Zoned_Device_ATA_Command_Set_ZAC.pdf
-
-The dm-zoned implementation is simple and minimizes system overhead (CPU
-and memory usage as well as storage capacity loss). For a 10TB
-host-managed disk with 256 MB zones, dm-zoned memory usage per disk
-instance is at most 4.5 MB and as little as 5 zones will be used
-internally for storing metadata and performaing reclaim operations.
-
-dm-zoned target devices are formatted and checked using the dmzadm
-utility available at:
-
-https://github.com/hgst/dm-zoned-tools
-
-Algorithm
-=========
-
-dm-zoned implements an on-disk buffering scheme to handle non-sequential
-write accesses to the sequential zones of a zoned block device.
-Conventional zones are used for caching as well as for storing internal
-metadata.
-
-The zones of the device are separated into 2 types:
-
-1) Metadata zones: these are conventional zones used to store metadata.
-Metadata zones are not reported as useable capacity to the user.
-
-2) Data zones: all remaining zones, the vast majority of which will be
-sequential zones used exclusively to store user data. The conventional
-zones of the device may be used also for buffering user random writes.
-Data in these zones may be directly mapped to the conventional zone, but
-later moved to a sequential zone so that the conventional zone can be
-reused for buffering incoming random writes.
-
-dm-zoned exposes a logical device with a sector size of 4096 bytes,
-irrespective of the physical sector size of the backend zoned block
-device being used. This allows reducing the amount of metadata needed to
-manage valid blocks (blocks written).
-
-The on-disk metadata format is as follows:
-
-1) The first block of the first conventional zone found contains the
-super block which describes the on disk amount and position of metadata
-blocks.
-
-2) Following the super block, a set of blocks is used to describe the
-mapping of the logical device blocks. The mapping is done per chunk of
-blocks, with the chunk size equal to the zoned block device size. The
-mapping table is indexed by chunk number and each mapping entry
-indicates the zone number of the device storing the chunk of data. Each
-mapping entry may also indicate if the zone number of a conventional
-zone used to buffer random modification to the data zone.
-
-3) A set of blocks used to store bitmaps indicating the validity of
-blocks in the data zones follows the mapping table. A valid block is
-defined as a block that was written and not discarded. For a buffered
-data chunk, a block is always valid only in the data zone mapping the
-chunk or in the buffer zone of the chunk.
-
-For a logical chunk mapped to a conventional zone, all write operations
-are processed by directly writing to the zone. If the mapping zone is a
-sequential zone, the write operation is processed directly only if the
-write offset within the logical chunk is equal to the write pointer
-offset within of the sequential data zone (i.e. the write operation is
-aligned on the zone write pointer). Otherwise, write operations are
-processed indirectly using a buffer zone. In that case, an unused
-conventional zone is allocated and assigned to the chunk being
-accessed. Writing a block to the buffer zone of a chunk will
-automatically invalidate the same block in the sequential zone mapping
-the chunk. If all blocks of the sequential zone become invalid, the zone
-is freed and the chunk buffer zone becomes the primary zone mapping the
-chunk, resulting in native random write performance similar to a regular
-block device.
-
-Read operations are processed according to the block validity
-information provided by the bitmaps. Valid blocks are read either from
-the sequential zone mapping a chunk, or if the chunk is buffered, from
-the buffer zone assigned. If the accessed chunk has no mapping, or the
-accessed blocks are invalid, the read buffer is zeroed and the read
-operation terminated.
-
-After some time, the limited number of convnetional zones available may
-be exhausted (all used to map chunks or buffer sequential zones) and
-unaligned writes to unbuffered chunks become impossible. To avoid this
-situation, a reclaim process regularly scans used conventional zones and
-tries to reclaim the least recently used zones by copying the valid
-blocks of the buffer zone to a free sequential zone. Once the copy
-completes, the chunk mapping is updated to point to the sequential zone
-and the buffer zone freed for reuse.
-
-Metadata Protection
-===================
-
-To protect metadata against corruption in case of sudden power loss or
-system crash, 2 sets of metadata zones are used. One set, the primary
-set, is used as the main metadata region, while the secondary set is
-used as a staging area. Modified metadata is first written to the
-secondary set and validated by updating the super block in the secondary
-set, a generation counter is used to indicate that this set contains the
-newest metadata. Once this operation completes, in place of metadata
-block updates can be done in the primary metadata set. This ensures that
-one of the set is always consistent (all modifications committed or none
-at all). Flush operations are used as a commit point. Upon reception of
-a flush request, metadata modification activity is temporarily blocked
-(for both incoming BIO processing and reclaim process) and all dirty
-metadata blocks are staged and updated. Normal operation is then
-resumed. Flushing metadata thus only temporarily delays write and
-discard requests. Read requests can be processed concurrently while
-metadata flush is being executed.
-
-Usage
-=====
-
-A zoned block device must first be formatted using the dmzadm tool. This
-will analyze the device zone configuration, determine where to place the
-metadata sets on the device and initialize the metadata sets.
-
-Ex:
-
-dmzadm --format /dev/sdxx
-
-For a formatted device, the target can be created normally with the
-dmsetup utility. The only parameter that dm-zoned requires is the
-underlying zoned block device name. Ex:
-
-echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | dmsetup create dmz-`basename ${dev}`
diff --git a/Documentation/device-mapper/era.rst b/Documentation/device-mapper/era.rst
new file mode 100644
index 000000000000..90dd5c670b9f
--- /dev/null
+++ b/Documentation/device-mapper/era.rst
@@ -0,0 +1,116 @@
+======
+dm-era
+======
+
+Introduction
+============
+
+dm-era is a target that behaves similarly to the linear target.  In
+addition it keeps track of which blocks were written within a user
+defined period of time called an 'era'.  Each era target instance
+maintains the current era as a monotonically increasing 32-bit
+counter.
+
+Use cases include tracking changed blocks for backup software, and
+partially invalidating the contents of a cache to restore cache
+coherency after rolling back a vendor snapshot.
+
+Constructor
+===========
+
+era <metadata dev> <origin dev> <block size>
+
+ ================ ======================================================
+ metadata dev     fast device holding the persistent metadata
+ origin dev	  device holding data blocks that may change
+ block size       block size of origin data device, granularity that is
+		  tracked by the target
+ ================ ======================================================
+
+Messages
+========
+
+None of the dm messages take any arguments.
+
+checkpoint
+----------
+
+Possibly move to a new era.  You shouldn't assume the era has
+incremented.  After sending this message, you should check the
+current era via the status line.
+
+take_metadata_snap
+------------------
+
+Create a clone of the metadata, to allow a userland process to read it.
+
+drop_metadata_snap
+------------------
+
+Drop the metadata snapshot.
+
+Status
+======
+
+<metadata block size> <#used metadata blocks>/<#total metadata blocks>
+<current era> <held metadata root | '-'>
+
+========================= ==============================================
+metadata block size	  Fixed block size for each metadata block in
+			  sectors
+#used metadata blocks	  Number of metadata blocks used
+#total metadata blocks	  Total number of metadata blocks
+current era		  The current era
+held metadata root	  The location, in blocks, of the metadata root
+			  that has been 'held' for userspace read
+			  access. '-' indicates there is no held root
+========================= ==============================================
+
+Detailed use case
+=================
+
+The scenario of invalidating a cache when rolling back a vendor
+snapshot was the primary use case when developing this target:
+
+Taking a vendor snapshot
+------------------------
+
+- Send a checkpoint message to the era target
+- Make a note of the current era in its status line
+- Take vendor snapshot (the era and snapshot should be forever
+  associated now).
+
+Rolling back to a vendor snapshot
+---------------------------------
+
+- Cache enters passthrough mode (see: dm-cache's docs in cache.txt)
+- Rollback vendor storage
+- Take metadata snapshot
+- Ascertain which blocks have been written since the snapshot was taken
+  by checking each block's era
+- Invalidate those blocks in the caching software
+- Cache returns to writeback/writethrough mode
+
+Memory usage
+============
+
+The target uses a bitset to record writes in the current era.  It also
+has a spare bitset ready for switching over to a new era.  Other than
+that it uses a few 4k blocks for updating metadata::
+
+   (4 * nr_blocks) bytes + buffers
+
+Resilience
+==========
+
+Metadata is updated on disk before a write to a previously unwritten
+block is performed.  As such dm-era should not be affected by a hard
+crash such as power failure.
+
+Userland tools
+==============
+
+Userland tools are found in the increasingly poorly named
+thin-provisioning-tools project:
+
+    https://github.com/jthornber/thin-provisioning-tools
diff --git a/Documentation/device-mapper/era.txt b/Documentation/device-mapper/era.txt
deleted file mode 100644
index 3c6d01be3560..000000000000
--- a/Documentation/device-mapper/era.txt
+++ /dev/null
@@ -1,108 +0,0 @@
-Introduction
-============
-
-dm-era is a target that behaves similar to the linear target.  In
-addition it keeps track of which blocks were written within a user
-defined period of time called an 'era'.  Each era target instance
-maintains the current era as a monotonically increasing 32-bit
-counter.
-
-Use cases include tracking changed blocks for backup software, and
-partially invalidating the contents of a cache to restore cache
-coherency after rolling back a vendor snapshot.
-
-Constructor
-===========
-
- era <metadata dev> <origin dev> <block size>
-
- metadata dev    : fast device holding the persistent metadata
- origin dev	 : device holding data blocks that may change
- block size      : block size of origin data device, granularity that is
-		     tracked by the target
-
-Messages
-========
-
-None of the dm messages take any arguments.
-
-checkpoint
-----------
-
-Possibly move to a new era.  You shouldn't assume the era has
-incremented.  After sending this message, you should check the
-current era via the status line.
-
-take_metadata_snap
-------------------
-
-Create a clone of the metadata, to allow a userland process to read it.
-
-drop_metadata_snap
-------------------
-
-Drop the metadata snapshot.
-
-Status
-======
-
-<metadata block size> <#used metadata blocks>/<#total metadata blocks>
-<current era> <held metadata root | '-'>
-
-metadata block size	 : Fixed block size for each metadata block in
-			     sectors
-#used metadata blocks	 : Number of metadata blocks used
-#total metadata blocks	 : Total number of metadata blocks
-current era		 : The current era
-held metadata root	 : The location, in blocks, of the metadata root
-			     that has been 'held' for userspace read
-			     access. '-' indicates there is no held root
-
-Detailed use case
-=================
-
-The scenario of invalidating a cache when rolling back a vendor
-snapshot was the primary use case when developing this target:
-
-Taking a vendor snapshot
-------------------------
-
-- Send a checkpoint message to the era target
-- Make a note of the current era in its status line
-- Take vendor snapshot (the era and snapshot should be forever
-  associated now).
-
-Rolling back to an vendor snapshot
-----------------------------------
-
-- Cache enters passthrough mode (see: dm-cache's docs in cache.txt)
-- Rollback vendor storage
-- Take metadata snapshot
-- Ascertain which blocks have been written since the snapshot was taken
-  by checking each block's era
-- Invalidate those blocks in the caching software
-- Cache returns to writeback/writethrough mode
-
-Memory usage
-============
-
-The target uses a bitset to record writes in the current era.  It also
-has a spare bitset ready for switching over to a new era.  Other than
-that it uses a few 4k blocks for updating metadata.
-
-   (4 * nr_blocks) bytes + buffers
-
-Resilience
-==========
-
-Metadata is updated on disk before a write to a previously unwritten
-block is performed.  As such dm-era should not be effected by a hard
-crash such as power failure.
-
-Userland tools
-==============
-
-Userland tools are found in the increasingly poorly named
-thin-provisioning-tools project:
-
-    https://github.com/jthornber/thin-provisioning-tools
diff --git a/Documentation/device-mapper/index.rst b/Documentation/device-mapper/index.rst
new file mode 100644
index 000000000000..105e253bc231
--- /dev/null
+++ b/Documentation/device-mapper/index.rst
@@ -0,0 +1,44 @@
+:orphan:
+
+=============
+Device Mapper
+=============
+
+.. toctree::
+    :maxdepth: 1
+
+    cache-policies
+    cache
+    delay
+    dm-crypt
+    dm-flakey
+    dm-init
+    dm-integrity
+    dm-io
+    dm-log
+    dm-queue-length
+    dm-raid
+    dm-service-time
+    dm-uevent
+    dm-zoned
+    era
+    kcopyd
+    linear
+    log-writes
+    persistent-data
+    snapshot
+    statistics
+    striped
+    switch
+    thin-provisioning
+    unstriped
+    verity
+    writecache
+    zero
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/device-mapper/kcopyd.rst b/Documentation/device-mapper/kcopyd.rst
new file mode 100644
index 000000000000..7651d395127f
--- /dev/null
+++ b/Documentation/device-mapper/kcopyd.rst
@@ -0,0 +1,47 @@
+======
+kcopyd
+======
+
+Kcopyd provides the ability to copy a range of sectors from one block-device
+to one or more other block-devices, with an asynchronous completion
+notification. It is used by dm-snapshot and dm-mirror.
+
+Users of kcopyd must first create a client and indicate how many memory pages
+to set aside for their copy jobs. This is done with a call to
+kcopyd_client_create()::
+
+   int kcopyd_client_create(unsigned int num_pages,
+                            struct kcopyd_client **result);
+
+To start a copy job, the user must set up io_region structures to describe
+the source and destinations of the copy. Each io_region indicates a
+block-device along with the starting sector and size of the region. The source
+of the copy is given as one io_region structure, and the destinations of the
+copy are given as an array of io_region structures::
+
+   struct io_region {
+      struct block_device *bdev;
+      sector_t sector;
+      sector_t count;
+   };
+
+To start the copy, the user calls kcopyd_copy(), passing in the client
+pointer, pointers to the source and destination io_regions, the name of a
+completion callback routine, and a pointer to some context data for the copy::
+
+   int kcopyd_copy(struct kcopyd_client *kc, struct io_region *from,
+                   unsigned int num_dests, struct io_region *dests,
+                   unsigned int flags, kcopyd_notify_fn fn, void *context);
+
+   typedef void (*kcopyd_notify_fn)(int read_err, unsigned int write_err,
+				    void *context);
+
+When the copy completes, kcopyd will call the user's completion routine,
+passing back the user's context pointer. It will also indicate if a read or
+write error occurred during the copy.
+
+When a user is done with all their copy jobs, they should call
+kcopyd_client_destroy() to delete the kcopyd client, which will release the
+associated memory pages::
+
+   void kcopyd_client_destroy(struct kcopyd_client *kc);
diff --git a/Documentation/device-mapper/kcopyd.txt b/Documentation/device-mapper/kcopyd.txt
deleted file mode 100644
index 820382c4cecf..000000000000
--- a/Documentation/device-mapper/kcopyd.txt
+++ /dev/null
@@ -1,47 +0,0 @@
-kcopyd
-======
-
-Kcopyd provides the ability to copy a range of sectors from one block-device
-to one or more other block-devices, with an asynchronous completion
-notification. It is used by dm-snapshot and dm-mirror.
-
-Users of kcopyd must first create a client and indicate how many memory pages
-to set aside for their copy jobs. This is done with a call to
-kcopyd_client_create().
-
-   int kcopyd_client_create(unsigned int num_pages,
-                            struct kcopyd_client **result);
-
-To start a copy job, the user must set up io_region structures to describe
-the source and destinations of the copy. Each io_region indicates a
-block-device along with the starting sector and size of the region. The source
-of the copy is given as one io_region structure, and the destinations of the
-copy are given as an array of io_region structures.
-
-   struct io_region {
-      struct block_device *bdev;
-      sector_t sector;
-      sector_t count;
-   };
-
-To start the copy, the user calls kcopyd_copy(), passing in the client
-pointer, pointers to the source and destination io_regions, the name of a
-completion callback routine, and a pointer to some context data for the copy.
-
-   int kcopyd_copy(struct kcopyd_client *kc, struct io_region *from,
-                   unsigned int num_dests, struct io_region *dests,
-                   unsigned int flags, kcopyd_notify_fn fn, void *context);
-
-   typedef void (*kcopyd_notify_fn)(int read_err, unsigned int write_err,
-				    void *context);
-
-When the copy completes, kcopyd will call the user's completion routine,
-passing back the user's context pointer. It will also indicate if a read or
-write error occurred during the copy.
-
-When a user is done with all their copy jobs, they should call
-kcopyd_client_destroy() to delete the kcopyd client, which will release the
-associated memory pages.
-
-   void kcopyd_client_destroy(struct kcopyd_client *kc);
-
diff --git a/Documentation/device-mapper/linear.rst b/Documentation/device-mapper/linear.rst
new file mode 100644
index 000000000000..9d17fc6e64a9
--- /dev/null
+++ b/Documentation/device-mapper/linear.rst
@@ -0,0 +1,63 @@
+=========
+dm-linear
+=========
+
+Device-Mapper's "linear" target maps a linear range of the Device-Mapper
+device onto a linear range of another device.  This is the basic building
+block of logical volume managers.
+
+Parameters: <dev path> <offset>
+    <dev path>:
+	Full pathname to the underlying block-device, or a
+        "major:minor" device-number.
+    <offset>:
+	Starting sector within the device.
+
+
+Example scripts
+===============
+
+::
+
+  #!/bin/sh
+  # Create an identity mapping for a device
+  echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
+
+::
+
+  #!/bin/sh
+  # Join 2 devices together
+  size1=`blockdev --getsz $1`
+  size2=`blockdev --getsz $2`
+  echo "0 $size1 linear $1 0
+  $size1 $size2 linear $2 0" | dmsetup create joined
+
+::
+
+  #!/usr/bin/perl -w
+  # Split a device into 4M chunks and then join them together in reverse order.
+
+  my $name = "reverse";
+  my $extent_size = 4 * 1024 * 2;
+  my $dev = $ARGV[0];
+  my $table = "";
+  my $count = 0;
+
+  if (!defined($dev)) {
+          die("Please specify a device.\n");
+  }
+
+  my $dev_size = `blockdev --getsz $dev`;
+  my $extents = int($dev_size / $extent_size) -
+                (($dev_size % $extent_size) ? 1 : 0);
+
+  while ($extents > 0) {
+          my $this_start = $count * $extent_size;
+          $extents--;
+          $count++;
+          my $this_offset = $extents * $extent_size;
+
+          $table .= "$this_start $extent_size linear $dev $this_offset\n";
+  }
+
+  `echo \"$table\" | dmsetup create $name`;
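The "join 2 devices" script in linear.rst generalizes to any number of devices: each table line starts where the previous one ended. The sketch below only prints the table and touches no devices; the function name `concat_table` and its argument layout are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# concat_table DEV1 SIZE1 [DEV2 SIZE2 ...]
# Print a dm table that joins the given devices end to end.
# Sizes are in 512-byte sectors, as reported by `blockdev --getsz`.
concat_table() {
    start=0
    while [ $# -ge 2 ]; do
        # <logical start> <length> linear <device> <offset into device>
        echo "$start $2 linear $1 0"
        start=$((start + $2))
        shift 2
    done
}
```

The output could then be piped to `dmsetup create joined`, as the two-device script does, with each size obtained from `blockdev --getsz`.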
diff --git a/Documentation/device-mapper/linear.txt b/Documentation/device-mapper/linear.txt
deleted file mode 100644
index 7cb98d89d3f8..000000000000
--- a/Documentation/device-mapper/linear.txt
+++ /dev/null
@@ -1,61 +0,0 @@
-dm-linear
-=========
-
-Device-Mapper's "linear" target maps a linear range of the Device-Mapper
-device onto a linear range of another device.  This is the basic building
-block of logical volume managers.
-
-Parameters: <dev path> <offset>
-    <dev path>: Full pathname to the underlying block-device, or a
-                "major:minor" device-number.
-    <offset>: Starting sector within the device.
-
-
-Example scripts
-===============
-[[
-#!/bin/sh
-# Create an identity mapping for a device
-echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
-]]
-
-
-[[
-#!/bin/sh
-# Join 2 devices together
-size1=`blockdev --getsz $1`
-size2=`blockdev --getsz $2`
-echo "0 $size1 linear $1 0
-$size1 $size2 linear $2 0" | dmsetup create joined
-]]
-
-
-[[
-#!/usr/bin/perl -w
-# Split a device into 4M chunks and then join them together in reverse order.
-
-my $name = "reverse";
-my $extent_size = 4 * 1024 * 2;
-my $dev = $ARGV[0];
-my $table = "";
-my $count = 0;
-
-if (!defined($dev)) {
-        die("Please specify a device.\n");
-}
-
-my $dev_size = `blockdev --getsz $dev`;
-my $extents = int($dev_size / $extent_size) -
-              (($dev_size % $extent_size) ? 1 : 0);
-
-while ($extents > 0) {
-        my $this_start = $count * $extent_size;
-        $extents--;
-        $count++;
-        my $this_offset = $extents * $extent_size;
-
-        $table .= "$this_start $extent_size linear $dev $this_offset\n";
-}
-
-`echo \"$table\" | dmsetup create $name`;
-]]
diff --git a/Documentation/device-mapper/log-writes.rst b/Documentation/device-mapper/log-writes.rst
new file mode 100644
index 000000000000..23141f2ffb7c
--- /dev/null
+++ b/Documentation/device-mapper/log-writes.rst
@@ -0,0 +1,145 @@
+=============
+dm-log-writes
+=============
+
+This target takes 2 devices, one to pass all IO to normally, and one to log all
+of the write operations to.  This is intended for file system developers wishing
+to verify the integrity of metadata or data as the file system is written to.
+There is a log_write_entry written for every WRITE request and the target is
+able to take arbitrary data from userspace to insert into the log.  The data
+that is in the WRITE requests is copied into the log to make the replay happen
+exactly as it happened originally.
+
+Log Ordering
+============
+
+We log things in order of completion once we are sure the write is no longer in
+cache.  This means that normal WRITE requests are not actually logged until the
+next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
+the log in a way that correlates to what is on disk and not what is in cache,
+to make it easier to detect improper waiting/flushing.
+
+This works by attaching all WRITE requests to a list once the write completes.
+Once we see a REQ_PREFLUSH request we splice this list onto the request and once
+the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
+completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
+simulate the worst case scenario with regard to power failures.  Consider the
+following example (W means write, C means complete):
+
+	W1,W2,W3,C3,C2,Wflush,C1,Cflush
+
+The log would show the following:
+
+	W3,W2,flush,W1....
+
+Again, this is to simulate what is actually on disk; this allows us to detect
+cases where a power failure at a particular point in time would create an
+inconsistent file system.
+
+Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
+they complete as those requests will obviously bypass the device cache.
+
+Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise we would
+have all the DISCARD requests, and then the WRITE requests and then the FLUSH
+request.  Consider the following example:
+
+	WRITE block 1, DISCARD block 1, FLUSH
+
+If we logged DISCARD when it completed, the replay would look like this:
+
+	DISCARD 1, WRITE 1, FLUSH
+
+which isn't quite what happened and wouldn't be caught during the log replay.
+
+Target interface
+================
+
+i) Constructor
+
+   log-writes <dev_path> <log_dev_path>
+
+   ============= ==============================================
+   dev_path	 Device that all of the IO will go to normally.
+   log_dev_path  Device where the log entries are written to.
+   ============= ==============================================
+
+ii) Status
+
+    <#logged entries> <highest allocated sector>
+
+    =========================== ========================
+    #logged entries	        Number of logged entries
+    highest allocated sector    Highest allocated sector
+    =========================== ========================
+
+iii) Messages
+
+    mark <description>
+
+	You can use a dmsetup message to set an arbitrary mark in a log.
+	For example say you want to fsck a file system after every
+	write, but first you need to replay up to the mkfs to make sure
+	we're fsck'ing something reasonable, you would do something like
+	this::
+
+	  mkfs.btrfs -f /dev/mapper/log
+	  dmsetup message log 0 mark mkfs
+	  <run test>
+
+	This would allow you to replay the log up to the mkfs mark and
+	then replay from that point on doing the fsck check in the
+	interval that you want.
+
+	Every log has a mark at the end labeled "dm-log-writes-end".
+
+Userspace component
+===================
+
+There is a userspace tool that will replay the log for you in various ways.
+It can be found here: https://github.com/josefbacik/log-writes
+
+Example usage
+=============
+
+Say you want to test fsync on your file system.  You would do something like
+this::
+
+  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
+  dmsetup create log --table "$TABLE"
+  mkfs.btrfs -f /dev/mapper/log
+  dmsetup message log 0 mark mkfs
+
+  mount /dev/mapper/log /mnt/btrfs-test
+  <some test that does fsync at the end>
+  dmsetup message log 0 mark fsync
+  md5sum /mnt/btrfs-test/foo
+  umount /mnt/btrfs-test
+
+  dmsetup remove log
+  replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
+  mount /dev/sdb /mnt/btrfs-test
+  md5sum /mnt/btrfs-test/foo
+  <verify md5sum's are correct>
+
+Another option is to do a complicated file system operation and verify the file
+system is consistent during the entire operation.  You could do this with::
+
+  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
+  dmsetup create log --table "$TABLE"
+  mkfs.btrfs -f /dev/mapper/log
+  dmsetup message log 0 mark mkfs
+
+  mount /dev/mapper/log /mnt/btrfs-test
+  <fsstress to dirty the fs>
+  btrfs filesystem balance /mnt/btrfs-test
+  umount /mnt/btrfs-test
+  dmsetup remove log
+
+  replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
+  btrfsck /dev/sdb
+  replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
+	--fsck "btrfsck /dev/sdb" --check fua
+
+That will replay the log until it sees a FUA request, run the fsck command,
+and, if the fsck passes, replay to the next FUA, until the log is completed or
+the fsck command exits abnormally.
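The rule from the Log Ordering section of log-writes.rst can be modeled with a small simulation. This is a simplified sketch, not the target's implementation: it ignores FUA and DISCARD handling, and the function name `log_order` and the event tokens (`W1`, `C1`, `Wflush`, `Cflush`) are hypothetical names that mirror the example in the text.

```shell
#!/bin/sh
# log_order EVENT...
# W<n> = write issued, C<n> = write completed,
# Wflush = flush issued, Cflush = flush completed.
log_order() {
    pending=""   # completed writes not yet attached to a flush
    spliced=""   # writes attached to the in-flight flush
    logged=""    # what ends up in the log, in order
    for ev in "$@"; do
        case $ev in
            Wflush) spliced=$pending; pending="" ;;                  # splice list onto the flush
            Cflush) logged="$logged $spliced flush"; spliced="" ;;   # log the writes, then the flush
            C*)     pending="$pending W${ev#C}" ;;                   # queue the write on completion
            W*)     : ;;                                             # issuing a write logs nothing yet
        esac
    done
    # Unquoted expansions collapse the extra whitespace.
    echo "logged:" $logged
    echo "pending:" $pending
}

log_order W1 W2 W3 C3 C2 Wflush C1 Cflush
# logged: W3 W2 flush
# pending: W1
```

W1 completed only after the flush was issued, so it stays pending until a later flush. This matches the `W3,W2,flush,W1....` ordering shown in the documentation.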
diff --git a/Documentation/device-mapper/log-writes.txt b/Documentation/device-mapper/log-writes.txt
deleted file mode 100644
index b638d124be6a..000000000000
--- a/Documentation/device-mapper/log-writes.txt
+++ /dev/null
@@ -1,140 +0,0 @@
-dm-log-writes
-=============
-
-This target takes 2 devices, one to pass all IO to normally, and one to log all
-of the write operations to.  This is intended for file system developers wishing
-to verify the integrity of metadata or data as the file system is written to.
-There is a log_write_entry written for every WRITE request and the target is
-able to take arbitrary data from userspace to insert into the log.  The data
-that is in the WRITE requests is copied into the log to make the replay happen
-exactly as it happened originally.
-
-Log Ordering
-============
-
-We log things in order of completion once we are sure the write is no longer in
-cache.  This means that normal WRITE requests are not actually logged until the
-next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
-the log in a way that correlates to what is on disk and not what is in cache,
-to make it easier to detect improper waiting/flushing.
-
-This works by attaching all WRITE requests to a list once the write completes.
-Once we see a REQ_PREFLUSH request we splice this list onto the request and once
-the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
-completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
-simulate the worst case scenario with regard to power failures.  Consider the
-following example (W means write, C means complete):
-
-W1,W2,W3,C3,C2,Wflush,C1,Cflush
-
-The log would show the following
-
-W3,W2,flush,W1....
-
-Again this is to simulate what is actually on disk, this allows us to detect
-cases where a power failure at a particular point in time would create an
-inconsistent file system.
-
-Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
-they complete as those requests will obviously bypass the device cache.
-
-Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise we would
-have all the DISCARD requests, and then the WRITE requests and then the FLUSH
-request.  Consider the following example:
-
-WRITE block 1, DISCARD block 1, FLUSH
-
-If we logged DISCARD when it completed, the replay would look like this
-
-DISCARD 1, WRITE 1, FLUSH
-
-which isn't quite what happened and wouldn't be caught during the log replay.
-
-Target interface
-================
-
-i) Constructor
-
-   log-writes <dev_path> <log_dev_path>
-
-   dev_path	: Device that all of the IO will go to normally.
-   log_dev_path : Device where the log entries are written to.
-
-ii) Status
-
-    <#logged entries> <highest allocated sector>
-
-    #logged entries	       : Number of logged entries
-    highest allocated sector   : Highest allocated sector
-
-iii) Messages
-
-    mark <description>
-
-	You can use a dmsetup message to set an arbitrary mark in a log.
-	For example say you want to fsck a file system after every
-	write, but first you need to replay up to the mkfs to make sure
-	we're fsck'ing something reasonable, you would do something like
-	this:
-
-	  mkfs.btrfs -f /dev/mapper/log
-	  dmsetup message log 0 mark mkfs
-	  <run test>
-
-	  This would allow you to replay the log up to the mkfs mark and
-	  then replay from that point on doing the fsck check in the
-	  interval that you want.
-
-	Every log has a mark at the end labeled "dm-log-writes-end".
-
-Userspace component
-===================
-
-There is a userspace tool that will replay the log for you in various ways.
-It can be found here: https://github.com/josefbacik/log-writes
-
-Example usage
-=============
-
-Say you want to test fsync on your file system.  You would do something like
-this:
-
-TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
-dmsetup create log --table "$TABLE"
-mkfs.btrfs -f /dev/mapper/log
-dmsetup message log 0 mark mkfs
-
-mount /dev/mapper/log /mnt/btrfs-test
-<some test that does fsync at the end>
-dmsetup message log 0 mark fsync
-md5sum /mnt/btrfs-test/foo
-umount /mnt/btrfs-test
-
-dmsetup remove log
-replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
-mount /dev/sdb /mnt/btrfs-test
-md5sum /mnt/btrfs-test/foo
-<verify md5sum's are correct>
-
-Another option is to do a complicated file system operation and verify the file
-system is consistent during the entire operation.  You could do this with:
-
-TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
-dmsetup create log --table "$TABLE"
-mkfs.btrfs -f /dev/mapper/log
-dmsetup message log 0 mark mkfs
-
-mount /dev/mapper/log /mnt/btrfs-test
-<fsstress to dirty the fs>
-btrfs filesystem balance /mnt/btrfs-test
-umount /mnt/btrfs-test
-dmsetup remove log
-
-replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
-btrfsck /dev/sdb
-replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
-	--fsck "btrfsck /dev/sdb" --check fua
-
-And that will replay the log until it sees a FUA request, run the fsck command
-and if the fsck passes it will replay to the next FUA, until it is completed or
-the fsck command exists abnormally.
diff --git a/Documentation/device-mapper/persistent-data.rst b/Documentation/device-mapper/persistent-data.rst
new file mode 100644
index 000000000000..2065c3c5a091
--- /dev/null
+++ b/Documentation/device-mapper/persistent-data.rst
@@ -0,0 +1,88 @@
+===============
+Persistent data
+===============
+
+Introduction
+============
+
+The more-sophisticated device-mapper targets require complex metadata
+that is managed in kernel.  In late 2010 we were seeing that various
+different targets were rolling their own data structures, for example:
+
+- Mikulas Patocka's multisnap implementation
+- Heinz Mauelshagen's thin provisioning target
+- Another btree-based caching target posted to dm-devel
+- Another multi-snapshot target based on a design of Daniel Phillips
+
+Maintaining these data structures takes a lot of work, so if possible
+we'd like to reduce the number.
+
+The persistent-data library is an attempt to provide a re-usable
+framework for people who want to store metadata in device-mapper
+targets.  It's currently used by the thin-provisioning target and an
+upcoming hierarchical storage target.
+
+Overview
+========
+
+The main documentation is in the header files which can all be found
+under drivers/md/persistent-data.
+
+The block manager
+-----------------
+
+dm-block-manager.[hc]
+
+This provides access to the data on disk in fixed-size blocks.  There
+is a read/write locking interface to prevent concurrent accesses, and
+to keep data that is being used in the cache.
+
+Clients of persistent-data are unlikely to use this directly.
+
+The transaction manager
+-----------------------
+
+dm-transaction-manager.[hc]
+
+This restricts access to blocks and enforces copy-on-write semantics.
+The only way you can get hold of a writable block through the
+transaction manager is by shadowing an existing block (i.e. doing
+copy-on-write) or allocating a fresh one.  Shadowing is elided within
+the same transaction so performance is reasonable.  The commit method
+ensures that all data is flushed before it writes the superblock.
+On power failure your metadata will be as it was when last committed.
+
+The Space Maps
+--------------
+
+dm-space-map.h
+dm-space-map-metadata.[hc]
+dm-space-map-disk.[hc]
+
+On-disk data structures that keep track of reference counts of blocks.
+Also acts as the allocator of new blocks.  Currently two
+implementations: a simpler one for managing blocks on a different
+device (e.g. thinly-provisioned data blocks); and one for managing
+the metadata space.  The latter is complicated by the need to store
+its own data within the space it's managing.
+
+The data structures
+-------------------
+
+dm-btree.[hc]
+dm-btree-remove.c
+dm-btree-spine.c
+dm-btree-internal.h
+
+Currently there is only one data structure, a hierarchical btree.
+There are plans to add more.  For example, something with an
+array-like interface would see a lot of use.
+
+The btree is 'hierarchical' in that you can define it to be composed
+of nested btrees, and take multiple keys.  For example, the
+thin-provisioning target uses a btree with two levels of nesting.
+The first maps a device id to a mapping tree, and that in turn maps a
+virtual block to a physical block.
+
+Values stored in the btrees can have arbitrary size.  Keys are always
+64 bits, although nesting allows you to use multiple keys.
diff --git a/Documentation/device-mapper/persistent-data.txt b/Documentation/device-mapper/persistent-data.txt
deleted file mode 100644
index a333bcb3a6c2..000000000000
--- a/Documentation/device-mapper/persistent-data.txt
+++ /dev/null
@@ -1,84 +0,0 @@
-Introduction
-============
-
-The more-sophisticated device-mapper targets require complex metadata
-that is managed in kernel.  In late 2010 we were seeing that various
-different targets were rolling their own data structures, for example:
-
-- Mikulas Patocka's multisnap implementation
-- Heinz Mauelshagen's thin provisioning target
-- Another btree-based caching target posted to dm-devel
-- Another multi-snapshot target based on a design of Daniel Phillips
-
-Maintaining these data structures takes a lot of work, so if possible
-we'd like to reduce the number.
-
-The persistent-data library is an attempt to provide a re-usable
-framework for people who want to store metadata in device-mapper
-targets.  It's currently used by the thin-provisioning target and an
-upcoming hierarchical storage target.
-
-Overview
-========
-
-The main documentation is in the header files which can all be found
-under drivers/md/persistent-data.
-
-The block manager
------------------
-
-dm-block-manager.[hc]
-
-This provides access to the data on disk in fixed sized-blocks.  There
-is a read/write locking interface to prevent concurrent accesses, and
-keep data that is being used in the cache.
-
-Clients of persistent-data are unlikely to use this directly.
-
-The transaction manager
------------------------
-
-dm-transaction-manager.[hc]
-
-This restricts access to blocks and enforces copy-on-write semantics.
-The only way you can get hold of a writable block through the
-transaction manager is by shadowing an existing block (ie. doing
-copy-on-write) or allocating a fresh one.  Shadowing is elided within
-the same transaction so performance is reasonable.  The commit method
-ensures that all data is flushed before it writes the superblock.
-On power failure your metadata will be as it was when last committed.
-
-The Space Maps
---------------
-
-dm-space-map.h
-dm-space-map-metadata.[hc]
-dm-space-map-disk.[hc]
-
-On-disk data structures that keep track of reference counts of blocks.
-Also acts as the allocator of new blocks.  Currently two
-implementations: a simpler one for managing blocks on a different
-device (eg. thinly-provisioned data blocks); and one for managing
-the metadata space.  The latter is complicated by the need to store
-its own data within the space it's managing.
-
-The data structures
--------------------
-
-dm-btree.[hc]
-dm-btree-remove.c
-dm-btree-spine.c
-dm-btree-internal.h
-
-Currently there is only one data structure, a hierarchical btree.
-There are plans to add more.  For example, something with an
-array-like interface would see a lot of use.
-
-The btree is 'hierarchical' in that you can define it to be composed
-of nested btrees, and take multiple keys.  For example, the
-thin-provisioning target uses a btree with two levels of nesting.
-The first maps a device id to a mapping tree, and that in turn maps a
-virtual block to a physical block.
-
-Values stored in the btrees can have arbitrary size.  Keys are always
-64bits, although nesting allows you to use multiple keys.
diff --git a/Documentation/device-mapper/snapshot.rst b/Documentation/device-mapper/snapshot.rst
new file mode 100644
index 000000000000..4c53304e72f1
--- /dev/null
+++ b/Documentation/device-mapper/snapshot.rst
@@ -0,0 +1,180 @@
+==============================
+Device-mapper snapshot support
+==============================
+
+Device-mapper allows you, without massive data copying:
+
+-  To create snapshots of any block device, i.e. mountable, saved states of
+   the block device which are also writable without interfering with the
+   original content;
+-  To create device "forks", i.e. multiple different versions of the
+   same data stream;
+-  To merge a snapshot of a block device back into the snapshot's origin
+   device.
+
+In the first two cases, dm copies only the chunks of data that get
+changed and uses a separate copy-on-write (COW) block device for
+storage.
+
+For snapshot merge the contents of the COW storage are merged back into
+the origin device.
+
+
+There are three dm targets available:
+snapshot, snapshot-origin, and snapshot-merge.
+
+-  snapshot-origin <origin>
+
+which will normally have one or more snapshots based on it.
+Reads will be mapped directly to the backing device. For each write, the
+original data will be saved in the <COW device> of each snapshot to keep
+its visible content unchanged, at least until the <COW device> fills up.
+
+
+-  snapshot <origin> <COW device> <persistent?> <chunksize>
+
+A snapshot of the <origin> block device is created. Changed chunks of
+<chunksize> sectors will be stored on the <COW device>.  Writes will
+only go to the <COW device>.  Reads will come from the <COW device> or
+from <origin> for unchanged data.  <COW device> will often be
+smaller than the origin and if it fills up the snapshot will become
+useless and be disabled, returning errors.  So it is important to monitor
+the amount of free space and expand the <COW device> before it fills up.
+
+<persistent?> is P (Persistent) or N (Not persistent - will not survive
+after reboot).  O (Overflow) can be added as a persistent store option
+to allow userspace to advertise its support for seeing "Overflow" in the
+snapshot status.  So supported store types are "P", "PO" and "N".
+
+The difference between persistent and transient is that with transient
+snapshots less metadata must be saved on disk; it can be kept in
+memory by the kernel.
+
+When loading or unloading the snapshot target, the corresponding
+snapshot-origin or snapshot-merge target must be suspended. A failure to
+suspend the origin target could result in data corruption.
+
+
+-  snapshot-merge <origin> <COW device> <persistent> <chunksize>
+
+takes the same table arguments as the snapshot target except it only
+works with persistent snapshots.  This target assumes the role of the
+"snapshot-origin" target and must not be loaded if the "snapshot-origin"
+is still present for <origin>.
+
+Creates a merging snapshot that takes control of the changed chunks
+stored in the <COW device> of an existing snapshot, through a handover
+procedure, and merges these chunks back into the <origin>.  Once merging
+has started (in the background) the <origin> may be opened and the merge
+will continue while I/O is flowing to it.  Changes to the <origin> are
+deferred until the merging snapshot's corresponding chunk(s) have been
+merged.  Once merging has started the snapshot device, associated with
+the "snapshot" target, will return -EIO when accessed.
+
+
+How snapshot is used by LVM2
+============================
+When you create the first LVM2 snapshot of a volume, four dm devices are used:
+
+1) a device containing the original mapping table of the source volume;
+2) a device used as the <COW device>;
+3) a "snapshot" device, combining #1 and #2, which is the visible snapshot
+   volume;
+4) the "original" volume (which uses the device number used by the original
+   source volume), whose table is replaced by a "snapshot-origin" mapping
+   from device #1.
+
+A fixed naming scheme is used, so with the following commands::
+
+  lvcreate -L 1G -n base volumeGroup
+  lvcreate -L 100M --snapshot -n snap volumeGroup/base
+
+we'll have this situation (with volumes in above order)::
+
+  # dmsetup table|grep volumeGroup
+
+  volumeGroup-base-real: 0 2097152 linear 8:19 384
+  volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
+  volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
+  volumeGroup-base: 0 2097152 snapshot-origin 254:11
+
+  # ls -lL /dev/mapper/volumeGroup-*
+  brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
+  brw-------  1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
+  brw-------  1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
+  brw-------  1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
+
+
+How snapshot-merge is used by LVM2
+==================================
+A merging snapshot assumes the role of the "snapshot-origin" while
+merging.  As such the "snapshot-origin" is replaced with
+"snapshot-merge".  The "-real" device is not changed and the "-cow"
+device is renamed to <origin name>-cow to aid LVM2's cleanup of the
+merging snapshot after it completes.  The "snapshot" that hands over its
+COW device to the "snapshot-merge" is deactivated (unless using lvchange
+--refresh); but if it is left active it will simply return I/O errors.
+
+A snapshot will merge into its origin with the following command::
+
+  lvconvert --merge volumeGroup/snap
+
+we'll now have this situation::
+
+  # dmsetup table|grep volumeGroup
+
+  volumeGroup-base-real: 0 2097152 linear 8:19 384
+  volumeGroup-base-cow: 0 204800 linear 8:19 2097536
+  volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16
+
+  # ls -lL /dev/mapper/volumeGroup-*
+  brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
+  brw-------  1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
+  brw-------  1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
+
+
+How to determine when a merge is complete
+=========================================
+The snapshot-merge and snapshot status lines end with::
+
+  <sectors_allocated>/<total_sectors> <metadata_sectors>
+
+Both <sectors_allocated> and <total_sectors> include both data and metadata.
+During merging, the number of sectors allocated gets smaller and
+smaller.  Merging has finished when the number of sectors holding data
+is zero, in other words <sectors_allocated> == <metadata_sectors>.
+
+Here is a practical example (using a hybrid of lvm and dmsetup commands)::
+
+  # lvs
+    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
+    base    volumeGroup owi-a- 4.00g
+    snap    volumeGroup swi-a- 1.00g base  18.97
+
+  # dmsetup status volumeGroup-snap
+  0 8388608 snapshot 397896/2097152 1560
+                                    ^^^^ metadata sectors
+
+  # lvconvert --merge -b volumeGroup/snap
+    Merging of volume snap started.
+
+  # lvs volumeGroup/snap
+    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
+    base    volumeGroup Owi-a- 4.00g          17.23
+
+  # dmsetup status volumeGroup-base
+  0 8388608 snapshot-merge 281688/2097152 1104
+
+  # dmsetup status volumeGroup-base
+  0 8388608 snapshot-merge 180480/2097152 712
+
+  # dmsetup status volumeGroup-base
+  0 8388608 snapshot-merge 16/2097152 16
+
+Merging has finished.
+
+::
+
+  # lvs
+    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
+    base    volumeGroup owi-a- 4.00g
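Note on the snapshot.rst hunk above: the completion rule it states
(<sectors_allocated> == <metadata_sectors>) can be checked mechanically. The
following is only a minimal sketch, not part of the patch; the helper name
merge_finished is hypothetical, and it assumes the status line ends with the
"<sectors_allocated>/<total_sectors> <metadata_sectors>" fields described in
the document. Lines that do not end with those counters are reported as
unparseable rather than guessed at.

```shell
# merge_finished STATUS_LINE   (hypothetical helper, not an LVM2/dmsetup command)
# Returns 0 when <sectors_allocated> == <metadata_sectors>, 1 while data
# sectors are still allocated, and 2 when the line has no usage counters.
merge_finished() {
    # Split the status line into positional parameters on whitespace.
    set -- $1
    # Keep only the last two fields.
    while [ "$#" -gt 2 ]; do
        shift
    done
    usage=$1        # "<sectors_allocated>/<total_sectors>"
    metadata=$2     # "<metadata_sectors>"
    case "$usage" in
        */*) ;;
        *) return 2 ;;
    esac
    allocated=${usage%%/*}
    [ "$allocated" -eq "$metadata" ]
}

# Sample status lines taken from the example in the document.
merge_finished "0 8388608 snapshot-merge 281688/2097152 1104" || echo "still merging"
merge_finished "0 8388608 snapshot-merge 16/2097152 16" && echo "merge finished"
```

On a live system the argument would come from something like
"$(dmsetup status volumeGroup-base)", polled until the function succeeds.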
diff --git a/Documentation/device-mapper/snapshot.txt b/Documentation/device-mapper/snapshot.txt
deleted file mode 100644
index b8bbb516f989..000000000000
--- a/Documentation/device-mapper/snapshot.txt
+++ /dev/null
@@ -1,176 +0,0 @@
-Device-mapper snapshot support
-==============================
-
-Device-mapper allows you, without massive data copying:
-
-*) To create snapshots of any block device i.e. mountable, saved states of
-the block device which are also writable without interfering with the
-original content;
-*) To create device "forks", i.e. multiple different versions of the
-same data stream.
-*) To merge a snapshot of a block device back into the snapshot's origin
-device.
-
-In the first two cases, dm copies only the chunks of data that get
-changed and uses a separate copy-on-write (COW) block device for
-storage.
-
-For snapshot merge the contents of the COW storage are merged back into
-the origin device.
-
-
-There are three dm targets available:
-snapshot, snapshot-origin, and snapshot-merge.
-
-*) snapshot-origin <origin>
-
-which will normally have one or more snapshots based on it.
-Reads will be mapped directly to the backing device. For each write, the
-original data will be saved in the <COW device> of each snapshot to keep
-its visible content unchanged, at least until the <COW device> fills up.
-
-
-*) snapshot <origin> <COW device> <persistent?> <chunksize>
-
-A snapshot of the <origin> block device is created. Changed chunks of
-<chunksize> sectors will be stored on the <COW device>.  Writes will
-only go to the <COW device>.  Reads will come from the <COW device> or
-from <origin> for unchanged data.  <COW device> will often be
-smaller than the origin and if it fills up the snapshot will become
-useless and be disabled, returning errors.  So it is important to monitor
-the amount of free space and expand the <COW device> before it fills up.
-
-<persistent?> is P (Persistent) or N (Not persistent - will not survive
-after reboot).  O (Overflow) can be added as a persistent store option
-to allow userspace to advertise its support for seeing "Overflow" in the
-snapshot status.  So supported store types are "P", "PO" and "N".
-
-The difference between persistent and transient is with transient
-snapshots less metadata must be saved on disk - they can be kept in
-memory by the kernel.
-
-When loading or unloading the snapshot target, the corresponding
-snapshot-origin or snapshot-merge target must be suspended. A failure to
-suspend the origin target could result in data corruption.
-
-
-* snapshot-merge <origin> <COW device> <persistent> <chunksize>
-
-takes the same table arguments as the snapshot target except it only
-works with persistent snapshots.  This target assumes the role of the
-"snapshot-origin" target and must not be loaded if the "snapshot-origin"
-is still present for <origin>.
-
-Creates a merging snapshot that takes control of the changed chunks
-stored in the <COW device> of an existing snapshot, through a handover
-procedure, and merges these chunks back into the <origin>.  Once merging
-has started (in the background) the <origin> may be opened and the merge
-will continue while I/O is flowing to it.  Changes to the <origin> are
-deferred until the merging snapshot's corresponding chunk(s) have been
-merged.  Once merging has started the snapshot device, associated with
-the "snapshot" target, will return -EIO when accessed.
-
-
-How snapshot is used by LVM2
-============================
-When you create the first LVM2 snapshot of a volume, four dm devices are used:
-
-1) a device containing the original mapping table of the source volume;
-2) a device used as the <COW device>;
-3) a "snapshot" device, combining #1 and #2, which is the visible snapshot
-   volume;
-4) the "original" volume (which uses the device number used by the original
-   source volume), whose table is replaced by a "snapshot-origin" mapping
-   from device #1.
-
-A fixed naming scheme is used, so with the following commands:
-
-lvcreate -L 1G -n base volumeGroup
-lvcreate -L 100M --snapshot -n snap volumeGroup/base
-
-we'll have this situation (with volumes in above order):
-
-# dmsetup table|grep volumeGroup
-
-volumeGroup-base-real: 0 2097152 linear 8:19 384
-volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
-volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
-volumeGroup-base: 0 2097152 snapshot-origin 254:11
-
-# ls -lL /dev/mapper/volumeGroup-*
-brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
-brw-------  1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
-brw-------  1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
-brw-------  1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
-
-
-How snapshot-merge is used by LVM2
-==================================
-A merging snapshot assumes the role of the "snapshot-origin" while
-merging.  As such the "snapshot-origin" is replaced with
-"snapshot-merge".  The "-real" device is not changed and the "-cow"
-device is renamed to <origin name>-cow to aid LVM2's cleanup of the
-merging snapshot after it completes.  The "snapshot" that hands over its
-COW device to the "snapshot-merge" is deactivated (unless using lvchange
---refresh); but if it is left active it will simply return I/O errors.
-
-A snapshot will merge into its origin with the following command:
-
-lvconvert --merge volumeGroup/snap
-
-we'll now have this situation:
-
-# dmsetup table|grep volumeGroup
-
-volumeGroup-base-real: 0 2097152 linear 8:19 384
-volumeGroup-base-cow: 0 204800 linear 8:19 2097536
-volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16
-
-# ls -lL /dev/mapper/volumeGroup-*
-brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
-brw-------  1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
-brw-------  1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
-
-
-How to determine when a merging is complete
-===========================================
-The snapshot-merge and snapshot status lines end with:
-  <sectors_allocated>/<total_sectors> <metadata_sectors>
-
-Both <sectors_allocated> and <total_sectors> include both data and metadata.
-During merging, the number of sectors allocated gets smaller and
-smaller.  Merging has finished when the number of sectors holding data
-is zero, in other words <sectors_allocated> == <metadata_sectors>.
-
-Here is a practical example (using a hybrid of lvm and dmsetup commands):
-
-# lvs
-  LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
-  base    volumeGroup owi-a- 4.00g
-  snap    volumeGroup swi-a- 1.00g base  18.97
-
-# dmsetup status volumeGroup-snap
-0 8388608 snapshot 397896/2097152 1560
-                                  ^^^^ metadata sectors
-
-# lvconvert --merge -b volumeGroup/snap
-  Merging of volume snap started.
-
-# lvs volumeGroup/snap
-  LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
-  base    volumeGroup Owi-a- 4.00g          17.23
-
-# dmsetup status volumeGroup-base
-0 8388608 snapshot-merge 281688/2097152 1104
-
-# dmsetup status volumeGroup-base
-0 8388608 snapshot-merge 180480/2097152 712
-
-# dmsetup status volumeGroup-base
-0 8388608 snapshot-merge 16/2097152 16
-
-Merging has finished.
-
-# lvs
-  LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
-  base    volumeGroup owi-a- 4.00g
diff --git a/Documentation/device-mapper/statistics.rst b/Documentation/device-mapper/statistics.rst
new file mode 100644
index 000000000000..3d80a9f850cc
--- /dev/null
+++ b/Documentation/device-mapper/statistics.rst
@@ -0,0 +1,225 @@
+=============
+DM statistics
+=============
+
+Device Mapper supports the collection of I/O statistics on user-defined
+regions of a DM device.  If no regions are defined, no statistics are
+collected, so there is no performance impact.  Only bio-based DM
+devices are currently supported.
+
+Each user-defined region specifies a starting sector, length and step.
+Individual statistics will be collected for each step-sized area within
+the range specified.
+
+The I/O statistics counters for each step-sized area of a region are
+in the same format as ``/sys/block/*/stat`` or ``/proc/diskstats`` (see:
+Documentation/iostats.txt).  In addition, two extra counters (12 and 13)
+are provided: total time spent reading and writing.  When the histogram
+argument is used, a 14th parameter is reported that represents the
+histogram of latencies.  All these counters may be accessed by sending
+the @stats_print message to the appropriate DM device via dmsetup.
+
+The reported times are in milliseconds and the granularity depends on
+the kernel ticks.  When the option precise_timestamps is used, the
+reported times are in nanoseconds.
+
+Each region has a corresponding unique identifier, which we call a
+region_id, that is assigned when the region is created.  The region_id
+must be supplied when querying statistics about the region, deleting the
+region, etc.  Unique region_ids enable multiple userspace programs to
+request and process statistics for the same DM device without stepping
+on each other's data.
+
+The creation of DM statistics will allocate memory via kmalloc or
+fall back to using vmalloc space.  At most, 1/4 of the overall system
+memory may be allocated by DM statistics.  The admin can see how much
+memory is used by reading:
+
+	/sys/module/dm_mod/parameters/stats_current_allocated_bytes
+
+Messages
+========
+
+    @stats_create <range> <step> [<number_of_optional_arguments> <optional_arguments>...] [<program_id> [<aux_data>]]
+	Create a new region and return the region_id.
+
+	<range>
+	  "-"
+		whole device
+	  "<start_sector>+<length>"
+		a range of <length> 512-byte sectors
+		starting with <start_sector>.
+
+	<step>
+	  "<area_size>"
+		the range is subdivided into areas each containing
+		<area_size> sectors.
+	  "/<number_of_areas>"
+		the range is subdivided into the specified
+		number of areas.
+
+	<number_of_optional_arguments>
+	  The number of optional arguments
+
+	<optional_arguments>
+	  The following optional arguments are supported:
+
+	  precise_timestamps
+		use precise timer with nanosecond resolution
+		instead of the "jiffies" variable.  When this argument is
+		used, the resulting times are in nanoseconds instead of
+		milliseconds.  Precise timestamps are a little bit slower
+		to obtain than jiffies-based timestamps.
+	  histogram:n1,n2,n3,n4,...
+		collect histogram of latencies.  The
+		numbers n1, n2, etc are times that represent the boundaries
+		of the histogram.  If precise_timestamps is not used, the
+		times are in milliseconds, otherwise they are in
+		nanoseconds.  For each range, the kernel will report the
+		number of requests that completed within this range. For
+		example, if we use "histogram:10,20,30", the kernel will
+		report four numbers a:b:c:d. a is the number of requests
+		that took 0-10 ms to complete, b is the number of requests
+		that took 10-20 ms to complete, c is the number of requests
+		that took 20-30 ms to complete and d is the number of
+		requests that took more than 30 ms to complete.
+
+	<program_id>
+	  An optional parameter.  A name that uniquely identifies
+	  the userspace owner of the range.  This groups ranges together
+	  so that userspace programs can identify the ranges they
+	  created and ignore those created by others.
+	  The kernel returns this string back in the output of
+	  @stats_list message, but it doesn't use it for anything else.
+	  If we omit the number of optional arguments, program id must not
+	  be a number, otherwise it would be interpreted as the number of
+	  optional arguments.
+
+	<aux_data>
+	  An optional parameter.  A word that provides auxiliary data
+	  that is useful to the client program that created the range.
+	  The kernel returns this string back in the output of
+	  @stats_list message, but it doesn't use this value for anything.
+
+    @stats_delete <region_id>
+	Delete the region with the specified id.
+
+	<region_id>
+	  region_id returned from @stats_create
+
+    @stats_clear <region_id>
+	Clear all the counters except the in-flight I/O counters.
+
+	<region_id>
+	  region_id returned from @stats_create
+
+    @stats_list [<program_id>]
+	List all regions registered with @stats_create.
+
+	<program_id>
+	  An optional parameter.
+	  If this parameter is specified, only matching regions
+	  are returned.
+	  If it is not specified, all regions are returned.
+
+	Output format:
+	  <region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
+	        precise_timestamps histogram:n1,n2,n3,...
+
+	The strings "precise_timestamps" and "histogram" are printed only
+	if they were specified when creating the region.
+
+    @stats_print <region_id> [<starting_line> <number_of_lines>]
+	Print counters for each step-sized area of a region.
+
+	<region_id>
+	  region_id returned from @stats_create
+
+	<starting_line>
+	  The index of the starting line in the output.
+	  If omitted, all lines are returned.
+
+	<number_of_lines>
+	  The number of lines to include in the output.
+	  If omitted, all lines are returned.
+
+	Output format for each step-sized area of a region:
+
+	  <start_sector>+<length>
+		counters
+
+	  The first 11 counters have the same meaning as
+	  ``/sys/block/*/stat`` or ``/proc/diskstats``.
+
+	  Please refer to Documentation/iostats.txt for details.
+
+	  1. the number of reads completed
+	  2. the number of reads merged
+	  3. the number of sectors read
+	  4. the number of milliseconds spent reading
+	  5. the number of writes completed
+	  6. the number of writes merged
+	  7. the number of sectors written
+	  8. the number of milliseconds spent writing
+	  9. the number of I/Os currently in progress
+	  10. the number of milliseconds spent doing I/Os
+	  11. the weighted number of milliseconds spent doing I/Os
+
+	  Additional counters:
+
+	  12. the total time spent reading in milliseconds
+	  13. the total time spent writing in milliseconds
+
+    @stats_print_clear <region_id> [<starting_line> <number_of_lines>]
+	Atomically print and then clear all the counters except the
+	in-flight I/O counters.  Useful when the client consuming the
+	statistics does not want to lose any statistics (those updated
+	between printing and clearing).
+
+	<region_id>
+	  region_id returned from @stats_create
+
+	<starting_line>
+	  The index of the starting line in the output.
+	  If omitted, all lines are printed and then cleared.
+
+	<number_of_lines>
+	  The number of lines to process.
+	  If omitted, all lines are printed and then cleared.
+
+    @stats_set_aux <region_id> <aux_data>
+	Store auxiliary data aux_data for the specified region.
+
+	<region_id>
+	  region_id returned from @stats_create
+
+	<aux_data>
+	  The string that identifies data which is useful to the client
+	  program that created the range.  The kernel returns this
+	  string back in the output of @stats_list message, but it
+	  doesn't use this value for anything.
+
+Examples
+========
+
+Subdivide the DM device 'vol' into 100 pieces and start collecting
+statistics on them::
+
+  dmsetup message vol 0 @stats_create - /100
+
+Set the auxiliary data string to "foo bar baz" (the escape for each
+space must also be escaped, otherwise the shell will consume them)::
+
+  dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz
+
+List the statistics::
+
+  dmsetup message vol 0 @stats_list
+
+Print the statistics::
+
+  dmsetup message vol 0 @stats_print 0
+
+Delete the statistics::
+
+  dmsetup message vol 0 @stats_delete 0
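Note on the statistics.rst hunk above: the "histogram:10,20,30" description
maps each request latency to one of the a:b:c:d fields. The sketch below only
illustrates that mapping; it is not part of the patch. The helper name
histogram_bucket is hypothetical, the boundaries are assumed to be in
ascending order, and the treatment of a latency exactly equal to a boundary
(counted in the higher bucket here) is an assumption, since the text does not
say which side the boundary belongs to.

```shell
# histogram_bucket BOUNDARIES LATENCY   (hypothetical helper)
# Prints the zero-based index of the a:b:c:d field that LATENCY falls into.
# ASSUMPTION: a latency equal to a boundary is counted in the higher bucket.
histogram_bucket() {
    boundaries=$1
    latency=$2
    bucket=0
    old_ifs=$IFS
    IFS=,
    # Count how many boundaries the latency has reached.
    for bound in $boundaries; do
        if [ "$latency" -ge "$bound" ]; then
            bucket=$((bucket + 1))
        fi
    done
    IFS=$old_ifs
    echo "$bucket"
}

histogram_bucket 10,20,30 5     # field a (0-10 ms): prints 0
histogram_bucket 10,20,30 25    # field c (20-30 ms): prints 2
histogram_bucket 10,20,30 45    # field d (more than 30 ms): prints 3
```

With precise_timestamps the same arithmetic applies, but boundaries and
latencies are in nanoseconds instead of milliseconds.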
diff --git a/Documentation/device-mapper/statistics.txt b/Documentation/device-mapper/statistics.txt
deleted file mode 100644
index 170ac02a1f50..000000000000
--- a/Documentation/device-mapper/statistics.txt
+++ /dev/null
@@ -1,223 +0,0 @@
-DM statistics
-=============
-
-Device Mapper supports the collection of I/O statistics on user-defined
-regions of a DM device.	 If no regions are defined no statistics are
-collected so there isn't any performance impact.  Only bio-based DM
-devices are currently supported.
-
-Each user-defined region specifies a starting sector, length and step.
-Individual statistics will be collected for each step-sized area within
-the range specified.
-
-The I/O statistics counters for each step-sized area of a region are
-in the same format as /sys/block/*/stat or /proc/diskstats (see:
-Documentation/iostats.txt).  But two extra counters (12 and 13) are
-provided: total time spent reading and writing.  When the histogram
-argument is used, the 14th parameter is reported that represents the
-histogram of latencies.  All these counters may be accessed by sending
-the @stats_print message to the appropriate DM device via dmsetup.
-
-The reported times are in milliseconds and the granularity depends on
-the kernel ticks.  When the option precise_timestamps is used, the
-reported times are in nanoseconds.
-
-Each region has a corresponding unique identifier, which we call a
-region_id, that is assigned when the region is created.	 The region_id
-must be supplied when querying statistics about the region, deleting the
-region, etc.  Unique region_ids enable multiple userspace programs to
-request and process statistics for the same DM device without stepping
-on each other's data.
-
-The creation of DM statistics will allocate memory via kmalloc or
-fallback to using vmalloc space.  At most, 1/4 of the overall system
-memory may be allocated by DM statistics.  The admin can see how much
-memory is used by reading
-/sys/module/dm_mod/parameters/stats_current_allocated_bytes
-
-Messages
-========
-
-    @stats_create <range> <step>
-		[<number_of_optional_arguments> <optional_arguments>...]
-		[<program_id> [<aux_data>]]
-
-	Create a new region and return the region_id.
-
-	<range>
-	  "-" - whole device
-	  "<start_sector>+<length>" - a range of <length> 512-byte sectors
-				      starting with <start_sector>.
-
-	<step>
-	  "<area_size>" - the range is subdivided into areas each containing
-			  <area_size> sectors.
-	  "/<number_of_areas>" - the range is subdivided into the specified
-				 number of areas.
-
-	<number_of_optional_arguments>
-	  The number of optional arguments
-
-	<optional_arguments>
-	  The following optional arguments are supported
-	  precise_timestamps - use precise timer with nanosecond resolution
-		instead of the "jiffies" variable.  When this argument is
-		used, the resulting times are in nanoseconds instead of
-		milliseconds.  Precise timestamps are a little bit slower
-		to obtain than jiffies-based timestamps.
-	  histogram:n1,n2,n3,n4,... - collect histogram of latencies.  The
-		numbers n1, n2, etc are times that represent the boundaries
-		of the histogram.  If precise_timestamps is not used, the
-		times are in milliseconds, otherwise they are in
-		nanoseconds.  For each range, the kernel will report the
-		number of requests that completed within this range. For
-		example, if we use "histogram:10,20,30", the kernel will
-		report four numbers a:b:c:d. a is the number of requests
-		that took 0-10 ms to complete, b is the number of requests
-		that took 10-20 ms to complete, c is the number of requests
-		that took 20-30 ms to complete and d is the number of
-		requests that took more than 30 ms to complete.
-
-	<program_id>
-	  An optional parameter.  A name that uniquely identifies
-	  the userspace owner of the range.  This groups ranges together
-	  so that userspace programs can identify the ranges they
-	  created and ignore those created by others.
-	  The kernel returns this string back in the output of
-	  @stats_list message, but it doesn't use it for anything else.
-	  If we omit the number of optional arguments, program id must not
-	  be a number, otherwise it would be interpreted as the number of
-	  optional arguments.
-
-	<aux_data>
-	  An optional parameter.  A word that provides auxiliary data
-	  that is useful to the client program that created the range.
-	  The kernel returns this string back in the output of
-	  @stats_list message, but it doesn't use this value for anything.
-
-    @stats_delete <region_id>
-
-	Delete the region with the specified id.
-
-	<region_id>
-	  region_id returned from @stats_create
-
-    @stats_clear <region_id>
-
-	Clear all the counters except the in-flight i/o counters.
-
-	<region_id>
-	  region_id returned from @stats_create
-
-    @stats_list [<program_id>]
-
-	List all regions registered with @stats_create.
-
-	<program_id>
-	  An optional parameter.
-	  If this parameter is specified, only matching regions
-	  are returned.
-	  If it is not specified, all regions are returned.
-
-	Output format:
-	  <region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
-	        precise_timestamps histogram:n1,n2,n3,...
-
-	The strings "precise_timestamps" and "histogram" are printed only
-	if they were specified when creating the region.
-
-    @stats_print <region_id> [<starting_line> <number_of_lines>]
-
-	Print counters for each step-sized area of a region.
-
-	<region_id>
-	  region_id returned from @stats_create
-
-	<starting_line>
-	  The index of the starting line in the output.
-	  If omitted, all lines are returned.
-
-	<number_of_lines>
-	  The number of lines to include in the output.
-	  If omitted, all lines are returned.
-
-	Output format for each step-sized area of a region:
-
-	  <start_sector>+<length> counters
-
-	  The first 11 counters have the same meaning as
-	  /sys/block/*/stat or /proc/diskstats.
-
-	  Please refer to Documentation/iostats.txt for details.
-
-	  1. the number of reads completed
-	  2. the number of reads merged
-	  3. the number of sectors read
-	  4. the number of milliseconds spent reading
-	  5. the number of writes completed
-	  6. the number of writes merged
-	  7. the number of sectors written
-	  8. the number of milliseconds spent writing
-	  9. the number of I/Os currently in progress
-	  10. the number of milliseconds spent doing I/Os
-	  11. the weighted number of milliseconds spent doing I/Os
-
-	  Additional counters:
-	  12. the total time spent reading in milliseconds
-	  13. the total time spent writing in milliseconds
-
-    @stats_print_clear <region_id> [<starting_line> <number_of_lines>]
-
-	Atomically print and then clear all the counters except the
-	in-flight i/o counters.	 Useful when the client consuming the
-	statistics does not want to lose any statistics (those updated
-	between printing and clearing).
-
-	<region_id>
-	  region_id returned from @stats_create
-
-	<starting_line>
-	  The index of the starting line in the output.
-	  If omitted, all lines are printed and then cleared.
-
-	<number_of_lines>
-	  The number of lines to process.
-	  If omitted, all lines are printed and then cleared.
-
-    @stats_set_aux <region_id> <aux_data>
-
-	Store auxiliary data aux_data for the specified region.
-
-	<region_id>
-	  region_id returned from @stats_create
-
-	<aux_data>
-	  The string that identifies data which is useful to the client
-	  program that created the range.  The kernel returns this
-	  string back in the output of @stats_list message, but it
-	  doesn't use this value for anything.
-
-Examples
-========
-
-Subdivide the DM device 'vol' into 100 pieces and start collecting
-statistics on them:
-
-  dmsetup message vol 0 @stats_create - /100
-
-Set the auxiliary data string to "foo bar baz" (the escape for each
-space must also be escaped, otherwise the shell will consume them):
-
-  dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz
-
-List the statistics:
-
-  dmsetup message vol 0 @stats_list
-
-Print the statistics:
-
-  dmsetup message vol 0 @stats_print 0
-
-Delete the statistics:
-
-  dmsetup message vol 0 @stats_delete 0
diff --git a/Documentation/device-mapper/striped.rst b/Documentation/device-mapper/striped.rst
new file mode 100644
index 000000000000..e9a8da192ae1
--- /dev/null
+++ b/Documentation/device-mapper/striped.rst
@@ -0,0 +1,61 @@
+=========
+dm-stripe
+=========
+
+Device-Mapper's "striped" target is used to create a striped (i.e. RAID-0)
+device across one or more underlying devices. Data is written in "chunks",
+with consecutive chunks rotating among the underlying devices. This can
+potentially provide improved I/O throughput by utilizing several physical
+devices in parallel.
+
+Parameters: <num devs> <chunk size> [<dev path> <offset>]+
+    <num devs>:
+	Number of underlying devices.
+    <chunk size>:
+	Size of each chunk of data. Must be at least as
+	large as the system's PAGE_SIZE.
+    <dev path>:
+	Full pathname to the underlying block-device, or a
+	"major:minor" device-number.
+    <offset>:
+	Starting sector within the device.
+
+One or more underlying devices can be specified. The striped device size must
+be a multiple of the chunk size multiplied by the number of underlying devices.
+
+
+Example scripts
+===============
+
+::
+
+  #!/usr/bin/perl -w
+  # Create a striped device across any number of underlying devices. The device
+  # will be called "stripe_dev" and have a chunk-size of 128k.
+
+  my $chunk_size = 128 * 2;    # 128 KiB expressed in 512-byte sectors
+  my $dev_name = "stripe_dev";
+  my $num_devs = @ARGV;
+  my @devs = @ARGV;
+  my ($min_dev_size, $stripe_dev_size, $i, $table);
+
+  if (!$num_devs) {
+          die("Specify at least one device\n");
+  }
+
+  $min_dev_size = `blockdev --getsz $devs[0]`;
+  for ($i = 1; $i < $num_devs; $i++) {
+          my $this_size = `blockdev --getsz $devs[$i]`;
+          $min_dev_size = ($min_dev_size < $this_size) ?
+                          $min_dev_size : $this_size;
+  }
+
+  $stripe_dev_size = $min_dev_size * $num_devs;
+  $stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs);
+
+  $table = "0 $stripe_dev_size striped $num_devs $chunk_size";
+  for ($i = 0; $i < $num_devs; $i++) {
+          $table .= " $devs[$i] 0";
+  }
+
+  `echo $table | dmsetup create $dev_name`;
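Note on the striped.rst hunk above: the size rule ("a multiple of the chunk
size multiplied by the number of underlying devices") is the arithmetic the
Perl script performs. The sketch below restates that arithmetic in shell so it
can be tried without touching any device; it is not part of the patch. The
helper names stripe_dev_size and striped_table are hypothetical, and the
device paths are placeholders.

```shell
# stripe_dev_size MIN_DEV_SIZE NUM_DEVS CHUNK_SIZE   (hypothetical helper)
# All sizes are in 512-byte sectors.  Prints the largest usable size that is
# a multiple of CHUNK_SIZE * NUM_DEVS.
stripe_dev_size() {
    min_dev_size=$1
    num_devs=$2
    chunk_size=$3
    total=$((min_dev_size * num_devs))
    echo $((total - total % (chunk_size * num_devs)))
}

# striped_table SIZE CHUNK_SIZE DEV...   (hypothetical helper)
# Prints a table line in the "striped" format, using offset 0 on each device.
striped_table() {
    size=$1
    chunk_size=$2
    shift 2
    table="0 $size striped $# $chunk_size"
    for dev in "$@"; do
        table="$table $dev 0"
    done
    echo "$table"
}

# Three devices whose smallest member has 1000000 sectors, 128 KiB chunks.
size=$(stripe_dev_size 1000000 3 256)
echo "$size"                                            # prints 2999808
striped_table "$size" 256 /dev/sdx /dev/sdy /dev/sdz    # placeholder paths
```

The printed table line is what the Perl script pipes into
"dmsetup create stripe_dev".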
diff --git a/Documentation/device-mapper/striped.txt b/Documentation/device-mapper/striped.txt
deleted file mode 100644
index 07ec492cceee..000000000000
--- a/Documentation/device-mapper/striped.txt
+++ /dev/null
@@ -1,57 +0,0 @@
-dm-stripe
-=========
-
-Device-Mapper's "striped" target is used to create a striped (i.e. RAID-0)
-device across one or more underlying devices. Data is written in "chunks",
-with consecutive chunks rotating among the underlying devices. This can
-potentially provide improved I/O throughput by utilizing several physical
-devices in parallel.
-
-Parameters: <num devs> <chunk size> [<dev path> <offset>]+
-    <num devs>: Number of underlying devices.
-    <chunk size>: Size of each chunk of data. Must be at least as
-                  large as the system's PAGE_SIZE.
-    <dev path>: Full pathname to the underlying block-device, or a
-                "major:minor" device-number.
-    <offset>: Starting sector within the device.
-
-One or more underlying devices can be specified. The striped device size must
-be a multiple of the chunk size multiplied by the number of underlying devices.
-
-
-Example scripts
-===============
-
-[[
-#!/usr/bin/perl -w
-# Create a striped device across any number of underlying devices. The device
-# will be called "stripe_dev" and have a chunk-size of 128k.
-
-my $chunk_size = 128 * 2;
-my $dev_name = "stripe_dev";
-my $num_devs = @ARGV;
-my @devs = @ARGV;
-my ($min_dev_size, $stripe_dev_size, $i);
-
-if (!$num_devs) {
-        die("Specify at least one device\n");
-}
-
-$min_dev_size = `blockdev --getsz $devs[0]`;
-for ($i = 1; $i < $num_devs; $i++) {
-        my $this_size = `blockdev --getsz $devs[$i]`;
-        $min_dev_size = ($min_dev_size < $this_size) ?
-                        $min_dev_size : $this_size;
-}
-
-$stripe_dev_size = $min_dev_size * $num_devs;
-$stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs);
-
-$table = "0 $stripe_dev_size striped $num_devs $chunk_size";
-for ($i = 0; $i < $num_devs; $i++) {
-        $table .= " $devs[$i] 0";
-}
-
-`echo $table | dmsetup create $dev_name`;
-]]
-
diff --git a/Documentation/device-mapper/switch.rst b/Documentation/device-mapper/switch.rst
new file mode 100644
index 000000000000..7dde06be1a4f
--- /dev/null
+++ b/Documentation/device-mapper/switch.rst
@@ -0,0 +1,141 @@
+=========
+dm-switch
+=========
+
+The device-mapper switch target creates a device that supports an
+arbitrary mapping of fixed-size regions of I/O across a fixed set of
+paths.  The path used for any specific region can be switched
+dynamically by sending the target a message.
+
+It maps I/O to underlying block devices efficiently when there is a large
+number of fixed-sized address regions but there is no simple pattern
+that would allow for a compact representation of the mapping such as
+dm-stripe.
+
+Background
+----------
+
+Dell EqualLogic and some other iSCSI storage arrays use a distributed
+frameless architecture.  In this architecture, the storage group
+consists of a number of distinct storage arrays ("members") each having
+independent controllers, disk storage and network adapters.  When a LUN
+is created it is spread across multiple members.  The details of the
+spreading are hidden from initiators connected to this storage system.
+The storage group exposes a single target discovery portal, no matter
+how many members are being used.  When iSCSI sessions are created, each
+session is connected to an eth port on a single member.  Data to a LUN
+can be sent on any iSCSI session, and if the blocks being accessed are
+stored on another member the I/O will be forwarded as required.  This
+forwarding is invisible to the initiator.  The storage layout is also
+dynamic, and the blocks stored on disk may be moved from member to
+member as needed to balance the load.
+
+This architecture simplifies the management and configuration of both
+the storage group and initiators.  In a multipathing configuration, it
+is possible to set up multiple iSCSI sessions to use multiple network
+interfaces on both the host and target to take advantage of the
+increased network bandwidth.  An initiator could use a simple round
+robin algorithm to send I/O across all paths and let the storage array
+members forward it as necessary, but there is a performance advantage to
+sending data directly to the correct member.
+
+A device-mapper table already lets you map different regions of a
+device onto different targets.  However, in this architecture the LUN is
+spread with an address region size on the order of 10s of MBs, which
+means the resulting table could have more than a million entries and
+consume far too much memory.
+
+Using this device-mapper switch target we can now build a two-layer
+device hierarchy:
+
+    Upper Tier - Determine which array member the I/O should be sent to.
+    Lower Tier - Load balance amongst paths to a particular member.
+
+The lower tier consists of a single dm multipath device for each member.
+Each of these multipath devices contains the set of paths directly to
+the array member in one priority group, and leverages existing path
+selectors to load balance amongst these paths.  We also build a
+non-preferred priority group containing paths to other array members for
+failover reasons.
+
+The upper tier consists of a single dm-switch device.  This device uses
+a bitmap to look up the location of the I/O and choose the appropriate
+lower tier device to route the I/O.  By using a bitmap we are able to
+use 4 bits for each address range in a 16 member group (which is very
+large for us).  This is a much denser representation than the dm table
+b-tree can achieve.
+
+Construction Parameters
+=======================
+
+    <num_paths> <region_size> <num_optional_args> [<optional_args>...] [<dev_path> <offset>]+
+	<num_paths>
+	    The number of paths across which to distribute the I/O.
+
+	<region_size>
+	    The number of 512-byte sectors in a region. Each region can be redirected
+	    to any of the available paths.
+
+	<num_optional_args>
+	    The number of optional arguments. Currently, no optional arguments
+	    are supported and so this must be zero.
+
+	<dev_path>
+	    The block device that represents a specific path to the device.
+
+	<offset>
+	    The offset of the start of data on the specific <dev_path> (in units
+	    of 512-byte sectors). This number is added to the sector number when
+	    forwarding the request to the specific path. Typically it is zero.
+
+Messages
+========
+
+set_region_mappings <index>:<path_nr> [<index>]:<path_nr> [<index>]:<path_nr>...
+
+Modify the region table by specifying which regions are redirected to
+which paths.
+
+<index>
+    The region number (region size was specified in constructor parameters).
+    If index is omitted, the next region (previous index + 1) is used.
+    Expressed in hexadecimal (WITHOUT any prefix like 0x).
+
+<path_nr>
+    The path number in the range 0 ... (<num_paths> - 1).
+    Expressed in hexadecimal (WITHOUT any prefix like 0x).
+
+R<n>,<m>
+    This parameter allows repetitive patterns to be loaded quickly. <n> and <m>
+    are hexadecimal numbers. The last <n> mappings are repeated in the next <m>
+    slots.
+
+Status
+======
+
+No status line is reported.
+
+Example
+=======
+
+Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with
+the same size.
+
+Create a switch device with 64kB region size::
+
+    dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
+	switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"
+
+Set mappings for the first 7 entries to point to devices switch0, switch1,
+switch2, switch0, switch1, switch2, switch1::
+
+    dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1
+
+Set repetitive mapping. This command::
+
+    dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10
+
+is equivalent to::
+
+    dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
+	:1 :2 :1 :2 :1 :2 :1 :2 :1 :2
diff --git a/Documentation/device-mapper/switch.txt b/Documentation/device-mapper/switch.txt
deleted file mode 100644
index 5bd4831db4a8..000000000000
--- a/Documentation/device-mapper/switch.txt
+++ /dev/null
@@ -1,138 +0,0 @@
-dm-switch
-=========
-
-The device-mapper switch target creates a device that supports an
-arbitrary mapping of fixed-size regions of I/O across a fixed set of
-paths.  The path used for any specific region can be switched
-dynamically by sending the target a message.
-
-It maps I/O to underlying block devices efficiently when there is a large
-number of fixed-sized address regions but there is no simple pattern
-that would allow for a compact representation of the mapping such as
-dm-stripe.
-
-Background
-----------
-
-Dell EqualLogic and some other iSCSI storage arrays use a distributed
-frameless architecture.  In this architecture, the storage group
-consists of a number of distinct storage arrays ("members") each having
-independent controllers, disk storage and network adapters.  When a LUN
-is created it is spread across multiple members.  The details of the
-spreading are hidden from initiators connected to this storage system.
-The storage group exposes a single target discovery portal, no matter
-how many members are being used.  When iSCSI sessions are created, each
-session is connected to an eth port on a single member.  Data to a LUN
-can be sent on any iSCSI session, and if the blocks being accessed are
-stored on another member the I/O will be forwarded as required.  This
-forwarding is invisible to the initiator.  The storage layout is also
-dynamic, and the blocks stored on disk may be moved from member to
-member as needed to balance the load.
-
-This architecture simplifies the management and configuration of both
-the storage group and initiators.  In a multipathing configuration, it
-is possible to set up multiple iSCSI sessions to use multiple network
-interfaces on both the host and target to take advantage of the
-increased network bandwidth.  An initiator could use a simple round
-robin algorithm to send I/O across all paths and let the storage array
-members forward it as necessary, but there is a performance advantage to
-sending data directly to the correct member.
-
-A device-mapper table already lets you map different regions of a
-device onto different targets.  However in this architecture the LUN is
-spread with an address region size on the order of 10s of MBs, which
-means the resulting table could have more than a million entries and
-consume far too much memory.
-
-Using this device-mapper switch target we can now build a two-layer
-device hierarchy:
-
-    Upper Tier - Determine which array member the I/O should be sent to.
-    Lower Tier - Load balance amongst paths to a particular member.
-
-The lower tier consists of a single dm multipath device for each member.
-Each of these multipath devices contains the set of paths directly to
-the array member in one priority group, and leverages existing path
-selectors to load balance amongst these paths.  We also build a
-non-preferred priority group containing paths to other array members for
-failover reasons.
-
-The upper tier consists of a single dm-switch device.  This device uses
-a bitmap to look up the location of the I/O and choose the appropriate
-lower tier device to route the I/O.  By using a bitmap we are able to
-use 4 bits for each address range in a 16 member group (which is very
-large for us).  This is a much denser representation than the dm table
-b-tree can achieve.
-
-Construction Parameters
-=======================
-
-    <num_paths> <region_size> <num_optional_args> [<optional_args>...]
-    [<dev_path> <offset>]+
-
-<num_paths>
-    The number of paths across which to distribute the I/O.
-
-<region_size>
-    The number of 512-byte sectors in a region. Each region can be redirected
-    to any of the available paths.
-
-<num_optional_args>
-    The number of optional arguments. Currently, no optional arguments
-    are supported and so this must be zero.
-
-<dev_path>
-    The block device that represents a specific path to the device.
-
-<offset>
-    The offset of the start of data on the specific <dev_path> (in units
-    of 512-byte sectors). This number is added to the sector number when
-    forwarding the request to the specific path. Typically it is zero.
-
-Messages
-========
-
-set_region_mappings <index>:<path_nr> [<index>]:<path_nr> [<index>]:<path_nr>...
-
-Modify the region table by specifying which regions are redirected to
-which paths.
-
-<index>
-    The region number (region size was specified in constructor parameters).
-    If index is omitted, the next region (previous index + 1) is used.
-    Expressed in hexadecimal (WITHOUT any prefix like 0x).
-
-<path_nr>
-    The path number in the range 0 ... (<num_paths> - 1).
-    Expressed in hexadecimal (WITHOUT any prefix like 0x).
-
-R<n>,<m>
-    This parameter allows repetitive patterns to be loaded quickly. <n> and <m>
-    are hexadecimal numbers. The last <n> mappings are repeated in the next <m>
-    slots.
-
-Status
-======
-
-No status line is reported.
-
-Example
-=======
-
-Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with
-the same size.
-
-Create a switch device with 64kB region size:
-    dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
-	switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"
-
-Set mappings for the first 7 entries to point to devices switch0, switch1,
-switch2, switch0, switch1, switch2, switch1:
-    dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1
-
-Set repetitive mapping. This command:
-    dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10
-is equivalent to:
-    dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
-	:1 :2 :1 :2 :1 :2 :1 :2 :1 :2
-
diff --git a/Documentation/device-mapper/thin-provisioning.rst b/Documentation/device-mapper/thin-provisioning.rst
new file mode 100644
index 000000000000..bafebf79da4b
--- /dev/null
+++ b/Documentation/device-mapper/thin-provisioning.rst
@@ -0,0 +1,427 @@
+=================
+Thin provisioning
+=================
+
+Introduction
+============
+
+This document describes a collection of device-mapper targets that
+between them implement thin-provisioning and snapshots.
+
+The main highlight of this implementation, compared to the previous
+implementation of snapshots, is that it allows many virtual devices to
+be stored on the same data volume.  This simplifies administration and
+allows the sharing of data between volumes, thus reducing disk usage.
+
+Another significant feature is support for an arbitrary depth of
+recursive snapshots (snapshots of snapshots of snapshots ...).  The
+previous implementation of snapshots did this by chaining together
+lookup tables, and so performance was O(depth).  This new
+implementation uses a single data structure to avoid this degradation
+with depth.  Fragmentation may still be an issue, however, in some
+scenarios.
+
+Metadata is stored on a separate device from data, giving the
+administrator some freedom, for example to:
+
+- Improve metadata resilience by storing metadata on a mirrored volume
+  but data on a non-mirrored one.
+
+- Improve performance by storing the metadata on SSD.
+
+Status
+======
+
+These targets are considered safe for production use.  But different use
+cases will have different performance characteristics, for example due
+to fragmentation of the data volume.
+
+If you find this software is not performing as expected please mail
+dm-devel@redhat.com with details and we'll try our best to improve
+things for you.
+
+Userspace tools for checking and repairing the metadata have been fully
+developed and are available as 'thin_check' and 'thin_repair'.  The name
+of the package that provides these utilities varies by distribution (on
+a Red Hat distribution it is named 'device-mapper-persistent-data').
+
+Cookbook
+========
+
+This section describes some quick recipes for using thin provisioning.
+They use the dmsetup program to control the device-mapper driver
+directly.  End users will be advised to use a higher-level volume
+manager such as LVM2 once support has been added.
+
+Pool device
+-----------
+
+The pool device ties together the metadata volume and the data volume.
+It maps I/O linearly to the data volume and updates the metadata via
+two mechanisms:
+
+- Function calls from the thin targets
+
+- Device-mapper 'messages' from userspace which control the creation of new
+  virtual devices amongst other things.
+
+Setting up a fresh pool device
+------------------------------
+
+Setting up a pool device requires a valid metadata device, and a
+data device.  If you do not have an existing metadata device you can
+make one by zeroing the first 4k to indicate empty metadata::
+
+    dd if=/dev/zero of=$metadata_dev bs=4096 count=1
+
+The amount of metadata you need will vary according to how many blocks
+are shared between thin devices (i.e. through snapshots).  If you have
+less sharing than average you'll need a larger-than-average metadata device.
+
+As a guide, we suggest you calculate the number of bytes to use in the
+metadata device as 48 * $data_dev_size / $data_block_size but round it up
+to 2MB if the answer is smaller.  If you're creating large numbers of
+snapshots which are recording large amounts of change, you may find you
+need to increase this.
+
+The largest size supported is 16GB.  If the device is larger,
+a warning will be issued and the excess space will not be used.
+
+Reloading a pool table
+----------------------
+
+You may reload a pool's table; indeed, this is how the pool is resized
+if it runs out of space.  (N.B. While specifying a different metadata
+device when reloading is not forbidden at the moment, things will go
+wrong if it does not route I/O to exactly the same on-disk location as
+previously.)
+
+Using an existing pool device
+-----------------------------
+
+::
+
+    dmsetup create pool \
+	--table "0 20971520 thin-pool $metadata_dev $data_dev \
+		 $data_block_size $low_water_mark"
+
+$data_block_size gives the smallest unit of disk space that can be
+allocated at a time expressed in units of 512-byte sectors.
+$data_block_size must be between 128 (64KB) and 2097152 (1GB) and a
+multiple of 128 (64KB).  $data_block_size cannot be changed after the
+thin-pool is created.  People primarily interested in thin provisioning
+may want to use a value such as 1024 (512KB).  People doing lots of
+snapshotting may want a smaller value such as 128 (64KB).  If you are
+not zeroing newly-allocated data, a larger $data_block_size in the
+region of 256000 (128MB) is suggested.
+
+$low_water_mark is expressed in blocks of size $data_block_size.  If
+free space on the data device drops below this level then a dm event
+will be triggered which a userspace daemon should catch allowing it to
+extend the pool device.  Only one such event will be sent.
+
+No special event is triggered if a just resumed device's free space is below
+the low water mark. However, resuming a device always triggers an
+event; a userspace daemon should verify that free space exceeds the low
+water mark when handling this event.
+
+A low water mark for the metadata device is maintained in the kernel and
+will trigger a dm event if free space on the metadata device drops below
+it.
+
+Updating on-disk metadata
+-------------------------
+
+On-disk metadata is committed every time a FLUSH or FUA bio is written.
+If no such requests are made then commits will occur every second.  This
+means the thin-provisioning target behaves like a physical disk that has
+a volatile write cache.  If power is lost you may lose some recent
+writes.  The metadata should always be consistent in spite of any crash.
+
+If data space is exhausted the pool will either error or queue IO
+according to the configuration (see: error_if_no_space).  If metadata
+space is exhausted or a metadata operation fails, the pool will error IO
+until the pool is taken offline and repair is performed to 1) fix any
+potential inconsistencies and 2) clear the flag that imposes repair.
+Once the pool's metadata device is repaired it may be resized, which
+will allow the pool to return to normal operation.  Note that if a pool
+is flagged as needing repair, the pool's data and metadata devices
+cannot be resized until repair is performed.  It should also be noted
+that when the pool's metadata space is exhausted the current metadata
+transaction is aborted.  Given that the pool will cache IO whose
+completion may have already been acknowledged to upper IO layers
+(e.g. filesystem) it is strongly suggested that consistency checks
+(e.g. fsck) be performed on those layers when repair of the pool is
+required.
+
+Thin provisioning
+-----------------
+
+i) Creating a new thinly-provisioned volume.
+
+  To create a new thinly-provisioned volume you must send a message to an
+  active pool device, /dev/mapper/pool in this example::
+
+    dmsetup message /dev/mapper/pool 0 "create_thin 0"
+
+  Here '0' is an identifier for the volume, a 24-bit number.  It's up
+  to the caller to allocate and manage these identifiers.  If the
+  identifier is already in use, the message will fail with -EEXIST.
+
+ii) Using a thinly-provisioned volume.
+
+  Thinly-provisioned volumes are activated using the 'thin' target::
+
+    dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"
+
+  The last parameter is the identifier for the thinp device.
+
+Internal snapshots
+------------------
+
+i) Creating an internal snapshot.
+
+  Snapshots are created with another message to the pool.
+
+  N.B.  If the origin device that you wish to snapshot is active, you
+  must suspend it before creating the snapshot to avoid corruption.
+  This is NOT enforced at the moment, so please be careful!
+
+  ::
+
+    dmsetup suspend /dev/mapper/thin
+    dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
+    dmsetup resume /dev/mapper/thin
+
+  Here '1' is the identifier for the volume, a 24-bit number.  '0' is the
+  identifier for the origin device.
+
+ii) Using an internal snapshot.
+
+  Once created, the user doesn't have to worry about any connection
+  between the origin and the snapshot.  Indeed the snapshot is no
+  different from any other thinly-provisioned device and can be
+  snapshotted itself via the same method.  It's perfectly legal to
+  have only one of them active, and there's no ordering requirement on
+  activating or removing them both.  (This differs from conventional
+  device-mapper snapshots.)
+
+  Activate it exactly the same way as any other thinly-provisioned volume::
+
+    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"
+
+External snapshots
+------------------
+
+You can use an external **read only** device as an origin for a
+thinly-provisioned volume.  Any read to an unprovisioned area of the
+thin device will be passed through to the origin.  Writes trigger
+the allocation of new blocks as usual.
+
+One use case for this is VM hosts that want to run guests on
+thinly-provisioned volumes but have the base image on another device
+(possibly shared between many VMs).
+
+You must not write to the origin device if you use this technique!
+Of course, you may write to the thin device and take internal snapshots
+of the thin volume.
+
+i) Creating a snapshot of an external device
+
+  This is the same as creating a thin device.
+  You don't mention the origin at this stage.
+
+  ::
+
+    dmsetup message /dev/mapper/pool 0 "create_thin 0"
+
+ii) Using a snapshot of an external device.
+
+  Append an extra parameter to the thin target specifying the origin::
+
+    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"
+
+  N.B. All descendants (internal snapshots) of this snapshot require the
+  same extra origin parameter.
+
+Deactivation
+------------
+
+All devices using a pool must be deactivated before the pool itself
+can be.
+
+::
+
+    dmsetup remove thin
+    dmsetup remove snap
+    dmsetup remove pool
+
+Reference
+=========
+
+'thin-pool' target
+------------------
+
+i) Constructor
+
+    ::
+
+      thin-pool <metadata dev> <data dev> <data block size (sectors)> \
+	        <low water mark (blocks)> [<number of feature args> [<arg>]*]
+
+    Optional feature arguments:
+
+      skip_block_zeroing:
+	Skip the zeroing of newly-provisioned blocks.
+
+      ignore_discard:
+	Disable discard support.
+
+      no_discard_passdown:
+	Don't pass discards down to the underlying
+	data device, but just remove the mapping.
+
+      read_only:
+		 Don't allow any changes to be made to the pool
+		 metadata.  This mode is only available after the
+		 thin-pool has been created and first used in full
+		 read/write mode.  It cannot be specified on initial
+		 thin-pool creation.
+
+      error_if_no_space:
+	Error IOs, instead of queueing, if no space.
+
+    Data block size must be between 64KB (128 sectors) and 1GB
+    (2097152 sectors) inclusive.
+
+
+ii) Status
+
+    ::
+
+      <transaction id> <used metadata blocks>/<total metadata blocks>
+      <used data blocks>/<total data blocks> <held metadata root>
+      ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
+      needs_check|- metadata_low_watermark
+
+    transaction id:
+	A 64-bit number used by userspace to help synchronise with metadata
+	from volume managers.
+
+    used data blocks / total data blocks
+	If the number of free blocks drops below the pool's low water mark a
+	dm event will be sent to userspace.  This event is edge-triggered and
+	it will occur only once after each resume so volume manager writers
+	should register for the event and then check the target's status.
+
+    held metadata root:
+	The location, in blocks, of the metadata root that has been
+	'held' for userspace read access.  '-' indicates there is no
+	held root.
+
+    discard_passdown|no_discard_passdown
+	Whether or not discards are actually being passed down to the
+	underlying device.  When this is enabled when loading the table,
+	it can get disabled if the underlying device doesn't support it.
+
+    ro|rw|out_of_data_space
+	If the pool encounters certain types of device failures it will
+	drop into a read-only metadata mode in which no changes to
+	the pool metadata (like allocating new blocks) are permitted.
+
+	In serious cases where even a read-only mode is deemed unsafe
+	no further I/O will be permitted and the status will just
+	contain the string 'Fail'.  The userspace recovery tools
+	should then be used.
+
+    error_if_no_space|queue_if_no_space
+	If the pool runs out of data or metadata space, the pool will
+	either queue or error the IO destined to the data device.  The
+	default is to queue the IO until more space is added or the
+	'no_space_timeout' expires.  The 'no_space_timeout' dm-thin-pool
+	module parameter can be used to change this timeout -- it
+	defaults to 60 seconds but may be disabled using a value of 0.
+
+    needs_check
+	A metadata operation has failed, resulting in the needs_check
+	flag being set in the metadata's superblock.  The metadata
+	device must be deactivated and checked/repaired before the
+	thin-pool can be made fully operational again.  '-' indicates
+	needs_check is not set.
+
+    metadata_low_watermark:
+	Value of metadata low watermark in blocks.  The kernel sets this
+	value internally but userspace needs to know this value to
+	determine if an event was caused by crossing this threshold.
+
+iii) Messages
+
+    create_thin <dev id>
+	Create a new thinly-provisioned device.
+	<dev id> is an arbitrary unique 24-bit identifier chosen by
+	the caller.
+
+    create_snap <dev id> <origin id>
+	Create a new snapshot of another thinly-provisioned device.
+	<dev id> is an arbitrary unique 24-bit identifier chosen by
+	the caller.
+	<origin id> is the identifier of the thinly-provisioned device
+	of which the new device will be a snapshot.
+
+    delete <dev id>
+	Deletes a thin device.  Irreversible.
+
+    set_transaction_id <current id> <new id>
+	Userland volume managers, such as LVM, need a way to
+	synchronise their external metadata with the internal metadata of the
+	pool target.  The thin-pool target offers to store an
+	arbitrary 64-bit transaction id and return it on the target's
+	status line.  To avoid races you must provide what you think
+	the current transaction id is when you change it with this
+	compare-and-swap message.
+
+    reserve_metadata_snap
+        Reserve a copy of the data mapping btree for use by userland.
+        This allows userland to inspect the mappings as they were when
+        this message was executed.  Use the pool's status command to
+        get the root block associated with the metadata snapshot.
+
+    release_metadata_snap
+        Release a previously reserved copy of the data mapping btree.
+
+'thin' target
+-------------
+
+i) Constructor
+
+    ::
+
+        thin <pool dev> <dev id> [<external origin dev>]
+
+    pool dev:
+	the thin-pool device, e.g. /dev/mapper/my_pool or 253:0
+
+    dev id:
+	the internal device identifier of the device to be
+	activated.
+
+    external origin dev:
+	an optional block device outside the pool to be treated as a
+	read-only snapshot origin: reads to unprovisioned areas of the
+	thin target will be mapped to this device.
+
+The pool doesn't store any size against the thin devices.  If you
+load a thin target that is smaller than you've been using previously,
+then you'll have no access to blocks mapped beyond the end.  If you
+load a target that is bigger than before, then extra blocks will be
+provisioned as and when needed.
+
+ii) Status
+
+    <nr mapped sectors> <highest mapped sector>
+	If the pool has encountered device errors and failed, the status
+	will just contain the string 'Fail'.  The userspace recovery
+	tools should then be used.
+
+    In the case where <nr mapped sectors> is 0, there is no highest
+    mapped sector and the value of <highest mapped sector> is unspecified.
diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
deleted file mode 100644
index 883e7ca5f745..000000000000
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ /dev/null
@@ -1,411 +0,0 @@
-Introduction
-============
-
-This document describes a collection of device-mapper targets that
-between them implement thin-provisioning and snapshots.
-
-The main highlight of this implementation, compared to the previous
-implementation of snapshots, is that it allows many virtual devices to
-be stored on the same data volume.  This simplifies administration and
-allows the sharing of data between volumes, thus reducing disk usage.
-
-Another significant feature is support for an arbitrary depth of
-recursive snapshots (snapshots of snapshots of snapshots ...).  The
-previous implementation of snapshots did this by chaining together
-lookup tables, and so performance was O(depth).  This new
-implementation uses a single data structure to avoid this degradation
-with depth.  Fragmentation may still be an issue, however, in some
-scenarios.
-
-Metadata is stored on a separate device from data, giving the
-administrator some freedom, for example to:
-
-- Improve metadata resilience by storing metadata on a mirrored volume
-  but data on a non-mirrored one.
-
-- Improve performance by storing the metadata on SSD.
-
-Status
-======
-
-These targets are considered safe for production use.  But different use
-cases will have different performance characteristics, for example due
-to fragmentation of the data volume.
-
-If you find this software is not performing as expected please mail
-dm-devel@redhat.com with details and we'll try our best to improve
-things for you.
-
-Userspace tools for checking and repairing the metadata have been fully
-developed and are available as 'thin_check' and 'thin_repair'.  The name
-of the package that provides these utilities varies by distribution (on
-a Red Hat distribution it is named 'device-mapper-persistent-data').
-
-Cookbook
-========
-
-This section describes some quick recipes for using thin provisioning.
-They use the dmsetup program to control the device-mapper driver
-directly.  End users will be advised to use a higher-level volume
-manager such as LVM2 once support has been added.
-
-Pool device
------------
-
-The pool device ties together the metadata volume and the data volume.
-It maps I/O linearly to the data volume and updates the metadata via
-two mechanisms:
-
-- Function calls from the thin targets
-
-- Device-mapper 'messages' from userspace which control the creation of new
-  virtual devices amongst other things.
-
-Setting up a fresh pool device
-------------------------------
-
-Setting up a pool device requires a valid metadata device, and a
-data device.  If you do not have an existing metadata device you can
-make one by zeroing the first 4k to indicate empty metadata.
-
-    dd if=/dev/zero of=$metadata_dev bs=4096 count=1
-
-The amount of metadata you need will vary according to how many blocks
-are shared between thin devices (i.e. through snapshots).  If you have
-less sharing than average you'll need a larger-than-average metadata device.
-
-As a guide, we suggest you calculate the number of bytes to use in the
-metadata device as 48 * $data_dev_size / $data_block_size but round it up
-to 2MB if the answer is smaller.  If you're creating large numbers of
-snapshots which are recording large amounts of change, you may find you
-need to increase this.
-
-The largest size supported is 16GB: If the device is larger,
-a warning will be issued and the excess space will not be used.
-
-Reloading a pool table
-----------------------
-
-You may reload a pool's table, indeed this is how the pool is resized
-if it runs out of space.  (N.B. While specifying a different metadata
-device when reloading is not forbidden at the moment, things will go
-wrong if it does not route I/O to exactly the same on-disk location as
-previously.)
-
-Using an existing pool device
------------------------------
-
-    dmsetup create pool \
-	--table "0 20971520 thin-pool $metadata_dev $data_dev \
-		 $data_block_size $low_water_mark"
-
-$data_block_size gives the smallest unit of disk space that can be
-allocated at a time expressed in units of 512-byte sectors.
-$data_block_size must be between 128 (64KB) and 2097152 (1GB) and a
-multiple of 128 (64KB).  $data_block_size cannot be changed after the
-thin-pool is created.  People primarily interested in thin provisioning
-may want to use a value such as 1024 (512KB).  People doing lots of
-snapshotting may want a smaller value such as 128 (64KB).  If you are
-not zeroing newly-allocated data, a larger $data_block_size in the
-region of 256000 (128MB) is suggested.
-
-$low_water_mark is expressed in blocks of size $data_block_size.  If
-free space on the data device drops below this level then a dm event
-will be triggered which a userspace daemon should catch allowing it to
-extend the pool device.  Only one such event will be sent.
-
-No special event is triggered if a just resumed device's free space is below
-the low water mark. However, resuming a device always triggers an
-event; a userspace daemon should verify that free space exceeds the low
-water mark when handling this event.
-
-A low water mark for the metadata device is maintained in the kernel and
-will trigger a dm event if free space on the metadata device drops below
-it.
-
-Updating on-disk metadata
--------------------------
-
-On-disk metadata is committed every time a FLUSH or FUA bio is written.
-If no such requests are made then commits will occur every second.  This
-means the thin-provisioning target behaves like a physical disk that has
-a volatile write cache.  If power is lost you may lose some recent
-writes.  The metadata should always be consistent in spite of any crash.
-
-If data space is exhausted the pool will either error or queue IO
-according to the configuration (see: error_if_no_space).  If metadata
-space is exhausted or a metadata operation fails: the pool will error IO
-until the pool is taken offline and repair is performed to 1) fix any
-potential inconsistencies and 2) clear the flag that imposes repair.
-Once the pool's metadata device is repaired it may be resized, which
-will allow the pool to return to normal operation.  Note that if a pool
-is flagged as needing repair, the pool's data and metadata devices
-cannot be resized until repair is performed.  It should also be noted
-that when the pool's metadata space is exhausted the current metadata
-transaction is aborted.  Given that the pool will cache IO whose
-completion may have already been acknowledged to upper IO layers
-(e.g. filesystem) it is strongly suggested that consistency checks
-(e.g. fsck) be performed on those layers when repair of the pool is
-required.
-
-Thin provisioning
------------------
-
-i) Creating a new thinly-provisioned volume.
-
-  To create a new thinly- provisioned volume you must send a message to an
-  active pool device, /dev/mapper/pool in this example.
-
-    dmsetup message /dev/mapper/pool 0 "create_thin 0"
-
-  Here '0' is an identifier for the volume, a 24-bit number.  It's up
-  to the caller to allocate and manage these identifiers.  If the
-  identifier is already in use, the message will fail with -EEXIST.
-
-ii) Using a thinly-provisioned volume.
-
-  Thinly-provisioned volumes are activated using the 'thin' target:
-
-    dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"
-
-  The last parameter is the identifier for the thinp device.
-
-Internal snapshots
-------------------
-
-i) Creating an internal snapshot.
-
-  Snapshots are created with another message to the pool.
-
-  N.B.  If the origin device that you wish to snapshot is active, you
-  must suspend it before creating the snapshot to avoid corruption.
-  This is NOT enforced at the moment, so please be careful!
-
-    dmsetup suspend /dev/mapper/thin
-    dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
-    dmsetup resume /dev/mapper/thin
-
-  Here '1' is the identifier for the volume, a 24-bit number.  '0' is the
-  identifier for the origin device.
-
-ii) Using an internal snapshot.
-
-  Once created, the user doesn't have to worry about any connection
-  between the origin and the snapshot.  Indeed the snapshot is no
-  different from any other thinly-provisioned device and can be
-  snapshotted itself via the same method.  It's perfectly legal to
-  have only one of them active, and there's no ordering requirement on
-  activating or removing them both.  (This differs from conventional
-  device-mapper snapshots.)
-
-  Activate it exactly the same way as any other thinly-provisioned volume:
-
-    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"
-
-External snapshots
-------------------
-
-You can use an external _read only_ device as an origin for a
-thinly-provisioned volume.  Any read to an unprovisioned area of the
-thin device will be passed through to the origin.  Writes trigger
-the allocation of new blocks as usual.
-
-One use case for this is VM hosts that want to run guests on
-thinly-provisioned volumes but have the base image on another device
-(possibly shared between many VMs).
-
-You must not write to the origin device if you use this technique!
-Of course, you may write to the thin device and take internal snapshots
-of the thin volume.
-
-i) Creating a snapshot of an external device
-
-  This is the same as creating a thin device.
-  You don't mention the origin at this stage.
-
-    dmsetup message /dev/mapper/pool 0 "create_thin 0"
-
-ii) Using a snapshot of an external device.
-
-  Append an extra parameter to the thin target specifying the origin:
-
-    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"
-
-  N.B. All descendants (internal snapshots) of this snapshot require the
-  same extra origin parameter.
-
-Deactivation
-------------
-
-All devices using a pool must be deactivated before the pool itself
-can be.
-
-    dmsetup remove thin
-    dmsetup remove snap
-    dmsetup remove pool
-
-Reference
-=========
-
-'thin-pool' target
-------------------
-
-i) Constructor
-
-    thin-pool <metadata dev> <data dev> <data block size (sectors)> \
-	      <low water mark (blocks)> [<number of feature args> [<arg>]*]
-
-    Optional feature arguments:
-
-      skip_block_zeroing: Skip the zeroing of newly-provisioned blocks.
-
-      ignore_discard: Disable discard support.
-
-      no_discard_passdown: Don't pass discards down to the underlying
-			   data device, but just remove the mapping.
-
-      read_only: Don't allow any changes to be made to the pool
-		 metadata.  This mode is only available after the
-		 thin-pool has been created and first used in full
-		 read/write mode.  It cannot be specified on initial
-		 thin-pool creation.
-
-      error_if_no_space: Error IOs, instead of queueing, if no space.
-
-    Data block size must be between 64KB (128 sectors) and 1GB
-    (2097152 sectors) inclusive.
-
-
-ii) Status
-
-    <transaction id> <used metadata blocks>/<total metadata blocks>
-    <used data blocks>/<total data blocks> <held metadata root>
-    ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
-    needs_check|- metadata_low_watermark
-
-    transaction id:
-	A 64-bit number used by userspace to help synchronise with metadata
-	from volume managers.
-
-    used data blocks / total data blocks
-	If the number of free blocks drops below the pool's low water mark a
-	dm event will be sent to userspace.  This event is edge-triggered and
-	it will occur only once after each resume so volume manager writers
-	should register for the event and then check the target's status.
-
-    held metadata root:
-	The location, in blocks, of the metadata root that has been
-	'held' for userspace read access.  '-' indicates there is no
-	held root.
-
-    discard_passdown|no_discard_passdown
-	Whether or not discards are actually being passed down to the
-	underlying device.  When this is enabled when loading the table,
-	it can get disabled if the underlying device doesn't support it.
-
-    ro|rw|out_of_data_space
-	If the pool encounters certain types of device failures it will
-	drop into a read-only metadata mode in which no changes to
-	the pool metadata (like allocating new blocks) are permitted.
-
-	In serious cases where even a read-only mode is deemed unsafe
-	no further I/O will be permitted and the status will just
-	contain the string 'Fail'.  The userspace recovery tools
-	should then be used.
-
-    error_if_no_space|queue_if_no_space
-	If the pool runs out of data or metadata space, the pool will
-	either queue or error the IO destined to the data device.  The
-	default is to queue the IO until more space is added or the
-	'no_space_timeout' expires.  The 'no_space_timeout' dm-thin-pool
-	module parameter can be used to change this timeout -- it
-	defaults to 60 seconds but may be disabled using a value of 0.
-
-    needs_check
-	A metadata operation has failed, resulting in the needs_check
-	flag being set in the metadata's superblock.  The metadata
-	device must be deactivated and checked/repaired before the
-	thin-pool can be made fully operational again.  '-' indicates
-	needs_check is not set.
-
-    metadata_low_watermark:
-	Value of metadata low watermark in blocks.  The kernel sets this
-	value internally but userspace needs to know this value to
-	determine if an event was caused by crossing this threshold.
-
-iii) Messages
-
-    create_thin <dev id>
-
-	Create a new thinly-provisioned device.
-	<dev id> is an arbitrary unique 24-bit identifier chosen by
-	the caller.
-
-    create_snap <dev id> <origin id>
-
-	Create a new snapshot of another thinly-provisioned device.
-	<dev id> is an arbitrary unique 24-bit identifier chosen by
-	the caller.
-	<origin id> is the identifier of the thinly-provisioned device
-	of which the new device will be a snapshot.
-
-    delete <dev id>
-
-	Deletes a thin device.  Irreversible.
-
-    set_transaction_id <current id> <new id>
-
-	Userland volume managers, such as LVM, need a way to
-	synchronise their external metadata with the internal metadata of the
-	pool target.  The thin-pool target offers to store an
-	arbitrary 64-bit transaction id and return it on the target's
-	status line.  To avoid races you must provide what you think
-	the current transaction id is when you change it with this
-	compare-and-swap message.
-
-    reserve_metadata_snap
-
-        Reserve a copy of the data mapping btree for use by userland.
-        This allows userland to inspect the mappings as they were when
-        this message was executed.  Use the pool's status command to
-        get the root block associated with the metadata snapshot.
-
-    release_metadata_snap
-
-        Release a previously reserved copy of the data mapping btree.
-
-'thin' target
--------------
-
-i) Constructor
-
-    thin <pool dev> <dev id> [<external origin dev>]
-
-    pool dev:
-	the thin-pool device, e.g. /dev/mapper/my_pool or 253:0
-
-    dev id:
-	the internal device identifier of the device to be
-	activated.
-
-    external origin dev:
-	an optional block device outside the pool to be treated as a
-	read-only snapshot origin: reads to unprovisioned areas of the
-	thin target will be mapped to this device.
-
-The pool doesn't store any size against the thin devices.  If you
-load a thin target that is smaller than you've been using previously,
-then you'll have no access to blocks mapped beyond the end.  If you
-load a target that is bigger than before, then extra blocks will be
-provisioned as and when needed.
-
-ii) Status
-
-     <nr mapped sectors> <highest mapped sector>
-
-	If the pool has encountered device errors and failed, the status
-	will just contain the string 'Fail'.  The userspace recovery
-	tools should then be used.
-
-    In the case where <nr mapped sectors> is 0, there is no highest
-    mapped sector and the value of <highest mapped sector> is unspecified.
diff --git a/Documentation/device-mapper/unstriped.rst b/Documentation/device-mapper/unstriped.rst
new file mode 100644
index 000000000000..0a8d3eb3f072
--- /dev/null
+++ b/Documentation/device-mapper/unstriped.rst
@@ -0,0 +1,135 @@
+================================
+Device-mapper "unstriped" target
+================================
+
+Introduction
+============
+
+The device-mapper "unstriped" target provides a transparent mechanism to
+unstripe a device-mapper "striped" target to access the underlying disks
+without having to touch the true backing block-device.  It can also be
+used to unstripe a hardware RAID-0 to access backing disks.
+
+Parameters:
+<number of stripes> <chunk size> <stripe #> <dev_path> <offset>
+
+<number of stripes>
+        The number of stripes in the RAID 0.
+
+<chunk size>
+	The number of 512B sectors in the chunk striping.
+
+<dev_path>
+	The block device you wish to unstripe.
+
+<stripe #>
+        The stripe number within the device that corresponds to the physical
+        drive you wish to unstripe.  This must be 0-indexed.
+
+
+Why use this module?
+====================
+
+An example of undoing an existing dm-stripe
+-------------------------------------------
+
+This small bash script will set up 4 loop devices and use the existing
+striped target to combine the 4 devices into one.  It then will use
+the unstriped target on top of the striped device to access the
+individual backing loop devices.  We write data to the newly exposed
+unstriped devices and verify the data written matches the correct
+underlying device on the striped array::
+
+  #!/bin/bash
+
+  MEMBER_SIZE=$((128 * 1024 * 1024))
+  NUM=4
+  SEQ_END=$((${NUM}-1))
+  CHUNK=256
+  BS=4096
+
+  RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
+  DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
+  COUNT=$((${MEMBER_SIZE} / ${BS}))
+
+  for i in $(seq 0 ${SEQ_END}); do
+    dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
+    losetup /dev/loop${i} member-${i}
+    DM_PARMS+=" /dev/loop${i} 0"
+  done
+
+  echo $DM_PARMS | dmsetup create raid0
+  for i in $(seq 0 ${SEQ_END}); do
+    echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
+  done;
+
+  for i in $(seq 0 ${SEQ_END}); do
+    dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
+    diff /dev/mapper/set-${i} member-${i}
+  done;
+
+  for i in $(seq 0 ${SEQ_END}); do
+    dmsetup remove set-${i}
+  done
+
+  dmsetup remove raid0
+
+  for i in $(seq 0 ${SEQ_END}); do
+    losetup -d /dev/loop${i}
+    rm -f member-${i}
+  done
+
+Another example
+---------------
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k chunk on each core, resulting
+in a 256k stripe across the two cores::
+
+   Core 0:       Core 1:
+  __________    __________
+  | LBA 512|    | LBA 768|
+  | LBA 0  |    | LBA 256|
+  ----------    ----------
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition.  This is because the partitions
+are striped across the two cores.  When we unstripe this hardware RAID 0
+and make partitions on each newly exposed device, the two partitions are now
+physically separated.
+
+With the dm-unstriped target we're able to segregate an fio script that
+has read and write jobs that are independent of each other.  Compared to
+when we run the test on a combined drive with partitions, we were able
+to get a 92% reduction in read latency using this device mapper target.
+
+
+Example dmsetup usage
+=====================
+
+unstriped on top of Intel NVMe device that has 2 cores
+------------------------------------------------------
+
+::
+
+  dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
+  dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'
+
+There will now be two devices that expose Intel NVMe core 0 and 1
+respectively::
+
+  /dev/mapper/nvmset0
+  /dev/mapper/nvmset1
+
+unstriped on top of striped with 4 drives using 128K chunk size
+---------------------------------------------------------------
+
+::
+
+  dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
+  dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
+  dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
+  dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'
diff --git a/Documentation/device-mapper/unstriped.txt b/Documentation/device-mapper/unstriped.txt
deleted file mode 100644
index 0b2a306c54ee..000000000000
--- a/Documentation/device-mapper/unstriped.txt
+++ /dev/null
@@ -1,124 +0,0 @@
-Introduction
-============
-
-The device-mapper "unstriped" target provides a transparent mechanism to
-unstripe a device-mapper "striped" target to access the underlying disks
-without having to touch the true backing block-device.  It can also be
-used to unstripe a hardware RAID-0 to access backing disks.
-
-Parameters:
-<number of stripes> <chunk size> <stripe #> <dev_path> <offset>
-
-<number of stripes>
-        The number of stripes in the RAID 0.
-
-<chunk size>
-	The amount of 512B sectors in the chunk striping.
-
-<dev_path>
-	The block device you wish to unstripe.
-
-<stripe #>
-        The stripe number within the device that corresponds to physical
-        drive you wish to unstripe.  This must be 0 indexed.
-
-
-Why use this module?
-====================
-
-An example of undoing an existing dm-stripe
--------------------------------------------
-
-This small bash script will setup 4 loop devices and use the existing
-striped target to combine the 4 devices into one.  It then will use
-the unstriped target ontop of the striped device to access the
-individual backing loop devices.  We write data to the newly exposed
-unstriped devices and verify the data written matches the correct
-underlying device on the striped array.
-
-#!/bin/bash
-
-MEMBER_SIZE=$((128 * 1024 * 1024))
-NUM=4
-SEQ_END=$((${NUM}-1))
-CHUNK=256
-BS=4096
-
-RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
-DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
-COUNT=$((${MEMBER_SIZE} / ${BS}))
-
-for i in $(seq 0 ${SEQ_END}); do
-  dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
-  losetup /dev/loop${i} member-${i}
-  DM_PARMS+=" /dev/loop${i} 0"
-done
-
-echo $DM_PARMS | dmsetup create raid0
-for i in $(seq 0 ${SEQ_END}); do
-  echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
-done;
-
-for i in $(seq 0 ${SEQ_END}); do
-  dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
-  diff /dev/mapper/set-${i} member-${i}
-done;
-
-for i in $(seq 0 ${SEQ_END}); do
-  dmsetup remove set-${i}
-done
-
-dmsetup remove raid0
-
-for i in $(seq 0 ${SEQ_END}); do
-  losetup -d /dev/loop${i}
-  rm -f member-${i}
-done
-
-Another example
----------------
-
-Intel NVMe drives contain two cores on the physical device.
-Each core of the drive has segregated access to its LBA range.
-The current LBA model has a RAID 0 128k chunk on each core, resulting
-in a 256k stripe across the two cores:
-
-   Core 0:       Core 1:
-  __________    __________
-  | LBA 512|    | LBA 768|
-  | LBA 0  |    | LBA 256|
-  ----------    ----------
-
-The purpose of this unstriping is to provide better QoS in noisy
-neighbor environments. When two partitions are created on the
-aggregate drive without this unstriping, reads on one partition
-can affect writes on another partition.  This is because the partitions
-are striped across the two cores.  When we unstripe this hardware RAID 0
-and make partitions on each new exposed device the two partitions are now
-physically separated.
-
-With the dm-unstriped target we're able to segregate an fio script that
-has read and write jobs that are independent of each other.  Compared to
-when we run the test on a combined drive with partitions, we were able
-to get a 92% reduction in read latency using this device mapper target.
-
-
-Example dmsetup usage
-=====================
-
-unstriped ontop of Intel NVMe device that has 2 cores
------------------------------------------------------
-dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
-dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'
-
-There will now be two devices that expose Intel NVMe core 0 and 1
-respectively:
-/dev/mapper/nvmset0
-/dev/mapper/nvmset1
-
-unstriped ontop of striped with 4 drives using 128K chunk size
---------------------------------------------------------------
-dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
-dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
-dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
-dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'
diff --git a/Documentation/device-mapper/verity.rst b/Documentation/device-mapper/verity.rst
new file mode 100644
index 000000000000..a4d1c1476d72
--- /dev/null
+++ b/Documentation/device-mapper/verity.rst
@@ -0,0 +1,229 @@
+=========
+dm-verity
+=========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Construction Parameters
+=======================
+
+::
+
+    <version> <dev> <hash_dev>
+    <data_block_size> <hash_block_size>
+    <num_data_blocks> <hash_start_block>
+    <algorithm> <digest> <salt>
+    [<#opt_params> <opt_params>]
+
+<version>
+    This is the type of the on-disk hash format.
+
+    0 is the original format used in Chromium OS.
+      The salt is appended when hashing, digests are stored contiguously and
+      the rest of the block is padded with zeroes.
+
+    1 is the current format that should be used for new devices.
+      The salt is prepended when hashing and each digest is
+      padded with zeroes to a power-of-two size.
+
+<dev>
+    This is the device containing data, the integrity of which needs to be
+    checked.  It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash_dev>
+    This is the device that supplies the hash tree data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, the hash_start should be outside the configured
+    dm-verity device.
+
+<data_block_size>
+    The block size on a data device in bytes.
+    Each block corresponds to one digest on the hash device.
+
+<hash_block_size>
+    The size of a hash block in bytes.
+
+<num_data_blocks>
+    The number of data blocks on the data device.  Additional blocks are
+    inaccessible.  You can place hashes on the same partition as data; in this
+    case the hashes are placed after <num_data_blocks>.
+
+<hash_start_block>
+    This is the offset, in <hash_block_size>-blocks, from the start of hash_dev
+    to the root block of the hash tree.
+
+<algorithm>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<digest>
+    The hexadecimal encoding of the cryptographic hash of the root hash block
+    and the salt.  This hash should be trusted as there is no other authenticity
+    beyond this point.
+
+<salt>
+    The hexadecimal encoding of the salt value.
+
+<#opt_params>
+    Number of optional parameters. If there are no optional parameters,
+    the optional parameters section can be skipped or #opt_params can be zero.
+    Otherwise #opt_params is the number of following arguments.
+
+    Example of optional parameters section:
+        1 ignore_corruption
+
+ignore_corruption
+    Log corrupted blocks, but allow read operations to proceed normally.
+
+restart_on_corruption
+    Restart the system when a corrupted block is discovered. This option is
+    not compatible with ignore_corruption and requires user space support to
+    avoid restart loops.
+
+ignore_zero_blocks
+    Do not verify blocks that are expected to contain zeroes and always return
+    zeroes instead. This may be useful if the partition contains unused blocks
+    that are not guaranteed to contain zeroes.
+
+use_fec_from_device <fec_dev>
+    Use forward error correction (FEC) to recover from corruption if hash
+    verification fails. Use encoding data from the specified device. This
+    may be the same device where data and hash blocks reside, in which case
+    fec_start must be outside data and hash areas.
+
+    If the encoding data covers additional metadata, it must be accessible
+    on the hash device after the hash blocks.
+
+    Note: block sizes for data and hash devices must match. Also, if the
+    verity <dev> is encrypted the <fec_dev> should be too.
+
+fec_roots <num>
+    Number of generator roots. This equals the number of parity bytes in
+    the encoding data. For example, in RS(M, N) encoding, the number of roots
+    is M-N.
+
+fec_blocks <num>
+    The number of encoding data blocks on the FEC device. The block size for
+    the FEC device is <data_block_size>.
+
+fec_start <offset>
+    This is the offset, in <data_block_size> blocks, from the start of the
+    FEC device to the beginning of the encoding data.
+
+check_at_most_once
+    Verify data blocks only the first time they are read from the data device,
+    rather than every time.  This reduces the overhead of dm-verity so that it
+    can be used on systems that are memory and/or CPU constrained.  However, it
+    provides a reduced level of security because only offline tampering of the
+    data device's content will be detected, not online tampering.
+
+    Hash blocks are still verified each time they are read from the hash device,
+    since verification of hash blocks is less performance critical than data
+    blocks, and a hash block will not be verified any more after all the data
+    blocks it covers have been verified anyway.
+
+Theory of operation
+===================
+
+dm-verity is meant to be set up as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should detect
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis. This allows for a lightweight hash computation on first read
+into the page cache. Block hashes are stored linearly, aligned to the nearest
+block size.
+
+If forward error correction (FEC) support is enabled any recovery of
+corrupted data will be verified using the cryptographic hash of the
+corresponding data. This is why combining error correction with
+integrity checking is essential.
+
+Hash Tree
+---------
+
+Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
+of some data block on disk is calculated. If it is an intermediary node,
+the hash of a number of child nodes is calculated.
+
+Each entry in the tree is a collection of neighboring nodes that fit in one
+block.  The number is determined based on block_size and the size of the
+selected cryptographic digest algorithm.  The hashes are linearly-ordered in
+this entry and any unaligned trailing space is ignored but included when
+calculating the parent node.
+
+The tree looks something like:
+
+	alg = sha256, num_blocks = 32768, block_size = 4096
+
+::
+
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
+
+
+On-disk format
+==============
+
+The verity kernel code does not read the verity metadata on-disk header.
+It only reads the hash blocks which directly follow the header.
+It is expected that a user-space tool will verify the integrity of the
+verity header.
+
+Alternatively, the header can be omitted and the dmsetup parameters can
+be passed via the kernel command-line in a rooted chain of trust where
+the command-line is verified.
+
+Directly following the header (and with sector number padded to the next hash
+block boundary) are the hash blocks which are stored a depth at a time
+(starting from the root), sorted in order of increasing index.
+
+The full specification of kernel parameters and on-disk metadata format
+is available at the cryptsetup project's wiki page:
+
+  https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity
+
+Status
+======
+V (for Valid) is returned if every check performed so far was valid.
+If any check failed, C (for Corruption) is returned.
+
+Example
+=======
+Set up a device::
+
+  # dmsetup create vroot --readonly --table \
+    "0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\
+    "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
+    "1234000000000000000000000000000000000000000000000000000000000000"
+
+A command line tool veritysetup is available to compute or verify
+the hash tree or activate the kernel device. This is available from
+the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/
+(as a libcryptsetup extension).
+
+Create hash on the device::
+
+  # veritysetup format /dev/sda1 /dev/sda2
+  ...
+  Root hash: 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
+
+Activate the device::
+
+  # veritysetup create vroot /dev/sda1 /dev/sda2 \
+    4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt
deleted file mode 100644
index b3d2e4a42255..000000000000
--- a/Documentation/device-mapper/verity.txt
+++ /dev/null
@@ -1,219 +0,0 @@
-dm-verity
-==========
-
-Device-Mapper's "verity" target provides transparent integrity checking of
-block devices using a cryptographic digest provided by the kernel crypto API.
-This target is read-only.
-
-Construction Parameters
-=======================
-    <version> <dev> <hash_dev>
-    <data_block_size> <hash_block_size>
-    <num_data_blocks> <hash_start_block>
-    <algorithm> <digest> <salt>
-    [<#opt_params> <opt_params>]
-
-<version>
-    This is the type of the on-disk hash format.
-
-    0 is the original format used in the Chromium OS.
-      The salt is appended when hashing, digests are stored continuously and
-      the rest of the block is padded with zeroes.
-
-    1 is the current format that should be used for new devices.
-      The salt is prepended when hashing and each digest is
-      padded with zeroes to the power of two.
-
-<dev>
-    This is the device containing data, the integrity of which needs to be
-    checked.  It may be specified as a path, like /dev/sdaX, or a device number,
-    <major>:<minor>.
-
-<hash_dev>
-    This is the device that supplies the hash tree data.  It may be
-    specified similarly to the device path and may be the same device.  If the
-    same device is used, the hash_start should be outside the configured
-    dm-verity device.
-
-<data_block_size>
-    The block size on a data device in bytes.
-    Each block corresponds to one digest on the hash device.
-
-<hash_block_size>
-    The size of a hash block in bytes.
-
-<num_data_blocks>
-    The number of data blocks on the data device.  Additional blocks are
-    inaccessible.  You can place hashes to the same partition as data, in this
-    case hashes are placed after <num_data_blocks>.
-
-<hash_start_block>
-    This is the offset, in <hash_block_size>-blocks, from the start of hash_dev
-    to the root block of the hash tree.
-
-<algorithm>
-    The cryptographic hash algorithm used for this device.  This should
-    be the name of the algorithm, like "sha1".
-
-<digest>
-    The hexadecimal encoding of the cryptographic hash of the root hash block
-    and the salt.  This hash should be trusted as there is no other authenticity
-    beyond this point.
-
-<salt>
-    The hexadecimal encoding of the salt value.
-
-<#opt_params>
-    Number of optional parameters. If there are no optional parameters,
-    the optional paramaters section can be skipped or #opt_params can be zero.
-    Otherwise #opt_params is the number of following arguments.
-
-    Example of optional parameters section:
-        1 ignore_corruption
-
-ignore_corruption
-    Log corrupted blocks, but allow read operations to proceed normally.
-
-restart_on_corruption
-    Restart the system when a corrupted block is discovered. This option is
-    not compatible with ignore_corruption and requires user space support to
-    avoid restart loops.
-
-ignore_zero_blocks
-    Do not verify blocks that are expected to contain zeroes and always return
-    zeroes instead. This may be useful if the partition contains unused blocks
-    that are not guaranteed to contain zeroes.
-
-use_fec_from_device <fec_dev>
-    Use forward error correction (FEC) to recover from corruption if hash
-    verification fails. Use encoding data from the specified device. This
-    may be the same device where data and hash blocks reside, in which case
-    fec_start must be outside data and hash areas.
-
-    If the encoding data covers additional metadata, it must be accessible
-    on the hash device after the hash blocks.
-
-    Note: block sizes for data and hash devices must match. Also, if the
-    verity <dev> is encrypted the <fec_dev> should be too.
-
-fec_roots <num>
-    Number of generator roots. This equals to the number of parity bytes in
-    the encoding data. For example, in RS(M, N) encoding, the number of roots
-    is M-N.
-
-fec_blocks <num>
-    The number of encoding data blocks on the FEC device. The block size for
-    the FEC device is <data_block_size>.
-
-fec_start <offset>
-    This is the offset, in <data_block_size> blocks, from the start of the
-    FEC device to the beginning of the encoding data.
-
-check_at_most_once
-    Verify data blocks only the first time they are read from the data device,
-    rather than every time.  This reduces the overhead of dm-verity so that it
-    can be used on systems that are memory and/or CPU constrained.  However, it
-    provides a reduced level of security because only offline tampering of the
-    data device's content will be detected, not online tampering.
-
-    Hash blocks are still verified each time they are read from the hash device,
-    since verification of hash blocks is less performance critical than data
-    blocks, and a hash block will not be verified any more after all the data
-    blocks it covers have been verified anyway.
-
-Theory of operation
-===================
-
-dm-verity is meant to be set up as part of a verified boot path.  This
-may be anything ranging from a boot using tboot or trustedgrub to just
-booting from a known-good device (like a USB drive or CD).
-
-When a dm-verity device is configured, it is expected that the caller
-has been authenticated in some way (cryptographic signatures, etc).
-After instantiation, all hashes will be verified on-demand during
-disk access.  If they cannot be verified up to the root node of the
-tree, the root hash, then the I/O will fail.  This should detect
-tampering with any data on the device and the hash data.
-
-Cryptographic hashes are used to assert the integrity of the device on a
-per-block basis. This allows for a lightweight hash computation on first read
-into the page cache. Block hashes are stored linearly, aligned to the nearest
-block size.
-
-If forward error correction (FEC) support is enabled any recovery of
-corrupted data will be verified using the cryptographic hash of the
-corresponding data. This is why combining error correction with
-integrity checking is essential.
-
-Hash Tree
----------
-
-Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
-of some data block on disk is calculated. If it is an intermediary node,
-the hash of a number of child nodes is calculated.
-
-Each entry in the tree is a collection of neighboring nodes that fit in one
-block.  The number is determined based on block_size and the size of the
-selected cryptographic digest algorithm.  The hashes are linearly-ordered in
-this entry and any unaligned trailing space is ignored but included when
-calculating the parent node.
-
-The tree looks something like:
-
-alg = sha256, num_blocks = 32768, block_size = 4096
-
-                                 [   root    ]
-                                /    . . .    \
-                     [entry_0]                 [entry_1]
-                    /  . . .  \                 . . .   \
-         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
-           / ... \             /   . . .  \             /           \
-     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
-
-
-On-disk format
-==============
-
-The verity kernel code does not read the verity metadata on-disk header.
-It only reads the hash blocks which directly follow the header.
-It is expected that a user-space tool will verify the integrity of the
-verity header.
-
-Alternatively, the header can be omitted and the dmsetup parameters can
-be passed via the kernel command-line in a rooted chain of trust where
-the command-line is verified.
-
-Directly following the header (and with sector number padded to the next hash
-block boundary) are the hash blocks which are stored a depth at a time
-(starting from the root), sorted in order of increasing index.
-
-The full specification of kernel parameters and on-disk metadata format
-is available at the cryptsetup project's wiki page
-  https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity
-
-Status
-======
-V (for Valid) is returned if every check performed so far was valid.
-If any check failed, C (for Corruption) is returned.
-
-Example
-=======
-Set up a device:
-  # dmsetup create vroot --readonly --table \
-    "0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\
-    "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
-    "1234000000000000000000000000000000000000000000000000000000000000"
-
-A command line tool veritysetup is available to compute or verify
-the hash tree or activate the kernel device. This is available from
-the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/
-(as a libcryptsetup extension).
-
-Create hash on the device:
-  # veritysetup format /dev/sda1 /dev/sda2
-  ...
-  Root hash: 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
-
-Activate the device:
-  # veritysetup create vroot /dev/sda1 /dev/sda2 \
-    4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
diff --git a/Documentation/device-mapper/writecache.rst b/Documentation/device-mapper/writecache.rst
new file mode 100644
index 000000000000..d3d7690f5e8d
--- /dev/null
+++ b/Documentation/device-mapper/writecache.rst
@@ -0,0 +1,79 @@
+=================
+Writecache target
+=================
+
+The writecache target caches writes on persistent memory or on SSD. It
+doesn't cache reads because reads are supposed to be cached in page cache
+in normal RAM.
+
+When the device is constructed, the first sector should be zeroed or the
+first sector should contain a valid superblock from a previous invocation.
+
+Constructor parameters:
+
+1. type of the cache device - "p" or "s"
+
+	- p - persistent memory
+	- s - SSD
+2. the underlying device that will be cached
+3. the cache device
+4. block size (4096 is recommended; the maximum block size is the page
+   size)
+5. the number of optional parameters (the parameters with an argument
+   count as two)
+
+	start_sector n		(default: 0)
+		offset from the start of cache device in 512-byte sectors
+	high_watermark n	(default: 50)
+		start writeback when the number of used blocks reaches this
+		watermark
+	low_watermark x		(default: 45)
+		stop writeback when the number of used blocks drops below
+		this watermark
+	writeback_jobs n	(default: unlimited)
+		limit the number of blocks that are in flight during
+		writeback. Setting this value reduces writeback
+		throughput, but it may improve latency of read requests
+	autocommit_blocks n	(default: 64 for pmem, 65536 for ssd)
+		when the application writes this number of blocks without
+		issuing the FLUSH request, the blocks are automatically
+		committed
+	autocommit_time ms	(default: 1000)
+		autocommit time in milliseconds. The data is automatically
+		committed if this time passes and no FLUSH request is
+		received
+	fua			(by default on)
+		applicable only to persistent memory - use the FUA flag
+		when writing data from persistent memory back to the
+		underlying device
+	nofua
+		applicable only to persistent memory - don't use the FUA
+		flag when writing back data and send the FLUSH request
+		afterwards
+
+		- some underlying devices perform better with fua, some
+		  with nofua. The user should test it
+
+Status:
+1. error indicator - 0 if there was no error, otherwise error number
+2. the number of blocks
+3. the number of free blocks
+4. the number of blocks under writeback
+
+Messages:
+	flush
+		flush the cache device. The message returns successfully
+		if the cache device was flushed without an error
+	flush_on_suspend
+		flush the cache device on next suspend. Use this message
+		when you are going to remove the cache device. The proper
+		sequence for removing the cache device is:
+
+		1. send the "flush_on_suspend" message
+		2. load an inactive table with a linear target that maps
+		   to the underlying device
+		3. suspend the device
+		4. ask for status and verify that there are no errors
+		5. resume the device, so that it will use the linear
+		   target
+		6. the cache device is now inactive and it can be deleted
diff --git a/Documentation/device-mapper/writecache.txt b/Documentation/device-mapper/writecache.txt
deleted file mode 100644
index 01532b3008ae..000000000000
--- a/Documentation/device-mapper/writecache.txt
+++ /dev/null
@@ -1,70 +0,0 @@
-The writecache target caches writes on persistent memory or on SSD. It
-doesn't cache reads because reads are supposed to be cached in page cache
-in normal RAM.
-
-When the device is constructed, the first sector should be zeroed or the
-first sector should contain valid superblock from previous invocation.
-
-Constructor parameters:
-1. type of the cache device - "p" or "s"
-	p - persistent memory
-	s - SSD
-2. the underlying device that will be cached
-3. the cache device
-4. block size (4096 is recommended; the maximum block size is the page
-   size)
-5. the number of optional parameters (the parameters with an argument
-   count as two)
-	start_sector n		(default: 0)
-		offset from the start of cache device in 512-byte sectors
-	high_watermark n	(default: 50)
-		start writeback when the number of used blocks reach this
-		watermark
-	low_watermark x		(default: 45)
-		stop writeback when the number of used blocks drops below
-		this watermark
-	writeback_jobs n	(default: unlimited)
-		limit the number of blocks that are in flight during
-		writeback. Setting this value reduces writeback
-		throughput, but it may improve latency of read requests
-	autocommit_blocks n	(default: 64 for pmem, 65536 for ssd)
-		when the application writes this amount of blocks without
-		issuing the FLUSH request, the blocks are automatically
-		commited
-	autocommit_time ms	(default: 1000)
-		autocommit time in milliseconds. The data is automatically
-		commited if this time passes and no FLUSH request is
-		received
-	fua			(by default on)
-		applicable only to persistent memory - use the FUA flag
-		when writing data from persistent memory back to the
-		underlying device
-	nofua
-		applicable only to persistent memory - don't use the FUA
-		flag when writing back data and send the FLUSH request
-		afterwards
-		- some underlying devices perform better with fua, some
-		  with nofua. The user should test it
-
-Status:
-1. error indicator - 0 if there was no error, otherwise error number
-2. the number of blocks
-3. the number of free blocks
-4. the number of blocks under writeback
-
-Messages:
-	flush
-		flush the cache device. The message returns successfully
-		if the cache device was flushed without an error
-	flush_on_suspend
-		flush the cache device on next suspend. Use this message
-		when you are going to remove the cache device. The proper
-		sequence for removing the cache device is:
-		1. send the "flush_on_suspend" message
-		2. load an inactive table with a linear target that maps
-		   to the underlying device
-		3. suspend the device
-		4. ask for status and verify that there are no errors
-		5. resume the device, so that it will use the linear
-		   target
-		6. the cache device is now inactive and it can be deleted
diff --git a/Documentation/device-mapper/zero.rst b/Documentation/device-mapper/zero.rst
new file mode 100644
index 000000000000..11fb5cf4597c
--- /dev/null
+++ b/Documentation/device-mapper/zero.rst
@@ -0,0 +1,37 @@
+=======
+dm-zero
+=======
+
+Device-Mapper's "zero" target provides a block-device that always returns
+zero'd data on reads and silently drops writes. This is similar behavior to
+/dev/zero, but as a block-device instead of a character-device.
+
+Dm-zero has no target-specific parameters.
+
+One very interesting use of dm-zero is for creating "sparse" devices in
+conjunction with dm-snapshot. A sparse device reports a device-size larger
+than the amount of actual storage space available for that device. A user can
+write data anywhere within the sparse device and read it back like a normal
+device. Reads to previously unwritten areas will return a zero'd buffer. When
+enough data has been written to fill up the actual storage space, the sparse
+device is deactivated. This can be very useful for testing device and
+filesystem limitations.
+
+To create a sparse device, start by creating a dm-zero device that's the
+desired size of the sparse device. For this example, we'll assume a 10TB
+sparse device::
+
+  TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2`   # 10 TB in sectors
+  echo "0 $TEN_TERABYTES zero" | dmsetup create zero1
+
+Then create a snapshot of the zero device, using any available block-device as
+the COW device. The size of the COW device will determine the amount of real
+space available to the sparse device. For this example, we'll assume /dev/sdb1
+is an available 10GB partition::
+
+  echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \
+     dmsetup create sparse1
+
+This will create a 10TB sparse device called /dev/mapper/sparse1 that has
+10GB of actual storage space available. If more than 10GB of data is written
+to this device, it will start returning I/O errors.
diff --git a/Documentation/device-mapper/zero.txt b/Documentation/device-mapper/zero.txt
deleted file mode 100644
index 20fb38e7fa7e..000000000000
--- a/Documentation/device-mapper/zero.txt
+++ /dev/null
@@ -1,37 +0,0 @@
-dm-zero
-=======
-
-Device-Mapper's "zero" target provides a block-device that always returns
-zero'd data on reads and silently drops writes. This is similar behavior to
-/dev/zero, but as a block-device instead of a character-device.
-
-Dm-zero has no target-specific parameters.
-
-One very interesting use of dm-zero is for creating "sparse" devices in
-conjunction with dm-snapshot. A sparse device reports a device-size larger
-than the amount of actual storage space available for that device. A user can
-write data anywhere within the sparse device and read it back like a normal
-device. Reads to previously unwritten areas will return a zero'd buffer. When
-enough data has been written to fill up the actual storage space, the sparse
-device is deactivated. This can be very useful for testing device and
-filesystem limitations.
-
-To create a sparse device, start by creating a dm-zero device that's the
-desired size of the sparse device. For this example, we'll assume a 10TB
-sparse device.
-
-TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2`   # 10 TB in sectors
-echo "0 $TEN_TERABYTES zero" | dmsetup create zero1
-
-Then create a snapshot of the zero device, using any available block-device as
-the COW device. The size of the COW device will determine the amount of real
-space available to the sparse device. For this example, we'll assume /dev/sdb1
-is an available 10GB partition.
-
-echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \
-   dmsetup create sparse1
-
-This will create a 10TB sparse device called /dev/mapper/sparse1 that has
-10GB of actual storage space available. If more than 10GB of data is written
-to this device, it will start returning I/O errors.
-
diff --git a/Documentation/filesystems/ubifs-authentication.md b/Documentation/filesystems/ubifs-authentication.md
index 028b3e2e25f9..23e698167141 100644
--- a/Documentation/filesystems/ubifs-authentication.md
+++ b/Documentation/filesystems/ubifs-authentication.md
@@ -417,9 +417,9 @@ will then have to be provided beforehand in the normal way.
 
 [DMC-CBC-ATTACK]     http://www.jakoblell.com/blog/2013/12/22/practical-malleability-attack-against-cbc-encrypted-luks-partitions/
 
-[DM-INTEGRITY]       https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.txt
+[DM-INTEGRITY]       https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.rst
 
-[DM-VERITY]          https://www.kernel.org/doc/Documentation/device-mapper/verity.txt
+[DM-VERITY]          https://www.kernel.org/doc/Documentation/device-mapper/verity.rst
 
 [FSCRYPT-POLICY2]    https://www.spinics.net/lists/linux-ext4/msg58710.html
 
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 45254b3ef715..5ccac0b77f17 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -453,7 +453,7 @@ config DM_INIT
 	Enable "dm-mod.create=" parameter to create mapped devices at init time.
 	This option is useful to allow mounting rootfs without requiring an
 	initramfs.
-	See Documentation/device-mapper/dm-init.txt for dm-mod.create="..."
+	See Documentation/device-mapper/dm-init.rst for dm-mod.create="..."
 	format.
 
 	If unsure, say N.
diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c
index 352e803f566e..a58d0944f592 100644
--- a/drivers/md/dm-init.c
+++ b/drivers/md/dm-init.c
@@ -25,7 +25,7 @@ static char *create;
  * Format: dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
  * Table format: <start_sector> <num_sectors> <target_type> <target_args>
  *
- * See Documentation/device-mapper/dm-init.txt for dm-mod.create="..." format
+ * See Documentation/device-mapper/dm-init.rst for dm-mod.create="..." format
  * details.
  */
 
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 9fdef6897316..7a87a640f8ba 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3558,7 +3558,7 @@ static void raid_status(struct dm_target *ti, status_type_t type,
 		 * v1.5.0+:
 		 *
 		 * Sync action:
-		 *   See Documentation/device-mapper/dm-raid.txt for
+		 *   See Documentation/device-mapper/dm-raid.rst for
 		 *   information on each of these states.
 		 */
 		DMEMIT(" %s", sync_action);
-- 
cgit v1.2.3-59-g8ed1b


From 10ffebbed5503b1830c7920ef528075785351be6 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:44 -0300
Subject: docs: fault-injection: convert docs to ReST and rename to *.rst

The conversion consists of:
  - add blank lines and indentation in order to identify paragraphs;
  - fix table markup;
  - add some list markup;
  - mark literal blocks;
  - adjust title markup.

In the new index.rst, add :orphan: for now, since it is not yet linked
from the main index.rst file; this avoids build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/fault-injection/fault-injection.rst  | 446 +++++++++++++++++++++
 Documentation/fault-injection/fault-injection.txt  | 435 --------------------
 Documentation/fault-injection/index.rst            |  20 +
 .../fault-injection/notifier-error-inject.rst      |  98 +++++
 .../fault-injection/notifier-error-inject.txt      |  94 -----
 .../fault-injection/nvme-fault-injection.rst       | 120 ++++++
 .../fault-injection/nvme-fault-injection.txt       | 116 ------
 Documentation/fault-injection/provoke-crashes.rst  |  48 +++
 Documentation/fault-injection/provoke-crashes.txt  |  38 --
 Documentation/process/4.Coding.rst                 |   2 +-
 .../translations/it_IT/process/4.Coding.rst        |   2 +-
 .../translations/zh_CN/process/4.Coding.rst        |   2 +-
 drivers/misc/lkdtm/core.c                          |   2 +-
 include/linux/fault-inject.h                       |   2 +-
 lib/Kconfig.debug                                  |   2 +-
 tools/testing/fault-injection/failcmd.sh           |   2 +-
 16 files changed, 739 insertions(+), 690 deletions(-)
 create mode 100644 Documentation/fault-injection/fault-injection.rst
 delete mode 100644 Documentation/fault-injection/fault-injection.txt
 create mode 100644 Documentation/fault-injection/index.rst
 create mode 100644 Documentation/fault-injection/notifier-error-inject.rst
 delete mode 100644 Documentation/fault-injection/notifier-error-inject.txt
 create mode 100644 Documentation/fault-injection/nvme-fault-injection.rst
 delete mode 100644 Documentation/fault-injection/nvme-fault-injection.txt
 create mode 100644 Documentation/fault-injection/provoke-crashes.rst
 delete mode 100644 Documentation/fault-injection/provoke-crashes.txt
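
Reviewer note (not part of the commit): because this patch renames the
files and updates references to them in several places, a quick scan for
leftover mentions of the old .txt paths can be useful. The following is only
a sketch. The function name stale_fault_injection_refs is hypothetical, and
the pattern assumes that the old files all lived directly under
Documentation/fault-injection/ with lowercase, hyphenated names.

```shell
#!/bin/sh
# Hypothetical helper, not provided by the kernel tree or by this patch:
# print the files under a directory that still mention one of the old
# Documentation/fault-injection/*.txt paths.
stale_fault_injection_refs() {
	# $1: top of the tree to scan (defaults to the current directory).
	# grep -r walks the tree, -l prints only file names, -E enables the
	# extended regular expression used for the old path pattern.
	grep -rlE 'Documentation/fault-injection/[a-z-]+\.txt' "${1:-.}" 2>/dev/null | sort
}
```

Run from the top of a kernel tree as "stale_fault_injection_refs ." after
applying the patch; any file it prints is a candidate for a follow-up fix.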

diff --git a/Documentation/fault-injection/fault-injection.rst b/Documentation/fault-injection/fault-injection.rst
new file mode 100644
index 000000000000..f51bb21d20e4
--- /dev/null
+++ b/Documentation/fault-injection/fault-injection.rst
@@ -0,0 +1,446 @@
+===========================================
+Fault injection capabilities infrastructure
+===========================================
+
+See also drivers/md/md-faulty.c and "every_nth" module option for scsi_debug.
+
+
+Available fault injection capabilities
+--------------------------------------
+
+- failslab
+
+  injects slab allocation failures. (kmalloc(), kmem_cache_alloc(), ...)
+
+- fail_page_alloc
+
+  injects page allocation failures. (alloc_pages(), get_free_pages(), ...)
+
+- fail_futex
+
+  injects futex deadlock and uaddr fault errors.
+
+- fail_make_request
+
+  injects disk IO errors on devices permitted by setting
+  /sys/block/<device>/make-it-fail or
+  /sys/block/<device>/<partition>/make-it-fail. (generic_make_request())
+
+- fail_mmc_request
+
+  injects MMC data errors on devices permitted by setting
+  debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
+
+- fail_function
+
+  injects error return on specific functions, which are marked by
+  ALLOW_ERROR_INJECTION() macro, by setting debugfs entries
+  under /sys/kernel/debug/fail_function. No boot option supported.
+
+- NVMe fault injection
+
+  injects an NVMe status code and retry flag on devices permitted by setting
+  debugfs entries under /sys/kernel/debug/nvme*/fault_inject. The default
+  status code is NVME_SC_INVALID_OPCODE with no retry. The status code and
+  retry flag can be set via debugfs.
+
+
+Configure fault-injection capabilities behavior
+-----------------------------------------------
+
+debugfs entries
+^^^^^^^^^^^^^^^
+
+fault-inject-debugfs kernel module provides some debugfs entries for runtime
+configuration of fault-injection capabilities.
+
+- /sys/kernel/debug/fail*/probability:
+
+	likelihood of failure injection, in percent.
+
+	Format: <percent>
+
+	Note that one-failure-per-hundred is a very high error rate
+	for some testcases.  Consider setting probability=100 and configuring
+	/sys/kernel/debug/fail*/interval for such testcases.
+
+- /sys/kernel/debug/fail*/interval:
+
+	specifies the interval between failures, for calls to
+	should_fail() that pass all the other tests.
+
+	Note that if you enable this, by setting interval>1, you will
+	probably want to set probability=100.
+
+- /sys/kernel/debug/fail*/times:
+
+	specifies how many times failures may happen at most.
+	A value of -1 means "no limit".
+
+- /sys/kernel/debug/fail*/space:
+
+	specifies an initial resource "budget", decremented by "size"
+	on each call to should_fail(,size).  Failure injection is
+	suppressed until "space" reaches zero.
+
+- /sys/kernel/debug/fail*/verbose
+
+	Format: { 0 | 1 | 2 }
+
+	specifies the verbosity of the messages when failure is
+	injected.  '0' means no messages; '1' will print only a single
+	log line per failure; '2' will print a call trace too -- useful
+	to debug the problems revealed by fault injection.
+
+- /sys/kernel/debug/fail*/task-filter:
+
+	Format: { 'Y' | 'N' }
+
+	A value of 'N' disables filtering by process (default).
+	A value of 'Y' limits failures to only those processes indicated by
+	/proc/<pid>/make-it-fail==1.
+
+- /sys/kernel/debug/fail*/require-start,
+  /sys/kernel/debug/fail*/require-end,
+  /sys/kernel/debug/fail*/reject-start,
+  /sys/kernel/debug/fail*/reject-end:
+
+	specifies the range of virtual addresses tested during
+	stacktrace walking.  Failure is injected only if some caller
+	in the walked stacktrace lies within the required range, and
+	none lies within the rejected range.
+	Default required range is [0,ULONG_MAX) (whole of virtual address space).
+	Default rejected range is [0,0).
+
+- /sys/kernel/debug/fail*/stacktrace-depth:
+
+	specifies the maximum stacktrace depth walked during search
+	for a caller within [require-start,require-end) OR
+	[reject-start,reject-end).
+
+- /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:
+
+	Format: { 'Y' | 'N' }
+
+	default is 'N', setting it to 'Y' won't inject failures into
+	highmem/user allocations.
+
+- /sys/kernel/debug/failslab/ignore-gfp-wait:
+- /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:
+
+	Format: { 'Y' | 'N' }
+
+	default is 'N', setting it to 'Y' will inject failures
+	only into non-sleep allocations (GFP_ATOMIC allocations).
+
+- /sys/kernel/debug/fail_page_alloc/min-order:
+
+	specifies the minimum page allocation order for which failures
+	are injected.
+
+- /sys/kernel/debug/fail_futex/ignore-private:
+
+	Format: { 'Y' | 'N' }
+
+	default is 'N', setting it to 'Y' will disable failure injections
+	when dealing with private (address space) futexes.
+
+- /sys/kernel/debug/fail_function/inject:
+
+	Format: { 'function-name' | '!function-name' | '' }
+
+	specifies the target function of error injection by name.
+	If the function name is prefixed with '!', the given function
+	is removed from the injection list. If nothing is specified
+	(''), the injection list is cleared.
+
+- /sys/kernel/debug/fail_function/injectable:
+
+	(read only) shows error-injectable functions and the type of error
+	values that can be specified. The error type is one of the following:
+
+	- NULL:	retval must be 0.
+	- ERRNO: retval must be -1 to -MAX_ERRNO (-4095).
+	- ERR_NULL: retval must be 0 or -1 to -MAX_ERRNO (-4095).
+
+- /sys/kernel/debug/fail_function/<function-name>/retval:
+
+	specifies the "error" return value to inject into the given
+	function. This entry is created when the user specifies a new
+	injection entry.
+
+Boot option
+^^^^^^^^^^^
+
+In order to inject faults while debugfs is not available (early boot time),
+use the boot option::
+
+	failslab=
+	fail_page_alloc=
+	fail_make_request=
+	fail_futex=
+	mmc_core.fail_request=<interval>,<probability>,<space>,<times>
+
+proc entries
+^^^^^^^^^^^^
+
+- /proc/<pid>/fail-nth,
+  /proc/self/task/<tid>/fail-nth:
+
+	Writing an integer N to this file makes the N-th call in the task fail.
+	Reading from this file returns an integer value. A value of '0' indicates
+	that the fault set up by a previous write to this file was injected.
+	A positive integer N indicates that the fault wasn't yet injected.
+	Note that this file enables all types of faults (slab, futex, etc).
+	This setting takes precedence over all other generic debugfs settings
+	like probability, interval, times, etc. But per-capability settings
+	(e.g. fail_futex/ignore-private) take precedence over it.
+
+	This feature is intended for systematic testing of faults in a single
+	system call. See an example below.
+
+How to add new fault injection capability
+-----------------------------------------
+
+- #include <linux/fault-inject.h>
+
+- define the fault attributes
+
+  DECLARE_FAULT_ATTR(name);
+
+  Please see the definition of struct fault_attr in fault-inject.h
+  for details.
+
+- provide a way to configure fault attributes
+
+- boot option
+
+  If you need to enable the fault injection capability from boot time, you can
+  provide a boot option to configure it. There is a helper function for it:
+
+	setup_fault_attr(attr, str);
+
+- debugfs entries
+
+  failslab, fail_page_alloc, and fail_make_request use this method.
+  Helper function:
+
+	fault_create_debugfs_attr(name, parent, attr);
+
+- module parameters
+
+  If the scope of the fault injection capability is limited to a
+  single kernel module, it is better to provide module parameters to
+  configure the fault attributes.
+
+- add a hook to insert failures
+
+  Upon should_fail() returning true, client code should inject a failure:
+
+	should_fail(attr, size);
+
+Application Examples
+--------------------
+
+- Inject slab allocation failures into module init/exit code::
+
+    #!/bin/bash
+
+    FAILTYPE=failslab
+    echo Y > /sys/kernel/debug/$FAILTYPE/task-filter
+    echo 10 > /sys/kernel/debug/$FAILTYPE/probability
+    echo 100 > /sys/kernel/debug/$FAILTYPE/interval
+    echo -1 > /sys/kernel/debug/$FAILTYPE/times
+    echo 0 > /sys/kernel/debug/$FAILTYPE/space
+    echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
+    echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
+
+    faulty_system()
+    {
+	bash -c "echo 1 > /proc/self/make-it-fail && exec $*"
+    }
+
+    if [ $# -eq 0 ]
+    then
+	echo "Usage: $0 modulename [ modulename ... ]"
+	exit 1
+    fi
+
+    for m in $*
+    do
+	echo inserting $m...
+	faulty_system modprobe $m
+
+	echo removing $m...
+	faulty_system modprobe -r $m
+    done
+
+------------------------------------------------------------------------------
+
+- Inject page allocation failures only for a specific module::
+
+    #!/bin/bash
+
+    FAILTYPE=fail_page_alloc
+    module=$1
+
+    if [ -z $module ]
+    then
+	echo "Usage: $0 <modulename>"
+	exit 1
+    fi
+
+    modprobe $module
+
+    if [ ! -d /sys/module/$module/sections ]
+    then
+	echo Module $module is not loaded
+	exit 1
+    fi
+
+    cat /sys/module/$module/sections/.text > /sys/kernel/debug/$FAILTYPE/require-start
+    cat /sys/module/$module/sections/.data > /sys/kernel/debug/$FAILTYPE/require-end
+
+    echo N > /sys/kernel/debug/$FAILTYPE/task-filter
+    echo 10 > /sys/kernel/debug/$FAILTYPE/probability
+    echo 100 > /sys/kernel/debug/$FAILTYPE/interval
+    echo -1 > /sys/kernel/debug/$FAILTYPE/times
+    echo 0 > /sys/kernel/debug/$FAILTYPE/space
+    echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
+    echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
+    echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
+    echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth
+
+    trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT
+
+    echo "Injecting errors into the module $module... (interrupt to stop)"
+    sleep 1000000
+
+------------------------------------------------------------------------------
+
+- Inject an open_ctree error during btrfs mount::
+
+    #!/bin/bash
+
+    rm -f testfile.img
+    dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
+    DEVICE=$(losetup --show -f testfile.img)
+    mkfs.btrfs -f $DEVICE
+    mkdir -p tmpmnt
+
+    FAILTYPE=fail_function
+    FAILFUNC=open_ctree
+    echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject
+    echo -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval
+    echo N > /sys/kernel/debug/$FAILTYPE/task-filter
+    echo 100 > /sys/kernel/debug/$FAILTYPE/probability
+    echo 0 > /sys/kernel/debug/$FAILTYPE/interval
+    echo -1 > /sys/kernel/debug/$FAILTYPE/times
+    echo 0 > /sys/kernel/debug/$FAILTYPE/space
+    echo 1 > /sys/kernel/debug/$FAILTYPE/verbose
+
+    mount -t btrfs $DEVICE tmpmnt
+    if [ $? -ne 0 ]
+    then
+	echo "SUCCESS!"
+    else
+	echo "FAILED!"
+	umount tmpmnt
+    fi
+
+    echo > /sys/kernel/debug/$FAILTYPE/inject
+
+    rmdir tmpmnt
+    losetup -d $DEVICE
+    rm testfile.img
+
+
+Tool to run command with failslab or fail_page_alloc
+----------------------------------------------------
+To make the tasks mentioned above easier to accomplish, we can use
+tools/testing/fault-injection/failcmd.sh.  Run
+"./tools/testing/fault-injection/failcmd.sh --help" for more information and
+see the following examples.
+
+Examples:
+
+Run the command "make -C tools/testing/selftests/ run_tests" while injecting
+slab allocation failures::
+
+	# ./tools/testing/fault-injection/failcmd.sh \
+		-- make -C tools/testing/selftests/ run_tests
+
+Same as above, but allow at most 100 failures instead of the default of at
+most one::
+
+	# ./tools/testing/fault-injection/failcmd.sh --times=100 \
+		-- make -C tools/testing/selftests/ run_tests
+
+Same as above, but inject page allocation failures instead of slab
+allocation failures::
+
+	# env FAILCMD_TYPE=fail_page_alloc \
+		./tools/testing/fault-injection/failcmd.sh --times=100 \
+		-- make -C tools/testing/selftests/ run_tests
+
+Systematic faults using fail-nth
+---------------------------------
+
+The following code systematically fails the 1st, 2nd, 3rd (and so on) fault
+injection point hit during the socketpair() system call::
+
+  #include <sys/types.h>
+  #include <sys/stat.h>
+  #include <sys/socket.h>
+  #include <sys/syscall.h>
+  #include <fcntl.h>
+  #include <unistd.h>
+  #include <string.h>
+  #include <stdlib.h>
+  #include <stdio.h>
+  #include <errno.h>
+
+  int main()
+  {
+	int i, err, res, fail_nth, fds[2];
+	char buf[128];
+
+	system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
+	sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid));
+	fail_nth = open(buf, O_RDWR);
+	for (i = 1;; i++) {
+		sprintf(buf, "%d", i);
+		write(fail_nth, buf, strlen(buf));
+		res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
+		err = errno;
+		pread(fail_nth, buf, sizeof(buf), 0);
+		if (res == 0) {
+			close(fds[0]);
+			close(fds[1]);
+		}
+		printf("%d-th fault %c: res=%d/%d\n", i, atoi(buf) ? 'N' : 'Y',
+			res, err);
+		if (atoi(buf))
+			break;
+	}
+	return 0;
+  }
+
+An example output::
+
+	1-th fault Y: res=-1/23
+	2-th fault Y: res=-1/23
+	3-th fault Y: res=-1/12
+	4-th fault Y: res=-1/12
+	5-th fault Y: res=-1/23
+	6-th fault Y: res=-1/23
+	7-th fault Y: res=-1/23
+	8-th fault Y: res=-1/12
+	9-th fault Y: res=-1/12
+	10-th fault Y: res=-1/12
+	11-th fault Y: res=-1/12
+	12-th fault Y: res=-1/12
+	13-th fault Y: res=-1/12
+	14-th fault Y: res=-1/12
+	15-th fault Y: res=-1/12
+	16-th fault N: res=0/12
diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
deleted file mode 100644
index a17517a083c3..000000000000
--- a/Documentation/fault-injection/fault-injection.txt
+++ /dev/null
@@ -1,435 +0,0 @@
-Fault injection capabilities infrastructure
-===========================================
-
-See also drivers/md/md-faulty.c and "every_nth" module option for scsi_debug.
-
-
-Available fault injection capabilities
---------------------------------------
-
-o failslab
-
-  injects slab allocation failures. (kmalloc(), kmem_cache_alloc(), ...)
-
-o fail_page_alloc
-
-  injects page allocation failures. (alloc_pages(), get_free_pages(), ...)
-
-o fail_futex
-
-  injects futex deadlock and uaddr fault errors.
-
-o fail_make_request
-
-  injects disk IO errors on devices permitted by setting
-  /sys/block/<device>/make-it-fail or
-  /sys/block/<device>/<partition>/make-it-fail. (generic_make_request())
-
-o fail_mmc_request
-
-  injects MMC data errors on devices permitted by setting
-  debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
-
-o fail_function
-
-  injects error return on specific functions, which are marked by
-  ALLOW_ERROR_INJECTION() macro, by setting debugfs entries
-  under /sys/kernel/debug/fail_function. No boot option supported.
-
-o NVMe fault injection
-
-  inject NVMe status code and retry flag on devices permitted by setting
-  debugfs entries under /sys/kernel/debug/nvme*/fault_inject. The default
-  status code is NVME_SC_INVALID_OPCODE with no retry. The status code and
-  retry flag can be set via the debugfs.
-
-
-Configure fault-injection capabilities behavior
------------------------------------------------
-
-o debugfs entries
-
-fault-inject-debugfs kernel module provides some debugfs entries for runtime
-configuration of fault-injection capabilities.
-
-- /sys/kernel/debug/fail*/probability:
-
-	likelihood of failure injection, in percent.
-	Format: <percent>
-
-	Note that one-failure-per-hundred is a very high error rate
-	for some testcases.  Consider setting probability=100 and configure
-	/sys/kernel/debug/fail*/interval for such testcases.
-
-- /sys/kernel/debug/fail*/interval:
-
-	specifies the interval between failures, for calls to
-	should_fail() that pass all the other tests.
-
-	Note that if you enable this, by setting interval>1, you will
-	probably want to set probability=100.
-
-- /sys/kernel/debug/fail*/times:
-
-	specifies how many times failures may happen at most.
-	A value of -1 means "no limit".
-
-- /sys/kernel/debug/fail*/space:
-
-	specifies an initial resource "budget", decremented by "size"
-	on each call to should_fail(,size).  Failure injection is
-	suppressed until "space" reaches zero.
-
-- /sys/kernel/debug/fail*/verbose
-
-	Format: { 0 | 1 | 2 }
-	specifies the verbosity of the messages when failure is
-	injected.  '0' means no messages; '1' will print only a single
-	log line per failure; '2' will print a call trace too -- useful
-	to debug the problems revealed by fault injection.
-
-- /sys/kernel/debug/fail*/task-filter:
-
-	Format: { 'Y' | 'N' }
-	A value of 'N' disables filtering by process (default).
-	Any positive value limits failures to only processes indicated by
-	/proc/<pid>/make-it-fail==1.
-
-- /sys/kernel/debug/fail*/require-start:
-- /sys/kernel/debug/fail*/require-end:
-- /sys/kernel/debug/fail*/reject-start:
-- /sys/kernel/debug/fail*/reject-end:
-
-	specifies the range of virtual addresses tested during
-	stacktrace walking.  Failure is injected only if some caller
-	in the walked stacktrace lies within the required range, and
-	none lies within the rejected range.
-	Default required range is [0,ULONG_MAX) (whole of virtual address space).
-	Default rejected range is [0,0).
-
-- /sys/kernel/debug/fail*/stacktrace-depth:
-
-	specifies the maximum stacktrace depth walked during search
-	for a caller within [require-start,require-end) OR
-	[reject-start,reject-end).
-
-- /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:
-
-	Format: { 'Y' | 'N' }
-	default is 'N', setting it to 'Y' won't inject failures into
-	highmem/user allocations.
-
-- /sys/kernel/debug/failslab/ignore-gfp-wait:
-- /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:
-
-	Format: { 'Y' | 'N' }
-	default is 'N', setting it to 'Y' will inject failures
-	only into non-sleep allocations (GFP_ATOMIC allocations).
-
-- /sys/kernel/debug/fail_page_alloc/min-order:
-
-	specifies the minimum page allocation order to be injected
-	failures.
-
-- /sys/kernel/debug/fail_futex/ignore-private:
-
-	Format: { 'Y' | 'N' }
-	default is 'N', setting it to 'Y' will disable failure injections
-	when dealing with private (address space) futexes.
-
-- /sys/kernel/debug/fail_function/inject:
-
-	Format: { 'function-name' | '!function-name' | '' }
-	specifies the target function of error injection by name.
-	If the function name leads '!' prefix, given function is
-	removed from injection list. If nothing specified ('')
-	injection list is cleared.
-
-- /sys/kernel/debug/fail_function/injectable:
-
-	(read only) shows error injectable functions and what type of
-	error values can be specified. The error type will be one of
-	below;
-	- NULL:	retval must be 0.
-	- ERRNO: retval must be -1 to -MAX_ERRNO (-4096).
-	- ERR_NULL: retval must be 0 or -1 to -MAX_ERRNO (-4096).
-
-- /sys/kernel/debug/fail_function/<functiuon-name>/retval:
-
-	specifies the "error" return value to inject to the given
-	function for given function. This will be created when
-	user specifies new injection entry.
-
-o Boot option
-
-In order to inject faults while debugfs is not available (early boot time),
-use the boot option:
-
-	failslab=
-	fail_page_alloc=
-	fail_make_request=
-	fail_futex=
-	mmc_core.fail_request=<interval>,<probability>,<space>,<times>
-
-o proc entries
-
-- /proc/<pid>/fail-nth:
-- /proc/self/task/<tid>/fail-nth:
-
-	Write to this file of integer N makes N-th call in the task fail.
-	Read from this file returns a integer value. A value of '0' indicates
-	that the fault setup with a previous write to this file was injected.
-	A positive integer N indicates that the fault wasn't yet injected.
-	Note that this file enables all types of faults (slab, futex, etc).
-	This setting takes precedence over all other generic debugfs settings
-	like probability, interval, times, etc. But per-capability settings
-	(e.g. fail_futex/ignore-private) take precedence over it.
-
-	This feature is intended for systematic testing of faults in a single
-	system call. See an example below.
-
-How to add new fault injection capability
------------------------------------------
-
-o #include <linux/fault-inject.h>
-
-o define the fault attributes
-
-  DECLARE_FAULT_ATTR(name);
-
-  Please see the definition of struct fault_attr in fault-inject.h
-  for details.
-
-o provide a way to configure fault attributes
-
-- boot option
-
-  If you need to enable the fault injection capability from boot time, you can
-  provide boot option to configure it. There is a helper function for it:
-
-	setup_fault_attr(attr, str);
-
-- debugfs entries
-
-  failslab, fail_page_alloc, and fail_make_request use this way.
-  Helper functions:
-
-	fault_create_debugfs_attr(name, parent, attr);
-
-- module parameters
-
-  If the scope of the fault injection capability is limited to a
-  single kernel module, it is better to provide module parameters to
-  configure the fault attributes.
-
-o add a hook to insert failures
-
-  Upon should_fail() returning true, client code should inject a failure.
-
-	should_fail(attr, size);
-
-Application Examples
---------------------
-
-o Inject slab allocation failures into module init/exit code
-
-#!/bin/bash
-
-FAILTYPE=failslab
-echo Y > /sys/kernel/debug/$FAILTYPE/task-filter
-echo 10 > /sys/kernel/debug/$FAILTYPE/probability
-echo 100 > /sys/kernel/debug/$FAILTYPE/interval
-echo -1 > /sys/kernel/debug/$FAILTYPE/times
-echo 0 > /sys/kernel/debug/$FAILTYPE/space
-echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
-echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
-
-faulty_system()
-{
-	bash -c "echo 1 > /proc/self/make-it-fail && exec $*"
-}
-
-if [ $# -eq 0 ]
-then
-	echo "Usage: $0 modulename [ modulename ... ]"
-	exit 1
-fi
-
-for m in $*
-do
-	echo inserting $m...
-	faulty_system modprobe $m
-
-	echo removing $m...
-	faulty_system modprobe -r $m
-done
-
-------------------------------------------------------------------------------
-
-o Inject page allocation failures only for a specific module
-
-#!/bin/bash
-
-FAILTYPE=fail_page_alloc
-module=$1
-
-if [ -z $module ]
-then
-	echo "Usage: $0 <modulename>"
-	exit 1
-fi
-
-modprobe $module
-
-if [ ! -d /sys/module/$module/sections ]
-then
-	echo Module $module is not loaded
-	exit 1
-fi
-
-cat /sys/module/$module/sections/.text > /sys/kernel/debug/$FAILTYPE/require-start
-cat /sys/module/$module/sections/.data > /sys/kernel/debug/$FAILTYPE/require-end
-
-echo N > /sys/kernel/debug/$FAILTYPE/task-filter
-echo 10 > /sys/kernel/debug/$FAILTYPE/probability
-echo 100 > /sys/kernel/debug/$FAILTYPE/interval
-echo -1 > /sys/kernel/debug/$FAILTYPE/times
-echo 0 > /sys/kernel/debug/$FAILTYPE/space
-echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
-echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
-echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
-echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth
-
-trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT
-
-echo "Injecting errors into the module $module... (interrupt to stop)"
-sleep 1000000
-
-------------------------------------------------------------------------------
-
-o Inject open_ctree error while btrfs mount
-
-#!/bin/bash
-
-rm -f testfile.img
-dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
-DEVICE=$(losetup --show -f testfile.img)
-mkfs.btrfs -f $DEVICE
-mkdir -p tmpmnt
-
-FAILTYPE=fail_function
-FAILFUNC=open_ctree
-echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject
-echo -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval
-echo N > /sys/kernel/debug/$FAILTYPE/task-filter
-echo 100 > /sys/kernel/debug/$FAILTYPE/probability
-echo 0 > /sys/kernel/debug/$FAILTYPE/interval
-echo -1 > /sys/kernel/debug/$FAILTYPE/times
-echo 0 > /sys/kernel/debug/$FAILTYPE/space
-echo 1 > /sys/kernel/debug/$FAILTYPE/verbose
-
-mount -t btrfs $DEVICE tmpmnt
-if [ $? -ne 0 ]
-then
-	echo "SUCCESS!"
-else
-	echo "FAILED!"
-	umount tmpmnt
-fi
-
-echo > /sys/kernel/debug/$FAILTYPE/inject
-
-rmdir tmpmnt
-losetup -d $DEVICE
-rm testfile.img
-
-
-Tool to run command with failslab or fail_page_alloc
-----------------------------------------------------
-In order to make it easier to accomplish the tasks mentioned above, we can use
-tools/testing/fault-injection/failcmd.sh.  Please run a command
-"./tools/testing/fault-injection/failcmd.sh --help" for more information and
-see the following examples.
-
-Examples:
-
-Run a command "make -C tools/testing/selftests/ run_tests" with injecting slab
-allocation failure.
-
-	# ./tools/testing/fault-injection/failcmd.sh \
-		-- make -C tools/testing/selftests/ run_tests
-
-Same as above except to specify 100 times failures at most instead of one time
-at most by default.
-
-	# ./tools/testing/fault-injection/failcmd.sh --times=100 \
-		-- make -C tools/testing/selftests/ run_tests
-
-Same as above except to inject page allocation failure instead of slab
-allocation failure.
-
-	# env FAILCMD_TYPE=fail_page_alloc \
-		./tools/testing/fault-injection/failcmd.sh --times=100 \
-                -- make -C tools/testing/selftests/ run_tests
-
-Systematic faults using fail-nth
----------------------------------
-
-The following code systematically faults 0-th, 1-st, 2-nd and so on
-capabilities in the socketpair() system call.
-
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/socket.h>
-#include <sys/syscall.h>
-#include <fcntl.h>
-#include <unistd.h>
-#include <string.h>
-#include <stdlib.h>
-#include <stdio.h>
-#include <errno.h>
-
-int main()
-{
-	int i, err, res, fail_nth, fds[2];
-	char buf[128];
-
-	system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
-	sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid));
-	fail_nth = open(buf, O_RDWR);
-	for (i = 1;; i++) {
-		sprintf(buf, "%d", i);
-		write(fail_nth, buf, strlen(buf));
-		res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
-		err = errno;
-		pread(fail_nth, buf, sizeof(buf), 0);
-		if (res == 0) {
-			close(fds[0]);
-			close(fds[1]);
-		}
-		printf("%d-th fault %c: res=%d/%d\n", i, atoi(buf) ? 'N' : 'Y',
-			res, err);
-		if (atoi(buf))
-			break;
-	}
-	return 0;
-}
-
-An example output:
-
-1-th fault Y: res=-1/23
-2-th fault Y: res=-1/23
-3-th fault Y: res=-1/12
-4-th fault Y: res=-1/12
-5-th fault Y: res=-1/23
-6-th fault Y: res=-1/23
-7-th fault Y: res=-1/23
-8-th fault Y: res=-1/12
-9-th fault Y: res=-1/12
-10-th fault Y: res=-1/12
-11-th fault Y: res=-1/12
-12-th fault Y: res=-1/12
-13-th fault Y: res=-1/12
-14-th fault Y: res=-1/12
-15-th fault Y: res=-1/12
-16-th fault N: res=0/12
diff --git a/Documentation/fault-injection/index.rst b/Documentation/fault-injection/index.rst
new file mode 100644
index 000000000000..92b5639ed07a
--- /dev/null
+++ b/Documentation/fault-injection/index.rst
@@ -0,0 +1,20 @@
+:orphan:
+
+===============
+fault-injection
+===============
+
+.. toctree::
+    :maxdepth: 1
+
+    fault-injection
+    notifier-error-inject
+    nvme-fault-injection
+    provoke-crashes
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/fault-injection/notifier-error-inject.rst b/Documentation/fault-injection/notifier-error-inject.rst
new file mode 100644
index 000000000000..1668b6e48d3a
--- /dev/null
+++ b/Documentation/fault-injection/notifier-error-inject.rst
@@ -0,0 +1,98 @@
+Notifier error injection
+========================
+
+Notifier error injection provides the ability to inject artificial errors into
+specified notifier chain callbacks. It is useful for testing the error handling
+of notifier call chain failures, which is rarely exercised.  There are kernel
+modules that can be used to test the following notifiers.
+
+ * PM notifier
+ * Memory hotplug notifier
+ * powerpc pSeries reconfig notifier
+ * Netdevice notifier
+
+PM notifier error injection module
+----------------------------------
+This feature is controlled through the debugfs interface
+
+  /sys/kernel/debug/notifier-error-inject/pm/actions/<notifier event>/error
+
+PM notifier events that can be failed are:
+
+ * PM_HIBERNATION_PREPARE
+ * PM_SUSPEND_PREPARE
+ * PM_RESTORE_PREPARE
+
+Example: Inject PM suspend error (-12 == -ENOMEM)::
+
+	# cd /sys/kernel/debug/notifier-error-inject/pm/
+	# echo -12 > actions/PM_SUSPEND_PREPARE/error
+	# echo mem > /sys/power/state
+	bash: echo: write error: Cannot allocate memory
+
+Memory hotplug notifier error injection module
+----------------------------------------------
+This feature is controlled through the debugfs interface
+
+  /sys/kernel/debug/notifier-error-inject/memory/actions/<notifier event>/error
+
+Memory notifier events that can be failed are:
+
+ * MEM_GOING_ONLINE
+ * MEM_GOING_OFFLINE
+
+Example: Inject memory hotplug offline error (-12 == -ENOMEM)::
+
+	# cd /sys/kernel/debug/notifier-error-inject/memory
+	# echo -12 > actions/MEM_GOING_OFFLINE/error
+	# echo offline > /sys/devices/system/memory/memoryXXX/state
+	bash: echo: write error: Cannot allocate memory
+
+powerpc pSeries reconfig notifier error injection module
+--------------------------------------------------------
+This feature is controlled through the debugfs interface
+
+  /sys/kernel/debug/notifier-error-inject/pSeries-reconfig/actions/<notifier event>/error
+
+pSeries reconfig notifier events that can be failed are:
+
+ * PSERIES_RECONFIG_ADD
+ * PSERIES_RECONFIG_REMOVE
+ * PSERIES_DRCONF_MEM_ADD
+ * PSERIES_DRCONF_MEM_REMOVE
+
+Netdevice notifier error injection module
+----------------------------------------------
+This feature is controlled through the debugfs interface
+
+  /sys/kernel/debug/notifier-error-inject/netdev/actions/<notifier event>/error
+
+Netdevice notifier events which can be failed are:
+
+ * NETDEV_REGISTER
+ * NETDEV_CHANGEMTU
+ * NETDEV_CHANGENAME
+ * NETDEV_PRE_UP
+ * NETDEV_PRE_TYPE_CHANGE
+ * NETDEV_POST_INIT
+ * NETDEV_PRECHANGEMTU
+ * NETDEV_PRECHANGEUPPER
+ * NETDEV_CHANGEUPPER
+
+Example: Inject netdevice mtu change error (-22 == -EINVAL)::
+
+	# cd /sys/kernel/debug/notifier-error-inject/netdev
+	# echo -22 > actions/NETDEV_CHANGEMTU/error
+	# ip link set eth0 mtu 1024
+	RTNETLINK answers: Invalid argument
+
+For more usage examples
+-----------------------
+Some scripts under tools/testing/selftests use the notifier error injection
+features for CPU and memory notifiers.
+
+ * tools/testing/selftests/cpu-hotplug/on-off-test.sh
+ * tools/testing/selftests/memory-hotplug/on-off-test.sh
+
+These scripts first do simple online and offline tests and then do fault
+injection tests if the notifier error injection module is available.
diff --git a/Documentation/fault-injection/notifier-error-inject.txt b/Documentation/fault-injection/notifier-error-inject.txt
deleted file mode 100644
index e861d761de24..000000000000
--- a/Documentation/fault-injection/notifier-error-inject.txt
+++ /dev/null
@@ -1,94 +0,0 @@
-Notifier error injection
-========================
-
-Notifier error injection provides the ability to inject artificial errors to
-specified notifier chain callbacks. It is useful to test the error handling of
-notifier call chain failures which is rarely executed.  There are kernel
-modules that can be used to test the following notifiers.
-
- * PM notifier
- * Memory hotplug notifier
- * powerpc pSeries reconfig notifier
- * Netdevice notifier
-
-PM notifier error injection module
-----------------------------------
-This feature is controlled through debugfs interface
-/sys/kernel/debug/notifier-error-inject/pm/actions/<notifier event>/error
-
-Possible PM notifier events to be failed are:
-
- * PM_HIBERNATION_PREPARE
- * PM_SUSPEND_PREPARE
- * PM_RESTORE_PREPARE
-
-Example: Inject PM suspend error (-12 = -ENOMEM)
-
-	# cd /sys/kernel/debug/notifier-error-inject/pm/
-	# echo -12 > actions/PM_SUSPEND_PREPARE/error
-	# echo mem > /sys/power/state
-	bash: echo: write error: Cannot allocate memory
-
-Memory hotplug notifier error injection module
-----------------------------------------------
-This feature is controlled through debugfs interface
-/sys/kernel/debug/notifier-error-inject/memory/actions/<notifier event>/error
-
-Possible memory notifier events to be failed are:
-
- * MEM_GOING_ONLINE
- * MEM_GOING_OFFLINE
-
-Example: Inject memory hotplug offline error (-12 == -ENOMEM)
-
-	# cd /sys/kernel/debug/notifier-error-inject/memory
-	# echo -12 > actions/MEM_GOING_OFFLINE/error
-	# echo offline > /sys/devices/system/memory/memoryXXX/state
-	bash: echo: write error: Cannot allocate memory
-
-powerpc pSeries reconfig notifier error injection module
---------------------------------------------------------
-This feature is controlled through debugfs interface
-/sys/kernel/debug/notifier-error-inject/pSeries-reconfig/actions/<notifier event>/error
-
-Possible pSeries reconfig notifier events to be failed are:
-
- * PSERIES_RECONFIG_ADD
- * PSERIES_RECONFIG_REMOVE
- * PSERIES_DRCONF_MEM_ADD
- * PSERIES_DRCONF_MEM_REMOVE
-
-Netdevice notifier error injection module
-----------------------------------------------
-This feature is controlled through debugfs interface
-/sys/kernel/debug/notifier-error-inject/netdev/actions/<notifier event>/error
-
-Netdevice notifier events which can be failed are:
-
- * NETDEV_REGISTER
- * NETDEV_CHANGEMTU
- * NETDEV_CHANGENAME
- * NETDEV_PRE_UP
- * NETDEV_PRE_TYPE_CHANGE
- * NETDEV_POST_INIT
- * NETDEV_PRECHANGEMTU
- * NETDEV_PRECHANGEUPPER
- * NETDEV_CHANGEUPPER
-
-Example: Inject netdevice mtu change error (-22 == -EINVAL)
-
-	# cd /sys/kernel/debug/notifier-error-inject/netdev
-	# echo -22 > actions/NETDEV_CHANGEMTU/error
-	# ip link set eth0 mtu 1024
-	RTNETLINK answers: Invalid argument
-
-For more usage examples
------------------------
-There are tools/testing/selftests using the notifier error injection features
-for CPU and memory notifiers.
-
- * tools/testing/selftests/cpu-hotplug/on-off-test.sh
- * tools/testing/selftests/memory-hotplug/on-off-test.sh
-
-These scripts first do simple online and offline tests and then do fault
-injection tests if notifier error injection module is available.
diff --git a/Documentation/fault-injection/nvme-fault-injection.rst b/Documentation/fault-injection/nvme-fault-injection.rst
new file mode 100644
index 000000000000..bbb1bf3e8650
--- /dev/null
+++ b/Documentation/fault-injection/nvme-fault-injection.rst
@@ -0,0 +1,120 @@
+NVMe Fault Injection
+====================
+Linux's fault injection framework provides a systematic way to support
+error injection via debugfs in the /sys/kernel/debug directory. When
+enabled, the default status code NVME_SC_INVALID_OPCODE with no retry is
+injected into nvme_end_request. Users can change the default status
+code and the no-retry flag via debugfs. The list of Generic Command
+Status values can be found in include/linux/nvme.h.
+
+The following examples show how to inject an error into an NVMe device.
+
+First, enable the CONFIG_FAULT_INJECTION_DEBUG_FS kernel config option and
+recompile the kernel. After booting the new kernel, do the
+following.
+
+Example 1: Inject default status code with no retry
+---------------------------------------------------
+
+::
+
+  mount /dev/nvme0n1 /mnt
+  echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times
+  echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability
+  cp a.file /mnt
+
+Expected Result::
+
+  cp: cannot stat ‘/mnt/a.file’: Input/output error
+
+Message from dmesg::
+
+  FAULT_INJECTION: forcing a failure.
+  name fault_inject, interval 1, probability 100, space 0, times 1
+  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc8+ #2
+  Hardware name: innotek GmbH VirtualBox/VirtualBox,
+  BIOS VirtualBox 12/01/2006
+  Call Trace:
+    <IRQ>
+    dump_stack+0x5c/0x7d
+    should_fail+0x148/0x170
+    nvme_should_fail+0x2f/0x50 [nvme_core]
+    nvme_process_cq+0xe7/0x1d0 [nvme]
+    nvme_irq+0x1e/0x40 [nvme]
+    __handle_irq_event_percpu+0x3a/0x190
+    handle_irq_event_percpu+0x30/0x70
+    handle_irq_event+0x36/0x60
+    handle_fasteoi_irq+0x78/0x120
+    handle_irq+0xa7/0x130
+    ? tick_irq_enter+0xa8/0xc0
+    do_IRQ+0x43/0xc0
+    common_interrupt+0xa2/0xa2
+    </IRQ>
+  RIP: 0010:native_safe_halt+0x2/0x10
+  RSP: 0018:ffffffff82003e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
+  RAX: ffffffff817a10c0 RBX: ffffffff82012480 RCX: 0000000000000000
+  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
+  RBP: 0000000000000000 R08: 000000008e38ce64 R09: 0000000000000000
+  R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82012480
+  R13: ffffffff82012480 R14: 0000000000000000 R15: 0000000000000000
+    ? __sched_text_end+0x4/0x4
+    default_idle+0x18/0xf0
+    do_idle+0x150/0x1d0
+    cpu_startup_entry+0x6f/0x80
+    start_kernel+0x4c4/0x4e4
+    ? set_init_arg+0x55/0x55
+    secondary_startup_64+0xa5/0xb0
+    print_req_error: I/O error, dev nvme0n1, sector 9240
+  EXT4-fs error (device nvme0n1): ext4_find_entry:1436:
+  inode #2: comm cp: reading directory lblock 0
+
+Example 2: Inject default status code with retry
+------------------------------------------------
+
+::
+
+  mount /dev/nvme0n1 /mnt
+  echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times
+  echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability
+  echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/status
+  echo 0 > /sys/kernel/debug/nvme0n1/fault_inject/dont_retry
+
+  cp a.file /mnt
+
+Expected Result::
+
+  The command succeeds without error
+
+Message from dmesg::
+
+  FAULT_INJECTION: forcing a failure.
+  name fault_inject, interval 1, probability 100, space 0, times 1
+  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc8+ #4
+  Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
+  Call Trace:
+    <IRQ>
+    dump_stack+0x5c/0x7d
+    should_fail+0x148/0x170
+    nvme_should_fail+0x30/0x60 [nvme_core]
+    nvme_loop_queue_response+0x84/0x110 [nvme_loop]
+    nvmet_req_complete+0x11/0x40 [nvmet]
+    nvmet_bio_done+0x28/0x40 [nvmet]
+    blk_update_request+0xb0/0x310
+    blk_mq_end_request+0x18/0x60
+    flush_smp_call_function_queue+0x3d/0xf0
+    smp_call_function_single_interrupt+0x2c/0xc0
+    call_function_single_interrupt+0xa2/0xb0
+    </IRQ>
+  RIP: 0010:native_safe_halt+0x2/0x10
+  RSP: 0018:ffffc9000068bec0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
+  RAX: ffffffff817a10c0 RBX: ffff88011a3c9680 RCX: 0000000000000000
+  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
+  RBP: 0000000000000001 R08: 000000008e38c131 R09: 0000000000000000
+  R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011a3c9680
+  R13: ffff88011a3c9680 R14: 0000000000000000 R15: 0000000000000000
+    ? __sched_text_end+0x4/0x4
+    default_idle+0x18/0xf0
+    do_idle+0x150/0x1d0
+    cpu_startup_entry+0x6f/0x80
+    start_secondary+0x187/0x1e0
+    secondary_startup_64+0xa5/0xb0
diff --git a/Documentation/fault-injection/nvme-fault-injection.txt b/Documentation/fault-injection/nvme-fault-injection.txt
deleted file mode 100644
index 8fbf3bf60b62..000000000000
--- a/Documentation/fault-injection/nvme-fault-injection.txt
+++ /dev/null
@@ -1,116 +0,0 @@
-NVMe Fault Injection
-====================
-Linux's fault injection framework provides a systematic way to support
-error injection via debugfs in the /sys/kernel/debug directory. When
-enabled, the default NVME_SC_INVALID_OPCODE with no retry will be
-injected into the nvme_end_request. Users can change the default status
-code and no retry flag via the debugfs. The list of Generic Command
-Status can be found in include/linux/nvme.h
-
-Following examples show how to inject an error into the nvme.
-
-First, enable CONFIG_FAULT_INJECTION_DEBUG_FS kernel config,
-recompile the kernel. After booting up the kernel, do the
-following.
-
-Example 1: Inject default status code with no retry
----------------------------------------------------
-
-mount /dev/nvme0n1 /mnt
-echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times
-echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability
-cp a.file /mnt
-
-Expected Result:
-
-cp: cannot stat ‘/mnt/a.file’: Input/output error
-
-Message from dmesg:
-
-FAULT_INJECTION: forcing a failure.
-name fault_inject, interval 1, probability 100, space 0, times 1
-CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc8+ #2
-Hardware name: innotek GmbH VirtualBox/VirtualBox,
-BIOS VirtualBox 12/01/2006
-Call Trace:
-  <IRQ>
-  dump_stack+0x5c/0x7d
-  should_fail+0x148/0x170
-  nvme_should_fail+0x2f/0x50 [nvme_core]
-  nvme_process_cq+0xe7/0x1d0 [nvme]
-  nvme_irq+0x1e/0x40 [nvme]
-  __handle_irq_event_percpu+0x3a/0x190
-  handle_irq_event_percpu+0x30/0x70
-  handle_irq_event+0x36/0x60
-  handle_fasteoi_irq+0x78/0x120
-  handle_irq+0xa7/0x130
-  ? tick_irq_enter+0xa8/0xc0
-  do_IRQ+0x43/0xc0
-  common_interrupt+0xa2/0xa2
-  </IRQ>
-RIP: 0010:native_safe_halt+0x2/0x10
-RSP: 0018:ffffffff82003e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
-RAX: ffffffff817a10c0 RBX: ffffffff82012480 RCX: 0000000000000000
-RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
-RBP: 0000000000000000 R08: 000000008e38ce64 R09: 0000000000000000
-R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82012480
-R13: ffffffff82012480 R14: 0000000000000000 R15: 0000000000000000
-  ? __sched_text_end+0x4/0x4
-  default_idle+0x18/0xf0
-  do_idle+0x150/0x1d0
-  cpu_startup_entry+0x6f/0x80
-  start_kernel+0x4c4/0x4e4
-  ? set_init_arg+0x55/0x55
-  secondary_startup_64+0xa5/0xb0
-  print_req_error: I/O error, dev nvme0n1, sector 9240
-EXT4-fs error (device nvme0n1): ext4_find_entry:1436:
-inode #2: comm cp: reading directory lblock 0
-
-Example 2: Inject default status code with retry
-------------------------------------------------
-
-mount /dev/nvme0n1 /mnt
-echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times
-echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability
-echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/status
-echo 0 > /sys/kernel/debug/nvme0n1/fault_inject/dont_retry
-
-cp a.file /mnt
-
-Expected Result:
-
-command success without error
-
-Message from dmesg:
-
-FAULT_INJECTION: forcing a failure.
-name fault_inject, interval 1, probability 100, space 0, times 1
-CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc8+ #4
-Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
-Call Trace:
-  <IRQ>
-  dump_stack+0x5c/0x7d
-  should_fail+0x148/0x170
-  nvme_should_fail+0x30/0x60 [nvme_core]
-  nvme_loop_queue_response+0x84/0x110 [nvme_loop]
-  nvmet_req_complete+0x11/0x40 [nvmet]
-  nvmet_bio_done+0x28/0x40 [nvmet]
-  blk_update_request+0xb0/0x310
-  blk_mq_end_request+0x18/0x60
-  flush_smp_call_function_queue+0x3d/0xf0
-  smp_call_function_single_interrupt+0x2c/0xc0
-  call_function_single_interrupt+0xa2/0xb0
-  </IRQ>
-RIP: 0010:native_safe_halt+0x2/0x10
-RSP: 0018:ffffc9000068bec0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
-RAX: ffffffff817a10c0 RBX: ffff88011a3c9680 RCX: 0000000000000000
-RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
-RBP: 0000000000000001 R08: 000000008e38c131 R09: 0000000000000000
-R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011a3c9680
-R13: ffff88011a3c9680 R14: 0000000000000000 R15: 0000000000000000
-  ? __sched_text_end+0x4/0x4
-  default_idle+0x18/0xf0
-  do_idle+0x150/0x1d0
-  cpu_startup_entry+0x6f/0x80
-  start_secondary+0x187/0x1e0
-  secondary_startup_64+0xa5/0xb0
diff --git a/Documentation/fault-injection/provoke-crashes.rst b/Documentation/fault-injection/provoke-crashes.rst
new file mode 100644
index 000000000000..9279a3e12278
--- /dev/null
+++ b/Documentation/fault-injection/provoke-crashes.rst
@@ -0,0 +1,48 @@
+===============
+Provoke crashes
+===============
+
+The lkdtm module provides an interface to crash or injure the kernel at
+predefined crashpoints to evaluate the reliability of crash dumps obtained
+using different dumping solutions. The module uses KPROBEs to instrument
+crashing points, but can also crash the kernel directly without KPROBE
+support.
+
+
+You can select the crash point and action either through module arguments
+when inserting the module, or through a debugfs interface.
+
+Usage::
+
+	insmod lkdtm.ko [recur_count={>0}] cpoint_name=<> cpoint_type=<>
+			[cpoint_count={>0}]
+
+recur_count
+	Recursion level for the stack overflow test. Default is 10.
+
+cpoint_name
+	Crash point where the kernel is to be crashed. It can be
+	one of INT_HARDWARE_ENTRY, INT_HW_IRQ_EN, INT_TASKLET_ENTRY,
+	FS_DEVRW, MEM_SWAPOUT, TIMERADD, SCSI_DISPATCH_CMD,
+	IDE_CORE_CP, DIRECT
+
+cpoint_type
+	Indicates the action to be taken on hitting the crash point.
+	It can be one of PANIC, BUG, EXCEPTION, LOOP, OVERFLOW,
+	CORRUPT_STACK, UNALIGNED_LOAD_STORE_WRITE, OVERWRITE_ALLOCATION,
+	WRITE_AFTER_FREE,
+
+cpoint_count
+	Indicates the number of times the crash point is to be hit
+	to trigger an action. The default is 10.
+
+You can also induce failures by mounting debugfs and writing the type to
+<mountpoint>/provoke-crash/<crashpoint>. E.g.::
+
+  mount -t debugfs debugfs /mnt
+  echo EXCEPTION > /mnt/provoke-crash/INT_HARDWARE_ENTRY
+
+
+A special file is `DIRECT` which will induce the crash directly without
+KPROBE instrumentation. This mode is the only one available when the module
+is built on a kernel without KPROBEs support.
diff --git a/Documentation/fault-injection/provoke-crashes.txt b/Documentation/fault-injection/provoke-crashes.txt
deleted file mode 100644
index 7a9d3d81525b..000000000000
--- a/Documentation/fault-injection/provoke-crashes.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-The lkdtm module provides an interface to crash or injure the kernel at
-predefined crashpoints to evaluate the reliability of crash dumps obtained
-using different dumping solutions. The module uses KPROBEs to instrument
-crashing points, but can also crash the kernel directly without KRPOBE
-support.
-
-
-You can provide the way either through module arguments when inserting
-the module, or through a debugfs interface.
-
-Usage: insmod lkdtm.ko [recur_count={>0}] cpoint_name=<> cpoint_type=<>
-				[cpoint_count={>0}]
-
-  recur_count : Recursion level for the stack overflow test. Default is 10.
-
-  cpoint_name : Crash point where the kernel is to be crashed. It can be
-	 one of INT_HARDWARE_ENTRY, INT_HW_IRQ_EN, INT_TASKLET_ENTRY,
-	 FS_DEVRW, MEM_SWAPOUT, TIMERADD, SCSI_DISPATCH_CMD,
-	 IDE_CORE_CP, DIRECT
-
-  cpoint_type : Indicates the action to be taken on hitting the crash point.
-     It can be one of PANIC, BUG, EXCEPTION, LOOP, OVERFLOW,
-     CORRUPT_STACK, UNALIGNED_LOAD_STORE_WRITE, OVERWRITE_ALLOCATION,
-     WRITE_AFTER_FREE,
-
-  cpoint_count : Indicates the number of times the crash point is to be hit
-    to trigger an action. The default is 10.
-
-You can also induce failures by mounting debugfs and writing the type to
-<mountpoint>/provoke-crash/<crashpoint>. E.g.,
-
-  mount -t debugfs debugfs /mnt
-  echo EXCEPTION > /mnt/provoke-crash/INT_HARDWARE_ENTRY
-
-
-A special file is `DIRECT' which will induce the crash directly without
-KPROBE instrumentation. This mode is the only one available when the module
-is built on a kernel without KPROBEs support.
diff --git a/Documentation/process/4.Coding.rst b/Documentation/process/4.Coding.rst
index 4b7a5ab3cec1..13dd893c9f88 100644
--- a/Documentation/process/4.Coding.rst
+++ b/Documentation/process/4.Coding.rst
@@ -298,7 +298,7 @@ enabled, a configurable percentage of memory allocations will be made to
 fail; these failures can be restricted to a specific range of code.
 Running with fault injection enabled allows the programmer to see how the
 code responds when things go badly.  See
-Documentation/fault-injection/fault-injection.txt for more information on
+Documentation/fault-injection/fault-injection.rst for more information on
 how to use this facility.
 
 Other kinds of errors can be found with the "sparse" static analysis tool.
diff --git a/Documentation/translations/it_IT/process/4.Coding.rst b/Documentation/translations/it_IT/process/4.Coding.rst
index c05b89e616dd..a5e36aa60448 100644
--- a/Documentation/translations/it_IT/process/4.Coding.rst
+++ b/Documentation/translations/it_IT/process/4.Coding.rst
@@ -314,7 +314,7 @@ di allocazione di memoria sarà destinata al fallimento; questi fallimenti
 possono essere ridotti ad uno specifico pezzo di codice.  Procedere con
 l'inserimento dei fallimenti attivo permette al programmatore di verificare
 come il codice risponde quando le cose vanno male.  Consultate:
-Documentation/fault-injection/fault-injection.txt per avere maggiori
+Documentation/fault-injection/fault-injection.rst per avere maggiori
 informazioni su come utilizzare questo strumento.
 
 Altre tipologie di errori possono essere riscontrati con lo strumento di
diff --git a/Documentation/translations/zh_CN/process/4.Coding.rst b/Documentation/translations/zh_CN/process/4.Coding.rst
index 8bb777941394..b82b1dde3122 100644
--- a/Documentation/translations/zh_CN/process/4.Coding.rst
+++ b/Documentation/translations/zh_CN/process/4.Coding.rst
@@ -205,7 +205,7 @@ Linus对这个问题给出了最佳答案:
 启用故障注入后，内存分配的可配置百分比将失败；这些失败可以限制在特定的代码
 范围内。在启用了故障注入的情况下运行，程序员可以看到当情况恶化时代码如何响
 应。有关如何使用此工具的详细信息，请参阅
-Documentation/fault-injection/fault-injection.txt。
+Documentation/fault-injection/fault-injection.rst。
 
 使用“sparse”静态分析工具可以发现其他类型的错误。对于sparse，可以警告程序员
 用户空间和内核空间地址之间的混淆、big endian和small endian数量的混合、在需
diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 8a1428d4f138..bba49abb6750 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -15,7 +15,7 @@
  *
  * Debugfs support added by Simon Kagstrom <simon.kagstrom@netinsight.net>
  *
- * See Documentation/fault-injection/provoke-crashes.txt for instructions
+ * See Documentation/fault-injection/provoke-crashes.rst for instructions
  */
 #include "lkdtm.h"
 #include <linux/fs.h>
diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h
index 7e6c77740413..e525f6957c49 100644
--- a/include/linux/fault-inject.h
+++ b/include/linux/fault-inject.h
@@ -11,7 +11,7 @@
 
 /*
  * For explanation of the elements of this struct, see
- * Documentation/fault-injection/fault-injection.txt
+ * Documentation/fault-injection/fault-injection.rst
  */
 struct fault_attr {
 	unsigned long probability;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cbdfae379896..4d42a9a6006d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1701,7 +1701,7 @@ config LKDTM
 	called lkdtm.
 
 	Documentation on how to use the module can be found in
-	Documentation/fault-injection/provoke-crashes.txt
+	Documentation/fault-injection/provoke-crashes.rst
 
 config TEST_LIST_SORT
 	tristate "Linked list sorting test"
diff --git a/tools/testing/fault-injection/failcmd.sh b/tools/testing/fault-injection/failcmd.sh
index 29a6c63c5a15..78dac34264be 100644
--- a/tools/testing/fault-injection/failcmd.sh
+++ b/tools/testing/fault-injection/failcmd.sh
@@ -42,7 +42,7 @@ OPTIONS
 	--interval=value, --space=value, --verbose=value, --task-filter=value,
 	--stacktrace-depth=value, --require-start=value, --require-end=value,
 	--reject-start=value, --reject-end=value, --ignore-gfp-wait=value
-		See Documentation/fault-injection/fault-injection.txt for more
+		See Documentation/fault-injection/fault-injection.rst for more
 		information
 
 	failslab options:
-- 
cgit v1.2.3-59-g8ed1b


From ab42b818954c040fa13639dc031d8541edcecb4b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:45 -0300
Subject: docs: fb: convert docs to ReST and rename to *.rst

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Also, removed the Maintained by, as requested by Geert.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt |   2 +-
 Documentation/fb/api.rst                        | 307 ++++++++++++++++
 Documentation/fb/api.txt                        | 306 ----------------
 Documentation/fb/arkfb.rst                      |  68 ++++
 Documentation/fb/arkfb.txt                      |  68 ----
 Documentation/fb/aty128fb.rst                   |  75 ++++
 Documentation/fb/aty128fb.txt                   |  72 ----
 Documentation/fb/cirrusfb.rst                   |  94 +++++
 Documentation/fb/cirrusfb.txt                   |  97 ------
 Documentation/fb/cmap_xfbdev.rst                |  56 +++
 Documentation/fb/cmap_xfbdev.txt                |  53 ---
 Documentation/fb/deferred_io.rst                |  79 +++++
 Documentation/fb/deferred_io.txt                |  75 ----
 Documentation/fb/efifb.rst                      |  39 +++
 Documentation/fb/efifb.txt                      |  37 --
 Documentation/fb/ep93xx-fb.rst                  | 140 ++++++++
 Documentation/fb/ep93xx-fb.txt                  | 135 --------
 Documentation/fb/fbcon.rst                      | 350 +++++++++++++++++++
 Documentation/fb/fbcon.txt                      | 347 -------------------
 Documentation/fb/framebuffer.rst                | 353 +++++++++++++++++++
 Documentation/fb/framebuffer.txt                | 343 ------------------
 Documentation/fb/gxfb.rst                       |  54 +++
 Documentation/fb/gxfb.txt                       |  52 ---
 Documentation/fb/index.rst                      |  50 +++
 Documentation/fb/intel810.rst                   | 287 +++++++++++++++
 Documentation/fb/intel810.txt                   | 278 ---------------
 Documentation/fb/intelfb.rst                    | 155 +++++++++
 Documentation/fb/intelfb.txt                    | 149 --------
 Documentation/fb/internals.rst                  |  86 +++++
 Documentation/fb/internals.txt                  |  82 -----
 Documentation/fb/lxfb.rst                       |  55 +++
 Documentation/fb/lxfb.txt                       |  52 ---
 Documentation/fb/matroxfb.rst                   | 443 ++++++++++++++++++++++++
 Documentation/fb/matroxfb.txt                   | 413 ----------------------
 Documentation/fb/metronomefb.rst                |  38 ++
 Documentation/fb/metronomefb.txt                |  36 --
 Documentation/fb/modedb.rst                     | 155 +++++++++
 Documentation/fb/modedb.txt                     | 151 --------
 Documentation/fb/pvr2fb.rst                     |  66 ++++
 Documentation/fb/pvr2fb.txt                     |  65 ----
 Documentation/fb/pxafb.rst                      | 173 +++++++++
 Documentation/fb/pxafb.txt                      | 142 --------
 Documentation/fb/s3fb.rst                       |  82 +++++
 Documentation/fb/s3fb.txt                       |  82 -----
 Documentation/fb/sa1100fb.rst                   |  40 +++
 Documentation/fb/sa1100fb.txt                   |  39 ---
 Documentation/fb/sh7760fb.rst                   | 130 +++++++
 Documentation/fb/sh7760fb.txt                   | 131 -------
 Documentation/fb/sisfb.rst                      | 160 +++++++++
 Documentation/fb/sisfb.txt                      | 158 ---------
 Documentation/fb/sm501.rst                      |  15 +
 Documentation/fb/sm501.txt                      |  10 -
 Documentation/fb/sm712fb.rst                    |  35 ++
 Documentation/fb/sm712fb.txt                    |  31 --
 Documentation/fb/sstfb.rst                      | 207 +++++++++++
 Documentation/fb/sstfb.txt                      | 174 ----------
 Documentation/fb/tgafb.rst                      |  71 ++++
 Documentation/fb/tgafb.txt                      |  69 ----
 Documentation/fb/tridentfb.rst                  |  78 +++++
 Documentation/fb/tridentfb.txt                  |  70 ----
 Documentation/fb/udlfb.rst                      | 162 +++++++++
 Documentation/fb/udlfb.txt                      | 159 ---------
 Documentation/fb/uvesafb.rst                    | 188 ++++++++++
 Documentation/fb/uvesafb.txt                    | 184 ----------
 Documentation/fb/vesafb.rst                     | 192 ++++++++++
 Documentation/fb/vesafb.txt                     | 181 ----------
 Documentation/fb/viafb.rst                      | 297 ++++++++++++++++
 Documentation/fb/viafb.txt                      | 252 --------------
 Documentation/fb/vt8623fb.rst                   |  64 ++++
 Documentation/fb/vt8623fb.txt                   |  64 ----
 MAINTAINERS                                     |  10 +-
 drivers/tty/Kconfig                             |   2 +-
 drivers/video/fbdev/Kconfig                     |  24 +-
 drivers/video/fbdev/matrox/matroxfb_base.c      |   2 +-
 drivers/video/fbdev/pxafb.c                     |   2 +-
 drivers/video/fbdev/sh7760fb.c                  |   2 +-
 76 files changed, 4866 insertions(+), 4579 deletions(-)
 create mode 100644 Documentation/fb/api.rst
 delete mode 100644 Documentation/fb/api.txt
 create mode 100644 Documentation/fb/arkfb.rst
 delete mode 100644 Documentation/fb/arkfb.txt
 create mode 100644 Documentation/fb/aty128fb.rst
 delete mode 100644 Documentation/fb/aty128fb.txt
 create mode 100644 Documentation/fb/cirrusfb.rst
 delete mode 100644 Documentation/fb/cirrusfb.txt
 create mode 100644 Documentation/fb/cmap_xfbdev.rst
 delete mode 100644 Documentation/fb/cmap_xfbdev.txt
 create mode 100644 Documentation/fb/deferred_io.rst
 delete mode 100644 Documentation/fb/deferred_io.txt
 create mode 100644 Documentation/fb/efifb.rst
 delete mode 100644 Documentation/fb/efifb.txt
 create mode 100644 Documentation/fb/ep93xx-fb.rst
 delete mode 100644 Documentation/fb/ep93xx-fb.txt
 create mode 100644 Documentation/fb/fbcon.rst
 delete mode 100644 Documentation/fb/fbcon.txt
 create mode 100644 Documentation/fb/framebuffer.rst
 delete mode 100644 Documentation/fb/framebuffer.txt
 create mode 100644 Documentation/fb/gxfb.rst
 delete mode 100644 Documentation/fb/gxfb.txt
 create mode 100644 Documentation/fb/index.rst
 create mode 100644 Documentation/fb/intel810.rst
 delete mode 100644 Documentation/fb/intel810.txt
 create mode 100644 Documentation/fb/intelfb.rst
 delete mode 100644 Documentation/fb/intelfb.txt
 create mode 100644 Documentation/fb/internals.rst
 delete mode 100644 Documentation/fb/internals.txt
 create mode 100644 Documentation/fb/lxfb.rst
 delete mode 100644 Documentation/fb/lxfb.txt
 create mode 100644 Documentation/fb/matroxfb.rst
 delete mode 100644 Documentation/fb/matroxfb.txt
 create mode 100644 Documentation/fb/metronomefb.rst
 delete mode 100644 Documentation/fb/metronomefb.txt
 create mode 100644 Documentation/fb/modedb.rst
 delete mode 100644 Documentation/fb/modedb.txt
 create mode 100644 Documentation/fb/pvr2fb.rst
 delete mode 100644 Documentation/fb/pvr2fb.txt
 create mode 100644 Documentation/fb/pxafb.rst
 delete mode 100644 Documentation/fb/pxafb.txt
 create mode 100644 Documentation/fb/s3fb.rst
 delete mode 100644 Documentation/fb/s3fb.txt
 create mode 100644 Documentation/fb/sa1100fb.rst
 delete mode 100644 Documentation/fb/sa1100fb.txt
 create mode 100644 Documentation/fb/sh7760fb.rst
 delete mode 100644 Documentation/fb/sh7760fb.txt
 create mode 100644 Documentation/fb/sisfb.rst
 delete mode 100644 Documentation/fb/sisfb.txt
 create mode 100644 Documentation/fb/sm501.rst
 delete mode 100644 Documentation/fb/sm501.txt
 create mode 100644 Documentation/fb/sm712fb.rst
 delete mode 100644 Documentation/fb/sm712fb.txt
 create mode 100644 Documentation/fb/sstfb.rst
 delete mode 100644 Documentation/fb/sstfb.txt
 create mode 100644 Documentation/fb/tgafb.rst
 delete mode 100644 Documentation/fb/tgafb.txt
 create mode 100644 Documentation/fb/tridentfb.rst
 delete mode 100644 Documentation/fb/tridentfb.txt
 create mode 100644 Documentation/fb/udlfb.rst
 delete mode 100644 Documentation/fb/udlfb.txt
 create mode 100644 Documentation/fb/uvesafb.rst
 delete mode 100644 Documentation/fb/uvesafb.txt
 create mode 100644 Documentation/fb/vesafb.rst
 delete mode 100644 Documentation/fb/vesafb.txt
 create mode 100644 Documentation/fb/viafb.rst
 delete mode 100644 Documentation/fb/viafb.txt
 create mode 100644 Documentation/fb/vt8623fb.rst
 delete mode 100644 Documentation/fb/vt8623fb.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9b16b640ce48..83d6560f10f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5024,7 +5024,7 @@
 			vector=percpu: enable percpu vector domain
 
 	video=		[FB] Frame buffer configuration
-			See Documentation/fb/modedb.txt.
+			See Documentation/fb/modedb.rst.
 
 	video.brightness_switch_enabled= [0,1]
 			If set to 1, on receiving an ACPI notify event
diff --git a/Documentation/fb/api.rst b/Documentation/fb/api.rst
new file mode 100644
index 000000000000..79ec33dded74
--- /dev/null
+++ b/Documentation/fb/api.rst
@@ -0,0 +1,307 @@
+===========================
+The Frame Buffer Device API
+===========================
+
+Last revised: June 21, 2011
+
+
+0. Introduction
+---------------
+
+This document describes the frame buffer API used by applications to interact
+with frame buffer devices. In-kernel APIs between device drivers and the frame
+buffer core are not described.
+
+Due to a lack of documentation in the original frame buffer API, driver
+behaviours differ in subtle (and not so subtle) ways. This document describes
+the recommended API implementation, but applications should be prepared to
+deal with different behaviours.
+
+
+1. Capabilities
+---------------
+
+Device and driver capabilities are reported in the fixed screen information
+capabilities field::
+
+  struct fb_fix_screeninfo {
+	...
+	__u16 capabilities;		/* see FB_CAP_*			*/
+	...
+  };
+
+Applications should use those capabilities to find out what features they can
+expect from the device and driver.
+
+- FB_CAP_FOURCC
+
+The driver supports the four character code (FOURCC) based format setting API.
+When supported, formats are configured using a FOURCC instead of manually
+specifying color components layout.
+
+
+2. Types and visuals
+--------------------
+
+Pixels are stored in memory in hardware-dependent formats. Applications need
+to be aware of the pixel storage format in order to write image data to the
+frame buffer memory in the format expected by the hardware.
+
+Formats are described by frame buffer types and visuals. Some visuals require
+additional information, which is stored in the variable screen information
+bits_per_pixel, grayscale, red, green, blue and transp fields.
+
+Visuals describe how color information is encoded and assembled to create
+macropixels. Types describe how macropixels are stored in memory. The following
+types and visuals are supported.
+
+- FB_TYPE_PACKED_PIXELS
+
+Macropixels are stored contiguously in a single plane. If the number of bits
+per macropixel is not a multiple of 8, whether macropixels are padded to the
+next multiple of 8 bits or packed together into bytes depends on the visual.
+
+Padding at end of lines may be present and is then reported through the fixed
+screen information line_length field.
+
+- FB_TYPE_PLANES
+
+Macropixels are split across multiple planes. The number of planes is equal to
+the number of bits per macropixel, with the i'th plane storing the i'th bit
+from all macropixels.
+
+Planes are located contiguously in memory.
+
+- FB_TYPE_INTERLEAVED_PLANES
+
+Macropixels are split across multiple planes. The number of planes is equal to
+the number of bits per macropixel, with the i'th plane storing the i'th bit
+from all macropixels.
+
+Planes are interleaved in memory. The interleave factor, defined as the
+distance in bytes between the beginning of two consecutive interleaved blocks
+belonging to different planes, is stored in the fixed screen information
+type_aux field.
+
+- FB_TYPE_FOURCC
+
+Macropixels are stored in memory as described by the format FOURCC identifier
+stored in the variable screen information grayscale field.
+
+- FB_VISUAL_MONO01
+
+Pixels are black or white and stored on a number of bits (typically one)
+specified by the variable screen information bits_per_pixel field.
+
+Black pixels are represented by all bits set to 1 and white pixels by all bits
+set to 0. When the number of bits per pixel is smaller than 8, several pixels
+are packed together in a byte.
+
+FB_VISUAL_MONO01 is currently used with FB_TYPE_PACKED_PIXELS only.
+
+- FB_VISUAL_MONO10
+
+Pixels are black or white and stored on a number of bits (typically one)
+specified by the variable screen information bits_per_pixel field.
+
+Black pixels are represented by all bits set to 0 and white pixels by all bits
+set to 1. When the number of bits per pixel is smaller than 8, several pixels
+are packed together in a byte.
+
+FB_VISUAL_MONO10 is currently used with FB_TYPE_PACKED_PIXELS only.
+
+- FB_VISUAL_TRUECOLOR
+
+Pixels are broken into red, green and blue components, and each component
+indexes a read-only lookup table for the corresponding value. Lookup tables
+are device-dependent, and provide linear or non-linear ramps.
+
+Each component is stored in a macropixel according to the variable screen
+information red, green, blue and transp fields.
+
+- FB_VISUAL_PSEUDOCOLOR and FB_VISUAL_STATIC_PSEUDOCOLOR
+
+Pixel values are encoded as indices into a colormap that stores red, green and
+blue components. The colormap is read-only for FB_VISUAL_STATIC_PSEUDOCOLOR
+and read-write for FB_VISUAL_PSEUDOCOLOR.
+
+Each pixel value is stored in the number of bits reported by the variable
+screen information bits_per_pixel field.
+
+- FB_VISUAL_DIRECTCOLOR
+
+Pixels are broken into red, green and blue components, and each component
+indexes a programmable lookup table for the corresponding value.
+
+Each component is stored in a macropixel according to the variable screen
+information red, green, blue and transp fields.
+
+- FB_VISUAL_FOURCC
+
+Pixels are encoded and interpreted as described by the format FOURCC
+identifier stored in the variable screen information grayscale field.
+
+
+3. Screen information
+---------------------
+
+Screen information is queried by applications using the FBIOGET_FSCREENINFO
+and FBIOGET_VSCREENINFO ioctls. Those ioctls take a pointer to a
+fb_fix_screeninfo and fb_var_screeninfo structure respectively.
+
+struct fb_fix_screeninfo stores device independent unchangeable information
+about the frame buffer device and the current format. This information can't
+be directly modified by applications, but can be changed by the driver when an
+application modifies the format::
+
+  struct fb_fix_screeninfo {
+	char id[16];			/* identification string eg "TT Builtin" */
+	unsigned long smem_start;	/* Start of frame buffer mem */
+					/* (physical address) */
+	__u32 smem_len;			/* Length of frame buffer mem */
+	__u32 type;			/* see FB_TYPE_*		*/
+	__u32 type_aux;			/* Interleave for interleaved Planes */
+	__u32 visual;			/* see FB_VISUAL_*		*/
+	__u16 xpanstep;			/* zero if no hardware panning  */
+	__u16 ypanstep;			/* zero if no hardware panning  */
+	__u16 ywrapstep;		/* zero if no hardware ywrap    */
+	__u32 line_length;		/* length of a line in bytes    */
+	unsigned long mmio_start;	/* Start of Memory Mapped I/O   */
+					/* (physical address) */
+	__u32 mmio_len;			/* Length of Memory Mapped I/O  */
+	__u32 accel;			/* Indicate to driver which	*/
+					/*  specific chip/card we have	*/
+	__u16 capabilities;		/* see FB_CAP_*			*/
+	__u16 reserved[2];		/* Reserved for future compatibility */
+  };
+
+struct fb_var_screeninfo stores device independent changeable information
+about a frame buffer device, its current format and video mode, as well as
+other miscellaneous parameters::
+
+  struct fb_var_screeninfo {
+	__u32 xres;			/* visible resolution		*/
+	__u32 yres;
+	__u32 xres_virtual;		/* virtual resolution		*/
+	__u32 yres_virtual;
+	__u32 xoffset;			/* offset from virtual to visible */
+	__u32 yoffset;			/* resolution			*/
+
+	__u32 bits_per_pixel;		/* guess what			*/
+	__u32 grayscale;		/* 0 = color, 1 = grayscale,	*/
+					/* >1 = FOURCC			*/
+	struct fb_bitfield red;		/* bitfield in fb mem if true color, */
+	struct fb_bitfield green;	/* else only length is significant */
+	struct fb_bitfield blue;
+	struct fb_bitfield transp;	/* transparency			*/
+
+	__u32 nonstd;			/* != 0 Non standard pixel format */
+
+	__u32 activate;			/* see FB_ACTIVATE_*		*/
+
+	__u32 height;			/* height of picture in mm    */
+	__u32 width;			/* width of picture in mm     */
+
+	__u32 accel_flags;		/* (OBSOLETE) see fb_info.flags */
+
+	/* Timing: All values in pixclocks, except pixclock (of course) */
+	__u32 pixclock;			/* pixel clock in ps (pico seconds) */
+	__u32 left_margin;		/* time from sync to picture	*/
+	__u32 right_margin;		/* time from picture to sync	*/
+	__u32 upper_margin;		/* time from sync to picture	*/
+	__u32 lower_margin;
+	__u32 hsync_len;		/* length of horizontal sync	*/
+	__u32 vsync_len;		/* length of vertical sync	*/
+	__u32 sync;			/* see FB_SYNC_*		*/
+	__u32 vmode;			/* see FB_VMODE_*		*/
+	__u32 rotate;			/* angle we rotate counter clockwise */
+	__u32 colorspace;		/* colorspace for FOURCC-based modes */
+	__u32 reserved[4];		/* Reserved for future compatibility */
+  };
+
+To modify variable information, applications call the FBIOPUT_VSCREENINFO
+ioctl with a pointer to a fb_var_screeninfo structure. If the call is
+successful, the driver will update the fixed screen information accordingly.
+
+Instead of filling the complete fb_var_screeninfo structure manually,
+applications should call the FBIOGET_VSCREENINFO ioctl and modify only the
+fields they care about.
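+
+A minimal sketch of that read-modify-write sequence follows. The /dev/fb0
+device path and the requested depth of 32 bits per pixel are assumptions
+made for illustration only, and a driver may adjust or reject the request::
+
+  #include <fcntl.h>
+  #include <stdio.h>
+  #include <sys/ioctl.h>
+  #include <unistd.h>
+  #include <linux/fb.h>
+
+  int main(void)
+  {
+	struct fb_var_screeninfo var;
+	struct fb_fix_screeninfo fix;
+	int fd = open("/dev/fb0", O_RDWR);	/* assumed device node */
+
+	if (fd < 0)
+		return 1;
+
+	/* Start from the current settings instead of a blank structure. */
+	if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0)
+		goto err;
+
+	var.bits_per_pixel = 32;		/* assumed depth */
+	if (ioctl(fd, FBIOPUT_VSCREENINFO, &var) < 0)
+		goto err;
+
+	/* Re-read the fixed information, which the driver may have updated. */
+	if (ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0)
+		goto err;
+
+	printf("%ux%u, %u bpp, line_length %u\n",
+	       var.xres, var.yres, var.bits_per_pixel, fix.line_length);
+	close(fd);
+	return 0;
+  err:
+	close(fd);
+	return 1;
+  }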
+
+
+4. Format configuration
+-----------------------
+
+Frame buffer devices offer two ways to configure the frame buffer format: the
+legacy API and the FOURCC-based API.
+
+
+The legacy API has been the only frame buffer format configuration API for a
+long time and is thus widely used by applications. It is the recommended API
+for applications when using RGB and grayscale formats, as well as legacy
+non-standard formats.
+
+To select a format, applications set the fb_var_screeninfo bits_per_pixel field
+to the desired frame buffer depth. Values up to 8 will usually map to
+monochrome, grayscale or pseudocolor visuals, although this is not required.
+
+- For grayscale formats, applications set the grayscale field to one. The red,
+  blue, green and transp fields must be set to 0 by applications and ignored by
+  drivers. Drivers must fill the red, blue and green offsets to 0 and lengths
+  to the bits_per_pixel value.
+
+- For pseudocolor formats, applications set the grayscale field to zero. The
+  red, blue, green and transp fields must be set to 0 by applications and
+  ignored by drivers. Drivers must fill the red, blue and green offsets to 0
+  and lengths to the bits_per_pixel value.
+
+- For truecolor and directcolor formats, applications set the grayscale field
+  to zero, and the red, blue, green and transp fields to describe the layout of
+  color components in memory::
+
+    struct fb_bitfield {
+	__u32 offset;			/* beginning of bitfield	*/
+	__u32 length;			/* length of bitfield		*/
+	__u32 msb_right;		/* != 0 : Most significant bit is */
+					/* right */
+    };
+
+  Pixel values are bits_per_pixel wide and are split in non-overlapping red,
+  green, blue and alpha (transparency) components. Location and size of each
+  component in the pixel value are described by the fb_bitfield offset and
+  length fields. Offsets are computed from the right.
+
+  Pixels are always stored in an integer number of bytes. If the number of
+  bits per pixel is not a multiple of 8, pixel values are padded to the next
+  multiple of 8 bits.
+
+Upon successful format configuration, drivers update the fb_fix_screeninfo
+type, visual and line_length fields depending on the selected format.
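+
+As an illustration of the truecolor layout, the hypothetical helpers below
+(pack_component() and pack_pixel() are not part of any kernel or library API)
+pack 8-bit colour components into a pixel value. The sketch assumes that
+msb_right is 0 for every component, that each input value is in the range
+0 to 255 and that no component is longer than 8 bits::
+
+  #include <linux/fb.h>
+
+  /* Scale an 8-bit component to bf->length bits and shift it into place. */
+  static __u32 pack_component(__u32 value, const struct fb_bitfield *bf)
+  {
+	if (bf->length == 0)
+		return 0;
+	return (value >> (8 - bf->length)) << bf->offset;
+  }
+
+  static __u32 pack_pixel(const struct fb_var_screeninfo *var,
+			  __u32 r, __u32 g, __u32 b, __u32 a)
+  {
+	return pack_component(r, &var->red) |
+	       pack_component(g, &var->green) |
+	       pack_component(b, &var->blue) |
+	       pack_component(a, &var->transp);
+  }
+
+With a 16 bits per pixel layout where red is at offset 11 with length 5,
+green at offset 5 with length 6, blue at offset 0 with length 5 and transp
+has length 0, packing full-intensity red, green and blue gives 0xffff.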
+
+
+The FOURCC-based API replaces format descriptions by four character codes
+(FOURCC). FOURCCs are abstract identifiers that uniquely define a format
+without explicitly describing it. This is the only API that supports YUV
+formats. Drivers are also encouraged to implement the FOURCC-based API for RGB
+and grayscale formats.
+
+Drivers that support the FOURCC-based API report this capability by setting
+the FB_CAP_FOURCC bit in the fb_fix_screeninfo capabilities field.
+
+FOURCC definitions are located in the linux/videodev2.h header. However, and
+despite starting with the ``V4L2_PIX_FMT_`` prefix, they are not restricted to
+V4L2 and don't require usage of the V4L2 subsystem. FOURCC documentation is
+available in Documentation/media/uapi/v4l/pixfmt.rst.
+
+To select a format, applications set the grayscale field to the desired FOURCC.
+For YUV formats, they should also select the appropriate colorspace by setting
+the colorspace field to one of the colorspaces listed in linux/videodev2.h and
+documented in Documentation/media/uapi/v4l/colorspaces.rst.
+
+The red, green, blue and transp fields are not used with the FOURCC-based API.
+For forward compatibility reasons applications must zero those fields, and
+drivers must ignore them. Values other than 0 may get a meaning in future
+extensions.
+
+Upon successful format configuration, drivers update the fb_fix_screeninfo
+type, visual and line_length fields depending on the selected format. The type
+and visual fields are set to FB_TYPE_FOURCC and FB_VISUAL_FOURCC respectively.
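+
+A sketch of FOURCC-based configuration follows. The helper name
+set_fourcc_nv12() is hypothetical, the NV12 format and Rec. 709 colorspace
+are arbitrary choices made for illustration, and fd is assumed to be an
+already opened frame buffer device::
+
+  #include <string.h>
+  #include <sys/ioctl.h>
+  #include <linux/fb.h>
+  #include <linux/videodev2.h>
+
+  static int set_fourcc_nv12(int fd)
+  {
+	struct fb_fix_screeninfo fix;
+	struct fb_var_screeninfo var;
+
+	if (ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0)
+		return -1;
+	if (!(fix.capabilities & FB_CAP_FOURCC))
+		return -1;	/* driver only offers the legacy API */
+
+	if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0)
+		return -1;
+
+	var.grayscale = V4L2_PIX_FMT_NV12;
+	var.colorspace = V4L2_COLORSPACE_REC709;
+
+	/* Unused with FOURCC; zeroed for forward compatibility. */
+	memset(&var.red, 0, sizeof(var.red));
+	memset(&var.green, 0, sizeof(var.green));
+	memset(&var.blue, 0, sizeof(var.blue));
+	memset(&var.transp, 0, sizeof(var.transp));
+
+	return ioctl(fd, FBIOPUT_VSCREENINFO, &var);
+  }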
diff --git a/Documentation/fb/api.txt b/Documentation/fb/api.txt
deleted file mode 100644
index d52cf1e3b975..000000000000
--- a/Documentation/fb/api.txt
+++ /dev/null
@@ -1,306 +0,0 @@
-			The Frame Buffer Device API
-			---------------------------
-
-Last revised: June 21, 2011
-
-
-0. Introduction
----------------
-
-This document describes the frame buffer API used by applications to interact
-with frame buffer devices. In-kernel APIs between device drivers and the frame
-buffer core are not described.
-
-Due to a lack of documentation in the original frame buffer API, drivers
-behaviours differ in subtle (and not so subtle) ways. This document describes
-the recommended API implementation, but applications should be prepared to
-deal with different behaviours.
-
-
-1. Capabilities
----------------
-
-Device and driver capabilities are reported in the fixed screen information
-capabilities field.
-
-struct fb_fix_screeninfo {
-	...
-	__u16 capabilities;		/* see FB_CAP_*			*/
-	...
-};
-
-Application should use those capabilities to find out what features they can
-expect from the device and driver.
-
-- FB_CAP_FOURCC
-
-The driver supports the four character code (FOURCC) based format setting API.
-When supported, formats are configured using a FOURCC instead of manually
-specifying color components layout.
-
-
-2. Types and visuals
---------------------
-
-Pixels are stored in memory in hardware-dependent formats. Applications need
-to be aware of the pixel storage format in order to write image data to the
-frame buffer memory in the format expected by the hardware.
-
-Formats are described by frame buffer types and visuals. Some visuals require
-additional information, which are stored in the variable screen information
-bits_per_pixel, grayscale, red, green, blue and transp fields.
-
-Visuals describe how color information is encoded and assembled to create
-macropixels. Types describe how macropixels are stored in memory. The following
-types and visuals are supported.
-
-- FB_TYPE_PACKED_PIXELS
-
-Macropixels are stored contiguously in a single plane. If the number of bits
-per macropixel is not a multiple of 8, whether macropixels are padded to the
-next multiple of 8 bits or packed together into bytes depends on the visual.
-
-Padding at end of lines may be present and is then reported through the fixed
-screen information line_length field.
-
-- FB_TYPE_PLANES
-
-Macropixels are split across multiple planes. The number of planes is equal to
-the number of bits per macropixel, with plane i'th storing i'th bit from all
-macropixels.
-
-Planes are located contiguously in memory.
-
-- FB_TYPE_INTERLEAVED_PLANES
-
-Macropixels are split across multiple planes. The number of planes is equal to
-the number of bits per macropixel, with plane i'th storing i'th bit from all
-macropixels.
-
-Planes are interleaved in memory. The interleave factor, defined as the
-distance in bytes between the beginning of two consecutive interleaved blocks
-belonging to different planes, is stored in the fixed screen information
-type_aux field.
-
-- FB_TYPE_FOURCC
-
-Macropixels are stored in memory as described by the format FOURCC identifier
-stored in the variable screen information grayscale field.
-
-- FB_VISUAL_MONO01
-
-Pixels are black or white and stored on a number of bits (typically one)
-specified by the variable screen information bpp field.
-
-Black pixels are represented by all bits set to 1 and white pixels by all bits
-set to 0. When the number of bits per pixel is smaller than 8, several pixels
-are packed together in a byte.
-
-FB_VISUAL_MONO01 is currently used with FB_TYPE_PACKED_PIXELS only.
-
-- FB_VISUAL_MONO10
-
-Pixels are black or white and stored on a number of bits (typically one)
-specified by the variable screen information bpp field.
-
-Black pixels are represented by all bits set to 0 and white pixels by all bits
-set to 1. When the number of bits per pixel is smaller than 8, several pixels
-are packed together in a byte.
-
-FB_VISUAL_MONO01 is currently used with FB_TYPE_PACKED_PIXELS only.
-
-- FB_VISUAL_TRUECOLOR
-
-Pixels are broken into red, green and blue components, and each component
-indexes a read-only lookup table for the corresponding value. Lookup tables
-are device-dependent, and provide linear or non-linear ramps.
-
-Each component is stored in a macropixel according to the variable screen
-information red, green, blue and transp fields.
-
-- FB_VISUAL_PSEUDOCOLOR and FB_VISUAL_STATIC_PSEUDOCOLOR
-
-Pixel values are encoded as indices into a colormap that stores red, green and
-blue components. The colormap is read-only for FB_VISUAL_STATIC_PSEUDOCOLOR
-and read-write for FB_VISUAL_PSEUDOCOLOR.
-
-Each pixel value is stored in the number of bits reported by the variable
-screen information bits_per_pixel field.
-
-- FB_VISUAL_DIRECTCOLOR
-
-Pixels are broken into red, green and blue components, and each component
-indexes a programmable lookup table for the corresponding value.
-
-Each component is stored in a macropixel according to the variable screen
-information red, green, blue and transp fields.
-
-- FB_VISUAL_FOURCC
-
-Pixels are encoded and  interpreted as described by the format FOURCC
-identifier stored in the variable screen information grayscale field.
-
-
-3. Screen information
----------------------
-
-Screen information are queried by applications using the FBIOGET_FSCREENINFO
-and FBIOGET_VSCREENINFO ioctls. Those ioctls take a pointer to a
-fb_fix_screeninfo and fb_var_screeninfo structure respectively.
-
-struct fb_fix_screeninfo stores device independent unchangeable information
-about the frame buffer device and the current format. Those information can't
-be directly modified by applications, but can be changed by the driver when an
-application modifies the format.
-
-struct fb_fix_screeninfo {
-	char id[16];			/* identification string eg "TT Builtin" */
-	unsigned long smem_start;	/* Start of frame buffer mem */
-					/* (physical address) */
-	__u32 smem_len;			/* Length of frame buffer mem */
-	__u32 type;			/* see FB_TYPE_*		*/
-	__u32 type_aux;			/* Interleave for interleaved Planes */
-	__u32 visual;			/* see FB_VISUAL_*		*/
-	__u16 xpanstep;			/* zero if no hardware panning  */
-	__u16 ypanstep;			/* zero if no hardware panning  */
-	__u16 ywrapstep;		/* zero if no hardware ywrap    */
-	__u32 line_length;		/* length of a line in bytes    */
-	unsigned long mmio_start;	/* Start of Memory Mapped I/O   */
-					/* (physical address) */
-	__u32 mmio_len;			/* Length of Memory Mapped I/O  */
-	__u32 accel;			/* Indicate to driver which	*/
-					/*  specific chip/card we have	*/
-	__u16 capabilities;		/* see FB_CAP_*			*/
-	__u16 reserved[2];		/* Reserved for future compatibility */
-};
-
-struct fb_var_screeninfo stores device independent changeable information
-about a frame buffer device, its current format and video mode, as well as
-other miscellaneous parameters.
-
-struct fb_var_screeninfo {
-	__u32 xres;			/* visible resolution		*/
-	__u32 yres;
-	__u32 xres_virtual;		/* virtual resolution		*/
-	__u32 yres_virtual;
-	__u32 xoffset;			/* offset from virtual to visible */
-	__u32 yoffset;			/* resolution			*/
-
-	__u32 bits_per_pixel;		/* guess what			*/
-	__u32 grayscale;		/* 0 = color, 1 = grayscale,	*/
-					/* >1 = FOURCC			*/
-	struct fb_bitfield red;		/* bitfield in fb mem if true color, */
-	struct fb_bitfield green;	/* else only length is significant */
-	struct fb_bitfield blue;
-	struct fb_bitfield transp;	/* transparency			*/
-
-	__u32 nonstd;			/* != 0 Non standard pixel format */
-
-	__u32 activate;			/* see FB_ACTIVATE_*		*/
-
-	__u32 height;			/* height of picture in mm    */
-	__u32 width;			/* width of picture in mm     */
-
-	__u32 accel_flags;		/* (OBSOLETE) see fb_info.flags */
-
-	/* Timing: All values in pixclocks, except pixclock (of course) */
-	__u32 pixclock;			/* pixel clock in ps (pico seconds) */
-	__u32 left_margin;		/* time from sync to picture	*/
-	__u32 right_margin;		/* time from picture to sync	*/
-	__u32 upper_margin;		/* time from sync to picture	*/
-	__u32 lower_margin;
-	__u32 hsync_len;		/* length of horizontal sync	*/
-	__u32 vsync_len;		/* length of vertical sync	*/
-	__u32 sync;			/* see FB_SYNC_*		*/
-	__u32 vmode;			/* see FB_VMODE_*		*/
-	__u32 rotate;			/* angle we rotate counter clockwise */
-	__u32 colorspace;		/* colorspace for FOURCC-based modes */
-	__u32 reserved[4];		/* Reserved for future compatibility */
-};
-
-To modify variable information, applications call the FBIOPUT_VSCREENINFO
-ioctl with a pointer to a fb_var_screeninfo structure. If the call is
-successful, the driver will update the fixed screen information accordingly.
-
-Instead of filling the complete fb_var_screeninfo structure manually,
-applications should call the FBIOGET_VSCREENINFO ioctl and modify only the
-fields they care about.
-
-
-4. Format configuration
------------------------
-
-Frame buffer devices offer two ways to configure the frame buffer format: the
-legacy API and the FOURCC-based API.
-
-
-The legacy API has been the only frame buffer format configuration API for a
-long time and is thus widely used by application. It is the recommended API
-for applications when using RGB and grayscale formats, as well as legacy
-non-standard formats.
-
-To select a format, applications set the fb_var_screeninfo bits_per_pixel field
-to the desired frame buffer depth. Values up to 8 will usually map to
-monochrome, grayscale or pseudocolor visuals, although this is not required.
-
-- For grayscale formats, applications set the grayscale field to one. The red,
-  blue, green and transp fields must be set to 0 by applications and ignored by
-  drivers. Drivers must fill the red, blue and green offsets to 0 and lengths
-  to the bits_per_pixel value.
-
-- For pseudocolor formats, applications set the grayscale field to zero. The
-  red, blue, green and transp fields must be set to 0 by applications and
-  ignored by drivers. Drivers must fill the red, blue and green offsets to 0
-  and lengths to the bits_per_pixel value.
-
-- For truecolor and directcolor formats, applications set the grayscale field
-  to zero, and the red, blue, green and transp fields to describe the layout of
-  color components in memory.
-
-struct fb_bitfield {
-	__u32 offset;			/* beginning of bitfield	*/
-	__u32 length;			/* length of bitfield		*/
-	__u32 msb_right;		/* != 0 : Most significant bit is */
-					/* right */
-};
-
-  Pixel values are bits_per_pixel wide and are split in non-overlapping red,
-  green, blue and alpha (transparency) components. Location and size of each
-  component in the pixel value are described by the fb_bitfield offset and
-  length fields. Offset are computed from the right.
-
-  Pixels are always stored in an integer number of bytes. If the number of
-  bits per pixel is not a multiple of 8, pixel values are padded to the next
-  multiple of 8 bits.
-
-Upon successful format configuration, drivers update the fb_fix_screeninfo
-type, visual and line_length fields depending on the selected format.
-
-
-The FOURCC-based API replaces format descriptions by four character codes
-(FOURCC). FOURCCs are abstract identifiers that uniquely define a format
-without explicitly describing it. This is the only API that supports YUV
-formats. Drivers are also encouraged to implement the FOURCC-based API for RGB
-and grayscale formats.
-
-Drivers that support the FOURCC-based API report this capability by setting
-the FB_CAP_FOURCC bit in the fb_fix_screeninfo capabilities field.
-
-FOURCC definitions are located in the linux/videodev2.h header. However, and
-despite starting with the V4L2_PIX_FMT_prefix, they are not restricted to V4L2
-and don't require usage of the V4L2 subsystem. FOURCC documentation is
-available in Documentation/media/uapi/v4l/pixfmt.rst.
-
-To select a format, applications set the grayscale field to the desired FOURCC.
-For YUV formats, they should also select the appropriate colorspace by setting
-the colorspace field to one of the colorspaces listed in linux/videodev2.h and
-documented in Documentation/media/uapi/v4l/colorspaces.rst.
-
-The red, green, blue and transp fields are not used with the FOURCC-based API.
-For forward compatibility reasons applications must zero those fields, and
-drivers must ignore them. Values other than 0 may get a meaning in future
-extensions.
-
-Upon successful format configuration, drivers update the fb_fix_screeninfo
-type, visual and line_length fields depending on the selected format. The type
-and visual fields are set to FB_TYPE_FOURCC and FB_VISUAL_FOURCC respectively.
diff --git a/Documentation/fb/arkfb.rst b/Documentation/fb/arkfb.rst
new file mode 100644
index 000000000000..aeca8773dd7e
--- /dev/null
+++ b/Documentation/fb/arkfb.rst
@@ -0,0 +1,68 @@
+========================================
+arkfb - fbdev driver for ARK Logic chips
+========================================
+
+
+Supported Hardware
+==================
+
+	ARK 2000PV chip
+	ICS 5342 ramdac
+
+	- only BIOS initialized VGA devices supported
+	- probably not working on big endian
+
+
+Supported Features
+==================
+
+	*  4 bpp pseudocolor modes (with 18bit palette, two variants)
+	*  8 bpp pseudocolor mode (with 18bit palette)
+	* 16 bpp truecolor modes (RGB 555 and RGB 565)
+	* 24 bpp truecolor mode (RGB 888)
+	* 32 bpp truecolor mode (RGB 888)
+	* text mode (activated by bpp = 0)
+	* doublescan mode variant (not available in text mode)
+	* panning in both directions
+	* suspend/resume support
+
+Text mode is supported even in higher resolutions, but it is limited to
+lower pixclocks (I got a maximum of about 70 MHz; it depends on the specific
+hardware). This limitation is not enforced by the driver. Text mode supports
+8bit wide fonts only (hardware limitation) and 16bit tall fonts (driver
+limitation). Unfortunately, character attributes (like color) in text mode are
+broken for an unknown reason, so its usefulness is limited.
+
+There are two 4 bpp modes. The first mode (selected if nonstd == 0) uses
+packed pixels, high nibble first. The second mode (selected if nonstd == 1) uses
+interleaved planes (1 byte interleave), MSB first. Both modes support
+8bit wide fonts only (driver limitation).
+
+Suspend/resume works on systems that initialize the video card during resume
+and if the device is active (for example, used by fbcon).
+
+
+Missing Features
+================
+(alias TODO list)
+
+	* secondary (not initialized by BIOS) device support
+	* big endian support
+	* DPMS support
+	* MMIO support
+	* interlaced mode variant
+	* support for fontwidths != 8 in 4 bpp modes
+	* support for fontheight != 16 in text mode
+	* hardware cursor
+	* vsync synchronization
+	* feature connector support
+	* acceleration support (8514-like 2D)
+
+
+Known bugs
+==========
+
+	* character attributes (and cursor) in text mode are broken
+
+--
+Ondrej Zajicek <santiago@crfreenet.org>
diff --git a/Documentation/fb/arkfb.txt b/Documentation/fb/arkfb.txt
deleted file mode 100644
index e8487a9d6a05..000000000000
--- a/Documentation/fb/arkfb.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-
-	arkfb - fbdev driver for ARK Logic chips
-	========================================
-
-
-Supported Hardware
-==================
-
-	ARK 2000PV chip
-	ICS 5342 ramdac
-
-	- only BIOS initialized VGA devices supported
-	- probably not working on big endian
-
-
-Supported Features
-==================
-
-	*  4 bpp pseudocolor modes (with 18bit palette, two variants)
-	*  8 bpp pseudocolor mode (with 18bit palette)
-	* 16 bpp truecolor modes (RGB 555 and RGB 565)
-	* 24 bpp truecolor mode (RGB 888)
-	* 32 bpp truecolor mode (RGB 888)
-	* text mode (activated by bpp = 0)
-	* doublescan mode variant (not available in text mode)
-	* panning in both directions
-	* suspend/resume support
-
-Text mode is supported even in higher resolutions, but there is limitation to
-lower pixclocks (i got maximum about 70 MHz, it is dependent on specific
-hardware). This limitation is not enforced by driver. Text mode supports 8bit
-wide fonts only (hardware limitation) and 16bit tall fonts (driver
-limitation). Unfortunately character attributes (like color) in text mode are
-broken for unknown reason, so its usefulness is limited.
-
-There are two 4 bpp modes. First mode (selected if nonstd == 0) is mode with
-packed pixels, high nibble first. Second mode (selected if nonstd == 1) is mode
-with interleaved planes (1 byte interleave), MSB first. Both modes support
-8bit wide fonts only (driver limitation).
-
-Suspend/resume works on systems that initialize video card during resume and
-if device is active (for example used by fbcon).
-
-
-Missing Features
-================
-(alias TODO list)
-
-	* secondary (not initialized by BIOS) device support
-   	* big endian support
-	* DPMS support
-	* MMIO support
-	* interlaced mode variant
-	* support for fontwidths != 8 in 4 bpp modes
-	* support for fontheight != 16 in text mode
-	* hardware cursor
-	* vsync synchronization
-	* feature connector support
-	* acceleration support (8514-like 2D)
-
-
-Known bugs
-==========
-
-	* character attributes (and cursor) in text mode are broken
-
---
-Ondrej Zajicek <santiago@crfreenet.org>
diff --git a/Documentation/fb/aty128fb.rst b/Documentation/fb/aty128fb.rst
new file mode 100644
index 000000000000..3f107718f933
--- /dev/null
+++ b/Documentation/fb/aty128fb.rst
@@ -0,0 +1,75 @@
+=================
+What is aty128fb?
+=================
+
+.. [This file is cloned from VesaFB/matroxfb]
+
+This is a driver for a graphic framebuffer for ATI Rage128 based devices
+on Intel and PPC boxes.
+
+Advantages:
+
+ * It provides a nice large console (128 cols + 48 lines with 1024x768)
+   without using tiny, unreadable fonts.
+ * You can run XF68_FBDev on top of /dev/fb0
+ * Most important: boot logo :-)
+
+Disadvantages:
+
+ * graphic mode is slower than text mode... but you should not notice
+   if you use the same resolution as you used in text mode.
+ * still experimental.
+
+
+How to use it?
+==============
+
+Switching modes is done using the video=aty128fb:<resolution>... modedb
+boot parameter or using the `fbset` program.
+
+See Documentation/fb/modedb.rst for more information on modedb
+resolutions.
+
+You should compile in both vgacon (to boot if you remove your Rage128 from
+the box) and aty128fb (for graphics mode). You should not compile in vesafb
+unless your primary display is on a non-Rage128 VBE2.0 device (see
+Documentation/fb/vesafb.rst for details).
+
+
+X11
+===
+
+XF68_FBDev should generally work fine, but it is non-accelerated. As of
+this document, 8 and 32bpp work fine.  There have been palette issues
+when switching from X to console and back to X.  You will have to restart
+X to fix this.
+
+
+Configuration
+=============
+
+You can pass kernel command line options to aty128fb with
+`video=aty128fb:option1,option2:value2,option3` (multiple options should
+be separated by comma, values are separated from options by `:`).
+Accepted options:
+
+========= =======================================================
+noaccel   do not use acceleration engine. It is default.
+accel     use acceleration engine. Not finished.
+vmode:x   chooses PowerMacintosh video mode <x>. Deprecated.
+cmode:x   chooses PowerMacintosh colour mode <x>. Deprecated.
+<XxX@X>   selects startup videomode. See modedb.rst for detailed
+	  explanation. Default is 640x480x8bpp.
+========= =======================================================
+
+
+Limitations
+===========
+
+There are known and unknown bugs, features and misfeatures.
+Currently there are the following known bugs:
+
+ - This driver is still experimental and is not finished.  Too many
+   bugs/errata to list here.
+
+Brad Douglas <brad@neruo.com>
diff --git a/Documentation/fb/aty128fb.txt b/Documentation/fb/aty128fb.txt
deleted file mode 100644
index b605204fcfe1..000000000000
--- a/Documentation/fb/aty128fb.txt
+++ /dev/null
@@ -1,72 +0,0 @@
-[This file is cloned from VesaFB/matroxfb]
-
-What is aty128fb?
-=================
-
-This is a driver for a graphic framebuffer for ATI Rage128 based devices
-on Intel and PPC boxes.
-
-Advantages:
-
- * It provides a nice large console (128 cols + 48 lines with 1024x768)
-   without using tiny, unreadable fonts.
- * You can run XF68_FBDev on top of /dev/fb0
- * Most important: boot logo :-)
-
-Disadvantages:
-
- * graphic mode is slower than text mode... but you should not notice
-   if you use same resolution as you used in textmode.
- * still experimental.
-
-
-How to use it?
-==============
-
-Switching modes is done using the  video=aty128fb:<resolution>... modedb
-boot parameter or using `fbset' program.
-
-See Documentation/fb/modedb.txt for more information on modedb
-resolutions.
-
-You should compile in both vgacon (to boot if you remove your Rage128 from
-box) and aty128fb (for graphics mode). You should not compile-in vesafb
-unless you have primary display on non-Rage128 VBE2.0 device (see 
-Documentation/fb/vesafb.txt for details).
-
-
-X11
-===
-
-XF68_FBDev should generally work fine, but it is non-accelerated. As of
-this document, 8 and 32bpp works fine.  There have been palette issues
-when switching from X to console and back to X.  You will have to restart
-X to fix this.
-
-
-Configuration
-=============
-
-You can pass kernel command line options to vesafb with
-`video=aty128fb:option1,option2:value2,option3' (multiple options should
-be separated by comma, values are separated from options by `:'). 
-Accepted options:
-
-noaccel  - do not use acceleration engine. It is default.
-accel    - use acceleration engine. Not finished.
-vmode:x  - chooses PowerMacintosh video mode <x>. Deprecated.
-cmode:x  - chooses PowerMacintosh colour mode <x>. Deprecated.
-<XxX@X>  - selects startup videomode. See modedb.txt for detailed
-	   explanation. Default is 640x480x8bpp.
-
-
-Limitations
-===========
-
-There are known and unknown bugs, features and misfeatures.
-Currently there are following known bugs:
- + This driver is still experimental and is not finished.  Too many
-   bugs/errata to list here.
-
---
-Brad Douglas <brad@neruo.com>
diff --git a/Documentation/fb/cirrusfb.rst b/Documentation/fb/cirrusfb.rst
new file mode 100644
index 000000000000..8c3e6c6cb114
--- /dev/null
+++ b/Documentation/fb/cirrusfb.rst
@@ -0,0 +1,94 @@
+============================================
+Framebuffer driver for Cirrus Logic chipsets
+============================================
+
+Copyright 1999 Jeff Garzik <jgarzik@pobox.com>
+
+
+.. just a little something to get people going; contributors welcome!
+
+
+Chip families supported:
+	- SD64
+	- Piccolo
+	- Picasso
+	- Spectrum
+	- Alpine (GD-543x/4x)
+	- Picasso4 (GD-5446)
+	- GD-5480
+	- Laguna (GD-546x)
+
+Buses supported:
+	- PCI
+	- Zorro
+
+Architectures supported:
+	- i386
+	- Alpha
+	- PPC (Motorola Powerstack)
+	- m68k (Amiga)
+
+
+
+Default video modes
+-------------------
+At the moment, there are three kernel command line arguments supported:
+
+- mode:640x480
+- mode:800x600
+- mode:1024x768
+
+Full support for startup video modes (modedb) will be integrated soon.
+
+Version 1.9.9.1
+---------------
+* Fix memory detection for 512kB case
+* 800x600 mode
+* Fixed timings
+* Hint for AXP: Use -accel false -vyres -1 when changing resolution
+
+
+Version 1.9.4.4
+---------------
+* Preliminary Laguna support
+* Overhaul color register routines.
+* Associated with the above, console colors are now obtained from a LUT
+  called 'palette' instead of from the VGA registers.  This code was
+  modelled after that in atyfb and matroxfb.
+* Code cleanup, add comments.
+* Overhaul SR07 handling.
+* Bug fixes.
+
+
+Version 1.9.4.3
+---------------
+* Correctly set default startup video mode.
+* Do not override ram size setting.  Define
+  CLGEN_USE_HARDCODED_RAM_SETTINGS if you _do_ want to override the RAM
+  setting.
+* Compile fixes related to new 2.3.x IORESOURCE_IO[PORT] symbol changes.
+* Use new 2.3.x resource allocation.
+* Some code cleanup.
+
+
+Version 1.9.4.2
+---------------
+* Casting fixes.
+* Assertions no longer cause an oops on purpose.
+* Bug fixes.
+
+
+Version 1.9.4.1
+---------------
+* Add compatibility support.  Now requires a 2.1.x, 2.2.x or 2.3.x kernel.
+
+
+Version 1.9.4
+-------------
+* Several enhancements, smaller memory footprint, a few bugfixes.
+* Requires kernel 2.3.14-pre1 or later.
+
+
+Version 1.9.3
+-------------
+* Bundled with kernel 2.3.14-pre1 or later.
diff --git a/Documentation/fb/cirrusfb.txt b/Documentation/fb/cirrusfb.txt
deleted file mode 100644
index f75950d330a4..000000000000
--- a/Documentation/fb/cirrusfb.txt
+++ /dev/null
@@ -1,97 +0,0 @@
-
-		Framebuffer driver for Cirrus Logic chipsets
-		Copyright 1999 Jeff Garzik <jgarzik@pobox.com>
-
-
-
-{ just a little something to get people going; contributors welcome! }
-
-
-
-Chip families supported:
-	SD64
-	Piccolo
-	Picasso
-	Spectrum
-	Alpine (GD-543x/4x)
-	Picasso4 (GD-5446)
-	GD-5480
-	Laguna (GD-546x)
-
-Bus's supported:
-	PCI
-	Zorro
-
-Architectures supported:
-	i386
-	Alpha
-	PPC (Motorola Powerstack)
-	m68k (Amiga)
-
-
-
-Default video modes
--------------------
-At the moment, there are two kernel command line arguments supported:
-
-mode:640x480
-mode:800x600
-	or
-mode:1024x768
-
-Full support for startup video modes (modedb) will be integrated soon.
-
-Version 1.9.9.1
----------------
-* Fix memory detection for 512kB case
-* 800x600 mode
-* Fixed timings
-* Hint for AXP: Use -accel false -vyres -1 when changing resolution
-
-
-Version 1.9.4.4
----------------
-* Preliminary Laguna support
-* Overhaul color register routines.
-* Associated with the above, console colors are now obtained from a LUT
-  called 'palette' instead of from the VGA registers.  This code was
-  modelled after that in atyfb and matroxfb.
-* Code cleanup, add comments.
-* Overhaul SR07 handling.
-* Bug fixes.
-
-
-Version 1.9.4.3
----------------
-* Correctly set default startup video mode.
-* Do not override ram size setting.  Define
-  CLGEN_USE_HARDCODED_RAM_SETTINGS if you _do_ want to override the RAM
-  setting.
-* Compile fixes related to new 2.3.x IORESOURCE_IO[PORT] symbol changes.
-* Use new 2.3.x resource allocation.
-* Some code cleanup.
-
-
-Version 1.9.4.2
----------------
-* Casting fixes.
-* Assertions no longer cause an oops on purpose.
-* Bug fixes.
-
-
-Version 1.9.4.1
----------------
-* Add compatibility support.  Now requires a 2.1.x, 2.2.x or 2.3.x kernel.
-
-
-Version 1.9.4
--------------
-* Several enhancements, smaller memory footprint, a few bugfixes.
-* Requires kernel 2.3.14-pre1 or later.
-
-
-Version 1.9.3
--------------
-* Bundled with kernel 2.3.14-pre1 or later.
-
-
diff --git a/Documentation/fb/cmap_xfbdev.rst b/Documentation/fb/cmap_xfbdev.rst
new file mode 100644
index 000000000000..5db5e9787361
--- /dev/null
+++ b/Documentation/fb/cmap_xfbdev.rst
@@ -0,0 +1,56 @@
+==========================
+Understanding fbdev's cmap
+==========================
+
+These notes explain how X's dix layer uses fbdev's cmap structures.
+
+-  example of relevant structures in fbdev as used for a 3-bit grayscale cmap::
+
+    struct fb_var_screeninfo {
+	    .bits_per_pixel = 8,
+	    .grayscale      = 1,
+	    .red =          { 4, 3, 0 },
+	    .green =        { 0, 0, 0 },
+	    .blue =         { 0, 0, 0 },
+    }
+    struct fb_fix_screeninfo {
+	    .visual =       FB_VISUAL_STATIC_PSEUDOCOLOR,
+    }
+    for (i = 0; i < 8; i++)
+	info->cmap.red[i] = (((2*i)+1)*(0xFFFF))/16;
+    memcpy(info->cmap.green, info->cmap.red, sizeof(u16)*8);
+    memcpy(info->cmap.blue, info->cmap.red, sizeof(u16)*8);
+
+-  X11 apps do something like the following when trying to use grayscale::
+
+    for (i=0; i < 8; i++) {
+	char colorspec[64];
+	memset(colorspec,0,64);
+	sprintf(colorspec, "rgb:%x/%x/%x", i*36,i*36,i*36);
+	if (!XParseColor(outputDisplay, testColormap, colorspec, &wantedColor))
+		printf("Can't get color %s\n",colorspec);
+	XAllocColor(outputDisplay, testColormap, &wantedColor);
+	grays[i] = wantedColor;
+    }
+
+There are also named equivalents like gray1..x provided you have an rgb.txt.
+
+Somewhere in X's callchain, this results in a call to X code that handles the
+colormap. For example, Xfbdev hits the following:
+
+xc-011010/programs/Xserver/dix/colormap.c::
+
+  FindBestPixel(pentFirst, size, prgb, channel)
+
+  dr = (long) pent->co.local.red - prgb->red;
+  dg = (long) pent->co.local.green - prgb->green;
+  db = (long) pent->co.local.blue - prgb->blue;
+  sq = dr * dr;
+  UnsignedToBigNum (sq, &sum);
+  BigNumAdd (&sum, &temp, &sum);
+
+The co.local.red values are entries that were brought in through FBIOGETCMAP
+and come directly from the info->cmap.red that was listed above. The prgb is the
+rgb that the app wants to match. The above code is doing what looks like a least
+squares matching function. That's why the cmap entries can't be set to the left
+hand side boundaries of a color range.
diff --git a/Documentation/fb/cmap_xfbdev.txt b/Documentation/fb/cmap_xfbdev.txt
deleted file mode 100644
index 55e1f0a3d2b4..000000000000
--- a/Documentation/fb/cmap_xfbdev.txt
+++ /dev/null
@@ -1,53 +0,0 @@
-Understanding fbdev's cmap
---------------------------
-
-These notes explain how X's dix layer uses fbdev's cmap structures.
-
-*. example of relevant structures in fbdev as used for a 3-bit grayscale cmap
-struct fb_var_screeninfo {
-        .bits_per_pixel = 8,
-        .grayscale      = 1,
-        .red =          { 4, 3, 0 },
-        .green =        { 0, 0, 0 },
-        .blue =         { 0, 0, 0 },
-}
-struct fb_fix_screeninfo {
-        .visual =       FB_VISUAL_STATIC_PSEUDOCOLOR,
-}
-for (i = 0; i < 8; i++)
-	info->cmap.red[i] = (((2*i)+1)*(0xFFFF))/16;
-memcpy(info->cmap.green, info->cmap.red, sizeof(u16)*8);
-memcpy(info->cmap.blue, info->cmap.red, sizeof(u16)*8);
-
-*. X11 apps do something like the following when trying to use grayscale.
-for (i=0; i < 8; i++) {
-	char colorspec[64];
-	memset(colorspec,0,64);
-	sprintf(colorspec, "rgb:%x/%x/%x", i*36,i*36,i*36);
-	if (!XParseColor(outputDisplay, testColormap, colorspec, &wantedColor))
-		printf("Can't get color %s\n",colorspec);
-	XAllocColor(outputDisplay, testColormap, &wantedColor);
-	grays[i] = wantedColor;
-}
-There's also named equivalents like gray1..x provided you have an rgb.txt.
-
-Somewhere in X's callchain, this results in a call to X code that handles the
-colormap. For example, Xfbdev hits the following:
-
-xc-011010/programs/Xserver/dix/colormap.c:
-
-FindBestPixel(pentFirst, size, prgb, channel)
-
-dr = (long) pent->co.local.red - prgb->red;
-dg = (long) pent->co.local.green - prgb->green;
-db = (long) pent->co.local.blue - prgb->blue;
-sq = dr * dr;
-UnsignedToBigNum (sq, &sum);
-BigNumAdd (&sum, &temp, &sum);
-
-co.local.red are entries that were brought in through FBIOGETCMAP which come
-directly from the info->cmap.red that was listed above. The prgb is the rgb
-that the app wants to match to. The above code is doing what looks like a least
-squares matching function. That's why the cmap entries can't be set to the left
-hand side boundaries of a color range.
-
diff --git a/Documentation/fb/deferred_io.rst b/Documentation/fb/deferred_io.rst
new file mode 100644
index 000000000000..7300cff255a3
--- /dev/null
+++ b/Documentation/fb/deferred_io.rst
@@ -0,0 +1,79 @@
+===========
+Deferred IO
+===========
+
+Deferred IO is a way to delay and repurpose IO. It uses host memory as a
+buffer and the MMU pagefault as a pretrigger for when to perform the device
+IO. The following example may be a useful explanation of how one such setup
+works:
+
+- userspace app like Xfbdev mmaps framebuffer
+- deferred IO and driver set up fault and page_mkwrite handlers
+- userspace app tries to write to mmaped vaddress
+- we get pagefault and reach fault handler
+- fault handler finds and returns physical page
+- we get page_mkwrite where we add this page to a list
+- schedule a workqueue task to be run after a delay
+- app continues writing to that page with no additional cost. this is
+  the key benefit.
+- the workqueue task comes in and mkcleans the pages on the list, then
+  completes the work associated with updating the framebuffer. this is
+  the real work talking to the device.
+- app tries to write to the address (that has now been mkcleaned)
+- get pagefault and the above sequence occurs again
+
+As can be seen from above, one benefit is roughly to allow bursty framebuffer
+writes to occur at minimum cost. Then after some time when hopefully things
+have gone quiet, we go and really update the framebuffer which would be
+a relatively more expensive operation.
+
+For some types of nonvolatile high latency displays, the desired image is
+the final image rather than the intermediate stages which is why it's okay
+to not update for each write that is occurring.
+
+It may be the case that this is useful in other scenarios as well. Paul Mundt
+has mentioned a case where it is beneficial to use the page count to decide
+whether to coalesce and issue SG DMA or to do memory bursts.
+
+Another one may be if one has a device framebuffer that is in an unusual format,
+say diagonally shifting RGB, this may then be a mechanism for you to allow
+apps to pretend to have a normal framebuffer but reswizzle for the device
+framebuffer at vsync time based on the touched pagelist.
+
+How to use it: (for applications)
+---------------------------------
+No changes needed. mmap the framebuffer like normal and just use it.
+
+How to use it: (for fbdev drivers)
+----------------------------------
+The following example may be helpful.
+
+1. Set up your structure. Eg::
+
+	static struct fb_deferred_io hecubafb_defio = {
+		.delay		= HZ,
+		.deferred_io	= hecubafb_dpy_deferred_io,
+	};
+
+The delay is the minimum delay between when the page_mkwrite trigger occurs
+and when the deferred_io callback is called. The deferred_io callback is
+explained below.
+
+2. Set up your deferred IO callback. Eg::
+
+	static void hecubafb_dpy_deferred_io(struct fb_info *info,
+					     struct list_head *pagelist)
+
+The deferred_io callback is where you would perform all your IO to the display
+device. You receive the pagelist which is the list of pages that were written
+to during the delay. You must not modify this list. This callback is called
+from a workqueue.
+
+3. Call init::
+
+	info->fbdefio = &hecubafb_defio;
+	fb_deferred_io_init(info);
+
+4. Call cleanup::
+
+	fb_deferred_io_cleanup(info);
diff --git a/Documentation/fb/deferred_io.txt b/Documentation/fb/deferred_io.txt
deleted file mode 100644
index 748328370250..000000000000
--- a/Documentation/fb/deferred_io.txt
+++ /dev/null
@@ -1,75 +0,0 @@
-Deferred IO
------------
-
-Deferred IO is a way to delay and repurpose IO. It uses host memory as a
-buffer and the MMU pagefault as a pretrigger for when to perform the device
-IO. The following example may be a useful explanation of how one such setup
-works:
-
-- userspace app like Xfbdev mmaps framebuffer
-- deferred IO and driver sets up fault and page_mkwrite handlers
-- userspace app tries to write to mmaped vaddress
-- we get pagefault and reach fault handler
-- fault handler finds and returns physical page
-- we get page_mkwrite where we add this page to a list
-- schedule a workqueue task to be run after a delay
-- app continues writing to that page with no additional cost. this is
-  the key benefit.
-- the workqueue task comes in and mkcleans the pages on the list, then
- completes the work associated with updating the framebuffer. this is
-  the real work talking to the device.
-- app tries to write to the address (that has now been mkcleaned)
-- get pagefault and the above sequence occurs again
-
-As can be seen from above, one benefit is roughly to allow bursty framebuffer
-writes to occur at minimum cost. Then after some time when hopefully things
-have gone quiet, we go and really update the framebuffer which would be
-a relatively more expensive operation.
-
-For some types of nonvolatile high latency displays, the desired image is
-the final image rather than the intermediate stages which is why it's okay
-to not update for each write that is occurring.
-
-It may be the case that this is useful in other scenarios as well. Paul Mundt
-has mentioned a case where it is beneficial to use the page count to decide
-whether to coalesce and issue SG DMA or to do memory bursts.
-
-Another one may be if one has a device framebuffer that is in an usual format,
-say diagonally shifting RGB, this may then be a mechanism for you to allow
-apps to pretend to have a normal framebuffer but reswizzle for the device
-framebuffer at vsync time based on the touched pagelist.
-
-How to use it: (for applications)
----------------------------------
-No changes needed. mmap the framebuffer like normal and just use it.
-
-How to use it: (for fbdev drivers)
-----------------------------------
-The following example may be helpful.
-
-1. Setup your structure. Eg:
-
-static struct fb_deferred_io hecubafb_defio = {
-	.delay		= HZ,
-	.deferred_io	= hecubafb_dpy_deferred_io,
-};
-
-The delay is the minimum delay between when the page_mkwrite trigger occurs
-and when the deferred_io callback is called. The deferred_io callback is
-explained below.
-
-2. Setup your deferred IO callback. Eg:
-static void hecubafb_dpy_deferred_io(struct fb_info *info,
-				struct list_head *pagelist)
-
-The deferred_io callback is where you would perform all your IO to the display
-device. You receive the pagelist which is the list of pages that were written
-to during the delay. You must not modify this list. This callback is called
-from a workqueue.
-
-3. Call init
-	info->fbdefio = &hecubafb_defio;
-	fb_deferred_io_init(info);
-
-4. Call cleanup
-	fb_deferred_io_cleanup(info);
diff --git a/Documentation/fb/efifb.rst b/Documentation/fb/efifb.rst
new file mode 100644
index 000000000000..04840331a00e
--- /dev/null
+++ b/Documentation/fb/efifb.rst
@@ -0,0 +1,39 @@
+==============
+What is efifb?
+==============
+
+This is a generic EFI platform driver for Intel based Apple computers.
+efifb is only for EFI booted Intel Macs.
+
+Supported Hardware
+==================
+
+- iMac 17"/20"
+- Macbook
+- Macbook Pro 15"/17"
+- MacMini
+
+How to use it?
+==============
+
+efifb does not have any kind of autodetection of your machine.
+You have to add the following kernel parameters in your elilo.conf::
+
+	Macbook :
+		video=efifb:macbook
+	MacMini :
+		video=efifb:mini
+	Macbook Pro 15", iMac 17" :
+		video=efifb:i17
+	Macbook Pro 17", iMac 20" :
+		video=efifb:i20
+
+Accepted options:
+
+======= ===========================================================
+nowc	Don't map the framebuffer write combined. This can be used
+	to workaround side-effects and slowdowns on other CPU cores
+	when large amounts of console data are written.
+======= ===========================================================
+
+Edgar Hucek <gimli@dark-green.com>
diff --git a/Documentation/fb/efifb.txt b/Documentation/fb/efifb.txt
deleted file mode 100644
index 1a85c1bdaf38..000000000000
--- a/Documentation/fb/efifb.txt
+++ /dev/null
@@ -1,37 +0,0 @@
-
-What is efifb?
-===============
-
-This is a generic EFI platform driver for Intel based Apple computers.
-efifb is only for EFI booted Intel Macs.
-
-Supported Hardware
-==================
-
-iMac 17"/20"
-Macbook
-Macbook Pro 15"/17"
-MacMini
-
-How to use it?
-==============
-
-efifb does not have any kind of autodetection of your machine.
-You have to add the following kernel parameters in your elilo.conf:
-	Macbook :
-		video=efifb:macbook
-	MacMini :
-		video=efifb:mini
-	Macbook Pro 15", iMac 17" :
-		video=efifb:i17
-	Macbook Pro 17", iMac 20" :
-		video=efifb:i20
-
-Accepted options:
-
-nowc	Don't map the framebuffer write combined. This can be used
-	to workaround side-effects and slowdowns on other CPU cores
-	when large amounts of console data are written.
-
---
-Edgar Hucek <gimli@dark-green.com>
diff --git a/Documentation/fb/ep93xx-fb.rst b/Documentation/fb/ep93xx-fb.rst
new file mode 100644
index 000000000000..6f7767926d1a
--- /dev/null
+++ b/Documentation/fb/ep93xx-fb.rst
@@ -0,0 +1,140 @@
+================================
+Driver for EP93xx LCD controller
+================================
+
+The EP93xx LCD controller can drive both standard desktop monitors and
+embedded LCD displays. If you have a standard desktop monitor then you
+can use the standard Linux video mode database. In your board file::
+
+	static struct ep93xxfb_mach_info some_board_fb_info = {
+		.num_modes	= EP93XXFB_USE_MODEDB,
+		.bpp		= 16,
+	};
+
+If you have an embedded LCD display then you need to define a video
+mode for it as follows::
+
+	static struct fb_videomode some_board_video_modes[] = {
+		{
+			.name		= "some_lcd_name",
+			/* Pixel clock, porches, etc */
+		},
+	};
+
+Note that the pixel clock value is in picoseconds. You can use the
+KHZ2PICOS macro to convert the pixel clock value. Most other values
+are in pixel clocks. See Documentation/fb/framebuffer.rst for further
+details.
+
+The ep93xxfb_mach_info structure for your board should look like the
+following::
+
+	static struct ep93xxfb_mach_info some_board_fb_info = {
+		.num_modes	= ARRAY_SIZE(some_board_video_modes),
+		.modes		= some_board_video_modes,
+		.default_mode	= &some_board_video_modes[0],
+		.bpp		= 16,
+	};
+
+The framebuffer device can be registered by adding the following to
+your board initialisation function::
+
+	ep93xx_register_fb(&some_board_fb_info);
+
+=====================
+Video Attribute Flags
+=====================
+
+The ep93xxfb_mach_info structure has a flags field which can be used
+to configure the controller. The video attributes flags are fully
+documented in section 7 of the EP93xx users' guide. The following
+flags are available:
+
+=============================== ==========================================
+EP93XXFB_PCLK_FALLING		Clock data on the falling edge of the
+				pixel clock. The default is to clock
+				data on the rising edge.
+
+EP93XXFB_SYNC_BLANK_HIGH	Blank signal is active high. By
+				default the blank signal is active low.
+
+EP93XXFB_SYNC_HORIZ_HIGH	Horizontal sync is active high. By
+				default the horizontal sync is active low.
+
+EP93XXFB_SYNC_VERT_HIGH		Vertical sync is active high. By
+				default the vertical sync is active low.
+=============================== ==========================================
+
+The physical address of the framebuffer can be controlled using the
+following flags:
+
+=============================== ======================================
+EP93XXFB_USE_SDCSN0		Use SDCSn[0] for the framebuffer. This
+				is the default setting.
+
+EP93XXFB_USE_SDCSN1		Use SDCSn[1] for the framebuffer.
+
+EP93XXFB_USE_SDCSN2		Use SDCSn[2] for the framebuffer.
+
+EP93XXFB_USE_SDCSN3		Use SDCSn[3] for the framebuffer.
+=============================== ======================================
+
+==================
+Platform callbacks
+==================
+
+The EP93xx framebuffer driver supports three optional platform
+callbacks: setup, teardown and blank. The setup and teardown functions
+are called when the framebuffer driver is installed and removed
+respectively. The blank function is called whenever the display is
+blanked or unblanked.
+
+The setup and teardown functions are passed the platform_device structure as
+an argument. The fb_info and ep93xxfb_mach_info structures can be
+obtained as follows::
+
+	static int some_board_fb_setup(struct platform_device *pdev)
+	{
+		struct ep93xxfb_mach_info *mach_info = pdev->dev.platform_data;
+		struct fb_info *fb_info = platform_get_drvdata(pdev);
+
+		/* Board specific framebuffer setup */
+	}
+
+======================
+Setting the video mode
+======================
+
+The video mode is set using the following syntax::
+
+	video=XRESxYRES[-BPP][@REFRESH]
+
+If the EP93xx video driver is built-in then the video mode is set on
+the Linux kernel command line, for example::
+
+	video=ep93xx-fb:800x600-16@60
+
+If the EP93xx video driver is built as a module then the video mode is
+set when the module is installed::
+
+	modprobe ep93xx-fb video=320x240
+
+==============
+Screenpage bug
+==============
+
+At least on the EP9315 there is a silicon bug which causes bit 27 of
+the VIDSCRNPAGE (framebuffer physical offset) to be tied low. There is
+an unofficial erratum for this bug at::
+
+	http://marc.info/?l=linux-arm-kernel&m=110061245502000&w=2
+
+By default the EP93xx framebuffer driver checks if the allocated physical
+address has bit 27 set. If it does, then the memory is freed and an
+error is returned. The check can be disabled by adding the following
+option when loading the driver::
+
+      ep93xx-fb.check_screenpage_bug=0
+
+In some cases it may be possible to reconfigure your SDRAM layout to
+avoid this bug. See section 13 of the EP93xx users' guide for details.
diff --git a/Documentation/fb/ep93xx-fb.txt b/Documentation/fb/ep93xx-fb.txt
deleted file mode 100644
index 5af1bd9effae..000000000000
--- a/Documentation/fb/ep93xx-fb.txt
+++ /dev/null
@@ -1,135 +0,0 @@
-================================
-Driver for EP93xx LCD controller
-================================
-
-The EP93xx LCD controller can drive both standard desktop monitors and
-embedded LCD displays. If you have a standard desktop monitor then you
-can use the standard Linux video mode database. In your board file:
-
-	static struct ep93xxfb_mach_info some_board_fb_info = {
-		.num_modes	= EP93XXFB_USE_MODEDB,
-		.bpp		= 16,
-	};
-
-If you have an embedded LCD display then you need to define a video
-mode for it as follows:
-
-	static struct fb_videomode some_board_video_modes[] = {
-		{
-			.name		= "some_lcd_name",
-			/* Pixel clock, porches, etc */
-		},
-	};
-
-Note that the pixel clock value is in pico-seconds. You can use the
-KHZ2PICOS macro to convert the pixel clock value. Most other values
-are in pixel clocks. See Documentation/fb/framebuffer.txt for further
-details.
-
-The ep93xxfb_mach_info structure for your board should look like the
-following:
-
-	static struct ep93xxfb_mach_info some_board_fb_info = {
-		.num_modes	= ARRAY_SIZE(some_board_video_modes),
-		.modes		= some_board_video_modes,
-		.default_mode	= &some_board_video_modes[0],
-		.bpp		= 16,
-	};
-
-The framebuffer device can be registered by adding the following to
-your board initialisation function:
-
-	ep93xx_register_fb(&some_board_fb_info);
-
-=====================
-Video Attribute Flags
-=====================
-
-The ep93xxfb_mach_info structure has a flags field which can be used
-to configure the controller. The video attributes flags are fully
-documented in section 7 of the EP93xx users' guide. The following
-flags are available:
-
-EP93XXFB_PCLK_FALLING		Clock data on the falling edge of the
-				pixel clock. The default is to clock
-				data on the rising edge.
-
-EP93XXFB_SYNC_BLANK_HIGH	Blank signal is active high. By
-				default the blank signal is active low.
-
-EP93XXFB_SYNC_HORIZ_HIGH	Horizontal sync is active high. By
-				default the horizontal sync is active low.
-
-EP93XXFB_SYNC_VERT_HIGH		Vertical sync is active high. By
-				default the vertical sync is active high.
-
-The physical address of the framebuffer can be controlled using the
-following flags:
-
-EP93XXFB_USE_SDCSN0		Use SDCSn[0] for the framebuffer. This
-				is the default setting.
-
-EP93XXFB_USE_SDCSN1		Use SDCSn[1] for the framebuffer.
-
-EP93XXFB_USE_SDCSN2		Use SDCSn[2] for the framebuffer.
-
-EP93XXFB_USE_SDCSN3		Use SDCSn[3] for the framebuffer.
-
-==================
-Platform callbacks
-==================
-
-The EP93xx framebuffer driver supports three optional platform
-callbacks: setup, teardown and blank. The setup and teardown functions
-are called when the framebuffer driver is installed and removed
-respectively. The blank function is called whenever the display is
-blanked or unblanked.
-
-The setup and teardown devices pass the platform_device structure as
-an argument. The fb_info and ep93xxfb_mach_info structures can be
-obtained as follows:
-
-	static int some_board_fb_setup(struct platform_device *pdev)
-	{
-		struct ep93xxfb_mach_info *mach_info = pdev->dev.platform_data;
-		struct fb_info *fb_info = platform_get_drvdata(pdev);
-
-		/* Board specific framebuffer setup */
-	}
-
-======================
-Setting the video mode
-======================
-
-The video mode is set using the following syntax:
-
-	video=XRESxYRES[-BPP][@REFRESH]
-
-If the EP93xx video driver is built-in then the video mode is set on
-the Linux kernel command line, for example:
-
-	video=ep93xx-fb:800x600-16@60
-
-If the EP93xx video driver is built as a module then the video mode is
-set when the module is installed:
-
-	modprobe ep93xx-fb video=320x240
-
-==============
-Screenpage bug
-==============
-
-At least on the EP9315 there is a silicon bug which causes bit 27 of
-the VIDSCRNPAGE (framebuffer physical offset) to be tied low. There is
-an unofficial errata for this bug at:
-	http://marc.info/?l=linux-arm-kernel&m=110061245502000&w=2
-
-By default the EP93xx framebuffer driver checks if the allocated physical
-address has bit 27 set. If it does, then the memory is freed and an
-error is returned. The check can be disabled by adding the following
-option when loading the driver:
-
-      ep93xx-fb.check_screenpage_bug=0
-
-In some cases it may be possible to reconfigure your SDRAM layout to
-avoid this bug. See section 13 of the EP93xx users' guide for details.
diff --git a/Documentation/fb/fbcon.rst b/Documentation/fb/fbcon.rst
new file mode 100644
index 000000000000..cfb9f7c38f18
--- /dev/null
+++ b/Documentation/fb/fbcon.rst
@@ -0,0 +1,350 @@
+=======================
+The Framebuffer Console
+=======================
+
+The framebuffer console (fbcon), as its name implies, is a text
+console running on top of the framebuffer device. It has the functionality of
+any standard text console driver, such as the VGA console, with the added
+features that can be attributed to the graphical nature of the framebuffer.
+
+In the x86 architecture, the framebuffer console is optional, and
+some even treat it as a toy. For other architectures, it is the only available
+display device, text or graphical.
+
+What are the features of fbcon?  The framebuffer console supports
+high resolutions, varying font types, display rotation, primitive multihead,
+etc. Theoretically, multi-colored fonts, blending, aliasing, and any feature
+made available by the underlying graphics card are also possible.
+
+A. Configuration
+================
+
+The framebuffer console can be enabled by using your favorite kernel
+configuration tool.  It is under Device Drivers->Graphics Support->Frame
+buffer Devices->Console display driver support->Framebuffer Console Support.
+Select 'y' to compile support statically or 'm' for module support.  The
+module will be fbcon.
+
+In order for fbcon to activate, at least one framebuffer driver is
+required, so choose from any of the numerous drivers available. Since x86
+systems almost universally have VGA cards, vga16fb and vesafb will
+always be available. However, using a chipset-specific driver will give you
+more speed and features, such as the ability to change the video mode
+dynamically.
+
+To display the penguin logo, choose any logo available in Graphics
+support->Bootup logo.
+
+Also, you will need to select at least one compiled-in font, but if
+you don't do anything, the kernel configuration tool will select one for you,
+usually an 8x16 font.
+
+GOTCHA: A common bug report is enabling the framebuffer without enabling the
+framebuffer console.  Depending on the driver, you may get a blanked or
+garbled display, but the system still boots to completion.  If you are
+fortunate to have a driver that does not alter the graphics chip, then you
+will still get a VGA console.
+
+B. Loading
+==========
+
+Possible scenarios:
+
+1. Driver and fbcon are compiled statically
+
+	 Usually, fbcon will automatically take over your console. The notable
+	 exception is vesafb.  It needs to be explicitly activated with the
+	 vga= boot option parameter.
+
+2. Driver is compiled statically, fbcon is compiled as a module
+
+	 Depending on the driver, you either get a standard console, or a
+	 garbled display, as mentioned above.  To get a framebuffer console,
+	 do a 'modprobe fbcon'.
+
+3. Driver is compiled as a module, fbcon is compiled statically
+
+	 You get your standard console.  Once the driver is loaded with
+	 'modprobe xxxfb', fbcon automatically takes over the console with
+	 the possible exception of using the fbcon=map:n option. See below.
+
+4. Driver and fbcon are both compiled as modules
+
+	 You can load them in any order. Once both are loaded, fbcon will take
+	 over the console.
+
+C. Boot options
+
+	 The framebuffer console has several, largely unknown, boot options
+	 that can change its behavior.
+
+1. fbcon=font:<name>
+
+	Select the initial font to use. The value 'name' can be any of the
+	compiled-in fonts: 10x18, 6x10, 7x14, Acorn8x8, MINI4x6,
+	PEARL8x8, ProFont6x11, SUN12x22, SUN8x16, VGA8x16, VGA8x8.
+
+	Note that not all drivers (vga16fb, for example) can handle fonts
+	with widths not divisible by 8.
+
+2. fbcon=scrollback:<value>[k]
+
+	The scrollback buffer is memory that is used to preserve display
+	contents that have already scrolled past your view.  This is accessed
+	by using the Shift-PageUp key combination.  The value 'value' is any
+	integer. It defaults to 32KB.  The 'k' suffix is optional, and will
+	multiply the 'value' by 1024.
+
+3. fbcon=map:<0123>
+
+	This is an interesting option. It tells which driver gets mapped to
+	which console. The value '0123' is a sequence that gets repeated until
+	the total length is 64 which is the number of consoles available. In
+	the above example, it is expanded to 012301230123... and the mapping
+	will be::
+
+		tty | 1 2 3 4 5 6 7 8 9 ...
+		fb  | 0 1 2 3 0 1 2 3 0 ...
+
+		('cat /proc/fb' should tell you what the fb numbers are)
+
+	One side effect that may be useful is using a map value that exceeds
+	the number of loaded fb drivers. For example, if only one driver is
+	available, fb0, adding fbcon=map:1 tells fbcon not to take over the
+	console.
+
+	Later on, when you want to map the console to the framebuffer
+	device, you can use the con2fbmap utility.
+
+4. fbcon=vc:<n1>-<n2>
+
+	This option tells fbcon to take over only a range of consoles as
+	specified by the values 'n1' and 'n2'. The rest of the consoles
+	outside the given range will still be controlled by the standard
+	console driver.
+
+	NOTE: For x86 machines, the standard console is the VGA console which
+	is typically located on the same video card.  Thus, the consoles that
+	are controlled by the VGA console will be garbled.
+
+5. fbcon=rotate:<n>
+
+	This option changes the orientation angle of the console display. The
+	value 'n' accepts the following:
+
+	    - 0 - normal orientation (0 degrees)
+	    - 1 - clockwise orientation (90 degrees)
+	    - 2 - upside down orientation (180 degrees)
+	    - 3 - counterclockwise orientation (270 degrees)
+
+	The angle can be changed anytime afterwards by 'echoing' the same
+	numbers to any one of the 2 attributes found in
+	/sys/class/graphics/fbcon:
+
+		- rotate     - rotate the display of the active console
+		- rotate_all - rotate the display of all consoles
+
+	Console rotation will only become available if Framebuffer Console
+	Rotation support is compiled in your kernel.
+
+	NOTE: This is purely console rotation.  Any other applications that
+	use the framebuffer will remain at their 'normal' orientation.
+	Actually, the underlying fb driver is totally ignorant of console
+	rotation.
+
+6. fbcon=margin:<color>
+
+	This option specifies the color of the margins. The margins are the
+	leftover area at the right and the bottom of the screen that are not
+	used by text. By default, this area will be black. The 'color' value
+	is an integer number that depends on the framebuffer driver being used.
+
+7. fbcon=nodefer
+
+	If the kernel is compiled with deferred fbcon takeover support, normally
+	the framebuffer contents, left in place by the firmware/bootloader, will
+	be preserved until some text is actually output to the console.
+	This option causes fbcon to bind immediately to the fbdev device.
+
+8. fbcon=logo-pos:<location>
+
+	The only possible 'location' is 'center' (without quotes), and when
+	given, the bootup logo is moved from the default top-left corner
+	location to the center of the framebuffer. If more than one logo is
+	displayed due to multiple CPUs, the collected line of logos is moved
+	as a whole.
+
+D. Attaching, Detaching and Unloading
+
+Before going on to how to attach, detach and unload the framebuffer console, an
+illustration of the dependencies may help.
+
+The console layer, as with most subsystems, needs a driver that interfaces with
+the hardware. Thus, in a VGA console::
+
+	console ---> VGA driver ---> hardware.
+
+Assuming the VGA driver can be unloaded, one must first unbind the VGA driver
+from the console layer before unloading the driver.  The VGA driver cannot be
+unloaded if it is still bound to the console layer. (See
+Documentation/console/console.txt for more information).
+
+This is more complicated in the case of the framebuffer console (fbcon),
+because fbcon is an intermediate layer between the console and the drivers::
+
+	console ---> fbcon ---> fbdev drivers ---> hardware
+
+The fbdev drivers cannot be unloaded if bound to fbcon, and fbcon cannot
+be unloaded if it's bound to the console layer.
+
+So to unload the fbdev drivers, one must first unbind fbcon from the console,
+then unbind the fbdev drivers from fbcon.  Fortunately, unbinding fbcon from
+the console layer will automatically unbind framebuffer drivers from
+fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from
+fbcon.
+
+So, how do we unbind fbcon from the console? Part of the answer is in
+Documentation/console/console.txt. To summarize:
+
+Echo a value to the bind file that represents the framebuffer console
+driver. So assuming vtcon1 represents fbcon, then::
+
+  # attach the framebuffer console to the console layer
+  echo 1 > /sys/class/vtconsole/vtcon1/bind
+  # detach the framebuffer console from the console layer
+  echo 0 > /sys/class/vtconsole/vtcon1/bind
+
+If fbcon is detached from the console layer, your boot console driver (which is
+usually VGA text mode) will take over.  A few drivers (rivafb and i810fb) will
+restore VGA text mode for you.  With the rest, before detaching fbcon, you
+must take a few additional steps to make sure that your VGA text mode is
+restored properly. The following is one of several methods you can use:
+
+1. Download or install vbetool.  This utility is included with most
+   distributions nowadays, and is usually part of the suspend/resume tool.
+
+2. In your kernel configuration, ensure that CONFIG_FRAMEBUFFER_CONSOLE is set
+   to 'y' or 'm'. Enable one or more of your favorite framebuffer drivers.
+
+3. Boot into text mode and as root run::
+
+	vbetool vbestate save > <vga state file>
+
+   The above command saves the register contents of your graphics
+   hardware to <vga state file>.  You need to do this step only once as
+   the state file can be reused.
+
+4. If fbcon is compiled as a module, load fbcon by doing::
+
+       modprobe fbcon
+
+5. Now to detach fbcon::
+
+       vbetool vbestate restore < <vga state file> && \
+       echo 0 > /sys/class/vtconsole/vtcon1/bind
+
+6. That's it, you're back to VGA mode. And if you compiled fbcon as a module,
+   you can unload it by 'rmmod fbcon'.
+
+7. To reattach fbcon::
+
+       echo 1 > /sys/class/vtconsole/vtcon1/bind
+
+8. Once fbcon is unbound, all drivers registered to the system will also
+   become unbound.  This means that fbcon and individual framebuffer drivers
+   can be unloaded or reloaded at will. Reloading the drivers or fbcon will
+   automatically bind the console, fbcon and the drivers together. Unloading
+   all the drivers without unloading fbcon will make it impossible for the
+   console to bind fbcon.
+
+Notes for vesafb users:
+=======================
+
+Unfortunately, if your bootline includes a vga=xxx parameter that sets the
+hardware in graphics mode, such as when loading vesafb, vgacon will not load.
+Instead, vgacon will replace the default boot console with dummycon, and you
+won't get any display after detaching fbcon. Your machine is still alive, so
+you can reattach vesafb. However, to reattach vesafb, you need to do one of
+the following:
+
+Variation 1:
+
+    a. Before detaching fbcon, do::
+
+	vbetool vbestate save > <vesa state file> # do once for each vesafb mode,
+						 # the file can be reused
+
+    b. Detach fbcon as in step 5.
+
+    c. Attach fbcon::
+
+	vbetool vbestate restore < <vesa state file> && \
+	echo 1 > /sys/class/vtconsole/vtcon1/bind
+
+Variation 2:
+
+    a. Before detaching fbcon, do::
+
+	echo <ID> > /sys/class/tty/console/bind
+
+	vbetool vbemode get
+
+    b. Take note of the mode number
+
+    c. Detach fbcon as in step 5.
+
+    d. Attach fbcon::
+
+	vbetool vbemode set <mode number> && \
+	echo 1 > /sys/class/vtconsole/vtcon1/bind
+
+Samples:
+========
+
+Here are 2 sample bash scripts that you can use to bind or unbind the
+framebuffer console driver if you are on an X86 box::
+
+  #!/bin/bash
+  # Unbind fbcon
+
+  # Change this to where your actual vgastate file is located
+  # Or Use VGASTATE=$1 to indicate the state file at runtime
+  VGASTATE=/tmp/vgastate
+
+  # path to vbetool
+  VBETOOL=/usr/local/bin
+
+
+  for (( i = 0; i < 16; i++))
+  do
+    if test -x /sys/class/vtconsole/vtcon$i; then
+	if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \
+	     = 1 ]; then
+	    if test -x $VBETOOL/vbetool; then
+	       echo Unbinding vtcon$i
+	       $VBETOOL/vbetool vbestate restore < $VGASTATE
+	       echo 0 > /sys/class/vtconsole/vtcon$i/bind
+	    fi
+	fi
+    fi
+  done
+
+---------------------------------------------------------------------------
+
+::
+
+  #!/bin/bash
+  # Bind fbcon
+
+  for (( i = 0; i < 16; i++))
+  do
+    if test -x /sys/class/vtconsole/vtcon$i; then
+	if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \
+	     = 1 ]; then
+	  echo Binding vtcon$i
+	  echo 1 > /sys/class/vtconsole/vtcon$i/bind
+	fi
+    fi
+  done
+
+Antonino Daplas <adaplas@pol.net>
diff --git a/Documentation/fb/fbcon.txt b/Documentation/fb/fbcon.txt
deleted file mode 100644
index 60a5ec04e8f0..000000000000
--- a/Documentation/fb/fbcon.txt
+++ /dev/null
@@ -1,347 +0,0 @@
-The Framebuffer Console
-=======================
-
-	The framebuffer console (fbcon), as its name implies, is a text
-console running on top of the framebuffer device. It has the functionality of
-any standard text console driver, such as the VGA console, with the added
-features that can be attributed to the graphical nature of the framebuffer.
-
-	 In the x86 architecture, the framebuffer console is optional, and
-some even treat it as a toy. For other architectures, it is the only available
-display device, text or graphical.
-
-	 What are the features of fbcon?  The framebuffer console supports
-high resolutions, varying font types, display rotation, primitive multihead,
-etc. Theoretically, multi-colored fonts, blending, aliasing, and any feature
-made available by the underlying graphics card are also possible.
-
-A. Configuration
-
-	The framebuffer console can be enabled by using your favorite kernel
-configuration tool.  It is under Device Drivers->Graphics Support->Frame
-buffer Devices->Console display driver support->Framebuffer Console Support.
-Select 'y' to compile support statically or 'm' for module support.  The
-module will be fbcon.
-
-	In order for fbcon to activate, at least one framebuffer driver is
-required, so choose from any of the numerous drivers available. For x86
-systems, they almost universally have VGA cards, so vga16fb and vesafb will
-always be available. However, using a chipset-specific driver will give you
-more speed and features, such as the ability to change the video mode
-dynamically.
-
-	To display the penguin logo, choose any logo available in Graphics
-support->Bootup logo.
-
-	Also, you will need to select at least one compiled-in font, but if
-you don't do anything, the kernel configuration tool will select one for you,
-usually an 8x16 font.
-
-GOTCHA: A common bug report is enabling the framebuffer without enabling the
-framebuffer console.  Depending on the driver, you may get a blanked or
-garbled display, but the system still boots to completion.  If you are
-fortunate to have a driver that does not alter the graphics chip, then you
-will still get a VGA console.
-
-B. Loading
-
-Possible scenarios:
-
-1. Driver and fbcon are compiled statically
-
-	 Usually, fbcon will automatically take over your console. The notable
-	 exception is vesafb.  It needs to be explicitly activated with the
-	 vga= boot option parameter.
-
-2. Driver is compiled statically, fbcon is compiled as a module
-
-	 Depending on the driver, you either get a standard console, or a
-	 garbled display, as mentioned above.  To get a framebuffer console,
-	 do a 'modprobe fbcon'.
-
-3. Driver is compiled as a module, fbcon is compiled statically
-
-	 You get your standard console.  Once the driver is loaded with
-	 'modprobe xxxfb', fbcon automatically takes over the console with
-	 the possible exception of using the fbcon=map:n option. See below.
-
-4. Driver and fbcon are compiled as a module.
-
-	 You can load them in any order. Once both are loaded, fbcon will take
-	 over the console.
-
-C. Boot options
-
-         The framebuffer console has several, largely unknown, boot options
-         that can change its behavior.
-
-1. fbcon=font:<name>
-
-        Select the initial font to use. The value 'name' can be any of the
-        compiled-in fonts: 10x18, 6x10, 7x14, Acorn8x8, MINI4x6,
-        PEARL8x8, ProFont6x11, SUN12x22, SUN8x16, VGA8x16, VGA8x8.
-
-	Note, not all drivers can handle font with widths not divisible by 8,
-        such as vga16fb.
-
-2. fbcon=scrollback:<value>[k]
-
-        The scrollback buffer is memory that is used to preserve display
-        contents that has already scrolled past your view.  This is accessed
-        by using the Shift-PageUp key combination.  The value 'value' is any
-        integer. It defaults to 32KB.  The 'k' suffix is optional, and will
-        multiply the 'value' by 1024.
-
-3. fbcon=map:<0123>
-
-        This is an interesting option. It tells which driver gets mapped to
-        which console. The value '0123' is a sequence that gets repeated until
-        the total length is 64 which is the number of consoles available. In
-        the above example, it is expanded to 012301230123... and the mapping
-        will be:
-
-		tty | 1 2 3 4 5 6 7 8 9 ...
-		fb  | 0 1 2 3 0 1 2 3 0 ...
-
-		('cat /proc/fb' should tell you what the fb numbers are)
-
-	One side effect that may be useful is using a map value that exceeds
-	the number of loaded fb drivers. For example, if only one driver is
-	available, fb0, adding fbcon=map:1 tells fbcon not to take over the
-	console.
-
-	Later on, when you want to map the console the to the framebuffer
-	device, you can use the con2fbmap utility.
-
-4. fbcon=vc:<n1>-<n2>
-
-	This option tells fbcon to take over only a range of consoles as
-	specified by the values 'n1' and 'n2'. The rest of the consoles
-	outside the given range will still be controlled by the standard
-	console driver.
-
-	NOTE: For x86 machines, the standard console is the VGA console which
-	is typically located on the same video card.  Thus, the consoles that
-	are controlled by the VGA console will be garbled.
-
-4. fbcon=rotate:<n>
-
-        This option changes the orientation angle of the console display. The
-        value 'n' accepts the following:
-
-	      0 - normal orientation (0 degree)
-	      1 - clockwise orientation (90 degrees)
-	      2 - upside down orientation (180 degrees)
-	      3 - counterclockwise orientation (270 degrees)
-
-	The angle can be changed anytime afterwards by 'echoing' the same
-	numbers to any one of the 2 attributes found in
-	/sys/class/graphics/fbcon:
-
-		rotate     - rotate the display of the active console
-		rotate_all - rotate the display of all consoles
-
-	Console rotation will only become available if Framebuffer Console
-	Rotation support is compiled in your kernel.
-
-	NOTE: This is purely console rotation.  Any other applications that
-	use the framebuffer will remain at their 'normal' orientation.
-	Actually, the underlying fb driver is totally ignorant of console
-	rotation.
-
-5. fbcon=margin:<color>
-
-	This option specifies the color of the margins. The margins are the
-	leftover area at the right and the bottom of the screen that are not
-	used by text. By default, this area will be black. The 'color' value
-	is an integer number that depends on the framebuffer driver being used.
-
-6. fbcon=nodefer
-
-	If the kernel is compiled with deferred fbcon takeover support, normally
-	the framebuffer contents, left in place by the firmware/bootloader, will
-	be preserved until there actually is some text is output to the console.
-	This option causes fbcon to bind immediately to the fbdev device.
-
-7. fbcon=logo-pos:<location>
-
-	The only possible 'location' is 'center' (without quotes), and when
-	given, the bootup logo is moved from the default top-left corner
-	location to the center of the framebuffer. If more than one logo is
-	displayed due to multiple CPUs, the collected line of logos is moved
-	as a whole.
-
-C. Attaching, Detaching and Unloading
-
-Before going on to how to attach, detach and unload the framebuffer console, an
-illustration of the dependencies may help.
-
-The console layer, as with most subsystems, needs a driver that interfaces with
-the hardware. Thus, in a VGA console:
-
-console ---> VGA driver ---> hardware.
-
-Assuming the VGA driver can be unloaded, one must first unbind the VGA driver
-from the console layer before unloading the driver.  The VGA driver cannot be
-unloaded if it is still bound to the console layer. (See
-Documentation/console/console.txt for more information).
-
-This is more complicated in the case of the framebuffer console (fbcon),
-because fbcon is an intermediate layer between the console and the drivers:
-
-console ---> fbcon ---> fbdev drivers ---> hardware
-
-The fbdev drivers cannot be unloaded if bound to fbcon, and fbcon cannot
-be unloaded if it's bound to the console layer.
-
-So to unload the fbdev drivers, one must first unbind fbcon from the console,
-then unbind the fbdev drivers from fbcon.  Fortunately, unbinding fbcon from
-the console layer will automatically unbind framebuffer drivers from
-fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from
-fbcon.
-
-So, how do we unbind fbcon from the console? Part of the answer is in
-Documentation/console/console.txt. To summarize:
-
-Echo a value to the bind file that represents the framebuffer console
-driver. So assuming vtcon1 represents fbcon, then:
-
-echo 1 > sys/class/vtconsole/vtcon1/bind - attach framebuffer console to
-                                           console layer
-echo 0 > sys/class/vtconsole/vtcon1/bind - detach framebuffer console from
-                                           console layer
-
-If fbcon is detached from the console layer, your boot console driver (which is
-usually VGA text mode) will take over.  A few drivers (rivafb and i810fb) will
-restore VGA text mode for you.  With the rest, before detaching fbcon, you
-must take a few additional steps to make sure that your VGA text mode is
-restored properly. The following is one of the several methods that you can do:
-
-1. Download or install vbetool.  This utility is included with most
-   distributions nowadays, and is usually part of the suspend/resume tool.
-
-2. In your kernel configuration, ensure that CONFIG_FRAMEBUFFER_CONSOLE is set
-   to 'y' or 'm'. Enable one or more of your favorite framebuffer drivers.
-
-3. Boot into text mode and as root run:
-
-	vbetool vbestate save > <vga state file>
-
-	The above command saves the register contents of your graphics
-	hardware to <vga state file>.  You need to do this step only once as
-	the state file can be reused.
-
-4. If fbcon is compiled as a module, load fbcon by doing:
-
-       modprobe fbcon
-
-5. Now to detach fbcon:
-
-       vbetool vbestate restore < <vga state file> && \
-       echo 0 > /sys/class/vtconsole/vtcon1/bind
-
-6. That's it, you're back to VGA mode. And if you compiled fbcon as a module,
-   you can unload it by 'rmmod fbcon'.
-
-7. To reattach fbcon:
-
-       echo 1 > /sys/class/vtconsole/vtcon1/bind
-
-8. Once fbcon is unbound, all drivers registered to the system will also
-become unbound.  This means that fbcon and individual framebuffer drivers
-can be unloaded or reloaded at will. Reloading the drivers or fbcon will
-automatically bind the console, fbcon and the drivers together. Unloading
-all the drivers without unloading fbcon will make it impossible for the
-console to bind fbcon.
-
-Notes for vesafb users:
-=======================
-
-Unfortunately, if your bootline includes a vga=xxx parameter that sets the
-hardware in graphics mode, such as when loading vesafb, vgacon will not load.
-Instead, vgacon will replace the default boot console with dummycon, and you
-won't get any display after detaching fbcon. Your machine is still alive, so
-you can reattach vesafb. However, to reattach vesafb, you need to do one of
-the following:
-
-Variation 1:
-
-    a. Before detaching fbcon, do
-
-       vbetool vbemode save > <vesa state file> # do once for each vesafb mode,
-						# the file can be reused
-
-    b. Detach fbcon as in step 5.
-
-    c. Attach fbcon
-
-        vbetool vbestate restore < <vesa state file> && \
-	echo 1 > /sys/class/vtconsole/vtcon1/bind
-
-Variation 2:
-
-    a. Before detaching fbcon, do:
-	echo <ID> > /sys/class/tty/console/bind
-
-
-       vbetool vbemode get
-
-    b. Take note of the mode number
-
-    b. Detach fbcon as in step 5.
-
-    c. Attach fbcon:
-
-       vbetool vbemode set <mode number> && \
-       echo 1 > /sys/class/vtconsole/vtcon1/bind
-
-Samples:
-========
-
-Here are 2 sample bash scripts that you can use to bind or unbind the
-framebuffer console driver if you are on an X86 box:
-
----------------------------------------------------------------------------
-#!/bin/bash
-# Unbind fbcon
-
-# Change this to where your actual vgastate file is located
-# Or Use VGASTATE=$1 to indicate the state file at runtime
-VGASTATE=/tmp/vgastate
-
-# path to vbetool
-VBETOOL=/usr/local/bin
-
-
-for (( i = 0; i < 16; i++))
-do
-  if test -x /sys/class/vtconsole/vtcon$i; then
-      if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \
-           = 1 ]; then
-	    if test -x $VBETOOL/vbetool; then
-	       echo Unbinding vtcon$i
-	       $VBETOOL/vbetool vbestate restore < $VGASTATE
-	       echo 0 > /sys/class/vtconsole/vtcon$i/bind
-	    fi
-      fi
-  fi
-done
-
----------------------------------------------------------------------------
-#!/bin/bash
-# Bind fbcon
-
-for (( i = 0; i < 16; i++))
-do
-  if test -x /sys/class/vtconsole/vtcon$i; then
-      if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \
-           = 1 ]; then
-	  echo Unbinding vtcon$i
-	  echo 1 > /sys/class/vtconsole/vtcon$i/bind
-      fi
-  fi
-done
----------------------------------------------------------------------------
-
---
-Antonino Daplas <adaplas@pol.net>
diff --git a/Documentation/fb/framebuffer.rst b/Documentation/fb/framebuffer.rst
new file mode 100644
index 000000000000..7fe087310c82
--- /dev/null
+++ b/Documentation/fb/framebuffer.rst
@@ -0,0 +1,353 @@
+=======================
+The Frame Buffer Device
+=======================
+
+Last revised: May 10, 2001
+
+
+0. Introduction
+---------------
+
+The frame buffer device provides an abstraction for the graphics hardware. It
+represents the frame buffer of some video hardware and allows application
+software to access the graphics hardware through a well-defined interface, so
+the software doesn't need to know anything about the low-level (hardware
+register) stuff.
+
+The device is accessed through special device nodes, usually located in the
+/dev directory, i.e. /dev/fb*.
+
+
+1. User's View of /dev/fb*
+--------------------------
+
+From the user's point of view, the frame buffer device looks just like any
+other device in /dev. It's a character device using major 29; the minor
+specifies the frame buffer number.
+
+By convention, the following device nodes are used (numbers indicate the device
+minor numbers)::
+
+      0 = /dev/fb0	First frame buffer
+      1 = /dev/fb1	Second frame buffer
+	  ...
+     31 = /dev/fb31	32nd frame buffer
+
+For backwards compatibility, you may want to create the following symbolic
+links::
+
+    /dev/fb0current -> fb0
+    /dev/fb1current -> fb1
+
+and so on...
+
+The frame buffer devices are also `normal` memory devices; this means you can
+read and write their contents. You can, for example, make a screen snapshot by::
+
+  cp /dev/fb0 myfile
+
+There also can be more than one frame buffer at a time, e.g. if you have a
+graphics card in addition to the built-in hardware. The corresponding frame
+buffer devices (/dev/fb0 and /dev/fb1 etc.) work independently.
+
+Application software that uses the frame buffer device (e.g. the X server) will
+use /dev/fb0 by default (older software uses /dev/fb0current). You can specify
+an alternative frame buffer device by setting the environment variable
+$FRAMEBUFFER to the path name of a frame buffer device, e.g. (for sh/bash
+users)::
+
+    export FRAMEBUFFER=/dev/fb1
+
+or (for csh users)::
+
+    setenv FRAMEBUFFER /dev/fb1
+
+After this the X server will use the second frame buffer.
+
+
+2. Programmer's View of /dev/fb*
+--------------------------------
+
+As you already know, a frame buffer device is a memory device like /dev/mem and
+it has the same features. You can read it, write it, seek to some location in
+it and mmap() it (the main usage). The difference is just that the memory that
+appears in the special file is not the whole memory, but the frame buffer of
+some video hardware.
+
+/dev/fb* also allows several ioctls on it, by which lots of information about
+the hardware can be queried and set. The color map handling works via ioctls,
+too. Look into <linux/fb.h> for more information on what ioctls exist and on
+which data structures they work. Here's just a brief overview:
+
+  - You can request unchangeable information about the hardware, like name,
+    organization of the screen memory (planes, packed pixels, ...) and address
+    and length of the screen memory.
+
+  - You can request and change variable information about the hardware, like
+    visible and virtual geometry, depth, color map format, timing, and so on.
+    If you try to change that information, the driver may round up some
+    values to meet the hardware's capabilities (or return EINVAL if that isn't
+    possible).
+
+  - You can get and set parts of the color map. Communication is done with 16
+    bits per color part (red, green, blue, transparency) to support all
+    existing hardware. The driver does all the computations needed to apply
+    it to the hardware (round it down to fewer bits, maybe throw away
+    transparency).
+
+All this hardware abstraction makes the implementation of application programs
+easier and more portable. E.g. the X server works completely on /dev/fb* and
+thus doesn't need to know, for example, how the color registers of the concrete
+hardware are organized. XF68_FBDev is a general X server for bitmapped,
+unaccelerated video hardware. The only thing that has to be built into
+application programs is the screen organization (bitplanes or chunky pixels
+etc.), because it works on the frame buffer image data directly.
+
+For the future it is planned that frame buffer drivers for graphics cards and
+the like can be implemented as kernel modules that are loaded at runtime. Such
+a driver just has to call register_framebuffer() and supply some functions.
+Writing and distributing such drivers independently from the kernel will save
+much trouble...
+
+
+3. Frame Buffer Resolution Maintenance
+--------------------------------------
+
+Frame buffer resolutions are maintained using the utility `fbset`. It can
+change the video mode properties of a frame buffer device. Its main usage is
+to change the current video mode, e.g. during boot up in one of your `/etc/rc.*`
+or `/etc/init.d/*` files.
+
+Fbset uses a video mode database stored in a configuration file, so you can
+easily add your own modes and refer to them with a simple identifier.
+
+
+4. The X Server
+---------------
+
+The X server (XF68_FBDev) is the most notable application program for the frame
+buffer device. Starting with XFree86 release 3.2, the X server is part of
+XFree86 and has 2 modes:
+
+  - If the `Display` subsection for the `fbdev` driver in the /etc/XF86Config
+    file contains a::
+
+	Modes "default"
+
+    line, the X server will use the scheme discussed above, i.e. it will start
+    up in the resolution determined by /dev/fb0 (or $FRAMEBUFFER, if set). You
+    still have to specify the color depth (using the Depth keyword) and virtual
+    resolution (using the Virtual keyword) though. This is the default for the
+    configuration file supplied with XFree86. It's the simplest
+    configuration, but it has some limitations.
+
+  - Therefore it's also possible to specify resolutions in the /etc/XF86Config
+    file. This allows for on-the-fly resolution switching while retaining the
+    same virtual desktop size. The frame buffer device that's used is still
+    /dev/fb0current (or $FRAMEBUFFER), but the available resolutions are
+    defined by /etc/XF86Config now. The disadvantage is that you have to
+    specify the timings in a different format (but `fbset -x` may help).
+
+To tune a video mode, you can use fbset or xvidtune. Note that xvidtune doesn't
+work 100% with XF68_FBDev: the reported clock values are always incorrect.
+
+
+5. Video Mode Timings
+---------------------
+
+A monitor draws an image on the screen by using an electron beam (3 electron
+beams for color models, 1 electron beam for monochrome monitors). The front of
+the screen is covered by a pattern of colored phosphors (pixels). If a phosphor
+is hit by an electron, it emits a photon and thus becomes visible.
+
+The electron beam draws horizontal lines (scanlines) from left to right, and
+from the top to the bottom of the screen. By modifying the intensity of the
+electron beam, pixels with various colors and intensities can be shown.
+
+After each scanline the electron beam has to move back to the left side of the
+screen and to the next line: this is called the horizontal retrace. After the
+whole screen (frame) was painted, the beam moves back to the upper left corner:
+this is called the vertical retrace. During both the horizontal and vertical
+retrace, the electron beam is turned off (blanked).
+
+The speed at which the electron beam paints the pixels is determined by the
+dotclock in the graphics board. For a dotclock of e.g. 28.37516 MHz (millions
+of cycles per second), each pixel is 35242 ps (picoseconds) long::
+
+    1/(28.37516E6 Hz) = 35.242E-9 s
+
+If the screen resolution is 640x480, it will take::
+
+    640*35.242E-9 s = 22.555E-6 s
+
+to paint the 640 (xres) pixels on one scanline. But the horizontal retrace
+also takes time (e.g. 272 `pixels`), so a full scanline takes::
+
+    (640+272)*35.242E-9 s = 32.141E-6 s
+
+We'll say that the horizontal scanrate is about 31 kHz::
+
+    1/(32.141E-6 s) = 31.113E3 Hz
+
+A full screen counts 480 (yres) lines, but we have to consider the vertical
+retrace too (e.g. 49 `lines`). So a full screen will take::
+
+    (480+49)*32.141E-6 s = 17.002E-3 s
+
+The vertical scanrate is about 59 Hz::
+
+    1/(17.002E-3 s) = 58.815 Hz
+
+This means the screen data is refreshed about 59 times per second. To have a
+stable picture without visible flicker, VESA recommends a vertical scanrate of
+at least 72 Hz. But perceived flicker varies from person to person: some people
+can use 50 Hz without any trouble, while I'll notice if it's less than 80 Hz.
+
+Since the monitor doesn't know when a new scanline starts, the graphics board
+will supply a synchronization pulse (horizontal sync or hsync) for each
+scanline.  Similarly it supplies a synchronization pulse (vertical sync or
+vsync) for each new frame. The position of the image on the screen is
+influenced by the moments at which the synchronization pulses occur.
+
+The following picture summarizes all timings. The horizontal retrace time is
+the sum of the left margin, the right margin and the hsync length, while the
+vertical retrace time is the sum of the upper margin, the lower margin and the
+vsync length::
+
+  +----------+---------------------------------------------+----------+-------+
+  |          |                ↑                            |          |       |
+  |          |                |upper_margin                |          |       |
+  |          |                ↓                            |          |       |
+  +----------###############################################----------+-------+
+  |          #                ↑                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |   left   #                |                            #  right   | hsync |
+  |  margin  #                |       xres                 #  margin  |  len  |
+  |<-------->#<---------------+--------------------------->#<-------->|<----->|
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |yres                        #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                |                            #          |       |
+  |          #                ↓                            #          |       |
+  +----------###############################################----------+-------+
+  |          |                ↑                            |          |       |
+  |          |                |lower_margin                |          |       |
+  |          |                ↓                            |          |       |
+  +----------+---------------------------------------------+----------+-------+
+  |          |                ↑                            |          |       |
+  |          |                |vsync_len                   |          |       |
+  |          |                ↓                            |          |       |
+  +----------+---------------------------------------------+----------+-------+
+
+The frame buffer device expects horizontal timings in dotclocks (the pixclock
+itself is given in picoseconds, 1E-12 s) and vertical timings in scanlines.
+
+
+6. Converting XFree86 timing values into frame buffer device timings
+--------------------------------------------------------------------
+
+An XFree86 mode line consists of the following fields::
+
+ "800x600"     50      800  856  976 1040    600  637  643  666
+ < name >     DCF       HR  SH1  SH2  HFL     VR  SV1  SV2  VFL
+
+The frame buffer device uses the following fields:
+
+  - pixclock: pixel clock in ps (picoseconds)
+  - left_margin: time from sync to picture
+  - right_margin: time from picture to sync
+  - upper_margin: time from sync to picture
+  - lower_margin: time from picture to sync
+  - hsync_len: length of horizontal sync
+  - vsync_len: length of vertical sync
+
+1) Pixelclock:
+
+   xfree: in MHz
+
+   fb: in picoseconds (ps)
+
+   pixclock = 1000000 / DCF
+
+2) horizontal timings:
+
+   left_margin = HFL - SH2
+
+   right_margin = SH1 - HR
+
+   hsync_len = SH2 - SH1
+
+3) vertical timings:
+
+   upper_margin = VFL - SV2
+
+   lower_margin = SV1 - VR
+
+   vsync_len = SV2 - SV1
+
+Good examples for VESA timings can be found in the XFree86 source tree,
+under "xc/programs/Xserver/hw/xfree86/doc/modeDB.txt".
+
+
+7. References
+-------------
+
+For more specific information about the frame buffer device and its
+applications, please refer to the Linux-fbdev website:
+
+    http://linux-fbdev.sourceforge.net/
+
+and to the following documentation:
+
+  - The manual pages for fbset: fbset(8), fb.modes(5)
+  - The manual pages for XFree86: XF68_FBDev(1), XF86Config(4/5)
+  - The mighty kernel sources:
+
+      - linux/drivers/video/
+      - linux/include/linux/fb.h
+      - linux/include/video/
+
+
+
+8. Mailing list
+---------------
+
+There is a frame buffer device related mailing list at kernel.org:
+linux-fbdev@vger.kernel.org.
+
+Point your web browser to http://sourceforge.net/projects/linux-fbdev/ for
+subscription information and archive browsing.
+
+
+9. Downloading
+--------------
+
+All necessary files can be found at
+
+    ftp://ftp.uni-erlangen.de/pub/Linux/LOCAL/680x0/
+
+and on its mirrors.
+
+The latest version of fbset can be found at
+
+    http://www.linux-fbdev.org/
+
+
+10. Credits
+-----------
+
+This readme was written by Geert Uytterhoeven, partly based on the original
+`X-framebuffer.README` by Roman Hodek and Martin Schaller. Section 6 was
+provided by Frank Neumann.
+
+The frame buffer device abstraction was designed by Martin Schaller.
diff --git a/Documentation/fb/framebuffer.txt b/Documentation/fb/framebuffer.txt
deleted file mode 100644
index 58c5ae2e9f59..000000000000
--- a/Documentation/fb/framebuffer.txt
+++ /dev/null
@@ -1,343 +0,0 @@
-			The Frame Buffer Device
-			-----------------------
-
-Maintained by Geert Uytterhoeven <geert@linux-m68k.org>
-Last revised: May 10, 2001
-
-
-0. Introduction
----------------
-
-The frame buffer device provides an abstraction for the graphics hardware. It
-represents the frame buffer of some video hardware and allows application
-software to access the graphics hardware through a well-defined interface, so
-the software doesn't need to know anything about the low-level (hardware
-register) stuff.
-
-The device is accessed through special device nodes, usually located in the
-/dev directory, i.e. /dev/fb*.
-
-
-1. User's View of /dev/fb*
---------------------------
-
-From the user's point of view, the frame buffer device looks just like any
-other device in /dev. It's a character device using major 29; the minor
-specifies the frame buffer number.
-
-By convention, the following device nodes are used (numbers indicate the device
-minor numbers):
-
-      0 = /dev/fb0	First frame buffer
-      1 = /dev/fb1	Second frame buffer
-	  ...
-     31 = /dev/fb31	32nd frame buffer
-
-For backwards compatibility, you may want to create the following symbolic
-links:
-
-    /dev/fb0current -> fb0
-    /dev/fb1current -> fb1
-
-and so on...
-
-The frame buffer devices are also `normal' memory devices, this means, you can
-read and write their contents. You can, for example, make a screen snapshot by
-
-  cp /dev/fb0 myfile
-
-There also can be more than one frame buffer at a time, e.g. if you have a
-graphics card in addition to the built-in hardware. The corresponding frame
-buffer devices (/dev/fb0 and /dev/fb1 etc.) work independently.
-
-Application software that uses the frame buffer device (e.g. the X server) will
-use /dev/fb0 by default (older software uses /dev/fb0current). You can specify
-an alternative frame buffer device by setting the environment variable
-$FRAMEBUFFER to the path name of a frame buffer device, e.g. (for sh/bash
-users):
-
-    export FRAMEBUFFER=/dev/fb1
-
-or (for csh users):
-
-    setenv FRAMEBUFFER /dev/fb1
-
-After this the X server will use the second frame buffer.
-
-
-2. Programmer's View of /dev/fb*
---------------------------------
-
-As you already know, a frame buffer device is a memory device like /dev/mem and
-it has the same features. You can read it, write it, seek to some location in
-it and mmap() it (the main usage). The difference is just that the memory that
-appears in the special file is not the whole memory, but the frame buffer of
-some video hardware.
-
-/dev/fb* also allows several ioctls on it, by which lots of information about
-the hardware can be queried and set. The color map handling works via ioctls,
-too. Look into <linux/fb.h> for more information on what ioctls exist and on
-which data structures they work. Here's just a brief overview:
-
-  - You can request unchangeable information about the hardware, like name,
-    organization of the screen memory (planes, packed pixels, ...) and address
-    and length of the screen memory.
-
-  - You can request and change variable information about the hardware, like
-    visible and virtual geometry, depth, color map format, timing, and so on.
-    If you try to change that information, the driver maybe will round up some
-    values to meet the hardware's capabilities (or return EINVAL if that isn't
-    possible).
-
-  - You can get and set parts of the color map. Communication is done with 16
-    bits per color part (red, green, blue, transparency) to support all 
-    existing hardware. The driver does all the computations needed to apply 
-    it to the hardware (round it down to less bits, maybe throw away 
-    transparency).
-
-All this hardware abstraction makes the implementation of application programs
-easier and more portable. E.g. the X server works completely on /dev/fb* and
-thus doesn't need to know, for example, how the color registers of the concrete
-hardware are organized. XF68_FBDev is a general X server for bitmapped,
-unaccelerated video hardware. The only thing that has to be built into
-application programs is the screen organization (bitplanes or chunky pixels
-etc.), because it works on the frame buffer image data directly.
-
-For the future it is planned that frame buffer drivers for graphics cards and
-the like can be implemented as kernel modules that are loaded at runtime. Such
-a driver just has to call register_framebuffer() and supply some functions.
-Writing and distributing such drivers independently from the kernel will save
-much trouble...
-
-
-3. Frame Buffer Resolution Maintenance
---------------------------------------
-
-Frame buffer resolutions are maintained using the utility `fbset'. It can
-change the video mode properties of a frame buffer device. Its main usage is
-to change the current video mode, e.g. during boot up in one of your /etc/rc.*
-or /etc/init.d/* files.
-
-Fbset uses a video mode database stored in a configuration file, so you can
-easily add your own modes and refer to them with a simple identifier.
-
-
-4. The X Server
----------------
-
-The X server (XF68_FBDev) is the most notable application program for the frame
-buffer device. Starting with XFree86 release 3.2, the X server is part of
-XFree86 and has 2 modes:
-
-  - If the `Display' subsection for the `fbdev' driver in the /etc/XF86Config
-    file contains a
-
-	Modes "default"
-
-    line, the X server will use the scheme discussed above, i.e. it will start
-    up in the resolution determined by /dev/fb0 (or $FRAMEBUFFER, if set). You
-    still have to specify the color depth (using the Depth keyword) and virtual
-    resolution (using the Virtual keyword) though. This is the default for the
-    configuration file supplied with XFree86. It's the most simple
-    configuration, but it has some limitations.
-
-  - Therefore it's also possible to specify resolutions in the /etc/XF86Config
-    file. This allows for on-the-fly resolution switching while retaining the
-    same virtual desktop size. The frame buffer device that's used is still
-    /dev/fb0current (or $FRAMEBUFFER), but the available resolutions are
-    defined by /etc/XF86Config now. The disadvantage is that you have to
-    specify the timings in a different format (but `fbset -x' may help).
-
-To tune a video mode, you can use fbset or xvidtune. Note that xvidtune doesn't
-work 100% with XF68_FBDev: the reported clock values are always incorrect.
-
-
-5. Video Mode Timings
----------------------
-
-A monitor draws an image on the screen by using an electron beam (3 electron
-beams for color models, 1 electron beam for monochrome monitors). The front of
-the screen is covered by a pattern of colored phosphors (pixels). If a phosphor
-is hit by an electron, it emits a photon and thus becomes visible.
-
-The electron beam draws horizontal lines (scanlines) from left to right, and
-from the top to the bottom of the screen. By modifying the intensity of the
-electron beam, pixels with various colors and intensities can be shown.
-
-After each scanline the electron beam has to move back to the left side of the
-screen and to the next line: this is called the horizontal retrace. After the
-whole screen (frame) was painted, the beam moves back to the upper left corner:
-this is called the vertical retrace. During both the horizontal and vertical
-retrace, the electron beam is turned off (blanked).
-
-The speed at which the electron beam paints the pixels is determined by the
-dotclock in the graphics board. For a dotclock of e.g. 28.37516 MHz (millions
-of cycles per second), each pixel is 35242 ps (picoseconds) long:
-
-    1/(28.37516E6 Hz) = 35.242E-9 s
-
-If the screen resolution is 640x480, it will take
-
-    640*35.242E-9 s = 22.555E-6 s
-
-to paint the 640 (xres) pixels on one scanline. But the horizontal retrace
-also takes time (e.g. 272 `pixels'), so a full scanline takes
-
-    (640+272)*35.242E-9 s = 32.141E-6 s
-
-We'll say that the horizontal scanrate is about 31 kHz:
-
-    1/(32.141E-6 s) = 31.113E3 Hz
-
-A full screen counts 480 (yres) lines, but we have to consider the vertical
-retrace too (e.g. 49 `lines'). So a full screen will take
-
-    (480+49)*32.141E-6 s = 17.002E-3 s
-
-The vertical scanrate is about 59 Hz:
-
-    1/(17.002E-3 s) = 58.815 Hz
-
-This means the screen data is refreshed about 59 times per second. To have a
-stable picture without visible flicker, VESA recommends a vertical scanrate of
-at least 72 Hz. But the perceived flicker is very human dependent: some people
-can use 50 Hz without any trouble, while I'll notice if it's less than 80 Hz.
-
-Since the monitor doesn't know when a new scanline starts, the graphics board
-will supply a synchronization pulse (horizontal sync or hsync) for each
-scanline.  Similarly it supplies a synchronization pulse (vertical sync or
-vsync) for each new frame. The position of the image on the screen is
-influenced by the moments at which the synchronization pulses occur.
-
-The following picture summarizes all timings. The horizontal retrace time is
-the sum of the left margin, the right margin and the hsync length, while the
-vertical retrace time is the sum of the upper margin, the lower margin and the
-vsync length.
-
-  +----------+---------------------------------------------+----------+-------+
-  |          |                ↑                            |          |       |
-  |          |                |upper_margin                |          |       |
-  |          |                ↓                            |          |       |
-  +----------###############################################----------+-------+
-  |          #                ↑                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |   left   #                |                            #  right   | hsync |
-  |  margin  #                |       xres                 #  margin  |  len  |
-  |<-------->#<---------------+--------------------------->#<-------->|<----->|
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |yres                        #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                |                            #          |       |
-  |          #                ↓                            #          |       |
-  +----------###############################################----------+-------+
-  |          |                ↑                            |          |       |
-  |          |                |lower_margin                |          |       |
-  |          |                ↓                            |          |       |
-  +----------+---------------------------------------------+----------+-------+
-  |          |                ↑                            |          |       |
-  |          |                |vsync_len                   |          |       |
-  |          |                ↓                            |          |       |
-  +----------+---------------------------------------------+----------+-------+
-
-The frame buffer device expects all horizontal timings in number of dotclocks
-(in picoseconds, 1E-12 s), and vertical timings in number of scanlines.
-
-
-6. Converting XFree86 timing values info frame buffer device timings
---------------------------------------------------------------------
-
-An XFree86 mode line consists of the following fields:
- "800x600"     50      800  856  976 1040    600  637  643  666
- < name >     DCF       HR  SH1  SH2  HFL     VR  SV1  SV2  VFL
-
-The frame buffer device uses the following fields:
-
-  - pixclock: pixel clock in ps (pico seconds)
-  - left_margin: time from sync to picture
-  - right_margin: time from picture to sync
-  - upper_margin: time from sync to picture
-  - lower_margin: time from picture to sync
-  - hsync_len: length of horizontal sync
-  - vsync_len: length of vertical sync
-
-1) Pixelclock:
-   xfree: in MHz
-   fb: in picoseconds (ps)
-
-   pixclock = 1000000 / DCF
-
-2) horizontal timings:
-   left_margin = HFL - SH2
-   right_margin = SH1 - HR
-   hsync_len = SH2 - SH1
-
-3) vertical timings:
-   upper_margin = VFL - SV2
-   lower_margin = SV1 - VR
-   vsync_len = SV2 - SV1
-
-Good examples for VESA timings can be found in the XFree86 source tree,
-under "xc/programs/Xserver/hw/xfree86/doc/modeDB.txt".
-
-
-7. References
--------------
-
-For more specific information about the frame buffer device and its
-applications, please refer to the Linux-fbdev website:
-
-    http://linux-fbdev.sourceforge.net/
-
-and to the following documentation:
-
-  - The manual pages for fbset: fbset(8), fb.modes(5)
-  - The manual pages for XFree86: XF68_FBDev(1), XF86Config(4/5)
-  - The mighty kernel sources:
-      o linux/drivers/video/
-      o linux/include/linux/fb.h
-      o linux/include/video/
-
-
-
-8. Mailing list
----------------
-
-There is a frame buffer device related mailing list at kernel.org:
-linux-fbdev@vger.kernel.org.
-
-Point your web browser to http://sourceforge.net/projects/linux-fbdev/ for
-subscription information and archive browsing.
-
-
-9. Downloading
---------------
-
-All necessary files can be found at
-
-    ftp://ftp.uni-erlangen.de/pub/Linux/LOCAL/680x0/
-
-and on its mirrors.
-
-The latest version of fbset can be found at
-
-    http://www.linux-fbdev.org/ 
-
-  
-10. Credits                                                       
-----------                                                       
-                
-This readme was written by Geert Uytterhoeven, partly based on the original
-`X-framebuffer.README' by Roman Hodek and Martin Schaller. Section 6 was
-provided by Frank Neumann.
-
-The frame buffer device abstraction was designed by Martin Schaller.
diff --git a/Documentation/fb/gxfb.rst b/Documentation/fb/gxfb.rst
new file mode 100644
index 000000000000..5738709bccbb
--- /dev/null
+++ b/Documentation/fb/gxfb.rst
@@ -0,0 +1,54 @@
+=============
+What is gxfb?
+=============
+
+.. [This file is cloned from VesaFB/aty128fb]
+
+This is a graphics framebuffer driver for AMD Geode GX2 based processors.
+
+Advantages:
+
+ * No need to use AMD's VSA code (or other VESA emulation layer) in the
+   BIOS.
+ * It provides a nice large console (128 columns by 48 lines at 1024x768)
+   without using tiny, unreadable fonts.
+ * You can run XF68_FBDev on top of /dev/fb0
+ * Most important: boot logo :-)
+
+Disadvantages:
+
+ * graphic mode is slower than text mode...
+
+
+How to use it?
+==============
+
+Switch modes with the gxfb.mode_option=<resolution>... boot parameter or
+with the `fbset` program.
+
+See Documentation/fb/modedb.rst for more information on modedb
+resolutions.
+
+
+X11
+===
+
+XF68_FBDev should generally work fine, but it is non-accelerated.
+
+
+Configuration
+=============
+
+You can pass kernel command line options to gxfb with gxfb.<option>.
+For example, gxfb.mode_option=800x600@75.
+Accepted options:
+
+================ ==================================================
+mode_option	 specify the video mode.  Of the form
+		 <x>x<y>[-<bpp>][@<refresh>]
+vram		 size of video RAM (normally auto-detected)
+vt_switch	 enable VT switching during suspend/resume.  The VT
+		 switch is slow, but harmless.
+================ ==================================================
+
+Andres Salomon <dilinger@debian.org>
diff --git a/Documentation/fb/gxfb.txt b/Documentation/fb/gxfb.txt
deleted file mode 100644
index 2f640903bbb2..000000000000
--- a/Documentation/fb/gxfb.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-[This file is cloned from VesaFB/aty128fb]
-
-What is gxfb?
-=================
-
-This is a graphics framebuffer driver for AMD Geode GX2 based processors.
-
-Advantages:
-
- * No need to use AMD's VSA code (or other VESA emulation layer) in the
-   BIOS.
- * It provides a nice large console (128 cols + 48 lines with 1024x768)
-   without using tiny, unreadable fonts.
- * You can run XF68_FBDev on top of /dev/fb0
- * Most important: boot logo :-)
-
-Disadvantages:
-
- * graphic mode is slower than text mode...
-
-
-How to use it?
-==============
-
-Switching modes is done using  gxfb.mode_option=<resolution>... boot
-parameter or using `fbset' program.
-
-See Documentation/fb/modedb.txt for more information on modedb
-resolutions.
-
-
-X11
-===
-
-XF68_FBDev should generally work fine, but it is non-accelerated.
-
-
-Configuration
-=============
-
-You can pass kernel command line options to gxfb with gxfb.<option>.
-For example, gxfb.mode_option=800x600@75.
-Accepted options:
-
-mode_option	- specify the video mode.  Of the form
-		  <x>x<y>[-<bpp>][@<refresh>]
-vram		- size of video ram (normally auto-detected)
-vt_switch	- enable vt switching during suspend/resume.  The vt
-		  switch is slow, but harmless.
-
---
-Andres Salomon <dilinger@debian.org>
diff --git a/Documentation/fb/index.rst b/Documentation/fb/index.rst
new file mode 100644
index 000000000000..d47313714635
--- /dev/null
+++ b/Documentation/fb/index.rst
@@ -0,0 +1,50 @@
+:orphan:
+
+============
+Frame Buffer
+============
+
+.. toctree::
+    :maxdepth: 1
+
+    api
+    arkfb
+    aty128fb
+    cirrusfb
+    cmap_xfbdev
+    deferred_io
+    efifb
+    ep93xx-fb
+    fbcon
+    framebuffer
+    gxfb
+    intel810
+    intelfb
+    internals
+    lxfb
+    matroxfb
+    metronomefb
+    modedb
+    pvr2fb
+    pxafb
+    s3fb
+    sa1100fb
+    sh7760fb
+    sisfb
+    sm501
+    sm712fb
+    sstfb
+    tgafb
+    tridentfb
+    udlfb
+    uvesafb
+    vesafb
+    viafb
+    vt8623fb
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/fb/intel810.rst b/Documentation/fb/intel810.rst
new file mode 100644
index 000000000000..eb86098db91f
--- /dev/null
+++ b/Documentation/fb/intel810.rst
@@ -0,0 +1,287 @@
+================================
+Intel 810/815 Framebuffer driver
+================================
+
+Tony Daplas <adaplas@pol.net>
+
+http://i810fb.sourceforge.net
+
+March 17, 2002
+
+First Released: July 2001
+Last Update:    September 12, 2005
+
+A. Introduction
+===============
+
+	This is a framebuffer driver for various Intel 810/815 compatible
+	graphics devices.  These include:
+
+	- Intel 810
+	- Intel 810E
+	- Intel 810-DC100
+	- Intel 815 Internal graphics only, 100 MHz FSB
+	- Intel 815 Internal graphics only
+	- Intel 815 Internal graphics and AGP
+
+B.  Features
+============
+
+	- Choice of using Discrete Video Timings, VESA Generalized Timing
+	  Formula, or a framebuffer specific database to set the video mode
+
+	- Supports a variable range of horizontal and vertical resolution and
+	  vertical refresh rates if the VESA Generalized Timing Formula is
+	  enabled.
+
+	- Supports color depths of 8, 16, 24 and 32 bits per pixel
+
+	- Supports pseudocolor, directcolor, or truecolor visuals
+
+	- Full and optimized hardware acceleration at 8, 16 and 24 bpp
+
+	- Robust video state save and restore
+
+	- MTRR support
+
+	- Utilizes user-entered monitor specifications to automatically
+	  calculate required video mode parameters.
+
+	- Can run concurrently with XFree86 using the native i810 drivers
+
+	- Hardware Cursor Support
+
+	- Supports EDID probing either by DDC/I2C or through the BIOS
+
+C.  List of available options
+=============================
+
+   a. "video=i810fb"
+	enables the i810 driver
+
+	Recommendation: required
+
+   b. "xres:<value>"
+	select horizontal resolution in pixels. (This parameter will be
+	ignored if 'mode_option' is specified.  See 'o' below).
+
+	Recommendation: user preference
+	(default = 640)
+
+   c. "yres:<value>"
+	select vertical resolution in scanlines. If Discrete Video Timings
+	is enabled, this will be ignored and computed as 3*xres/4.  (This
+	parameter will be ignored if 'mode_option' is specified.  See 'o'
+	below)
+
+	Recommendation: user preference
+	(default = 480)
+
+   d. "vyres:<value>"
+	select virtual vertical resolution in scanlines. If 0 or no value
+	is specified, this will be computed from the maximum available memory.
+
+	Recommendation: do not set
+	(default = 480)
+
+   e. "vram:<value>"
+	select amount of system RAM in MB to allocate for the video memory
+
+	Recommendation: 1 - 4 MB.
+	(default = 4)
+
+   f. "bpp:<value>"
+	select desired pixel depth
+
+	Recommendation: 8
+	(default = 8)
+
+   g. "hsync1/hsync2:<value>"
+	select the minimum and maximum Horizontal Sync Frequency of the
+	monitor in kHz.  If using a fixed frequency monitor, hsync1 must
+	be equal to hsync2. If EDID probing is successful, these will be
+	ignored and values will be taken from the EDID block.
+
+	Recommendation: check monitor manual for correct values
+	(default = 29/30)
+
+   h. "vsync1/vsync2:<value>"
+	select the minimum and maximum Vertical Sync Frequency of the monitor
+	in Hz. You can also use this option to lock your monitor's refresh
+	rate. If EDID probing is successful, these will be ignored and values
+	will be taken from the EDID block.
+
+	Recommendation: check monitor manual for correct values
+	(default = 60/60)
+
+	IMPORTANT:  If you need to clamp your timings, try to give some
+	leeway for computational errors (over/underflows).  Example: if
+	using vsync1/vsync2 = 60/60, make sure hsync1/hsync2 has at least
+	a 1 unit difference, and vice versa.
+
+   i. "voffset:<value>"
+	select at what offset in MB of the logical memory to allocate the
+	framebuffer memory.  The intent is to avoid the memory blocks
+	used by standard graphics applications (XFree86).  The default
+	offset (16 MB for a 64 MB aperture, 8 MB for a 32 MB aperture) will
+	avoid XFree86's usage and allow up to 7 MB/15 MB of framebuffer
+	memory.  Depending on your usage, adjust the value up or down
+	(0 for maximum usage, 31/63 MB for the least amount).  Note, an
+	arbitrary setting may conflict with XFree86.
+
+	Recommendation: do not set
+	(default = 8 or 16 MB)
+
+   j. "accel"
+	enable text acceleration.  This can be enabled/disabled anytime
+	by using 'fbset -accel true/false'.
+
+	Recommendation: enable
+	(default = not set)
+
+   k. "mtrr"
+	enable MTRR.  This allows data transfers to the framebuffer memory
+	to occur in bursts which can significantly increase performance.
+	Not very helpful with the i810/i815 because of 'shared memory'.
+
+	Recommendation: do not set
+	(default = not set)
+
+   l. "extvga"
+	if specified, secondary/external VGA output will always be enabled.
+	Useful if the BIOS turns off the VGA port when no monitor is attached.
+	The external VGA monitor can then be attached without rebooting.
+
+	Recommendation: do not set
+	(default = not set)
+
+   m. "sync"
+	Forces the hardware engine to do a "sync" or wait for the hardware
+	to finish before starting another instruction. This will produce a
+	more stable setup, but will be slower.
+
+	Recommendation: do not set
+	(default = not set)
+
+   n. "dcolor"
+	Use directcolor visual instead of truecolor for pixel depths greater
+	than 8 bpp.  Useful for color tuning, such as gamma control.
+
+	Recommendation: do not set
+	(default = not set)
+
+   o. <xres>x<yres>[-<bpp>][@<refresh>]
+	The driver also accepts a boot mode option in this form.  If this
+	is specified, the options 'xres' and 'yres' will be ignored. See
+	Documentation/fb/modedb.rst for usage.
+
+D. Kernel booting
+=================
+
+Separate each option/option-pair by commas (,) and the option from its value
+with a colon (:) as in the following::
+
+	video=i810fb:option1,option2:value2
+
+Sample Usage
+------------
+
+In /etc/lilo.conf, add the line::
+
+  append="video=i810fb:vram:2,xres:1024,yres:768,bpp:8,hsync1:30,hsync2:55, \
+	  vsync1:50,vsync2:85,accel,mtrr"
+
+This will initialize the framebuffer to 1024x768 at 8bpp.  The framebuffer
+will use 2 MB of System RAM. MTRR support will be enabled. The refresh rate
+will be computed based on the hsync1/hsync2 and vsync1/vsync2 values.
+
+IMPORTANT:
+  You must include hsync1, hsync2, vsync1 and vsync2 to enable video modes
+  better than 640x480 at 60Hz. HOWEVER, if your chipset/display combination
+  supports I2C and has an EDID block, you can safely exclude hsync1, hsync2,
+  vsync1 and vsync2 parameters.  These parameters will be taken from the EDID
+  block.
+
+E.  Module options
+==================
+
+The module parameters are essentially similar to the kernel
+parameters. The main difference is that you need to include a Boolean value
+(1 for TRUE, and 0 for FALSE) for those options which don't need a value.
+
+For example, to enable MTRR, include "mtrr=1".
+
+Sample Usage
+------------
+
+Using the same setup as described above, load the module like this::
+
+	modprobe i810fb vram=2 xres=1024 bpp=8 hsync1=30 hsync2=55 vsync1=50 \
+		 vsync2=85 accel=1 mtrr=1
+
+Or just add the following to a configuration file in /etc/modprobe.d/::
+
+	options i810fb vram=2 xres=1024 bpp=16 hsync1=30 hsync2=55 vsync1=50 \
+	vsync2=85 accel=1 mtrr=1
+
+and just do a::
+
+	modprobe i810fb
+
+
+F.  Setup
+=========
+
+	a. Configure the kernel using your usual method:
+
+	   make menuconfig/xconfig/config
+
+	b. Under "Code maturity level options" enable "Prompt for development
+	   and/or incomplete code/drivers".
+
+	c. Enable agpgart support for the Intel 810/815 on-board graphics.
+	   This is required.  The option is under "Character Devices".
+
+	d. Under "Graphics Support", select "Intel 810/815" either statically
+	   or as a module.  Choose "use VESA Generalized Timing Formula" if
+	   you need to maximize the capability of your display.  To be on the
+	   safe side, you can leave this unselected.
+
+	e. If you want support for DDC/I2C probing (Plug and Play Displays),
+	   set 'Enable DDC Support' to 'y'. To make this option appear, set
+	   'use VESA Generalized Timing Formula' to 'y'.
+
+	f. If you want a framebuffer console, enable it under "Console
+	   Drivers".
+
+	g. Compile your kernel.
+
+	h. Load the driver as described in sections D and E.
+
+	i.  Try DirectFB (http://www.directfb.org) with the i810 gfxdriver
+	    patch to see the chipset in action (or inaction :-).
+
+G.  Acknowledgment:
+===================
+
+	1.  Geert Uytterhoeven - his excellent howto and the virtual
+	    framebuffer driver code made this possible.
+
+	2.  Jeff Hartmann for his agpgart code.
+
+	3.  The X developers.  Insights were provided just by reading the
+	    XFree86 source code.
+
+	4.  Intel(c).  For this value-oriented chipset driver and for
+	    providing documentation.
+
+	5. Matt Sottek.  His input and ideas helped make some
+	   optimizations possible.
+
+H.  Home Page:
+==============
+
+	More complete, and probably more up-to-date, information is
+	available at http://i810fb.sourceforge.net.
+
+Tony
diff --git a/Documentation/fb/intel810.txt b/Documentation/fb/intel810.txt
deleted file mode 100644
index a8e9f5bca6f3..000000000000
--- a/Documentation/fb/intel810.txt
+++ /dev/null
@@ -1,278 +0,0 @@
-Intel 810/815 Framebuffer driver
- 	Tony Daplas <adaplas@pol.net>
-	http://i810fb.sourceforge.net
-
-	March 17, 2002
-
-	First Released: July 2001
-	Last Update:    September 12, 2005
-================================================================
-
-A. Introduction
-
-	This is a framebuffer driver for various Intel 810/815 compatible
-	graphics devices.  These include:
-
-	Intel 810
-	Intel 810E
-	Intel 810-DC100
-	Intel 815 Internal graphics only, 100Mhz FSB
-	Intel 815 Internal graphics only
-	Intel 815 Internal graphics and AGP
-
-B.  Features
-
-	- Choice of using Discrete Video Timings, VESA Generalized Timing
-	  Formula, or a framebuffer specific database to set the video mode
-
-	- Supports a variable range of horizontal and vertical resolution and
-	  vertical refresh rates if the VESA Generalized Timing Formula is
-	  enabled.
-
-	- Supports color depths of 8, 16, 24 and 32 bits per pixel
-
-	- Supports pseudocolor, directcolor, or truecolor visuals
-
-	- Full and optimized hardware acceleration at 8, 16 and 24 bpp
-
-	- Robust video state save and restore
-
-	- MTRR support
-
-	- Utilizes user-entered monitor specifications to automatically
-	  calculate required video mode parameters.
-
-	- Can concurrently run with xfree86 running with native i810 drivers
-
-	- Hardware Cursor Support
- 
-	- Supports EDID probing either by DDC/I2C or through the BIOS
-
-C.  List of available options
-
-   a. "video=i810fb"
-	enables the i810 driver
-
-	Recommendation: required
-
-   b. "xres:<value>"
-	select horizontal resolution in pixels. (This parameter will be
-	ignored if 'mode_option' is specified.  See 'o' below).
-
-	Recommendation: user preference
-	(default = 640)
-
-   c. "yres:<value>"
-	select vertical resolution in scanlines. If Discrete Video Timings
-	is enabled, this will be ignored and computed as 3*xres/4.  (This
-	parameter will be ignored if 'mode_option' is specified.  See 'o'
-	below)
-
-	Recommendation: user preference
-	(default = 480)
-
-   d. "vyres:<value>"
-	select virtual vertical resolution in scanlines. If (0) or none
-	is specified, this will be computed against maximum available memory.
-
-	Recommendation: do not set
-	(default = 480)
-
-   e. "vram:<value>"
-	select amount of system RAM in MB to allocate for the video memory
-
-	Recommendation: 1 - 4 MB.
-	(default = 4)
-
-   f. "bpp:<value>"
-	select desired pixel depth
-
-	Recommendation: 8
-	(default = 8)
-
-   g. "hsync1/hsync2:<value>"
-	select the minimum and maximum Horizontal Sync Frequency of the
-	monitor in kHz.  If using a fixed frequency monitor, hsync1 must
-	be equal to hsync2. If EDID probing is successful, these will be
-	ignored and values will be taken from the EDID block.
-
-	Recommendation: check monitor manual for correct values
-	(default = 29/30)
-
-   h. "vsync1/vsync2:<value>"
-	select the minimum and maximum Vertical Sync Frequency of the monitor
-	in Hz. You can also use this option to lock your monitor's refresh
-	rate. If EDID probing is successful, these will be ignored and values
-	will be taken from the EDID block.
-
-	Recommendation: check monitor manual for correct values
-	(default = 60/60)
-
-	IMPORTANT:  If you need to clamp your timings, try to give some
-	leeway for computational errors (over/underflows).  Example: if
-	using vsync1/vsync2 = 60/60, make sure hsync1/hsync2 has at least
-	a 1 unit difference, and vice versa.
-
-   i. "voffset:<value>"
-	select at what offset in MB of the logical memory to allocate the
-	framebuffer memory.  The intent is to avoid the memory blocks
-	used by standard graphics applications (XFree86).  The default
-	offset (16 MB for a 64 MB aperture, 8 MB for a 32 MB aperture) will
-	avoid XFree86's usage and allows up to 7 MB/15 MB of framebuffer
-	memory.  Depending on your usage, adjust the value up or down
-	(0 for maximum usage, 31/63 MB for the least amount).  Note, an
-	arbitrary setting may conflict with XFree86.
-
-	Recommendation: do not set
-	(default = 8 or 16 MB)
-
-   j. "accel"
-	enable text acceleration.  This can be enabled/reenabled anytime
-	by using 'fbset -accel true/false'.
-
-	Recommendation: enable
-	(default = not set)
-
-   k. "mtrr"
-	enable MTRR.  This allows data transfers to the framebuffer memory
-	to occur in bursts which can significantly increase performance.
-	Not very helpful with the i810/i815 because of 'shared memory'.
-
-	Recommendation: do not set
-	(default = not set)
-
-   l. "extvga"
-	if specified, secondary/external VGA output will always be enabled.
-	Useful if the BIOS turns off the VGA port when no monitor is attached.
-	The external VGA monitor can then be attached without rebooting.
-
-	Recommendation: do not set
-	(default = not set)
-
-   m. "sync"
-	Forces the hardware engine to do a "sync" or wait for the hardware
-	to finish before starting another instruction. This will produce a
-	more stable setup, but will be slower.
-
-	Recommendation: do not set
-	(default = not set)
-
-   n. "dcolor"
-        Use directcolor visual instead of truecolor for pixel depths greater
-	than 8 bpp.  Useful for color tuning, such as gamma control.
-
-	Recommendation: do not set
-	(default = not set)
-
-   o. <xres>x<yres>[-<bpp>][@<refresh>]
-	The driver will now accept specification of boot mode option.  If this
-	is specified, the options 'xres' and 'yres' will be ignored. See
-	Documentation/fb/modedb.txt for usage.
-
-D. Kernel booting
-
-Separate each option/option-pair by commas (,) and the option from its value
-with a colon (:) as in the following:
-
-video=i810fb:option1,option2:value2
-
-Sample Usage
-------------
-
-In /etc/lilo.conf, add the line:
-
-append="video=i810fb:vram:2,xres:1024,yres:768,bpp:8,hsync1:30,hsync2:55, \
-        vsync1:50,vsync2:85,accel,mtrr"
-
-This will initialize the framebuffer to 1024x768 at 8bpp.  The framebuffer
-will use 2 MB of System RAM. MTRR support will be enabled. The refresh rate
-will be computed based on the hsync1/hsync2 and vsync1/vsync2 values.
-
-IMPORTANT:
-You must include hsync1, hsync2, vsync1 and vsync2 to enable video modes
-better than 640x480 at 60Hz. HOWEVER, if your chipset/display combination
-supports I2C and has an EDID block, you can safely exclude hsync1, hsync2,
-vsync1 and vsync2 parameters.  These parameters will be taken from the EDID
-block.
-
-E.  Module options
-
-The module parameters are essentially similar to the kernel
-parameters. The main difference is that you need to include a Boolean value
-(1 for TRUE, and 0 for FALSE) for those options which don't need a value.
-
-Example, to enable MTRR, include "mtrr=1".
-
-Sample Usage
-------------
-
-Using the same setup as described above, load the module like this:
-
-	modprobe i810fb vram=2 xres=1024 bpp=8 hsync1=30 hsync2=55 vsync1=50 \
-	         vsync2=85 accel=1 mtrr=1
-
-Or just add the following to a configuration file in /etc/modprobe.d/
-
-	options i810fb vram=2 xres=1024 bpp=16 hsync1=30 hsync2=55 vsync1=50 \
-	vsync2=85 accel=1 mtrr=1
-
-and just do a
-
-	modprobe i810fb
-
-
-F.  Setup
-
-	a. Do your usual method of configuring the kernel.
-
-	make menuconfig/xconfig/config
-
-	b. Under "Code maturity level options" enable "Prompt for development
-	   and/or incomplete code/drivers".
-
- 	c. Enable agpgart support for the Intel 810/815 on-board graphics.
-	   This is required.  The option is under "Character Devices".
-
-	d. Under "Graphics Support", select "Intel 810/815" either statically
-	   or as a module.  Choose "use VESA Generalized Timing Formula" if
-	   you need to maximize the capability of your display.  To be on the
-	   safe side, you can leave this unselected.
-
-	e. If you want support for DDC/I2C probing (Plug and Play Displays),
-	   set 'Enable DDC Support' to 'y'. To make this option appear, set
-	   'use VESA Generalized Timing Formula' to 'y'.
-
-        f. If you want a framebuffer console, enable it under "Console
-	   Drivers".
-
-	g. Compile your kernel.
-
-	h. Load the driver as described in sections D and E.
-
-	i.  Try the DirectFB (http://www.directfb.org) + the i810 gfxdriver
-	    patch to see the chipset in action (or inaction :-).
-
-G.  Acknowledgment:
-
-	1.  Geert Uytterhoeven - his excellent howto and the virtual
-	    framebuffer driver code made this possible.
-
-	2.  Jeff Hartmann for his agpgart code.
-
-	3.  The X developers.  Insights were provided just by reading the
-	    XFree86 source code.
-
-	4.  Intel(c).  For this value-oriented chipset driver and for
-	    providing documentation.
-
-	5. Matt Sottek.  His inputs and ideas  helped in making some
-	   optimizations possible.
-
-H.  Home Page:
-
-	A more complete, and probably updated information is provided at
-	http://i810fb.sourceforge.net.
-
-###########################
-Tony
-
diff --git a/Documentation/fb/intelfb.rst b/Documentation/fb/intelfb.rst
new file mode 100644
index 000000000000..e2d0903f4efb
--- /dev/null
+++ b/Documentation/fb/intelfb.rst
@@ -0,0 +1,155 @@
+=============================================================
+Intel 830M/845G/852GM/855GM/865G/915G/945G Framebuffer driver
+=============================================================
+
+A. Introduction
+===============
+
+This is a framebuffer driver for various Intel 8xx/9xx compatible
+graphics devices.  These include:
+
+	- Intel 830M
+	- Intel 845G
+	- Intel 852GM
+	- Intel 855GM
+	- Intel 865G
+	- Intel 915G
+	- Intel 915GM
+	- Intel 945G
+	- Intel 945GM
+	- Intel 945GME
+	- Intel 965G
+	- Intel 965GM
+
+B.  List of available options
+=============================
+
+   a. "video=intelfb"
+	enables the intelfb driver
+
+	Recommendation: required
+
+   b. "mode=<xres>x<yres>[-<bpp>][@<refresh>]"
+	select mode
+
+	Recommendation: user preference
+	(default = 1024x768-32@70)
+
+   c. "vram=<value>"
+	select amount of system RAM in MB to allocate for the video memory
+	if not enough RAM was already allocated by the BIOS.
+
+	Recommendation: 1 - 4 MB.
+	(default = 4 MB)
+
+   d. "voffset=<value>"
+	select at what offset in MB of the logical memory to allocate the
+	framebuffer memory.  The intent is to avoid the memory blocks
+	used by standard graphics applications (XFree86). Depending on your
+	usage, adjust the value up or down, (0 for maximum usage, 63/127 MB
+	for the least amount).  Note, an arbitrary setting may conflict
+	with XFree86.
+
+	Recommendation: do not set
+	(default = 48 MB)
+
+   e. "accel"
+	enable text acceleration.  This can be enabled or disabled at any time
+	by using 'fbset -accel true/false'.
+
+	Recommendation: enable
+	(default = set)
+
+   f. "hwcursor"
+	enable cursor acceleration.
+
+	Recommendation: enable
+	(default = set)
+
+   g. "mtrr"
+	enable MTRR.  This allows data transfers to the framebuffer memory
+	to occur in bursts which can significantly increase performance.
+	Not very helpful with the intel chips because of 'shared memory'.
+
+	Recommendation: set
+	(default = set)
+
+   h. "fixed"
+	disable mode switching.
+
+	Recommendation: do not set
+	(default = not set)
+
+   The binary parameters can be unset with a "no" prefix, for example "noaccel".
+   The default parameter (not named) is the mode.
+
+C. Kernel booting
+=================
+
+Separate each option/option-pair by commas (,) and the option from its value
+with an equals sign (=) as in the following::
+
+	video=intelfb:option1,option2=value2
+
+Sample Usage
+------------
+
+In /etc/lilo.conf, add the line::
+
+	append="video=intelfb:mode=800x600-32@75,accel,hwcursor,vram=8"
+
+This will initialize the framebuffer to 800x600 at 32bpp and 75Hz. The
+framebuffer will use 8 MB of system RAM. Hardware acceleration of text and
+cursor will be enabled.
+
+Remarks
+-------
+
+If setting this parameter doesn't work (you stay in an 80x25 text mode),
+you might need to set the "vga=<mode>" parameter too - see vesafb.rst
+in this directory.
+
+
+D.  Module options
+==================
+
+The module parameters are essentially similar to the kernel
+parameters. The main difference is that you need to include a Boolean value
+(1 for TRUE, and 0 for FALSE) for those options which don't need a value.
+
+For example, to enable MTRR, include "mtrr=1".
+
+Sample Usage
+------------
+
+Using the same setup as described above, load the module like this::
+
+	modprobe intelfb mode=800x600-32@75 vram=8 accel=1 hwcursor=1
+
+Or just add the following to a configuration file in /etc/modprobe.d/::
+
+	options intelfb mode=800x600-32@75 vram=8 accel=1 hwcursor=1
+
+and just do a::
+
+	modprobe intelfb
+
+
+E.  Acknowledgment:
+===================
+
+	1.  Geert Uytterhoeven - his excellent howto and the virtual
+	    framebuffer driver code made this possible.
+
+	2.  Jeff Hartmann for his agpgart code.
+
+	3.  David Dawes for his original kernel 2.4 code.
+
+	4.  The X developers.  Insights were provided just by reading the
+	    XFree86 source code.
+
+	5.  Antonino A. Daplas for his inspiring i810fb driver.
+
+	6.  Andrew Morton for his kernel patches maintenance.
+
+Sylvain
diff --git a/Documentation/fb/intelfb.txt b/Documentation/fb/intelfb.txt
deleted file mode 100644
index feac4e4d6968..000000000000
--- a/Documentation/fb/intelfb.txt
+++ /dev/null
@@ -1,149 +0,0 @@
-Intel 830M/845G/852GM/855GM/865G/915G/945G Framebuffer driver
-================================================================
-
-A. Introduction
-	This is a framebuffer driver for various Intel 8xx/9xx compatible
-graphics devices.  These would include:
-
-	Intel 830M
-	Intel 845G
-	Intel 852GM
-	Intel 855GM
-	Intel 865G
-	Intel 915G
-	Intel 915GM
-	Intel 945G
-	Intel 945GM
-	Intel 945GME
-	Intel 965G
-	Intel 965GM
-
-B.  List of available options
-
-   a. "video=intelfb"
-	enables the intelfb driver
-
-	Recommendation: required
-
-   b. "mode=<xres>x<yres>[-<bpp>][@<refresh>]"
-	select mode
-
-	Recommendation: user preference
-	(default = 1024x768-32@70)
-
-   c. "vram=<value>"
-	select amount of system RAM in MB to allocate for the video memory
-	if not enough RAM was already allocated by the BIOS.
-
-	Recommendation: 1 - 4 MB.
-	(default = 4 MB)
-
-   d. "voffset=<value>"
-        select at what offset in MB of the logical memory to allocate the
-	framebuffer memory.  The intent is to avoid the memory blocks
-	used by standard graphics applications (XFree86). Depending on your
-        usage, adjust the value up or down, (0 for maximum usage, 63/127 MB
-        for the least amount).  Note, an arbitrary setting may conflict
-        with XFree86.
-
-	Recommendation: do not set
-	(default = 48 MB)
-
-   e. "accel"
-	enable text acceleration.  This can be enabled/reenabled anytime
-	by using 'fbset -accel true/false'.
-
-	Recommendation: enable
-	(default = set)
-
-   f. "hwcursor"
-	enable cursor acceleration.
-
-	Recommendation: enable
-	(default = set)
-
-   g. "mtrr"
-	enable MTRR.  This allows data transfers to the framebuffer memory
-	to occur in bursts which can significantly increase performance.
-	Not very helpful with the intel chips because of 'shared memory'.
-
-	Recommendation: set
-	(default = set)
-
-   h. "fixed"
-	disable mode switching.
-
-	Recommendation: do not set
-	(default = not set)
-
-   The binary parameters can be unset with a "no" prefix, example "noaccel".
-   The default parameter (not named) is the mode.
-
-C. Kernel booting
-
-Separate each option/option-pair by commas (,) and the option from its value
-with an equals sign (=) as in the following:
-
-video=intelfb:option1,option2=value2
-
-Sample Usage
-------------
-
-In /etc/lilo.conf, add the line:
-
-append="video=intelfb:mode=800x600-32@75,accel,hwcursor,vram=8"
-
-This will initialize the framebuffer to 800x600 at 32bpp and 75Hz. The
-framebuffer will use 8 MB of System RAM. hw acceleration of text and cursor
-will be enabled.
-
-Remarks
--------
-
-If setting this parameter doesn't work (you stay in a 80x25 text-mode),
-you might need to set the "vga=<mode>" parameter too - see vesafb.txt
-in this directory.
-
-
-D.  Module options
-
-	The module parameters are essentially similar to the kernel
-parameters. The main difference is that you need to include a Boolean value
-(1 for TRUE, and 0 for FALSE) for those options which don't need a value.
-
-Example, to enable MTRR, include "mtrr=1".
-
-Sample Usage
-------------
-
-Using the same setup as described above, load the module like this:
-
-	modprobe intelfb mode=800x600-32@75 vram=8 accel=1 hwcursor=1
-
-Or just add the following to a configuration file in /etc/modprobe.d/
-
-	options intelfb mode=800x600-32@75 vram=8 accel=1 hwcursor=1
-
-and just do a
-
-	modprobe intelfb
-
-
-E.  Acknowledgment:
-
-	1.  Geert Uytterhoeven - his excellent howto and the virtual
-                                 framebuffer driver code made this possible.
-
-	2.  Jeff Hartmann for his agpgart code.
-
-	3.  David Dawes for his original kernel 2.4 code.
-
-	4.  The X developers.  Insights were provided just by reading the
-	    XFree86 source code.
-
-	5.  Antonino A. Daplas for his inspiring i810fb driver.
-
-	6.  Andrew Morton for his kernel patches maintenance.
-
-###########################
-Sylvain
diff --git a/Documentation/fb/internals.rst b/Documentation/fb/internals.rst
new file mode 100644
index 000000000000..696b50aa7c24
--- /dev/null
+++ b/Documentation/fb/internals.rst
@@ -0,0 +1,86 @@
+=============================
+Frame Buffer device internals
+=============================
+
+This is a first start for some documentation about frame buffer device
+internals.
+
+Authors:
+
+- Geert Uytterhoeven <geert@linux-m68k.org>, 21 July 1998
+- James Simmons <jsimmons@user.sf.net>, Nov 26 2002
+
+--------------------------------------------------------------------------------
+
+Structures used by the frame buffer device API
+==============================================
+
+The following structures play a role in the game of frame buffer devices. They
+are defined in <linux/fb.h>.
+
+1. Outside the kernel (user space)
+
+  - struct fb_fix_screeninfo
+
+    Device independent unchangeable information about a frame buffer device and
+    a specific video mode. This can be obtained using the FBIOGET_FSCREENINFO
+    ioctl.
+
+  - struct fb_var_screeninfo
+
+    Device independent changeable information about a frame buffer device and a
+    specific video mode. This can be obtained using the FBIOGET_VSCREENINFO
+    ioctl, and updated with the FBIOPUT_VSCREENINFO ioctl. If you want to pan
+    the screen only, you can use the FBIOPAN_DISPLAY ioctl.
+
+  - struct fb_cmap
+
+    Device independent colormap information. You can get and set the colormap
+    using the FBIOGETCMAP and FBIOPUTCMAP ioctls.
+
+
+2. Inside the kernel
+
+  - struct fb_info
+
+    Generic information, API and low level information about a specific frame
+    buffer device instance (slot number, board address, ...).
+
+  - struct `par`
+
+    Device dependent information that uniquely defines the video mode for this
+    particular piece of hardware.
+
+
+Visuals used by the frame buffer device API
+===========================================
+
+
+Monochrome (FB_VISUAL_MONO01 and FB_VISUAL_MONO10)
+--------------------------------------------------
+Each pixel is either black or white.
+
+
+Pseudo color (FB_VISUAL_PSEUDOCOLOR and FB_VISUAL_STATIC_PSEUDOCOLOR)
+---------------------------------------------------------------------
+The whole pixel value is fed through a programmable lookup table that has one
+color (including red, green, and blue intensities) for each possible pixel
+value, and that color is displayed.
+
+
+True color (FB_VISUAL_TRUECOLOR)
+--------------------------------
+The pixel value is broken up into red, green, and blue fields.
+
+
+Direct color (FB_VISUAL_DIRECTCOLOR)
+------------------------------------
+The pixel value is broken up into red, green, and blue fields, each of which
+is looked up in separate red, green, and blue lookup tables.
+
+
+Grayscale displays
+------------------
+Grayscale and static grayscale are special variants of pseudo color and static
+pseudo color, where the red, green and blue components are always equal to
+each other.
diff --git a/Documentation/fb/internals.txt b/Documentation/fb/internals.txt
deleted file mode 100644
index 9b2a2b2f3e57..000000000000
--- a/Documentation/fb/internals.txt
+++ /dev/null
@@ -1,82 +0,0 @@
-
-This is a first start for some documentation about frame buffer device
-internals.
-
-Geert Uytterhoeven <geert@linux-m68k.org>, 21 July 1998
-James Simmons <jsimmons@user.sf.net>, Nov 26 2002
-
---------------------------------------------------------------------------------
-
-	    ***  STRUCTURES USED BY THE FRAME BUFFER DEVICE API  ***
-
-The following structures play a role in the game of frame buffer devices. They
-are defined in <linux/fb.h>.
-
-1. Outside the kernel (user space)
-
-  - struct fb_fix_screeninfo
-
-    Device independent unchangeable information about a frame buffer device and
-    a specific video mode. This can be obtained using the FBIOGET_FSCREENINFO
-    ioctl.
-
-  - struct fb_var_screeninfo
-
-    Device independent changeable information about a frame buffer device and a
-    specific video mode. This can be obtained using the FBIOGET_VSCREENINFO
-    ioctl, and updated with the FBIOPUT_VSCREENINFO ioctl. If you want to pan
-    the screen only, you can use the FBIOPAN_DISPLAY ioctl.
-
-  - struct fb_cmap
-
-    Device independent colormap information. You can get and set the colormap
-    using the FBIOGETCMAP and FBIOPUTCMAP ioctls.
-
-
-2. Inside the kernel
-
-  - struct fb_info
-
-    Generic information, API and low level information about a specific frame
-    buffer device instance (slot number, board address, ...).
-
-  - struct `par'
-
-    Device dependent information that uniquely defines the video mode for this
-    particular piece of hardware.
-
-
---------------------------------------------------------------------------------
-
-	    ***  VISUALS USED BY THE FRAME BUFFER DEVICE API  ***
-
-
-Monochrome (FB_VISUAL_MONO01 and FB_VISUAL_MONO10)
--------------------------------------------------
-Each pixel is either black or white.
-
-
-Pseudo color (FB_VISUAL_PSEUDOCOLOR and FB_VISUAL_STATIC_PSEUDOCOLOR)
----------------------------------------------------------------------
-The whole pixel value is fed through a programmable lookup table that has one
-color (including red, green, and blue intensities) for each possible pixel
-value, and that color is displayed.
-
-
-True color (FB_VISUAL_TRUECOLOR)
---------------------------------
-The pixel value is broken up into red, green, and blue fields.
-
-
-Direct color (FB_VISUAL_DIRECTCOLOR)
-------------------------------------
-The pixel value is broken up into red, green, and blue fields, each of which 
-are looked up in separate red, green, and blue lookup tables.
-
-
-Grayscale displays
-------------------
-Grayscale and static grayscale are special variants of pseudo color and static
-pseudo color, where the red, green and blue components are always equal to
-each other.
-
diff --git a/Documentation/fb/lxfb.rst b/Documentation/fb/lxfb.rst
new file mode 100644
index 000000000000..863e6b98fbae
--- /dev/null
+++ b/Documentation/fb/lxfb.rst
@@ -0,0 +1,55 @@
+=============
+What is lxfb?
+=============
+
+.. [This file is cloned from VesaFB/aty128fb]
+
+
+This is a graphics framebuffer driver for AMD Geode LX based processors.
+
+Advantages:
+
+ * No need to use AMD's VSA code (or other VESA emulation layer) in the
+   BIOS.
+ * It provides a nice large console (128 cols + 48 lines with 1024x768)
+   without using tiny, unreadable fonts.
+ * You can run XF68_FBDev on top of /dev/fb0
+ * Most important: boot logo :-)
+
+Disadvantages:
+
+ * graphic mode is slower than text mode...
+
+
+How to use it?
+==============
+
+Switching modes is done using the lxfb.mode_option=<resolution>... boot
+parameter or using the `fbset` program.
+
+See Documentation/fb/modedb.rst for more information on modedb
+resolutions.
+
+
+X11
+===
+
+XF68_FBDev should generally work fine, but it is non-accelerated.
+
+
+Configuration
+=============
+
+You can pass kernel command line options to lxfb with lxfb.<option>.
+For example, lxfb.mode_option=800x600@75.
+Accepted options:
+
+================ ==================================================
+mode_option	 specify the video mode.  Of the form
+		 <x>x<y>[-<bpp>][@<refresh>]
+vram		 size of video ram (normally auto-detected)
+vt_switch	 enable vt switching during suspend/resume.  The vt
+		 switch is slow, but harmless.
+================ ==================================================
+
+Andres Salomon <dilinger@debian.org>
diff --git a/Documentation/fb/lxfb.txt b/Documentation/fb/lxfb.txt
deleted file mode 100644
index 38b3ca6f6ca7..000000000000
--- a/Documentation/fb/lxfb.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-[This file is cloned from VesaFB/aty128fb]
-
-What is lxfb?
-=================
-
-This is a graphics framebuffer driver for AMD Geode LX based processors.
-
-Advantages:
-
- * No need to use AMD's VSA code (or other VESA emulation layer) in the
-   BIOS.
- * It provides a nice large console (128 cols + 48 lines with 1024x768)
-   without using tiny, unreadable fonts.
- * You can run XF68_FBDev on top of /dev/fb0
- * Most important: boot logo :-)
-
-Disadvantages:
-
- * graphic mode is slower than text mode...
-
-
-How to use it?
-==============
-
-Switching modes is done using  lxfb.mode_option=<resolution>... boot
-parameter or using `fbset' program.
-
-See Documentation/fb/modedb.txt for more information on modedb
-resolutions.
-
-
-X11
-===
-
-XF68_FBDev should generally work fine, but it is non-accelerated.
-
-
-Configuration
-=============
-
-You can pass kernel command line options to lxfb with lxfb.<option>.
-For example, lxfb.mode_option=800x600@75.
-Accepted options:
-
-mode_option	- specify the video mode.  Of the form
-		  <x>x<y>[-<bpp>][@<refresh>]
-vram		- size of video ram (normally auto-detected)
-vt_switch	- enable vt switching during suspend/resume.  The vt
-		  switch is slow, but harmless.
-
---
-Andres Salomon <dilinger@debian.org>
diff --git a/Documentation/fb/matroxfb.rst b/Documentation/fb/matroxfb.rst
new file mode 100644
index 000000000000..f1859d98606e
--- /dev/null
+++ b/Documentation/fb/matroxfb.rst
@@ -0,0 +1,443 @@
+=================
+What is matroxfb?
+=================
+
+.. [This file is cloned from VesaFB. Thanks go to Gerd Knorr]
+
+
+This is a driver for a graphic framebuffer for Matrox devices on
+Alpha, Intel and PPC boxes.
+
+Advantages:
+
+ * It provides a nice large console (128 cols + 48 lines with 1024x768)
+   without using tiny, unreadable fonts.
+ * You can run XF{68,86}_FBDev or XFree86 fbdev driver on top of /dev/fb0
+ * Most important: boot logo :-)
+
+Disadvantages:
+
+ * graphic mode is slower than text mode... but you should not notice
+   if you use same resolution as you used in textmode.
+
+
+How to use it?
+==============
+
+Switching modes is done using the video=matroxfb:vesa:... boot parameter
+or using `fbset` program.
+
+If you want, for example, to enable a resolution of 1280x1024x24bpp you should
+pass to the kernel this command line: "video=matroxfb:vesa:0x1BB".
+
+You should compile in both vgacon (to boot if you remove your Matrox from the
+box) and matroxfb (for graphics mode). You should not compile in vesafb
+unless you have the primary display on a non-Matrox VBE2.0 device (see
+Documentation/fb/vesafb.rst for details).
+
+Currently supported video modes are (through vesa:... interface, PowerMac
+has [as addon] compatibility code):
+
+
+Graphic modes
+-------------
+
+===  =======  =======  =======  =======  =======
+bpp  640x400  640x480  768x576  800x600  960x720
+===  =======  =======  =======  =======  =======
+  4             0x12             0x102
+  8   0x100    0x101    0x180    0x103    0x188
+ 15            0x110    0x181    0x113    0x189
+ 16            0x111    0x182    0x114    0x18A
+ 24            0x1B2    0x184    0x1B5    0x18C
+ 32            0x112    0x183    0x115    0x18B
+===  =======  =======  =======  =======  =======
+
+
+Graphic modes (continued)
+-------------------------
+
+===  ======== ======== ========= ========= =========
+bpp  1024x768 1152x864 1280x1024 1408x1056 1600x1200
+===  ======== ======== ========= ========= =========
+  4    0x104             0x106
+  8    0x105    0x190    0x107     0x198     0x11C
+ 15    0x116    0x191    0x119     0x199     0x11D
+ 16    0x117    0x192    0x11A     0x19A     0x11E
+ 24    0x1B8    0x194    0x1BB     0x19C     0x1BF
+ 32    0x118    0x193    0x11B     0x19B
+===  ======== ======== ========= ========= =========
+
+
+Text modes
+----------
+
+==== =======  =======  ========  ========  ========
+text 640x400  640x480  1056x344  1056x400  1056x480
+==== =======  =======  ========  ========  ========
+ 8x8   0x1C0    0x108     0x10A     0x10B     0x10C
+8x16 2, 3, 7                        0x109
+==== =======  =======  ========  ========  ========
+
+You can enter these numbers in either hexadecimal (leading `0x`) or decimal
+(0x100 = 256). You can also use value + 512 to achieve compatibility
+with your old number passed to vesafb.
+
+Unlisted modes can be selected with a more complicated command line, for
+example 1600x1200x32bpp can be specified by `video=matroxfb:vesa:0x11C,depth:32`.
+
+
+X11
+===
+
+XF{68,86}_FBDev should work just fine, but it is non-accelerated. On non-intel
+architectures there are some glitches for 24bpp videomodes. 8, 16 and 32bpp
+work fine.
+
+Running another (accelerated) X-Server like XF86_SVGA works too. But (at least)
+XFree servers have big troubles in multihead configurations (even on first
+head, not even talking about second). Running XFree86 4.x accelerated mga
+driver is possible, but you must not enable DRI - if you do, resolution and
+color depth of your X desktop must match resolution and color depths of your
+virtual consoles, otherwise X will corrupt accelerator settings.
+
+
+SVGALib
+=======
+
+Driver contains SVGALib compatibility code. It is turned on by choosing textual
+mode for console. You can do it at boot time by using videomode
+2,3,7,0x108-0x10C or 0x1C0. At runtime, `fbset -depth 0` does this work.
+Unfortunately, after an SVGALib application exits, the screen contents are corrupted.
+Switching to another console and back fixes it. I hope that it is SVGALib's
+problem and not mine, but I'm not sure.
+
+
+Configuration
+=============
+
+You can pass kernel command line options to matroxfb with
+`video=matroxfb:option1,option2:value2,option3` (multiple options should be
+separated by comma, values are separated from options by `:`).
+Accepted options:
+
+============ ===================================================================
+mem:X        size of memory (X can be in megabytes, kilobytes or bytes)
+	     You can only decrease the value determined by the driver because
+	     it always probes for memory. Default is to use whole detected
+	     memory usable for on-screen display (i.e. max. 8 MB).
+disabled     do not load driver; you can use also `off`, but `disabled`
+	     is here too.
+enabled      load driver, if you have `video=matroxfb:disabled` in LILO
+	     configuration, you can override it by this (you cannot override
+	     `off`). It is default.
+noaccel      do not use acceleration engine. It does not work on Alphas.
+accel        use acceleration engine. It is default.
+nopan        create initial consoles with vyres = yres, thus disabling virtual
+	     scrolling.
+pan          create initial consoles as tall as possible (vyres = memory/vxres).
+	     It is default.
+nopciretry   disable PCI retries. It is needed for some broken chipsets,
+	     it is autodetected for intel's 82437. In this case device does
+	     not comply to PCI 2.1 specs (it will not guarantee that every
+	     transaction terminate with success or retry in 32 PCLK).
+pciretry     enable PCI retries. It is default, except for intel's 82437.
+novga        disables VGA I/O ports. It is default if BIOS did not enable
+	     device. You should not use this option, some boards then do not
+	     restart without power off.
+vga          preserve state of VGA I/O ports. It is default. Driver does not
+	     enable VGA I/O if BIOS did not do it (it is not safe to enable it
+	     in most cases).
+nobios       disables BIOS ROM. It is default if BIOS did not enable BIOS
+	     itself. You should not use this option, some boards then do not
+	     restart without power off.
+bios         preserve state of BIOS ROM. It is default. Driver does not enable
+	     BIOS if BIOS was not enabled before.
+noinit       tells driver, that devices were already initialized. You should use
+	     it if you have G100 and/or if driver cannot detect memory, you see
+	     strange pattern on screen and so on. Devices not enabled by BIOS
+	     are still initialized. It is default.
+init         driver initializes every device it knows about.
+memtype      specifies memory type, implies 'init'. This is valid only for G200
+	     and G400 and has following meaning:
+
+	       G200:
+		 -  0 -> 2x128Kx32 chips, 2MB onboard, probably sgram
+		 -  1 -> 2x128Kx32 chips, 4MB onboard, probably sgram
+		 -  2 -> 2x256Kx32 chips, 4MB onboard, probably sgram
+		 -  3 -> 2x256Kx32 chips, 8MB onboard, probably sgram
+		 -  4 -> 2x512Kx16 chips, 8/16MB onboard, probably sdram only
+		 -  5 -> same as above
+		 -  6 -> 4x128Kx32 chips, 4MB onboard, probably sgram
+		 -  7 -> 4x128Kx32 chips, 8MB onboard, probably sgram
+	       G400:
+		 -  0 -> 2x512Kx16 SDRAM, 16/32MB
+		 -	 2x512Kx32 SGRAM, 16/32MB
+		 -  1 -> 2x256Kx32 SGRAM, 8/16MB
+		 -  2 -> 4x128Kx32 SGRAM, 8/16MB
+		 -  3 -> 4x512Kx32 SDRAM, 32MB
+		 -  4 -> 4x256Kx32 SGRAM, 16/32MB
+		 -  5 -> 2x1Mx32 SDRAM, 32MB
+		 -  6 -> reserved
+		 -  7 -> reserved
+
+	     You should use sdram or sgram parameter in addition to memtype
+	     parameter.
+nomtrr       disables write combining on frame buffer. This slows down driver
+	     but there is reported minor incompatibility between GUS DMA and
+	     XFree under high loads if write combining is enabled (sound
+	     dropouts).
+mtrr         enables write combining on frame buffer. It speeds up video
+	     accesses considerably. It is default. You must have MTRR support
+	     enabled in kernel and your CPU must have MTRR (e.g. Pentium II).
+sgram        tells to driver that you have Gxx0 with SGRAM memory. It has no
+	     effect without `init`.
+sdram        tells to driver that you have Gxx0 with SDRAM memory.
+	     It is a default.
+inv24        change timings parameters for 24bpp modes on Millennium and
+	     Millennium II. Specify this if you see strange color shadows
+	     around  characters.
+noinv24      use standard timings. It is the default.
+inverse      invert colors on screen (for LCD displays)
+noinverse    show true colors on screen. It is default.
+dev:X        bind driver to device X. Driver numbers device from 0 up to N,
+	     where device 0 is first `known` device found, 1 second and so on.
+	     lspci lists devices in this order.
+	     Default is `every` known device.
+nohwcursor   disables hardware cursor (use software cursor instead).
+hwcursor     enables hardware cursor. It is default. If you are using
+	     non-accelerated mode (`noaccel` or `fbset -accel false`), software
+	     cursor is used (except for text mode).
+noblink      disables cursor blinking. Cursor in text mode always blinks (hw
+	     limitation).
+blink        enables cursor blinking. It is default.
+nofastfont   disables fastfont feature. It is default.
+fastfont:X   enables fastfont feature. X specifies size of memory reserved for
+	     font data, it must be >= (fontwidth*fontheight*chars_in_font)/8.
+	     It is faster on Gx00 series, but slower on older cards.
+grayscale    enable grayscale summing. It works in PSEUDOCOLOR modes (text,
+	     4bpp, 8bpp). In DIRECTCOLOR modes it is limited to characters
+	     displayed through putc/putcs. Direct accesses to framebuffer
+	     can paint colors.
+nograyscale  disable grayscale summing. It is default.
+cross4MB     allows a pixel line to cross the 4MB boundary. It is default for
+	     non-Millennium.
+nocross4MB   pixel line must not cross 4MB boundary. It is default for
+	     Millennium I or II, because these devices have hardware
+	     limitations which do not allow this. But this option is
+	     incompatible with some (if not all yet released) versions of
+	     XF86_FBDev.
+dfp          enables digital flat panel interface. This option is incompatible
+	     with secondary (TV) output - if DFP is active, TV output must be
+	     inactive and vice versa. DFP always uses same timing as primary
+	     (monitor) output.
+dfp:X        use settings X for digital flat panel interface. X is number from
+	     0 to 0xFF, and meaning of each individual bit is described in
+	     G400 manual, in description of DAC register 0x1F. For normal
+	     operation you should set all bits to zero, except lowest bit. This
+	     lowest bit selects who is source of display clocks, whether G400,
+	     or panel. Default value is now read back from hardware - so you
+	     should specify this value only if you are also using `init`
+	     parameter.
+outputs:XYZ  set mapping between CRTC and outputs. Each letter can have value
+	     of 0 (for no CRTC), 1 (CRTC1) or 2 (CRTC2), and first letter
+	     corresponds to primary analog output, second letter to the
+	     secondary analog output and third letter to the DVI output.
+	     Default setting is 100 for cards below G400 or G400 without DFP,
+	     101 for G400 with DFP, and 111 for G450 and G550. You can set
+	     mapping only on first card, use matroxset for setting up other
+	     devices.
+vesa:X       selects startup videomode. X is number from 0 to 0x1FF, see table
+	     above for detailed explanation. Default is 640x480x8bpp if driver
+	     has 8bpp support. Otherwise first available of 640x350x4bpp,
+	     640x480x15bpp, 640x480x24bpp, 640x480x32bpp or 80x25 text
+	     (80x25 text is always available).
+============ ===================================================================
+
+If you are not satisfied with videomode selected by `vesa` option, you
+can modify it with these options:
+
+============ ===================================================================
+xres:X       horizontal resolution, in pixels. Default is derived from `vesa`
+	     option.
+yres:X       vertical resolution, in pixel lines. Default is derived from `vesa`
+	     option.
+upper:X      top boundary: lines between end of VSYNC pulse and start of first
+	     pixel line of picture. Default is derived from `vesa` option.
+lower:X      bottom boundary: lines between end of picture and start of VSYNC
+	     pulse. Default is derived from `vesa` option.
+vslen:X      length of VSYNC pulse, in lines. Default is derived from `vesa`
+	     option.
+left:X       left boundary: pixels between end of HSYNC pulse and first pixel.
+	     Default is derived from `vesa` option.
+right:X      right boundary: pixels between end of picture and start of HSYNC
+	     pulse. Default is derived from `vesa` option.
+hslen:X      length of HSYNC pulse, in pixels. Default is derived from `vesa`
+	     option.
+pixclock:X   dotclocks, in ps (picoseconds). Default is derived from `vesa`
+	     option and from `fh` and `fv` options.
+sync:X       sync. pulse - bit 0 inverts HSYNC polarity, bit 1 VSYNC polarity.
+	     If bit 3 (value 0x08) is set, composite sync instead of HSYNC is
+	     generated. If bit 5 (value 0x20) is set, sync on green is turned
+	     on. Do not forget that if you want sync on green, you also probably
+	     want composite sync.
+	     Default depends on `vesa`.
+depth:X      Bits per pixel: 0=text, 4,8,15,16,24 or 32. Default depends on
+	     `vesa`.
+============ ===================================================================
+
+If you know capabilities of your monitor, you can specify some (or all) of
+`maxclk`, `fh` and `fv`. In this case, `pixclock` is computed so that
+pixclock <= maxclk, real_fh <= fh and real_fv <= fv.
+
+============ ==================================================================
+maxclk:X     maximum dotclock. X can be specified in MHz, kHz or Hz. Default is
+	     `don't care`.
+fh:X         maximum horizontal synchronization frequency. X can be specified
+	     in kHz or Hz. Default is `don't care`.
+fv:X         maximum vertical frequency. X must be specified in Hz. Default is
+	     70 for modes derived from `vesa` with yres <= 400, 60Hz for
+	     yres > 400.
+============ ==================================================================
+
+
+Limitations
+===========
+
+There are known and unknown bugs, features and misfeatures.
+Currently there are following known bugs:
+
+ - SVGALib does not restore screen on exit
+ - generic fbcon-cfbX procedures do not work on Alphas. Due to this,
+   `noaccel` (and cfb4 accel) driver does not work on Alpha. So everyone
+   with access to `/dev/fb*` on Alpha can hang machine (you should restrict
+   access to `/dev/fb*` - everyone with access to this device can destroy
+   your monitor, believe me...).
+ - 24bpp does not work correctly with XF-FBDev on big-endian architectures.
+ - interlaced text mode is not supported; it looks like hardware limitation,
+   but I'm not sure.
+ - Gxx0 SGRAM/SDRAM is not autodetected.
+ - If you are using more than one framebuffer device, you must boot kernel
+   with 'video=scrollback:0'.
+ - maybe more...
+
+And following misfeatures:
+
+ - SVGALib does not restore screen on exit.
+ - pixclock for text modes is limited by hardware to
+
+    - 83 MHz on G200
+    - 66 MHz on Millennium I
+    - 60 MHz on Millennium II
+
+   Because I have no access to other devices, I do not know specific
+   frequencies for them. So driver does not check this and allows you to
+   set frequency higher than this. It causes sparks, black holes and other
+   pretty effects on screen. Device was not destroyed during tests. :-)
+ - my Millennium G200 oscillator has frequency range from 35 MHz to 380 MHz
+   (and it works with 8bpp on about 320 MHz dotclocks (and changed mclk)).
+   But Matrox says on product sheet that VCO limit is 50-250 MHz, so I believe
+   them (maybe that chip overheats, but it has a very big cooler (G100 has
+   none), so it should work).
+ - special mixed video/graphics videomodes of Mystique and Gx00 - 2G8V16 and
+   G16V16 are not supported
+ - color keying is not supported
+ - feature connector of Mystique and Gx00 is set to VGA mode (it is disabled
+   by BIOS)
+ - DDC (monitor detection) is supported through dualhead driver
+ - some checks for input values are not as strict as they should be (you can
+   specify vslen=4000 and so on).
+ - maybe more...
+
+And the following features:
+
+ - 4bpp is available only on Millennium I and Millennium II. It is hardware
+   limitation.
+ - selection between 1:5:5:5 and 5:6:5 16bpp videomode is done by -rgba
+   option of fbset: "fbset -depth 16 -rgba 5,5,5" selects 1:5:5:5, anything
+   else selects 5:6:5 mode.
+ - text mode uses 6 bit VGA palette instead of 8 bit (one of 262144 colors
+   instead of one of 16M colors). It is due to hardware limitation of
+   Millennium I/II and SVGALib compatibility.
+
+
+Benchmarks
+==========
+It is the time to redraw the whole screen 1000 times in 1024x768, 60Hz,
+which means drawing 6144000 characters on screen through /dev/vcsa
+(for 32bpp it is about 3GB of data (exactly 3000 MB); with the 8x16 font it
+takes 16 seconds, i.e. 187 MBps).
+Times were obtained from an older version of the driver; now they are about
+3% faster. It is kernel-space-only time on a P-II/350 MHz, Millennium I in a
+33 MHz PCI slot, G200 in an AGP 2x slot. I did not test vgacon::
+
+  NOACCEL
+	8x16                 12x22
+	Millennium I  G200   Millennium I  G200
+  8bpp    16.42         9.54   12.33         9.13
+  16bpp   21.00        15.70   19.11        15.02
+  24bpp   36.66        36.66   35.00        35.00
+  32bpp   35.00        30.00   33.85        28.66
+
+  ACCEL, nofastfont
+	8x16                 12x22                6x11
+	Millennium I  G200   Millennium I  G200   Millennium I  G200
+  8bpp     7.79         7.24   13.55         7.78   30.00        21.01
+  16bpp    9.13         7.78   16.16         7.78   30.00        21.01
+  24bpp   14.17        10.72   18.69        10.24   34.99        21.01
+  32bpp   16.15	     16.16   18.73        13.09   34.99        21.01
+
+  ACCEL, fastfont
+	8x16                 12x22                6x11
+	Millennium I  G200   Millennium I  G200   Millennium I  G200
+  8bpp     8.41         6.01    6.54         4.37   16.00        10.51
+  16bpp    9.54         9.12    8.76         6.17   17.52        14.01
+  24bpp   15.00        12.36   11.67        10.00   22.01        18.32
+  32bpp   16.18        18.29*  12.71        12.74   24.44        21.00
+
+  TEXT
+	8x16
+	Millennium I  G200
+  TEXT     3.29         1.50
+
+  * Yes, it is slower than Millennium I.
+
+
+Dualhead G400
+=============
+Driver supports dualhead G400 with some limitations:
+ + secondary head shares videomemory with primary head. It is not a problem
+   if you have 32MB of videoram, but if you have only 16MB, you may have
+   to think twice before choosing videomode (for example twice 1880x1440x32bpp
+   is not possible).
+ + due to hardware limitation, secondary head can use only 16 and 32bpp
+   videomodes.
+ + secondary head is not accelerated. There were bad problems with accelerated
+   XFree when secondary head used to use acceleration.
+ + secondary head always powers up in 640x480@60-32 videomode. You have to use
+   fbset to change this mode.
+ + secondary head always powers up in monitor mode. You have to use fbmatroxset
+   to change it to TV mode. Also, you must select at least 525 lines for
+   NTSC output and 625 lines for PAL output.
+ + kernel is not fully multihead ready. So some things are impossible to do.
+ + if you compiled it as module, you must insert i2c-matroxfb, matroxfb_maven
+   and matroxfb_crtc2 into kernel.
+
+
+Dualhead G450
+=============
+Driver supports dualhead G450 with some limitations:
+ + secondary head shares videomemory with primary head. It is not a problem
+   if you have 32MB of videoram, but if you have only 16MB, you may have
+   to think twice before choosing videomode.
+ + due to hardware limitation, secondary head can use only 16 and 32bpp
+   videomodes.
+ + secondary head is not accelerated.
+ + secondary head always powers up in 640x480@60-32 videomode. You have to use
+   fbset to change this mode.
+ + TV output is not supported
+ + kernel is not fully multihead ready, so some things are impossible to do.
+ + if you compiled it as module, you must insert matroxfb_g450 and matroxfb_crtc2
+   into kernel.
+
+Petr Vandrovec <vandrove@vc.cvut.cz>
diff --git a/Documentation/fb/matroxfb.txt b/Documentation/fb/matroxfb.txt
deleted file mode 100644
index b95f5bb522f2..000000000000
--- a/Documentation/fb/matroxfb.txt
+++ /dev/null
@@ -1,413 +0,0 @@
-[This file is cloned from VesaFB. Thanks go to Gerd Knorr]
-
-What is matroxfb?
-=================
-
-This is a driver for a graphic framebuffer for Matrox devices on
-Alpha, Intel and PPC boxes.
-
-Advantages:
-
- * It provides a nice large console (128 cols + 48 lines with 1024x768)
-   without using tiny, unreadable fonts.
- * You can run XF{68,86}_FBDev or XFree86 fbdev driver on top of /dev/fb0
- * Most important: boot logo :-)
-
-Disadvantages:
-
- * graphic mode is slower than text mode... but you should not notice
-   if you use same resolution as you used in textmode.
-
-
-How to use it?
-==============
-
-Switching modes is done using the video=matroxfb:vesa:... boot parameter
-or using `fbset' program.
-
-If you want, for example, enable a resolution of 1280x1024x24bpp you should
-pass to the kernel this command line: "video=matroxfb:vesa:0x1BB".
-
-You should compile in both vgacon (to boot if you remove you Matrox from
-box) and matroxfb (for graphics mode). You should not compile-in vesafb
-unless you have primary display on non-Matrox VBE2.0 device (see 
-Documentation/fb/vesafb.txt for details).
-
-Currently supported video modes are (through vesa:... interface, PowerMac
-has [as addon] compatibility code):
-
-
-[Graphic modes]
-
-bpp | 640x400  640x480  768x576  800x600  960x720
-----+--------------------------------------------
-  4 |            0x12             0x102            
-  8 |  0x100    0x101    0x180    0x103    0x188   
- 15 |           0x110    0x181    0x113    0x189   
- 16 |           0x111    0x182    0x114    0x18A   
- 24 |           0x1B2    0x184    0x1B5    0x18C   
- 32 |           0x112    0x183    0x115    0x18B   
-
-
-[Graphic modes (continued)]
-
-bpp | 1024x768 1152x864 1280x1024 1408x1056 1600x1200
-----+------------------------------------------------
-  4 |   0x104             0x106
-  8 |   0x105    0x190    0x107     0x198     0x11C
- 15 |   0x116    0x191    0x119     0x199     0x11D
- 16 |   0x117    0x192    0x11A     0x19A     0x11E
- 24 |   0x1B8    0x194    0x1BB     0x19C     0x1BF
- 32 |   0x118    0x193    0x11B     0x19B
-
-
-[Text modes]
-
-text | 640x400  640x480  1056x344  1056x400  1056x480
------+------------------------------------------------
- 8x8 |  0x1C0    0x108     0x10A     0x10B     0x10C
-8x16 | 2, 3, 7                       0x109
-
-You can enter these number either hexadecimal (leading `0x') or decimal
-(0x100 = 256). You can also use value + 512 to achieve compatibility
-with your old number passed to vesafb.
-
-Non-listed number can be achieved by more complicated command-line, for
-example 1600x1200x32bpp can be specified by `video=matroxfb:vesa:0x11C,depth:32'.
-
-
-X11
-===
-
-XF{68,86}_FBDev should work just fine, but it is non-accelerated. On non-intel
-architectures there are some glitches for 24bpp videomodes. 8, 16 and 32bpp
-works fine.
-
-Running another (accelerated) X-Server like XF86_SVGA works too. But (at least)
-XFree servers have big troubles in multihead configurations (even on first
-head, not even talking about second). Running XFree86 4.x accelerated mga 
-driver is possible, but you must not enable DRI - if you do, resolution and
-color depth of your X desktop must match resolution and color depths of your
-virtual consoles, otherwise X will corrupt accelerator settings.
-
-
-SVGALib
-=======
-
-Driver contains SVGALib compatibility code. It is turned on by choosing textual
-mode for console. You can do it at boot time by using videomode
-2,3,7,0x108-0x10C or 0x1C0. At runtime, `fbset -depth 0' does this work.
-Unfortunately, after SVGALib application exits, screen contents is corrupted.
-Switching to another console and back fixes it. I hope that it is SVGALib's
-problem and not mine, but I'm not sure.
-
-
-Configuration
-=============
-
-You can pass kernel command line options to matroxfb with
-`video=matroxfb:option1,option2:value2,option3' (multiple options should be 
-separated by comma, values are separated from options by `:'). 
-Accepted options:
-
-mem:X    - size of memory (X can be in megabytes, kilobytes or bytes)
-           You can only decrease value determined by driver because of
-	   it always probe for memory. Default is to use whole detected
-	   memory usable for on-screen display (i.e. max. 8 MB).
-disabled - do not load driver; you can use also `off', but `disabled'
-           is here too.
-enabled  - load driver, if you have `video=matroxfb:disabled' in LILO
-           configuration, you can override it by this (you cannot override
-	   `off'). It is default.
-noaccel  - do not use acceleration engine. It does not work on Alphas.
-accel    - use acceleration engine. It is default.
-nopan    - create initial consoles with vyres = yres, thus disabling virtual
-           scrolling.
-pan      - create initial consoles as tall as possible (vyres = memory/vxres).
-           It is default.
-nopciretry - disable PCI retries. It is needed for some broken chipsets,
-           it is autodetected for intel's 82437. In this case device does
-	   not comply to PCI 2.1 specs (it will not guarantee that every
-	   transaction terminate with success or retry in 32 PCLK).
-pciretry - enable PCI retries. It is default, except for intel's 82437.
-novga    - disables VGA I/O ports. It is default if BIOS did not enable device.
-           You should not use this option, some boards then do not restart
-	   without power off.
-vga      - preserve state of VGA I/O ports. It is default. Driver does not
-           enable VGA I/O if BIOS did not it (it is not safe to enable it in
-	   most cases).
-nobios   - disables BIOS ROM. It is default if BIOS did not enable BIOS itself.
-           You should not use this option, some boards then do not restart
-	   without power off.
-bios     - preserve state of BIOS ROM. It is default. Driver does not enable
-           BIOS if BIOS was not enabled before.
-noinit   - tells driver, that devices were already initialized. You should use
-           it if you have G100 and/or if driver cannot detect memory, you see
-	   strange pattern on screen and so on. Devices not enabled by BIOS
-	   are still initialized. It is default.
-init     - driver initializes every device it knows about.
-memtype  - specifies memory type, implies 'init'. This is valid only for G200 
-           and G400 and has following meaning:
-             G200: 0 -> 2x128Kx32 chips, 2MB onboard, probably sgram
-                   1 -> 2x128Kx32 chips, 4MB onboard, probably sgram
-                   2 -> 2x256Kx32 chips, 4MB onboard, probably sgram
-                   3 -> 2x256Kx32 chips, 8MB onboard, probably sgram
-                   4 -> 2x512Kx16 chips, 8/16MB onboard, probably sdram only
-                   5 -> same as above
-                   6 -> 4x128Kx32 chips, 4MB onboard, probably sgram
-                   7 -> 4x128Kx32 chips, 8MB onboard, probably sgram
-             G400: 0 -> 2x512Kx16 SDRAM, 16/32MB
-                        2x512Kx32 SGRAM, 16/32MB
-                   1 -> 2x256Kx32 SGRAM, 8/16MB
-                   2 -> 4x128Kx32 SGRAM, 8/16MB
-                   3 -> 4x512Kx32 SDRAM, 32MB
-                   4 -> 4x256Kx32 SGRAM, 16/32MB
-                   5 -> 2x1Mx32 SDRAM, 32MB
-                   6 -> reserved
-                   7 -> reserved
-           You should use sdram or sgram parameter in addition to memtype 
-           parameter.
-nomtrr   - disables write combining on frame buffer. This slows down driver but
-           there is reported minor incompatibility between GUS DMA and XFree
-	   under high loads if write combining is enabled (sound dropouts).
-mtrr     - enables write combining on frame buffer. It speeds up video accesses
-           much. It is default. You must have MTRR support enabled in kernel
-	   and your CPU must have MTRR (f.e. Pentium II have them).
-sgram    - tells to driver that you have Gxx0 with SGRAM memory. It has no
-           effect without `init'.
-sdram    - tells to driver that you have Gxx0 with SDRAM memory.
-           It is a default.
-inv24    - change timings parameters for 24bpp modes on Millennium and
-           Millennium II. Specify this if you see strange color shadows around
-	   characters.
-noinv24  - use standard timings. It is the default.
-inverse  - invert colors on screen (for LCD displays)
-noinverse - show true colors on screen. It is default.
-dev:X    - bind driver to device X. Driver numbers device from 0 up to N,
-           where device 0 is first `known' device found, 1 second and so on.
-	   lspci lists devices in this order.
-	   Default is `every' known device.
-nohwcursor - disables hardware cursor (use software cursor instead).
-hwcursor - enables hardware cursor. It is default. If you are using
-           non-accelerated mode (`noaccel' or `fbset -accel false'), software
-	   cursor is used (except for text mode).
-noblink  - disables cursor blinking. Cursor in text mode always blinks (hw
-           limitation).
-blink    - enables cursor blinking. It is default.
-nofastfont - disables fastfont feature. It is default.
-fastfont:X - enables fastfont feature. X specifies size of memory reserved for
-             font data, it must be >= (fontwidth*fontheight*chars_in_font)/8.
-	     It is faster on Gx00 series, but slower on older cards.
-grayscale - enable grayscale summing. It works in PSEUDOCOLOR modes (text,
-            4bpp, 8bpp). In DIRECTCOLOR modes it is limited to characters
-	    displayed through putc/putcs. Direct accesses to framebuffer
-	    can paint colors.
-nograyscale - disable grayscale summing. It is default.
-cross4MB - enables that pixel line can cross 4MB boundary. It is default for
-           non-Millennium.
-nocross4MB - pixel line must not cross 4MB boundary. It is default for
-             Millennium I or II, because of these devices have hardware
-	     limitations which do not allow this. But this option is
-	     incompatible with some (if not all yet released) versions of
-	     XF86_FBDev.
-dfp      - enables digital flat panel interface. This option is incompatible with
-           secondary (TV) output - if DFP is active, TV output must be
-	   inactive and vice versa. DFP always uses same timing as primary
-	   (monitor) output.
-dfp:X    - use settings X for digital flat panel interface. X is number from
-           0 to 0xFF, and meaning of each individual bit is described in
-	   G400 manual, in description of DAC register 0x1F. For normal operation
-	   you should set all bits to zero, except lowest bit. This lowest bit
-	   selects who is source of display clocks, whether G400, or panel.
-	   Default value is now read back from hardware - so you should specify
-	   this value only if you are also using `init' parameter.
-outputs:XYZ - set mapping between CRTC and outputs. Each letter can have value
-           of 0 (for no CRTC), 1 (CRTC1) or 2 (CRTC2), and first letter corresponds
-	   to primary analog output, second letter to the secondary analog output
-	   and third letter to the DVI output. Default setting is 100 for
-	   cards below G400 or G400 without DFP, 101 for G400 with DFP, and
-	   111 for G450 and G550. You can set mapping only on first card,
-	   use matroxset for setting up other devices.
-vesa:X   - selects startup videomode. X is number from 0 to 0x1FF, see table
-           above for detailed explanation. Default is 640x480x8bpp if driver
-	   has 8bpp support. Otherwise first available of 640x350x4bpp,
-	   640x480x15bpp, 640x480x24bpp, 640x480x32bpp or 80x25 text
-	   (80x25 text is always available).
-
-If you are not satisfied with videomode selected by `vesa' option, you
-can modify it with these options:
-
-xres:X   - horizontal resolution, in pixels. Default is derived from `vesa'
-           option.
-yres:X   - vertical resolution, in pixel lines. Default is derived from `vesa'
-           option.
-upper:X  - top boundary: lines between end of VSYNC pulse and start of first
-           pixel line of picture. Default is derived from `vesa' option.
-lower:X  - bottom boundary: lines between end of picture and start of VSYNC
-           pulse. Default is derived from `vesa' option.
-vslen:X  - length of VSYNC pulse, in lines. Default is derived from `vesa'
-           option.
-left:X   - left boundary: pixels between end of HSYNC pulse and first pixel.
-           Default is derived from `vesa' option.
-right:X  - right boundary: pixels between end of picture and start of HSYNC
-           pulse. Default is derived from `vesa' option.
-hslen:X  - length of HSYNC pulse, in pixels. Default is derived from `vesa'
-           option.
-pixclock:X - dotclocks, in ps (picoseconds). Default is derived from `vesa'
-             option and from `fh' and `fv' options.
-sync:X   - sync. pulse - bit 0 inverts HSYNC polarity, bit 1 VSYNC polarity.
-           If bit 3 (value 0x08) is set, composite sync instead of HSYNC is
-	   generated. If bit 5 (value 0x20) is set, sync on green is turned on.
-	   Do not forget that if you want sync on green, you also probably
-	   want composite sync.
-	   Default depends on `vesa'.
-depth:X  - Bits per pixel: 0=text, 4,8,15,16,24 or 32. Default depends on
-           `vesa'.
-
-If you know capabilities of your monitor, you can specify some (or all) of
-`maxclk', `fh' and `fv'. In this case, `pixclock' is computed so that
-pixclock <= maxclk, real_fh <= fh and real_fv <= fv.
-
-maxclk:X - maximum dotclock. X can be specified in MHz, kHz or Hz. Default is
-           `don't care'.
-fh:X     - maximum horizontal synchronization frequency. X can be specified
-           in kHz or Hz. Default is `don't care'.
-fv:X     - maximum vertical frequency. X must be specified in Hz. Default is
-           70 for modes derived from `vesa' with yres <= 400, 60Hz for
-	   yres > 400.
-
-
-Limitations
-===========
-
-There are known and unknown bugs, features and misfeatures.
-Currently there are following known bugs:
- + SVGALib does not restore screen on exit
- + generic fbcon-cfbX procedures do not work on Alphas. Due to this,
-   `noaccel' (and cfb4 accel) driver does not work on Alpha. So everyone
-   with access to /dev/fb* on Alpha can hang machine (you should restrict
-   access to /dev/fb* - everyone with access to this device can destroy
-   your monitor, believe me...).
- + 24bpp does not support correctly XF-FBDev on big-endian architectures.
- + interlaced text mode is not supported; it looks like hardware limitation,
-   but I'm not sure.
- + Gxx0 SGRAM/SDRAM is not autodetected.
- + If you are using more than one framebuffer device, you must boot kernel
-   with 'video=scrollback:0'.
- + maybe more...
-And following misfeatures:
- + SVGALib does not restore screen on exit.
- + pixclock for text modes is limited by hardware to
-    83 MHz on G200
-    66 MHz on Millennium I
-    60 MHz on Millennium II
-   Because I have no access to other devices, I do not know specific
-   frequencies for them. So driver does not check this and allows you to
-   set frequency higher that this. It causes sparks, black holes and other
-   pretty effects on screen. Device was not destroyed during tests. :-)
- + my Millennium G200 oscillator has frequency range from 35 MHz to 380 MHz
-   (and it works with 8bpp on about 320 MHz dotclocks (and changed mclk)).
-   But Matrox says on product sheet that VCO limit is 50-250 MHz, so I believe
-   them (maybe that chip overheats, but it has a very big cooler (G100 has
-   none), so it should work).
- + special mixed video/graphics videomodes of Mystique and Gx00 - 2G8V16 and
-   G16V16 are not supported
- + color keying is not supported
- + feature connector of Mystique and Gx00 is set to VGA mode (it is disabled
-   by BIOS)
- + DDC (monitor detection) is supported through dualhead driver
- + some check for input values are not so strict how it should be (you can
-   specify vslen=4000 and so on).
- + maybe more...
-And following features:
- + 4bpp is available only on Millennium I and Millennium II. It is hardware
-   limitation.
- + selection between 1:5:5:5 and 5:6:5 16bpp videomode is done by -rgba 
-   option of fbset: "fbset -depth 16 -rgba 5,5,5" selects 1:5:5:5, anything
-   else selects 5:6:5 mode.
- + text mode uses 6 bit VGA palette instead of 8 bit (one of 262144 colors
-   instead of one of 16M colors). It is due to hardware limitation of 
-   Millennium I/II and SVGALib compatibility.
-
-
-Benchmarks
-==========
-It is time to redraw whole screen 1000 times in 1024x768, 60Hz. It is
-time for draw 6144000 characters on screen through /dev/vcsa
-(for 32bpp it is about 3GB of data (exactly 3000 MB); for 8x16 font in 
-16 seconds, i.e. 187 MBps).
-Times were obtained from one older version of driver, now they are about 3%
-faster, it is kernel-space only time on P-II/350 MHz, Millennium I in 33 MHz
-PCI slot, G200 in AGP 2x slot. I did not test vgacon.
-
-NOACCEL
-        8x16                 12x22
-        Millennium I  G200   Millennium I  G200
-8bpp    16.42         9.54   12.33         9.13
-16bpp   21.00        15.70   19.11        15.02
-24bpp   36.66        36.66   35.00        35.00
-32bpp   35.00        30.00   33.85        28.66
-
-ACCEL, nofastfont
-        8x16                 12x22                6x11
-	Millennium I  G200   Millennium I  G200   Millennium I  G200
-8bpp     7.79         7.24   13.55         7.78   30.00        21.01
-16bpp    9.13         7.78   16.16         7.78   30.00        21.01
-24bpp   14.17        10.72   18.69        10.24   34.99        21.01
-32bpp   16.15	     16.16   18.73        13.09   34.99        21.01
-
-ACCEL, fastfont
-        8x16                 12x22                6x11
-	Millennium I  G200   Millennium I  G200   Millennium I  G200
-8bpp     8.41         6.01    6.54         4.37   16.00        10.51
-16bpp    9.54         9.12    8.76         6.17   17.52        14.01
-24bpp   15.00        12.36   11.67        10.00   22.01        18.32
-32bpp   16.18        18.29*  12.71        12.74   24.44        21.00
-
-TEXT
-        8x16
-	Millennium I  G200
-TEXT     3.29         1.50
-
-* Yes, it is slower than Millennium I.
-
-
-Dualhead G400
-=============
-Driver supports dualhead G400 with some limitations:
- + secondary head shares videomemory with primary head. It is not problem
-   if you have 32MB of videoram, but if you have only 16MB, you may have
-   to think twice before choosing videomode (for example twice 1880x1440x32bpp
-   is not possible).
- + due to hardware limitation, secondary head can use only 16 and 32bpp
-   videomodes.
- + secondary head is not accelerated. There were bad problems with accelerated
-   XFree when secondary head used to use acceleration.
- + secondary head always powerups in 640x480@60-32 videomode. You have to use
-   fbset to change this mode.
- + secondary head always powerups in monitor mode. You have to use fbmatroxset
-   to change it to TV mode. Also, you must select at least 525 lines for
-   NTSC output and 625 lines for PAL output.
- + kernel is not fully multihead ready. So some things are impossible to do.
- + if you compiled it as module, you must insert i2c-matroxfb, matroxfb_maven
-   and matroxfb_crtc2 into kernel.
-
-
-Dualhead G450
-=============
-Driver supports dualhead G450 with some limitations:
- + secondary head shares videomemory with primary head. It is not problem
-   if you have 32MB of videoram, but if you have only 16MB, you may have
-   to think twice before choosing videomode.
- + due to hardware limitation, secondary head can use only 16 and 32bpp
-   videomodes.
- + secondary head is not accelerated.
- + secondary head always powerups in 640x480@60-32 videomode. You have to use
-   fbset to change this mode.
- + TV output is not supported
- + kernel is not fully multihead ready, so some things are impossible to do.
- + if you compiled it as module, you must insert matroxfb_g450 and matroxfb_crtc2
-   into kernel.
-	
---
-Petr Vandrovec <vandrove@vc.cvut.cz>
diff --git a/Documentation/fb/metronomefb.rst b/Documentation/fb/metronomefb.rst
new file mode 100644
index 000000000000..63e1d31a7e54
--- /dev/null
+++ b/Documentation/fb/metronomefb.rst
@@ -0,0 +1,38 @@
+===========
+Metronomefb
+===========
+
+Maintained by Jaya Kumar <jayakumar.lkml.gmail.com>
+
+Last revised: Mar 10, 2008
+
+Metronomefb is a driver for the Metronome display controller. The controller
+is from E-Ink Corporation. It is intended to be used to drive the E-Ink
+Vizplex display media. E-Ink hosts some details of this controller and the
+display media here http://www.e-ink.com/products/matrix/metronome.html .
+
+Metronome is interfaced to the host CPU through the AMLCD interface. The
+host CPU generates the control information and the image in a framebuffer
+which is then delivered to the AMLCD interface by a host specific method.
+The display and error status are each pulled through individual GPIOs.
+
+Metronomefb is platform independent and depends on a board specific driver
+to do all physical IO work. Currently, an example is implemented for the
+PXA board used in the AM-200 EPD devkit. This example is am200epd.c.
+
+Metronomefb requires waveform information which is delivered via the AMLCD
+interface to the metronome controller. The waveform information is expected to
+be delivered from userspace via the firmware class interface. The waveform file
+can be compressed as long as your udev or hotplug script is aware of the need
+to uncompress it before delivering it. metronomefb will ask for metronome.wbf
+which would typically go into /lib/firmware/metronome.wbf depending on your
+udev/hotplug setup. I have only tested with a single waveform file which was
+originally labeled 23P01201_60_WT0107_MTC. I do not know what it stands for.
+Caution should be exercised when manipulating the waveform as there may be
+a possibility that it could have some permanent effects on the display media.
+I neither have access to nor know exactly what the waveform does in terms of
+the physical media.
+
+Metronomefb uses the deferred IO interface so that it can provide a memory
+mappable frame buffer. It has been tested with tinyx (Xfbdev). It is known
+to work at this time with xeyes, xclock, xloadimage, xpdf.
diff --git a/Documentation/fb/metronomefb.txt b/Documentation/fb/metronomefb.txt
deleted file mode 100644
index 237ca412582d..000000000000
--- a/Documentation/fb/metronomefb.txt
+++ /dev/null
@@ -1,36 +0,0 @@
-			Metronomefb
-			-----------
-Maintained by Jaya Kumar <jayakumar.lkml.gmail.com>
-Last revised: Mar 10, 2008
-
-Metronomefb is a driver for the Metronome display controller. The controller
-is from E-Ink Corporation. It is intended to be used to drive the E-Ink
-Vizplex display media. E-Ink hosts some details of this controller and the
-display media here http://www.e-ink.com/products/matrix/metronome.html .
-
-Metronome is interfaced to the host CPU through the AMLCD interface. The
-host CPU generates the control information and the image in a framebuffer
-which is then delivered to the AMLCD interface by a host specific method.
-The display and error status are each pulled through individual GPIOs.
-
-Metronomefb is platform independent and depends on a board specific driver
-to do all physical IO work. Currently, an example is implemented for the
-PXA board used in the AM-200 EPD devkit. This example is am200epd.c
-
-Metronomefb requires waveform information which is delivered via the AMLCD
-interface to the metronome controller. The waveform information is expected to
-be delivered from userspace via the firmware class interface. The waveform file
-can be compressed as long as your udev or hotplug script is aware of the need
-to uncompress it before delivering it. metronomefb will ask for metronome.wbf
-which would typically go into /lib/firmware/metronome.wbf depending on your
-udev/hotplug setup. I have only tested with a single waveform file which was
-originally labeled 23P01201_60_WT0107_MTC. I do not know what it stands for.
-Caution should be exercised when manipulating the waveform as there may be
-a possibility that it could have some permanent effects on the display media.
-I neither have access to nor know exactly what the waveform does in terms of
-the physical media.
-
-Metronomefb uses the deferred IO interface so that it can provide a memory
-mappable frame buffer. It has been tested with tinyx (Xfbdev). It is known
-to work at this time with xeyes, xclock, xloadimage, xpdf.
-
diff --git a/Documentation/fb/modedb.rst b/Documentation/fb/modedb.rst
new file mode 100644
index 000000000000..3c2397293977
--- /dev/null
+++ b/Documentation/fb/modedb.rst
@@ -0,0 +1,155 @@
+=================================
+modedb default video mode support
+=================================
+
+
+Currently all frame buffer device drivers have their own video mode databases,
+which is a mess and a waste of resources. The main idea of modedb is to have
+
+  - one routine to probe for video modes, which can be used by all frame buffer
+    devices
+  - one generic video mode database with a fair amount of standard videomodes
+    (taken from XFree86)
+  - the possibility to supply your own mode database for graphics hardware that
+    needs non-standard modes, like amifb and Mac frame buffer drivers (which
+    use macmodes.c)
+
+When a frame buffer device receives a video= option it doesn't know, it should
+consider that to be a video mode option. If no frame buffer device is specified
+in a video= option, fbmem considers that to be a global video mode option.
+
+Valid mode specifiers (mode_option argument)::
+
+    <xres>x<yres>[M][R][-<bpp>][@<refresh>][i][m][eDd]
+    <name>[-<bpp>][@<refresh>]
+
+with <xres>, <yres>, <bpp> and <refresh> decimal numbers and <name> a string.
+Things between square brackets are optional.
+
+If 'M' is specified in the mode_option argument (after <yres> and before
+<bpp> and <refresh>, if specified) the timings will be calculated using
+VESA(TM) Coordinated Video Timings instead of looking up the mode from a table.
+If 'R' is specified, do a 'reduced blanking' calculation for digital displays.
+If 'i' is specified, calculate for an interlaced mode.  And if 'm' is
+specified, add margins to the calculation (1.8% of xres rounded down to 8
+pixels and 1.8% of yres).
+
+       Sample usage: 1024x768M@60m - CVT timing with margins
+
+DRM drivers also add options to enable or disable outputs:
+
+'e' will force the display to be enabled, i.e. it will override the detection
+if a display is connected. 'D' will force the display to be enabled and use
+digital output. This is useful for outputs that have both analog and digital
+signals (e.g. HDMI and DVI-I). For other outputs it behaves like 'e'. If 'd'
+is specified the output is disabled.
+
+You can additionally specify which output the options apply to.
+To force the VGA output to be enabled and drive a specific mode say::
+
+    video=VGA-1:1280x1024@60me
+
+Specifying the option multiple times for different ports is possible, e.g.::
+
+    video=LVDS-1:d video=HDMI-1:D
+
+-----------------------------------------------------------------------------
+
+What is the VESA(TM) Coordinated Video Timings (CVT)?
+=====================================================
+
+From the VESA(TM) Website:
+
+     "The purpose of CVT is to provide a method for generating a consistent
+      and coordinated set of standard formats, display refresh rates, and
+      timing specifications for computer display products, both those
+      employing CRTs, and those using other display technologies. The
+      intention of CVT is to give both source and display manufacturers a
+      common set of tools to enable new timings to be developed in a
+      consistent manner that ensures greater compatibility."
+
+This is the third standard approved by VESA(TM) concerning video timings.  The
+first was the Discrete Video Timings (DVT) which is  a collection of
+pre-defined modes approved by VESA(TM).  The second is the Generalized Timing
+Formula (GTF) which is an algorithm to calculate the timings, given the
+pixelclock, the horizontal sync frequency, or the vertical refresh rate.
+
+The GTF is limited by the fact that it is designed mainly for CRT displays.
+It artificially increases the pixelclock because of its high blanking
+requirement. This is inappropriate for digital display interfaces, whose high
+data rates require that the pixelclock be conserved as much as possible.
+Also, GTF does not take into account the aspect ratio of the display.
+
+The CVT addresses these limitations.  If used with CRTs, the formula used
+is a derivation of GTF with a few modifications.  If used with digital
+displays, the "reduced blanking" calculation can be used.
+
+From the framebuffer subsystem perspective, new formats need not be added
+to the global mode database whenever a new mode is released by display
+manufacturers. Specifying CVT will work for most, if not all, relatively
+new CRT displays and probably with most flatpanels, if 'reduced blanking'
+calculation is specified.  (The CVT compatibility of the display can be
+determined from its EDID. The version 1.3 of the EDID has extra 128-byte
+blocks where additional timing information is placed.  As of this time, there
+is no support yet in the layer to parse these additional blocks.)
+
+CVT also introduced a new naming convention (should be seen from dmesg output)::
+
+    <pix>M<a>[-R]
+
+    where: pix = total number of pixels in megapixels (xres x yres)
+	   M   = always present
+	   a   = aspect ratio (3 - 4:3; 4 - 5:4; 9 - 15:9, 16:9; A - 16:10)
+	  -R   = reduced blanking
+
+	  example:  .48M3-R - 800x600 with reduced blanking
+
+Note: VESA(TM) has restrictions on what is a standard CVT timing:
+
+      - aspect ratio can only be one of the above values
+      - acceptable refresh rates are 50, 60, 70 or 85 Hz only
+      - if reduced blanking, the refresh rate must be at 60Hz
+
+If any of the above is not satisfied, the kernel will print a warning but the
+timings will still be calculated.
+
+-----------------------------------------------------------------------------
+
+To find a suitable video mode, you just call::
+
+  int __init fb_find_mode(struct fb_var_screeninfo *var,
+			  struct fb_info *info, const char *mode_option,
+			  const struct fb_videomode *db, unsigned int dbsize,
+			  const struct fb_videomode *default_mode,
+			  unsigned int default_bpp)
+
+with db/dbsize your non-standard video mode database, or NULL to use the
+standard video mode database.
+
+fb_find_mode() first tries the specified video mode (or any mode that matches,
+e.g. there can be multiple 640x480 modes, each of them is tried). If that
+fails, the default mode is tried. If that fails, it walks over all modes.
+
+To specify a video mode at bootup, use the following boot options::
+
+    video=<driver>:<xres>x<yres>[-<bpp>][@<refresh>]
+
+where <driver> is a name from the table below.  Valid default modes can be
+found in linux/drivers/video/modedb.c.  Check your driver's documentation.
+There may be more modes::
+
+    Drivers that support modedb boot options
+    Boot Name	  Cards Supported
+
+    amifb	- Amiga chipset frame buffer
+    aty128fb	- ATI Rage128 / Pro frame buffer
+    atyfb	- ATI Mach64 frame buffer
+    pm2fb	- Permedia 2/2V frame buffer
+    pm3fb	- Permedia 3 frame buffer
+    sstfb	- Voodoo 1/2 (SST1) chipset frame buffer
+    tdfxfb	- 3D Fx frame buffer
+    tridentfb	- Trident (Cyber)blade chipset frame buffer
+    vt8623fb	- VIA 8623 frame buffer
+
+BTW, only a few fb drivers use this at the moment. Others are to follow
+(feel free to send patches). The DRM drivers also support this.
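
The basic boot-option form described in modedb.rst above,
`<xres>x<yres>[-<bpp>][@<refresh>]`, can be illustrated with a small
userspace parser. This is only a sketch under stated assumptions:
`struct mode_spec`, `parse_uint()` and `parse_mode_spec()` are hypothetical
names, not kernel symbols, and the code does not reproduce how
fb_find_mode() parses its argument. The CVT and DRM flags (M, R, i, m, e,
D, d) and named modes are deliberately rejected rather than handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical container for a parsed specifier; not a kernel type. */
struct mode_spec {
	unsigned int xres;
	unsigned int yres;
	unsigned int bpp;	/* 0 when "-<bpp>" is absent */
	unsigned int refresh;	/* 0 when "@<refresh>" is absent */
};

/* Read one decimal number at *s and advance *s past it. */
static int parse_uint(const char **s, unsigned int *out)
{
	char *end;

	/* Require a digit so that signs and whitespace are rejected. */
	if (!isdigit((unsigned char)**s))
		return -1;
	*out = (unsigned int)strtoul(*s, &end, 10);
	*s = end;
	return 0;
}

/*
 * Parse "<xres>x<yres>[-<bpp>][@<refresh>]".
 * Returns 0 on success and -1 on anything else, including trailing
 * characters such as the CVT flags.
 */
static int parse_mode_spec(const char *s, struct mode_spec *m)
{
	assert(s != NULL && m != NULL);

	m->bpp = 0;
	m->refresh = 0;

	if (parse_uint(&s, &m->xres) || *s++ != 'x')
		return -1;
	if (parse_uint(&s, &m->yres))
		return -1;
	if (*s == '-') {
		s++;
		if (parse_uint(&s, &m->bpp))
			return -1;
	}
	if (*s == '@') {
		s++;
		if (parse_uint(&s, &m->refresh))
			return -1;
	}
	return *s == '\0' ? 0 : -1;
}
```

Calling parse_mode_spec("1024x768-16@60", &m) fills in all four fields,
while "640x480" leaves bpp and refresh at 0 so that a caller could fall
back to its own defaults.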
diff --git a/Documentation/fb/modedb.txt b/Documentation/fb/modedb.txt
deleted file mode 100644
index 16aa08453911..000000000000
--- a/Documentation/fb/modedb.txt
+++ /dev/null
@@ -1,151 +0,0 @@
-
-
-			modedb default video mode support
-
-
-Currently all frame buffer device drivers have their own video mode databases,
-which is a mess and a waste of resources. The main idea of modedb is to have
-
-  - one routine to probe for video modes, which can be used by all frame buffer
-    devices
-  - one generic video mode database with a fair amount of standard videomodes
-    (taken from XFree86)
-  - the possibility to supply your own mode database for graphics hardware that
-    needs non-standard modes, like amifb and Mac frame buffer drivers (which
-    use macmodes.c)
-
-When a frame buffer device receives a video= option it doesn't know, it should
-consider that to be a video mode option. If no frame buffer device is specified
-in a video= option, fbmem considers that to be a global video mode option.
-
-Valid mode specifiers (mode_option argument):
-
-    <xres>x<yres>[M][R][-<bpp>][@<refresh>][i][m][eDd]
-    <name>[-<bpp>][@<refresh>]
-
-with <xres>, <yres>, <bpp> and <refresh> decimal numbers and <name> a string.
-Things between square brackets are optional.
-
-If 'M' is specified in the mode_option argument (after <yres> and before
-<bpp> and <refresh>, if specified) the timings will be calculated using
-VESA(TM) Coordinated Video Timings instead of looking up the mode from a table.
-If 'R' is specified, do a 'reduced blanking' calculation for digital displays.
-If 'i' is specified, calculate for an interlaced mode.  And if 'm' is
-specified, add margins to the calculation (1.8% of xres rounded down to 8
-pixels and 1.8% of yres).
-
-       Sample usage: 1024x768M@60m - CVT timing with margins
-
-DRM drivers also add options to enable or disable outputs:
-
-'e' will force the display to be enabled, i.e. it will override the detection
-if a display is connected. 'D' will force the display to be enabled and use
-digital output. This is useful for outputs that have both analog and digital
-signals (e.g. HDMI and DVI-I). For other outputs it behaves like 'e'. If 'd'
-is specified the output is disabled.
-
-You can additionally specify which output the options matches to.
-To force the VGA output to be enabled and drive a specific mode say:
-    video=VGA-1:1280x1024@60me
-
-Specifying the option multiple times for different ports is possible, e.g.:
-    video=LVDS-1:d video=HDMI-1:D
-
-***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo *****
-
-What is the VESA(TM) Coordinated Video Timings (CVT)?
-
-From the VESA(TM) Website:
-
-     "The purpose of CVT is to provide a method for generating a consistent
-      and coordinated set of standard formats, display refresh rates, and
-      timing specifications for computer display products, both those
-      employing CRTs, and those using other display technologies. The
-      intention of CVT is to give both source and display manufacturers a
-      common set of tools to enable new timings to be developed in a
-      consistent manner that ensures greater compatibility."
-
-This is the third standard approved by VESA(TM) concerning video timings.  The
-first was the Discrete Video Timings (DVT) which is  a collection of
-pre-defined modes approved by VESA(TM).  The second is the Generalized Timing
-Formula (GTF) which is an algorithm to calculate the timings, given the
-pixelclock, the horizontal sync frequency, or the vertical refresh rate.
-
-The GTF is limited by the fact that it is designed mainly for CRT displays.
-It artificially increases the pixelclock because of its high blanking
-requirement. This is inappropriate for digital display interface with its high
-data rate which requires that it conserves the pixelclock as much as possible.
-Also, GTF does not take into account the aspect ratio of the display.
-
-The CVT addresses these limitations.  If used with CRT's, the formula used
-is a derivation of GTF with a few modifications.  If used with digital
-displays, the "reduced blanking" calculation can be used.
-
-From the framebuffer subsystem perspective, new formats need not be added
-to the global mode database whenever a new mode is released by display
-manufacturers. Specifying for CVT will work for most, if not all, relatively
-new CRT displays and probably with most flatpanels, if 'reduced blanking'
-calculation is specified.  (The CVT compatibility of the display can be
-determined from its EDID. The version 1.3 of the EDID has extra 128-byte
-blocks where additional timing information is placed.  As of this time, there
-is no support yet in the layer to parse this additional blocks.)
-
-CVT also introduced a new naming convention (should be seen from dmesg output):
-
-    <pix>M<a>[-R]
-
-    where: pix = total amount of pixels in MB (xres x yres)
-           M   = always present
-           a   = aspect ratio (3 - 4:3; 4 - 5:4; 9 - 15:9, 16:9; A - 16:10)
-          -R   = reduced blanking
-
-	  example:  .48M3-R - 800x600 with reduced blanking
-
-Note: VESA(TM) has restrictions on what is a standard CVT timing:
-
-      - aspect ratio can only be one of the above values
-      - acceptable refresh rates are 50, 60, 70 or 85 Hz only
-      - if reduced blanking, the refresh rate must be at 60Hz
-
-If one of the above are not satisfied, the kernel will print a warning but the
-timings will still be calculated.
-
-***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo *****
-
-To find a suitable video mode, you just call
-
-int __init fb_find_mode(struct fb_var_screeninfo *var,
-                        struct fb_info *info, const char *mode_option,
-                        const struct fb_videomode *db, unsigned int dbsize,
-                        const struct fb_videomode *default_mode,
-                        unsigned int default_bpp)
-
-with db/dbsize your non-standard video mode database, or NULL to use the
-standard video mode database.
-
-fb_find_mode() first tries the specified video mode (or any mode that matches,
-e.g. there can be multiple 640x480 modes, each of them is tried). If that
-fails, the default mode is tried. If that fails, it walks over all modes.
-
-To specify a video mode at bootup, use the following boot options:
-    video=<driver>:<xres>x<yres>[-<bpp>][@refresh]
-
-where <driver> is a name from the table below.  Valid default modes can be
-found in linux/drivers/video/modedb.c.  Check your driver's documentation.
-There may be more modes.
-
-    Drivers that support modedb boot options
-    Boot Name	  Cards Supported
-
-    amifb	- Amiga chipset frame buffer
-    aty128fb	- ATI Rage128 / Pro frame buffer
-    atyfb	- ATI Mach64 frame buffer
-    pm2fb	- Permedia 2/2V frame buffer
-    pm3fb	- Permedia 3 frame buffer
-    sstfb	- Voodoo 1/2 (SST1) chipset frame buffer
-    tdfxfb	- 3D Fx frame buffer
-    tridentfb	- Trident (Cyber)blade chipset frame buffer
-    vt8623fb	- VIA 8623 frame buffer
-
-BTW, only a few fb drivers use this at the moment. Others are to follow
-(feel free to send patches). The DRM drivers also support this.
diff --git a/Documentation/fb/pvr2fb.rst b/Documentation/fb/pvr2fb.rst
new file mode 100644
index 000000000000..fcf2c21c8fcf
--- /dev/null
+++ b/Documentation/fb/pvr2fb.rst
@@ -0,0 +1,66 @@
+===============
+What is pvr2fb?
+===============
+
+This is a driver for PowerVR 2 based graphics frame buffers, such as the
+one found in the Dreamcast.
+
+Advantages:
+
+ * It provides a nice large console (128 cols x 48 lines with 1024x768)
+   without using tiny, unreadable fonts (NOT on the Dreamcast)
+ * You can run XF86_FBDev on top of /dev/fb0
+ * Most important: boot logo :-)
+
+Disadvantages:
+
+ * Driver is largely untested on non-Dreamcast systems.
+
+Configuration
+=============
+
+You can pass kernel command line options to pvr2fb with
+`video=pvr2fb:option1,option2:value2,option3` (multiple options should be
+separated by comma, values are separated from options by `:`).
+
+Accepted options:
+
+==========  ==================================================================
+font:X      default font to use. All fonts are supported, including the
+	    SUN12x22 font which is very nice at high resolutions.
+
+
+mode:X      default video mode with format [xres]x[yres]-<bpp>@<refresh rate>
+	    The following video modes are supported:
+	    640x640-16@60, 640x480-24@60, 640x480-32@60. The Dreamcast
+	    defaults to 640x480-16@60. At the time of writing the
+	    24bpp and 32bpp modes function poorly. Work to fix that is
+	    ongoing.
+
+	    Note: the 640x240 mode is currently broken, and should not be
+	    used for any reason. It is only mentioned here as a reference.
+
+inverse     invert colors on screen (for LCD displays)
+
+nomtrr      disables write combining on frame buffer. This slows down the
+	    driver, but there is a reported minor incompatibility between GUS
+	    DMA and XFree under high loads if write combining is enabled (sound
+	    dropouts). MTRR is enabled by default on systems that have it
+	    configured and that support it.
+
+cable:X     cable type. This can be any of the following: vga, rgb, and
+	    composite. If none is specified, we guess.
+
+output:X    output type. This can be any of the following: pal, ntsc, and
+	    vga. If none is specified, we guess.
+==========  ==================================================================
+
+X11
+===
+
+XF86_FBDev has been shown to work on the Dreamcast in the past - though not yet
+on any 2.6 series kernel.
+
+Paul Mundt <lethal@linuxdc.org>
+
+Updated by Adrian McMenamin <adrian@mcmen.demon.co.uk>
diff --git a/Documentation/fb/pvr2fb.txt b/Documentation/fb/pvr2fb.txt
deleted file mode 100644
index 36bdeff585e2..000000000000
--- a/Documentation/fb/pvr2fb.txt
+++ /dev/null
@@ -1,65 +0,0 @@
-$Id: pvr2fb.txt,v 1.1 2001/05/24 05:09:16 mrbrown Exp $
-
-What is pvr2fb?
-===============
-
-This is a driver for PowerVR 2 based graphics frame buffers, such as the
-one found in the Dreamcast.
-
-Advantages:
-
- * It provides a nice large console (128 cols + 48 lines with 1024x768)
-   without using tiny, unreadable fonts (NOT on the Dreamcast)
- * You can run XF86_FBDev on top of /dev/fb0
- * Most important: boot logo :-)
-
-Disadvantages:
-
- * Driver is largely untested on non-Dreamcast systems.
-
-Configuration
-=============
-
-You can pass kernel command line options to pvr2fb with
-`video=pvr2fb:option1,option2:value2,option3' (multiple options should be
-separated by comma, values are separated from options by `:').
-Accepted options:
-
-font:X    - default font to use. All fonts are supported, including the
-            SUN12x22 font which is very nice at high resolutions.
-
-	    
-mode:X    - default video mode with format [xres]x[yres]-<bpp>@<refresh rate>
-            The following video modes are supported:
-            640x640-16@60, 640x480-24@60, 640x480-32@60. The Dreamcast
-            defaults to 640x480-16@60. At the time of writing the
-            24bpp and 32bpp modes function poorly. Work to fix that is
-            ongoing
-
-            Note: the 640x240 mode is currently broken, and should not be
-            used for any reason. It is only mentioned here as a reference.
-
-inverse   - invert colors on screen (for LCD displays)
-
-nomtrr    - disables write combining on frame buffer. This slows down driver
-            but there is reported minor incompatibility between GUS DMA and
-            XFree under high loads if write combining is enabled (sound
-            dropouts). MTRR is enabled by default on systems that have it
-            configured and that support it.
-
-cable:X   - cable type. This can be any of the following: vga, rgb, and
-            composite. If none is specified, we guess.
-
-output:X  - output type. This can be any of the following: pal, ntsc, and
-            vga. If none is specified, we guess.
-
-X11
-===
-
-XF86_FBDev has been shown to work on the Dreamcast in the past - though not yet
-on any 2.6 series kernel.
-
---
-Paul Mundt <lethal@linuxdc.org>
-Updated by Adrian McMenamin <adrian@mcmen.demon.co.uk>
-
diff --git a/Documentation/fb/pxafb.rst b/Documentation/fb/pxafb.rst
new file mode 100644
index 000000000000..90177f5e7e76
--- /dev/null
+++ b/Documentation/fb/pxafb.rst
@@ -0,0 +1,173 @@
+================================
+Driver for PXA25x LCD controller
+================================
+
+The driver supports the following options, either via
+options=<OPTIONS> when modular or video=pxafb:<OPTIONS> when built in.
+
+For example::
+
+	modprobe pxafb options=vmem:2M,mode:640x480-8,passive
+
+or on the kernel command line::
+
+	video=pxafb:vmem:2M,mode:640x480-8,passive
+
+vmem: VIDEO_MEM_SIZE
+
+	Amount of video memory to allocate (can be suffixed with K or M
+	for kilobytes or megabytes)
+
+mode:XRESxYRES[-BPP]
+
+	XRES == LCCR1_PPL + 1
+
+	YRES == LCCR2_LPP + 1
+
+		The resolution of the display in pixels
+
+	BPP == The bit depth. Valid values are 1, 2, 4, 8 and 16.
+
+pixclock:PIXCLOCK
+
+	Pixel clock in picoseconds
+
+left:LEFT == LCCR1_BLW + 1
+
+right:RIGHT == LCCR1_ELW + 1
+
+hsynclen:HSYNC == LCCR1_HSW + 1
+
+upper:UPPER == LCCR2_BFW
+
+lower:LOWER == LCCR2_EFW
+
+vsynclen:VSYNC == LCCR2_VSW + 1
+
+	Display margins and sync times
+
+color | mono => LCCR0_CMS
+
+	Color or monochrome display
+
+active | passive => LCCR0_PAS
+
+	Active (TFT) or Passive (STN) display
+
+single | dual => LCCR0_SDS
+
+	Single or dual panel passive display
+
+4pix | 8pix => LCCR0_DPD
+
+	4 or 8 pixel monochrome single panel data
+
+hsync:HSYNC, vsync:VSYNC
+
+	Horizontal and vertical sync. 0 => active low, 1 => active
+	high.
+
+dpc:DPC
+
+	Double pixel clock. 1=>true, 0=>false
+
+outputen:POLARITY
+
+	Output Enable Polarity. 0 => active low, 1 => active high
+
+pixclockpol:POLARITY
+
+	pixel clock polarity
+	0 => falling edge, 1 => rising edge
+
+
+Overlay Support for PXA27x and later LCD controllers
+====================================================
+
+  PXA27x and later processors support overlay1 and overlay2 on top of the
+  base framebuffer (although underneath the base is also possible). They
+  support palette and no-palette RGB formats, as well as YUV formats (only
+  available on overlay2). These overlays have dedicated DMA channels and
+  behave in a similar way to a framebuffer.
+
+  However, there are some differences between these overlay framebuffers
+  and normal framebuffers, as listed below:
+
+  1. overlay can start at a 32-bit word aligned position within the base
+     framebuffer, which means they have a start (x, y). This information
+     is encoded into var->nonstd (no, var->xoffset and var->yoffset are
+     not for such purpose).
+
+  2. overlay framebuffer is allocated dynamically according to specified
+     'struct fb_var_screeninfo', the amount is decided by::
+
+	var->xres_virtual * var->yres_virtual * bpp
+
+     bpp = 16 -- for RGB565 or RGBT555
+
+     bpp = 24 -- for YUV444 packed
+
+     bpp = 24 -- for YUV444 planar
+
+     bpp = 16 -- for YUV422 planar (1 pixel = 1 Y + 1/2 Cb + 1/2 Cr)
+
+     bpp = 12 -- for YUV420 planar (1 pixel = 1 Y + 1/4 Cb + 1/4 Cr)
+
+     NOTE:
+
+     a. overlay does not support panning in x-direction, thus
+	var->xres_virtual will always be equal to var->xres
+
+     b. line length of overlay(s) must be on a 32-bit word boundary.
+	For YUV planar modes, this requirement applies to the component
+	with the minimum bits per pixel, e.g. for YUV420 the Cr component
+	of one pixel is actually 2 bits, so the line length should be
+	a multiple of 16 pixels
+
+     c. starting horizontal position (XPOS) should start on a 32-bit
+	word boundary, otherwise fb_check_var() will fail.
+
+     d. the rectangle of the overlay should be within the base plane,
+	otherwise fail
+
+     Applications should follow the sequence below to operate an overlay
+     framebuffer:
+
+	 a. open("/dev/fb[1-2]", ...)
+	 b. ioctl(fd, FBIOGET_VSCREENINFO, ...)
+	 c. modify 'var' with desired parameters:
+
+	    1) var->xres and var->yres
+	    2) larger var->yres_virtual if more memory is required,
+	       usually for double-buffering
+	    3) var->nonstd for starting (x, y) and color format
+	    4) var->{red, green, blue, transp} if RGB mode is to be used
+
+	 d. ioctl(fd, FBIOPUT_VSCREENINFO, ...)
+	 e. ioctl(fd, FBIOGET_FSCREENINFO, ...)
+	 f. mmap
+	 g. ...
+
+  3. for YUV planar formats, these are actually not supported within the
+     framebuffer framework, application has to take care of the offsets
+     and lengths of each component within the framebuffer.
+
+  4. var->nonstd is used to pass starting (x, y) position and color format,
+     the detailed bit fields are shown below::
+
+      31                23  20         10          0
+       +-----------------+---+----------+----------+
+       |  ... unused ... |FOR|   XPOS   |   YPOS   |
+       +-----------------+---+----------+----------+
+
+     FOR  - color format, as defined by OVERLAY_FORMAT_* in pxafb.h
+
+	  - 0 - RGB
+	  - 1 - YUV444 PACKED
+	  - 2 - YUV444 PLANAR
+	  - 3 - YUV422 PLANAR
+	  - 4 - YUV420 PLANAR
+
+     XPOS - starting horizontal position
+
+     YPOS - starting vertical position
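
The overlay sizing and line-length rules in pxafb.rst above can be checked
with a little arithmetic. The sketch below is not driver code:
`overlay_size_bytes()` and `overlay_line_aligned()` are hypothetical helper
names, and dividing by 8 assumes that the product in the text is expressed
in bits.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Memory needed for an overlay, following
 * var->xres_virtual * var->yres_virtual * bpp from the text.
 * Assumption: that product is in bits, so divide by 8 to get bytes.
 */
static size_t overlay_size_bytes(unsigned int xres_virtual,
				 unsigned int yres_virtual,
				 unsigned int bpp)
{
	assert(bpp > 0);
	return (size_t)xres_virtual * yres_virtual * bpp / 8;
}

/*
 * Rule (b): one line must end on a 32-bit word boundary. For YUV planar
 * modes, pass the bits per pixel of the smallest component (2 for the
 * Cr plane of YUV420); for RGB, pass the full bpp.
 * Returns 1 when the line is aligned and 0 otherwise.
 */
static int overlay_line_aligned(unsigned int xres,
				unsigned int component_bpp)
{
	return ((unsigned long)xres * component_bpp) % 32 == 0;
}
```

For a 320x240 overlay this gives 153600 bytes in RGB565 (16 bpp) and
115200 bytes in YUV420 planar (12 bpp). With a 2-bit Cr component, a width
of 320 passes the alignment check and 328 does not, which matches the
"multiple of 16 pixels" rule.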
diff --git a/Documentation/fb/pxafb.txt b/Documentation/fb/pxafb.txt
deleted file mode 100644
index d143a0a749f9..000000000000
--- a/Documentation/fb/pxafb.txt
+++ /dev/null
@@ -1,142 +0,0 @@
-Driver for PXA25x LCD controller
-================================
-
-The driver supports the following options, either via
-options=<OPTIONS> when modular or video=pxafb:<OPTIONS> when built in.
-
-For example:
-	modprobe pxafb options=vmem:2M,mode:640x480-8,passive
-or on the kernel command line
-	video=pxafb:vmem:2M,mode:640x480-8,passive
-
-vmem: VIDEO_MEM_SIZE
-	Amount of video memory to allocate (can be suffixed with K or M
-	for kilobytes or megabytes)
-
-mode:XRESxYRES[-BPP]
-	XRES == LCCR1_PPL + 1
-	YRES == LLCR2_LPP + 1
-		The resolution of the display in pixels
-	BPP == The bit depth. Valid values are 1, 2, 4, 8 and 16.
-
-pixclock:PIXCLOCK
-	Pixel clock in picoseconds
-
-left:LEFT == LCCR1_BLW + 1
-right:RIGHT == LCCR1_ELW + 1
-hsynclen:HSYNC == LCCR1_HSW + 1
-upper:UPPER == LCCR2_BFW
-lower:LOWER == LCCR2_EFR
-vsynclen:VSYNC == LCCR2_VSW + 1
-	Display margins and sync times
-
-color | mono => LCCR0_CMS
-	umm...
-
-active | passive => LCCR0_PAS
-	Active (TFT) or Passive (STN) display
-
-single | dual => LCCR0_SDS
-	Single or dual panel passive display
-
-4pix | 8pix => LCCR0_DPD
-	4 or 8 pixel monochrome single panel data
-
-hsync:HSYNC
-vsync:VSYNC
-	Horizontal and vertical sync. 0 => active low, 1 => active
-	high.
-
-dpc:DPC
-	Double pixel clock. 1=>true, 0=>false
-
-outputen:POLARITY
-	Output Enable Polarity. 0 => active low, 1 => active high
-
-pixclockpol:POLARITY
-	pixel clock polarity
-	0 => falling edge, 1 => rising edge
-
-
-Overlay Support for PXA27x and later LCD controllers
-====================================================
-
-  PXA27x and later processors support overlay1 and overlay2 on-top of the
-  base framebuffer (although under-neath the base is also possible). They
-  support palette and no-palette RGB formats, as well as YUV formats (only
-  available on overlay2). These overlays have dedicated DMA channels and
-  behave in a similar way as a framebuffer.
-
-  However, there are some differences between these overlay framebuffers
-  and normal framebuffers, as listed below:
-
-  1. overlay can start at a 32-bit word aligned position within the base
-     framebuffer, which means they have a start (x, y). This information
-     is encoded into var->nonstd (no, var->xoffset and var->yoffset are
-     not for such purpose).
-
-  2. overlay framebuffer is allocated dynamically according to specified
-     'struct fb_var_screeninfo', the amount is decided by:
-
-        var->xres_virtual * var->yres_virtual * bpp
-
-     bpp = 16 -- for RGB565 or RGBT555
-         = 24 -- for YUV444 packed
-         = 24 -- for YUV444 planar
-	 = 16 -- for YUV422 planar (1 pixel = 1 Y + 1/2 Cb + 1/2 Cr)
-	 = 12 -- for YUV420 planar (1 pixel = 1 Y + 1/4 Cb + 1/4 Cr)
-
-     NOTE:
-
-     a. overlay does not support panning in x-direction, thus
-        var->xres_virtual will always be equal to var->xres
-
-     b. line length of overlay(s) must be on a 32-bit word boundary,
-        for YUV planar modes, it is a requirement for the component
-	with minimum bits per pixel,  e.g. for YUV420, Cr component
-	for one pixel is actually 2-bits, it means the line length
-	should be a multiple of 16-pixels
-
-     c. starting horizontal position (XPOS) should start on a 32-bit
-        word boundary, otherwise the fb_check_var() will just fail.
-
-     d. the rectangle of the overlay should be within the base plane,
-        otherwise fail
-
-     Applications should follow the sequence below to operate an overlay
-     framebuffer:
-
-         a. open("/dev/fb[1-2]", ...)
-	 b. ioctl(fd, FBIOGET_VSCREENINFO, ...)
-	 c. modify 'var' with desired parameters:
-	    1) var->xres and var->yres
-	    2) larger var->yres_virtual if more memory is required,
-	       usually for double-buffering
-	    3) var->nonstd for starting (x, y) and color format
-	    4) var->{red, green, blue, transp} if RGB mode is to be used
-	 d. ioctl(fd, FBIOPUT_VSCREENINFO, ...)
-	 e. ioctl(fd, FBIOGET_FSCREENINFO, ...)
-	 f. mmap
-	 g. ...
-
-  3. for YUV planar formats, these are actually not supported within the
-     framebuffer framework, application has to take care of the offsets
-     and lengths of each component within the framebuffer.
-
-  4. var->nonstd is used to pass starting (x, y) position and color format,
-     the detailed bit fields are shown below:
-
-    31                23  20         10          0
-     +-----------------+---+----------+----------+
-     |  ... unused ... |FOR|   XPOS   |   YPOS   |
-     +-----------------+---+----------+----------+
-
-     FOR  - color format, as defined by OVERLAY_FORMAT_* in pxafb.h
-            0 - RGB
-	    1 - YUV444 PACKED
-	    2 - YUV444 PLANAR
-	    3 - YUV422 PLANAR
-	    4 - YUR420 PLANAR
-
-     XPOS - starting horizontal position
-     YPOS - starting vertical position
diff --git a/Documentation/fb/s3fb.rst b/Documentation/fb/s3fb.rst
new file mode 100644
index 000000000000..e809d69c21a7
--- /dev/null
+++ b/Documentation/fb/s3fb.rst
@@ -0,0 +1,82 @@
+===========================================
+s3fb - fbdev driver for S3 Trio/Virge chips
+===========================================
+
+
+Supported Hardware
+==================
+
+	S3 Trio32
+	S3 Trio64 (and variants V+, UV+, V2/DX, V2/GX)
+	S3 Virge  (and variants VX, DX, GX and GX2+)
+	S3 Plato/PX		(completely untested)
+	S3 Aurora64V+		(completely untested)
+
+	- only PCI bus supported
+	- only BIOS initialized VGA devices supported
+	- probably not working on big endian
+
+I tested s3fb on Trio64 (plain, V+ and V2/DX) and Virge (plain, VX, DX),
+all on i386.
+
+
+Supported Features
+==================
+
+	*  4 bpp pseudocolor modes (with 18bit palette, two variants)
+	*  8 bpp pseudocolor mode (with 18bit palette)
+	* 16 bpp truecolor modes (RGB 555 and RGB 565)
+	* 24 bpp truecolor mode (RGB 888) (only on Virge VX)
+	* 32 bpp truecolor mode (RGB 888) (not on Virge VX)
+	* text mode (activated by bpp = 0)
+	* interlaced mode variant (not available in text mode)
+	* doublescan mode variant (not available in text mode)
+	* panning in both directions
+	* suspend/resume support
+	* DPMS support
+
+Text mode is supported even in higher resolutions, but it is limited to
+lower pixclocks (maximum usually between 50-60 MHz, depending on specific
+hardware; I get best results from a plain S3 Trio32 card - about 75 MHz). This
+limitation is not enforced by driver. Text mode supports 8bit wide fonts only
+(hardware limitation) and 16bit tall fonts (driver limitation). Text mode
+support is broken on S3 Trio64 V2/DX.
+
+There are two 4 bpp modes. The first mode (selected if nonstd == 0) uses
+packed pixels, high nibble first. The second mode (selected if nonstd == 1)
+uses interleaved planes (1 byte interleave), MSB first. Both modes support
+8bit wide fonts only (driver limitation).
+
+Suspend/resume works on systems that initialize the video card during resume
+and only if the device is active (for example used by fbcon).
+
+
+Missing Features
+================
+(alias TODO list)
+
+	* secondary (not initialized by BIOS) device support
+	* big endian support
+	* Zorro bus support
+	* MMIO support
+	* 24 bpp mode support on more cards
+	* support for fontwidths != 8 in 4 bpp modes
+	* support for fontheight != 16 in text mode
+	* composite and external sync (is anyone able to test this?)
+	* hardware cursor
+	* video overlay support
+	* vsync synchronization
+	* feature connector support
+	* acceleration support (8514-like 2D, Virge 3D, busmaster transfers)
+	* better values for some magic registers (performance issues)
+
+
+Known bugs
+==========
+
+	* cursor disable in text mode doesn't work
+	* text mode broken on S3 Trio64 V2/DX
+
+
+--
+Ondrej Zajicek <santiago@crfreenet.org>
diff --git a/Documentation/fb/s3fb.txt b/Documentation/fb/s3fb.txt
deleted file mode 100644
index 2c97770bdbaa..000000000000
--- a/Documentation/fb/s3fb.txt
+++ /dev/null
@@ -1,82 +0,0 @@
-
-	s3fb - fbdev driver for S3 Trio/Virge chips
-	===========================================
-
-
-Supported Hardware
-==================
-
-	S3 Trio32
-	S3 Trio64 (and variants V+, UV+, V2/DX, V2/GX)
-	S3 Virge  (and variants VX, DX, GX and GX2+)
-	S3 Plato/PX		(completely untested)
-	S3 Aurora64V+		(completely untested)
-
-	- only PCI bus supported
-	- only BIOS initialized VGA devices supported
-	- probably not working on big endian
-
-I tested s3fb on Trio64 (plain, V+ and V2/DX) and Virge (plain, VX, DX),
-all on i386.
-
-
-Supported Features
-==================
-
-	*  4 bpp pseudocolor modes (with 18bit palette, two variants)
-	*  8 bpp pseudocolor mode (with 18bit palette)
-	* 16 bpp truecolor modes (RGB 555 and RGB 565)
-	* 24 bpp truecolor mode (RGB 888) on (only on Virge VX)
-	* 32 bpp truecolor mode (RGB 888) on (not on Virge VX)
-	* text mode (activated by bpp = 0)
-	* interlaced mode variant (not available in text mode)
-	* doublescan mode variant (not available in text mode)
-	* panning in both directions
-	* suspend/resume support
-	* DPMS support
-
-Text mode is supported even in higher resolutions, but there is limitation to
-lower pixclocks (maximum usually between 50-60 MHz, depending on specific
-hardware, i get best results from plain S3 Trio32 card - about 75 MHz). This
-limitation is not enforced by driver. Text mode supports 8bit wide fonts only
-(hardware limitation) and 16bit tall fonts (driver limitation). Text mode
-support is broken on S3 Trio64 V2/DX.
-
-There are two 4 bpp modes. First mode (selected if nonstd == 0) is mode with
-packed pixels, high nibble first. Second mode (selected if nonstd == 1) is mode
-with interleaved planes (1 byte interleave), MSB first. Both modes support
-8bit wide fonts only (driver limitation).
-
-Suspend/resume works on systems that initialize video card during resume and
-if device is active (for example used by fbcon).
-
-
-Missing Features
-================
-(alias TODO list)
-
-	* secondary (not initialized by BIOS) device support
-   	* big endian support
-	* Zorro bus support
-	* MMIO support
-	* 24 bpp mode support on more cards
-	* support for fontwidths != 8 in 4 bpp modes
-	* support for fontheight != 16 in text mode
-	* composite and external sync (is anyone able to test this?)
-	* hardware cursor
-	* video overlay support
-	* vsync synchronization
-	* feature connector support
-	* acceleration support (8514-like 2D, Virge 3D, busmaster transfers)
-	* better values for some magic registers (performance issues)
-
-
-Known bugs
-==========
-
-	* cursor disable in text mode doesn't work
-	* text mode broken on S3 Trio64 V2/DX
-
-
---
-Ondrej Zajicek <santiago@crfreenet.org>
diff --git a/Documentation/fb/sa1100fb.rst b/Documentation/fb/sa1100fb.rst
new file mode 100644
index 000000000000..67e2650e017d
--- /dev/null
+++ b/Documentation/fb/sa1100fb.rst
@@ -0,0 +1,40 @@
+=================
+What is sa1100fb?
+=================
+
+.. [This file is cloned from VesaFB/matroxfb]
+
+
+This is a driver for a graphic framebuffer for the SA-1100 LCD
+controller.
+
+Configuration
+==============
+
+For most common passive displays, giving the option::
+
+  video=sa1100fb:bpp:<value>,lccr0:<value>,lccr1:<value>,lccr2:<value>,lccr3:<value>
+
+on the kernel command line should be enough to configure the
+controller. The bits per pixel (bpp) value should be 4, 8, 12, or
+16. LCCR values are display-specific and should be computed as
+documented in the SA-1100 Developer's Manual, Section 11.7. Dual-panel
+displays are supported as long as the SDS bit is set in LCCR0; GPIO<9:2>
+are used for the lower panel.
+
+For active displays or displays requiring additional configuration
+(controlling backlights, powering on the LCD, etc.), the command line
+options may not be enough to configure the display. Adding sections to
+sa1100fb_init_fbinfo(), sa1100fb_activate_var(),
+sa1100fb_disable_lcd_controller(), and sa1100fb_enable_lcd_controller()
+will probably be necessary.
+
+Accepted options::
+
+	bpp:<value>	Configure for <value> bits per pixel
+	lccr0:<value>	Configure LCD control register 0 (11.7.3)
+	lccr1:<value>	Configure LCD control register 1 (11.7.4)
+	lccr2:<value>	Configure LCD control register 2 (11.7.5)
+	lccr3:<value>	Configure LCD control register 3 (11.7.6)
+
+Mark Huang <mhuang@livetoy.com>
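Note on sa1100fb.rst above: the option string it documents can be assembled mechanically. The following is only a minimal sketch; the helper name `sa1100fb_video_option` is hypothetical, the LCCR values in the usage line are placeholders (real values must be computed from the SA-1100 Developer's Manual, Section 11.7), and writing the values in decimal is an assumption about what the driver accepts.

```python
def sa1100fb_video_option(bpp, lccr):
    """Build a video=sa1100fb:... option string (hypothetical helper)."""
    # The document lists 4, 8, 12 and 16 as the expected bpp values.
    if bpp not in (4, 8, 12, 16):
        raise ValueError("bpp should be 4, 8, 12, or 16")
    # One value each for LCCR0..LCCR3.
    if len(lccr) != 4:
        raise ValueError("expected values for LCCR0..LCCR3")
    parts = ["bpp:%d" % bpp]
    # Decimal formatting is an assumption, not taken from the driver.
    parts += ["lccr%d:%d" % (i, value) for i, value in enumerate(lccr)]
    return "video=sa1100fb:" + ",".join(parts)


# Placeholder register values, for illustration only.
option = sa1100fb_video_option(8, [1, 2, 3, 4])
```

Note that the sub-options use ":" rather than "=" and are separated by commas, as in the documented example.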
diff --git a/Documentation/fb/sa1100fb.txt b/Documentation/fb/sa1100fb.txt
deleted file mode 100644
index f1b4220464df..000000000000
--- a/Documentation/fb/sa1100fb.txt
+++ /dev/null
@@ -1,39 +0,0 @@
-[This file is cloned from VesaFB/matroxfb]
-
-What is sa1100fb?
-=================
-
-This is a driver for a graphic framebuffer for the SA-1100 LCD
-controller.
-
-Configuration
-==============
-
-For most common passive displays, giving the option
-
-video=sa1100fb:bpp:<value>,lccr0:<value>,lccr1:<value>,lccr2:<value>,lccr3:<value>
-
-on the kernel command line should be enough to configure the
-controller. The bits per pixel (bpp) value should be 4, 8, 12, or
-16. LCCR values are display-specific and should be computed as
-documented in the SA-1100 Developer's Manual, Section 11.7. Dual-panel
-displays are supported as long as the SDS bit is set in LCCR0; GPIO<9:2>
-are used for the lower panel.
-
-For active displays or displays requiring additional configuration
-(controlling backlights, powering on the LCD, etc.), the command line
-options may not be enough to configure the display. Adding sections to
-sa1100fb_init_fbinfo(), sa1100fb_activate_var(),
-sa1100fb_disable_lcd_controller(), and sa1100fb_enable_lcd_controller()
-will probably be necessary.
-
-Accepted options:
-
-bpp:<value>	Configure for <value> bits per pixel
-lccr0:<value>	Configure LCD control register 0 (11.7.3)
-lccr1:<value>	Configure LCD control register 1 (11.7.4)
-lccr2:<value>	Configure LCD control register 2 (11.7.5)
-lccr3:<value>	Configure LCD control register 3 (11.7.6)
-
---
-Mark Huang <mhuang@livetoy.com>
diff --git a/Documentation/fb/sh7760fb.rst b/Documentation/fb/sh7760fb.rst
new file mode 100644
index 000000000000..c3266485f810
--- /dev/null
+++ b/Documentation/fb/sh7760fb.rst
@@ -0,0 +1,130 @@
+================================================
+SH7760/SH7763 integrated LCDC Framebuffer driver
+================================================
+
+0. Overview
+-----------
+The SH7760/SH7763 have an integrated LCD Display controller (LCDC) which
+supports (in theory) resolutions ranging from 1x1 to 1024x1024,
+with color depths ranging from 1 to 16 bits, on STN, DSTN and TFT Panels.
+
+Caveats:
+
+* Framebuffer memory must be a large chunk allocated at the top
+  of Area3 (HW requirement). Because of this requirement you should NOT
+  make the driver a module since at runtime it may become impossible to
+  get a large enough contiguous chunk of memory.
+
+* The driver does not support changing resolution while loaded
+  (displays aren't hotpluggable anyway)
+
+* Heavy flickering may be observed
+  a) if you're using 15/16bit color modes at >= 640x480 px resolutions,
+  b) during PCMCIA (or any other slow bus) activity.
+
+* Rotation works only 90 degrees clockwise, and only if horizontal
+  resolution is <= 320 pixels.
+
+Files:
+	- drivers/video/sh7760fb.c
+	- include/asm-sh/sh7760fb.h
+	- Documentation/fb/sh7760fb.rst
+
+1. Platform setup
+-----------------
+SH7760:
+ Video data is fetched via the DMABRG DMA engine, so you have to
+ configure the SH DMAC for DMABRG mode (write 0x94808080 to the
+ DMARSRA register somewhere at boot).
+
+ PFC registers PCCR and PCDR must be set to peripheral mode.
+ (write zeros to both).
+
+The driver does NOT do the above for you, since board setup is, well, the job
+of the board setup code.
+
+2. Panel definitions
+--------------------
+The LCDC must explicitly be told about the type of LCD panel
+attached.  Data must be wrapped in a "struct sh7760fb_platdata" and
+passed to the driver as platform_data.
+
+Suggest you take a closer look at the SH7760 Manual, Section 30.
+(http://documentation.renesas.com/eng/products/mpumcu/e602291_sh7760.pdf)
+
+The following code illustrates what needs to be done to
+get the framebuffer working on a 640x480 TFT::
+
+  #include <linux/fb.h>
+  #include <asm/sh7760fb.h>
+
+  /*
+   * NEC NL6448BC26-01 640x480 TFT
+   * dotclock 25175 kHz
+   * Xres                640     Yres            480
+   * Htotal      800     Vtotal          525
+   * HsynStart   656     VsynStart       490
+   * HsynLenn    30      VsynLenn        2
+   *
+   * The linux framebuffer layer does not use the syncstart/synclen
+   * values but right/left/upper/lower margin values. The comments
+   * for the x_margin explain how to calculate those from given
+   * panel sync timings.
+   */
+  static struct fb_videomode nl6448bc26 = {
+         .name           = "NL6448BC26",
+         .refresh        = 60,
+         .xres           = 640,
+         .yres           = 480,
+         .pixclock       = 39683,        /* in picoseconds! */
+         .hsync_len      = 30,
+         .vsync_len      = 2,
+         .left_margin    = 114,  /* HTOT - (HSYNLEN + HSYNSTART) */
+         .right_margin   = 16,   /* HSYNSTART - XRES */
+         .upper_margin   = 33,   /* VTOT - (VSYNLEN + VSYNSTART) */
+         .lower_margin   = 10,   /* VSYNSTART - YRES */
+         .sync           = FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
+         .vmode          = FB_VMODE_NONINTERLACED,
+         .flag           = 0,
+  };
+
+  static struct sh7760fb_platdata sh7760fb_nl6448 = {
+         .def_mode       = &nl6448bc26,
+         .ldmtr          = LDMTR_TFT_COLOR_16,   /* 16bit TFT panel */
+         .lddfr          = LDDFR_8BPP,           /* we want 8bit output */
+         .ldpmmr         = 0x0070,
+         .ldpspr         = 0x0500,
+         .ldaclnr        = 0,
+         .ldickr         = LDICKR_CLKSRC(LCDC_CLKSRC_EXTERNAL) |
+			 LDICKR_CLKDIV(1),
+         .rotate         = 0,
+         .novsync        = 1,
+         .blank          = NULL,
+  };
+
+  /* SH7760:
+   * 0xFE300800: 256 * 4byte xRGB palette ram
+   * 0xFE300C00: 42 bytes ctrl registers
+   */
+  static struct resource sh7760_lcdc_res[] = {
+         [0] = {
+	       .start  = 0xFE300800,
+	       .end    = 0xFE300CFF,
+	       .flags  = IORESOURCE_MEM,
+         },
+         [1] = {
+	       .start  = 65,
+	       .end    = 65,
+	       .flags  = IORESOURCE_IRQ,
+         },
+  };
+
+  static struct platform_device sh7760_lcdc_dev = {
+         .dev    = {
+	       .platform_data = &sh7760fb_nl6448,
+         },
+         .name           = "sh7760-lcdc",
+         .id             = -1,
+         .resource       = sh7760_lcdc_res,
+         .num_resources  = ARRAY_SIZE(sh7760_lcdc_res),
+  };
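Note on sh7760fb.rst above: the margin comments in the fb_videomode example describe simple arithmetic on the panel sync timings. A minimal sketch of that arithmetic, using the timing numbers from the NEC panel comment; the function name `margins_from_sync` is hypothetical and not part of any kernel interface.

```python
def margins_from_sync(xres, yres, htotal, vtotal,
                      hsync_start, vsync_start, hsync_len, vsync_len):
    """Convert sync start/length timings to framebuffer margin values."""
    return {
        # HTOT - (HSYNLEN + HSYNSTART)
        "left_margin": htotal - (hsync_len + hsync_start),
        # HSYNSTART - XRES
        "right_margin": hsync_start - xres,
        # VTOT - (VSYNLEN + VSYNSTART)
        "upper_margin": vtotal - (vsync_len + vsync_start),
        # VSYNSTART - YRES
        "lower_margin": vsync_start - yres,
    }


# Timings from the panel comment: 640x480, Htotal 800, Vtotal 525,
# HsynStart 656, VsynStart 490, sync lengths 30 and 2.
nl6448_margins = margins_from_sync(640, 480, 800, 525, 656, 490, 30, 2)
```

With these inputs the result matches the values in the struct: 114, 16, 33 and 10.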
diff --git a/Documentation/fb/sh7760fb.txt b/Documentation/fb/sh7760fb.txt
deleted file mode 100644
index b994c3b10549..000000000000
--- a/Documentation/fb/sh7760fb.txt
+++ /dev/null
@@ -1,131 +0,0 @@
-SH7760/SH7763 integrated LCDC Framebuffer driver
-================================================
-
-0. Overview
------------
-The SH7760/SH7763 have an integrated LCD Display controller (LCDC) which
-supports (in theory) resolutions ranging from 1x1 to 1024x1024,
-with color depths ranging from 1 to 16 bits, on STN, DSTN and TFT Panels.
-
-Caveats:
-* Framebuffer memory must be a large chunk allocated at the top
-  of Area3 (HW requirement). Because of this requirement you should NOT
-  make the driver a module since at runtime it may become impossible to
-  get a large enough contiguous chunk of memory.
-
-* The driver does not support changing resolution while loaded
-  (displays aren't hotpluggable anyway)
-
-* Heavy flickering may be observed
-  a) if you're using 15/16bit color modes at >= 640x480 px resolutions,
-  b) during PCMCIA (or any other slow bus) activity.
-
-* Rotation works only 90degress clockwise, and only if horizontal
-  resolution is <= 320 pixels.
-
-files:   drivers/video/sh7760fb.c
-        include/asm-sh/sh7760fb.h
-        Documentation/fb/sh7760fb.txt
-
-1. Platform setup
------------------
-SH7760:
- Video data is fetched via the DMABRG DMA engine, so you have to
- configure the SH DMAC for DMABRG mode (write 0x94808080 to the
- DMARSRA register somewhere at boot).
-
- PFC registers PCCR and PCDR must be set to peripheral mode.
- (write zeros to both).
-
-The driver does NOT do the above for you since board setup is, well, job
-of the board setup code.
-
-2. Panel definitions
---------------------
-The LCDC must explicitly be told about the type of LCD panel
-attached.  Data must be wrapped in a "struct sh7760fb_platdata" and
-passed to the driver as platform_data.
-
-Suggest you take a closer look at the SH7760 Manual, Section 30.
-(http://documentation.renesas.com/eng/products/mpumcu/e602291_sh7760.pdf)
-
-The following code illustrates what needs to be done to
-get the framebuffer working on a 640x480 TFT:
-
-====================== cut here ======================================
-
-#include <linux/fb.h>
-#include <asm/sh7760fb.h>
-
-/*
- * NEC NL6440bc26-01 640x480 TFT
- * dotclock 25175 kHz
- * Xres                640     Yres            480
- * Htotal      800     Vtotal          525
- * HsynStart   656     VsynStart       490
- * HsynLenn    30      VsynLenn        2
- *
- * The linux framebuffer layer does not use the syncstart/synclen
- * values but right/left/upper/lower margin values. The comments
- * for the x_margin explain how to calculate those from given
- * panel sync timings.
- */
-static struct fb_videomode nl6448bc26 = {
-       .name           = "NL6448BC26",
-       .refresh        = 60,
-       .xres           = 640,
-       .yres           = 480,
-       .pixclock       = 39683,        /* in picoseconds! */
-       .hsync_len      = 30,
-       .vsync_len      = 2,
-       .left_margin    = 114,  /* HTOT - (HSYNSLEN + HSYNSTART) */
-       .right_margin   = 16,   /* HSYNSTART - XRES */
-       .upper_margin   = 33,   /* VTOT - (VSYNLEN + VSYNSTART) */
-       .lower_margin   = 10,   /* VSYNSTART - YRES */
-       .sync           = FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
-       .vmode          = FB_VMODE_NONINTERLACED,
-       .flag           = 0,
-};
-
-static struct sh7760fb_platdata sh7760fb_nl6448 = {
-       .def_mode       = &nl6448bc26,
-       .ldmtr          = LDMTR_TFT_COLOR_16,   /* 16bit TFT panel */
-       .lddfr          = LDDFR_8BPP,           /* we want 8bit output */
-       .ldpmmr         = 0x0070,
-       .ldpspr         = 0x0500,
-       .ldaclnr        = 0,
-       .ldickr         = LDICKR_CLKSRC(LCDC_CLKSRC_EXTERNAL) |
-                         LDICKR_CLKDIV(1),
-       .rotate         = 0,
-       .novsync        = 1,
-       .blank          = NULL,
-};
-
-/* SH7760:
- * 0xFE300800: 256 * 4byte xRGB palette ram
- * 0xFE300C00: 42 bytes ctrl registers
- */
-static struct resource sh7760_lcdc_res[] = {
-       [0] = {
-               .start  = 0xFE300800,
-               .end    = 0xFE300CFF,
-               .flags  = IORESOURCE_MEM,
-       },
-       [1] = {
-               .start  = 65,
-               .end    = 65,
-               .flags  = IORESOURCE_IRQ,
-       },
-};
-
-static struct platform_device sh7760_lcdc_dev = {
-       .dev    = {
-               .platform_data = &sh7760fb_nl6448,
-       },
-       .name           = "sh7760-lcdc",
-       .id             = -1,
-       .resource       = sh7760_lcdc_res,
-       .num_resources  = ARRAY_SIZE(sh7760_lcdc_res),
-};
-
-====================== cut here ======================================
diff --git a/Documentation/fb/sisfb.rst b/Documentation/fb/sisfb.rst
new file mode 100644
index 000000000000..8f4e502ea12e
--- /dev/null
+++ b/Documentation/fb/sisfb.rst
@@ -0,0 +1,160 @@
+==============
+What is sisfb?
+==============
+
+sisfb is a framebuffer device driver for SiS (Silicon Integrated Systems)
+graphics chips. Supported are:
+
+- SiS 300 series: SiS 300/305, 540, 630(S), 730(S)
+- SiS 315 series: SiS 315/H/PRO, 55x, (M)65x, 740, (M)661(F/M)X, (M)741(GX)
+- SiS 330 series: SiS 330 ("Xabre"), (M)760
+
+
+Why do I need a framebuffer driver?
+===================================
+
+sisfb is useful, e.g., if you want a high-resolution text console. Besides that,
+sisfb is required to run DirectFB (which comes with an additional, dedicated
+driver for the 315 series).
+
+On the 300 series, sisfb on kernels older than 2.6.3 furthermore plays an
+important role in connection with DRM/DRI: Sisfb manages the memory heap
+used by DRM/DRI for 3D texture and other data. This memory management is
+required for using DRI/DRM.
+
+Kernels >= around 2.6.3 do not need sisfb any longer for DRI/DRM memory
+management. The SiS DRM driver has been updated and features a memory manager
+of its own (which will be used if sisfb is not compiled). So unless you want
+a graphical console, you don't need sisfb on kernels >=2.6.3.
+
+Sidenote, since this seems to be a commonly made mistake: sisfb and vesafb
+cannot be active at the same time! Select only one of them in your kernel
+configuration.
+
+
+How are parameters passed to sisfb?
+===================================
+
+Well, it depends: If compiled statically into the kernel, use lilo's append
+statement to add the parameters to the kernel command line. Please see lilo's
+(or GRUB's) documentation for more information. If sisfb is a kernel module,
+parameters are given with the modprobe (or insmod) command.
+
+Example for sisfb as part of the static kernel: Add the following line to your
+lilo.conf::
+
+     append="video=sisfb:mode:1024x768x16,mem:12288,rate:75"
+
+Example for sisfb as a module: Start sisfb by typing::
+
+     modprobe sisfb mode=1024x768x16 rate=75 mem=12288
+
+A common mistake is that folks use a wrong parameter format when using the
+driver compiled into the kernel. Please note: If compiled into the kernel,
+the parameter format is video=sisfb:mode:none or video=sisfb:mode:1024x768x16
+(or whatever mode you want to use, alternatively using any other format
+described above or the vesa keyword instead of mode). If compiled as a module,
+the parameter format reads mode=none or mode=1024x768x16 (or whatever mode you
+want to use). Using "=" instead of ":" (or vice versa) makes a huge difference!
+Additionally: If you give more than one argument to the in-kernel sisfb, the
+arguments are separated with ",". For example::
+
+   video=sisfb:mode:1024x768x16,rate:75,mem:12288
+
+
+How do I use it?
+================
+
+Preface statement: This file only covers very little of the driver's
+capabilities and features. Please refer to the author's and maintainer's
+website at http://www.winischhofer.net/linuxsisvga.shtml for more
+information. Additionally, "modinfo sisfb" gives an overview of all
+supported options including some explanation.
+
+The desired display mode can be specified using the keyword "mode" with
+a parameter in one of the following formats:
+
+  - XxYxDepth or
+  - XxY-Depth or
+  - XxY-Depth@Rate or
+  - XxY
+  - or simply use the VESA mode number in hexadecimal or decimal.
+
+For example: 1024x768x16, 1024x768-16@75, 1280x1024-16. If no depth is
+specified, it defaults to 8. If no rate is given, it defaults to 60Hz. Depth 32
+means 24bit color depth (but 32 bit framebuffer depth, which is not relevant
+to the user).
+
+Additionally, sisfb understands the keyword "vesa" followed by a VESA mode
+number in decimal or hexadecimal. For example: vesa=791 or vesa=0x117. Please
+use either "mode" or "vesa" but not both.
+
+Linux 2.4 only: If no mode is given, sisfb defaults to "no mode" (mode=none) if
+compiled as a module; if sisfb is statically compiled into the kernel, it
+defaults to 800x600x8 unless CRT2 type is LCD, in which case the LCD's native
+resolution is used. If you want to switch to a different mode, use the fbset
+shell command.
+
+Linux 2.6 only: If no mode is given, sisfb defaults to 800x600x8 unless CRT2
+type is LCD, in which case it defaults to the LCD's native resolution. If
+you want to switch to another mode, use the stty shell command.
+
+You should compile in both vgacon (to boot if you remove your SiS card from
+your system) and sisfb (for graphics mode). Under Linux 2.6, "Framebuffer
+console support" (fbcon) is also needed for a graphical console.
+
+You should *not* compile-in vesafb. And please do not use the "vga=" keyword
+in lilo's or grub's configuration file; mode selection is done using the
+"mode" or "vesa" keywords as a parameter. See above and below.
+
+
+X11
+===
+
+If using XFree86 or X.org, it is recommended that you don't use the "fbdev"
+driver but the dedicated "sis" X driver. The "sis" X driver and sisfb are
+developed by the same person (Thomas Winischhofer) and cooperate well with
+each other.
+
+
+SVGALib
+=======
+
+SVGALib, if directly accessing the hardware, never restores the screen
+correctly, especially on laptops or if the output devices are LCD or TV.
+Therefore, use the chipset "FBDEV" in SVGALib configuration. This will make
+SVGALib use the framebuffer device for mode switches and restoration.
+
+
+Configuration
+=============
+
+(Some) accepted options:
+
+=========  ==================================================================
+off        Disable sisfb. This option is only understood if sisfb is
+	   in-kernel, not a module.
+mem:X      size of memory for the console, rest will be used for DRI/DRM. X
+	   is in kilobytes. On 300 series, the default is 4096, 8192 or
+	   16384 (each in kilobyte) depending on how much video ram the card
+	   has. On 315/330 series, the default is the maximum available ram
+	   (since DRI/DRM is not supported for these chipsets).
+noaccel    do not use 2D acceleration engine. (Default: use acceleration)
+noypan     disable y-panning and scroll by redrawing the entire screen.
+	   This is much slower than y-panning. (Default: use y-panning)
+vesa:X     selects startup videomode. X is number from 0 to 0x1FF and
+	   represents the VESA mode number (can be given in decimal or
+	   hexadecimal form, the latter prefixed with "0x").
+mode:X     selects startup videomode. Please see above for the format of
+	   "X".
+=========  ==================================================================
+
+Boolean options such as "noaccel" or "noypan" are to be given without a
+parameter if sisfb is in-kernel (for example "video=sisfb:noypan"). If
+sisfb is a module, these are to be set to 1 (for example "modprobe sisfb
+noypan=1").
+
+
+Thomas Winischhofer <thomas@winischhofer.net>
+
+May 27, 2004
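Note on sisfb.rst above: the two parameter formats (":" and "," for the in-kernel driver, "=" and spaces for the module) and the documented mode string forms are easy to mix up. The sketch below only mirrors what the text describes; `parse_sisfb_mode` and `sisfb_args` are hypothetical helper names, and the regular expression reflects the four formats listed in the document, not the driver's actual parser.

```python
import re

# Formats listed in sisfb.rst: XxYxDepth, XxY-Depth, XxY-Depth@Rate, XxY.
_MODE_RE = re.compile(r"^(\d+)x(\d+)(?:x(\d+)|-(\d+)(?:@(\d+))?)?$")


def parse_sisfb_mode(text):
    """Return (xres, yres, depth, rate) with the documented defaults."""
    m = _MODE_RE.match(text)
    if m is None:
        raise ValueError("unrecognized mode string: %r" % text)
    xres, yres, depth_x, depth_dash, rate = m.groups()
    depth = depth_x or depth_dash or 8      # depth defaults to 8
    return int(xres), int(yres), int(depth), int(rate or 60)  # rate: 60Hz


def sisfb_args(options, builtin):
    """Format options for the in-kernel driver or for the module.

    A value of True marks a boolean option such as noypan.
    """
    if builtin:
        items = [k if v is True else "%s:%s" % (k, v)
                 for k, v in options.items()]
        return "video=sisfb:" + ",".join(items)
    items = ["%s=1" % k if v is True else "%s=%s" % (k, v)
             for k, v in options.items()]
    return "modprobe sisfb " + " ".join(items)


opts = {"mode": "1024x768x16", "rate": 75, "mem": 12288}
builtin_line = sisfb_args(opts, builtin=True)
module_line = sisfb_args(opts, builtin=False)
```

The same option dictionary yields both of the command lines quoted in the document, which makes the ":" versus "=" difference explicit.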
diff --git a/Documentation/fb/sisfb.txt b/Documentation/fb/sisfb.txt
deleted file mode 100644
index 2e68e503e72f..000000000000
--- a/Documentation/fb/sisfb.txt
+++ /dev/null
@@ -1,158 +0,0 @@
-
-What is sisfb?
-==============
-
-sisfb is a framebuffer device driver for SiS (Silicon Integrated Systems)
-graphics chips. Supported are:
-
-- SiS 300 series: SiS 300/305, 540, 630(S), 730(S)
-- SiS 315 series: SiS 315/H/PRO, 55x, (M)65x, 740, (M)661(F/M)X, (M)741(GX)
-- SiS 330 series: SiS 330 ("Xabre"), (M)760
-
-
-Why do I need a framebuffer driver?
-===================================
-
-sisfb is eg. useful if you want a high-resolution text console. Besides that,
-sisfb is required to run DirectFB (which comes with an additional, dedicated
-driver for the 315 series).
-
-On the 300 series, sisfb on kernels older than 2.6.3 furthermore plays an
-important role in connection with DRM/DRI: Sisfb manages the memory heap
-used by DRM/DRI for 3D texture and other data. This memory management is
-required for using DRI/DRM.
-
-Kernels >= around 2.6.3 do not need sisfb any longer for DRI/DRM memory
-management. The SiS DRM driver has been updated and features a memory manager
-of its own (which will be used if sisfb is not compiled). So unless you want
-a graphical console, you don't need sisfb on kernels >=2.6.3.
-
-Sidenote: Since this seems to be a commonly made mistake: sisfb and vesafb
-cannot be active at the same time! Do only select one of them in your kernel
-configuration.
-
-
-How are parameters passed to sisfb?
-===================================
-
-Well, it depends: If compiled statically into the kernel, use lilo's append
-statement to add the parameters to the kernel command line. Please see lilo's
-(or GRUB's) documentation for more information. If sisfb is a kernel module,
-parameters are given with the modprobe (or insmod) command.
-
-Example for sisfb as part of the static kernel: Add the following line to your
-lilo.conf:
-
-     append="video=sisfb:mode:1024x768x16,mem:12288,rate:75"
-
-Example for sisfb as a module: Start sisfb by typing
-
-     modprobe sisfb mode=1024x768x16 rate=75 mem=12288
-
-A common mistake is that folks use a wrong parameter format when using the
-driver compiled into the kernel. Please note: If compiled into the kernel,
-the parameter format is video=sisfb:mode:none or video=sisfb:mode:1024x768x16
-(or whatever mode you want to use, alternatively using any other format
-described above or the vesa keyword instead of mode). If compiled as a module,
-the parameter format reads mode=none or mode=1024x768x16 (or whatever mode you
-want to use). Using a "=" for a ":" (and vice versa) is a huge difference!
-Additionally: If you give more than one argument to the in-kernel sisfb, the
-arguments are separated with ",". For example:
-
-   video=sisfb:mode:1024x768x16,rate:75,mem:12288
-
-
-How do I use it?
-================
-
-Preface statement: This file only covers very little of the driver's
-capabilities and features. Please refer to the author's and maintainer's
-website at http://www.winischhofer.net/linuxsisvga.shtml for more
-information. Additionally, "modinfo sisfb" gives an overview over all
-supported options including some explanation.
-
-The desired display mode can be specified using the keyword "mode" with
-a parameter in one of the following formats:
-  - XxYxDepth or
-  - XxY-Depth or
-  - XxY-Depth@Rate or
-  - XxY
-  - or simply use the VESA mode number in hexadecimal or decimal.
-
-For example: 1024x768x16, 1024x768-16@75, 1280x1024-16. If no depth is
-specified, it defaults to 8. If no rate is given, it defaults to 60Hz. Depth 32
-means 24bit color depth (but 32 bit framebuffer depth, which is not relevant
-to the user).
-
-Additionally, sisfb understands the keyword "vesa" followed by a VESA mode
-number in decimal or hexadecimal. For example: vesa=791 or vesa=0x117. Please
-use either "mode" or "vesa" but not both.
-
-Linux 2.4 only: If no mode is given, sisfb defaults to "no mode" (mode=none) if
-compiled as a module; if sisfb is statically compiled into the kernel, it
-defaults to 800x600x8 unless CRT2 type is LCD, in which case the LCD's native
-resolution is used. If you want to switch to a different mode, use the fbset
-shell command.
-
-Linux 2.6 only: If no mode is given, sisfb defaults to 800x600x8 unless CRT2
-type is LCD, in which case it defaults to the LCD's native resolution. If
-you want to switch to another mode, use the stty shell command.
-
-You should compile in both vgacon (to boot if you remove you SiS card from
-your system) and sisfb (for graphics mode). Under Linux 2.6, also "Framebuffer
-console support" (fbcon) is needed for a graphical console.
-
-You should *not* compile-in vesafb. And please do not use the "vga=" keyword
-in lilo's or grub's configuration file; mode selection is done using the
-"mode" or "vesa" keywords as a parameter. See above and below.
-
-
-X11
-===
-
-If using XFree86 or X.org, it is recommended that you don't use the "fbdev"
-driver but the dedicated "sis" X driver. The "sis" X driver and sisfb are
-developed by the same person (Thomas Winischhofer) and cooperate well with
-each other.
-
-
-SVGALib
-=======
-
-SVGALib, if directly accessing the hardware, never restores the screen
-correctly, especially on laptops or if the output devices are LCD or TV.
-Therefore, use the chipset "FBDEV" in SVGALib configuration. This will make
-SVGALib use the framebuffer device for mode switches and restoration.
-
-
-Configuration
-=============
-
-(Some) accepted options:
-
-off      - Disable sisfb. This option is only understood if sisfb is
-           in-kernel, not a module.
-mem:X    - size of memory for the console, rest will be used for DRI/DRM. X
-           is in kilobytes. On 300 series, the default is 4096, 8192 or
-	   16384 (each in kilobyte) depending on how much video ram the card
-           has. On 315/330 series, the default is the maximum available ram
-	   (since DRI/DRM is not supported for these chipsets).
-noaccel  - do not use 2D acceleration engine. (Default: use acceleration)
-noypan   - disable y-panning and scroll by redrawing the entire screen.
-           This is much slower than y-panning. (Default: use y-panning)
-vesa:X   - selects startup videomode. X is number from 0 to 0x1FF and
-           represents the VESA mode number (can be given in decimal or
-	   hexadecimal form, the latter prefixed with "0x").
-mode:X   - selects startup videomode. Please see above for the format of
-           "X".
-
-Boolean options such as "noaccel" or "noypan" are to be given without a
-parameter if sisfb is in-kernel (for example "video=sisfb:noypan). If
-sisfb is a module, these are to be set to 1 (for example "modprobe sisfb
-noypan=1").
-
---
-Thomas Winischhofer <thomas@winischhofer.net>
-May 27, 2004
-
-
diff --git a/Documentation/fb/sm501.rst b/Documentation/fb/sm501.rst
new file mode 100644
index 000000000000..03e02c8042a7
--- /dev/null
+++ b/Documentation/fb/sm501.rst
@@ -0,0 +1,15 @@
+=======
+sm501fb
+=======
+
+Configuration:
+
+You can pass the following kernel command line options to the sm501
+video framebuffer driver::
+
+	sm501fb.bpp=	SM501 Display driver:
+			Specify bits-per-pixel if not specified by 'mode'
+
+	sm501fb.mode=	SM501 Display driver:
+			Specify resolution as
+			"<xres>x<yres>[-<bpp>][@<refresh>]"
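Note on sm501.rst above: the resolution string has two optional parts, the bits per pixel after "-" and the refresh rate after "@". A minimal sketch of how such a string could be put together; `sm501_mode_option` is a hypothetical helper name and the numbers in the usage lines are illustrative only.

```python
def sm501_mode_option(xres, yres, bpp=None, refresh=None):
    """Build sm501fb.mode=<xres>x<yres>[-<bpp>][@<refresh>]."""
    mode = "%dx%d" % (xres, yres)
    if bpp is not None:
        mode += "-%d" % bpp          # optional bits-per-pixel part
    if refresh is not None:
        mode += "@%d" % refresh      # optional refresh part
    return "sm501fb.mode=" + mode


# Illustrative values, not taken from the document.
plain = sm501_mode_option(640, 480)
full = sm501_mode_option(640, 480, bpp=16, refresh=60)
```

When the bpp part is left out, the document says sm501fb.bpp= can be used to specify it separately.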
diff --git a/Documentation/fb/sm501.txt b/Documentation/fb/sm501.txt
deleted file mode 100644
index 187f3b3ccb6c..000000000000
--- a/Documentation/fb/sm501.txt
+++ /dev/null
@@ -1,10 +0,0 @@
-Configuration:
-
-You can pass the following kernel command line options to sm501 videoframebuffer:
-
-	sm501fb.bpp=	SM501 Display driver:
-			Specify bits-per-pixel if not specified by 'mode'
-
-	sm501fb.mode=	SM501 Display driver:
-			Specify resolution as
-			"<xres>x<yres>[-<bpp>][@<refresh>]"
diff --git a/Documentation/fb/sm712fb.rst b/Documentation/fb/sm712fb.rst
new file mode 100644
index 000000000000..994dad3b0238
--- /dev/null
+++ b/Documentation/fb/sm712fb.rst
@@ -0,0 +1,35 @@
+================
+What is sm712fb?
+================
+
+This is a graphics framebuffer driver for Silicon Motion SM712 based processors.
+
+How to use it?
+==============
+
+Switching modes is done using the video=sm712fb:... boot parameter.
+
+If you want, for example, to enable a resolution of 1280x1024 at 24 bpp, you
+should pass this option on the kernel command line: "video=sm712fb:0x31B".
+
+You should not compile-in vesafb.
+
+Currently supported video modes are:
+
+Graphic modes
+-------------
+
+===  =======  =======  ========  =========
+bpp  640x480  800x600  1024x768  1280x1024
+===  =======  =======  ========  =========
+  8  0x301    0x303    0x305     0x307
+ 16  0x311    0x314    0x317     0x31A
+ 24  0x312    0x315    0x318     0x31B
+===  =======  =======  ========  =========
+
+Missing Features
+================
+(alias TODO list)
+
+	* 2D acceleration
+	* dual-head support
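Note on sm712fb.rst above: the mode table is a plain lookup from resolution and color depth to a mode number. A minimal sketch of that lookup, with the numbers copied from the table; the names `SM712FB_MODES` and `sm712fb_option` are hypothetical.

```python
# Mode numbers copied from the "Graphic modes" table, keyed by bpp.
SM712FB_MODES = {
    8:  {"640x480": 0x301, "800x600": 0x303,
         "1024x768": 0x305, "1280x1024": 0x307},
    16: {"640x480": 0x311, "800x600": 0x314,
         "1024x768": 0x317, "1280x1024": 0x31A},
    24: {"640x480": 0x312, "800x600": 0x315,
         "1024x768": 0x318, "1280x1024": 0x31B},
}


def sm712fb_option(xres, yres, bpp):
    """Return the video=sm712fb:... boot parameter for a listed mode."""
    try:
        mode = SM712FB_MODES[bpp]["%dx%d" % (xres, yres)]
    except KeyError:
        raise ValueError("mode not listed in the table")
    return "video=sm712fb:0x%X" % mode


example = sm712fb_option(1280, 1024, 24)
```

The example reproduces the command line given in the text for 1280x1024 at 24 bpp.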
diff --git a/Documentation/fb/sm712fb.txt b/Documentation/fb/sm712fb.txt
deleted file mode 100644
index c388442edf51..000000000000
--- a/Documentation/fb/sm712fb.txt
+++ /dev/null
@@ -1,31 +0,0 @@
-What is sm712fb?
-=================
-
-This is a graphics framebuffer driver for Silicon Motion SM712 based processors.
-
-How to use it?
-==============
-
-Switching modes is done using the video=sm712fb:... boot parameter.
-
-If you want, for example, enable a resolution of 1280x1024x24bpp you should
-pass to the kernel this command line: "video=sm712fb:0x31B".
-
-You should not compile-in vesafb.
-
-Currently supported video modes are:
-
-[Graphic modes]
-
-bpp | 640x480  800x600  1024x768  1280x1024
-----+--------------------------------------------
-  8 | 0x301    0x303    0x305    0x307
- 16 | 0x311    0x314    0x317    0x31A
- 24 | 0x312    0x315    0x318    0x31B
-
-Missing Features
-================
-(alias TODO list)
-
-	* 2D acceleratrion
-	* dual-head support
diff --git a/Documentation/fb/sstfb.rst b/Documentation/fb/sstfb.rst
new file mode 100644
index 000000000000..8e8c1b940359
--- /dev/null
+++ b/Documentation/fb/sstfb.rst
@@ -0,0 +1,207 @@
+=====
+sstfb
+=====
+
+Introduction
+============
+
+This is a frame buffer device driver for 3dfx' Voodoo Graphics
+(aka voodoo 1, aka sst1) and Voodoo² (aka Voodoo 2, aka CVG) based
+video boards. It's highly experimental code, but is guaranteed to work
+on my computer, with my "Maxi Gamer 3D" and "Maxi Gamer 3d²" boards,
+and with me "between chair and keyboard". Some people tested other
+combinations and it seems that it works.
+The main page is located at <http://sstfb.sourceforge.net>, and if
+you want the latest version, check out the CVS, as the driver is a work
+in progress, I feel uncomfortable with releasing tarballs of something
+not completely working...Don't worry, it's still more than usable
+(I eat my own dog food)
+
+Please read the Bug section, and report any success or failure to me
+(Ghozlane Toumi <gtoumi@laposte.net>).
+BTW, if you have only one monitor, and you don't feel like playing
+with the VGA passthrough cable, I can only suggest borrowing a screen
+somewhere...
+
+
+Installation
+============
+
+This driver (should) work on ix86, with "late" 2.2.x kernel (tested
+with x = 19) and "recent" 2.4.x kernel, as a module or compiled in.
+It has been included in mainstream kernel since the infamous 2.4.10.
+You can apply the patches found in `sstfb/kernel/*-2.{2|4}.x.patch`,
+and copy sstfb.c to linux/drivers/video/, or apply a single patch,
+`sstfb/patch-2.{2|4}.x-sstfb-yymmdd` to your linux source tree.
+
+Then configure your kernel as usual: choose "m" or "y" for 3Dfx Voodoo
+Graphics in section "console". Compile, install, have fun... and please
+drop me a report :)
+
+
+Module Usage
+============
+
+.. warning::
+
+       #. You should read this section completely before issuing any command.
+
+       #. If you have only one monitor to play with, once you insmod the
+	  module, the 3dfx takes control of the output, so you'll have to
+	  plug the monitor to the "normal" video board in order to issue
+	  the commands, or you can blindly use sst_dbg_vgapass
+	  in the tools directory (See Tools). The latest solution is to pass the
+	  parameter vgapass=1 when insmodding the driver. (See Kernel/Modules
+	  Options)
+
+Module insertion
+----------------
+
+       #. insmod sstfb.o
+
+	  You should see some strange output from the board:
+	  a big blue square, small green and red squares, and a vertical
+	  white rectangle. Why? The function's name is self-explanatory:
+	  "sstfb_test()"...
+	  (if you don't have a second monitor, you'll have to plug your monitor
+	  directly to the 2D videocard to see what you're typing)
+
+       #. con2fb /dev/fbx /dev/ttyx
+
+	  bind a tty to the new frame buffer. if you already have a frame
+	  buffer driver, the voodoo fb will likely be /dev/fb1. if not,
+	  the device will be /dev/fb0. You can check this by doing a
+	  cat /proc/fb. You can find a copy of con2fb in tools/ directory.
+	  if you don't have another fb device, this step is superfluous,
+	  as the console subsystem automagically binds ttys to the fb.
+       #. switch to the virtual console you just mapped. "tadaaa" ...
+
+Module removal
+--------------
+
+       #. con2fb /dev/fbx /dev/ttyx
+
+	  bind the tty to the old frame buffer so the module can be removed.
+	  (how does it work with vgacon ? short answer : it doesn't work)
+
+       #. rmmod sstfb
+
+
+Kernel/Modules Options
+----------------------
+
+You can pass some options to the sstfb module, and via the kernel
+command line when the driver is compiled in:
+for module : insmod sstfb.o option1=value1 option2=value2 ...
+in kernel :  video=sstfb:option1,option2:value2,option3 ...
+
+sstfb supports the following options:
+
+=============== =============== ===============================================
+Module		Kernel		Description
+=============== =============== ===============================================
+vgapass=0	vganopass	Enable or disable VGA passthrough cable.
+vgapass=1	vgapass		When enabled, the monitor will get the signal
+				from the VGA board and not from the voodoo.
+
+				Default: nopass
+
+mem=x		mem:x		Force frame buffer memory in MiB
+				allowed values: 0, 1, 2, 4.
+
+				Default: 0 (= autodetect)
+
+inverse=1	inverse		Supposed to enable inverse console.
+				doesn't work yet...
+
+clipping=1	clipping	Enable or disable clipping.
+clipping=0	noclipping	With clipping enabled, all offscreen
+				reads and writes are discarded.
+
+				Default: enable clipping.
+
+gfxclk=x	gfxclk:x	Force graphic clock frequency (in MHz).
+				Be careful with this option, it may be
+				DANGEROUS.
+
+				Default: auto
+
+					- 50MHz for Voodoo 1,
+					- 75MHz for Voodoo 2.
+
+slowpci=0	fastpci		Enable or disable fast PCI read/writes.
+slowpci=1	slowpci		Default : fastpci
+
+dev=x		dev:x		Attach the driver to device number x.
+				0 is the first compatible board (in
+				lspci order)
+=============== =============== ===============================================
+
+Tools
+=====
+
+These tools are mostly for debugging purposes, but you can
+find some of these interesting:
+
+- `con2fb`, maps a tty to a framebuffer::
+
+	con2fb /dev/fb1 /dev/tty5
+
+- `sst_dbg_vgapass`, changes VGA passthrough. You have to recompile the
+  driver with SST_DEBUG and SST_DEBUG_IOCTL set to 1::
+
+	sst_dbg_vgapass /dev/fb1 1 (enables vga cable)
+	sst_dbg_vgapass /dev/fb1 0 (disables vga cable)
+
+- `glide_reset`, resets the voodoo using glide
+  use this after rmmoding sstfb, if the module refuses to
+  reinsert.
+
+Bugs
+====
+
+- DO NOT use glide while the sstfb module is in, you'll most likely
+  hang your computer.
+- If you see some artefacts (pixels not cleaning and stuff like that),
+  try turning off clipping (clipping=0), and/or using slowpci
+- The driver doesn't detect the 4 MB frame buffer Voodoos; it seems that
+  the last 2 MB wrap around. Looking into that.
+- The driver is 16 bpp only, 24/32 won't work.
+- The driver is not your_favorite_toy-safe. this includes SMP...
+
+	[Actually from inspection it seems to be safe - Alan]
+
+- When using XFree86 FBdev (X over fbdev) you may see strange color
+  patterns at the border of your windows (the pixels lose the lowest
+  byte -> basically the blue component and some of the green). I'm unable
+  to reproduce this with XFree86-3.3, but one of the testers has this
+  problem with XFree86-4. Apparently recent XFree86-4.x releases solve this
+  problem.
+- I didn't really test changing the palette, so you may find some weird
+  things when playing with that.
+- Sometimes the driver will not recognise the DAC, and the
+  initialisation will fail. This is specifically true for
+  voodoo 2 boards, but it should be solved in recent versions. Please
+  contact me.
+- The 24/32 is not likely to work anytime soon, knowing that the
+  hardware does ... unusual things in 24/32 bpp.
+- When used with another video board, current limitations of the linux
+  console subsystem can cause some trouble; specifically, you should
+  disable software scrollback, as it can oops badly ...
+
+Todo
+====
+
+- Get rid of the previous paragraph.
+- Buy more coffee.
+- test/port to other arch.
+- try to add panning using tweaks with front and back buffer.
+- try to implement accel on voodoo2, this board can actually do a
+  lot in 2D even if it was sold as a 3D only board ...
+
+Ghozlane Toumi <gtoumi@laposte.net>
+
+
+Date: 2002/05/09 20:11:45
+
+http://sstfb.sourceforge.net/README
diff --git a/Documentation/fb/sstfb.txt b/Documentation/fb/sstfb.txt
deleted file mode 100644
index 13db1075e4a5..000000000000
--- a/Documentation/fb/sstfb.txt
+++ /dev/null
@@ -1,174 +0,0 @@
-
-Introduction
-
-	  This is a frame buffer device driver for 3dfx' Voodoo Graphics 
-	(aka voodoo 1, aka sst1) and Voodoo² (aka Voodoo 2, aka CVG) based 
-	video boards. It's highly experimental code, but is guaranteed to work
-	on my computer, with my "Maxi Gamer 3D" and "Maxi Gamer 3d²" boards,
-	and with me "between chair and keyboard". Some people tested other
-	combinations and it seems that it works.
-	  The main page is located at <http://sstfb.sourceforge.net>, and if
-	you want the latest version, check out the CVS, as the driver is a work
-	in progress, I feel uncomfortable with releasing tarballs of something
-	not completely working...Don't worry, it's still more than usable
-	(I eat my own dog food)
-
-	  Please read the Bug section, and report any success or failure to me
-	(Ghozlane Toumi <gtoumi@laposte.net>).
-	  BTW, If you have only one monitor , and you don't feel like playing
-	with the vga passthrou cable, I can only suggest borrowing a screen
-	somewhere... 
-
-
-Installation 
-
-	  This driver (should) work on ix86, with "late" 2.2.x kernel (tested
-	with x = 19) and "recent" 2.4.x kernel, as a module or compiled in.
-	  It has been included in mainstream kernel since the infamous 2.4.10.
-	  You can apply the patches found in sstfb/kernel/*-2.{2|4}.x.patch,
-	and copy sstfb.c to linux/drivers/video/, or apply a single patch, 
-	sstfb/patch-2.{2|4}.x-sstfb-yymmdd to your linux source tree.
-
-	  Then configure your kernel as usual: choose "m" or "y" to 3Dfx Voodoo
-	Graphics in section "console". Compile, install, have fun... and please
-	drop me a report :)
-
-
-Module Usage
-	
-	Warnings.
-	# You should read completely this section before issuing any command.
-	# If you have only one monitor to play with, once you insmod the
-	  module, the 3dfx takes control of the output, so you'll have to
-	  plug the monitor to the "normal" video board in order to issue
-	  the commands, or you can blindly use sst_dbg_vgapass
-          in the tools directory (See Tools). The latest solution is pass the
-	  parameter vgapass=1 when insmodding the driver. (See Kernel/Modules
-	  Options)
-
-	Module insertion:
-	# insmod sstfb.o
-	  you should see some strange output from the board: 
-	  a big blue square, a green and a red small squares and a vertical
-	  white rectangle. why? the function's name is self-explanatory:
-	  "sstfb_test()"...
-	  (if you don't have a second monitor, you'll have to plug your monitor
-	  directly to the 2D videocard to see what you're typing)
-	# con2fb /dev/fbx /dev/ttyx
-	  bind a tty to the new frame buffer. if you already have a frame
-	  buffer driver, the voodoo fb will likely be /dev/fb1. if not, 
-	  the device will be /dev/fb0. You can check this by doing a 
-	  cat /proc/fb. You can find a copy of con2fb in tools/ directory.
-	  if you don't have another fb device, this step is superfluous,
-	  as the console subsystem automagicaly binds ttys to the fb.
-	# switch to the virtual console you just mapped. "tadaaa" ...
-
-	Module removal:
-	# con2fb /dev/fbx /dev/ttyx
-	  bind the tty to the old frame buffer so the module can be removed.
-	  (how does it work with vgacon ? short answer : it doesn't work)
-	# rmmod sstfb
-
-
-Kernel/Modules Options
-
-	You can pass some options to the sstfb module, and via the kernel 
-	command line when the driver is compiled in:
-	for module : insmod sstfb.o option1=value1 option2=value2 ...
-	in kernel :  video=sstfb:option1,option2:value2,option3 ...
-	
-	sstfb supports the following options :
-
-Module		Kernel		Description
-
-vgapass=0	vganopass	Enable or disable VGA passthrou cable.
-vgapass=1	vgapass		When enabled, the monitor will get the signal
-				from the VGA board and not from the voodoo.
-				Default: nopass
-
-mem=x		mem:x		Force frame buffer memory in MiB
-				allowed values: 0, 1, 2, 4.
-				Default: 0 (= autodetect)
-
-inverse=1	inverse		Supposed to enable inverse console.
-				doesn't work yet...
-
-clipping=1	clipping	Enable or disable clipping.
-clipping=0	noclipping	With clipping enabled, all offscreen
-				reads and writes are discarded.
-				Default: enable clipping.
-
-gfxclk=x	gfxclk:x	Force graphic clock frequency (in MHz).
-				Be careful with this option, it may be
-				DANGEROUS.
-				Default: auto 
-					50Mhz for Voodoo 1,
-					75MHz for Voodoo 2. 
-
-slowpci=1	fastpci		Enable or disable fast PCI read/writes.
-slowpci=1	slowpci		Default : fastpci
-
-dev=x		dev:x		Attach the driver to device number x.
-				0 is the first compatible board (in 
-				lspci order)
-
-Tools
-
-	These tools are mostly for debugging purposes, but you can 
-	find some of these interesting :
-	 - con2fb , maps a tty to a fbramebuffer .
-		con2fb /dev/fb1 /dev/tty5
-	 - sst_dbg_vgapass , changes vga passthrou. You have to recompile the
-	driver with SST_DEBUG and SST_DEBUG_IOCTL set to 1
-		sst_dbg_vgapass /dev/fb1 1 (enables vga cable)
-		sst_dbg_vgapass /dev/fb1 0 (disables vga cable)
-	 - glide_reset , resets the voodoo using glide
-		use this after rmmoding sstfb, if the module refuses to
-		reinsert .
-
-Bugs
-
-	- DO NOT use glide while the sstfb module is in, you'll most likely
-	hang your computer.
-	- If you see some artefacts (pixels not cleaning and stuff like that), 
-	try turning off clipping (clipping=0), and/or using slowpci
-	- the driver don't detect the 4Mb frame buffer voodoos, it seems that
-	the 2 last Mbs wrap around. looking into that .
-	- The driver is 16 bpp only, 24/32 won't work.
-	- The driver is not your_favorite_toy-safe. this includes SMP...
-          [Actually from inspection it seems to be safe - Alan]
-	- When using XFree86 FBdev (X over fbdev) you may see strange color
-	patterns at the border of your windows (the pixels lose the lowest
-	byte -> basically the blue component and some of the green). I'm unable
-	to reproduce this with XFree86-3.3, but one of the testers has this
-	problem with XFree86-4. Apparently recent Xfree86-4.x solve this
-	problem.
-	- I didn't really test changing the palette, so you may find some weird
-	things when playing with that.
-	- Sometimes the driver will not recognise the DAC, and the
-        initialisation will fail. This is specifically true for
-	voodoo 2 boards, but it should be solved in recent versions. Please
-	contact me.
-	- The 24/32 is not likely to work anytime soon, knowing that the
-	hardware does ... unusual things in 24/32 bpp.
-	- When used with another video board, current limitations of the linux
-	console subsystem can cause some troubles, specifically, you should
-	disable software scrollback, as it can oops badly ...
-
-Todo
-
-	- Get rid of the previous paragraph.
-	- Buy more coffee.
-	- test/port to other arch.
-	- try to add panning using tweeks with front and back buffer .
-	- try to implement accel on voodoo2, this board can actually do a 
-	  lot in 2D even if it was sold as a 3D only board ...
-
-ghoz.
-
--- 
-Ghozlane Toumi <gtoumi@laposte.net>
-
-
-$Date: 2002/05/09 20:11:45 $
-http://sstfb.sourceforge.net/README
diff --git a/Documentation/fb/tgafb.rst b/Documentation/fb/tgafb.rst
new file mode 100644
index 000000000000..0c50d2134aa4
--- /dev/null
+++ b/Documentation/fb/tgafb.rst
@@ -0,0 +1,71 @@
+==============
+What is tgafb?
+==============
+
+This is a driver for DECChip 21030 based graphics framebuffers, a.k.a. TGA
+cards, which are usually found in older Digital Alpha systems. The
+following models are supported:
+
+- ZLxP-E1 (8bpp, 2 MB VRAM)
+- ZLxP-E2 (32bpp, 8 MB VRAM)
+- ZLxP-E3 (32bpp, 16 MB VRAM, Zbuffer)
+
+This version is an almost complete rewrite of the code written by Geert
+Uytterhoeven, which was based on the original TGA console code written by
+Jay Estabrook.
+
+Major new features since Linux 2.0.x:
+
+ * Support for multiple resolutions
+ * Support for fixed-frequency and other oddball monitors
+   (by allowing the video mode to be set at boot time)
+
+User-visible changes since Linux 2.2.x:
+
+ * Sync-on-green is now handled properly
+ * More useful information is printed on bootup
+   (this helps if people run into problems)
+
+This driver does not (yet) support the TGA2 family of framebuffers, so the
+PowerStorm 3D30/4D20 (also known as PBXGB) cards are not supported. These
+can however be used with the standard VGA Text Console driver.
+
+
+Configuration
+=============
+
+You can pass kernel command line options to tgafb with
+`video=tgafb:option1,option2:value2,option3` (multiple options should be
+separated by comma, values are separated from options by `:`).
+
+Accepted options:
+
+==========  ============================================================
+font:X      default font to use. All fonts are supported, including the
+	    SUN12x22 font which is very nice at high resolutions.
+
+mode:X      default video mode. The following video modes are supported:
+	    640x480-60, 800x600-56, 640x480-72, 800x600-60, 800x600-72,
+	    1024x768-60, 1152x864-60, 1024x768-70, 1024x768-76,
+	    1152x864-70, 1280x1024-61, 1024x768-85, 1280x1024-70,
+	    1152x864-84, 1280x1024-76, 1280x1024-85
+==========  ============================================================
+
+
+Known Issues
+============
+
+The XFree86 FBDev server has been reported not to work, since tgafb doesn't do
+mmap(). Running the standard XF86_TGA server from XFree86 3.3.x works fine for
+me, however this server does not do acceleration, which makes certain operations
+quite slow. Support for acceleration is being progressively integrated in
+XFree86 4.x.
+
+When running tgafb in resolutions higher than 640x480, on switching VCs from
+tgafb to XF86_TGA 3.3.x, the entire screen is not re-drawn and must be manually
+refreshed. This is an X server problem, not a tgafb problem, and is fixed in
+XFree86 4.0.
+
+Enjoy!
+
+Martin Lucina <mato@kotelna.sk>
diff --git a/Documentation/fb/tgafb.txt b/Documentation/fb/tgafb.txt
deleted file mode 100644
index 250083ada8fb..000000000000
--- a/Documentation/fb/tgafb.txt
+++ /dev/null
@@ -1,69 +0,0 @@
-$Id: tgafb.txt,v 1.1.2.2 2000/04/04 06:50:18 mato Exp $
-
-What is tgafb?
-===============
-
-This is a driver for DECChip 21030 based graphics framebuffers, a.k.a. TGA
-cards, which are usually found in older Digital Alpha systems. The
-following models are supported:
-
-ZLxP-E1 (8bpp, 2 MB VRAM)
-ZLxP-E2 (32bpp, 8 MB VRAM)
-ZLxP-E3 (32bpp, 16 MB VRAM, Zbuffer)
-
-This version is an almost complete rewrite of the code written by Geert
-Uytterhoeven, which was based on the original TGA console code written by
-Jay Estabrook.
-
-Major new features since Linux 2.0.x:
-
- * Support for multiple resolutions
- * Support for fixed-frequency and other oddball monitors 
-   (by allowing the video mode to be set at boot time)
-
-User-visible changes since Linux 2.2.x:
-
- * Sync-on-green is now handled properly
- * More useful information is printed on bootup
-   (this helps if people run into problems)
-
-This driver does not (yet) support the TGA2 family of framebuffers, so the
-PowerStorm 3D30/4D20 (also known as PBXGB) cards are not supported. These
-can however be used with the standard VGA Text Console driver.
-
-
-Configuration
-=============
-
-You can pass kernel command line options to tgafb with
-`video=tgafb:option1,option2:value2,option3' (multiple options should be
-separated by comma, values are separated from options by `:').
-Accepted options:
-
-font:X    - default font to use. All fonts are supported, including the
-            SUN12x22 font which is very nice at high resolutions.
-
-mode:X    - default video mode. The following video modes are supported:
-            640x480-60, 800x600-56, 640x480-72, 800x600-60, 800x600-72, 
-	    1024x768-60, 1152x864-60, 1024x768-70, 1024x768-76,
-	    1152x864-70, 1280x1024-61, 1024x768-85, 1280x1024-70,
-	    1152x864-84, 1280x1024-76, 1280x1024-85
- 
-
-Known Issues
-============
-
-The XFree86 FBDev server has been reported not to work, since tgafb doesn't do
-mmap(). Running the standard XF86_TGA server from XFree86 3.3.x works fine for
-me, however this server does not do acceleration, which make certain operations
-quite slow. Support for acceleration is being progressively integrated in
-XFree86 4.x.
-
-When running tgafb in resolutions higher than 640x480, on switching VCs from
-tgafb to XF86_TGA 3.3.x, the entire screen is not re-drawn and must be manually
-refreshed. This is an X server problem, not a tgafb problem, and is fixed in
-XFree86 4.0.
-
-Enjoy!
-
-Martin Lucina <mato@kotelna.sk>
diff --git a/Documentation/fb/tridentfb.rst b/Documentation/fb/tridentfb.rst
new file mode 100644
index 000000000000..7921c9dee78c
--- /dev/null
+++ b/Documentation/fb/tridentfb.rst
@@ -0,0 +1,78 @@
+=========
+Tridentfb
+=========
+
+Tridentfb is a framebuffer driver for some Trident chip based cards.
+
+The following list of chips is thought to be supported although not all are
+tested:
+
+- those from the TGUI series 9440/96XX and with Cyber in their names
+- those from the Image series and with Cyber in their names
+- those with Blade in their names (Blade3D, CyberBlade...)
+- the newer CyberBladeXP family
+
+All families are accelerated. Only PCI/AGP based cards are supported,
+none of the older Tridents.
+The driver supports 8, 16 and 32 bits per pixel depths.
+The TGUI family requires the line length to be a power of 2 if acceleration
+is enabled. This means that the range of possible resolutions and bpp is
+limited compared to the range available when acceleration is disabled (see
+the list of parameters below).
+
+Known bugs:
+
+1. The driver randomly locks up on 3DImage975 chip with acceleration
+   enabled. The same happens in X11 (Xorg).
+2. The ramdac speeds require some more fine tuning. On older chips it is
+   possible to switch to a resolution that the chip does not support at
+   some depths.
+
+How to use it?
+==============
+
+When booting you can pass the video parameter::
+
+	video=tridentfb
+
+The parameters for tridentfb are concatenated with a ':' as in this example::
+
+	video=tridentfb:800x600-16@75,noaccel
+
+The second level parameters that tridentfb understands are:
+
+========  =====================================================================
+noaccel   turns off acceleration (when it doesn't work for your card)
+
+fp	  use flat panel related stuff
+crt 	  assume monitor is present instead of fp
+
+center 	  for flat panels and resolutions smaller than native size center the
+	  image, otherwise use
+stretch
+
+memsize   integer value in KB, use if your card's memory size is misdetected.
+	  look at the driver output to see what it says when initializing.
+
+memdiff   integer value in KB, should be nonzero if your card reports
+	  more memory than it actually has. For instance mine is 192K less than
+	  detection says in all three BIOS selectable situations 2M, 4M, 8M.
+	  Only use if your video memory is taken from main memory hence of
+	  configurable size. Otherwise use memsize.
+	  If in some modes which barely fit the memory you see garbage
+	  at the bottom this might help by not letting change to that mode
+	  anymore.
+
+nativex   the width in pixels of the flat panel. If you know it (usually 1024,
+	  800 or 1280) and it is not what the driver seems to detect, use it.
+
+bpp	  bits per pixel (8,16 or 32)
+mode	  a mode name like 800x600-8@75 as described in
+	  Documentation/fb/modedb.rst
+========  =====================================================================
+
+Using insane values for the above parameters will probably result in driver
+misbehaviour, so take care (for instance memsize=12345678 or memdiff=23784 or
+nativex=93).
+
+Contact: jani@astechnix.ro
diff --git a/Documentation/fb/tridentfb.txt b/Documentation/fb/tridentfb.txt
deleted file mode 100644
index 45d9de5b13a3..000000000000
--- a/Documentation/fb/tridentfb.txt
+++ /dev/null
@@ -1,70 +0,0 @@
-Tridentfb is a framebuffer driver for some Trident chip based cards.
-
-The following list of chips is thought to be supported although not all are
-tested:
-
-those from the TGUI series 9440/96XX and with Cyber in their names
-those from the Image series and with Cyber in their names
-those with Blade in their names (Blade3D,CyberBlade...)
-the newer CyberBladeXP family
-
-All families are accelerated. Only PCI/AGP based cards are supported,
-none of the older Tridents.
-The driver supports 8, 16 and 32 bits per pixel depths.
-The TGUI family requires a line length to be power of 2 if acceleration
-is enabled. This means that range of possible resolutions and bpp is
-limited comparing to the range if acceleration is disabled (see list
-of parameters below).
-
-Known bugs:
-1. The driver randomly locks up on 3DImage975 chip with acceleration
-   enabled. The same happens in X11 (Xorg).
-2. The ramdac speeds require some more fine tuning. It is possible to
-   switch resolution which the chip does not support at some depths for
-   older chips.
-
-How to use it?
-==============
-
-When booting you can pass the video parameter.
-video=tridentfb
-
-The parameters for tridentfb are concatenated with a ':' as in this example.
-
-video=tridentfb:800x600-16@75,noaccel
-
-The second level parameters that tridentfb understands are:
-
-noaccel - turns off acceleration (when it doesn't work for your card)
-
-fp	- use flat panel related stuff
-crt 	- assume monitor is present instead of fp
-
-center 	- for flat panels and resolutions smaller than native size center the
-	  image, otherwise use
-stretch
-
-memsize - integer value in KB, use if your card's memory size is misdetected.
-	  look at the driver output to see what it says when initializing.
-
-memdiff - integer value in KB, should be nonzero if your card reports
-	  more memory than it actually has. For instance mine is 192K less than
-	  detection says in all three BIOS selectable situations 2M, 4M, 8M.
-	  Only use if your video memory is taken from main memory hence of
-	  configurable size. Otherwise use memsize.
-	  If in some modes which barely fit the memory you see garbage
-	  at the bottom this might help by not letting change to that mode
-	  anymore.
-
-nativex - the width in pixels of the flat panel.If you know it (usually 1024
-	  800 or 1280) and it is not what the driver seems to detect use it.
-
-bpp	- bits per pixel (8,16 or 32)
-mode	- a mode name like 800x600-8@75 as described in
-	  Documentation/fb/modedb.txt
-
-Using insane values for the above parameters will probably result in driver
-misbehaviour so take care(for instance memsize=12345678 or memdiff=23784 or
-nativex=93)
-
-Contact: jani@astechnix.ro
diff --git a/Documentation/fb/udlfb.rst b/Documentation/fb/udlfb.rst
new file mode 100644
index 000000000000..732b37db3504
--- /dev/null
+++ b/Documentation/fb/udlfb.rst
@@ -0,0 +1,162 @@
+==============
+What is udlfb?
+==============
+
+This is a driver for DisplayLink USB 2.0 era graphics chips.
+
+DisplayLink chips provide simple hline/blit operations with some compression,
+pairing that with a hardware framebuffer (16MB) on the other end of the
+USB wire.  That hardware framebuffer is able to drive the VGA, DVI, or HDMI
+monitor with no CPU involvement until a pixel has to change.
+
+The CPU or other local resource does all the rendering; optionally compares the
+result with a local shadow of the remote hardware framebuffer to identify
+the minimal set of pixels that have changed; and compresses and sends those
+pixels line-by-line via USB bulk transfers.
+
+Because of the efficiency of bulk transfers and a protocol on top that
+does not require any acks - the effect is very low latency that
+can support surprisingly high resolutions with good performance for
+non-gaming and non-video applications.
+
+Mode setting, EDID read, etc. are other bulk or control transfers. Mode
+setting is very flexible - able to set nearly arbitrary modes from any timing.
+
+Advantages of USB graphics in general:
+
+ * Ability to add a nearly arbitrary number of displays to any USB 2.0
+   capable system. On Linux, number of displays is limited by fbdev interface
+   (FB_MAX is currently 32). Of course, all USB devices on the same
+   host controller share the same 480 Mb/s USB 2.0 interface.
+
+Advantages of supporting DisplayLink chips with kernel framebuffer interface:
+
+ * The actual hardware functionality of DisplayLink chips matches nearly
+   one-to-one with the fbdev interface, making the driver quite small and
+   tight relative to the functionality it provides.
+ * X servers and other applications can use the standard fbdev interface
+   from user mode to talk to the device, without needing to know anything
+   about USB or DisplayLink's protocol at all. A "displaylink" X driver
+   and a slightly modified "fbdev" X driver are among those that already do.
+
+Disadvantages:
+
+ * Fbdev's mmap interface assumes a real hardware framebuffer is mapped.
+   In the case of USB graphics, it is just an allocated (virtual) buffer.
+   Writes need to be detected and encoded into USB bulk transfers by the CPU.
+   Accurate damage/changed area notifications work around this problem.
+   In the future, hopefully fbdev will be enhanced with a small standard
+   interface to allow mmap clients to report damage, for the benefit
+   of virtual or remote framebuffers.
+ * Fbdev does not arbitrate client ownership of the framebuffer well.
+ * Fbcon assumes the first framebuffer it finds should be consumed for console.
+ * It's not clear what the future of fbdev is, given the rise of KMS/DRM.
+
+How to use it?
+==============
+
+Udlfb, when loaded as a module, will match against all USB 2.0 generation
+DisplayLink chips (Alex and Ollie family). It will then attempt to read the EDID
+of the monitor, and set the best common mode between the DisplayLink device
+and the monitor's capabilities.
+
+If the DisplayLink device is successful, it will paint a "green screen" which
+means that from a hardware and fbdev software perspective, everything is good.
+
+At that point, a /dev/fb? interface will be present for user-mode applications
+to open and begin writing to the framebuffer of the DisplayLink device using
+standard fbdev calls.  Note that if mmap() is used, by default the user mode
+application must send down damage notifications to trigger repaints of the
+changed regions.  Alternatively, udlfb can be recompiled with experimental
+defio support enabled, to support a page-fault based detection mechanism
+that can work without explicit notification.
+
+The most common client of udlfb is xf86-video-displaylink or a modified
+xf86-video-fbdev X server. These servers have no real DisplayLink specific
+code. They write to the standard framebuffer interface and rely on udlfb
+to do its thing.  The one extra feature they have is the ability to report
+rectangles from the X DAMAGE protocol extension down to udlfb via udlfb's
+damage interface (which will hopefully be standardized for all virtual
+framebuffers that need damage info). These damage notifications allow
+udlfb to efficiently process the changed pixels.
+
+Module Options
+==============
+
+Special configuration for udlfb is usually unnecessary. There are a few
+options, however.
+
+From the command line, pass options to modprobe
+modprobe udlfb fb_defio=0 console=1 shadow=1
+
+Or modify options on the fly at /sys/module/udlfb/parameters directory via
+sudo nano fb_defio
+change the parameter in place, and save the file.
+
+Unplug and replug the USB device to apply the new settings.
+
+Or, for a permanent setting, create a file like /etc/modprobe.d/udlfb.conf with:
+options udlfb fb_defio=0 console=1 shadow=1
+
+Accepted boolean options:
+
+=============== ================================================================
+fb_defio	Make use of the fb_defio (CONFIG_FB_DEFERRED_IO) kernel
+		module to track changed areas of the framebuffer by page faults.
+		Standard fbdev applications that use mmap but that do not
+		report damage, should be able to work with this enabled.
+		Disable when running with X server that supports reporting
+		changed regions via ioctl, as this method is simpler,
+		more stable, and higher performance.
+		default: fb_defio=1
+
+console		Allow fbcon to attach to udlfb provided framebuffers.
+		Can be disabled if fbcon and other clients
+		(e.g. X with --shared-vt) are in conflict.
+		default: console=1
+
+shadow		Allocate a 2nd framebuffer to shadow what's currently across
+		the USB bus in device memory. If any pixels are unchanged,
+		do not transmit. Spends host memory to save USB transfers.
+		Enabled by default. Only disable on very low memory systems.
+		default: shadow=1
+=============== ================================================================
+
+Sysfs Attributes
+================
+
+Udlfb creates several files in /sys/class/graphics/fb?, where ? is the
+sequential framebuffer id of the particular DisplayLink device.
+
+======================== ========================================================
+edid			 If a valid EDID blob is written to this file (typically
+			 by a udev rule), then udlfb will use this EDID as a
+			 backup in case reading the actual EDID of the monitor
+			 attached to the DisplayLink device fails. This is
+			 especially useful for fixed panels, etc. that cannot
+			 communicate their capabilities via EDID. Reading
+			 this file returns the current EDID of the attached
+			 monitor (or last backup value written). This is
+			 useful to get the EDID of the attached monitor,
+			 which can be passed to utilities like parse-edid.
+
+metrics_bytes_rendered	 32-bit count of pixel bytes rendered
+
+metrics_bytes_identical  32-bit count of how many of those bytes were found to be
+			 unchanged, based on a shadow framebuffer check
+
+metrics_bytes_sent	 32-bit count of how many bytes were transferred over
+			 USB to communicate the resulting changed pixels to the
+			 hardware. Includes compression and protocol overhead
+
+metrics_cpu_kcycles_used 32-bit count of CPU cycles used in processing the
+			 above pixels (in thousands of cycles).
+
+metrics_reset		 Write-only. Any write to this file resets all metrics
+			 above to zero.  Note that the 32-bit counters above
+			 roll over very quickly. To get reliable results, design
+			 performance tests to start and finish in a very short
+			 period of time (one minute or less is safe).
+======================== ========================================================
+
+Bernie Thompson <bernie@plugable.com>
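
To illustrate the metrics attributes documented in udlfb.rst above, here is a minimal Python sketch (not part of the patch or the driver). It assumes that each `metrics_*` file contains a single decimal integer, which the documentation does not state. The helper names `counter_delta`, `read_metric`, and `summarize` are hypothetical. The rollover-tolerant delta is an optional alternative to the documented approach of writing to `metrics_reset` and keeping tests short; it only tolerates a single wrap between samples.

```python
from pathlib import Path

COUNTER_BITS = 32  # the documentation describes the metrics as 32-bit counts


def counter_delta(before: int, after: int) -> int:
    """Difference between two samples of a 32-bit counter, tolerating one rollover."""
    return (after - before) % (1 << COUNTER_BITS)


def read_metric(fb_dir: str, name: str) -> int:
    """Read one metrics_* file; assumes it holds a single decimal integer."""
    return int(Path(fb_dir, name).read_text().strip())


def summarize(rendered: int, identical: int, sent: int) -> dict:
    """Derive simple ratios from the three byte counters."""
    changed = rendered - identical  # bytes that differed from the shadow copy
    return {
        "changed_bytes": changed,
        "identical_fraction": identical / rendered if rendered else 0.0,
        "sent_per_changed_byte": sent / changed if changed else 0.0,
    }
```

On a system with a DisplayLink device, `fb_dir` would be something like `/sys/class/graphics/fb0`; the values passed to `summarize` would be the `metrics_bytes_rendered`, `metrics_bytes_identical`, and `metrics_bytes_sent` readings.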
diff --git a/Documentation/fb/udlfb.txt b/Documentation/fb/udlfb.txt
deleted file mode 100644
index c985cb65dd06..000000000000
--- a/Documentation/fb/udlfb.txt
+++ /dev/null
@@ -1,159 +0,0 @@
-
-What is udlfb?
-===============
-
-This is a driver for DisplayLink USB 2.0 era graphics chips.
-
-DisplayLink chips provide simple hline/blit operations with some compression,
-pairing that with a hardware framebuffer (16MB) on the other end of the
-USB wire.  That hardware framebuffer is able to drive the VGA, DVI, or HDMI
-monitor with no CPU involvement until a pixel has to change.
-
-The CPU or other local resource does all the rendering; optionally compares the
-result with a local shadow of the remote hardware framebuffer to identify
-the minimal set of pixels that have changed; and compresses and sends those
-pixels line-by-line via USB bulk transfers.
-
-Because of the efficiency of bulk transfers and a protocol on top that
-does not require any acks - the effect is very low latency that
-can support surprisingly high resolutions with good performance for
-non-gaming and non-video applications.
-
-Mode setting, EDID read, etc are other bulk or control transfers. Mode
-setting is very flexible - able to set nearly arbitrary modes from any timing.
-
-Advantages of USB graphics in general:
-
- * Ability to add a nearly arbitrary number of displays to any USB 2.0
-   capable system. On Linux, number of displays is limited by fbdev interface
-   (FB_MAX is currently 32). Of course, all USB devices on the same
-   host controller share the same 480Mbs USB 2.0 interface.
-
-Advantages of supporting DisplayLink chips with kernel framebuffer interface:
-
- * The actual hardware functionality of DisplayLink chips matches nearly
-   one-to-one with the fbdev interface, making the driver quite small and
-   tight relative to the functionality it provides.
- * X servers and other applications can use the standard fbdev interface
-   from user mode to talk to the device, without needing to know anything
-   about USB or DisplayLink's protocol at all. A "displaylink" X driver
-   and a slightly modified "fbdev" X driver are among those that already do.
-
-Disadvantages:
-
- * Fbdev's mmap interface assumes a real hardware framebuffer is mapped.
-   In the case of USB graphics, it is just an allocated (virtual) buffer.
-   Writes need to be detected and encoded into USB bulk transfers by the CPU.
-   Accurate damage/changed area notifications work around this problem.
-   In the future, hopefully fbdev will be enhanced with an small standard
-   interface to allow mmap clients to report damage, for the benefit
-   of virtual or remote framebuffers.
- * Fbdev does not arbitrate client ownership of the framebuffer well.
- * Fbcon assumes the first framebuffer it finds should be consumed for console.
- * It's not clear what the future of fbdev is, given the rise of KMS/DRM.
-
-How to use it?
-==============
-
-Udlfb, when loaded as a module, will match against all USB 2.0 generation
-DisplayLink chips (Alex and Ollie family). It will then attempt to read the EDID
-of the monitor, and set the best common mode between the DisplayLink device
-and the monitor's capabilities.
-
-If the DisplayLink device is successful, it will paint a "green screen" which
-means that from a hardware and fbdev software perspective, everything is good.
-
-At that point, a /dev/fb? interface will be present for user-mode applications
-to open and begin writing to the framebuffer of the DisplayLink device using
-standard fbdev calls.  Note that if mmap() is used, by default the user mode
-application must send down damage notifications to trigger repaints of the
-changed regions.  Alternatively, udlfb can be recompiled with experimental
-defio support enabled, to support a page-fault based detection mechanism
-that can work without explicit notification.
-
-The most common client of udlfb is xf86-video-displaylink or a modified
-xf86-video-fbdev X server. These servers have no real DisplayLink specific
-code. They write to the standard framebuffer interface and rely on udlfb
-to do its thing.  The one extra feature they have is the ability to report
-rectangles from the X DAMAGE protocol extension down to udlfb via udlfb's
-damage interface (which will hopefully be standardized for all virtual
-framebuffers that need damage info). These damage notifications allow
-udlfb to efficiently process the changed pixels.
-
-Module Options
-==============
-
-Special configuration for udlfb is usually unnecessary. There are a few
-options, however.
-
-From the command line, pass options to modprobe
-modprobe udlfb fb_defio=0 console=1 shadow=1
-
-Or modify options on the fly at /sys/module/udlfb/parameters directory via
-sudo nano fb_defio
-change the parameter in place, and save the file.
-
-Unplug/replug USB device to apply with new settings
-
-Or for permanent option, create file like /etc/modprobe.d/udlfb.conf with text
-options udlfb fb_defio=0 console=1 shadow=1
-
-Accepted boolean options:
-
-fb_defio	Make use of the fb_defio (CONFIG_FB_DEFERRED_IO) kernel
-		module to track changed areas of the framebuffer by page faults.
-		Standard fbdev applications that use mmap but that do not
-		report damage, should be able to work with this enabled.
-		Disable when running with X server that supports reporting
-		changed regions via ioctl, as this method is simpler,
-		more stable, and higher performance.
-		default: fb_defio=1
-
-console	Allow fbcon to attach to udlfb provided framebuffers.
-		Can be disabled if fbcon and other clients
-		(e.g. X with --shared-vt) are in conflict.
-		default: console=1
-
-shadow		Allocate a 2nd framebuffer to shadow what's currently across
-		the USB bus in device memory. If any pixels are unchanged,
-		do not transmit. Spends host memory to save USB transfers.
-		Enabled by default. Only disable on very low memory systems.
-		default: shadow=1
-
-Sysfs Attributes
-================
-
-Udlfb creates several files in /sys/class/graphics/fb?
-Where ? is the sequential framebuffer id of the particular DisplayLink device
-
-edid	       		If a valid EDID blob is written to this file (typically
-			by a udev rule), then udlfb will use this EDID as a
-			backup in case reading the actual EDID of the monitor
-			attached to the DisplayLink device fails. This is
-			especially useful for fixed panels, etc. that cannot
-			communicate their capabilities via EDID. Reading
-			this file returns the current EDID of the attached
-			monitor (or last backup value written). This is
-			useful to get the EDID of the attached monitor,
-			which can be passed to utilities like parse-edid.
-
-metrics_bytes_rendered	32-bit count of pixel bytes rendered
-
-metrics_bytes_identical 32-bit count of how many of those bytes were found to be
-			unchanged, based on a shadow framebuffer check
-
-metrics_bytes_sent	32-bit count of how many bytes were transferred over
-			USB to communicate the resulting changed pixels to the
-			hardware. Includes compression and protocol overhead
-
-metrics_cpu_kcycles_used 32-bit count of CPU cycles used in processing the
-			above pixels (in thousands of cycles).
-
-metrics_reset		Write-only. Any write to this file resets all metrics
-			above to zero.  Note that the 32-bit counters above
-			roll over very quickly. To get reliable results, design
-			performance tests to start and finish in a very short
-			period of time (one minute or less is safe).
-
---
-Bernie Thompson <bernie@plugable.com>
diff --git a/Documentation/fb/uvesafb.rst b/Documentation/fb/uvesafb.rst
new file mode 100644
index 000000000000..d1c2523fbb33
--- /dev/null
+++ b/Documentation/fb/uvesafb.rst
@@ -0,0 +1,188 @@
+==========================================================
+uvesafb - A Generic Driver for VBE2+ compliant video cards
+==========================================================
+
+1. Requirements
+---------------
+
+uvesafb should work with any video card that has a Video BIOS compliant
+with the VBE 2.0 standard.
+
+Unlike other drivers, uvesafb makes use of a userspace helper called
+v86d.  v86d is used to run the x86 Video BIOS code in a simulated and
+controlled environment.  This allows uvesafb to function on arches other
+than x86.  Check the v86d documentation for a list of currently supported
+arches.
+
+v86d source code can be downloaded from the following website:
+
+  https://github.com/mjanusz/v86d
+
+Please refer to the v86d documentation for detailed configuration and
+installation instructions.
+
+Note that the v86d userspace helper has to be available at all times in
+order for uvesafb to work properly.  If you want to use uvesafb during
+early boot, you will have to include v86d into an initramfs image, and
+either compile it into the kernel or use it as an initrd.
+
+2. Caveats and limitations
+--------------------------
+
+uvesafb is a *generic* driver which supports a wide variety of video
+cards, but which is ultimately limited by the Video BIOS interface.
+The most important limitations are:
+
+- Lack of any type of acceleration.
+- A strict and limited set of supported video modes.  Often the native
+  or most optimal resolution/refresh rate for your setup will not work
+  with uvesafb, simply because the Video BIOS doesn't support the
+  video mode you want to use.  This can be especially painful with
+  widescreen panels, where native video modes don't have the 4:3 aspect
+  ratio, which is what most BIOSes are limited to.
+- Adjusting the refresh rate is only possible with a VBE 3.0 compliant
+  Video BIOS.  Note that many nVidia Video BIOSes claim to be VBE 3.0
+  compliant, while they simply ignore any refresh rate settings.
+
+3. Configuration
+----------------
+
+uvesafb can be compiled either as a module, or directly into the kernel.
+In both cases it supports the same set of configuration options, which
+are either given on the kernel command line or as module parameters, e.g.::
+
+ video=uvesafb:1024x768-32,mtrr:3,ywrap (compiled into the kernel)
+
+ # modprobe uvesafb mode_option=1024x768-32 mtrr=3 scroll=ywrap  (module)
+
+Accepted options:
+
+======= =========================================================
+ypan    Enable display panning using the VESA protected mode
+	interface.  The visible screen is just a window of the
+	video memory, console scrolling is done by changing the
+	start of the window.  This option is available on x86
+	only and is the default option on that architecture.
+
+ywrap   Same as ypan, but assumes your gfx board can wrap-around
+	the video memory (i.e. starts reading from top if it
+	reaches the end of video memory).  Faster than ypan.
+	Available on x86 only.
+
+redraw  Scroll by redrawing the affected part of the screen.  This
+	is the default on non-x86.
+======= =========================================================
+
+(If you're using uvesafb as a module, the above three options are
+given as values of the scroll option, e.g. scroll=ypan.)
+
+=========== ====================================================================
+vgapal      Use the standard VGA registers for palette changes.
+
+pmipal      Use the protected mode interface for palette changes.
+            This is the default if the protected mode interface is
+            available.  Available on x86 only.
+
+mtrr:n      Set up memory type range registers for the framebuffer
+            where n:
+
+                - 0 - disabled (equivalent to nomtrr)
+                - 3 - write-combining (default)
+
+            Values other than 0 and 3 will result in a warning and will be
+            treated just like 3.
+
+nomtrr      Do not use memory type range registers.
+
+vremap:n
+            Remap 'n' MiB of video RAM.  If 0 or not specified, remap memory
+            according to video mode.
+
+vtotal:n    If the video BIOS of your card incorrectly determines the total
+            amount of video RAM, use this option to override the BIOS (in MiB).
+
+<mode>      The mode you want to set, in the standard modedb format.  Refer to
+            modedb.txt for a detailed description.  When uvesafb is compiled as
+            a module, the mode string should be provided as a value of the
+            'mode_option' option.
+
+vbemode:x   Force the use of VBE mode x.  The mode will only be set if it's
+            found in the VBE-provided list of supported modes.
+            NOTE: The mode number 'x' should be specified in VESA mode number
+            notation, not the Linux kernel one (e.g. 257 instead of 769).
+            HINT: If you use this option because the normal <mode> parameter
+            does not work for you and you use an X server, you'll probably want
+            to set the 'nocrtc' option to ensure that the video mode is properly
+            restored after console <-> X switches.
+
+nocrtc      Do not use CRTC timings while setting the video mode.  This option
+            has an effect only if the Video BIOS is VBE 3.0 compliant.  Use it
+            if you have problems with modes set the standard way.  Note that
+            using this option implies that any refresh rate adjustments will
+            be ignored and the refresh rate will stay at your BIOS default
+            (60 Hz).
+
+noedid      Do not try to fetch and use EDID-provided modes.
+
+noblank     Disable hardware blanking.
+
+v86d:path   Set path to the v86d executable. This option is only available as
+            a module parameter, and not as a part of the video= string.  If you
+            need to use it and have uvesafb built into the kernel, use
+            uvesafb.v86d="path".
+=========== ====================================================================
+
+Additionally, the following parameters may be provided.  They all override the
+EDID-provided values and BIOS defaults.  Refer to your monitor's specs to get
+the correct values for maxhf, maxvf and maxclk for your hardware.
+
+=========== ======================================
+maxhf:n     Maximum horizontal frequency (in kHz).
+maxvf:n     Maximum vertical frequency (in Hz).
+maxclk:n    Maximum pixel clock (in MHz).
+=========== ======================================
+
+4. The sysfs interface
+----------------------
+
+uvesafb provides several sysfs nodes for configurable parameters and
+additional information.
+
+Driver attributes:
+
+/sys/bus/platform/drivers/uvesafb
+  v86d
+    (default: /sbin/v86d)
+
+    Path to the v86d executable. v86d is started by uvesafb
+    if an instance of the daemon isn't already running.
+
+Device attributes:
+
+/sys/bus/platform/drivers/uvesafb/uvesafb.0
+  nocrtc
+    Use the default refresh rate (60 Hz) if set to 1.
+
+  oem_product_name, oem_product_rev, oem_string, oem_vendor
+    Information about the card and its maker.
+
+  vbe_modes
+    A list of video modes supported by the Video BIOS along with their
+    VBE mode numbers in hex.
+
+  vbe_version
+    A BCD value indicating the implemented VBE standard.
+
+5. Miscellaneous
+----------------
+
+Uvesafb will set a video mode with the default refresh rate and timings
+from the Video BIOS if you set pixclock to 0 in fb_var_screeninfo.
+
+
+
+ Michal Januszewski <spock@gentoo.org>
+
+ Last updated: 2017-10-10
+
+ Documentation of the uvesafb options is loosely based on vesafb.txt.
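
The Configuration section of uvesafb.rst above gives the same settings in two spellings: a `video=` string when the driver is built in, and `name=value` parameters when it is loaded as a module. The following Python sketch (illustrative only, not part of the patch) builds both forms from one set of values. It covers only the three options shown in the documented example; the function names are hypothetical, and no claim is made about how other options are spelled as module parameters.

```python
SCROLL_MODES = ("ypan", "ywrap", "redraw")  # the three scroll options documented above


def _check_scroll(scroll: str) -> None:
    if scroll not in SCROLL_MODES:
        raise ValueError(f"unknown scroll mode: {scroll!r}")


def video_arg(mode: str, mtrr: int, scroll: str) -> str:
    """Built-in form: comma-separated options after 'video=uvesafb:'."""
    _check_scroll(scroll)
    return f"video=uvesafb:{mode},mtrr:{mtrr},{scroll}"


def modprobe_cmd(mode: str, mtrr: int, scroll: str) -> str:
    """Module form: the same settings as name=value module parameters."""
    _check_scroll(scroll)
    return f"modprobe uvesafb mode_option={mode} mtrr={mtrr} scroll={scroll}"
```

Calling both helpers with `("1024x768-32", 3, "ywrap")` reproduces the two example lines from the documentation.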
diff --git a/Documentation/fb/uvesafb.txt b/Documentation/fb/uvesafb.txt
deleted file mode 100644
index aa924196c366..000000000000
--- a/Documentation/fb/uvesafb.txt
+++ /dev/null
@@ -1,184 +0,0 @@
-
-uvesafb - A Generic Driver for VBE2+ compliant video cards
-==========================================================
-
-1. Requirements
----------------
-
-uvesafb should work with any video card that has a Video BIOS compliant
-with the VBE 2.0 standard.
-
-Unlike other drivers, uvesafb makes use of a userspace helper called
-v86d.  v86d is used to run the x86 Video BIOS code in a simulated and
-controlled environment.  This allows uvesafb to function on arches other
-than x86.  Check the v86d documentation for a list of currently supported
-arches.
-
-v86d source code can be downloaded from the following website:
-
-  https://github.com/mjanusz/v86d
-
-Please refer to the v86d documentation for detailed configuration and
-installation instructions.
-
-Note that the v86d userspace helper has to be available at all times in
-order for uvesafb to work properly.  If you want to use uvesafb during
-early boot, you will have to include v86d into an initramfs image, and
-either compile it into the kernel or use it as an initrd.
-
-2. Caveats and limitations
---------------------------
-
-uvesafb is a _generic_ driver which supports a wide variety of video
-cards, but which is ultimately limited by the Video BIOS interface.
-The most important limitations are:
-
-- Lack of any type of acceleration.
-- A strict and limited set of supported video modes.  Often the native
-  or most optimal resolution/refresh rate for your setup will not work
-  with uvesafb, simply because the Video BIOS doesn't support the
-  video mode you want to use.  This can be especially painful with
-  widescreen panels, where native video modes don't have the 4:3 aspect
-  ratio, which is what most BIOS-es are limited to.
-- Adjusting the refresh rate is only possible with a VBE 3.0 compliant
-  Video BIOS.  Note that many nVidia Video BIOS-es claim to be VBE 3.0
-  compliant, while they simply ignore any refresh rate settings.
-
-3. Configuration
-----------------
-
-uvesafb can be compiled either as a module, or directly into the kernel.
-In both cases it supports the same set of configuration options, which
-are either given on the kernel command line or as module parameters, e.g.:
-
- video=uvesafb:1024x768-32,mtrr:3,ywrap (compiled into the kernel)
-
- # modprobe uvesafb mode_option=1024x768-32 mtrr=3 scroll=ywrap  (module)
-
-Accepted options:
-
-ypan    Enable display panning using the VESA protected mode
-        interface.  The visible screen is just a window of the
-        video memory, console scrolling is done by changing the
-        start of the window.  This option is available on x86
-        only and is the default option on that architecture.
-
-ywrap   Same as ypan, but assumes your gfx board can wrap-around
-        the video memory (i.e. starts reading from top if it
-        reaches the end of video memory).  Faster than ypan.
-        Available on x86 only.
-
-redraw  Scroll by redrawing the affected part of the screen, this
-        is the default on non-x86.
-
-(If you're using uvesafb as a module, the above three options are
- used a parameter of the scroll option, e.g. scroll=ypan.)
-
-vgapal  Use the standard VGA registers for palette changes.
-
-pmipal  Use the protected mode interface for palette changes.
-        This is the default if the protected mode interface is
-        available.  Available on x86 only.
-
-mtrr:n  Setup memory type range registers for the framebuffer
-        where n:
-              0 - disabled (equivalent to nomtrr)
-              3 - write-combining (default)
-
-	Values other than 0 and 3 will result in a warning and will be
-	treated just like 3.
-
-nomtrr  Do not use memory type range registers.
-
-vremap:n
-        Remap 'n' MiB of video RAM.  If 0 or not specified, remap memory
-        according to video mode.
-
-vtotal:n
-        If the video BIOS of your card incorrectly determines the total
-        amount of video RAM, use this option to override the BIOS (in MiB).
-
-<mode>  The mode you want to set, in the standard modedb format.  Refer to
-        modedb.txt for a detailed description.  When uvesafb is compiled as
-        a module, the mode string should be provided as a value of the
-        'mode_option' option.
-
-vbemode:x
-        Force the use of VBE mode x.  The mode will only be set if it's
-        found in the VBE-provided list of supported modes.
-        NOTE: The mode number 'x' should be specified in VESA mode number
-        notation, not the Linux kernel one (eg. 257 instead of 769).
-        HINT: If you use this option because normal <mode> parameter does
-        not work for you and you use a X server, you'll probably want to
-        set the 'nocrtc' option to ensure that the video mode is properly
-        restored after console <-> X switches.
-
-nocrtc  Do not use CRTC timings while setting the video mode.  This option
-        has any effect only if the Video BIOS is VBE 3.0 compliant.  Use it
-        if you have problems with modes set the standard way.  Note that
-        using this option implies that any refresh rate adjustments will
-        be ignored and the refresh rate will stay at your BIOS default (60 Hz).
-
-noedid  Do not try to fetch and use EDID-provided modes.
-
-noblank Disable hardware blanking.
-
-v86d:path
-        Set path to the v86d executable. This option is only available as
-        a module parameter, and not as a part of the video= string.  If you
-        need to use it and have uvesafb built into the kernel, use
-        uvesafb.v86d="path".
-
-Additionally, the following parameters may be provided.  They all override the
-EDID-provided values and BIOS defaults.  Refer to your monitor's specs to get
-the correct values for maxhf, maxvf and maxclk for your hardware.
-
-maxhf:n     Maximum horizontal frequency (in kHz).
-maxvf:n     Maximum vertical frequency (in Hz).
-maxclk:n    Maximum pixel clock (in MHz).
-
-4. The sysfs interface
-----------------------
-
-uvesafb provides several sysfs nodes for configurable parameters and
-additional information.
-
-Driver attributes:
-
-/sys/bus/platform/drivers/uvesafb
-  - v86d (default: /sbin/v86d)
-    Path to the v86d executable. v86d is started by uvesafb
-    if an instance of the daemon isn't already running.
-
-Device attributes:
-
-/sys/bus/platform/drivers/uvesafb/uvesafb.0
-  - nocrtc
-    Use the default refresh rate (60 Hz) if set to 1.
-
-  - oem_product_name
-  - oem_product_rev
-  - oem_string
-  - oem_vendor
-    Information about the card and its maker.
-
-  - vbe_modes
-    A list of video modes supported by the Video BIOS along with their
-    VBE mode numbers in hex.
-
-  - vbe_version
-    A BCD value indicating the implemented VBE standard.
-
-5. Miscellaneous
-----------------
-
-Uvesafb will set a video mode with the default refresh rate and timings
-from the Video BIOS if you set pixclock to 0 in fb_var_screeninfo.
-
-
---
- Michal Januszewski <spock@gentoo.org>
- Last updated: 2017-10-10
-
- Documentation of the uvesafb options is loosely based on vesafb.txt.
-
diff --git a/Documentation/fb/vesafb.rst b/Documentation/fb/vesafb.rst
new file mode 100644
index 000000000000..2ed0dfb661cf
--- /dev/null
+++ b/Documentation/fb/vesafb.rst
@@ -0,0 +1,192 @@
+===============
+What is vesafb?
+===============
+
+This is a generic driver for a graphics framebuffer on Intel boxes.
+
+The idea is simple:  Turn on graphics mode at boot time with the help
+of the BIOS, and use this as framebuffer device /dev/fb0, like the m68k
+(and other) ports do.
+
+This means we decide at boot time whether we want to run in text or
+graphics mode.  Switching mode later on (in protected mode) is
+impossible; BIOS calls work in real mode only.  VESA BIOS Extensions
+Version 2.0 are required, because we need a linear frame buffer.
+
+Advantages:
+
+ * It provides a nice large console (128 cols x 48 lines with 1024x768)
+   without using tiny, unreadable fonts.
+ * You can run XF68_FBDev on top of /dev/fb0 (=> non-accelerated X11
+   support for every VBE 2.0 compliant graphics board).
+ * Most important: boot logo :-)
+
+Disadvantages:
+
+ * graphic mode is slower than text mode...
+
+
+How to use it?
+==============
+
+Switching modes is done using the vga=... boot parameter.  Read
+Documentation/svga.txt for details.
+
+You should compile in both vgacon (for text mode) and vesafb (for
+graphics mode). Which of them takes over the console depends on
+whether the specified mode is text or graphics.
+
+The graphic modes are NOT in the list which you get if you boot with
+vga=ask and hit return. The mode you wish to use is derived from the
+VESA mode number. Here are those VESA mode numbers:
+
+====== =======  =======  ======== =========
+colors 640x480  800x600  1024x768 1280x1024
+====== =======  =======  ======== =========
+256    0x101    0x103    0x105    0x107
+32k    0x110    0x113    0x116    0x119
+64k    0x111    0x114    0x117    0x11A
+16M    0x112    0x115    0x118    0x11B
+====== =======  =======  ======== =========
+
+
+The video mode number of the Linux kernel is the VESA mode number plus
+0x200:
+
+ Linux_kernel_mode_number = VESA_mode_number + 0x200
+
+So the table of kernel mode numbers is:
+
+====== =======  =======  ======== =========
+colors 640x480  800x600  1024x768 1280x1024
+====== =======  =======  ======== =========
+256    0x301    0x303    0x305    0x307
+32k    0x310    0x313    0x316    0x319
+64k    0x311    0x314    0x317    0x31A
+16M    0x312    0x315    0x318    0x31B
+====== =======  =======  ======== =========
+
+To enable one of those modes you have to specify "vga=ask" in the
+lilo.conf file and rerun LILO. Then you can type in the desired
+mode at the "vga=ask" prompt. For example, if you want to use 1024x768
+with 256 colors, you have to enter "305" at this prompt.
+
+If this does not work, this might be because your BIOS does not support
+linear framebuffers or because it does not support this mode at all.
+Even if your board does, it might be the BIOS which does not.  VESA BIOS
+Extensions v2.0 are required, 1.2 is NOT sufficient.  You will get a
+"bad mode number" message if something goes wrong.
+
+1. Note: LILO cannot handle hex; to boot directly with
+   "vga=mode-number" you have to convert the numbers to decimal.
+2. Note: Some newer versions of LILO appear to work with those hex values
+   if you put 0x in front of the numbers.
+
+X11
+===
+
+XF68_FBDev should work just fine, but it is non-accelerated.  Running
+another (accelerated) X-Server like XF86_SVGA might or might not work.
+It depends on X-Server and graphics board.
+
+The X-Server must restore the video mode correctly, else you end up
+with a broken console (and vesafb cannot do anything about this).
+
+
+Refresh rates
+=============
+
+There is no way to change the vesafb video mode and/or timings after
+booting Linux.  If you are not happy with the 60 Hz refresh rate, you
+have these options:
+
+ * configure and load the DOS-Tools for the graphics board (if
+   available) and boot Linux with loadlin.
+ * use a native driver (matroxfb/atyfb) instead of vesafb.  If none
+   is available, write a new one!
+ * VBE 3.0 might work too.  I have neither a gfx board with VBE 3.0
+   support nor the specs, so I have not checked this yet.
+
+
+Configuration
+=============
+
+The VESA BIOS provides a protected mode interface for changing
+some parameters.  vesafb can use it for palette changes and
+to pan the display.  It is turned off by default because it
+seems not to work with some BIOS versions, but there are options
+to turn it on.
+
+You can pass options to vesafb using "video=vesafb:option" on
+the kernel command line.  Multiple options should be separated
+by commas, like this: "video=vesafb:ypan,inverse"
+
+Accepted options:
+
+========= ======================================================================
+inverse   use inverse color map
+
+ypan	  enable display panning using the VESA protected mode
+          interface.  The visible screen is just a window of the
+          video memory, console scrolling is done by changing the
+          start of the window.
+
+          pro:
+
+                * scrolling (fullscreen) is fast, because there is
+		  no need to copy around data.
+		* You'll get scrollback (the Shift-PgUp thing),
+		  the video memory can be used as scrollback buffer
+
+          contra:
+
+		* scrolling only parts of the screen causes some
+		  ugly flicker effects (boot logo flickers for
+		  example).
+
+ywrap	  Same as ypan, but assumes your gfx board can wrap-around
+          the video memory (i.e. starts reading from top if it
+          reaches the end of video memory).  Faster than ypan.
+
+redraw	  Scroll by redrawing the affected part of the screen, this
+          is the safe (and slow) default.
+
+
+vgapal	  Use the standard vga registers for palette changes.
+          This is the default.
+pmipal    Use the protected mode interface for palette changes.
+
+mtrr:n	  Set up memory type range registers for the vesafb framebuffer
+          where n:
+
+              - 0 - disabled (equivalent to nomtrr) (default)
+              - 1 - uncachable
+              - 2 - write-back
+              - 3 - write-combining
+              - 4 - write-through
+
+          If you see the following in dmesg, choose the type that matches the
+          old one. In this example, use "mtrr:2"::
+
+            mtrr: type mismatch for e0000000,8000000 old: write-back new:
+            write-combining
+
+
+nomtrr    disable mtrr
+
+vremap:n
+          Remap 'n' MiB of video RAM. If 0 or not specified, remap memory
+          according to video mode. (2.5.66 patch/idea by Antonino Daplas
+          reversed to give override possibility (allocate more fb memory
+          than the kernel would) to 2.4 by tmb@iki.fi)
+
+vtotal:n  If the video BIOS of your card incorrectly determines the total
+          amount of video RAM, use this option to override the BIOS (in MiB).
+========= ======================================================================
+
+Have fun!
+
+Gerd Knorr <kraxel@goldbach.in-berlin.de>
+
+Minor (mostly typo) changes
+by Nico Schmoigl <schmoigl@rumms.uni-mannheim.de>
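
The mode-number arithmetic described in vesafb.rst above (kernel mode number = VESA mode number + 0x200, with decimal needed for LILO versions that cannot parse hex) can be sketched in Python as follows. This is an illustration only, not part of the patch; the table values are copied from the documentation, while the names `VESA_MODES`, `kernel_mode`, and `vga_decimal` are hypothetical.

```python
# VESA mode numbers from the table in vesafb.rst, keyed by color depth label
# and then by resolution.
VESA_MODES = {
    "256": {"640x480": 0x101, "800x600": 0x103, "1024x768": 0x105, "1280x1024": 0x107},
    "32k": {"640x480": 0x110, "800x600": 0x113, "1024x768": 0x116, "1280x1024": 0x119},
    "64k": {"640x480": 0x111, "800x600": 0x114, "1024x768": 0x117, "1280x1024": 0x11A},
    "16M": {"640x480": 0x112, "800x600": 0x115, "1024x768": 0x118, "1280x1024": 0x11B},
}

KERNEL_MODE_OFFSET = 0x200  # Linux_kernel_mode_number = VESA_mode_number + 0x200


def kernel_mode(vesa_mode: int) -> int:
    """Convert a VESA mode number to the Linux kernel mode number."""
    return vesa_mode + KERNEL_MODE_OFFSET


def vga_decimal(colors: str, resolution: str) -> str:
    """Return a 'vga=' argument in decimal, for boot loaders that cannot parse hex."""
    return f"vga={kernel_mode(VESA_MODES[colors][resolution])}"


if __name__ == "__main__":
    mode = kernel_mode(VESA_MODES["256"]["1024x768"])
    print(hex(mode), vga_decimal("256", "1024x768"))
```

For 1024x768 with 256 colors this yields kernel mode 0x305, the value entered as "305" at the vga=ask prompt.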
diff --git a/Documentation/fb/vesafb.txt b/Documentation/fb/vesafb.txt
deleted file mode 100644
index 413bb73235be..000000000000
--- a/Documentation/fb/vesafb.txt
+++ /dev/null
@@ -1,181 +0,0 @@
-
-What is vesafb?
-===============
-
-This is a generic driver for a graphic framebuffer on intel boxes.
-
-The idea is simple:  Turn on graphics mode at boot time with the help
-of the BIOS, and use this as framebuffer device /dev/fb0, like the m68k
-(and other) ports do.
-
-This means we decide at boot time whenever we want to run in text or
-graphics mode.  Switching mode later on (in protected mode) is
-impossible; BIOS calls work in real mode only.  VESA BIOS Extensions
-Version 2.0 are required, because we need a linear frame buffer.
-
-Advantages:
-
- * It provides a nice large console (128 cols + 48 lines with 1024x768)
-   without using tiny, unreadable fonts.
- * You can run XF68_FBDev on top of /dev/fb0 (=> non-accelerated X11
-   support for every VBE 2.0 compliant graphics board).
- * Most important: boot logo :-)
-
-Disadvantages:
-
- * graphic mode is slower than text mode...
-
-
-How to use it?
-==============
-
-Switching modes is done using the vga=... boot parameter.  Read
-Documentation/svga.txt for details.
-
-You should compile in both vgacon (for text mode) and vesafb (for
-graphics mode). Which of them takes over the console depends on
-whenever the specified mode is text or graphics.
-
-The graphic modes are NOT in the list which you get if you boot with
-vga=ask and hit return. The mode you wish to use is derived from the
-VESA mode number. Here are those VESA mode numbers:
-
-    | 640x480  800x600  1024x768 1280x1024
-----+-------------------------------------
-256 |  0x101    0x103    0x105    0x107   
-32k |  0x110    0x113    0x116    0x119   
-64k |  0x111    0x114    0x117    0x11A   
-16M |  0x112    0x115    0x118    0x11B   
-
-The video mode number of the Linux kernel is the VESA mode number plus
-0x200.
- 
- Linux_kernel_mode_number = VESA_mode_number + 0x200
-
-So the table for the Kernel mode numbers are:
-
-    | 640x480  800x600  1024x768 1280x1024
-----+-------------------------------------
-256 |  0x301    0x303    0x305    0x307   
-32k |  0x310    0x313    0x316    0x319   
-64k |  0x311    0x314    0x317    0x31A   
-16M |  0x312    0x315    0x318    0x31B   
-
-To enable one of those modes you have to specify "vga=ask" in the
-lilo.conf file and rerun LILO. Then you can type in the desired
-mode at the "vga=ask" prompt. For example if you like to use 
-1024x768x256 colors you have to say "305" at this prompt.
-
-If this does not work, this might be because your BIOS does not support
-linear framebuffers or because it does not support this mode at all.
-Even if your board does, it might be the BIOS which does not.  VESA BIOS
-Extensions v2.0 are required, 1.2 is NOT sufficient.  You will get a
-"bad mode number" message if something goes wrong.
-
-1. Note: LILO cannot handle hex, for booting directly with 
-         "vga=mode-number" you have to transform the numbers to decimal.
-2. Note: Some newer versions of LILO appear to work with those hex values,
-         if you set the 0x in front of the numbers.
-
-X11
-===
-
-XF68_FBDev should work just fine, but it is non-accelerated.  Running
-another (accelerated) X-Server like XF86_SVGA might or might not work.
-It depends on X-Server and graphics board.
-
-The X-Server must restore the video mode correctly, else you end up
-with a broken console (and vesafb cannot do anything about this).
-
-
-Refresh rates
-=============
-
-There is no way to change the vesafb video mode and/or timings after
-booting linux.  If you are not happy with the 60 Hz refresh rate, you
-have these options:
-
- * configure and load the DOS-Tools for the graphics board (if
-   available) and boot linux with loadlin.
- * use a native driver (matroxfb/atyfb) instead if vesafb.  If none
-   is available, write a new one!
- * VBE 3.0 might work too.  I have neither a gfx board with VBE 3.0
-   support nor the specs, so I have not checked this yet.
-
-
-Configuration
-=============
-
-The VESA BIOS provides protected mode interface for changing
-some parameters.  vesafb can use it for palette changes and
-to pan the display.  It is turned off by default because it
-seems not to work with some BIOS versions, but there are options
-to turn it on.
-
-You can pass options to vesafb using "video=vesafb:option" on
-the kernel command line.  Multiple options should be separated
-by comma, like this: "video=vesafb:ypan,inverse"
-
-Accepted options:
-
-inverse	use inverse color map
-
-ypan	enable display panning using the VESA protected mode 
-	interface.  The visible screen is just a window of the
-	video memory, console scrolling is done by changing the
-	start of the window.
-	pro:	* scrolling (fullscreen) is fast, because there is
-		  no need to copy around data.
-		* You'll get scrollback (the Shift-PgUp thing),
-		  the video memory can be used as scrollback buffer
-	kontra: * scrolling only parts of the screen causes some
-		  ugly flicker effects (boot logo flickers for
-		  example).
-
-ywrap	Same as ypan, but assumes your gfx board can wrap-around 
-	the video memory (i.e. starts reading from top if it
-	reaches the end of video memory).  Faster than ypan.
-
-redraw	scroll by redrawing the affected part of the screen, this
-	is the safe (and slow) default.
-
-
-vgapal	Use the standard vga registers for palette changes.
-	This is the default.
-pmipal	Use the protected mode interface for palette changes.
-
-mtrr:n	setup memory type range registers for the vesafb framebuffer
-	where n:
-	      0 - disabled (equivalent to nomtrr) (default)
-	      1 - uncachable
-	      2 - write-back
-	      3 - write-combining
-	      4 - write-through
-
-	If you see the following in dmesg, choose the type that matches the
-	old one. In this example, use "mtrr:2".
-...
-mtrr: type mismatch for e0000000,8000000 old: write-back new: write-combining
-...
-
-nomtrr  disable mtrr
-
-vremap:n
-        remap 'n' MiB of video RAM. If 0 or not specified, remap memory
-	according to video mode. (2.5.66 patch/idea by Antonino Daplas
-	reversed to give override possibility (allocate more fb memory
-	than the kernel would) to 2.4 by tmb@iki.fi)
-
-vtotal:n
-        if the video BIOS of your card incorrectly determines the total
-        amount of video RAM, use this option to override the BIOS (in MiB).
-
-Have fun!
-
-  Gerd
-
---
-Gerd Knorr <kraxel@goldbach.in-berlin.de>
-
-Minor (mostly typo) changes 
-by Nico Schmoigl <schmoigl@rumms.uni-mannheim.de>
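The vesafb text above gives the rule ``Linux_kernel_mode_number = VESA_mode_number + 0x200`` and notes that older LILO versions need the value in decimal. A minimal sketch of that arithmetic follows; ``kernel_mode`` is a hypothetical helper name used only for illustration, not a kernel or LILO interface.

```python
# Illustrative only: kernel_mode() is a hypothetical helper, not part of
# the kernel, LILO, or any real tool.
def kernel_mode(vesa_mode: int) -> int:
    """Return the Linux kernel video mode number for a VESA mode number."""
    return vesa_mode + 0x200


# 1024x768 with 256 colors is VESA mode 0x105 in the table above.
mode = kernel_mode(0x105)
print(hex(mode))  # hex form, usable where the boot loader accepts a 0x prefix
print(mode)       # decimal form, for LILO versions that cannot parse hex
```

The decimal form is what would go into ``vga=mode-number`` for a LILO version that does not accept hex values.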
diff --git a/Documentation/fb/viafb.rst b/Documentation/fb/viafb.rst
new file mode 100644
index 000000000000..8eb7a3bb068c
--- /dev/null
+++ b/Documentation/fb/viafb.rst
@@ -0,0 +1,297 @@
+=======================================================
+VIA Integration Graphic Chip Console Framebuffer Driver
+=======================================================
+
+Platform
+--------
+    The console framebuffer driver is for graphics chips of
+    VIA UniChrome Family
+    (CLE266, PM800 / CN400 / CN300,
+    P4M800CE / P4M800Pro / CN700 / VN800,
+    CX700 / VX700, K8M890, P4M890,
+    CN896 / P4M900, VX800, VX855)
+
+Driver features
+---------------
+    Device: CRT, LCD, DVI
+
+    Support viafb_mode::
+
+	CRT:
+	    640x480(60, 75, 85, 100, 120 Hz), 720x480(60 Hz),
+	    720x576(60 Hz), 800x600(60, 75, 85, 100, 120 Hz),
+	    848x480(60 Hz), 856x480(60 Hz), 1024x512(60 Hz),
+	    1024x768(60, 75, 85, 100 Hz), 1152x864(75 Hz),
+	    1280x768(60 Hz), 1280x960(60 Hz), 1280x1024(60, 75, 85 Hz),
+	    1440x1050(60 Hz), 1600x1200(60, 75 Hz), 1280x720(60 Hz),
+	    1920x1080(60 Hz), 1400x1050(60 Hz), 800x480(60 Hz)
+
+    Supported color depths: 8 bpp, 16 bpp, 32 bpp.
+
+    Supports 2D hardware acceleration.
+
+Using the viafb module
+----------------------
+    Start viafb with default settings::
+
+	#modprobe viafb
+
+    Start viafb with user options::
+
+	#modprobe viafb viafb_mode=800x600 viafb_bpp=16 viafb_refresh=60
+		  viafb_active_dev=CRT+DVI viafb_dvi_port=DVP1
+		  viafb_mode1=1024x768 viafb_bpp1=16 viafb_refresh1=60
+		  viafb_SAMM_ON=1
+
+    viafb_mode:
+	- 640x480 (default)
+	- 720x480
+	- 800x600
+	- 1024x768
+
+    viafb_bpp:
+	- 8, 16, 32 (default:32)
+
+    viafb_refresh:
+	- 60, 75, 85, 100, 120 (default:60)
+
+    viafb_lcd_dsp_method:
+	- 0 : expansion (default)
+	- 1 : centering
+
+    viafb_lcd_mode:
+	- 0 : LCD panel with LSB data format input (default)
+	- 1 : LCD panel with MSB data format input
+
+    viafb_lcd_panel_id:
+	- 0 : Resolution: 640x480, Channel: single, Dithering: Enable
+	- 1 : Resolution: 800x600, Channel: single, Dithering: Enable
+	- 2 : Resolution: 1024x768, Channel: single, Dithering: Enable (default)
+	- 3 : Resolution: 1280x768, Channel: single, Dithering: Enable
+	- 4 : Resolution: 1280x1024, Channel: dual, Dithering: Enable
+	- 5 : Resolution: 1400x1050, Channel: dual, Dithering: Enable
+	- 6 : Resolution: 1600x1200, Channel: dual, Dithering: Enable
+
+	- 8 : Resolution: 800x480, Channel: single, Dithering: Enable
+	- 9 : Resolution: 1024x768, Channel: dual, Dithering: Enable
+	- 10: Resolution: 1024x768, Channel: single, Dithering: Disable
+	- 11: Resolution: 1024x768, Channel: dual, Dithering: Disable
+	- 12: Resolution: 1280x768, Channel: single, Dithering: Disable
+	- 13: Resolution: 1280x1024, Channel: dual, Dithering: Disable
+	- 14: Resolution: 1400x1050, Channel: dual, Dithering: Disable
+	- 15: Resolution: 1600x1200, Channel: dual, Dithering: Disable
+	- 16: Resolution: 1366x768, Channel: single, Dithering: Disable
+	- 17: Resolution: 1024x600, Channel: single, Dithering: Enable
+	- 18: Resolution: 1280x768, Channel: dual, Dithering: Enable
+	- 19: Resolution: 1280x800, Channel: single, Dithering: Enable
+
+    viafb_accel:
+	- 0 : No 2D Hardware Acceleration
+	- 1 : 2D Hardware Acceleration (default)
+
+    viafb_SAMM_ON:
+	- 0 : viafb_SAMM_ON disable (default)
+	- 1 : viafb_SAMM_ON enable
+
+    viafb_mode1: (secondary display device)
+	- 640x480 (default)
+	- 720x480
+	- 800x600
+	- 1024x768
+
+    viafb_bpp1: (secondary display device)
+	- 8, 16, 32 (default:32)
+
+    viafb_refresh1: (secondary display device)
+	- 60, 75, 85, 100, 120 (default:60)
+
+    viafb_active_dev:
+	This option specifies the active devices (CRT, DVI, CRT+LCD...).
+	DVI stands for DVI or HDMI; for example, to enable HDMI, set
+	viafb_active_dev=DVI. In the SAMM case, the first device listed in
+	viafb_active_dev is the primary device and the second is the
+	secondary device.
+
+	For example:
+
+	To enable one device, such as DVI only, we can use::
+
+	    modprobe viafb viafb_active_dev=DVI
+
+	To enable two devices, such as CRT+DVI::
+
+	    modprobe viafb viafb_active_dev=CRT+DVI
+
+	For DuoView case, we can use::
+
+	    modprobe viafb viafb_active_dev=CRT+DVI
+
+	OR::
+
+	    modprobe viafb viafb_active_dev=DVI+CRT...
+
+	For SAMM case:
+
+	If CRT is primary and DVI is secondary, we should use::
+
+	    modprobe viafb viafb_active_dev=CRT+DVI viafb_SAMM_ON=1...
+
+	If DVI is primary and CRT is secondary, we should use::
+
+	    modprobe viafb viafb_active_dev=DVI+CRT viafb_SAMM_ON=1...
+
+    viafb_display_hardware_layout:
+	This option is used to specify display hardware layout for CX700 chip.
+
+	- 1 : LCD only
+	- 2 : DVI only
+	- 3 : LCD+DVI (default)
+	- 4 : LCD1+LCD2 (internal + internal)
+	- 16: LCD1+ExternalLCD2 (internal + external)
+
+    viafb_second_size:
+	This option sets the memory size (in MB) of the second device in the
+	SAMM case. The minimum size is 16.
+
+    viafb_platform_epia_dvi:
+	This option is used to enable DVI on EPIA - M
+
+	- 0 : No DVI on EPIA - M (default)
+	- 1 : DVI on EPIA - M
+
+    viafb_bus_width:
+	This option should be set when using a digital interface with a
+	24-bit bus width.
+
+	- 12: 12-Bit LVDS or 12-Bit TMDS (default)
+	- 24: 24-Bit LVDS or 24-Bit TMDS
+
+    viafb_device_lcd_dualedge:
+	When using Dual Edge Panel, this option should be set.
+
+	- 0 : No Dual Edge Panel (default)
+	- 1 : Dual Edge Panel
+
+    viafb_lcd_port:
+	This option is used to specify LCD output port,
+	available values are "DVP0" "DVP1" "DFP_HIGHLOW" "DFP_HIGH" "DFP_LOW".
+
+	For external LCD + external DVI on CX700 (external LCD on DVP0),
+	we should use::
+
+	    modprobe viafb viafb_lcd_port=DVP0...
+
+Notes:
+    1. CRT may not display properly for DuoView CRT & DVI display at
+       the "640x480" PAL mode with DVI overscan enabled.
+    2. SAMM stands for single adapter multi monitors. It differs from
+       multi-head in that SAMM supports multiple monitors at the driver
+       layer, so the fbcon layer doesn't even know about it; SAMM's second
+       screen has no device node, so a user-mode application can't access
+       it directly. When SAMM is enabled, viafb_mode and viafb_mode1,
+       viafb_bpp and viafb_bpp1, viafb_refresh and viafb_refresh1 can differ.
+    3. When the console depends on viafbinfo1, changing the resolution or
+       bpp dynamically requires the viafb-specific ioctl VIAFB_SET_DEVICE
+       instead of the common ioctl FBIOPUT_VSCREENINFO; viafb doesn't
+       support multi-head well, and the common ioctl will cause a screen crash.
+
+
+Configure viafb with "fbset" tool
+---------------------------------
+
+    "fbset" is a utility included with most Linux distributions.
+
+    1. Inquire current viafb information, type::
+
+	   # fbset -i
+
+    2. Set various resolutions and refresh rates::
+
+	   # fbset <resolution-vertical_sync>
+
+       example::
+
+	   # fbset "1024x768-75"
+
+       or::
+
+	   # fbset -g 1024 768 1024 768 32
+
+       Check the file "/etc/fb.modes" to find display modes available.
+
+    3. Set the color depth::
+
+	   # fbset -depth <value>
+
+       example::
+
+	   # fbset -depth 16
+
+
+Configure viafb via /proc
+-------------------------
+    The following files exist in /proc/viafb
+
+    supported_output_devices
+	This read-only file contains a full ',' separated list containing all
+	output devices that could be available on your platform. It is likely
+	that not all of those have a connector on your hardware but it should
+	provide a good starting point to figure out which of those names match
+	a real connector.
+
+	Example::
+
+		# cat /proc/viafb/supported_output_devices
+
+    iga1/output_devices, iga2/output_devices
+	These two files are readable and writable. iga1 and iga2 are the two
+	independent units that produce the screen image. Those images can be
+	forwarded to one or more output devices. Reading those files is a way
+	to query which output devices are currently used by an iga.
+
+	Example::
+
+		# cat /proc/viafb/iga1/output_devices
+
+	If there are no output devices printed the output of this iga is lost.
+	This can happen for example if only one (the other) iga is used.
+	Writing to these files allows adjusting the output devices during
+	runtime. One can add new devices, remove existing ones or switch
+	between igas. Essentially you can write a ',' separated list of device
+	names (or a single one) in the same format as the output to those
+	files. You can add a '+' or '-' as a prefix allowing simple addition
+	and removal of devices. So a prefix '+' adds the devices from your list
+	to the already existing ones, '-' removes the listed devices from the
+	existing ones and if no prefix is given it replaces all existing ones
+	with the listed ones. If you remove devices they are expected to turn
+	off. If you add devices that are already part of the other iga they are
+	removed there and added to the new one.
+
+	Examples:
+
+	Add CRT as output device to iga1::
+
+		# echo +CRT > /proc/viafb/iga1/output_devices
+
+	Remove (turn off) DVP1 and LVDS1 as output devices of iga2::
+
+		# echo -DVP1,LVDS1 > /proc/viafb/iga2/output_devices
+
+	Replace all iga1 output devices by CRT::
+
+		# echo CRT > /proc/viafb/iga1/output_devices
+
+
+Bootup with viafb
+-----------------
+
+Add the following line to your grub.conf::
+
+    append = "video=viafb:viafb_mode=1024x768,viafb_bpp=32,viafb_refresh=85"
+
+
+VIA Framebuffer modes
+---------------------
+
+.. include:: viafb.modes
+   :literal:
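The '+', '-' and no-prefix write semantics described above for ``iga1/output_devices`` and ``iga2/output_devices`` can be modeled on plain Python sets. This is only a sketch of the documented behavior, not the driver's code: ``write_output_devices`` and the ``igas`` dictionary are hypothetical names, and applying the "removed from the other iga" rule to unprefixed (replacing) writes is an assumption.

```python
# Hypothetical model of the documented /proc/viafb/igaN/output_devices
# write semantics. Nothing here touches /proc or the real driver.
def write_output_devices(igas, target, text):
    """Apply one write of `text` to the device set of iga `target`."""
    text = text.strip()
    prefix = text[0] if text[:1] in ("+", "-") else ""
    names = {name for name in text[len(prefix):].split(",") if name}

    if prefix == "-":
        # '-' removes the listed devices from the existing ones.
        igas[target] -= names
        return
    if prefix == "":
        # No prefix replaces all existing devices with the listed ones.
        igas[target] = set()
    # Devices already on another iga are removed there and added here
    # (assumed to hold for replacing writes as well as '+' writes).
    for other in igas:
        if other != target:
            igas[other] -= names
    igas[target] |= names


igas = {"iga1": {"DVP1"}, "iga2": {"CRT", "LVDS1"}}
write_output_devices(igas, "iga1", "+CRT")         # CRT moves from iga2 to iga1
write_output_devices(igas, "iga2", "-DVP1,LVDS1")  # iga2 ends up empty
write_output_devices(igas, "iga1", "CRT")          # iga1 now drives only CRT
print(igas)
```

Modeling each iga as a set keeps the three cases (add, remove, replace) to one set operation each, which mirrors how the text describes them.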
diff --git a/Documentation/fb/viafb.txt b/Documentation/fb/viafb.txt
deleted file mode 100644
index 1cb2462a71ce..000000000000
--- a/Documentation/fb/viafb.txt
+++ /dev/null
@@ -1,252 +0,0 @@
-
-        VIA Integration Graphic Chip Console Framebuffer Driver
-
-[Platform]
------------------------
-    The console framebuffer driver is for graphics chips of
-    VIA UniChrome Family(CLE266, PM800 / CN400 / CN300,
-                        P4M800CE / P4M800Pro / CN700 / VN800,
-                        CX700 / VX700, K8M890, P4M890,
-                        CN896 / P4M900, VX800, VX855)
-
-[Driver features]
-------------------------
-    Device: CRT, LCD, DVI
-
-    Support viafb_mode:
-        CRT:
-            640x480(60, 75, 85, 100, 120 Hz), 720x480(60 Hz),
-            720x576(60 Hz), 800x600(60, 75, 85, 100, 120 Hz),
-            848x480(60 Hz), 856x480(60 Hz), 1024x512(60 Hz),
-            1024x768(60, 75, 85, 100 Hz), 1152x864(75 Hz),
-            1280x768(60 Hz), 1280x960(60 Hz), 1280x1024(60, 75, 85 Hz),
-            1440x1050(60 Hz), 1600x1200(60, 75 Hz), 1280x720(60 Hz),
-            1920x1080(60 Hz), 1400x1050(60 Hz), 800x480(60 Hz)
-
-    color depth: 8 bpp, 16 bpp, 32 bpp supports.
-
-    Support 2D hardware accelerator.
-
-[Using the viafb module]
--- -- --------------------
-    Start viafb with default settings:
-        #modprobe viafb
-
-    Start viafb with user options:
-        #modprobe viafb viafb_mode=800x600 viafb_bpp=16 viafb_refresh=60
-                  viafb_active_dev=CRT+DVI viafb_dvi_port=DVP1
-                  viafb_mode1=1024x768 viafb_bpp=16 viafb_refresh1=60
-                  viafb_SAMM_ON=1
-
-    viafb_mode:
-        640x480 (default)
-        720x480
-        800x600
-        1024x768
-        ......
-
-    viafb_bpp:
-        8, 16, 32 (default:32)
-
-    viafb_refresh:
-        60, 75, 85, 100, 120 (default:60)
-
-    viafb_lcd_dsp_method:
-        0 : expansion (default)
-        1 : centering
-
-    viafb_lcd_mode:
-        0 : LCD panel with LSB data format input (default)
-        1 : LCD panel with MSB data format input
-
-    viafb_lcd_panel_id:
-        0 : Resolution: 640x480, Channel: single, Dithering: Enable
-        1 : Resolution: 800x600, Channel: single, Dithering: Enable
-        2 : Resolution: 1024x768, Channel: single, Dithering: Enable (default)
-        3 : Resolution: 1280x768, Channel: single, Dithering: Enable
-        4 : Resolution: 1280x1024, Channel: dual, Dithering: Enable
-        5 : Resolution: 1400x1050, Channel: dual, Dithering: Enable
-        6 : Resolution: 1600x1200, Channel: dual, Dithering: Enable
-
-        8 : Resolution: 800x480, Channel: single, Dithering: Enable
-        9 : Resolution: 1024x768, Channel: dual, Dithering: Enable
-        10: Resolution: 1024x768, Channel: single, Dithering: Disable
-        11: Resolution: 1024x768, Channel: dual, Dithering: Disable
-        12: Resolution: 1280x768, Channel: single, Dithering: Disable
-        13: Resolution: 1280x1024, Channel: dual, Dithering: Disable
-        14: Resolution: 1400x1050, Channel: dual, Dithering: Disable
-        15: Resolution: 1600x1200, Channel: dual, Dithering: Disable
-        16: Resolution: 1366x768, Channel: single, Dithering: Disable
-        17: Resolution: 1024x600, Channel: single, Dithering: Enable
-        18: Resolution: 1280x768, Channel: dual, Dithering: Enable
-        19: Resolution: 1280x800, Channel: single, Dithering: Enable
-
-    viafb_accel:
-        0 : No 2D Hardware Acceleration
-        1 : 2D Hardware Acceleration (default)
-
-    viafb_SAMM_ON:
-        0 : viafb_SAMM_ON disable (default)
-        1 : viafb_SAMM_ON enable
-
-    viafb_mode1: (secondary display device)
-        640x480 (default)
-        720x480
-        800x600
-        1024x768
-        ... ...
-
-    viafb_bpp1: (secondary display device)
-        8, 16, 32 (default:32)
-
-    viafb_refresh1: (secondary display device)
-        60, 75, 85, 100, 120 (default:60)
-
-    viafb_active_dev:
-        This option is used to specify active devices.(CRT, DVI, CRT+LCD...)
-        DVI stands for DVI or HDMI, E.g., If you want to enable HDMI,
-        set viafb_active_dev=DVI. In SAMM case, the previous of
-        viafb_active_dev is primary device, and the following is
-        secondary device.
-
-        For example:
-        To enable one device, such as DVI only, we can use:
-            modprobe viafb viafb_active_dev=DVI
-        To enable two devices, such as CRT+DVI:
-            modprobe viafb viafb_active_dev=CRT+DVI;
-
-        For DuoView case, we can use:
-            modprobe viafb viafb_active_dev=CRT+DVI
-            OR
-            modprobe viafb viafb_active_dev=DVI+CRT...
-
-        For SAMM case:
-        If CRT is primary and DVI is secondary, we should use:
-            modprobe viafb viafb_active_dev=CRT+DVI viafb_SAMM_ON=1...
-        If DVI is primary and CRT is secondary, we should use:
-            modprobe viafb viafb_active_dev=DVI+CRT viafb_SAMM_ON=1...
-
-    viafb_display_hardware_layout:
-        This option is used to specify display hardware layout for CX700 chip.
-        1 : LCD only
-        2 : DVI only
-        3 : LCD+DVI (default)
-        4 : LCD1+LCD2 (internal + internal)
-        16: LCD1+ExternalLCD2 (internal + external)
-
-    viafb_second_size:
-        This option is used to set second device memory size(MB) in SAMM case.
-        The minimal size is 16.
-
-    viafb_platform_epia_dvi:
-        This option is used to enable DVI on EPIA - M
-        0 : No DVI on EPIA - M (default)
-        1 : DVI on EPIA - M
-
-    viafb_bus_width:
-        When using 24 - Bit Bus Width Digital Interface,
-        this option should be set.
-        12: 12-Bit LVDS or 12-Bit TMDS (default)
-        24: 24-Bit LVDS or 24-Bit TMDS
-
-    viafb_device_lcd_dualedge:
-        When using Dual Edge Panel, this option should be set.
-        0 : No Dual Edge Panel (default)
-        1 : Dual Edge Panel
-
-    viafb_lcd_port:
-        This option is used to specify LCD output port,
-        available values are "DVP0" "DVP1" "DFP_HIGHLOW" "DFP_HIGH" "DFP_LOW".
-        for external LCD + external DVI on CX700(External LCD is on DVP0),
-        we should use:
-            modprobe viafb viafb_lcd_port=DVP0...
-
-Notes:
-    1. CRT may not display properly for DuoView CRT & DVI display at
-       the "640x480" PAL mode with DVI overscan enabled.
-    2. SAMM stands for single adapter multi monitors. It is different from
-       multi-head since SAMM support multi monitor at driver layers, thus fbcon
-       layer doesn't even know about it; SAMM's second screen doesn't have a
-       device node file, thus a user mode application can't access it directly.
-       When SAMM is enabled, viafb_mode and viafb_mode1, viafb_bpp and
-       viafb_bpp1, viafb_refresh and viafb_refresh1 can be different.
-    3. When console is depending on viafbinfo1, dynamically change resolution
-       and bpp, need to call VIAFB specified ioctl interface VIAFB_SET_DEVICE
-       instead of calling common ioctl function FBIOPUT_VSCREENINFO since
-       viafb doesn't support multi-head well, or it will cause screen crush.
-
-
-[Configure viafb with "fbset" tool]
------------------------------------
-    "fbset" is an inbox utility of Linux.
-    1. Inquire current viafb information, type,
-           # fbset -i
-
-    2. Set various resolutions and viafb_refresh rates,
-           # fbset <resolution-vertical_sync>
-
-       example,
-           # fbset "1024x768-75"
-       or
-           # fbset -g 1024 768 1024 768 32
-       Check the file "/etc/fb.modes" to find display modes available.
-
-    3. Set the color depth,
-           # fbset -depth <value>
-
-       example,
-           # fbset -depth 16
-
-
-[Configure viafb via /proc]
----------------------------
-    The following files exist in /proc/viafb
-
-    supported_output_devices
-
-        This read-only file contains a full ',' separated list containing all
-        output devices that could be available on your platform. It is likely
-        that not all of those have a connector on your hardware but it should
-        provide a good starting point to figure out which of those names match
-        a real connector.
-        Example:
-        # cat /proc/viafb/supported_output_devices
-
-    iga1/output_devices
-    iga2/output_devices
-
-        These two files are readable and writable. iga1 and iga2 are the two
-        independent units that produce the screen image. Those images can be
-        forwarded to one or more output devices. Reading those files is a way
-        to query which output devices are currently used by an iga.
-        Example:
-        # cat /proc/viafb/iga1/output_devices
-        If there are no output devices printed the output of this iga is lost.
-        This can happen for example if only one (the other) iga is used.
-        Writing to these files allows adjusting the output devices during
-        runtime. One can add new devices, remove existing ones or switch
-        between igas. Essentially you can write a ',' separated list of device
-        names (or a single one) in the same format as the output to those
-        files. You can add a '+' or '-' as a prefix allowing simple addition
-        and removal of devices. So a prefix '+' adds the devices from your list
-        to the already existing ones, '-' removes the listed devices from the
-        existing ones and if no prefix is given it replaces all existing ones
-        with the listed ones. If you remove devices they are expected to turn
-        off. If you add devices that are already part of the other iga they are
-        removed there and added to the new one.
-        Examples:
-        Add CRT as output device to iga1
-        # echo +CRT > /proc/viafb/iga1/output_devices
-
-        Remove (turn off) DVP1 and LVDS1 as output devices of iga2
-        # echo -DVP1,LVDS1 > /proc/viafb/iga2/output_devices
-
-        Replace all iga1 output devices by CRT
-        # echo CRT > /proc/viafb/iga1/output_devices
-
-
-[Bootup with viafb]:
---------------------
-    Add the following line to your grub.conf:
-    append = "video=viafb:viafb_mode=1024x768,viafb_bpp=32,viafb_refresh=85"
-
diff --git a/Documentation/fb/vt8623fb.rst b/Documentation/fb/vt8623fb.rst
new file mode 100644
index 000000000000..ba1730937dd8
--- /dev/null
+++ b/Documentation/fb/vt8623fb.rst
@@ -0,0 +1,64 @@
+===============================================================
+vt8623fb - fbdev driver for graphics core in VIA VT8623 chipset
+===============================================================
+
+
+Supported Hardware
+==================
+
+VIA VT8623 [CLE266] chipset and its graphics core
+(known as CastleRock or Unichrome)
+
+I tested vt8623fb on VIA EPIA ML-6000
+
+
+Supported Features
+==================
+
+	*  4 bpp pseudocolor modes (with 18bit palette, two variants)
+	*  8 bpp pseudocolor mode (with 18bit palette)
+	* 16 bpp truecolor mode (RGB 565)
+	* 32 bpp truecolor mode (RGB 888)
+	* text mode (activated by bpp = 0)
+	* doublescan mode variant (not available in text mode)
+	* panning in both directions
+	* suspend/resume support
+	* DPMS support
+
+Text mode is supported even in higher resolutions, but it is limited to
+lower pixclocks (maximum about 100 MHz). This limit is not enforced by the
+driver. Text mode supports 8bit wide fonts only (hardware limitation) and
+16bit tall fonts (driver limitation).
+
+There are two 4 bpp modes. The first mode (selected if nonstd == 0) uses
+packed pixels, high nibble first. The second mode (selected if nonstd == 1)
+uses interleaved planes (1 byte interleave), MSB first. Both modes support
+8bit wide fonts only (driver limitation).
+
+Suspend/resume works on systems that initialize the video card during resume
+and if the device is active (for example, used by fbcon).
+
+
+Missing Features
+================
+(alias TODO list)
+
+	* secondary (not initialized by BIOS) device support
+	* MMIO support
+	* interlaced mode variant
+	* support for fontwidths != 8 in 4 bpp modes
+	* support for fontheight != 16 in text mode
+	* hardware cursor
+	* video overlay support
+	* vsync synchronization
+	* acceleration support (8514-like 2D, busmaster transfers)
+
+
+Known bugs
+==========
+
+	* cursor disable in text mode doesn't work
+
+
+--
+Ondrej Zajicek <santiago@crfreenet.org>
diff --git a/Documentation/fb/vt8623fb.txt b/Documentation/fb/vt8623fb.txt
deleted file mode 100644
index f654576c56b7..000000000000
--- a/Documentation/fb/vt8623fb.txt
+++ /dev/null
@@ -1,64 +0,0 @@
-
-	vt8623fb - fbdev driver for graphics core in VIA VT8623 chipset
-	===============================================================
-
-
-Supported Hardware
-==================
-
-	VIA VT8623 [CLE266] chipset and	its graphics core
-		(known as CastleRock or Unichrome)
-
-I tested vt8623fb on VIA EPIA ML-6000
-
-
-Supported Features
-==================
-
-	*  4 bpp pseudocolor modes (with 18bit palette, two variants)
-	*  8 bpp pseudocolor mode (with 18bit palette)
-	* 16 bpp truecolor mode (RGB 565)
-	* 32 bpp truecolor mode (RGB 888)
-	* text mode (activated by bpp = 0)
-	* doublescan mode variant (not available in text mode)
-	* panning in both directions
-	* suspend/resume support
-	* DPMS support
-
-Text mode is supported even in higher resolutions, but there is limitation to
-lower pixclocks (maximum about 100 MHz). This limitation is not enforced by
-driver. Text mode supports 8bit wide fonts only (hardware limitation) and
-16bit tall fonts (driver limitation).
-
-There are two 4 bpp modes. First mode (selected if nonstd == 0) is mode with
-packed pixels, high nibble first. Second mode (selected if nonstd == 1) is mode
-with interleaved planes (1 byte interleave), MSB first. Both modes support
-8bit wide fonts only (driver limitation).
-
-Suspend/resume works on systems that initialize video card during resume and
-if device is active (for example used by fbcon).
-
-
-Missing Features
-================
-(alias TODO list)
-
-	* secondary (not initialized by BIOS) device support
-	* MMIO support
-	* interlaced mode variant
-	* support for fontwidths != 8 in 4 bpp modes
-	* support for fontheight != 16 in text mode
-	* hardware cursor
-	* video overlay support
-	* vsync synchronization
-	* acceleration support (8514-like 2D, busmaster transfers)
-
-
-Known bugs
-==========
-
-	* cursor disable in text mode doesn't work
-
-
---
-Ondrej Zajicek <santiago@crfreenet.org>
diff --git a/MAINTAINERS b/MAINTAINERS
index c95c29735327..314545af6f45 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4789,7 +4789,7 @@ S:	Maintained
 W:	http://plugable.com/category/projects/udlfb/
 F:	drivers/video/fbdev/udlfb.c
 F:	include/video/udlfb.h
-F:	Documentation/fb/udlfb.txt
+F:	Documentation/fb/udlfb.rst
 
 DISTRIBUTED LOCK MANAGER (DLM)
 M:	Christine Caulfield <ccaulfie@redhat.com>
@@ -7923,7 +7923,7 @@ INTEL FRAMEBUFFER DRIVER (excluding 810 and 815)
 M:	Maik Broemme <mbroemme@libmpq.org>
 L:	linux-fbdev@vger.kernel.org
 S:	Maintained
-F:	Documentation/fb/intelfb.txt
+F:	Documentation/fb/intelfb.rst
 F:	drivers/video/fbdev/intelfb/
 
 INTEL GPIO DRIVERS
@@ -14350,7 +14350,7 @@ M:	Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
 L:	linux-fbdev@vger.kernel.org
 S:	Maintained
 F:	drivers/video/fbdev/sm712*
-F:	Documentation/fb/sm712fb.txt
+F:	Documentation/fb/sm712fb.rst
 
 SIMPLE FIRMWARE INTERFACE (SFI)
 M:	Len Brown <lenb@kernel.org>
@@ -14420,7 +14420,7 @@ SIS FRAMEBUFFER DRIVER
 M:	Thomas Winischhofer <thomas@winischhofer.net>
 W:	http://www.winischhofer.net/linuxsisvga.shtml
 S:	Maintained
-F:	Documentation/fb/sisfb.txt
+F:	Documentation/fb/sisfb.rst
 F:	drivers/video/fbdev/sis/
 F:	include/video/sisfb.h
 
@@ -16608,7 +16608,7 @@ M:	Michal Januszewski <spock@gentoo.org>
 L:	linux-fbdev@vger.kernel.org
 W:	https://github.com/mjanusz/v86d
 S:	Maintained
-F:	Documentation/fb/uvesafb.txt
+F:	Documentation/fb/uvesafb.rst
 F:	drivers/video/fbdev/uvesafb.*
 
 VF610 NAND DRIVER
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index 3b1d312bb175..0e3e4dacbc12 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -95,7 +95,7 @@ config VT_HW_CONSOLE_BINDING
 
 	 See <file:Documentation/console/console.txt> for more
 	 information. For framebuffer console users, please refer to
-	 <file:Documentation/fb/fbcon.txt>.
+	 <file:Documentation/fb/fbcon.rst>.
 
 config UNIX98_PTYS
 	bool "Unix98 PTY support" if EXPERT
diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
index 1b2f5f31fb6f..737b86328c9e 100644
--- a/drivers/video/fbdev/Kconfig
+++ b/drivers/video/fbdev/Kconfig
@@ -31,7 +31,7 @@ menuconfig FB
 	  in the /dev directory, i.e. /dev/fb*.
 
 	  You need an utility program called fbset to make full use of frame
-	  buffer devices. Please read <file:Documentation/fb/framebuffer.txt>
+	  buffer devices. Please read <file:Documentation/fb/framebuffer.rst>
 	  and the Framebuffer-HOWTO at
 	  <http://www.munted.org.uk/programming/Framebuffer-HOWTO-1.3.html> for more
 	  information.
@@ -241,7 +241,7 @@ config FB_CIRRUS
 	  If you have a PCI-based system, this enables support for these
 	  chips: GD-543x, GD-544x, GD-5480.
 
-	  Please read the file <file:Documentation/fb/cirrusfb.txt>.
+	  Please read the file <file:Documentation/fb/cirrusfb.rst>.
 
 	  Say N unless you have such a graphics board or plan to get one
 	  before you next recompile the kernel.
@@ -614,7 +614,7 @@ config FB_UVESA
 
 	  This driver generally provides more features than vesafb but
 	  requires a userspace helper application called 'v86d'. See
-	  <file:Documentation/fb/uvesafb.txt> for more information.
+	  <file:Documentation/fb/uvesafb.rst> for more information.
 
 	  If unsure, say N.
 
@@ -629,7 +629,7 @@ config FB_VESA
 	  This is the frame buffer device driver for generic VESA 2.0
 	  compliant graphic cards. The older VESA 1.2 cards are not supported.
 	  You will get a boot time penguin logo at no additional cost. Please
-	  read <file:Documentation/fb/vesafb.txt>. If unsure, say Y.
+	  read <file:Documentation/fb/vesafb.rst>. If unsure, say Y.
 
 config FB_EFI
 	bool "EFI-based Framebuffer Support"
@@ -825,7 +825,7 @@ config FB_PVR2
 	  module load time.  The parameters look like "video=pvr2:XXX", where
 	  the meaning of XXX can be found at the end of the main source file
 	  (<file:drivers/video/pvr2fb.c>). Please see the file
-	  <file:Documentation/fb/pvr2fb.txt>.
+	  <file:Documentation/fb/pvr2fb.rst>.
 
 config FB_OPENCORES
 	tristate "OpenCores VGA/LCD core 2.0 framebuffer support"
@@ -987,7 +987,7 @@ config FB_I810
 	  module will be called i810fb.
 
 	  For more information, please read
-	  <file:Documentation/fb/intel810.txt>
+	  <file:Documentation/fb/intel810.rst>
 
 config FB_I810_GTF
 	bool "use VESA Generalized Timing Formula"
@@ -1057,7 +1057,7 @@ config FB_INTEL
 	  To compile this driver as a module, choose M here: the
 	  module will be called intelfb.
 
-	  For more information, please read <file:Documentation/fb/intelfb.txt>
+	  For more information, please read <file:Documentation/fb/intelfb.rst>
 
 config FB_INTEL_DEBUG
 	bool "Intel driver Debug Messages"
@@ -1094,7 +1094,7 @@ config FB_MATROX
 
 	  You can pass several parameters to the driver at boot time or at
 	  module load time. The parameters look like "video=matroxfb:XXX", and
-	  are described in <file:Documentation/fb/matroxfb.txt>.
+	  are described in <file:Documentation/fb/matroxfb.rst>.
 
 config FB_MATROX_MILLENIUM
 	bool "Millennium I/II support"
@@ -1245,7 +1245,7 @@ config FB_ATY128
 	help
 	  This driver supports graphics boards with the ATI Rage128 chips.
 	  Say Y if you have such a graphics board and read
-	  <file:Documentation/fb/aty128fb.txt>.
+	  <file:Documentation/fb/aty128fb.rst>.
 
 	  To compile this driver as a module, choose M here: the
 	  module will be called aty128fb.
@@ -1507,7 +1507,7 @@ config FB_VOODOO1
 
 	  WARNING: Do not use any application that uses the 3D engine
 	  (namely glide) while using this driver.
-	  Please read the <file:Documentation/fb/sstfb.txt> for supported
+	  Please read the <file:Documentation/fb/sstfb.rst> for supported
 	  options and other important info  support.
 
 config FB_VT8623
@@ -1539,7 +1539,7 @@ config FB_TRIDENT
 	  There are also integrated versions of these chips called CyberXXXX,
 	  CyberImage or CyberBlade. These chips are mostly found in laptops
 	  but also on some motherboards including early VIA EPIA motherboards.
-	  For more information, read <file:Documentation/fb/tridentfb.txt>
+	  For more information, read <file:Documentation/fb/tridentfb.rst>
 
 	  Say Y if you have such a graphics board.
 
@@ -1778,7 +1778,7 @@ config FB_PXA_PARAMETERS
 	  single model of flatpanel then you can safely leave this
 	  option disabled.
 
-	  <file:Documentation/fb/pxafb.txt> describes the available parameters.
+	  <file:Documentation/fb/pxafb.rst> describes the available parameters.
 
 config PXA3XX_GCU
 	tristate "PXA3xx 2D graphics accelerator driver"
diff --git a/drivers/video/fbdev/matrox/matroxfb_base.c b/drivers/video/fbdev/matrox/matroxfb_base.c
index c76bef078c75..1a555f70923a 100644
--- a/drivers/video/fbdev/matrox/matroxfb_base.c
+++ b/drivers/video/fbdev/matrox/matroxfb_base.c
@@ -2502,7 +2502,7 @@ MODULE_PARM_DESC(nobios, "Disables ROM BIOS (0 or 1=disabled) (default=do not ch
 module_param(noinit, int, 0);
 MODULE_PARM_DESC(noinit, "Disables W/SG/SD-RAM and bus interface initialization (0 or 1=do not initialize) (default=0)");
 module_param(memtype, int, 0);
-MODULE_PARM_DESC(memtype, "Memory type for G200/G400 (see Documentation/fb/matroxfb.txt for explanation) (default=3 for G200, 0 for G400)");
+MODULE_PARM_DESC(memtype, "Memory type for G200/G400 (see Documentation/fb/matroxfb.rst for explanation) (default=3 for G200, 0 for G400)");
 module_param(mtrr, int, 0);
 MODULE_PARM_DESC(mtrr, "This speeds up video memory accesses (0=disabled or 1) (default=1)");
 module_param(sgram, int, 0);
diff --git a/drivers/video/fbdev/pxafb.c b/drivers/video/fbdev/pxafb.c
index d59c8a59f582..4282cb117b92 100644
--- a/drivers/video/fbdev/pxafb.c
+++ b/drivers/video/fbdev/pxafb.c
@@ -2068,7 +2068,7 @@ static int __init pxafb_setup_options(void)
 #define pxafb_setup_options()		(0)
 
 module_param_string(options, g_options, sizeof(g_options), 0);
-MODULE_PARM_DESC(options, "LCD parameters (see Documentation/fb/pxafb.txt)");
+MODULE_PARM_DESC(options, "LCD parameters (see Documentation/fb/pxafb.rst)");
 #endif
 
 #else
diff --git a/drivers/video/fbdev/sh7760fb.c b/drivers/video/fbdev/sh7760fb.c
index 405715b60ec7..ab8fe838c776 100644
--- a/drivers/video/fbdev/sh7760fb.c
+++ b/drivers/video/fbdev/sh7760fb.c
@@ -6,7 +6,7 @@
  *             Manuel Lauss <mano@roarinelk.homelinux.net>
  * (c) 2008 Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
  *
- * PLEASE HAVE A LOOK AT Documentation/fb/sh7760fb.txt!
+ * PLEASE HAVE A LOOK AT Documentation/fb/sh7760fb.rst!
  *
  * Thanks to Siegfried Schaefer <s.schaefer at schaefer-edv.de>
  *     for his original source and testing!
-- 
cgit v1.2.3-59-g8ed1b


From c220a1fae6c5df52ed3a02f88b86a27830ea0210 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:46 -0300
Subject: docs: fpga: convert docs to ReST and rename to *.rst

The dfl.txt file is almost there. It needs just a few
adjustments to be properly parsed.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/fpga/dfl.rst   | 291 +++++++++++++++++++++++++++++++++++++++++++
 Documentation/fpga/dfl.txt   | 285 ------------------------------------------
 Documentation/fpga/index.rst |  17 +++
 MAINTAINERS                  |   2 +-
 4 files changed, 309 insertions(+), 286 deletions(-)
 create mode 100644 Documentation/fpga/dfl.rst
 delete mode 100644 Documentation/fpga/dfl.txt
 create mode 100644 Documentation/fpga/index.rst

diff --git a/Documentation/fpga/dfl.rst b/Documentation/fpga/dfl.rst
new file mode 100644
index 000000000000..2f125abd777f
--- /dev/null
+++ b/Documentation/fpga/dfl.rst
@@ -0,0 +1,291 @@
+=================================================
+FPGA Device Feature List (DFL) Framework Overview
+=================================================
+
+Authors:
+
+- Enno Luebbers <enno.luebbers@intel.com>
+- Xiao Guangrong <guangrong.xiao@linux.intel.com>
+- Wu Hao <hao.wu@intel.com>
+
+The Device Feature List (DFL) FPGA framework (and drivers according to this
+framework) hides the details of the low-level hardware and provides unified
+interfaces to userspace. Applications can use these interfaces to configure,
+enumerate, open and access FPGA accelerators on platforms which implement
+the DFL in the device memory. Besides this, the DFL framework enables
+system level management functions such as FPGA reconfiguration.
+
+
+Device Feature List (DFL) Overview
+==================================
+Device Feature List (DFL) defines a linked list of feature headers within the
+device MMIO space to provide an extensible way of adding features. Software can
+walk through these predefined data structures to enumerate FPGA features:
+FPGA Interface Unit (FIU), Accelerated Function Unit (AFU) and Private Features,
+as illustrated below::
+
+    Header            Header            Header            Header
+ +----------+  +-->+----------+  +-->+----------+  +-->+----------+
+ |   Type   |  |   |  Type    |  |   |  Type    |  |   |  Type    |
+ |   FIU    |  |   | Private  |  |   | Private  |  |   | Private  |
+ +----------+  |   | Feature  |  |   | Feature  |  |   | Feature  |
+ | Next_DFH |--+   +----------+  |   +----------+  |   +----------+
+ +----------+      | Next_DFH |--+   | Next_DFH |--+   | Next_DFH |--> NULL
+ |    ID    |      +----------+      +----------+      +----------+
+ +----------+      |    ID    |      |    ID    |      |    ID    |
+ | Next_AFU |--+   +----------+      +----------+      +----------+
+ +----------+  |   | Feature  |      | Feature  |      | Feature  |
+ |  Header  |  |   | Register |      | Register |      | Register |
+ | Register |  |   |   Set    |      |   Set    |      |   Set    |
+ |   Set    |  |   +----------+      +----------+      +----------+
+ +----------+  |      Header
+               +-->+----------+
+                   |   Type   |
+                   |   AFU    |
+                   +----------+
+                   | Next_DFH |--> NULL
+                   +----------+
+                   |   GUID   |
+                   +----------+
+                   |  Header  |
+                   | Register |
+                   |   Set    |
+                   +----------+
+
+FPGA Interface Unit (FIU) represents a standalone functional unit for the
+interface to FPGA, e.g. the FPGA Management Engine (FME) and Port (more
+descriptions on FME and Port in later sections).
+
+Accelerated Function Unit (AFU) represents an FPGA programmable region and
+always connects to an FIU (e.g. a Port) as its child as illustrated above.
+
+Private Features represent sub features of the FIU and AFU. They could be
+various function blocks with different IDs, but all private features which
+belong to the same FIU or AFU must be linked to one list via the Next Device
+Feature Header (Next_DFH) pointer.
+
+Each FIU, AFU and Private Feature could implement its own functional registers.
+The functional register set for an FIU or AFU is named the Header Register Set,
+e.g. FME Header Register Set, and the one for a Private Feature is named the
+Feature Register Set, e.g. FME Partial Reconfiguration Feature Register Set.
+
+This Device Feature List provides a way of linking features together. It is
+convenient for software to locate each feature by walking through this list,
+and it can be implemented in register regions of any FPGA device.
+
+
+FIU - FME (FPGA Management Engine)
+==================================
+The FPGA Management Engine performs reconfiguration and other infrastructure
+functions. Each FPGA device only has one FME.
+
+User-space applications can acquire exclusive access to the FME using open(),
+and release it using close().
+
+The following functions are exposed through ioctls:
+
+- Get driver API version (DFL_FPGA_GET_API_VERSION)
+- Check for extensions (DFL_FPGA_CHECK_EXTENSION)
+- Program bitstream (DFL_FPGA_FME_PORT_PR)
+
+More functions are exposed through sysfs
+(/sys/class/fpga_region/regionX/dfl-fme.n/):
+
+ Read bitstream ID (bitstream_id)
+     bitstream_id indicates version of the static FPGA region.
+
+ Read bitstream metadata (bitstream_metadata)
+     bitstream_metadata includes detailed information of static FPGA region,
+     e.g. synthesis date and seed.
+
+ Read number of ports (ports_num)
+     one FPGA device may have more than one port; this sysfs interface indicates
+     how many ports the FPGA device has.
+
+
+FIU - PORT
+==========
+A port represents the interface between the static FPGA fabric and a partially
+reconfigurable region containing an AFU. It controls the communication from SW
+to the accelerator and exposes features such as reset and debug. Each FPGA
+device may have more than one port, but always one AFU per port.
+
+
+AFU
+===
+An AFU is attached to a port FIU and exposes a fixed length MMIO region to be
+used for accelerator-specific control registers.
+
+User-space applications can acquire exclusive access to an AFU attached to a
+port by using open() on the port device node and release it using close().
+
+The following functions are exposed through ioctls:
+
+- Get driver API version (DFL_FPGA_GET_API_VERSION)
+- Check for extensions (DFL_FPGA_CHECK_EXTENSION)
+- Get port info (DFL_FPGA_PORT_GET_INFO)
+- Get MMIO region info (DFL_FPGA_PORT_GET_REGION_INFO)
+- Map DMA buffer (DFL_FPGA_PORT_DMA_MAP)
+- Unmap DMA buffer (DFL_FPGA_PORT_DMA_UNMAP)
+- Reset AFU (DFL_FPGA_PORT_RESET)
+
+DFL_FPGA_PORT_RESET:
+  reset the FPGA Port and its AFU. Userspace can reset the Port at any
+  time, e.g. during DMA or Partial Reconfiguration. A reset should never
+  cause a system level issue, only a functional failure (e.g. a DMA or PR
+  operation failure) that can be recovered from.
+
+User-space applications can also mmap() accelerator MMIO regions.
+
+More functions are exposed through sysfs
+(/sys/class/fpga_region/<regionX>/<dfl-port.m>/):
+
+ Read Accelerator GUID (afu_id)
+     afu_id indicates which PR bitstream is programmed to this AFU.
+
+
+DFL Framework Overview
+======================
+
+::
+
+         +----------+    +--------+ +--------+ +--------+
+         |   FME    |    |  AFU   | |  AFU   | |  AFU   |
+         |  Module  |    | Module | | Module | | Module |
+         +----------+    +--------+ +--------+ +--------+
+                 +-----------------------+
+                 | FPGA Container Device |    Device Feature List
+                 |  (FPGA Base Region)   |         Framework
+                 +-----------------------+
+  ------------------------------------------------------------------
+               +----------------------------+
+               |   FPGA DFL Device Module   |
+               | (e.g. PCIE/Platform Device)|
+               +----------------------------+
+                 +------------------------+
+                 |  FPGA Hardware Device  |
+                 +------------------------+
+
+The DFL framework in the kernel provides common interfaces to create a
+container device (FPGA base region), discover feature devices and their
+private features from the given Device Feature Lists, and create platform
+devices for feature devices (e.g. FME, Port and AFU) with related resources
+under the container device. It also abstracts operations for the private
+features and exposes common ops to feature device drivers.
+
+The FPGA DFL Device could be different kinds of hardware, e.g. a PCIe device
+or a platform device. Its driver module is always loaded first once the
+device is created by the system. This driver plays an infrastructural role in
+the driver architecture. It locates the DFLs in the device memory and hands
+them and related resources to common interfaces from the DFL framework for
+enumeration. (See drivers/fpga/dfl.c for the detailed enumeration APIs.)
+
+The FPGA Management Engine (FME) driver is a platform driver which is loaded
+automatically after FME platform device creation from the DFL device module. It
+provides the key features for FPGA management, including:
+
+	a) Expose static FPGA region information, e.g. version and metadata.
+	   Users can read related information via sysfs interfaces exposed
+	   by FME driver.
+
+	b) Partial Reconfiguration. The FME driver creates FPGA manager, FPGA
+	   bridges and FPGA regions during PR sub feature initialization. Once
+	   it receives a DFL_FPGA_FME_PORT_PR ioctl from user, it invokes the
+	   common interface function from FPGA Region to complete the partial
+	   reconfiguration of the PR bitstream to the given port.
+
+Similar to the FME driver, the FPGA Accelerated Function Unit (AFU) driver is
+probed once the AFU platform device is created. The main function of this module
+is to provide an interface for userspace applications to access the individual
+accelerators, including basic reset control on the port, AFU MMIO region export
+and DMA buffer mapping service functions.
+
+After the feature platform devices are created, matching platform drivers are
+loaded automatically to handle the different functionalities. Please refer to
+the next sections for detailed information on functional units which have
+already been implemented under this DFL framework.
+
+
+Partial Reconfiguration
+=======================
+As mentioned above, accelerators can be reconfigured through partial
+reconfiguration of a PR bitstream file. The PR bitstream file must have been
+generated for the exact static FPGA region and targeted reconfigurable region
+(port) of the FPGA; otherwise, the reconfiguration operation will fail and
+possibly cause system instability. This compatibility can be checked by
+comparing the compatibility ID noted in the header of PR bitstream file against
+the compat_id exposed by the target FPGA region. This check is usually done by
+userspace before calling the reconfiguration IOCTL.
+
+
+Device enumeration
+==================
+This section introduces how applications enumerate the FPGA device from
+the sysfs hierarchy under /sys/class/fpga_region.
+
+In the example below, two DFL based FPGA devices are installed in the host. Each
+FPGA device has one FME and two ports (AFUs).
+
+FPGA regions are created under /sys/class/fpga_region/::
+
+	/sys/class/fpga_region/region0
+	/sys/class/fpga_region/region1
+	/sys/class/fpga_region/region2
+	...
+
+Applications need to search each regionX folder. If a feature device is found
+(e.g. "dfl-port.m" or "dfl-fme.n"), then that region is the base
+FPGA region which represents the FPGA device.
+
+Each base region has one FME and two ports (AFUs) as child devices::
+
+	/sys/class/fpga_region/region0/dfl-fme.0
+	/sys/class/fpga_region/region0/dfl-port.0
+	/sys/class/fpga_region/region0/dfl-port.1
+	...
+
+	/sys/class/fpga_region/region3/dfl-fme.1
+	/sys/class/fpga_region/region3/dfl-port.2
+	/sys/class/fpga_region/region3/dfl-port.3
+	...
+
+In general, the FME/AFU sysfs interfaces are named as follows::
+
+	/sys/class/fpga_region/<regionX>/<dfl-fme.n>/
+	/sys/class/fpga_region/<regionX>/<dfl-port.m>/
+
+with 'n' consecutively numbering all FMEs and 'm' consecutively numbering all
+ports.
+
+The device nodes used for ioctl() or mmap() can be referenced through::
+
+	/sys/class/fpga_region/<regionX>/<dfl-fme.n>/dev
+	/sys/class/fpga_region/<regionX>/<dfl-port.m>/dev
+
+
+Add new FIUs support
+====================
+Developers may create new function blocks (FIUs) under this DFL framework. In
+that case, a new platform device driver needs to be developed for the new
+feature device (FIU), following the same approach as the existing feature
+device drivers (e.g. the FME and Port/AFU platform device drivers). The DFL
+framework enumeration code also needs to be modified to detect the new FIU
+type and create the related platform devices.
+
+
+Add new private features support
+================================
+In some cases, we may need to add some new private features to existing FIUs
+(e.g. FME or Port). Developers don't need to touch the enumeration code in the
+DFL framework, as each private feature is parsed automatically and the related
+MMIO resources can be found under the FIU platform device created by the DFL
+framework. Developers only need to provide a sub feature driver with a
+matching feature ID. The FME Partial Reconfiguration Sub Feature driver (see
+drivers/fpga/dfl-fme-pr.c) can serve as a reference.
+
+
+Open discussion
+===============
+The FME driver currently exports one ioctl (DFL_FPGA_FME_PORT_PR) for partial
+reconfiguration. In the future, if unified user interfaces for reconfiguration
+are added, the FME driver should switch to them from the ioctl interface.
diff --git a/Documentation/fpga/dfl.txt b/Documentation/fpga/dfl.txt
deleted file mode 100644
index 6df4621c3f2a..000000000000
--- a/Documentation/fpga/dfl.txt
+++ /dev/null
@@ -1,285 +0,0 @@
-===============================================================================
-              FPGA Device Feature List (DFL) Framework Overview
--------------------------------------------------------------------------------
-                Enno Luebbers <enno.luebbers@intel.com>
-                Xiao Guangrong <guangrong.xiao@linux.intel.com>
-                Wu Hao <hao.wu@intel.com>
-
-The Device Feature List (DFL) FPGA framework (and drivers according to this
-this framework) hides the very details of low layer hardwares and provides
-unified interfaces to userspace. Applications could use these interfaces to
-configure, enumerate, open and access FPGA accelerators on platforms which
-implement the DFL in the device memory. Besides this, the DFL framework
-enables system level management functions such as FPGA reconfiguration.
-
-
-Device Feature List (DFL) Overview
-==================================
-Device Feature List (DFL) defines a linked list of feature headers within the
-device MMIO space to provide an extensible way of adding features. Software can
-walk through these predefined data structures to enumerate FPGA features:
-FPGA Interface Unit (FIU), Accelerated Function Unit (AFU) and Private Features,
-as illustrated below:
-
-    Header            Header            Header            Header
- +----------+  +-->+----------+  +-->+----------+  +-->+----------+
- |   Type   |  |   |  Type    |  |   |  Type    |  |   |  Type    |
- |   FIU    |  |   | Private  |  |   | Private  |  |   | Private  |
- +----------+  |   | Feature  |  |   | Feature  |  |   | Feature  |
- | Next_DFH |--+   +----------+  |   +----------+  |   +----------+
- +----------+      | Next_DFH |--+   | Next_DFH |--+   | Next_DFH |--> NULL
- |    ID    |      +----------+      +----------+      +----------+
- +----------+      |    ID    |      |    ID    |      |    ID    |
- | Next_AFU |--+   +----------+      +----------+      +----------+
- +----------+  |   | Feature  |      | Feature  |      | Feature  |
- |  Header  |  |   | Register |      | Register |      | Register |
- | Register |  |   |   Set    |      |   Set    |      |   Set    |
- |   Set    |  |   +----------+      +----------+      +----------+
- +----------+  |      Header
-               +-->+----------+
-                   |   Type   |
-                   |   AFU    |
-                   +----------+
-                   | Next_DFH |--> NULL
-                   +----------+
-                   |   GUID   |
-                   +----------+
-                   |  Header  |
-                   | Register |
-                   |   Set    |
-                   +----------+
-
-FPGA Interface Unit (FIU) represents a standalone functional unit for the
-interface to FPGA, e.g. the FPGA Management Engine (FME) and Port (more
-descriptions on FME and Port in later sections).
-
-Accelerated Function Unit (AFU) represents a FPGA programmable region and
-always connects to a FIU (e.g. a Port) as its child as illustrated above.
-
-Private Features represent sub features of the FIU and AFU. They could be
-various function blocks with different IDs, but all private features which
-belong to the same FIU or AFU, must be linked to one list via the Next Device
-Feature Header (Next_DFH) pointer.
-
-Each FIU, AFU and Private Feature could implement its own functional registers.
-The functional register set for FIU and AFU, is named as Header Register Set,
-e.g. FME Header Register Set, and the one for Private Feature, is named as
-Feature Register Set, e.g. FME Partial Reconfiguration Feature Register Set.
-
-This Device Feature List provides a way of linking features together, it's
-convenient for software to locate each feature by walking through this list,
-and can be implemented in register regions of any FPGA device.
-
-
-FIU - FME (FPGA Management Engine)
-==================================
-The FPGA Management Engine performs reconfiguration and other infrastructure
-functions. Each FPGA device only has one FME.
-
-User-space applications can acquire exclusive access to the FME using open(),
-and release it using close().
-
-The following functions are exposed through ioctls:
-
- Get driver API version (DFL_FPGA_GET_API_VERSION)
- Check for extensions (DFL_FPGA_CHECK_EXTENSION)
- Program bitstream (DFL_FPGA_FME_PORT_PR)
-
-More functions are exposed through sysfs
-(/sys/class/fpga_region/regionX/dfl-fme.n/):
-
- Read bitstream ID (bitstream_id)
-     bitstream_id indicates version of the static FPGA region.
-
- Read bitstream metadata (bitstream_metadata)
-     bitstream_metadata includes detailed information of static FPGA region,
-     e.g. synthesis date and seed.
-
- Read number of ports (ports_num)
-     one FPGA device may have more than one port, this sysfs interface indicates
-     how many ports the FPGA device has.
-
-
-FIU - PORT
-==========
-A port represents the interface between the static FPGA fabric and a partially
-reconfigurable region containing an AFU. It controls the communication from SW
-to the accelerator and exposes features such as reset and debug. Each FPGA
-device may have more than one port, but always one AFU per port.
-
-
-AFU
-===
-An AFU is attached to a port FIU and exposes a fixed length MMIO region to be
-used for accelerator-specific control registers.
-
-User-space applications can acquire exclusive access to an AFU attached to a
-port by using open() on the port device node and release it using close().
-
-The following functions are exposed through ioctls:
-
- Get driver API version (DFL_FPGA_GET_API_VERSION)
- Check for extensions (DFL_FPGA_CHECK_EXTENSION)
- Get port info (DFL_FPGA_PORT_GET_INFO)
- Get MMIO region info (DFL_FPGA_PORT_GET_REGION_INFO)
- Map DMA buffer (DFL_FPGA_PORT_DMA_MAP)
- Unmap DMA buffer (DFL_FPGA_PORT_DMA_UNMAP)
- Reset AFU (*DFL_FPGA_PORT_RESET)
-
-*DFL_FPGA_PORT_RESET: reset the FPGA Port and its AFU. Userspace can do Port
-reset at any time, e.g. during DMA or Partial Reconfiguration. But it should
-never cause any system level issue, only functional failure (e.g. DMA or PR
-operation failure) and be recoverable from the failure.
-
-User-space applications can also mmap() accelerator MMIO regions.
-
-More functions are exposed through sysfs:
-(/sys/class/fpga_region/<regionX>/<dfl-port.m>/):
-
- Read Accelerator GUID (afu_id)
-     afu_id indicates which PR bitstream is programmed to this AFU.
-
-
-DFL Framework Overview
-======================
-
-         +----------+    +--------+ +--------+ +--------+
-         |   FME    |    |  AFU   | |  AFU   | |  AFU   |
-         |  Module  |    | Module | | Module | | Module |
-         +----------+    +--------+ +--------+ +--------+
-                 +-----------------------+
-                 | FPGA Container Device |    Device Feature List
-                 |  (FPGA Base Region)   |         Framework
-                 +-----------------------+
---------------------------------------------------------------------
-               +----------------------------+
-               |   FPGA DFL Device Module   |
-               | (e.g. PCIE/Platform Device)|
-               +----------------------------+
-                 +------------------------+
-                 |  FPGA Hardware Device  |
-                 +------------------------+
-
-DFL framework in kernel provides common interfaces to create container device
-(FPGA base region), discover feature devices and their private features from the
-given Device Feature Lists and create platform devices for feature devices
-(e.g. FME, Port and AFU) with related resources under the container device. It
-also abstracts operations for the private features and exposes common ops to
-feature device drivers.
-
-The FPGA DFL Device could be different hardwares, e.g. PCIe device, platform
-device and etc. Its driver module is always loaded first once the device is
-created by the system. This driver plays an infrastructural role in the
-driver architecture. It locates the DFLs in the device memory, handles them
-and related resources to common interfaces from DFL framework for enumeration.
-(Please refer to drivers/fpga/dfl.c for detailed enumeration APIs).
-
-The FPGA Management Engine (FME) driver is a platform driver which is loaded
-automatically after FME platform device creation from the DFL device module. It
-provides the key features for FPGA management, including:
-
-	a) Expose static FPGA region information, e.g. version and metadata.
-	   Users can read related information via sysfs interfaces exposed
-	   by FME driver.
-
-	b) Partial Reconfiguration. The FME driver creates FPGA manager, FPGA
-	   bridges and FPGA regions during PR sub feature initialization. Once
-	   it receives a DFL_FPGA_FME_PORT_PR ioctl from user, it invokes the
-	   common interface function from FPGA Region to complete the partial
-	   reconfiguration of the PR bitstream to the given port.
-
-Similar to the FME driver, the FPGA Accelerated Function Unit (AFU) driver is
-probed once the AFU platform device is created. The main function of this module
-is to provide an interface for userspace applications to access the individual
-accelerators, including basic reset control on port, AFU MMIO region export, dma
-buffer mapping service functions.
-
-After feature platform devices creation, matched platform drivers will be loaded
-automatically to handle different functionalities. Please refer to next sections
-for detailed information on functional units which have been already implemented
-under this DFL framework.
-
-
-Partial Reconfiguration
-=======================
-As mentioned above, accelerators can be reconfigured through partial
-reconfiguration of a PR bitstream file. The PR bitstream file must have been
-generated for the exact static FPGA region and targeted reconfigurable region
-(port) of the FPGA, otherwise, the reconfiguration operation will fail and
-possibly cause system instability. This compatibility can be checked by
-comparing the compatibility ID noted in the header of PR bitstream file against
-the compat_id exposed by the target FPGA region. This check is usually done by
-userspace before calling the reconfiguration IOCTL.
-
-
-Device enumeration
-==================
-This section introduces how applications enumerate the fpga device from
-the sysfs hierarchy under /sys/class/fpga_region.
-
-In the example below, two DFL based FPGA devices are installed in the host. Each
-fpga device has one FME and two ports (AFUs).
-
-FPGA regions are created under /sys/class/fpga_region/
-
-	/sys/class/fpga_region/region0
-	/sys/class/fpga_region/region1
-	/sys/class/fpga_region/region2
-	...
-
-Application needs to search each regionX folder, if feature device is found,
-(e.g. "dfl-port.n" or "dfl-fme.m" is found), then it's the base
-fpga region which represents the FPGA device.
-
-Each base region has one FME and two ports (AFUs) as child devices:
-
-	/sys/class/fpga_region/region0/dfl-fme.0
-	/sys/class/fpga_region/region0/dfl-port.0
-	/sys/class/fpga_region/region0/dfl-port.1
-	...
-
-	/sys/class/fpga_region/region3/dfl-fme.1
-	/sys/class/fpga_region/region3/dfl-port.2
-	/sys/class/fpga_region/region3/dfl-port.3
-	...
-
-In general, the FME/AFU sysfs interfaces are named as follows:
-
-	/sys/class/fpga_region/<regionX>/<dfl-fme.n>/
-	/sys/class/fpga_region/<regionX>/<dfl-port.m>/
-
-with 'n' consecutively numbering all FMEs and 'm' consecutively numbering all
-ports.
-
-The device nodes used for ioctl() or mmap() can be referenced through:
-
-	/sys/class/fpga_region/<regionX>/<dfl-fme.n>/dev
-	/sys/class/fpga_region/<regionX>/<dfl-port.n>/dev
-
-
-Add new FIUs support
-====================
-It's possible that developers made some new function blocks (FIUs) under this
-DFL framework, then new platform device driver needs to be developed for the
-new feature dev (FIU) following the same way as existing feature dev drivers
-(e.g. FME and Port/AFU platform device driver). Besides that, it requires
-modification on DFL framework enumeration code too, for new FIU type detection
-and related platform devices creation.
-
-
-Add new private features support
-================================
-In some cases, we may need to add some new private features to existing FIUs
-(e.g. FME or Port). Developers don't need to touch enumeration code in DFL
-framework, as each private feature will be parsed automatically and related
-mmio resources can be found under FIU platform device created by DFL framework.
-Developer only needs to provide a sub feature driver with matched feature id.
-FME Partial Reconfiguration Sub Feature driver (see drivers/fpga/dfl-fme-pr.c)
-could be a reference.
-
-
-Open discussion
-===============
-FME driver exports one ioctl (DFL_FPGA_FME_PORT_PR) for partial reconfiguration
-to user now. In the future, if unified user interfaces for reconfiguration are
-added, FME driver should switch to them from ioctl interface.
diff --git a/Documentation/fpga/index.rst b/Documentation/fpga/index.rst
new file mode 100644
index 000000000000..2c87d1ea084f
--- /dev/null
+++ b/Documentation/fpga/index.rst
@@ -0,0 +1,17 @@
+:orphan:
+
+====
+fpga
+====
+
+.. toctree::
+    :maxdepth: 1
+
+    dfl
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/MAINTAINERS b/MAINTAINERS
index 314545af6f45..ac88ed99fca5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6251,7 +6251,7 @@ FPGA DFL DRIVERS
 M:	Wu Hao <hao.wu@intel.com>
 L:	linux-fpga@vger.kernel.org
 S:	Maintained
-F:	Documentation/fpga/dfl.txt
+F:	Documentation/fpga/dfl.rst
 F:	include/uapi/linux/fpga-dfl.h
 F:	drivers/fpga/dfl*
 
-- 
cgit v1.2.3-59-g8ed1b


From d7b461c5e82fc5f5e4261f3b0228ecda58eb9f1a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:47 -0300
Subject: docs: ide: convert docs to ReST and rename to *.rst

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt |   2 +-
 Documentation/cdrom/ide-cd.rst                  |  18 +-
 Documentation/ide/changelogs.rst                |  17 ++
 Documentation/ide/ide-tape.rst                  |  68 ++++++
 Documentation/ide/ide-tape.txt                  |  65 ------
 Documentation/ide/ide.rst                       | 265 ++++++++++++++++++++++++
 Documentation/ide/ide.txt                       | 256 -----------------------
 Documentation/ide/index.rst                     |  21 ++
 Documentation/ide/warm-plug-howto.rst           |  18 ++
 Documentation/ide/warm-plug-howto.txt           |  18 --
 arch/m68k/q40/README                            |   2 +-
 drivers/ide/Kconfig                             |  20 +-
 12 files changed, 410 insertions(+), 360 deletions(-)
 create mode 100644 Documentation/ide/changelogs.rst
 create mode 100644 Documentation/ide/ide-tape.rst
 delete mode 100644 Documentation/ide/ide-tape.txt
 create mode 100644 Documentation/ide/ide.rst
 delete mode 100644 Documentation/ide/ide.txt
 create mode 100644 Documentation/ide/index.rst
 create mode 100644 Documentation/ide/warm-plug-howto.rst
 delete mode 100644 Documentation/ide/warm-plug-howto.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 83d6560f10f0..81c168b25b20 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1504,7 +1504,7 @@
 			Format: =0.0 to prevent dma on hda, =0.1 hdb =1.0 hdc
 			.vlb_clock .pci_clock .noflush .nohpa .noprobe .nowerr
 			.cdrom .chs .ignore_cable are additional options
-			See Documentation/ide/ide.txt.
+			See Documentation/ide/ide.rst.
 
 	ide-generic.probe-mask= [HW] (E)IDE subsystem
 			Format: <int>
diff --git a/Documentation/cdrom/ide-cd.rst b/Documentation/cdrom/ide-cd.rst
index dadc94ef6b6c..bdccb74fc92d 100644
--- a/Documentation/cdrom/ide-cd.rst
+++ b/Documentation/cdrom/ide-cd.rst
@@ -47,7 +47,7 @@ This driver provides the following features:
 ---------------
 
 0. The ide-cd relies on the ide disk driver.  See
-   Documentation/ide/ide.txt for up-to-date information on the ide
+   Documentation/ide/ide.rst for up-to-date information on the ide
    driver.
 
 1. Make sure that the ide and ide-cd drivers are compiled into the
@@ -62,7 +62,7 @@ This driver provides the following features:
 
    Depending on what type of IDE interface you have, you may need to
    specify additional configuration options.  See
-   Documentation/ide/ide.txt.
+   Documentation/ide/ide.rst.
 
 2. You should also ensure that the iso9660 filesystem is either
    compiled into the kernel or available as a loadable module.  You
@@ -82,7 +82,7 @@ This driver provides the following features:
    on the primary IDE interface are called `hda` and `hdb`,
    respectively.  The drives on the secondary interface are called
    `hdc` and `hdd`.  (Interfaces at other locations get other letters
-   in the third position; see Documentation/ide/ide.txt.)
+   in the third position; see Documentation/ide/ide.rst.)
 
    If you want your CDROM drive to be found automatically by the
    driver, you should make sure your IDE interface uses either the
@@ -91,7 +91,7 @@ This driver provides the following features:
    be jumpered as `master`.  (If for some reason you cannot configure
    your system in this manner, you can probably still use the driver.
    You may have to pass extra configuration information to the kernel
-   when you boot, however.  See Documentation/ide/ide.txt for more
+   when you boot, however.  See Documentation/ide/ide.rst for more
    information.)
 
 4. Boot the system.  If the drive is recognized, you should see a
@@ -163,7 +163,7 @@ to change.  If the slot number is -1, the drive is unloaded.
 This section discusses some common problems encountered when trying to
 use the driver, and some possible solutions.  Note that if you are
 experiencing problems, you should probably also review
-Documentation/ide/ide.txt for current information about the underlying
+Documentation/ide/ide.rst for current information about the underlying
 IDE support code.  Some of these items apply only to earlier versions
 of the driver, but are mentioned here for completeness.
 
@@ -173,7 +173,7 @@ from the driver.
 a. Drive is not detected during booting.
 
    - Review the configuration instructions above and in
-     Documentation/ide/ide.txt, and check how your hardware is
+     Documentation/ide/ide.rst, and check how your hardware is
      configured.
 
    - If your drive is the only device on an IDE interface, it should
@@ -181,7 +181,7 @@ a. Drive is not detected during booting.
 
    - If your IDE interface is not at the standard addresses of 0x170
      or 0x1f0, you'll need to explicitly inform the driver using a
-     lilo option.  See Documentation/ide/ide.txt.  (This feature was
+     lilo option.  See Documentation/ide/ide.rst.  (This feature was
      added around kernel version 1.3.30.)
 
    - If the autoprobing is not finding your drive, you can tell the
@@ -207,7 +207,7 @@ a. Drive is not detected during booting.
      Support for some interfaces needing extra initialization is
      provided in later 1.3.x kernels.  You may need to turn on
      additional kernel configuration options to get them to work;
-     see Documentation/ide/ide.txt.
+     see Documentation/ide/ide.rst.
 
      Even if support is not available for your interface, you may be
      able to get it to work with the following procedure.  First boot
@@ -261,7 +261,7 @@ c. System hangups.
     be worked around by specifying the `serialize` option when
     booting.  Recent kernels should be able to detect the need for
     this automatically in most cases, but the detection is not
-    foolproof.  See Documentation/ide/ide.txt for more information
+    foolproof.  See Documentation/ide/ide.rst for more information
     about the `serialize` option and the CMD640B.
 
   - Note that many MS-DOS CDROM drivers will work with such buggy
diff --git a/Documentation/ide/changelogs.rst b/Documentation/ide/changelogs.rst
new file mode 100644
index 000000000000..fdf9d0fb8027
--- /dev/null
+++ b/Documentation/ide/changelogs.rst
@@ -0,0 +1,17 @@
+Changelog for ide cd
+--------------------
+
+ .. include:: ChangeLog.ide-cd.1994-2004
+    :literal:
+
+Changelog for ide floppy
+------------------------
+
+ .. include:: ChangeLog.ide-floppy.1996-2002
+    :literal:
+
+Changelog for ide tape
+----------------------
+
+ .. include:: ChangeLog.ide-tape.1995-2002
+    :literal:
diff --git a/Documentation/ide/ide-tape.rst b/Documentation/ide/ide-tape.rst
new file mode 100644
index 000000000000..3e061d9c0e38
--- /dev/null
+++ b/Documentation/ide/ide-tape.rst
@@ -0,0 +1,68 @@
+===============================
+IDE ATAPI streaming tape driver
+===============================
+
+This driver is a part of the Linux ide driver.
+
+The driver, in co-operation with ide.c, basically traverses the
+request-list for the block device interface. The character device
+interface, on the other hand, creates new requests, adds them
+to the request-list of the block device, and waits for their completion.
+
+The block device major and minor numbers are determined from the
+tape's relative position in the ide interfaces, as explained in ide.c.
+
+The character device interface consists of the following devices::
+
+  ht0		major 37, minor 0	first  IDE tape, rewind on close.
+  ht1		major 37, minor 1	second IDE tape, rewind on close.
+  ...
+  nht0		major 37, minor 128	first  IDE tape, no rewind on close.
+  nht1		major 37, minor 129	second IDE tape, no rewind on close.
+  ...
+
+The general magnetic tape commands compatible interface, as defined by
+include/linux/mtio.h, is accessible through the character device.
+
+General ide driver configuration options, such as the interrupt-unmask
+flag, can be configured by issuing an ioctl to the block device interface,
+as any other ide device.
+
+Our own ide-tape ioctl's can be issued to either the block device or
+the character device interface.
+
+Maximal throughput with minimal bus load will usually be achieved in the
+following scenario:
+
+     1.	ide-tape is operating in the pipelined operation mode.
+     2.	No buffering is performed by the user backup program.
+
+Testing was done with a 2 GB CONNER CTMA 4000 IDE ATAPI Streaming Tape Drive.
+
+Here are some words from the first releases of hd.c, which are quoted
+in ide.c and apply here as well:
+
+* Special care is recommended.  Have Fun!
+
+Possible improvements
+=====================
+
+1. Support for the ATAPI overlap protocol.
+
+In order to maximize bus throughput, we currently use the DSC
+overlap method which enables ide.c to service requests from the
+other device while the tape is busy executing a command. The
+DSC overlap method involves polling the tape's status register
+for the DSC bit, and servicing the other device while the tape
+isn't ready.
+
+In the current QIC development standard (December 1995),
+it is recommended that new tape drives will *in addition*
+implement the ATAPI overlap protocol, which is used for the
+same purpose - efficient use of the IDE bus, but is interrupt
+driven and thus has much less CPU overhead.
+
+ATAPI overlap is likely to be supported in most new ATAPI
+devices, including new ATAPI cdroms, and thus provides us
+a method by which we can achieve higher throughput when
+sharing a (fast) ATA-2 disk with any (slow) new ATAPI device.
diff --git a/Documentation/ide/ide-tape.txt b/Documentation/ide/ide-tape.txt
deleted file mode 100644
index 3f348a0b21d8..000000000000
--- a/Documentation/ide/ide-tape.txt
+++ /dev/null
@@ -1,65 +0,0 @@
-IDE ATAPI streaming tape driver.
-
-This driver is a part of the Linux ide driver.
-
-The driver, in co-operation with ide.c, basically traverses the
-request-list for the block device interface. The character device
-interface, on the other hand, creates new requests, adds them
-to the request-list of the block device, and waits for their completion.
-
-The block device major and minor numbers are determined from the
-tape's relative position in the ide interfaces, as explained in ide.c.
-
-The character device interface consists of the following devices:
-
-ht0		major 37, minor 0	first  IDE tape, rewind on close.
-ht1		major 37, minor 1	second IDE tape, rewind on close.
-...
-nht0		major 37, minor 128	first  IDE tape, no rewind on close.
-nht1		major 37, minor 129	second IDE tape, no rewind on close.
-...
-
-The general magnetic tape commands compatible interface, as defined by
-include/linux/mtio.h, is accessible through the character device.
-
-General ide driver configuration options, such as the interrupt-unmask
-flag, can be configured by issuing an ioctl to the block device interface,
-as any other ide device.
-
-Our own ide-tape ioctl's can be issued to either the block device or
-the character device interface.
-
-Maximal throughput with minimal bus load will usually be achieved in the
-following scenario:
-
-     1.	ide-tape is operating in the pipelined operation mode.
-     2.	No buffering is performed by the user backup program.
-
-Testing was done with a 2 GB CONNER CTMA 4000 IDE ATAPI Streaming Tape Drive.
-
-Here are some words from the first releases of hd.c, which are quoted
-in ide.c and apply here as well:
-
-| Special care is recommended.  Have Fun!
-
-Possible improvements:
-
-1. Support for the ATAPI overlap protocol.
-
-In order to maximize bus throughput, we currently use the DSC
-overlap method which enables ide.c to service requests from the
-other device while the tape is busy executing a command. The
-DSC overlap method involves polling the tape's status register
-for the DSC bit, and servicing the other device while the tape
-isn't ready.
-
-In the current QIC development standard (December 1995),
-it is recommended that new tape drives will *in addition*
-implement the ATAPI overlap protocol, which is used for the
-same purpose - efficient use of the IDE bus, but is interrupt
-driven and thus has much less CPU overhead.
-
-ATAPI overlap is likely to be supported in most new ATAPI
-devices, including new ATAPI cdroms, and thus provides us
-a method by which we can achieve higher throughput when
-sharing a (fast) ATA-2 disk with any (slow) new ATAPI device.
diff --git a/Documentation/ide/ide.rst b/Documentation/ide/ide.rst
new file mode 100644
index 000000000000..88bdcba92f7d
--- /dev/null
+++ b/Documentation/ide/ide.rst
@@ -0,0 +1,265 @@
+============================================
+Information regarding the Enhanced IDE drive
+============================================
+
+   The hdparm utility can be used to control various IDE features on a
+   running system. It is packaged separately.  Please look for it on popular
+   linux FTP sites.
+
+-------------------------------------------------------------------------------
+
+.. important::
+
+   BUGGY IDE CHIPSETS CAN CORRUPT DATA!!
+
+    PCI versions of the CMD640 and RZ1000 interfaces are now detected
+    automatically at startup when PCI BIOS support is configured.
+
+    Linux disables the "prefetch" ("readahead") mode of the RZ1000
+    to prevent possible data corruption due to hardware design flaws.
+
+    For the CMD640, linux disables "IRQ unmasking" (hdparm -u1) on any
+    drive for which the "prefetch" mode of the CMD640 is turned on.
+    If "prefetch" is disabled (hdparm -p8), then "IRQ unmasking" can be
+    used again.
+
+    For the CMD640, linux disables "32bit I/O" (hdparm -c1) on any drive
+    for which the "prefetch" mode of the CMD640 is turned off.
+    If "prefetch" is enabled (hdparm -p9), then "32bit I/O" can be
+    used again.
+
+    The CMD640 is also used on some Vesa Local Bus (VLB) cards, and is *NOT*
+    automatically detected by Linux.  For safe, reliable operation with such
+    interfaces, one *MUST* use the "cmd640.probe_vlb" kernel option.
+
+    Use of the "serialize" option is no longer necessary.
+
+-------------------------------------------------------------------------------
+
+Common pitfalls
+===============
+
+- 40-conductor IDE cables are capable of transferring data in DMA modes up to
+  udma2, but no faster.
+
+- If possible, devices should be attached to separate channels if they are
+  available, typically the disk on the first and the CD-ROM on the second.
+
+- If you mix devices on the same cable, please consider using similar devices
+  in respect of the data transfer mode they support.
+
+- Even better, try to stick to the same vendor and device type on the same
+  cable.
+
+This is the multiple IDE interface driver, as evolved from hd.c
+===============================================================
+
+It supports up to 9 IDE interfaces per default, on one or more IRQs (usually
+14 & 15).  There can be up to two drives per interface, as per the ATA-6 spec.::
+
+  Primary:    ide0, port 0x1f0; major=3;  hda is minor=0; hdb is minor=64
+  Secondary:  ide1, port 0x170; major=22; hdc is minor=0; hdd is minor=64
+  Tertiary:   ide2, port 0x1e8; major=33; hde is minor=0; hdf is minor=64
+  Quaternary: ide3, port 0x168; major=34; hdg is minor=0; hdh is minor=64
+  fifth..     ide4, usually PCI, probed
+  sixth..     ide5, usually PCI, probed
+
+To access devices on interfaces > ide0, please make sure that device files
+for them are present in /dev.  If not, please create such entries by using
+/dev/MAKEDEV.
+
+This driver automatically probes for most IDE interfaces (including all PCI
+ones), for the drives/geometries attached to those interfaces, and for the IRQ
+lines being used by the interfaces (normally 14, 15 for ide0/ide1).
+
+Any number of interfaces may share a single IRQ if necessary, at a slight
+performance penalty, whether on separate cards or a single VLB card.
+The IDE driver automatically detects and handles this.  However, this may
+or may not be harmful to your hardware: two or more cards driving the same IRQ
+can potentially burn each other's bus driver, though in practice this
+seldom occurs.  Be careful, and if in doubt, don't do it!
+
+Drives are normally found by auto-probing and/or examining the CMOS/BIOS data.
+For really weird situations, the apparent (fdisk) geometry can also be specified
+on the kernel "command line" using LILO.  The format of such lines is::
+
+	ide_core.chs=[interface_number.device_number]:cyls,heads,sects
+
+or::
+
+	ide_core.cdrom=[interface_number.device_number]
+
+For example::
+
+	ide_core.chs=1.0:1050,32,64  ide_core.cdrom=1.1
+
+The results of successful auto-probing may override the physical geometry/irq
+specified, though the "original" geometry may be retained as the "logical"
+geometry for partitioning purposes (fdisk).
+
+If the auto-probing during boot time confuses a drive (i.e. the drive works
+with hd.c but not with ide.c), then a command line option may be specified
+for each drive for which you'd like the drive to skip the hardware
+probe/identification sequence.  For example::
+
+	ide_core.noprobe=0.1
+
+or::
+
+	ide_core.chs=1.0:768,16,32
+	ide_core.noprobe=1.0
+
+Note that when only one IDE device is attached to an interface, it should be
+jumpered as "single" or "master", *not* "slave".  Many folks have had
+"trouble" with cdroms because of this requirement, so the driver now probes
+for both units, though success is more likely when the drive is jumpered
+correctly.
+
+Courtesy of Scott Snyder and others, the driver supports ATAPI cdrom drives
+such as the NEC-260 and the new MITSUMI triple/quad speed drives.
+Such drives will be identified at boot time, just like a hard disk.
+
+If for some reason your cdrom drive is *not* found at boot time, you can force
+the probe to look harder by supplying a kernel command line parameter
+via LILO, such as::
+
+	ide_core.cdrom=1.0	/* "master" on second interface (hdc) */
+
+or::
+
+	ide_core.cdrom=1.1	/* "slave" on second interface (hdd) */
+
+For example, a GW2000 system might have a hard drive on the primary
+interface (/dev/hda) and an IDE cdrom drive on the secondary interface
+(/dev/hdc).  To mount a CD in the cdrom drive, one would use something like::
+
+	ln -sf /dev/hdc /dev/cdrom
+	mkdir /mnt/cdrom
+	mount /dev/cdrom /mnt/cdrom -t iso9660 -o ro
+
+If, after doing all of the above, mount doesn't work and you see
+errors from the driver (with dmesg) complaining about `status=0xff`,
+this means that the hardware is not responding to the driver's attempts
+to read it.  One of the following is probably the problem:
+
+  - Your hardware is broken.
+
+  - You are using the wrong address for the device, or you have the
+    drive jumpered wrong.  Review the configuration instructions above.
+
+  - Your IDE controller requires some nonstandard initialization sequence
+    before it will work properly.  If this is the case, there will often
+    be a separate MS-DOS driver just for the controller.  IDE interfaces
+    on sound cards usually fall into this category.  Such configurations
+    can often be made to work by first booting MS-DOS, loading the
+    appropriate drivers, and then warm-booting linux (without powering
+    off).  This can be automated using loadlin in the MS-DOS autoexec.
+
+If you always get timeout errors, interrupts from the drive are probably
+not making it to the host.  Check how you have the hardware jumpered
+and make sure it matches what the driver expects (see the configuration
+instructions above).  If you have a PCI system, also check the BIOS
+setup; I've had one report of a system which was shipped with IRQ 15
+disabled by the BIOS.
+
+The kernel is able to execute binaries directly off of the cdrom,
+provided it is mounted with the default block size of 1024 (as above).
+
+Please pass on any feedback on any of this stuff to the maintainer,
+whose address can be found in linux/MAINTAINERS.
+
+The IDE driver is modularized.  The high level disk/CD-ROM/tape/floppy
+drivers can always be compiled as loadable modules, the chipset drivers
+can only be compiled into the kernel, and the core code (ide.c) can be
+compiled as a loadable module provided no chipset support is needed.
+
+When using ide.c as a module in combination with kmod, add::
+
+	alias block-major-3 ide-probe
+
+to a configuration file in /etc/modprobe.d/.
+
+When ide.c is used as a module, you can pass command line parameters to the
+driver using the "options=" keyword to insmod, while replacing any ',' with
+';'.
+
+
+Summary of ide driver parameters for kernel command line
+========================================================
+
+For legacy IDE VLB host drivers (ali14xx/dtc2278/ht6560b/qd65xx/umc8672)
+you need to explicitly enable probing by using "probe" kernel parameter,
+e.g. to enable probing for ALI M14xx chipsets (ali14xx host driver) use:
+
+* "ali14xx.probe" boot option when ali14xx driver is built into the kernel
+
+* "probe" module parameter when ali14xx driver is compiled as module
+  ("modprobe ali14xx probe")
+
+Also for legacy CMD640 host driver (cmd640) you need to use "probe_vlb"
+kernel parameter to enable probing for VLB version of the chipset (PCI ones
+are detected automatically).
+
+You also need to use "probe" kernel parameter for ide-4drives driver
+(support for IDE generic chipset with four drives on one port).
+
+To enable support for IDE doublers on Amiga use "doubler" kernel parameter
+for gayle host driver (i.e. "gayle.doubler" if the driver is built-in).
+
+To force ignoring cable detection (this should be needed only if you're using
+a short 40-wire cable which cannot be automatically detected - if this is not
+the case please report it as a bug instead) use "ignore_cable" kernel parameter:
+
+* "ide_core.ignore_cable=[interface_number]" boot option if IDE is built-in
+  (i.e. "ide_core.ignore_cable=1" to force ignoring cable for "ide1")
+
+* "ignore_cable=[interface_number]" module parameter (for ide_core module)
+  if IDE is compiled as module
+
+Other kernel parameters for ide_core are:
+
+* "nodma=[interface_number.device_number]" to disallow DMA for a device
+
+* "noflush=[interface_number.device_number]" to disable flush requests
+
+* "nohpa=[interface_number.device_number]" to disable Host Protected Area
+
+* "noprobe=[interface_number.device_number]" to skip probing
+
+* "nowerr=[interface_number.device_number]" to ignore the WRERR_STAT bit
+
+* "cdrom=[interface_number.device_number]" to force device as a CD-ROM
+
+* "chs=[interface_number.device_number]" to force device as a disk (using CHS)
+
+
+Some Terminology
+================
+
+IDE
+  Integrated Drive Electronics, meaning that each drive has a built-in
+  controller, which is why an "IDE interface card" is not a "controller card".
+
+ATA
+  AT (the old IBM 286 computer) Attachment Interface, a draft American
+  National Standard for connecting hard drives to PCs.  This is the official
+  name for "IDE".
+
+  The latest standards define some enhancements, known as the ATA-6 spec,
+  which grew out of vendor-specific "Enhanced IDE" (EIDE) implementations.
+
+ATAPI
+  ATA Packet Interface, a new protocol for controlling the drives,
+  similar to SCSI protocols, created at the same time as the ATA2 standard.
+  ATAPI is currently used for controlling CDROM, TAPE and FLOPPY (ZIP or
+  LS120/240) devices, removable R/W cartridges, and for high capacity hard disk
+  drives.
+
+mlord@pobox.com
+
+
+Wed Apr 17 22:52:44 CEST 2002 edited by Marcin Dalecki, the current
+maintainer.
+
+Wed Aug 20 22:31:29 CEST 2003 updated ide boot options to current ide.c
+comments at 2.6.0-test4 time. Maciej Soltysiak <solt@dns.toxicfilms.tv>
diff --git a/Documentation/ide/ide.txt b/Documentation/ide/ide.txt
deleted file mode 100644
index 7aca987c23d9..000000000000
--- a/Documentation/ide/ide.txt
+++ /dev/null
@@ -1,256 +0,0 @@
-
-	Information regarding the Enhanced IDE drive in Linux 2.6
-
-==============================================================================
-
-
-   The hdparm utility can be used to control various IDE features on a
-   running system. It is packaged separately.  Please Look for it on popular
-   linux FTP sites.
-
-
-
-***  IMPORTANT NOTICES:  BUGGY IDE CHIPSETS CAN CORRUPT DATA!!
-***  =================
-***  PCI versions of the CMD640 and RZ1000 interfaces are now detected
-***  automatically at startup when PCI BIOS support is configured.
-***
-***  Linux disables the "prefetch" ("readahead") mode of the RZ1000
-***  to prevent data corruption possible due to hardware design flaws.
-***
-***  For the CMD640, linux disables "IRQ unmasking" (hdparm -u1) on any
-***  drive for which the "prefetch" mode of the CMD640 is turned on.
-***  If "prefetch" is disabled (hdparm -p8), then "IRQ unmasking" can be
-***  used again.
-***
-***  For the CMD640, linux disables "32bit I/O" (hdparm -c1) on any drive
-***  for which the "prefetch" mode of the CMD640 is turned off.
-***  If "prefetch" is enabled (hdparm -p9), then "32bit I/O" can be
-***  used again.
-***
-***  The CMD640 is also used on some Vesa Local Bus (VLB) cards, and is *NOT*
-***  automatically detected by Linux.  For safe, reliable operation with such
-***  interfaces, one *MUST* use the "cmd640.probe_vlb" kernel option.
-***
-***  Use of the "serialize" option is no longer necessary.
-
-================================================================================
-Common pitfalls:
-
-- 40-conductor IDE cables are capable of transferring data in DMA modes up to
-  udma2, but no faster.
-
-- If possible devices should be attached to separate channels if they are
-  available. Typically the disk on the first and CD-ROM on the second.
-
-- If you mix devices on the same cable, please consider using similar devices
-  in respect of the data transfer mode they support.
-
-- Even better try to stick to the same vendor and device type on the same
-  cable.
-
-================================================================================
-
-This is the multiple IDE interface driver, as evolved from hd.c.
-
-It supports up to 9 IDE interfaces per default, on one or more IRQs (usually
-14 & 15).  There can be up to two drives per interface, as per the ATA-6 spec.
-
-Primary:    ide0, port 0x1f0; major=3;  hda is minor=0; hdb is minor=64
-Secondary:  ide1, port 0x170; major=22; hdc is minor=0; hdd is minor=64
-Tertiary:   ide2, port 0x1e8; major=33; hde is minor=0; hdf is minor=64
-Quaternary: ide3, port 0x168; major=34; hdg is minor=0; hdh is minor=64
-fifth..     ide4, usually PCI, probed
-sixth..     ide5, usually PCI, probed
-
-To access devices on interfaces > ide0, device entries please make sure that
-device files for them are present in /dev.  If not, please create such
-entries, by using /dev/MAKEDEV.
-
-This driver automatically probes for most IDE interfaces (including all PCI
-ones), for the drives/geometries attached to those interfaces, and for the IRQ
-lines being used by the interfaces (normally 14, 15 for ide0/ide1).
-
-Any number of interfaces may share a single IRQ if necessary, at a slight
-performance penalty, whether on separate cards or a single VLB card.
-The IDE driver automatically detects and handles this.  However, this may
-or may not be harmful to your hardware.. two or more cards driving the same IRQ
-can potentially burn each other's bus driver, though in practice this
-seldom occurs.  Be careful, and if in doubt, don't do it!
-
-Drives are normally found by auto-probing and/or examining the CMOS/BIOS data.
-For really weird situations, the apparent (fdisk) geometry can also be specified
-on the kernel "command line" using LILO.  The format of such lines is:
-
-	ide_core.chs=[interface_number.device_number]:cyls,heads,sects
-or	ide_core.cdrom=[interface_number.device_number]
-
-For example:
-
-	ide_core.chs=1.0:1050,32,64  ide_core.cdrom=1.1
-
-The results of successful auto-probing may override the physical geometry/irq
-specified, though the "original" geometry may be retained as the "logical"
-geometry for partitioning purposes (fdisk).
-
-If the auto-probing during boot time confuses a drive (ie. the drive works
-with hd.c but not with ide.c), then an command line option may be specified
-for each drive for which you'd like the drive to skip the hardware
-probe/identification sequence.  For example:
-
-	ide_core.noprobe=0.1
-or
-	ide_core.chs=1.0:768,16,32
-	ide_core.noprobe=1.0
-
-Note that when only one IDE device is attached to an interface, it should be
-jumpered as "single" or "master", *not* "slave".  Many folks have had
-"trouble" with cdroms because of this requirement, so the driver now probes
-for both units, though success is more likely when the drive is jumpered
-correctly.
-
-Courtesy of Scott Snyder and others, the driver supports ATAPI cdrom drives
-such as the NEC-260 and the new MITSUMI triple/quad speed drives.
-Such drives will be identified at boot time, just like a hard disk.
-
-If for some reason your cdrom drive is *not* found at boot time, you can force
-the probe to look harder by supplying a kernel command line parameter
-via LILO, such as:
-
-	ide_core.cdrom=1.0	/* "master" on second interface (hdc) */
-or
-	ide_core.cdrom=1.1	/* "slave" on second interface (hdd) */
-
-For example, a GW2000 system might have a hard drive on the primary
-interface (/dev/hda) and an IDE cdrom drive on the secondary interface
-(/dev/hdc).  To mount a CD in the cdrom drive, one would use something like:
-
-	ln -sf /dev/hdc /dev/cdrom
-	mkdir /mnt/cdrom
-	mount /dev/cdrom /mnt/cdrom -t iso9660 -o ro
-
-If, after doing all of the above, mount doesn't work and you see
-errors from the driver (with dmesg) complaining about `status=0xff',
-this means that the hardware is not responding to the driver's attempts
-to read it.  One of the following is probably the problem:
-
-  - Your hardware is broken.
-
-  - You are using the wrong address for the device, or you have the
-    drive jumpered wrong.  Review the configuration instructions above.
-
-  - Your IDE controller requires some nonstandard initialization sequence
-    before it will work properly.  If this is the case, there will often
-    be a separate MS-DOS driver just for the controller.  IDE interfaces
-    on sound cards usually fall into this category.  Such configurations
-    can often be made to work by first booting MS-DOS, loading the
-    appropriate drivers, and then warm-booting linux (without powering
-    off).  This can be automated using loadlin in the MS-DOS autoexec.
-
-If you always get timeout errors, interrupts from the drive are probably
-not making it to the host.  Check how you have the hardware jumpered
-and make sure it matches what the driver expects (see the configuration
-instructions above).  If you have a PCI system, also check the BIOS
-setup; I've had one report of a system which was shipped with IRQ 15
-disabled by the BIOS.
-
-The kernel is able to execute binaries directly off of the cdrom,
-provided it is mounted with the default block size of 1024 (as above).
-
-Please pass on any feedback on any of this stuff to the maintainer,
-whose address can be found in linux/MAINTAINERS.
-
-The IDE driver is modularized.  The high level disk/CD-ROM/tape/floppy
-drivers can always be compiled as loadable modules, the chipset drivers
-can only be compiled into the kernel, and the core code (ide.c) can be
-compiled as a loadable module provided no chipset support is needed.
-
-When using ide.c as a module in combination with kmod, add:
-
-	alias block-major-3 ide-probe
-
-to a configuration file in /etc/modprobe.d/.
-
-When ide.c is used as a module, you can pass command line parameters to the
-driver using the "options=" keyword to insmod, while replacing any ',' with
-';'.
-
-
-================================================================================
-
-Summary of ide driver parameters for kernel command line
---------------------------------------------------------
-
-For legacy IDE VLB host drivers (ali14xx/dtc2278/ht6560b/qd65xx/umc8672)
-you need to explicitly enable probing by using "probe" kernel parameter,
-i.e. to enable probing for ALI M14xx chipsets (ali14xx host driver) use:
-
-* "ali14xx.probe" boot option when ali14xx driver is built-in the kernel
-
-* "probe" module parameter when ali14xx driver is compiled as module
-  ("modprobe ali14xx probe")
-
-Also for legacy CMD640 host driver (cmd640) you need to use "probe_vlb"
-kernel paremeter to enable probing for VLB version of the chipset (PCI ones
-are detected automatically).
-
-You also need to use "probe" kernel parameter for ide-4drives driver
-(support for IDE generic chipset with four drives on one port).
-
-To enable support for IDE doublers on Amiga use "doubler" kernel parameter
-for gayle host driver (i.e. "gayle.doubler" if the driver is built-in).
-
-To force ignoring cable detection (this should be needed only if you're using
-short 40-wires cable which cannot be automatically detected - if this is not
-a case please report it as a bug instead) use "ignore_cable" kernel parameter:
-
-* "ide_core.ignore_cable=[interface_number]" boot option if IDE is built-in
-  (i.e. "ide_core.ignore_cable=1" to force ignoring cable for "ide1")
-
-* "ignore_cable=[interface_number]" module parameter (for ide_core module)
-  if IDE is compiled as module
-
-Other kernel parameters for ide_core are:
-
-* "nodma=[interface_number.device_number]" to disallow DMA for a device
-
-* "noflush=[interface_number.device_number]" to disable flush requests
-
-* "nohpa=[interface_number.device_number]" to disable Host Protected Area
-
-* "noprobe=[interface_number.device_number]" to skip probing
-
-* "nowerr=[interface_number.device_number]" to ignore the WRERR_STAT bit
-
-* "cdrom=[interface_number.device_number]" to force device as a CD-ROM
-
-* "chs=[interface_number.device_number]" to force device as a disk (using CHS)
-
-================================================================================
-
-Some Terminology
-----------------
-IDE = Integrated Drive Electronics, meaning that each drive has a built-in
-controller, which is why an "IDE interface card" is not a "controller card".
-
-ATA = AT (the old IBM 286 computer) Attachment Interface, a draft American
-National Standard for connecting hard drives to PCs.  This is the official
-name for "IDE".
-
-The latest standards define some enhancements, known as the ATA-6 spec,
-which grew out of vendor-specific "Enhanced IDE" (EIDE) implementations.
-
-ATAPI = ATA Packet Interface, a new protocol for controlling the drives,
-similar to SCSI protocols, created at the same time as the ATA2 standard.
-ATAPI is currently used for controlling CDROM, TAPE and FLOPPY (ZIP or
-LS120/240) devices, removable R/W cartridges, and for high capacity hard disk
-drives.
-
-mlord@pobox.com
---
-
-Wed Apr 17 22:52:44 CEST 2002 edited by Marcin Dalecki, the current
-maintainer.
-
-Wed Aug 20 22:31:29 CEST 2003 updated ide boot options to current ide.c
-comments at 2.6.0-test4 time. Maciej Soltysiak <solt@dns.toxicfilms.tv>
diff --git a/Documentation/ide/index.rst b/Documentation/ide/index.rst
new file mode 100644
index 000000000000..45bc12d3957f
--- /dev/null
+++ b/Documentation/ide/index.rst
@@ -0,0 +1,21 @@
+:orphan:
+
+==================================
+Integrated Drive Electronics (IDE)
+==================================
+
+.. toctree::
+    :maxdepth: 1
+
+    ide
+    ide-tape
+    warm-plug-howto
+
+    changelogs
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/ide/warm-plug-howto.rst b/Documentation/ide/warm-plug-howto.rst
new file mode 100644
index 000000000000..c245242ef2f1
--- /dev/null
+++ b/Documentation/ide/warm-plug-howto.rst
@@ -0,0 +1,18 @@
+===================
+IDE warm-plug HOWTO
+===================
+
+To warm-plug devices on a port 'idex'::
+
+	# echo -n "1" > /sys/class/ide_port/idex/delete_devices
+
+unplug old device(s) and plug new device(s)::
+
+	# echo -n "1" > /sys/class/ide_port/idex/scan
+
+done
+
+NOTE: please make sure that partitions are unmounted and that there are
+no other active references to devices before doing the "delete_devices" step.
+Also, do not attempt the "scan" step on devices currently in use; otherwise
+results may be unpredictable and may lead to data loss.
diff --git a/Documentation/ide/warm-plug-howto.txt b/Documentation/ide/warm-plug-howto.txt
deleted file mode 100644
index 98152bcd515a..000000000000
--- a/Documentation/ide/warm-plug-howto.txt
+++ /dev/null
@@ -1,18 +0,0 @@
-
-IDE warm-plug HOWTO
-===================
-
-To warm-plug devices on a port 'idex':
-
-# echo -n "1" > /sys/class/ide_port/idex/delete_devices
-
-unplug old device(s) and plug new device(s)
-
-# echo -n "1" > /sys/class/ide_port/idex/scan
-
-done
-
-NOTE: please make sure that partitions are unmounted and that there are
-no other active references to devices before doing "delete_devices" step,
-also do not attempt "scan" step on devices currently in use -- otherwise
-results may be unpredictable and lead to data loss if you're unlucky
diff --git a/arch/m68k/q40/README b/arch/m68k/q40/README
index 93f4c4cd3c45..a4991d2d8af6 100644
--- a/arch/m68k/q40/README
+++ b/arch/m68k/q40/README
@@ -31,7 +31,7 @@ drivers used by the Q40, apart from the very obvious (console etc.):
 		char/joystick/*		# most of this should work, not
 				        # in default config.in
 	        block/q40ide.c		# startup for ide
-		      ide*		# see Documentation/ide/ide.txt
+		      ide*		# see Documentation/ide/ide.rst
 		      floppy.c		# normal PC driver, DMA emu in asm/floppy.h
 					# and arch/m68k/kernel/entry.S
 					# see drivers/block/README.fd
diff --git a/drivers/ide/Kconfig b/drivers/ide/Kconfig
index fdd2a62f9d52..9eada392df15 100644
--- a/drivers/ide/Kconfig
+++ b/drivers/ide/Kconfig
@@ -25,13 +25,13 @@ menuconfig IDE
 	  To compile this driver as a module, choose M here: the
 	  module will be called ide-core.
 
-	  For further information, please read <file:Documentation/ide/ide.txt>.
+	  For further information, please read <file:Documentation/ide/ide.rst>.
 
 	  If unsure, say N.
 
 if IDE
 
-comment "Please see Documentation/ide/ide.txt for help/info on IDE drives"
+comment "Please see Documentation/ide/ide.rst for help/info on IDE drives"
 
 config IDE_XFER_MODE
 	bool
@@ -163,7 +163,7 @@ config BLK_DEV_IDETAPE
 	  along with other IDE devices, as "hdb" or "hdc", or something
 	  similar, and will be mapped to a character device such as "ht0"
 	  (check the boot messages with dmesg).  Be sure to consult the
-	  <file:drivers/ide/ide-tape.c> and <file:Documentation/ide/ide.txt>
+	  <file:drivers/ide/ide-tape.c> and <file:Documentation/ide/ide.rst>
 	  files for usage information.
 
 	  To compile this driver as a module, choose M here: the
@@ -251,7 +251,7 @@ config BLK_DEV_CMD640
 
 	  The CMD640 chip is also used on add-in cards by Acculogic, and on
 	  the "CSA-6400E PCI to IDE controller" that some people have. For
-	  details, read <file:Documentation/ide/ide.txt>.
+	  details, read <file:Documentation/ide/ide.rst>.
 
 config BLK_DEV_CMD640_ENHANCED
 	bool "CMD640 enhanced support"
@@ -259,7 +259,7 @@ config BLK_DEV_CMD640_ENHANCED
 	help
 	  This option includes support for setting/autotuning PIO modes and
 	  prefetch on CMD640 IDE interfaces.  For details, read
-	  <file:Documentation/ide/ide.txt>. If you have a CMD640 IDE interface
+	  <file:Documentation/ide/ide.rst>. If you have a CMD640 IDE interface
 	  and your BIOS does not already do this for you, then say Y here.
 	  Otherwise say N.
 
@@ -819,7 +819,7 @@ config BLK_DEV_ALI14XX
 	  boot parameter.  It enables support for the secondary IDE interface
 	  of the ALI M1439/1443/1445/1487/1489 chipsets, and permits faster
 	  I/O speeds to be set as well.
-	  See the files <file:Documentation/ide/ide.txt> and
+	  See the files <file:Documentation/ide/ide.rst> and
 	  <file:drivers/ide/ali14xx.c> for more info.
 
 config BLK_DEV_DTC2278
@@ -830,7 +830,7 @@ config BLK_DEV_DTC2278
 	  This driver is enabled at runtime using the "dtc2278.probe" kernel
 	  boot parameter. It enables support for the secondary IDE interface
 	  of the DTC-2278 card, and permits faster I/O speeds to be set as
-	  well. See the <file:Documentation/ide/ide.txt> and
+	  well. See the <file:Documentation/ide/ide.rst> and
 	  <file:drivers/ide/dtc2278.c> files for more info.
 
 config BLK_DEV_HT6560B
@@ -841,7 +841,7 @@ config BLK_DEV_HT6560B
 	  This driver is enabled at runtime using the "ht6560b.probe" kernel
 	  boot parameter. It enables support for the secondary IDE interface
 	  of the Holtek card, and permits faster I/O speeds to be set as well.
-	  See the <file:Documentation/ide/ide.txt> and
+	  See the <file:Documentation/ide/ide.rst> and
 	  <file:drivers/ide/ht6560b.c> files for more info.
 
 config BLK_DEV_QD65XX
@@ -851,7 +851,7 @@ config BLK_DEV_QD65XX
 	help
 	  This driver is enabled at runtime using the "qd65xx.probe" kernel
 	  boot parameter.  It permits faster I/O speeds to be set.  See the
-	  <file:Documentation/ide/ide.txt> and <file:drivers/ide/qd65xx.c>
+	  <file:Documentation/ide/ide.rst> and <file:drivers/ide/qd65xx.c>
 	  for more info.
 
 config BLK_DEV_UMC8672
@@ -862,7 +862,7 @@ config BLK_DEV_UMC8672
 	  This driver is enabled at runtime using the "umc8672.probe" kernel
 	  boot parameter. It enables support for the secondary IDE interface
 	  of the UMC-8672, and permits faster I/O speeds to be set as well.
-	  See the files <file:Documentation/ide/ide.txt> and
+	  See the files <file:Documentation/ide/ide.rst> and
 	  <file:drivers/ide/umc8672.c> for more info.
 
 endif
-- 
cgit v1.2.3-59-g8ed1b


From cd238effefa28fac177e51dcf5e9d1a8b59c3c6b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:48 -0300
Subject: docs: kbuild: convert docs to ReST and rename to *.rst

The kbuild documentation clearly shows that the documents
there were written at different times: some use markdown,
some use their own peculiar logic to split sections.

Convert everything to ReST without affecting the authors' style
too much, and avoid adding unneeded markup.

The conversion consists of:
  - adding blank lines and indentation in order to identify paragraphs;
  - fixing table markup;
  - adding some list markup;
  - marking literal blocks;
  - adjusting title markup.

In the new index.rst, add :orphan: for as long as it is not linked from
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/README.rst               |    2 +-
 Documentation/kbuild/headers_install.rst           |   51 +
 Documentation/kbuild/headers_install.txt           |   50 -
 Documentation/kbuild/index.rst                     |   27 +
 Documentation/kbuild/issues.rst                    |   11 +
 Documentation/kbuild/kbuild.rst                    |  265 ++++
 Documentation/kbuild/kbuild.txt                    |  248 ----
 Documentation/kbuild/kconfig-language.rst          |  689 +++++++++
 Documentation/kbuild/kconfig-language.txt          |  669 ---------
 Documentation/kbuild/kconfig-macro-language.rst    |  247 ++++
 Documentation/kbuild/kconfig-macro-language.txt    |  242 ----
 Documentation/kbuild/kconfig.rst                   |  300 ++++
 Documentation/kbuild/kconfig.txt                   |  272 ----
 Documentation/kbuild/makefiles.rst                 | 1509 ++++++++++++++++++++
 Documentation/kbuild/makefiles.txt                 | 1369 ------------------
 Documentation/kbuild/modules.rst                   |  571 ++++++++
 Documentation/kbuild/modules.txt                   |  541 -------
 Documentation/kernel-hacking/hacking.rst           |    4 +-
 Documentation/process/coding-style.rst             |    2 +-
 Documentation/process/submit-checklist.rst         |    2 +-
 .../translations/it_IT/kernel-hacking/hacking.rst  |    4 +-
 .../translations/it_IT/process/coding-style.rst    |    2 +-
 .../it_IT/process/submit-checklist.rst             |    2 +-
 .../translations/zh_CN/process/coding-style.rst    |    2 +-
 .../zh_CN/process/submit-checklist.rst             |    2 +-
 Kconfig                                            |    2 +-
 arch/arc/plat-eznps/Kconfig                        |    2 +-
 arch/c6x/Kconfig                                   |    2 +-
 arch/microblaze/Kconfig.debug                      |    2 +-
 arch/microblaze/Kconfig.platform                   |    2 +-
 arch/nds32/Kconfig                                 |    2 +-
 arch/openrisc/Kconfig                              |    2 +-
 arch/powerpc/sysdev/Kconfig                        |    2 +-
 arch/riscv/Kconfig                                 |    2 +-
 drivers/auxdisplay/Kconfig                         |    2 +-
 drivers/firmware/Kconfig                           |    2 +-
 drivers/mtd/devices/Kconfig                        |    2 +-
 drivers/net/ethernet/smsc/Kconfig                  |    6 +-
 drivers/net/wireless/intel/iwlegacy/Kconfig        |    4 +-
 drivers/net/wireless/intel/iwlwifi/Kconfig         |    2 +-
 drivers/parport/Kconfig                            |    2 +-
 drivers/scsi/Kconfig                               |    4 +-
 drivers/staging/sm750fb/Kconfig                    |    2 +-
 drivers/usb/misc/Kconfig                           |    4 +-
 drivers/video/fbdev/Kconfig                        |   14 +-
 net/bridge/netfilter/Kconfig                       |    2 +-
 net/ipv4/netfilter/Kconfig                         |    2 +-
 net/ipv6/netfilter/Kconfig                         |    2 +-
 net/netfilter/Kconfig                              |   16 +-
 net/tipc/Kconfig                                   |    2 +-
 scripts/Kbuild.include                             |    4 +-
 scripts/Makefile.host                              |    2 +-
 scripts/kconfig/symbol.c                           |    2 +-
 .../tests/err_recursive_dep/expected_stderr        |   14 +-
 sound/oss/dmasound/Kconfig                         |    6 +-
 55 files changed, 3738 insertions(+), 3459 deletions(-)
 create mode 100644 Documentation/kbuild/headers_install.rst
 delete mode 100644 Documentation/kbuild/headers_install.txt
 create mode 100644 Documentation/kbuild/index.rst
 create mode 100644 Documentation/kbuild/issues.rst
 create mode 100644 Documentation/kbuild/kbuild.rst
 delete mode 100644 Documentation/kbuild/kbuild.txt
 create mode 100644 Documentation/kbuild/kconfig-language.rst
 delete mode 100644 Documentation/kbuild/kconfig-language.txt
 create mode 100644 Documentation/kbuild/kconfig-macro-language.rst
 delete mode 100644 Documentation/kbuild/kconfig-macro-language.txt
 create mode 100644 Documentation/kbuild/kconfig.rst
 delete mode 100644 Documentation/kbuild/kconfig.txt
 create mode 100644 Documentation/kbuild/makefiles.rst
 delete mode 100644 Documentation/kbuild/makefiles.txt
 create mode 100644 Documentation/kbuild/modules.rst
 delete mode 100644 Documentation/kbuild/modules.txt

diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index a582c780c3bd..cc6151fc0845 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -227,7 +227,7 @@ Configuring the kernel
      "make tinyconfig"  Configure the tiniest possible kernel.
 
    You can find more information on using the Linux kernel config tools
-   in Documentation/kbuild/kconfig.txt.
+   in Documentation/kbuild/kconfig.rst.
 
  - NOTES on ``make config``:
 
diff --git a/Documentation/kbuild/headers_install.rst b/Documentation/kbuild/headers_install.rst
new file mode 100644
index 000000000000..1ab7294e41ac
--- /dev/null
+++ b/Documentation/kbuild/headers_install.rst
@@ -0,0 +1,51 @@
+=============================================
+Exporting kernel headers for use by userspace
+=============================================
+
+The "make headers_install" command exports the kernel's header files in a
+form suitable for use by userspace programs.
+
+The linux kernel's exported header files describe the API for user space
+programs attempting to use kernel services.  These kernel header files are
+used by the system's C library (such as glibc or uClibc) to define available
+system calls, as well as constants and structures to be used with these
+system calls.  The C library's header files include the kernel header files
+from the "linux" subdirectory.  The system's libc headers are usually
+installed at the default location /usr/include and the kernel headers in
+subdirectories under that (most notably /usr/include/linux and
+/usr/include/asm).
+
+Kernel headers are backwards compatible, but not forwards compatible.  This
+means that a program built against a C library using older kernel headers
+should run on a newer kernel (although it may not have access to new
+features), but a program built against newer kernel headers may not work on an
+older kernel.
+
+The "make headers_install" command can be run in the top level directory of the
+kernel source code (or using a standard out-of-tree build).  It takes two
+optional arguments::
+
+  make headers_install ARCH=i386 INSTALL_HDR_PATH=/usr
+
+ARCH indicates which architecture to produce headers for, and defaults to the
+current architecture.  The linux/asm directory of the exported kernel headers
+is platform-specific; to see a complete list of supported architectures, use
+the command::
+
+  ls -d include/asm-* | sed 's/.*-//'
+
+INSTALL_HDR_PATH indicates where to install the headers. It defaults to
+"./usr".
+
+An 'include' directory is automatically created inside INSTALL_HDR_PATH and
+headers are installed in 'INSTALL_HDR_PATH/include'.
+
+The command "make headers_install_all" exports headers for all architectures
+simultaneously.  (This is mostly of interest to distribution maintainers,
+who create an architecture-independent tarball from the resulting include
+directory.)  You can also use HDR_ARCH_LIST to specify a list of architectures.
+Remember to provide the appropriate linux/asm directory via "mv" or "ln -s"
+before building a C library with headers exported this way.
+
+The kernel header export infrastructure is maintained by David Woodhouse
+<dwmw2@infradead.org>.
diff --git a/Documentation/kbuild/headers_install.txt b/Documentation/kbuild/headers_install.txt
deleted file mode 100644
index f0153adb95e2..000000000000
--- a/Documentation/kbuild/headers_install.txt
+++ /dev/null
@@ -1,50 +0,0 @@
-Exporting kernel headers for use by userspace
-=============================================
-
-The "make headers_install" command exports the kernel's header files in a
-form suitable for use by userspace programs.
-
-The linux kernel's exported header files describe the API for user space
-programs attempting to use kernel services.  These kernel header files are
-used by the system's C library (such as glibc or uClibc) to define available
-system calls, as well as constants and structures to be used with these
-system calls.  The C library's header files include the kernel header files
-from the "linux" subdirectory.  The system's libc headers are usually
-installed at the default location /usr/include and the kernel headers in
-subdirectories under that (most notably /usr/include/linux and
-/usr/include/asm).
-
-Kernel headers are backwards compatible, but not forwards compatible.  This
-means that a program built against a C library using older kernel headers
-should run on a newer kernel (although it may not have access to new
-features), but a program built against newer kernel headers may not work on an
-older kernel.
-
-The "make headers_install" command can be run in the top level directory of the
-kernel source code (or using a standard out-of-tree build).  It takes two
-optional arguments:
-
-  make headers_install ARCH=i386 INSTALL_HDR_PATH=/usr
-
-ARCH indicates which architecture to produce headers for, and defaults to the
-current architecture.  The linux/asm directory of the exported kernel headers
-is platform-specific, to see a complete list of supported architectures use
-the command:
-
-  ls -d include/asm-* | sed 's/.*-//'
-
-INSTALL_HDR_PATH indicates where to install the headers. It defaults to
-"./usr".
-
-An 'include' directory is automatically created inside INSTALL_HDR_PATH and
-headers are installed in 'INSTALL_HDR_PATH/include'.
-
-The command "make headers_install_all" exports headers for all architectures
-simultaneously.  (This is mostly of interest to distribution maintainers,
-who create an architecture-independent tarball from the resulting include
-directory.)  You also can use HDR_ARCH_LIST to specify list of architectures.
-Remember to provide the appropriate linux/asm directory via "mv" or "ln -s"
-before building a C library with headers exported this way.
-
-The kernel header export infrastructure is maintained by David Woodhouse
-<dwmw2@infradead.org>.
diff --git a/Documentation/kbuild/index.rst b/Documentation/kbuild/index.rst
new file mode 100644
index 000000000000..42d4cbe4460c
--- /dev/null
+++ b/Documentation/kbuild/index.rst
@@ -0,0 +1,27 @@
+:orphan:
+
+===================
+Kernel Build System
+===================
+
+.. toctree::
+    :maxdepth: 1
+
+    kconfig-language
+    kconfig-macro-language
+
+    kbuild
+    kconfig
+    makefiles
+    modules
+
+    headers_install
+
+    issues
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/kbuild/issues.rst b/Documentation/kbuild/issues.rst
new file mode 100644
index 000000000000..9fdded4b681c
--- /dev/null
+++ b/Documentation/kbuild/issues.rst
@@ -0,0 +1,11 @@
+Recursion issue #1
+------------------
+
+ .. include:: Kconfig.recursion-issue-01
+    :literal:
+
+Recursion issue #2
+------------------
+
+ .. include:: Kconfig.recursion-issue-02
+    :literal:
diff --git a/Documentation/kbuild/kbuild.rst b/Documentation/kbuild/kbuild.rst
new file mode 100644
index 000000000000..e774e760522d
--- /dev/null
+++ b/Documentation/kbuild/kbuild.rst
@@ -0,0 +1,265 @@
+======
+Kbuild
+======
+
+
+Output files
+============
+
+modules.order
+-------------
+This file records the order in which modules appear in Makefiles. This
+is used by modprobe to deterministically resolve aliases that match
+multiple modules.
+
+modules.builtin
+---------------
+This file lists all modules that are built into the kernel. This is used
+by modprobe to not fail when trying to load something builtin.
+
+modules.builtin.modinfo
+-----------------------
+This file contains modinfo from all modules that are built into the kernel.
+Unlike modinfo of a separate module, all fields are prefixed with module name.
+
+
+Environment variables
+=====================
+
+KCPPFLAGS
+---------
+Additional options to pass when preprocessing. The preprocessing options
+will be used in all cases where kbuild does preprocessing including
+building C files and assembler files.
+
+KAFLAGS
+-------
+Additional options to the assembler (for built-in and modules).
+
+AFLAGS_MODULE
+-------------
+Additional module specific options to use for $(AS).
+
+AFLAGS_KERNEL
+-------------
+Additional options for $(AS) when assembling
+code that is compiled as built-in.
+
+KCFLAGS
+-------
+Additional options to the C compiler (for built-in and modules).
+
+CFLAGS_KERNEL
+-------------
+Additional options for $(CC) when used to compile
+code that is compiled as built-in.
+
+CFLAGS_MODULE
+-------------
+Additional module specific options to use for $(CC).
+
+LDFLAGS_MODULE
+--------------
+Additional options used for $(LD) when linking modules.
+
+HOSTCFLAGS
+----------
+Additional flags to be passed to $(HOSTCC) when building host programs.
+
+HOSTCXXFLAGS
+------------
+Additional flags to be passed to $(HOSTCXX) when building host programs.
+
+HOSTLDFLAGS
+-----------
+Additional flags to be passed when linking host programs.
+
+HOSTLDLIBS
+----------
+Additional libraries to link against when building host programs.
+
+KBUILD_KCONFIG
+--------------
+Set the top-level Kconfig file to the value of this environment
+variable.  The default name is "Kconfig".
+
+KBUILD_VERBOSE
+--------------
+Set the kbuild verbosity. Can be assigned same values as "V=...".
+
+See make help for the full list.
+
+Setting "V=..." takes precedence over KBUILD_VERBOSE.
+
+KBUILD_EXTMOD
+-------------
+Set the directory to look for the kernel source when building external
+modules.
+
+Setting "M=..." takes precedence over KBUILD_EXTMOD.
+
+KBUILD_OUTPUT
+-------------
+Specify the output directory when building the kernel.
+
+The output directory can also be specified using "O=...".
+
+Setting "O=..." takes precedence over KBUILD_OUTPUT.
+
+KBUILD_DEBARCH
+--------------
+For the deb-pkg target, allows overriding the normal heuristics deployed by
+deb-pkg. Normally deb-pkg attempts to guess the right architecture based on
+the UTS_MACHINE variable, and on some architectures also the kernel config.
+The value of KBUILD_DEBARCH is assumed (not checked) to be a valid Debian
+architecture.
+
+ARCH
+----
+Set ARCH to the architecture to be built.
+
+In most cases the name of the architecture is the same as the
+directory name found in the arch/ directory.
+
+But some architectures such as x86 and sparc have aliases.
+
+- x86: i386 for 32 bit, x86_64 for 64 bit
+- sh: sh for 32 bit, sh64 for 64 bit
+- sparc: sparc32 for 32 bit, sparc64 for 64 bit
+
+CROSS_COMPILE
+-------------
+Specify an optional fixed part of the binutils filename.
+CROSS_COMPILE can be a part of the filename or the full path.
+
+CROSS_COMPILE is also used for ccache in some setups.
+
+CF
+--
+Additional options for sparse.
+
+CF is often used on the command-line like this::
+
+    make CF=-Wbitwise C=2
+
+INSTALL_PATH
+------------
+INSTALL_PATH specifies where to place the updated kernel and system map
+images. Default is /boot, but you can set it to other values.
+
+INSTALLKERNEL
+-------------
+Install script called when using "make install".
+The default name is "installkernel".
+
+The script will be called with the following arguments:
+   - $1 - kernel version
+   - $2 - kernel image file
+   - $3 - kernel map file
+   - $4 - default install path (use root directory if blank)
+
+The implementation of "make install" is architecture specific
+and it may differ from the above.
+
+INSTALLKERNEL is provided to enable the possibility to
+specify a custom installer when cross compiling a kernel.
+
+MODLIB
+------
+Specify where to install modules.
+The default value is::
+
+     $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
+
+The value can be overridden in which case the default value is ignored.
+
+INSTALL_MOD_PATH
+----------------
+INSTALL_MOD_PATH specifies a prefix to MODLIB for module directory
+relocations required by build roots.  This is not defined in the
+makefile but the argument can be passed to make if needed.
+
+INSTALL_MOD_STRIP
+-----------------
+INSTALL_MOD_STRIP, if defined, will cause modules to be
+stripped after they are installed.  If INSTALL_MOD_STRIP is '1', then
+the default option --strip-debug will be used.  Otherwise,
+INSTALL_MOD_STRIP value will be used as the options to the strip command.
+
+INSTALL_HDR_PATH
+----------------
+INSTALL_HDR_PATH specifies where to install user space headers when
+executing "make headers_*".
+
+The default value is::
+
+    $(objtree)/usr
+
+$(objtree) is the directory where output files are saved.
+The output directory is often set using "O=..." on the command line.
+
+The value can be overridden in which case the default value is ignored.
+
+KBUILD_SIGN_PIN
+---------------
+This variable allows a passphrase or PIN to be passed to the sign-file
+utility when signing kernel modules, if the private key requires such.
+
+KBUILD_MODPOST_WARN
+-------------------
+KBUILD_MODPOST_WARN can be set to avoid errors in case of undefined
+symbols in the final module linking stage. It changes such errors
+into warnings.
+
+KBUILD_MODPOST_NOFINAL
+----------------------
+KBUILD_MODPOST_NOFINAL can be set to skip the final link of modules.
+This is solely useful to speed up test compiles.
+
+KBUILD_EXTRA_SYMBOLS
+--------------------
+For modules that use symbols from other modules.
+See more details in modules.rst.
+
+ALLSOURCE_ARCHS
+---------------
+For tags/TAGS/cscope targets, you can specify more than one arch
+to be included in the databases, separated by blank space. E.g.::
+
+    $ make ALLSOURCE_ARCHS="x86 mips arm" tags
+
+To get all available archs you can also specify all. E.g.::
+
+    $ make ALLSOURCE_ARCHS=all tags
+
+KBUILD_ENABLE_EXTRA_GCC_CHECKS
+------------------------------
+If enabled on the make command line with "W=1", it turns on additional
+gcc -W... options for more extensive build-time checking.
+
+KBUILD_BUILD_TIMESTAMP
+----------------------
+Setting this to a date string overrides the timestamp used in the
+UTS_VERSION definition (uname -v in the running kernel). The value has to
+be a string that can be passed to date -d. The default value
+is the output of the date command at one point during build.
+
+KBUILD_BUILD_USER, KBUILD_BUILD_HOST
+------------------------------------
+These two variables allow overriding the user@host string displayed during
+boot and in /proc/version. The default value is the output of the commands
+whoami and host, respectively.
+
+KBUILD_LDS
+----------
+The linker script with full path. Assigned by the top-level Makefile.
+
+KBUILD_VMLINUX_OBJS
+-------------------
+All object files for vmlinux. They are linked to vmlinux in the same
+order as listed in KBUILD_VMLINUX_OBJS.
+
+KBUILD_VMLINUX_LIBS
+-------------------
+All .a "lib" files for vmlinux. KBUILD_VMLINUX_OBJS and KBUILD_VMLINUX_LIBS
+together specify all the object files used to link vmlinux.
diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt
deleted file mode 100644
index 9c230ea71963..000000000000
--- a/Documentation/kbuild/kbuild.txt
+++ /dev/null
@@ -1,248 +0,0 @@
-Output files
-
-modules.order
---------------------------------------------------
-This file records the order in which modules appear in Makefiles. This
-is used by modprobe to deterministically resolve aliases that match
-multiple modules.
-
-modules.builtin
---------------------------------------------------
-This file lists all modules that are built into the kernel. This is used
-by modprobe to not fail when trying to load something builtin.
-
-modules.builtin.modinfo
---------------------------------------------------
-This file contains modinfo from all modules that are built into the kernel.
-Unlike modinfo of a separate module, all fields are prefixed with module name.
-
-
-Environment variables
-
-KCPPFLAGS
---------------------------------------------------
-Additional options to pass when preprocessing. The preprocessing options
-will be used in all cases where kbuild does preprocessing including
-building C files and assembler files.
-
-KAFLAGS
---------------------------------------------------
-Additional options to the assembler (for built-in and modules).
-
-AFLAGS_MODULE
---------------------------------------------------
-Additional module specific options to use for $(AS).
-
-AFLAGS_KERNEL
---------------------------------------------------
-Additional options for $(AS) when used for assembler
-code for code that is compiled as built-in.
-
-KCFLAGS
---------------------------------------------------
-Additional options to the C compiler (for built-in and modules).
-
-CFLAGS_KERNEL
---------------------------------------------------
-Additional options for $(CC) when used to compile
-code that is compiled as built-in.
-
-CFLAGS_MODULE
---------------------------------------------------
-Additional module specific options to use for $(CC).
-
-LDFLAGS_MODULE
---------------------------------------------------
-Additional options used for $(LD) when linking modules.
-
-HOSTCFLAGS
---------------------------------------------------
-Additional flags to be passed to $(HOSTCC) when building host programs.
-
-HOSTCXXFLAGS
---------------------------------------------------
-Additional flags to be passed to $(HOSTCXX) when building host programs.
-
-HOSTLDFLAGS
---------------------------------------------------
-Additional flags to be passed when linking host programs.
-
-HOSTLDLIBS
---------------------------------------------------
-Additional libraries to link against when building host programs.
-
-KBUILD_KCONFIG
---------------------------------------------------
-Set the top-level Kconfig file to the value of this environment
-variable.  The default name is "Kconfig".
-
-KBUILD_VERBOSE
---------------------------------------------------
-Set the kbuild verbosity. Can be assigned same values as "V=...".
-See make help for the full list.
-Setting "V=..." takes precedence over KBUILD_VERBOSE.
-
-KBUILD_EXTMOD
---------------------------------------------------
-Set the directory to look for the kernel source when building external
-modules.
-Setting "M=..." takes precedence over KBUILD_EXTMOD.
-
-KBUILD_OUTPUT
---------------------------------------------------
-Specify the output directory when building the kernel.
-The output directory can also be specified using "O=...".
-Setting "O=..." takes precedence over KBUILD_OUTPUT.
-
-KBUILD_DEBARCH
---------------------------------------------------
-For the deb-pkg target, allows overriding the normal heuristics deployed by
-deb-pkg. Normally deb-pkg attempts to guess the right architecture based on
-the UTS_MACHINE variable, and on some architectures also the kernel config.
-The value of KBUILD_DEBARCH is assumed (not checked) to be a valid Debian
-architecture.
-
-ARCH
---------------------------------------------------
-Set ARCH to the architecture to be built.
-In most cases the name of the architecture is the same as the
-directory name found in the arch/ directory.
-But some architectures such as x86 and sparc have aliases.
-x86: i386 for 32 bit, x86_64 for 64 bit
-sh: sh for 32 bit, sh64 for 64 bit
-sparc: sparc32 for 32 bit, sparc64 for 64 bit
-
-CROSS_COMPILE
---------------------------------------------------
-Specify an optional fixed part of the binutils filename.
-CROSS_COMPILE can be a part of the filename or the full path.
-
-CROSS_COMPILE is also used for ccache in some setups.
-
-CF
---------------------------------------------------
-Additional options for sparse.
-CF is often used on the command-line like this:
-
-    make CF=-Wbitwise C=2
-
-INSTALL_PATH
---------------------------------------------------
-INSTALL_PATH specifies where to place the updated kernel and system map
-images. Default is /boot, but you can set it to other values.
-
-INSTALLKERNEL
---------------------------------------------------
-Install script called when using "make install".
-The default name is "installkernel".
-
-The script will be called with the following arguments:
-    $1 - kernel version
-    $2 - kernel image file
-    $3 - kernel map file
-    $4 - default install path (use root directory if blank)
-
-The implementation of "make install" is architecture specific
-and it may differ from the above.
-
-INSTALLKERNEL is provided to enable the possibility to
-specify a custom installer when cross compiling a kernel.
-
-MODLIB
---------------------------------------------------
-Specify where to install modules.
-The default value is:
-
-     $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
-
-The value can be overridden in which case the default value is ignored.
-
-INSTALL_MOD_PATH
---------------------------------------------------
-INSTALL_MOD_PATH specifies a prefix to MODLIB for module directory
-relocations required by build roots.  This is not defined in the
-makefile but the argument can be passed to make if needed.
-
-INSTALL_MOD_STRIP
---------------------------------------------------
-INSTALL_MOD_STRIP, if defined, will cause modules to be
-stripped after they are installed.  If INSTALL_MOD_STRIP is '1', then
-the default option --strip-debug will be used.  Otherwise,
-INSTALL_MOD_STRIP value will be used as the options to the strip command.
-
-INSTALL_HDR_PATH
---------------------------------------------------
-INSTALL_HDR_PATH specifies where to install user space headers when
-executing "make headers_*".
-The default value is:
-
-    $(objtree)/usr
-
-$(objtree) is the directory where output files are saved.
-The output directory is often set using "O=..." on the commandline.
-
-The value can be overridden in which case the default value is ignored.
-
-KBUILD_SIGN_PIN
---------------------------------------------------
-This variable allows a passphrase or PIN to be passed to the sign-file
-utility when signing kernel modules, if the private key requires such.
-
-KBUILD_MODPOST_WARN
---------------------------------------------------
-KBUILD_MODPOST_WARN can be set to avoid errors in case of undefined
-symbols in the final module linking stage. It changes such errors
-into warnings.
-
-KBUILD_MODPOST_NOFINAL
---------------------------------------------------
-KBUILD_MODPOST_NOFINAL can be set to skip the final link of modules.
-This is solely useful to speed up test compiles.
-
-KBUILD_EXTRA_SYMBOLS
---------------------------------------------------
-For modules that use symbols from other modules.
-See more details in modules.txt.
-
-ALLSOURCE_ARCHS
---------------------------------------------------
-For tags/TAGS/cscope targets, you can specify more than one arch
-to be included in the databases, separated by blank space. E.g.:
-
-    $ make ALLSOURCE_ARCHS="x86 mips arm" tags
-
-To get all available archs you can also specify all. E.g.:
-
-    $ make ALLSOURCE_ARCHS=all tags
-
-KBUILD_ENABLE_EXTRA_GCC_CHECKS
---------------------------------------------------
-If enabled over the make command line with "W=1", it turns on additional
-gcc -W... options for more extensive build-time checking.
-
-KBUILD_BUILD_TIMESTAMP
---------------------------------------------------
-Setting this to a date string overrides the timestamp used in the
-UTS_VERSION definition (uname -v in the running kernel). The value has to
-be a string that can be passed to date -d. The default value
-is the output of the date command at one point during build.
-
-KBUILD_BUILD_USER, KBUILD_BUILD_HOST
---------------------------------------------------
-These two variables allow to override the user@host string displayed during
-boot and in /proc/version. The default value is the output of the commands
-whoami and host, respectively.
-
-KBUILD_LDS
---------------------------------------------------
-The linker script with full path. Assigned by the top-level Makefile.
-
-KBUILD_VMLINUX_OBJS
---------------------------------------------------
-All object files for vmlinux. They are linked to vmlinux in the same
-order as listed in KBUILD_VMLINUX_OBJS.
-
-KBUILD_VMLINUX_LIBS
---------------------------------------------------
-All .a "lib" files for vmlinux. KBUILD_VMLINUX_OBJS and KBUILD_VMLINUX_LIBS
-together specify all the object files used to link vmlinux.
diff --git a/Documentation/kbuild/kconfig-language.rst b/Documentation/kbuild/kconfig-language.rst
new file mode 100644
index 000000000000..2bc8a7803365
--- /dev/null
+++ b/Documentation/kbuild/kconfig-language.rst
@@ -0,0 +1,689 @@
+================
+Kconfig Language
+================
+
+Introduction
+------------
+
+The configuration database is a collection of configuration options
+organized in a tree structure::
+
+	+- Code maturity level options
+	|  +- Prompt for development and/or incomplete code/drivers
+	+- General setup
+	|  +- Networking support
+	|  +- System V IPC
+	|  +- BSD Process Accounting
+	|  +- Sysctl support
+	+- Loadable module support
+	|  +- Enable loadable module support
+	|     +- Set version information on all module symbols
+	|     +- Kernel module loader
+	+- ...
+
+Every entry has its own dependencies. These dependencies are used
+to determine the visibility of an entry. Any child entry is only
+visible if its parent entry is also visible.
+
+Menu entries
+------------
+
+Most entries define a config option; all other entries help to organize
+them. A single configuration option is defined like this::
+
+  config MODVERSIONS
+	bool "Set version information on all module symbols"
+	depends on MODULES
+	help
+	  Usually, modules have to be recompiled whenever you switch to a new
+	  kernel.  ...
+
+Every line starts with a keyword and can be followed by multiple
+arguments.  "config" starts a new config entry. The following lines
+define attributes for this config option. Attributes can be the type of
+the config option, input prompt, dependencies, help text and default
+values. A config option can be defined multiple times with the same
+name, but every definition can have only a single input prompt and the
+type must not conflict.
+
+Menu attributes
+---------------
+
+A menu entry can have a number of attributes. Not all of them are
+applicable everywhere (see syntax).
+
+- type definition: "bool"/"tristate"/"string"/"hex"/"int"
+  Every config option must have a type. There are only two basic types:
+  tristate and string; the other types are based on these two. The type
+  definition optionally accepts an input prompt, so these two examples
+  are equivalent::
+
+	bool "Networking support"
+
+  and::
+
+	bool
+	prompt "Networking support"
+
+- input prompt: "prompt" <prompt> ["if" <expr>]
+  Every menu entry can have at most one prompt, which is displayed
+  to the user. Optionally, dependencies only for this prompt can be added
+  with "if".
+
+- default value: "default" <expr> ["if" <expr>]
+  A config option can have any number of default values. If multiple
+  default values are visible, only the first defined one is active.
+  Default values are not limited to the menu entry where they are
+  defined. This means the default can be defined somewhere else or be
+  overridden by an earlier definition.
+  The default value is only assigned to the config symbol if no other
+  value was set by the user (via the input prompt above). If an input
+  prompt is visible the default value is presented to the user and can
+  be overridden.
+  Optionally, dependencies only for this default value can be added with
+  "if".
+
+ The default value deliberately defaults to 'n' in order to avoid bloating the
+ build. With few exceptions, new config options should not change this. The
+ intent is for "make oldconfig" to add as little as possible to the config from
+ release to release.
+
+ Note:
+	Things that merit "default y/m" include:
+
+	a) A new Kconfig option for something that used to always be built
+	   should be "default y".
+
+	b) A new gatekeeping Kconfig option that hides/shows other Kconfig
+	   options (but does not generate any code of its own), should be
+	   "default y" so people will see those other options.
+
+	c) Sub-driver behavior or similar options for a driver that is
+	   "default n". This allows you to provide sane defaults.
+
+	d) Hardware or infrastructure that everybody expects, such as CONFIG_NET
+	   or CONFIG_BLOCK. These are rare exceptions.
+
+- type definition + default value::
+
+	"def_bool"/"def_tristate" <expr> ["if" <expr>]
+
+  This is a shorthand notation for a type definition plus a value.
+  Optionally, dependencies for this default value can be added with "if".
+
+- dependencies: "depends on" <expr>
+  This defines a dependency for this menu entry. If multiple
+  dependencies are defined, they are connected with '&&'. Dependencies
+  are applied to all other options within this menu entry (which also
+  accept an "if" expression), so these two examples are equivalent::
+
+	bool "foo" if BAR
+	default y if BAR
+
+  and::
+
+	depends on BAR
+	bool "foo"
+	default y
+
+- reverse dependencies: "select" <symbol> ["if" <expr>]
+  While normal dependencies reduce the upper limit of a symbol (see
+  below), reverse dependencies can be used to force a lower limit of
+  another symbol. The value of the current menu symbol is used as the
+  minimal value <symbol> can be set to. If <symbol> is selected multiple
+  times, the limit is set to the largest selection.
+  Reverse dependencies can only be used with boolean or tristate
+  symbols.
+
+  Note:
+	select should be used with care. select will force
+	a symbol to a value without visiting the dependencies.
+	By abusing select you are able to select a symbol FOO even
+	if FOO depends on BAR that is not set.
+	In general, use select only for non-visible symbols
+	(no prompts anywhere) and for symbols with no dependencies.
+	That limits the usefulness of select but, on the other hand,
+	avoids illegal configurations all over.
+
+- weak reverse dependencies: "imply" <symbol> ["if" <expr>]
+  This is similar to "select" as it enforces a lower limit on another
+  symbol except that the "implied" symbol's value may still be set to n
+  from a direct dependency or with a visible prompt.
+
+  Given the following example::
+
+    config FOO
+	tristate
+	imply BAZ
+
+    config BAZ
+	tristate
+	depends on BAR
+
+  The following values are possible:
+
+	===		===		=============	==============
+	FOO		BAR		BAZ's default	choice for BAZ
+	===		===		=============	==============
+	n		y		n		N/m/y
+	m		y		m		M/y/n
+	y		y		y		Y/n
+	y		n		*		N
+	===		===		=============	==============
+
+  This is useful e.g. with multiple drivers that want to indicate their
+  ability to hook into a secondary subsystem while allowing the user to
+  configure that subsystem out without also having to unset these drivers.
+
+- limiting menu display: "visible if" <expr>
+  This attribute is only applicable to menu blocks. If the condition is
+  false, the menu block is not displayed to the user (the symbols
+  contained there can still be selected by other symbols, though). It is
+  similar to a conditional "prompt" attribute for individual menu
+  entries. Default value of "visible" is true.
+
+- numerical ranges: "range" <symbol> <symbol> ["if" <expr>]
+  This allows limiting the range of possible input values for int
+  and hex symbols. The user can only input a value which is larger than
+  or equal to the first symbol and smaller than or equal to the second
+  symbol.
+
+- help text: "help" or "---help---"
+  This defines a help text. The end of the help text is determined by
+  the indentation level; it ends at the first line which has
+  a smaller indentation than the first line of the help text.
+  "---help---" and "help" do not differ in behaviour; "---help---" is
+  used to help visually separate configuration logic from help within
+  the file as an aid to developers.
+
+- misc options: "option" <symbol>[=<value>]
+  Various less common options can be defined via this option syntax,
+  which can modify the behaviour of the menu entry and its config
+  symbol. These options are currently possible:
+
+  - "defconfig_list"
+    This declares a list of default entries which can be used when
+    looking for the default configuration (which is used when the main
+    .config doesn't exist yet).
+
+  - "modules"
+    This declares the symbol to be used as the MODULES symbol, which
+    enables the third modular state for all config symbols.
+    At most one symbol may have the "modules" option set.
+
+  - "allnoconfig_y"
+    This declares the symbol as one that should have the value y when
+    using "allnoconfig". Used for symbols that hide other symbols.
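+
+As a sketch of how several of these attributes combine, the following
+entries use hypothetical symbols (FOO_DRIVER, FOO_QUEUES, FOO_CORE and BAR
+are illustrative names, not real kernel options). FOO_CORE has no prompt
+and no dependencies, in line with the advice on "select" above::
+
+  config FOO_DRIVER
+	tristate "Hypothetical FOO driver"
+	depends on BAR
+	select FOO_CORE
+	help
+	  Illustrative entry only.
+
+  config FOO_QUEUES
+	int "Number of FOO queues"
+	depends on FOO_DRIVER
+	range 1 16
+	default 4
+
+  config FOO_CORE
+	tristate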
+
+Menu dependencies
+-----------------
+
+Dependencies define the visibility of a menu entry and can also reduce
+the input range of tristate symbols. The tristate logic used in the
+expressions uses one more state than normal boolean logic to express the
+module state. Dependency expressions have the following syntax::
+
+  <expr> ::= <symbol>                           (1)
+           <symbol> '=' <symbol>                (2)
+           <symbol> '!=' <symbol>               (3)
+           <symbol1> '<' <symbol2>              (4)
+           <symbol1> '>' <symbol2>              (4)
+           <symbol1> '<=' <symbol2>             (4)
+           <symbol1> '>=' <symbol2>             (4)
+           '(' <expr> ')'                       (5)
+           '!' <expr>                           (6)
+           <expr> '&&' <expr>                   (7)
+           <expr> '||' <expr>                   (8)
+
+Expressions are listed in decreasing order of precedence.
+
+(1) Convert the symbol into an expression. Boolean and tristate symbols
+    are simply converted into the respective expression values. All
+    other symbol types result in 'n'.
+(2) If the values of both symbols are equal, it returns 'y',
+    otherwise 'n'.
+(3) If the values of both symbols are equal, it returns 'n',
+    otherwise 'y'.
+(4) If the value of <symbol1> is respectively lower, greater, lower-or-equal,
+    or greater-or-equal than the value of <symbol2>, it returns 'y',
+    otherwise 'n'.
+(5) Returns the value of the expression. Used to override precedence.
+(6) Returns the result of (2-/expr/).
+(7) Returns the result of min(/expr/, /expr/).
+(8) Returns the result of max(/expr/, /expr/).
+
+An expression can have a value of 'n', 'm' or 'y' (or 0, 1, 2
+respectively for calculations). A menu entry becomes visible when its
+expression evaluates to 'm' or 'y'.
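+
+As a worked example, take two hypothetical tristate symbols with FOO set
+to 'm' (1) and BAR set to 'y' (2). Applying rules (2), (6), (7) and (8)
+above gives::
+
+  !FOO          = 2 - 1     = 1  ->  'm'
+  FOO && BAR    = min(1, 2) = 1  ->  'm'
+  FOO || BAR    = max(1, 2) = 2  ->  'y'
+  FOO && !BAR   = min(1, 0) = 0  ->  'n'
+  FOO = BAR                      ->  'n'  (the values differ)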
+
+There are two types of symbols: constant and non-constant symbols.
+Non-constant symbols are the most common ones and are defined with the
+'config' statement. Non-constant symbols consist entirely of alphanumeric
+characters or underscores.
+Constant symbols are only part of expressions. Constant symbols are
+always surrounded by single or double quotes. Within the quotes, any
+other character is allowed and the quotes can be escaped using '\'.
+
+Menu structure
+--------------
+
+The position of a menu entry in the tree is determined in two ways. First
+it can be specified explicitly::
+
+  menu "Network device support"
+	depends on NET
+
+  config NETDEVICES
+	...
+
+  endmenu
+
+All entries within the "menu" ... "endmenu" block become a submenu of
+"Network device support". All subentries inherit the dependencies from
+the menu entry, e.g. this means the dependency "NET" is added to the
+dependency list of the config option NETDEVICES.
+
+The other way to generate the menu structure is done by analyzing the
+dependencies. If a menu entry somehow depends on the previous entry, it
+can be made a submenu of it. First, the previous (parent) symbol must
+be part of the dependency list and then one of these two conditions
+must be true:
+
+- the child entry must become invisible, if the parent is set to 'n'
+- the child entry must only be visible, if the parent is visible::
+
+    config MODULES
+	bool "Enable loadable module support"
+
+    config MODVERSIONS
+	bool "Set version information on all module symbols"
+	depends on MODULES
+
+    comment "module support disabled"
+	depends on !MODULES
+
+MODVERSIONS directly depends on MODULES; this means it's only visible if
+MODULES is different from 'n'. The comment on the other hand is only
+visible when MODULES is set to 'n'.
+
+
+Kconfig syntax
+--------------
+
+The configuration file describes a series of menu entries, where every
+line starts with a keyword (except help texts). The following keywords
+end a menu entry:
+
+- config
+- menuconfig
+- choice/endchoice
+- comment
+- menu/endmenu
+- if/endif
+- source
+
+The first five also start the definition of a menu entry.
+
+config::
+	"config" <symbol>
+	<config options>
+
+This defines a config symbol <symbol> and accepts any of the above
+attributes as options.
+
+menuconfig::
+	"menuconfig" <symbol>
+	<config options>
+
+This is similar to the simple config entry above, but it also gives a
+hint to front ends that all suboptions should be displayed as a
+separate list of options. To make sure all the suboptions will really
+show up under the menuconfig entry and not outside of it, every item
+from the <config options> list must depend on the menuconfig symbol.
+In practice, this is achieved by using one of the next two constructs::
+
+  (1):
+  menuconfig M
+  if M
+      config C1
+      config C2
+  endif
+
+  (2):
+  menuconfig M
+  config C1
+      depends on M
+  config C2
+      depends on M
+
+In the following examples (3) and (4), C1 and C2 still have the M
+dependency, but will not appear under menuconfig M anymore, because
+of C0, which doesn't depend on M::
+
+  (3):
+  menuconfig M
+      config C0
+  if M
+      config C1
+      config C2
+  endif
+
+  (4):
+  menuconfig M
+  config C0
+  config C1
+      depends on M
+  config C2
+      depends on M
+
+choices::
+
+	"choice" [symbol]
+	<choice options>
+	<choice block>
+	"endchoice"
+
+This defines a choice group and accepts any of the above attributes as
+options. A choice can only be of type bool or tristate.  If no type is
+specified for a choice, its type will be determined by the type of
+the first choice element in the group or remain unknown if none of the
+choice elements have a type specified either.
+
+While a boolean choice only allows a single config entry to be
+selected, a tristate choice also allows any number of config entries
+to be set to 'm'. This can be used if multiple drivers for a single
+piece of hardware exist and only a single driver can be compiled/loaded
+into the kernel, but all drivers can be compiled as modules.
+
+A choice accepts another option "optional", which allows the choice to be
+set to 'n' so that no entry needs to be selected.
+If no [symbol] is associated with a choice, then you cannot have multiple
+definitions of that choice. If a [symbol] is associated with the choice,
+then you may define the same choice (i.e. with the same entries) in another
+place.
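+
+A minimal sketch of a boolean choice, using hypothetical symbols, in which
+exactly one of the two entries ends up selected::
+
+  choice
+	prompt "Hypothetical FOO scheduling policy"
+	default FOO_POLICY_FIFO
+
+  config FOO_POLICY_FIFO
+	bool "First in, first out"
+
+  config FOO_POLICY_RR
+	bool "Round robin"
+
+  endchoice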
+
+comment::
+
+	"comment" <prompt>
+	<comment options>
+
+This defines a comment which is displayed to the user during the
+configuration process and is also echoed to the output files. The only
+possible options are dependencies.
+
+menu::
+
+	"menu" <prompt>
+	<menu options>
+	<menu block>
+	"endmenu"
+
+This defines a menu block, see "Menu structure" above for more
+information. The only possible options are dependencies and "visible"
+attributes.
+
+if::
+
+	"if" <expr>
+	<if block>
+	"endif"
+
+This defines an if block. The dependency expression <expr> is appended
+to all enclosed menu entries.
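+
+For example, with the hypothetical symbol FOO, the block::
+
+  if NET
+
+  config FOO
+	bool "foo"
+
+  endif
+
+gives FOO the same dependency as writing "depends on NET" inside the FOO
+entry itself.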
+
+source::
+
+	"source" <prompt>
+
+This reads the specified configuration file. This file is always parsed.
+
+mainmenu::
+
+	"mainmenu" <prompt>
+
+This sets the config program's title bar if the config program chooses
+to use it. It should be placed at the top of the configuration, before any
+other statement.
+
+'#' Kconfig source file comment:
+
+An unquoted '#' character anywhere in a source file line indicates
+the beginning of a source file comment.  The remainder of that line
+is a comment.
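+
+For instance (FOO is a hypothetical symbol)::
+
+  # This whole line is a comment.
+  config FOO	# so is the rest of this line
+	bool "foo"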
+
+
+Kconfig hints
+-------------
+This is a collection of Kconfig tips, most of which aren't obvious at
+first glance and most of which have become idioms in several Kconfig
+files.
+
+Adding common features and making the usage configurable
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+It is a common idiom to implement features/functionality that are
+relevant for some architectures but not all.
+The recommended way to do so is to use a config variable named HAVE_*
+that is defined in a common Kconfig file and selected by the relevant
+architectures.
+An example is the generic IOMAP functionality.
+
+In lib/Kconfig we would see::
+
+  # Generic IOMAP is used to ...
+  config HAVE_GENERIC_IOMAP
+
+  config GENERIC_IOMAP
+	depends on HAVE_GENERIC_IOMAP && FOO
+
+And in lib/Makefile we would see::
+
+	obj-$(CONFIG_GENERIC_IOMAP) += iomap.o
+
+For each architecture using the generic IOMAP functionality we would see::
+
+  config X86
+	select ...
+	select HAVE_GENERIC_IOMAP
+	select ...
+
+Note: we use the existing config option and avoid creating a new
+config variable to select HAVE_GENERIC_IOMAP.
+
+Note: the internal config variable HAVE_GENERIC_IOMAP is
+introduced to overcome the limitation of select, which will force a
+config option to 'y' no matter the dependencies.
+The dependencies are moved to the symbol GENERIC_IOMAP and we avoid the
+situation where select forces a symbol to 'y'.
+
+Adding features that need compiler support
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are several features that need compiler support. The recommended way
+to describe the dependency on the compiler feature is to use "depends on"
+followed by a test macro::
+
+  config STACKPROTECTOR
+	bool "Stack Protector buffer overflow detection"
+	depends on $(cc-option,-fstack-protector)
+	...
+
+If you need to expose a compiler capability to makefiles and/or C source files,
+`CC_HAS_` is the recommended prefix for the config option::
+
+  config CC_HAS_STACKPROTECTOR_NONE
+	def_bool $(cc-option,-fno-stack-protector)
+
+Build as module only
+~~~~~~~~~~~~~~~~~~~~
+To restrict a component build to module-only, qualify its config symbol
+with "depends on m".  E.g.::
+
+  config FOO
+	depends on BAR && m
+
+limits FOO to module (=m) or disabled (=n).
+
+Kconfig recursive dependency limitations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you've hit the Kconfig error "recursive dependency detected", you've run
+into a recursive dependency issue with Kconfig. A recursive dependency can be
+summarized as a circular dependency. The kconfig tools need to ensure that
+Kconfig files comply with specified configuration requirements. In order to do
+that, kconfig must determine the values that are possible for all Kconfig
+symbols; this is currently not possible if there is a circular relation
+between two or more Kconfig symbols. For more details refer to the "Simple
+Kconfig recursive issue" subsection below. Kconfig does not do recursive
+dependency resolution; this has a few implications for Kconfig file writers.
+We'll first explain why this issue exists and then provide an example of a
+technical limitation which this imposes on Kconfig developers. Eager
+developers wishing to try to address this limitation should read the next
+subsections.
+
+Simple Kconfig recursive issue
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Read: Documentation/kbuild/Kconfig.recursion-issue-01
+
+Test with::
+
+  make KBUILD_KCONFIG=Documentation/kbuild/Kconfig.recursion-issue-01 allnoconfig
+
+Cumulative Kconfig recursive issue
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Read: Documentation/kbuild/Kconfig.recursion-issue-02
+
+Test with::
+
+  make KBUILD_KCONFIG=Documentation/kbuild/Kconfig.recursion-issue-02 allnoconfig
+
+Practical solutions to kconfig recursive issue
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Developers who run into the recursive Kconfig issue have two options
+at their disposal. We document them below and also provide a list of
+historical issues resolved through these different solutions.
+
+  a) Remove any superfluous "select FOO" or "depends on FOO"
+  b) Match dependency semantics:
+
+	b1) Swap all "select FOO" to "depends on FOO" or,
+
+	b2) Swap all "depends on FOO" to "select FOO"
+
+The resolution to a) can be tested with the sample Kconfig file
+Documentation/kbuild/Kconfig.recursion-issue-01 through the removal
+of the "select CORE" from CORE_BELL_A_ADVANCED as that is implicit already
+since CORE_BELL_A depends on CORE. At times it may not be possible to remove
+some dependency criteria; for such cases you can work with solution b).
+
+The two different resolutions for b) can be tested in the sample Kconfig file
+Documentation/kbuild/Kconfig.recursion-issue-02.
+
+Below is a list of examples of prior fixes for these types of recursive issues;
+all errors appear to involve one or more select's and one or more "depends on".
+
+============    ===================================
+commit          fix
+============    ===================================
+06b718c01208    select A -> depends on A
+c22eacfe82f9    depends on A -> depends on B
+6a91e854442c    select A -> depends on A
+118c565a8f2e    select A -> select B
+f004e5594705    select A -> depends on A
+c7861f37b4c6    depends on A -> (null)
+80c69915e5fb    select A -> (null)              (1)
+c2218e26c0d0    select A -> depends on A        (1)
+d6ae99d04e1c    select A -> depends on A
+95ca19cf8cbf    select A -> depends on A
+8f057d7bca54    depends on A -> (null)
+8f057d7bca54    depends on A -> select A
+a0701f04846e    select A -> depends on A
+0c8b92f7f259    depends on A -> (null)
+e4e9e0540928    select A -> depends on A        (2)
+7453ea886e87    depends on A -> (null)          (1)
+7b1fff7e4fdf    select A -> depends on A
+86c747d2a4f0    select A -> depends on A
+d9f9ab51e55e    select A -> depends on A
+0c51a4d8abd6    depends on A -> select A        (3)
+e98062ed6dc4    select A -> depends on A        (3)
+91e5d284a7f1    select A -> (null)
+============    ===================================
+
+(1) Partial (or no) quote of error.
+(2) That seems to be the gist of that fix.
+(3) Same error.
+
+Future kconfig work
+~~~~~~~~~~~~~~~~~~~
+
+Work on kconfig is welcomed in both the area of clarifying semantics and that
+of evaluating the use of a full SAT solver for it. A full SAT solver can be
+desirable to enable more complex dependency mappings and / or queries;
+for instance, one possible use case for a SAT solver could be that of handling
+the current known recursive dependency issues. It is not known if this would
+address such issues but such evaluation is desirable. If support for a full SAT
+solver proves too complex or it cannot address recursive dependency issues,
+Kconfig should have at least clear and well defined semantics which also
+address and document limitations or requirements such as the ones dealing
+with recursive dependencies.
+
+Further work on both of these areas is welcomed on Kconfig. We elaborate
+on both of these in the next two subsections.
+
+Semantics of Kconfig
+~~~~~~~~~~~~~~~~~~~~
+
+Kconfig is widely used, and Linux is now only one of its users: one study
+has completed a broad analysis of Kconfig use in 12 projects [0]_.
+Despite this widespread use, and although this document does a reasonable job
+of documenting basic Kconfig syntax, a more precise definition of Kconfig
+semantics is welcome. One project deduced Kconfig semantics through
+the use of the xconfig configurator [1]_. Work should be done to confirm
+whether the deduced semantics match our intended Kconfig design goals.
+
+Having well-defined semantics can be useful for tools that evaluate
+dependencies in practice. One known use case was work to express the
+inferred semantics of Kconfig as a boolean abstraction, translating Kconfig
+logic into boolean formulas and running a SAT solver on them to find dead
+code / features (always inactive). 114 dead features were found in Linux
+using this methodology [1]_ (Section 8: Threats to validity).
+
+Confirming this could prove useful as Kconfig stands as one of the leading
+industrial variability modeling languages [1]_ [2]_. Its study would help
+evaluate practical uses of such languages; their use was only theoretical
+and real-world requirements were not well understood. As it stands, though,
+only reverse engineering techniques have been used to deduce semantics from
+variability modeling languages such as Kconfig [3]_.
+
+.. [0] http://www.eng.uwaterloo.ca/~shshe/kconfig_semantics.pdf
+.. [1] http://gsd.uwaterloo.ca/sites/default/files/vm-2013-berger.pdf
+.. [2] http://gsd.uwaterloo.ca/sites/default/files/ase241-berger_0.pdf
+.. [3] http://gsd.uwaterloo.ca/sites/default/files/icse2011.pdf
+
+Full SAT solver for Kconfig
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Although SAT solvers [4]_ haven't yet been used by Kconfig directly, work has
+been done, as noted in the previous subsection, to express the inferred
+semantics of Kconfig as a boolean abstraction, translating Kconfig logic into
+boolean formulas and running a SAT solver on them [5]_. Another known related
+project is CADOS [6]_ (formerly VAMOS [7]_) and its tools, mainly
+undertaker [8]_, which was first introduced in [9]_. The basic concept of
+undertaker is to extract variability models from Kconfig and feed them,
+together with a propositional formula extracted from CPP #ifdefs and build
+rules, into a SAT solver in order to find dead code, dead files, and dead
+symbols. If using a SAT solver on Kconfig is desirable, one approach would be
+to evaluate repurposing such efforts for Kconfig. There is enough interest
+from mentors of existing projects to not only help advise how to integrate
+this work upstream but also help maintain it long term. Interested developers
+should visit:
+
+http://kernelnewbies.org/KernelProjects/kconfig-sat
+
+.. [4] http://www.cs.cornell.edu/~sabhar/chapters/SATSolvers-KR-Handbook.pdf
+.. [5] http://gsd.uwaterloo.ca/sites/default/files/vm-2013-berger.pdf
+.. [6] https://cados.cs.fau.de
+.. [7] https://vamos.cs.fau.de
+.. [8] https://undertaker.cs.fau.de
+.. [9] https://www4.cs.fau.de/Publications/2011/tartler_11_eurosys.pdf
diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt
deleted file mode 100644
index 864e740811da..000000000000
--- a/Documentation/kbuild/kconfig-language.txt
+++ /dev/null
@@ -1,669 +0,0 @@
-Introduction
-------------
-
-The configuration database is a collection of configuration options
-organized in a tree structure:
-
-	+- Code maturity level options
-	|  +- Prompt for development and/or incomplete code/drivers
-	+- General setup
-	|  +- Networking support
-	|  +- System V IPC
-	|  +- BSD Process Accounting
-	|  +- Sysctl support
-	+- Loadable module support
-	|  +- Enable loadable module support
-	|     +- Set version information on all module symbols
-	|     +- Kernel module loader
-	+- ...
-
-Every entry has its own dependencies. These dependencies are used
-to determine the visibility of an entry. Any child entry is only
-visible if its parent entry is also visible.
-
-Menu entries
-------------
-
-Most entries define a config option; all other entries help to organize
-them. A single configuration option is defined like this:
-
-config MODVERSIONS
-	bool "Set version information on all module symbols"
-	depends on MODULES
-	help
-	  Usually, modules have to be recompiled whenever you switch to a new
-	  kernel.  ...
-
-Every line starts with a key word and can be followed by multiple
-arguments.  "config" starts a new config entry. The following lines
-define attributes for this config option. Attributes can be the type of
-the config option, input prompt, dependencies, help text and default
-values. A config option can be defined multiple times with the same
-name, but every definition can have only a single input prompt and the
-type must not conflict.
-
-Menu attributes
----------------
-
-A menu entry can have a number of attributes. Not all of them are
-applicable everywhere (see syntax).
-
-- type definition: "bool"/"tristate"/"string"/"hex"/"int"
-  Every config option must have a type. There are only two basic types:
-  tristate and string; the other types are based on these two. The type
-  definition optionally accepts an input prompt, so these two examples
-  are equivalent:
-
-	bool "Networking support"
-  and
-	bool
-	prompt "Networking support"
-
-- input prompt: "prompt" <prompt> ["if" <expr>]
-  Every menu entry can have at most one prompt, which is used to display
-  to the user. Optionally dependencies only for this prompt can be added
-  with "if".
-
-- default value: "default" <expr> ["if" <expr>]
-  A config option can have any number of default values. If multiple
-  default values are visible, only the first defined one is active.
-  Default values are not limited to the menu entry where they are
-  defined. This means the default can be defined somewhere else or be
-  overridden by an earlier definition.
-  The default value is only assigned to the config symbol if no other
-  value was set by the user (via the input prompt above). If an input
-  prompt is visible the default value is presented to the user and can
-  be overridden by him.
-  Optionally, dependencies only for this default value can be added with
-  "if".
-
- The default value deliberately defaults to 'n' in order to avoid bloating the
- build. With few exceptions, new config options should not change this. The
- intent is for "make oldconfig" to add as little as possible to the config from
- release to release.
-
- Note:
-	Things that merit "default y/m" include:
-
-	a) A new Kconfig option for something that used to always be built
-	   should be "default y".
-
-	b) A new gatekeeping Kconfig option that hides/shows other Kconfig
-	   options (but does not generate any code of its own), should be
-	   "default y" so people will see those other options.
-
-	c) Sub-driver behavior or similar options for a driver that is
-	   "default n". This allows you to provide sane defaults.
-
-	d) Hardware or infrastructure that everybody expects, such as CONFIG_NET
-	   or CONFIG_BLOCK. These are rare exceptions.
-
-- type definition + default value:
-	"def_bool"/"def_tristate" <expr> ["if" <expr>]
-  This is a shorthand notation for a type definition plus a value.
-  Optionally dependencies for this default value can be added with "if".
-
-- dependencies: "depends on" <expr>
-  This defines a dependency for this menu entry. If multiple
-  dependencies are defined, they are connected with '&&'. Dependencies
-  are applied to all other options within this menu entry (which also
-  accept an "if" expression), so these two examples are equivalent:
-
-	bool "foo" if BAR
-	default y if BAR
-  and
-	depends on BAR
-	bool "foo"
-	default y
-
-- reverse dependencies: "select" <symbol> ["if" <expr>]
-  While normal dependencies reduce the upper limit of a symbol (see
-  below), reverse dependencies can be used to force a lower limit of
-  another symbol. The value of the current menu symbol is used as the
-  minimal value <symbol> can be set to. If <symbol> is selected multiple
-  times, the limit is set to the largest selection.
-  Reverse dependencies can only be used with boolean or tristate
-  symbols.
-  Note:
-	select should be used with care. select will force
-	a symbol to a value without visiting the dependencies.
-	By abusing select you are able to select a symbol FOO even
-	if FOO depends on BAR that is not set.
-	In general use select only for non-visible symbols
-	(no prompts anywhere) and for symbols with no dependencies.
-	That will limit the usefulness but on the other hand avoid
-	the illegal configurations all over.
-
-- weak reverse dependencies: "imply" <symbol> ["if" <expr>]
-  This is similar to "select" as it enforces a lower limit on another
-  symbol except that the "implied" symbol's value may still be set to n
-  from a direct dependency or with a visible prompt.
-
-  Given the following example:
-
-  config FOO
-	tristate
-	imply BAZ
-
-  config BAZ
-	tristate
-	depends on BAR
-
-  The following values are possible:
-
-	FOO		BAR		BAZ's default	choice for BAZ
-	---		---		-------------	--------------
-	n		y		n		N/m/y
-	m		y		m		M/y/n
-	y		y		y		Y/n
-	y		n		*		N
-
-  This is useful e.g. with multiple drivers that want to indicate their
-  ability to hook into a secondary subsystem while allowing the user to
-  configure that subsystem out without also having to unset these drivers.
-
-- limiting menu display: "visible if" <expr>
-  This attribute is only applicable to menu blocks, if the condition is
-  false, the menu block is not displayed to the user (the symbols
-  contained there can still be selected by other symbols, though). It is
-  similar to a conditional "prompt" attribute for individual menu
-  entries. Default value of "visible" is true.
-
-- numerical ranges: "range" <symbol> <symbol> ["if" <expr>]
-  This allows to limit the range of possible input values for int
-  and hex symbols. The user can only input a value which is larger than
-  or equal to the first symbol and smaller than or equal to the second
-  symbol.
-
-- help text: "help" or "---help---"
-  This defines a help text. The end of the help text is determined by
-  the indentation level, this means it ends at the first line which has
-  a smaller indentation than the first line of the help text.
-  "---help---" and "help" do not differ in behaviour, "---help---" is
-  used to help visually separate configuration logic from help within
-  the file as an aid to developers.
-
-- misc options: "option" <symbol>[=<value>]
-  Various less common options can be defined via this option syntax,
-  which can modify the behaviour of the menu entry and its config
-  symbol. These options are currently possible:
-
-  - "defconfig_list"
-    This declares a list of default entries which can be used when
-    looking for the default configuration (which is used when the main
-    .config doesn't exists yet.)
-
-  - "modules"
-    This declares the symbol to be used as the MODULES symbol, which
-    enables the third modular state for all config symbols.
-    At most one symbol may have the "modules" option set.
-
-  - "allnoconfig_y"
-    This declares the symbol as one that should have the value y when
-    using "allnoconfig". Used for symbols that hide other symbols.
-
-Menu dependencies
------------------
-
-Dependencies define the visibility of a menu entry and can also reduce
-the input range of tristate symbols. The tristate logic used in the
-expressions uses one more state than normal boolean logic to express the
-module state. Dependency expressions have the following syntax:
-
-<expr> ::= <symbol>                             (1)
-           <symbol> '=' <symbol>                (2)
-           <symbol> '!=' <symbol>               (3)
-           <symbol1> '<' <symbol2>              (4)
-           <symbol1> '>' <symbol2>              (4)
-           <symbol1> '<=' <symbol2>             (4)
-           <symbol1> '>=' <symbol2>             (4)
-           '(' <expr> ')'                       (5)
-           '!' <expr>                           (6)
-           <expr> '&&' <expr>                   (7)
-           <expr> '||' <expr>                   (8)
-
-Expressions are listed in decreasing order of precedence. 
-
-(1) Convert the symbol into an expression. Boolean and tristate symbols
-    are simply converted into the respective expression values. All
-    other symbol types result in 'n'.
-(2) If the values of both symbols are equal, it returns 'y',
-    otherwise 'n'.
-(3) If the values of both symbols are equal, it returns 'n',
-    otherwise 'y'.
-(4) If value of <symbol1> is respectively lower, greater, lower-or-equal,
-    or greater-or-equal than value of <symbol2>, it returns 'y',
-    otherwise 'n'.
-(5) Returns the value of the expression. Used to override precedence.
-(6) Returns the result of (2-/expr/).
-(7) Returns the result of min(/expr/, /expr/).
-(8) Returns the result of max(/expr/, /expr/).
-
-An expression can have a value of 'n', 'm' or 'y' (or 0, 1, 2
-respectively for calculations). A menu entry becomes visible when its
-expression evaluates to 'm' or 'y'.
-
-There are two types of symbols: constant and non-constant symbols.
-Non-constant symbols are the most common ones and are defined with the
-'config' statement. Non-constant symbols consist entirely of alphanumeric
-characters or underscores.
-Constant symbols are only part of expressions. Constant symbols are
-always surrounded by single or double quotes. Within the quote, any
-other character is allowed and the quotes can be escaped using '\'.
-
-Menu structure
---------------
-
-The position of a menu entry in the tree is determined in two ways. First
-it can be specified explicitly:
-
-menu "Network device support"
-	depends on NET
-
-config NETDEVICES
-	...
-
-endmenu
-
-All entries within the "menu" ... "endmenu" block become a submenu of
-"Network device support". All subentries inherit the dependencies from
-the menu entry, e.g. this means the dependency "NET" is added to the
-dependency list of the config option NETDEVICES.
-
-The other way to generate the menu structure is done by analyzing the
-dependencies. If a menu entry somehow depends on the previous entry, it
-can be made a submenu of it. First, the previous (parent) symbol must
-be part of the dependency list and then one of these two conditions
-must be true:
-- the child entry must become invisible, if the parent is set to 'n'
-- the child entry must only be visible, if the parent is visible
-
-config MODULES
-	bool "Enable loadable module support"
-
-config MODVERSIONS
-	bool "Set version information on all module symbols"
-	depends on MODULES
-
-comment "module support disabled"
-	depends on !MODULES
-
-MODVERSIONS directly depends on MODULES, this means it's only visible if
-MODULES is different from 'n'. The comment on the other hand is only
-visible when MODULES is set to 'n'.
-
-
-Kconfig syntax
---------------
-
-The configuration file describes a series of menu entries, where every
-line starts with a keyword (except help texts). The following keywords
-end a menu entry:
-- config
-- menuconfig
-- choice/endchoice
-- comment
-- menu/endmenu
-- if/endif
-- source
-The first five also start the definition of a menu entry.
-
-config:
-
-	"config" <symbol>
-	<config options>
-
-This defines a config symbol <symbol> and accepts any of above
-attributes as options.
-
-menuconfig:
-	"menuconfig" <symbol>
-	<config options>
-
-This is similar to the simple config entry above, but it also gives a
-hint to front ends, that all suboptions should be displayed as a
-separate list of options. To make sure all the suboptions will really
-show up under the menuconfig entry and not outside of it, every item
-from the <config options> list must depend on the menuconfig symbol.
-In practice, this is achieved by using one of the next two constructs:
-
-(1):
-menuconfig M
-if M
-    config C1
-    config C2
-endif
-
-(2):
-menuconfig M
-config C1
-    depends on M
-config C2
-    depends on M
-
-In the following examples (3) and (4), C1 and C2 still have the M
-dependency, but will not appear under menuconfig M anymore, because
-of C0, which doesn't depend on M:
-
-(3):
-menuconfig M
-    config C0
-if M
-    config C1
-    config C2
-endif
-
-(4):
-menuconfig M
-config C0
-config C1
-    depends on M
-config C2
-    depends on M
-
-choices:
-
-	"choice" [symbol]
-	<choice options>
-	<choice block>
-	"endchoice"
-
-This defines a choice group and accepts any of the above attributes as
-options. A choice can only be of type bool or tristate.  If no type is
-specified for a choice, its type will be determined by the type of
-the first choice element in the group or remain unknown if none of the
-choice elements have a type specified, as well.
-
-While a boolean choice only allows a single config entry to be
-selected, a tristate choice also allows any number of config entries
-to be set to 'm'. This can be used if multiple drivers for a single
-hardware exists and only a single driver can be compiled/loaded into
-the kernel, but all drivers can be compiled as modules.
-
-A choice accepts another option "optional", which allows to set the
-choice to 'n' and no entry needs to be selected.
-If no [symbol] is associated with a choice, then you can not have multiple
-definitions of that choice. If a [symbol] is associated to the choice,
-then you may define the same choice (i.e. with the same entries) in another
-place.
-
-comment:
-
-	"comment" <prompt>
-	<comment options>
-
-This defines a comment which is displayed to the user during the
-configuration process and is also echoed to the output files. The only
-possible options are dependencies.
-
-menu:
-
-	"menu" <prompt>
-	<menu options>
-	<menu block>
-	"endmenu"
-
-This defines a menu block, see "Menu structure" above for more
-information. The only possible options are dependencies and "visible"
-attributes.
-
-if:
-
-	"if" <expr>
-	<if block>
-	"endif"
-
-This defines an if block. The dependency expression <expr> is appended
-to all enclosed menu entries.
-
-source:
-
-	"source" <prompt>
-
-This reads the specified configuration file. This file is always parsed.
-
-mainmenu:
-
-	"mainmenu" <prompt>
-
-This sets the config program's title bar if the config program chooses
-to use it. It should be placed at the top of the configuration, before any
-other statement.
-
-'#' Kconfig source file comment:
-
-An unquoted '#' character anywhere in a source file line indicates
-the beginning of a source file comment.  The remainder of that line
-is a comment.
-
-
-Kconfig hints
--------------
-This is a collection of Kconfig tips, most of which aren't obvious at
-first glance and most of which have become idioms in several Kconfig
-files.
-
-Adding common features and make the usage configurable
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-It is a common idiom to implement a feature/functionality that are
-relevant for some architectures but not all.
-The recommended way to do so is to use a config variable named HAVE_*
-that is defined in a common Kconfig file and selected by the relevant
-architectures.
-An example is the generic IOMAP functionality.
-
-We would in lib/Kconfig see:
-
-# Generic IOMAP is used to ...
-config HAVE_GENERIC_IOMAP
-
-config GENERIC_IOMAP
-	depends on HAVE_GENERIC_IOMAP && FOO
-
-And in lib/Makefile we would see:
-obj-$(CONFIG_GENERIC_IOMAP) += iomap.o
-
-For each architecture using the generic IOMAP functionality we would see:
-
-config X86
-	select ...
-	select HAVE_GENERIC_IOMAP
-	select ...
-
-Note: we use the existing config option and avoid creating a new
-config variable to select HAVE_GENERIC_IOMAP.
-
-Note: the use of the internal config variable HAVE_GENERIC_IOMAP, it is
-introduced to overcome the limitation of select which will force a
-config option to 'y' no matter the dependencies.
-The dependencies are moved to the symbol GENERIC_IOMAP and we avoid the
-situation where select forces a symbol equals to 'y'.
-
-Adding features that need compiler support
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-There are several features that need compiler support. The recommended way
-to describe the dependency on the compiler feature is to use "depends on"
-followed by a test macro.
-
-config STACKPROTECTOR
-	bool "Stack Protector buffer overflow detection"
-	depends on $(cc-option,-fstack-protector)
-	...
-
-If you need to expose a compiler capability to makefiles and/or C source files,
-CC_HAS_ is the recommended prefix for the config option.
-
-config CC_HAS_STACKPROTECTOR_NONE
-	def_bool $(cc-option,-fno-stack-protector)
-
-Build as module only
-~~~~~~~~~~~~~~~~~~~~
-To restrict a component build to module-only, qualify its config symbol
-with "depends on m".  E.g.:
-
-config FOO
-	depends on BAR && m
-
-limits FOO to module (=m) or disabled (=n).
-
-Kconfig recursive dependency limitations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-If you've hit the Kconfig error: "recursive dependency detected" you've run
-into a recursive dependency issue with Kconfig, a recursive dependency can be
-summarized as a circular dependency. The kconfig tools need to ensure that
-Kconfig files comply with specified configuration requirements. In order to do
-that kconfig must determine the values that are possible for all Kconfig
-symbols, this is currently not possible if there is a circular relation
-between two or more Kconfig symbols. For more details refer to the "Simple
-Kconfig recursive issue" subsection below. Kconfig does not do recursive
-dependency resolution; this has a few implications for Kconfig file writers.
-We'll first explain why this issues exists and then provide an example
-technical limitation which this brings upon Kconfig developers. Eager
-developers wishing to try to address this limitation should read the next
-subsections.
-
-Simple Kconfig recursive issue
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Read: Documentation/kbuild/Kconfig.recursion-issue-01
-
-Test with:
-
-make KBUILD_KCONFIG=Documentation/kbuild/Kconfig.recursion-issue-01 allnoconfig
-
-Cumulative Kconfig recursive issue
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Read: Documentation/kbuild/Kconfig.recursion-issue-02
-
-Test with:
-
-make KBUILD_KCONFIG=Documentation/kbuild/Kconfig.recursion-issue-02 allnoconfig
-
-Practical solutions to kconfig recursive issue
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Developers who run into the recursive Kconfig issue have two options
-at their disposal. We document them below and also provide a list of
-historical issues resolved through these different solutions.
-
-  a) Remove any superfluous "select FOO" or "depends on FOO"
-  b) Match dependency semantics:
-	b1) Swap all "select FOO" to "depends on FOO" or,
-	b2) Swap all "depends on FOO" to "select FOO"
-
-The resolution to a) can be tested with the sample Kconfig file
-Documentation/kbuild/Kconfig.recursion-issue-01 through the removal
-of the "select CORE" from CORE_BELL_A_ADVANCED as that is implicit already
-since CORE_BELL_A depends on CORE. At times it may not be possible to remove
-some dependency criteria, for such cases you can work with solution b).
-
-The two different resolutions for b) can be tested in the sample Kconfig file
-Documentation/kbuild/Kconfig.recursion-issue-02.
-
-Below is a list of examples of prior fixes for these types of recursive issues;
-all errors appear to involve one or more select's and one or more "depends on".
-
-commit          fix
-======          ===
-06b718c01208    select A -> depends on A
-c22eacfe82f9    depends on A -> depends on B
-6a91e854442c    select A -> depends on A
-118c565a8f2e    select A -> select B
-f004e5594705    select A -> depends on A
-c7861f37b4c6    depends on A -> (null)
-80c69915e5fb    select A -> (null)              (1)
-c2218e26c0d0    select A -> depends on A        (1)
-d6ae99d04e1c    select A -> depends on A
-95ca19cf8cbf    select A -> depends on A
-8f057d7bca54    depends on A -> (null)
-8f057d7bca54    depends on A -> select A
-a0701f04846e    select A -> depends on A
-0c8b92f7f259    depends on A -> (null)
-e4e9e0540928    select A -> depends on A        (2)
-7453ea886e87    depends on A > (null)           (1)
-7b1fff7e4fdf    select A -> depends on A
-86c747d2a4f0    select A -> depends on A
-d9f9ab51e55e    select A -> depends on A
-0c51a4d8abd6    depends on A -> select A        (3)
-e98062ed6dc4    select A -> depends on A        (3)
-91e5d284a7f1    select A -> (null)
-
-(1) Partial (or no) quote of error.
-(2) That seems to be the gist of that fix.
-(3) Same error.
-
-Future kconfig work
-~~~~~~~~~~~~~~~~~~~
-
-Work on kconfig is welcomed on both areas of clarifying semantics and on
-evaluating the use of a full SAT solver for it. A full SAT solver can be
-desirable to enable more complex dependency mappings and / or queries,
-for instance on possible use case for a SAT solver could be that of handling
-the current known recursive dependency issues. It is not known if this would
-address such issues but such evaluation is desirable. If support for a full SAT
-solver proves too complex or that it cannot address recursive dependency issues
-Kconfig should have at least clear and well defined semantics which also
-addresses and documents limitations or requirements such as the ones dealing
-with recursive dependencies.
-
-Further work on both of these areas is welcomed on Kconfig. We elaborate
-on both of these in the next two subsections.
-
-Semantics of Kconfig
-~~~~~~~~~~~~~~~~~~~~
-
-The use of Kconfig is broad, Linux is now only one of Kconfig's users:
-one study has completed a broad analysis of Kconfig use in 12 projects [0].
-Despite its widespread use, and although this document does a reasonable job
-in documenting basic Kconfig syntax a more precise definition of Kconfig
-semantics is welcomed. One project deduced Kconfig semantics through
-the use of the xconfig configurator [1]. Work should be done to confirm if
-the deduced semantics matches our intended Kconfig design goals.
-
-Having well defined semantics can be useful for tools for practical
-evaluation of depenencies, for instance one such use known case was work to
-express in boolean abstraction of the inferred semantics of Kconfig to
-translate Kconfig logic into boolean formulas and run a SAT solver on this to
-find dead code / features (always inactive), 114 dead features were found in
-Linux using this methodology [1] (Section 8: Threats to validity).
-
-Confirming this could prove useful as Kconfig stands as one of the the leading
-industrial variability modeling languages [1] [2]. Its study would help
-evaluate practical uses of such languages, their use was only theoretical
-and real world requirements were not well understood. As it stands though
-only reverse engineering techniques have been used to deduce semantics from
-variability modeling languages such as Kconfig [3].
-
-[0] http://www.eng.uwaterloo.ca/~shshe/kconfig_semantics.pdf
-[1] http://gsd.uwaterloo.ca/sites/default/files/vm-2013-berger.pdf
-[2] http://gsd.uwaterloo.ca/sites/default/files/ase241-berger_0.pdf
-[3] http://gsd.uwaterloo.ca/sites/default/files/icse2011.pdf
-
-Full SAT solver for Kconfig
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Although SAT solvers [0] haven't yet been used by Kconfig directly, as noted in
-the previous subsection, work has been done however to express in boolean
-abstraction the inferred semantics of Kconfig to translate Kconfig logic into
-boolean formulas and run a SAT solver on it [1]. Another known related project
-is CADOS [2] (former VAMOS [3]) and the tools, mainly undertaker [4], which has
-been introduced first with [5].  The basic concept of undertaker is to exract
-variability models from Kconfig, and put them together with a propositional
-formula extracted from CPP #ifdefs and build-rules into a SAT solver in order
-to find dead code, dead files, and dead symbols. If using a SAT solver is
-desirable on Kconfig one approach would be to evaluate repurposing such efforts
-somehow on Kconfig. There is enough interest from mentors of existing projects
-to not only help advise how to integrate this work upstream but also help
-maintain it long term. Interested developers should visit:
-
-http://kernelnewbies.org/KernelProjects/kconfig-sat
-
-[0] http://www.cs.cornell.edu/~sabhar/chapters/SATSolvers-KR-Handbook.pdf
-[1] http://gsd.uwaterloo.ca/sites/default/files/vm-2013-berger.pdf
-[2] https://cados.cs.fau.de
-[3] https://vamos.cs.fau.de
-[4] https://undertaker.cs.fau.de
-[5] https://www4.cs.fau.de/Publications/2011/tartler_11_eurosys.pdf
diff --git a/Documentation/kbuild/kconfig-macro-language.rst b/Documentation/kbuild/kconfig-macro-language.rst
new file mode 100644
index 000000000000..35b3263b7e40
--- /dev/null
+++ b/Documentation/kbuild/kconfig-macro-language.rst
@@ -0,0 +1,247 @@
+======================
+Kconfig macro language
+======================
+
+Concept
+-------
+
+The basic idea was inspired by Make. When we look at Make, we notice sort of
+two languages in one. One language describes dependency graphs consisting of
+targets and prerequisites. The other is a macro language for performing textual
+substitution.
+
+There is a clear distinction between the two language stages. For example, you
+can write a makefile as follows::
+
+    APP := foo
+    SRC := foo.c
+    CC := gcc
+
+    $(APP): $(SRC)
+            $(CC) -o $(APP) $(SRC)
+
+The macro language replaces the variable references with their expanded form,
+and handles the source file as if it had been written as follows::
+
+    foo: foo.c
+            gcc -o foo foo.c
+
+Then, Make analyzes the dependency graph and determines the targets to be
+updated.
+
+The idea is quite similar in Kconfig - it is possible to describe a Kconfig
+file like this::
+
+    CC := gcc
+
+    config CC_HAS_FOO
+            def_bool $(shell, $(srctree)/scripts/gcc-check-foo.sh $(CC))
+
+The macro language in Kconfig processes the source file into the following
+intermediate::
+
+    config CC_HAS_FOO
+            def_bool y
+
+Then, Kconfig moves on to the evaluation stage to resolve inter-symbol
+dependencies as explained in kconfig-language.rst.
+
+
+Variables
+---------
+
+Like in Make, a variable in Kconfig works as a macro variable.  A macro
+variable is expanded "in place" to yield a text string that may then be
+expanded further. To get the value of a variable, enclose the variable name in
+$( ). The parentheses are required even for single-letter variable names; $X is
+a syntax error. The curly brace form as in ${CC} is not supported either.
+
+There are two types of variables: simply expanded variables and recursively
+expanded variables.
+
+A simply expanded variable is defined using the := assignment operator. Its
+righthand side is expanded immediately upon reading the line from the Kconfig
+file.
+
+A recursively expanded variable is defined using the = assignment operator.
+Its righthand side is simply stored as the value of the variable without
+expanding it in any way. Instead, the expansion is performed when the variable
+is used.
+
+There is another type of assignment operator; += is used to append text to a
+variable. The righthand side of += is expanded immediately if the lefthand
+side was originally defined as a simple variable. Otherwise, its evaluation is
+deferred.
+
+The variable reference can take parameters, in the following form::
+
+  $(name,arg1,arg2,arg3)
+
+You can consider the parameterized reference as a function (more precisely, a
+"user-defined function", in contrast to the "built-in functions" listed below).
+
+Useful functions must be expanded when they are used since the same function is
+expanded differently if different parameters are passed. Hence, a user-defined
+function is defined using the = assignment operator. The parameters are
+referenced within the body definition with $(1), $(2), etc.
+
+In fact, recursively expanded variables and user-defined functions are the same
+internally. (In other words, a "variable" is a "function with zero arguments".)
+When we say "variable" in a broad sense, it includes "user-defined function".
+
+
+Built-in functions
+------------------
+
+Like Make, Kconfig provides several built-in functions. Every function takes a
+particular number of arguments.
+
+In Make, every built-in function takes at least one argument. Kconfig allows
+built-in functions with zero arguments, such as $(filename) and $(lineno). You
+could consider those "built-in variables", but that is just a matter of what we
+call them. Let's say "built-in function" here to refer to natively supported
+functionality.
+
+Kconfig currently supports the following built-in functions.
+
+ - $(shell,command)
+
+  The "shell" function accepts a single argument that is expanded and passed
+  to a subshell for execution. The standard output of the command is then read
+  and returned as the value of the function. Every newline in the output is
+  replaced with a space. Any trailing newlines are deleted. The standard error
+  is not returned, nor is any program exit status.
+
+ - $(info,text)
+
+  The "info" function takes a single argument and prints it to stdout.
+  It evaluates to an empty string.
+
+ - $(warning-if,condition,text)
+
+  The "warning-if" function takes two arguments. If the condition part is "y",
+  the text part is sent to stderr. The text is prefixed with the name of the
+  current Kconfig file and the current line number.
+
+ - $(error-if,condition,text)
+
+  The "error-if" function is similar to "warning-if", but it terminates the
+  parsing immediately if the condition part is "y".
+
+ - $(filename)
+
+  The "filename" function takes no arguments; $(filename) is expanded to the
+  name of the file being parsed.
+
+ - $(lineno)
+
+  The "lineno" function takes no arguments; $(lineno) is expanded to the line
+  number being parsed.
+
+
+Make vs Kconfig
+---------------
+
+Kconfig adopts a Make-like macro language, but the function call syntax is
+slightly different.
+
+A function call in Make looks like this::
+
+  $(func-name arg1,arg2,arg3)
+
+The function name and the first argument are separated by at least one
+whitespace character. Leading whitespace is then trimmed from the first
+argument, while whitespace in the other arguments is kept. You need a trick
+to start the first argument with spaces. For example, to make the "info"
+function print "  hello", you can write::
+
+  empty :=
+  space := $(empty) $(empty)
+  $(info $(space)$(space)hello)
+
+Kconfig uses only commas as delimiters, and keeps all whitespace in the
+function call. Some people prefer putting a space after each comma delimiter::
+
+  $(func-name, arg1, arg2, arg3)
+
+In this case, "func-name" will receive " arg1", " arg2", " arg3". The presence
+of leading spaces may matter depending on the function. The same applies to
+Make - for example, $(subst .c, .o, $(sources)) is a typical mistake; it
+replaces ".c" with " .o".
+
+In Make, a user-defined function is referenced by using a built-in function,
+'call', like this::
+
+    $(call my-func,arg1,arg2,arg3)
+
+Kconfig invokes user-defined functions and built-in functions in the same way.
+The omission of 'call' makes the syntax shorter.
+
+In Make, some functions treat commas verbatim instead of argument separators.
+For example, $(shell echo hello, world) runs the command "echo hello, world".
+Likewise, $(info hello, world) prints "hello, world" to stdout. You could say
+this is a *useful* inconsistency.
+
+In Kconfig, for simpler implementation and grammatical consistency, commas that
+appear in the $( ) context are always delimiters. It means::
+
+  $(shell, echo hello, world)
+
+is an error because it is passing two parameters where the 'shell' function
+accepts only one. To pass commas in arguments, you can use the following trick::
+
+  comma := ,
+  $(shell, echo hello$(comma) world)
+
+
+Caveats
+-------
+
+A variable (or function) cannot be expanded across tokens. So, you cannot use
+a variable as a shorthand for an expression that consists of multiple tokens.
+The following works::
+
+    RANGE_MIN := 1
+    RANGE_MAX := 3
+
+    config FOO
+            int "foo"
+            range $(RANGE_MIN) $(RANGE_MAX)
+
+But, the following does not work::
+
+    RANGES := 1 3
+
+    config FOO
+            int "foo"
+            range $(RANGES)
+
+A variable cannot be expanded to any keyword in Kconfig.  The following does
+not work::
+
+    MY_TYPE := tristate
+
+    config FOO
+            $(MY_TYPE) "foo"
+            default y
+
+By design, $(shell,command) is expanded in the textual substitution phase,
+so you cannot pass symbols to the 'shell' function.
+
+The following does not work as expected::
+
+    config ENDIAN_FLAG
+            string
+            default "-mbig-endian" if CPU_BIG_ENDIAN
+            default "-mlittle-endian" if CPU_LITTLE_ENDIAN
+
+    config CC_HAS_ENDIAN_FLAG
+            def_bool $(shell,$(srctree)/scripts/gcc-check-flag ENDIAN_FLAG)
+
+Instead, write it as follows so that every function call is statically
+expanded::
+
+    config CC_HAS_ENDIAN_FLAG
+            bool
+            default $(shell,$(srctree)/scripts/gcc-check-flag -mbig-endian) if CPU_BIG_ENDIAN
+            default $(shell,$(srctree)/scripts/gcc-check-flag -mlittle-endian) if CPU_LITTLE_ENDIAN
diff --git a/Documentation/kbuild/kconfig-macro-language.txt b/Documentation/kbuild/kconfig-macro-language.txt
deleted file mode 100644
index 07da2ea68dce..000000000000
--- a/Documentation/kbuild/kconfig-macro-language.txt
+++ /dev/null
@@ -1,242 +0,0 @@
-Concept
--------
-
-The basic idea was inspired by Make. When we look at Make, we notice sort of
-two languages in one. One language describes dependency graphs consisting of
-targets and prerequisites. The other is a macro language for performing textual
-substitution.
-
-There is clear distinction between the two language stages. For example, you
-can write a makefile like follows:
-
-    APP := foo
-    SRC := foo.c
-    CC := gcc
-
-    $(APP): $(SRC)
-            $(CC) -o $(APP) $(SRC)
-
-The macro language replaces the variable references with their expanded form,
-and handles as if the source file were input like follows:
-
-    foo: foo.c
-            gcc -o foo foo.c
-
-Then, Make analyzes the dependency graph and determines the targets to be
-updated.
-
-The idea is quite similar in Kconfig - it is possible to describe a Kconfig
-file like this:
-
-    CC := gcc
-
-    config CC_HAS_FOO
-            def_bool $(shell, $(srctree)/scripts/gcc-check-foo.sh $(CC))
-
-The macro language in Kconfig processes the source file into the following
-intermediate:
-
-    config CC_HAS_FOO
-            def_bool y
-
-Then, Kconfig moves onto the evaluation stage to resolve inter-symbol
-dependency as explained in kconfig-language.txt.
-
-
-Variables
----------
-
-Like in Make, a variable in Kconfig works as a macro variable.  A macro
-variable is expanded "in place" to yield a text string that may then be
-expanded further. To get the value of a variable, enclose the variable name in
-$( ). The parentheses are required even for single-letter variable names; $X is
-a syntax error. The curly brace form as in ${CC} is not supported either.
-
-There are two types of variables: simply expanded variables and recursively
-expanded variables.
-
-A simply expanded variable is defined using the := assignment operator. Its
-righthand side is expanded immediately upon reading the line from the Kconfig
-file.
-
-A recursively expanded variable is defined using the = assignment operator.
-Its righthand side is simply stored as the value of the variable without
-expanding it in any way. Instead, the expansion is performed when the variable
-is used.
-
-There is another type of assignment operator; += is used to append text to a
-variable. The righthand side of += is expanded immediately if the lefthand
-side was originally defined as a simple variable. Otherwise, its evaluation is
-deferred.
-
-The variable reference can take parameters, in the following form:
-
-  $(name,arg1,arg2,arg3)
-
-You can consider the parameterized reference as a function. (more precisely,
-"user-defined function" in contrast to "built-in function" listed below).
-
-Useful functions must be expanded when they are used since the same function is
-expanded differently if different parameters are passed. Hence, a user-defined
-function is defined using the = assignment operator. The parameters are
-referenced within the body definition with $(1), $(2), etc.
-
-In fact, recursively expanded variables and user-defined functions are the same
-internally. (In other words, "variable" is "function with zero argument".)
-When we say "variable" in a broad sense, it includes "user-defined function".
-
-
-Built-in functions
-------------------
-
-Like Make, Kconfig provides several built-in functions. Every function takes a
-particular number of arguments.
-
-In Make, every built-in function takes at least one argument. Kconfig allows
-zero argument for built-in functions, such as $(fileno), $(lineno). You could
-consider those as "built-in variable", but it is just a matter of how we call
-it after all. Let's say "built-in function" here to refer to natively supported
-functionality.
-
-Kconfig currently supports the following built-in functions.
-
- - $(shell,command)
-
-  The "shell" function accepts a single argument that is expanded and passed
-  to a subshell for execution. The standard output of the command is then read
-  and returned as the value of the function. Every newline in the output is
-  replaced with a space. Any trailing newlines are deleted. The standard error
-  is not returned, nor is any program exit status.
-
- - $(info,text)
-
-  The "info" function takes a single argument and prints it to stdout.
-  It evaluates to an empty string.
-
- - $(warning-if,condition,text)
-
-  The "warning-if" function takes two arguments. If the condition part is "y",
-  the text part is sent to stderr. The text is prefixed with the name of the
-  current Kconfig file and the current line number.
-
- - $(error-if,condition,text)
-
-  The "error-if" function is similar to "warning-if", but it terminates the
-  parsing immediately if the condition part is "y".
-
- - $(filename)
-
-  The 'filename' takes no argument, and $(filename) is expanded to the file
-  name being parsed.
-
- - $(lineno)
-
-  The 'lineno' takes no argument, and $(lineno) is expanded to the line number
-  being parsed.
-
-
-Make vs Kconfig
----------------
-
-Kconfig adopts Make-like macro language, but the function call syntax is
-slightly different.
-
-A function call in Make looks like this:
-
-  $(func-name arg1,arg2,arg3)
-
-The function name and the first argument are separated by at least one
-whitespace. Then, leading whitespaces are trimmed from the first argument,
-while whitespaces in the other arguments are kept. You need to use a kind of
-trick to start the first parameter with spaces. For example, if you want
-to make "info" function print "  hello", you can write like follows:
-
-  empty :=
-  space := $(empty) $(empty)
-  $(info $(space)$(space)hello)
-
-Kconfig uses only commas for delimiters, and keeps all whitespaces in the
-function call. Some people prefer putting a space after each comma delimiter:
-
-  $(func-name, arg1, arg2, arg3)
-
-In this case, "func-name" will receive " arg1", " arg2", " arg3". The presence
-of leading spaces may matter depending on the function. The same applies to
-Make - for example, $(subst .c, .o, $(sources)) is a typical mistake; it
-replaces ".c" with " .o".
-
-In Make, a user-defined function is referenced by using a built-in function,
-'call', like this:
-
-    $(call my-func,arg1,arg2,arg3)
-
-Kconfig invokes user-defined functions and built-in functions in the same way.
-The omission of 'call' makes the syntax shorter.
-
-In Make, some functions treat commas verbatim instead of argument separators.
-For example, $(shell echo hello, world) runs the command "echo hello, world".
-Likewise, $(info hello, world) prints "hello, world" to stdout. You could say
-this is _useful_ inconsistency.
-
-In Kconfig, for simpler implementation and grammatical consistency, commas that
-appear in the $( ) context are always delimiters. It means
-
-  $(shell, echo hello, world)
-
-is an error because it is passing two parameters where the 'shell' function
-accepts only one. To pass commas in arguments, you can use the following trick:
-
-  comma := ,
-  $(shell, echo hello$(comma) world)
-
-
-Caveats
--------
-
-A variable (or function) cannot be expanded across tokens. So, you cannot use
-a variable as a shorthand for an expression that consists of multiple tokens.
-The following works:
-
-    RANGE_MIN := 1
-    RANGE_MAX := 3
-
-    config FOO
-            int "foo"
-            range $(RANGE_MIN) $(RANGE_MAX)
-
-But, the following does not work:
-
-    RANGES := 1 3
-
-    config FOO
-            int "foo"
-            range $(RANGES)
-
-A variable cannot be expanded to any keyword in Kconfig.  The following does
-not work:
-
-    MY_TYPE := tristate
-
-    config FOO
-            $(MY_TYPE) "foo"
-            default y
-
-Obviously from the design, $(shell command) is expanded in the textual
-substitution phase. You cannot pass symbols to the 'shell' function.
-The following does not work as expected.
-
-    config ENDIAN_FLAG
-            string
-            default "-mbig-endian" if CPU_BIG_ENDIAN
-            default "-mlittle-endian" if CPU_LITTLE_ENDIAN
-
-    config CC_HAS_ENDIAN_FLAG
-            def_bool $(shell $(srctree)/scripts/gcc-check-flag ENDIAN_FLAG)
-
-Instead, you can do like follows so that any function call is statically
-expanded.
-
-    config CC_HAS_ENDIAN_FLAG
-            bool
-            default $(shell $(srctree)/scripts/gcc-check-flag -mbig-endian) if CPU_BIG_ENDIAN
-            default $(shell $(srctree)/scripts/gcc-check-flag -mlittle-endian) if CPU_LITTLE_ENDIAN
diff --git a/Documentation/kbuild/kconfig.rst b/Documentation/kbuild/kconfig.rst
new file mode 100644
index 000000000000..88129af7e539
--- /dev/null
+++ b/Documentation/kbuild/kconfig.rst
@@ -0,0 +1,300 @@
+===================
+Kconfig make config
+===================
+
+This file contains some assistance for using `make *config`.
+
+Use "make help" to list all of the possible configuration targets.
+
+The xconfig ('qconf'), menuconfig ('mconf'), and nconfig ('nconf')
+programs also have embedded help text.  Be sure to check that for
+navigation, search, and other general help text.
+
+General
+-------
+
+New kernel releases often introduce new config symbols.  More
+importantly, new kernel releases may rename config symbols.  When
+this happens, using a previously working .config file and running
+"make oldconfig" won't necessarily produce a working new kernel
+for you, so you may find that you need to see what NEW kernel
+symbols have been introduced.
+
+To see a list of new config symbols, use::
+
+	cp user/some/old.config .config
+	make listnewconfig
+
+and the config program will list any new symbols, one per line.
+
+Alternatively, you can use the brute force method::
+
+	make oldconfig
+	scripts/diffconfig .config.old .config | less
+
+----------------------------------------------------------------------
+
+Environment variables for `*config`
+
+KCONFIG_CONFIG
+--------------
+This environment variable can be used to specify a default kernel config
+file name to override the default name of ".config".
+
+KCONFIG_OVERWRITECONFIG
+-----------------------
+If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not
+break symlinks when .config is a symlink to somewhere else.
+
+`CONFIG_`
+---------
+If you set `CONFIG_` in the environment, Kconfig will prefix all symbols
+with its value when saving the configuration, instead of using the default,
+`CONFIG_`.
+
+----------------------------------------------------------------------
+
+Environment variables for '{allyes/allmod/allno/rand}config'
+
+KCONFIG_ALLCONFIG
+-----------------
+(partially based on lkml email from/by Rob Landley, re: miniconfig)
+
+--------------------------------------------------
+
+The allyesconfig/allmodconfig/allnoconfig/randconfig variants can also
+use the environment variable KCONFIG_ALLCONFIG as a flag or a filename
+that contains config symbols that the user requires to be set to a
+specific value.  If KCONFIG_ALLCONFIG is used without a filename where
+KCONFIG_ALLCONFIG == "" or KCONFIG_ALLCONFIG == "1", `make *config`
+checks for a file named "all{yes/mod/no/def/random}.config"
+(corresponding to the `*config` command that was used) for symbol values
+that are to be forced.  If this file is not found, it checks for a
+file named "all.config" to contain forced values.
+
+This enables you to create "miniature" config (miniconfig) or custom
+config files containing just the config symbols that you are interested
+in.  Then the kernel config system generates the full .config file,
+including symbols of your miniconfig file.
+
+This 'KCONFIG_ALLCONFIG' file is a config file which contains
+(usually a subset of all) preset config symbols.  These variable
+settings are still subject to normal dependency checks.
+
+Examples::
+
+	KCONFIG_ALLCONFIG=custom-notebook.config make allnoconfig
+
+or::
+
+	KCONFIG_ALLCONFIG=mini.config make allnoconfig
+
+or::
+
+	make KCONFIG_ALLCONFIG=mini.config allnoconfig
+
+These examples will disable most options (allnoconfig) but enable or
+disable the options that are explicitly listed in the specified
+mini-config files.
+
+----------------------------------------------------------------------
+
+Environment variables for 'randconfig'
+
+KCONFIG_SEED
+------------
+You can set this to the integer value used to seed the RNG, if you want
+to somehow debug the behaviour of the kconfig parser/frontends.
+If not set, the current time will be used.
+
+KCONFIG_PROBABILITY
+-------------------
+This variable can be used to skew the probabilities. It can be unset
+or empty, or set in one of three formats:
+
+    =======================     ==================  =====================
+	KCONFIG_PROBABILITY     y:n split           y:m:n split
+    =======================     ==================  =====================
+	unset or empty          50  : 50            33  : 33  : 34
+	N                        N  : 100-N         N/2 : N/2 : 100-N
+    [1] N:M                     N+M : 100-(N+M)      N  :  M  : 100-(N+M)
+    [2] N:M:L                    N  : 100-N          M  :  L  : 100-(M+L)
+    =======================     ==================  =====================
+
+where N, M and L are integers (in base 10) in the range [0,100], and so
+that:
+
+    [1] N+M is in the range [0,100]
+
+    [2] M+L is in the range [0,100]
+
+Examples::
+
+	KCONFIG_PROBABILITY=10
+		10% of booleans will be set to 'y', 90% to 'n'
+		5% of tristates will be set to 'y', 5% to 'm', 90% to 'n'
+	KCONFIG_PROBABILITY=15:25
+		40% of booleans will be set to 'y', 60% to 'n'
+		15% of tristates will be set to 'y', 25% to 'm', 60% to 'n'
+	KCONFIG_PROBABILITY=10:15:15
+		10% of booleans will be set to 'y', 90% to 'n'
+		15% of tristates will be set to 'y', 15% to 'm', 70% to 'n'
+
+----------------------------------------------------------------------
+
+Environment variables for 'syncconfig'
+
+KCONFIG_NOSILENTUPDATE
+----------------------
+If this variable has a non-blank value, it prevents silent kernel
+config updates (requires explicit updates).
+
+KCONFIG_AUTOCONFIG
+------------------
+This environment variable can be set to specify the path & name of the
+"auto.conf" file.  Its default value is "include/config/auto.conf".
+
+KCONFIG_TRISTATE
+----------------
+This environment variable can be set to specify the path & name of the
+"tristate.conf" file.  Its default value is "include/config/tristate.conf".
+
+KCONFIG_AUTOHEADER
+------------------
+This environment variable can be set to specify the path & name of the
+"autoconf.h" (header) file.
+Its default value is "include/generated/autoconf.h".
+
+
+----------------------------------------------------------------------
+
+menuconfig
+----------
+
+SEARCHING for CONFIG symbols
+
+Searching in menuconfig:
+
+	The Search function searches for kernel configuration symbol
+	names, so you have to know something close to what you are
+	looking for.
+
+	Example::
+
+		/hotplug
+		This lists all config symbols that contain "hotplug",
+		e.g., HOTPLUG_CPU, MEMORY_HOTPLUG.
+
+	For search help, enter / followed by TAB-TAB (to highlight
+	<Help>) and Enter.  This will tell you that you can also use
+	regular expressions (regexes) in the search string, so if you
+	are not interested in MEMORY_HOTPLUG, you could try::
+
+		/^hotplug
+
+	When searching, symbols are sorted thus:
+
+	  - first, exact matches, sorted alphabetically (an exact match
+	    is when the search matches the complete symbol name);
+	  - then, other matches, sorted alphabetically.
+
+	For example: ^ATH.K matches:
+
+	    ATH5K ATH9K ATH5K_AHB ATH5K_DEBUG [...] ATH6KL ATH6KL_DEBUG
+	    [...] ATH9K_AHB ATH9K_BTCOEX_SUPPORT ATH9K_COMMON [...]
+
+	of which only ATH5K and ATH9K match exactly and so are sorted
+	first (and in alphabetical order), then come all other symbols,
+	sorted in alphabetical order.
+
+----------------------------------------------------------------------
+
+User interface options for 'menuconfig'
+
+MENUCONFIG_COLOR
+----------------
+It is possible to select different color themes using the variable
+MENUCONFIG_COLOR.  To select a theme use::
+
+	make MENUCONFIG_COLOR=<theme> menuconfig
+
+Available themes are::
+
+  - mono       => selects colors suitable for monochrome displays
+  - blackbg    => selects a color scheme with black background
+  - classic    => theme with blue background. The classic look
+  - bluetitle  => an LCD friendly version of classic. (default)
+
+MENUCONFIG_MODE
+---------------
+This mode shows all sub-menus in one large tree.
+
+Example::
+
+	make MENUCONFIG_MODE=single_menu menuconfig
+
+----------------------------------------------------------------------
+
+nconfig
+-------
+
+nconfig is an alternate text-based configurator.  It lists function
+keys across the bottom of the terminal (window) that execute commands.
+You can also just use the corresponding numeric key to execute the
+commands unless you are in a data entry window.  E.g., instead of F6
+for Save, you can just press 6.
+
+Use F1 for Global help or F3 for the Short help menu.
+
+Searching in nconfig:
+
+	You can search either in the menu entry "prompt" strings
+	or in the configuration symbols.
+
+	Use / to begin a search through the menu entries.  This does
+	not support regular expressions.  Use <Down> or <Up> for
+	Next hit and Previous hit, respectively.  Use <Esc> to
+	terminate the search mode.
+
+	F8 (SymSearch) searches the configuration symbols for the
+	given string or regular expression (regex).
+
+NCONFIG_MODE
+------------
+This mode shows all sub-menus in one large tree.
+Example::
+
+	make NCONFIG_MODE=single_menu nconfig
+
+----------------------------------------------------------------------
+
+xconfig
+-------
+
+Searching in xconfig:
+
+	The Search function searches for kernel configuration symbol
+	names, so you have to know something close to what you are
+	looking for.
+
+	Example:
+		Ctrl-F hotplug
+	or
+		Menu: File, Search, hotplug
+
+	lists all config symbol entries that contain "hotplug" in
+	the symbol name.  In this Search dialog, you may change the
+	config setting for any of the entries that are not grayed out.
+	You can also enter a different search string without having
+	to return to the main menu.
+
+
+----------------------------------------------------------------------
+
+gconfig
+-------
+
+Searching in gconfig:
+
+	There is no search command in gconfig.  However, gconfig does
+	have several different viewing choices, modes, and options.
diff --git a/Documentation/kbuild/kconfig.txt b/Documentation/kbuild/kconfig.txt
deleted file mode 100644
index 68c82914c0f3..000000000000
--- a/Documentation/kbuild/kconfig.txt
+++ /dev/null
@@ -1,272 +0,0 @@
-This file contains some assistance for using "make *config".
-
-Use "make help" to list all of the possible configuration targets.
-
-The xconfig ('qconf'), menuconfig ('mconf'), and nconfig ('nconf')
-programs also have embedded help text.  Be sure to check that for
-navigation, search, and other general help text.
-
-======================================================================
-General
---------------------------------------------------
-
-New kernel releases often introduce new config symbols.  Often more
-important, new kernel releases may rename config symbols.  When
-this happens, using a previously working .config file and running
-"make oldconfig" won't necessarily produce a working new kernel
-for you, so you may find that you need to see what NEW kernel
-symbols have been introduced.
-
-To see a list of new config symbols, use
-
-	cp user/some/old.config .config
-	make listnewconfig
-
-and the config program will list any new symbols, one per line.
-
-Alternatively, you can use the brute force method:
-
-	make oldconfig
-	scripts/diffconfig .config.old .config | less
-
-______________________________________________________________________
-Environment variables for '*config'
-
-KCONFIG_CONFIG
---------------------------------------------------
-This environment variable can be used to specify a default kernel config
-file name to override the default name of ".config".
-
-KCONFIG_OVERWRITECONFIG
---------------------------------------------------
-If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not
-break symlinks when .config is a symlink to somewhere else.
-
-CONFIG_
---------------------------------------------------
-If you set CONFIG_ in the environment, Kconfig will prefix all symbols
-with its value when saving the configuration, instead of using the default,
-"CONFIG_".
-
-______________________________________________________________________
-Environment variables for '{allyes/allmod/allno/rand}config'
-
-KCONFIG_ALLCONFIG
---------------------------------------------------
-(partially based on lkml email from/by Rob Landley, re: miniconfig)
---------------------------------------------------
-The allyesconfig/allmodconfig/allnoconfig/randconfig variants can also
-use the environment variable KCONFIG_ALLCONFIG as a flag or a filename
-that contains config symbols that the user requires to be set to a
-specific value.  If KCONFIG_ALLCONFIG is used without a filename where
-KCONFIG_ALLCONFIG == "" or KCONFIG_ALLCONFIG == "1", "make *config"
-checks for a file named "all{yes/mod/no/def/random}.config"
-(corresponding to the *config command that was used) for symbol values
-that are to be forced.  If this file is not found, it checks for a
-file named "all.config" to contain forced values.
-
-This enables you to create "miniature" config (miniconfig) or custom
-config files containing just the config symbols that you are interested
-in.  Then the kernel config system generates the full .config file,
-including symbols of your miniconfig file.
-
-This 'KCONFIG_ALLCONFIG' file is a config file which contains
-(usually a subset of all) preset config symbols.  These variable
-settings are still subject to normal dependency checks.
-
-Examples:
-	KCONFIG_ALLCONFIG=custom-notebook.config make allnoconfig
-or
-	KCONFIG_ALLCONFIG=mini.config make allnoconfig
-or
-	make KCONFIG_ALLCONFIG=mini.config allnoconfig
-
-These examples will disable most options (allnoconfig) but enable or
-disable the options that are explicitly listed in the specified
-mini-config files.
-
-______________________________________________________________________
-Environment variables for 'randconfig'
-
-KCONFIG_SEED
---------------------------------------------------
-You can set this to the integer value used to seed the RNG, if you want
-to somehow debug the behaviour of the kconfig parser/frontends.
-If not set, the current time will be used.
-
-KCONFIG_PROBABILITY
---------------------------------------------------
-This variable can be used to skew the probabilities. This variable can
-be unset or empty, or set to three different formats:
-	KCONFIG_PROBABILITY     y:n split           y:m:n split
-	-----------------------------------------------------------------
-	unset or empty          50  : 50            33  : 33  : 34
-	N                        N  : 100-N         N/2 : N/2 : 100-N
-    [1] N:M                     N+M : 100-(N+M)      N  :  M  : 100-(N+M)
-    [2] N:M:L                    N  : 100-N          M  :  L  : 100-(M+L)
-
-where N, M and L are integers (in base 10) in the range [0,100], and so
-that:
-    [1] N+M is in the range [0,100]
-    [2] M+L is in the range [0,100]
-
-Examples:
-	KCONFIG_PROBABILITY=10
-		10% of booleans will be set to 'y', 90% to 'n'
-		5% of tristates will be set to 'y', 5% to 'm', 90% to 'n'
-	KCONFIG_PROBABILITY=15:25
-		40% of booleans will be set to 'y', 60% to 'n'
-		15% of tristates will be set to 'y', 25% to 'm', 60% to 'n'
-	KCONFIG_PROBABILITY=10:15:15
-		10% of booleans will be set to 'y', 90% to 'n'
-		15% of tristates will be set to 'y', 15% to 'm', 70% to 'n'
-
-______________________________________________________________________
-Environment variables for 'syncconfig'
-
-KCONFIG_NOSILENTUPDATE
---------------------------------------------------
-If this variable has a non-blank value, it prevents silent kernel
-config updates (requires explicit updates).
-
-KCONFIG_AUTOCONFIG
---------------------------------------------------
-This environment variable can be set to specify the path & name of the
-"auto.conf" file.  Its default value is "include/config/auto.conf".
-
-KCONFIG_TRISTATE
---------------------------------------------------
-This environment variable can be set to specify the path & name of the
-"tristate.conf" file.  Its default value is "include/config/tristate.conf".
-
-KCONFIG_AUTOHEADER
---------------------------------------------------
-This environment variable can be set to specify the path & name of the
-"autoconf.h" (header) file.
-Its default value is "include/generated/autoconf.h".
-
-
-======================================================================
-menuconfig
---------------------------------------------------
-
-SEARCHING for CONFIG symbols
-
-Searching in menuconfig:
-
-	The Search function searches for kernel configuration symbol
-	names, so you have to know something close to what you are
-	looking for.
-
-	Example:
-		/hotplug
-		This lists all config symbols that contain "hotplug",
-		e.g., HOTPLUG_CPU, MEMORY_HOTPLUG.
-
-	For search help, enter / followed by TAB-TAB (to highlight
-	<Help>) and Enter.  This will tell you that you can also use
-	regular expressions (regexes) in the search string, so if you
-	are not interested in MEMORY_HOTPLUG, you could try
-
-		/^hotplug
-
-	When searching, symbols are sorted thus:
-	  - first, exact matches, sorted alphabetically (an exact match
-	    is when the search matches the complete symbol name);
-	  - then, other matches, sorted alphabetically.
-	For example: ^ATH.K matches:
-	    ATH5K ATH9K ATH5K_AHB ATH5K_DEBUG [...] ATH6KL ATH6KL_DEBUG
-	    [...] ATH9K_AHB ATH9K_BTCOEX_SUPPORT ATH9K_COMMON [...]
-	of which only ATH5K and ATH9K match exactly and so are sorted
-	first (and in alphabetical order), then come all other symbols,
-	sorted in alphabetical order.
-
-______________________________________________________________________
-User interface options for 'menuconfig'
-
-MENUCONFIG_COLOR
---------------------------------------------------
-It is possible to select different color themes using the variable
-MENUCONFIG_COLOR.  To select a theme use:
-
-	make MENUCONFIG_COLOR=<theme> menuconfig
-
-Available themes are:
-  mono       => selects colors suitable for monochrome displays
-  blackbg    => selects a color scheme with black background
-  classic    => theme with blue background. The classic look
-  bluetitle  => a LCD friendly version of classic. (default)
-
-MENUCONFIG_MODE
---------------------------------------------------
-This mode shows all sub-menus in one large tree.
-
-Example:
-	make MENUCONFIG_MODE=single_menu menuconfig
-
-
-======================================================================
-nconfig
---------------------------------------------------
-
-nconfig is an alternate text-based configurator.  It lists function
-keys across the bottom of the terminal (window) that execute commands.
-You can also just use the corresponding numeric key to execute the
-commands unless you are in a data entry window.  E.g., instead of F6
-for Save, you can just press 6.
-
-Use F1 for Global help or F3 for the Short help menu.
-
-Searching in nconfig:
-
-	You can search either in the menu entry "prompt" strings
-	or in the configuration symbols.
-
-	Use / to begin a search through the menu entries.  This does
-	not support regular expressions.  Use <Down> or <Up> for
-	Next hit and Previous hit, respectively.  Use <Esc> to
-	terminate the search mode.
-
-	F8 (SymSearch) searches the configuration symbols for the
-	given string or regular expression (regex).
-
-NCONFIG_MODE
---------------------------------------------------
-This mode shows all sub-menus in one large tree.
-
-Example:
-	make NCONFIG_MODE=single_menu nconfig
-
-
-======================================================================
-xconfig
---------------------------------------------------
-
-Searching in xconfig:
-
-	The Search function searches for kernel configuration symbol
-	names, so you have to know something close to what you are
-	looking for.
-
-	Example:
-		Ctrl-F hotplug
-	or
-		Menu: File, Search, hotplug
-
-	lists all config symbol entries that contain "hotplug" in
-	the symbol name.  In this Search dialog, you may change the
-	config setting for any of the entries that are not grayed out.
-	You can also enter a different search string without having
-	to return to the main menu.
-
-
-======================================================================
-gconfig
---------------------------------------------------
-
-Searching in gconfig:
-
-	There is no search command in gconfig.  However, gconfig does
-	have several different viewing choices, modes, and options.
-
-###
diff --git a/Documentation/kbuild/makefiles.rst b/Documentation/kbuild/makefiles.rst
new file mode 100644
index 000000000000..9274cdcc9bd2
--- /dev/null
+++ b/Documentation/kbuild/makefiles.rst
@@ -0,0 +1,1509 @@
+======================
+Linux Kernel Makefiles
+======================
+
+This document describes the Linux kernel Makefiles.
+
+.. Table of Contents
+
+	=== 1 Overview
+	=== 2 Who does what
+	=== 3 The kbuild files
+	   --- 3.1 Goal definitions
+	   --- 3.2 Built-in object goals - obj-y
+	   --- 3.3 Loadable module goals - obj-m
+	   --- 3.4 Objects which export symbols
+	   --- 3.5 Library file goals - lib-y
+	   --- 3.6 Descending down in directories
+	   --- 3.7 Compilation flags
+	   --- 3.8 Command line dependency
+	   --- 3.9 Dependency tracking
+	   --- 3.10 Special Rules
+	   --- 3.11 $(CC) support functions
+	   --- 3.12 $(LD) support functions
+
+	=== 4 Host Program support
+	   --- 4.1 Simple Host Program
+	   --- 4.2 Composite Host Programs
+	   --- 4.3 Using C++ for host programs
+	   --- 4.4 Controlling compiler options for host programs
+	   --- 4.5 When host programs are actually built
+	   --- 4.6 Using hostprogs-$(CONFIG_FOO)
+
+	=== 5 Kbuild clean infrastructure
+
+	=== 6 Architecture Makefiles
+	   --- 6.1 Set variables to tweak the build to the architecture
+	   --- 6.2 Add prerequisites to archheaders:
+	   --- 6.3 Add prerequisites to archprepare:
+	   --- 6.4 List directories to visit when descending
+	   --- 6.5 Architecture-specific boot images
+	   --- 6.6 Building non-kbuild targets
+	   --- 6.7 Commands useful for building a boot image
+	   --- 6.8 Custom kbuild commands
+	   --- 6.9 Preprocessing linker scripts
+	   --- 6.10 Generic header files
+	   --- 6.11 Post-link pass
+
+	=== 7 Kbuild syntax for exported headers
+		--- 7.1 no-export-headers
+		--- 7.2 generic-y
+		--- 7.3 generated-y
+		--- 7.4 mandatory-y
+
+	=== 8 Kbuild Variables
+	=== 9 Makefile language
+	=== 10 Credits
+	=== 11 TODO
+
+1 Overview
+==========
+
+The Makefiles have five parts::
+
+	Makefile		the top Makefile.
+	.config			the kernel configuration file.
+	arch/$(ARCH)/Makefile	the arch Makefile.
+	scripts/Makefile.*	common rules etc. for all kbuild Makefiles.
+	kbuild Makefiles	there are about 500 of these.
+
+The top Makefile reads the .config file, which comes from the kernel
+configuration process.
+
+The top Makefile is responsible for building two major products: vmlinux
+(the resident kernel image) and modules (any module files).
+It builds these goals by recursively descending into the subdirectories of
+the kernel source tree.
+The list of subdirectories which are visited depends upon the kernel
+configuration. The top Makefile textually includes an arch Makefile
+with the name arch/$(ARCH)/Makefile. The arch Makefile supplies
+architecture-specific information to the top Makefile.
+
+Each subdirectory has a kbuild Makefile which carries out the commands
+passed down from above. The kbuild Makefile uses information from the
+.config file to construct various file lists used by kbuild to build
+any built-in or modular targets.
+
+scripts/Makefile.* contains all the definitions/rules etc. that
+are used to build the kernel based on the kbuild makefiles.
+
+
+2 Who does what
+===============
+
+People have four different relationships with the kernel Makefiles.
+
+*Users* are people who build kernels.  These people type commands such as
+"make menuconfig" or "make".  They usually do not read or edit
+any kernel Makefiles (or any other source files).
+
+*Normal developers* are people who work on features such as device
+drivers, file systems, and network protocols.  These people need to
+maintain the kbuild Makefiles for the subsystem they are
+working on.  In order to do this effectively, they need some overall
+knowledge about the kernel Makefiles, plus detailed knowledge about the
+public interface for kbuild.
+
+*Arch developers* are people who work on an entire architecture, such
+as sparc or ia64.  Arch developers need to know about the arch Makefile
+as well as kbuild Makefiles.
+
+*Kbuild developers* are people who work on the kernel build system itself.
+These people need to know about all aspects of the kernel Makefiles.
+
+This document is aimed towards normal developers and arch developers.
+
+
+3 The kbuild files
+==================
+
+Most Makefiles within the kernel are kbuild Makefiles that use the
+kbuild infrastructure. This chapter introduces the syntax used in the
+kbuild makefiles.
+The preferred name for the kbuild files is 'Makefile', but 'Kbuild' can
+be used; if both a 'Makefile' and a 'Kbuild' file exist, then the 'Kbuild'
+file will be used.
+
+Section 3.1 "Goal definitions" is a quick intro; further chapters provide
+more details, with real examples.
+
+3.1 Goal definitions
+--------------------
+
+	Goal definitions are the main part (heart) of the kbuild Makefile.
+	These lines define the files to be built, any special compilation
+	options, and any subdirectories to be entered recursively.
+
+	The simplest kbuild makefile contains one line:
+
+	Example::
+
+		obj-y += foo.o
+
+	This tells kbuild that there is one object in that directory, named
+	foo.o. foo.o will be built from foo.c or foo.S.
+
+	If foo.o shall be built as a module, the variable obj-m is used.
+	Therefore the following pattern is often used:
+
+	Example::
+
+		obj-$(CONFIG_FOO) += foo.o
+
+	$(CONFIG_FOO) evaluates to either y (for built-in) or m (for module).
+	If CONFIG_FOO is neither y nor m, then the file will be neither
+	compiled nor linked.
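+
+	As a sketch of the expansion (CONFIG_FOO and foo.o are the
+	placeholder names used above), the single line is read by kbuild
+	as one of the following, depending on the configuration:
+
+	Example::
+
+		# CONFIG_FOO=y
+		obj-y += foo.o
+		# CONFIG_FOO=m
+		obj-m += foo.o
+		# CONFIG_FOO not set: the variable is "obj-", which kbuild ignores
+		obj- += foo.o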
+
+3.2 Built-in object goals - obj-y
+---------------------------------
+
+	The kbuild Makefile specifies object files for vmlinux
+	in the $(obj-y) lists.  These lists depend on the kernel
+	configuration.
+
+	Kbuild compiles all the $(obj-y) files.  It then calls
+	"$(AR) rcSTP" to merge these files into one built-in.a file.
+	This is a thin archive without a symbol table. It will be later
+	linked into vmlinux by scripts/link-vmlinux.sh.
+
+	The order of files in $(obj-y) is significant.  Duplicates in
+	the lists are allowed: the first instance will be linked into
+	built-in.a and succeeding instances will be ignored.
+
+	Link order is significant, because certain functions
+	(module_init() / __initcall) will be called during boot in the
+	order they appear. So keep in mind that changing the link
+	order may e.g. change the order in which your SCSI
+	controllers are detected, and thus your disks are renumbered.
+
+	Example::
+
+		#drivers/isdn/i4l/Makefile
+		# Makefile for the kernel ISDN subsystem and device drivers.
+		# Each configuration option enables a list of files.
+		obj-$(CONFIG_ISDN_I4L)         += isdn.o
+		obj-$(CONFIG_ISDN_PPP_BSDCOMP) += isdn_bsdcomp.o
+
+3.3 Loadable module goals - obj-m
+---------------------------------
+
+	$(obj-m) specifies object files which are built as loadable
+	kernel modules.
+
+	A module may be built from one source file or several source
+	files. In the case of one source file, the kbuild makefile
+	simply adds the file to $(obj-m).
+
+	Example::
+
+		#drivers/isdn/i4l/Makefile
+		obj-$(CONFIG_ISDN_PPP_BSDCOMP) += isdn_bsdcomp.o
+
+	Note: In this example $(CONFIG_ISDN_PPP_BSDCOMP) evaluates to 'm'
+
+	If a kernel module is built from several source files, you specify
+	that you want to build a module in the same way as above; however,
+	kbuild needs to know which object files you want to build your
+	module from, so you have to tell it by setting a $(<module_name>-y)
+	variable.
+
+	Example::
+
+		#drivers/isdn/i4l/Makefile
+		obj-$(CONFIG_ISDN_I4L) += isdn.o
+		isdn-y := isdn_net_lib.o isdn_v110.o isdn_common.o
+
+	In this example, the module name will be isdn.o. Kbuild will
+	compile the objects listed in $(isdn-y) and then run
+	"$(LD) -r" on the list of these files to generate isdn.o.
+
+	Due to kbuild recognizing $(<module_name>-y) for composite objects,
+	you can use the value of a `CONFIG_` symbol to optionally include an
+	object file as part of a composite object.
+
+	Example::
+
+		#fs/ext2/Makefile
+	        obj-$(CONFIG_EXT2_FS) += ext2.o
+		ext2-y := balloc.o dir.o file.o ialloc.o inode.o ioctl.o \
+			  namei.o super.o symlink.o
+	        ext2-$(CONFIG_EXT2_FS_XATTR) += xattr.o xattr_user.o \
+						xattr_trusted.o
+
+	In this example, xattr.o, xattr_user.o and xattr_trusted.o are only
+	part of the composite object ext2.o if $(CONFIG_EXT2_FS_XATTR)
+	evaluates to 'y'.
+
+	Note: Of course, when you are building objects into the kernel,
+	the syntax above will also work. So, if you have CONFIG_EXT2_FS=y,
+	kbuild will build an ext2.o file for you out of the individual
+	parts and then link this into built-in.a, as you would expect.
+
+3.4 Objects which export symbols
+--------------------------------
+
+	No special notation is required in the makefiles for
+	modules exporting symbols.
+
+3.5 Library file goals - lib-y
+------------------------------
+
+	Objects listed with obj-* are used for modules, or
+	combined in a built-in.a for that specific directory.
+	There is also the possibility to list objects that will
+	be included in a library, lib.a.
+	All objects listed with lib-y are combined in a single
+	library for that directory.
+	Objects that are listed in obj-y and additionally listed in
+	lib-y will not be included in the library, since they will
+	be accessible anyway.
+	For consistency, objects listed in lib-m will be included in lib.a.
+
+	Note that the same kbuild makefile may list files to be built-in
+	and to be part of a library. Therefore the same directory
+	may contain both a built-in.a and a lib.a file.
+
+	Example::
+
+		#arch/x86/lib/Makefile
+		lib-y    := delay.o
+
+	This will create a library lib.a based on delay.o. For kbuild to
+	actually recognize that there is a lib.a being built, the directory
+	shall be listed in libs-y.
+
+	See also "6.4 List directories to visit when descending".
+
+	Use of lib-y is normally restricted to `lib/` and `arch/*/lib`.
+
+3.6 Descending down in directories
+----------------------------------
+
+	A Makefile is only responsible for building objects in its own
+	directory. Files in subdirectories should be taken care of by
+	Makefiles in these subdirs. The build system will automatically
+	invoke make recursively in subdirectories, provided you let it know of
+	them.
+
+	To do so, obj-y and obj-m are used.
+	ext2 lives in a separate directory, and the Makefile present in fs/
+	tells kbuild to descend down using the following assignment.
+
+	Example::
+
+		#fs/Makefile
+		obj-$(CONFIG_EXT2_FS) += ext2/
+
+	If CONFIG_EXT2_FS is set to either 'y' (built-in) or 'm' (modular)
+	the corresponding obj- variable will be set, and kbuild will descend
+	down in the ext2 directory.
+	Kbuild only uses this information to decide that it needs to visit
+	the directory; it is the Makefile in the subdirectory that
+	specifies what is modular and what is built-in.
+
+	It is good practice to use a `CONFIG_` variable when assigning directory
+	names. This allows kbuild to totally skip the directory if the
+	corresponding `CONFIG_` option is neither 'y' nor 'm'.
+
+3.7 Compilation flags
+---------------------
+
+    ccflags-y, asflags-y and ldflags-y
+	These three flags apply only to the kbuild makefile in which they
+	are assigned. They are used for all the normal cc, as and ld
+	invocations happening during a recursive build.
+	Note: Flags with the same behaviour were previously named:
+	EXTRA_CFLAGS, EXTRA_AFLAGS and EXTRA_LDFLAGS.
+	They are still supported but their usage is deprecated.
+
+	ccflags-y specifies options for compiling with $(CC).
+
+	Example::
+
+		# drivers/acpi/acpica/Makefile
+		ccflags-y			:= -Os -D_LINUX -DBUILDING_ACPICA
+		ccflags-$(CONFIG_ACPI_DEBUG)	+= -DACPI_DEBUG_OUTPUT
+
+	This variable is necessary because the top Makefile owns the
+	variable $(KBUILD_CFLAGS) and uses it for compilation flags for the
+	entire tree.
+
+	asflags-y specifies options for assembling with $(AS).
+
+	Example::
+
+		#arch/sparc/kernel/Makefile
+		asflags-y := -ansi
+
+	ldflags-y specifies options for linking with $(LD).
+
+	Example::
+
+		#arch/cris/boot/compressed/Makefile
+		ldflags-y += -T $(srctree)/$(src)/decompress_$(arch-y).lds
+
+    subdir-ccflags-y, subdir-asflags-y
+	The two flags listed above are similar to ccflags-y and asflags-y.
+	The difference is that the subdir- variants have effect for the kbuild
+	file where they are present and all subdirectories.
+	Options specified using subdir-* are added to the commandline before
+	the options specified using the non-subdir variants.
+
+	Example::
+
+		subdir-ccflags-y := -Werror
+
+    CFLAGS_$@, AFLAGS_$@
+	CFLAGS_$@ and AFLAGS_$@ only apply to commands in current
+	kbuild makefile.
+
+	$(CFLAGS_$@) specifies per-file options for $(CC).  The $@
+	part has a literal value which specifies the file that it is for.
+
+	Example::
+
+		# drivers/scsi/Makefile
+		CFLAGS_aha152x.o =   -DAHA152X_STAT -DAUTOCONF
+		CFLAGS_gdth.o    = # -DDEBUG_GDTH=2 -D__SERIAL__ -D__COM2__ \
+				     -DGDTH_STATISTICS
+
+	These two lines specify compilation flags for aha152x.o and gdth.o.
+
+	$(AFLAGS_$@) is a similar feature for source files in assembly
+	languages.
+
+	Example::
+
+		# arch/arm/kernel/Makefile
+		AFLAGS_head.o        := -DTEXT_OFFSET=$(TEXT_OFFSET)
+		AFLAGS_crunch-bits.o := -Wa,-mcpu=ep9312
+		AFLAGS_iwmmxt.o      := -Wa,-mcpu=iwmmxt
+
+
+3.9 Dependency tracking
+-----------------------
+
+	Kbuild tracks dependencies on the following:
+	1) All prerequisite files (both `*.c` and `*.h`)
+	2) `CONFIG_` options used in all prerequisite files
+	3) Command-line used to compile target
+
+	Thus, if you change an option to $(CC) all affected files will
+	be re-compiled.
+
+3.10 Special Rules
+------------------
+
+	Special rules are used when the kbuild infrastructure does
+	not provide the required support. A typical example is
+	header files generated during the build process.
+	Another example is the architecture-specific Makefiles, which
+	need special rules to prepare boot images etc.
+
+	Special rules are written as normal Make rules.
+	Kbuild is not executing in the directory where the Makefile is
+	located, so all special rules shall provide a relative
+	path to prerequisite files and target files.
+
+	Two variables are used when defining special rules:
+
+	$(src)
+	    $(src) is a relative path which points to the directory
+	    where the Makefile is located. Always use $(src) when
+	    referring to files located in the src tree.
+
+	$(obj)
+	    $(obj) is a relative path which points to the directory
+	    where the target is saved. Always use $(obj) when
+	    referring to generated files.
+
+	    Example::
+
+		#drivers/scsi/Makefile
+		$(obj)/53c8xx_d.h: $(src)/53c7,8xx.scr $(src)/script_asm.pl
+			$(CPP) -DCHIP=810 - < $< | ... $(src)/script_asm.pl
+
+	    This is a special rule, following the normal syntax
+	    required by make.
+
+	    The target file depends on two prerequisite files. References
+	    to the target file are prefixed with $(obj), references
+	    to prerequisites are referenced with $(src) (because they are not
+	    generated files).
+
+	$(kecho)
+	    Echoing information to the user in a rule is often good practice,
+	    but when executing "make -s" one does not expect to see any output
+	    except for warnings/errors.
+	    To support this, kbuild defines $(kecho), which will echo out the
+	    text following $(kecho) to stdout except if "make -s" is used.
+
+	Example::
+
+		#arch/blackfin/boot/Makefile
+		$(obj)/vmImage: $(obj)/vmlinux.gz
+			$(call if_changed,uimage)
+			@$(kecho) 'Kernel: $@ is ready'
+
+
+3.11 $(CC) support functions
+----------------------------
+
+	The kernel may be built with several different versions of
+	$(CC), each supporting a unique set of features and options.
+	kbuild provides basic support to check for valid options for $(CC).
+	$(CC) is usually the gcc compiler, but other alternatives are
+	available.
+
+    as-option
+	as-option is used to check if $(CC) -- when used to compile
+	assembler (`*.S`) files -- supports the given option. An optional
+	second option may be specified if the first option is not supported.
+
+	Example::
+
+		#arch/sh/Makefile
+		cflags-y += $(call as-option,-Wa$(comma)-isa=$(isa-y),)
+
+	In the above example, cflags-y will be assigned the option
+	-Wa$(comma)-isa=$(isa-y) if it is supported by $(CC).
+	The second argument is optional, and if supplied will be used
+	if the first argument is not supported.
+
+    cc-ldoption
+	cc-ldoption is used to check if $(CC) when used to link object files
+	supports the given option.  An optional second option may be
+	specified if the first option is not supported.
+
+	Example::
+
+		#arch/x86/kernel/Makefile
+		vsyscall-flags += $(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
+
+	In the above example, vsyscall-flags will be assigned the option
+	-Wl$(comma)--hash-style=sysv if it is supported by $(CC).
+	The second argument is optional, and if supplied will be used
+	if the first argument is not supported.
+
+    as-instr
+	as-instr checks if the assembler accepts a specific instruction
+	and then outputs either option1 or option2.
+	C escapes are supported in the test instruction.
+	Note: as-instr uses KBUILD_AFLAGS for $(AS) options.
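+
+	A minimal sketch, assuming the calling convention
+	$(call as-instr,instr,option1,option2); the instruction and the
+	-DHAVE_AS_PSHUFB macro name are illustrative only.
+
+	Example::
+
+		# hypothetical arch Makefile fragment
+		KBUILD_AFLAGS += $(call as-instr,pshufb %xmm0$(comma)%xmm0,-DHAVE_AS_PSHUFB=1)
+
+	If the assembler accepts the instruction, -DHAVE_AS_PSHUFB=1 is added;
+	since no option2 is given, nothing is added otherwise. $(comma) is
+	used because a literal comma would separate the arguments of $(call).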
+
+    cc-option
+	cc-option is used to check if $(CC) supports a given option, and if
+	not supported to use an optional second option.
+
+	Example::
+
+		#arch/x86/Makefile
+		cflags-y += $(call cc-option,-march=pentium-mmx,-march=i586)
+
+	In the above example, cflags-y will be assigned the option
+	-march=pentium-mmx if supported by $(CC), otherwise -march=i586.
+	The second argument to cc-option is optional, and if omitted,
+	cflags-y will be assigned no value if the first option is not supported.
+	Note: cc-option uses KBUILD_CFLAGS for $(CC) options.
+
+    cc-option-yn
+	cc-option-yn is used to check if gcc supports a given option
+	and return 'y' if supported, otherwise 'n'.
+
+	Example::
+
+		#arch/ppc/Makefile
+		biarch := $(call cc-option-yn, -m32)
+		aflags-$(biarch) += -a32
+		cflags-$(biarch) += -m32
+
+	In the above example, $(biarch) is set to y if $(CC) supports the -m32
+	option. When $(biarch) equals 'y', the expanded variables $(aflags-y)
+	and $(cflags-y) will be assigned the values -a32 and -m32,
+	respectively.
+	Note: cc-option-yn uses KBUILD_CFLAGS for $(CC) options
+
+    cc-disable-warning
+	cc-disable-warning checks if gcc supports a given warning and returns
+	the commandline switch to disable it. This special function is needed,
+	because gcc 4.4 and later accept any unknown -Wno-* option and only
+	warn about it if there is another warning in the source file.
+
+	Example::
+
+		KBUILD_CFLAGS += $(call cc-disable-warning, unused-but-set-variable)
+
+	In the above example, -Wno-unused-but-set-variable will be added to
+	KBUILD_CFLAGS only if gcc really accepts it.
+
+    cc-ifversion
+	cc-ifversion tests the version of $(CC) and equals the third parameter
+	if the version expression is true, or the fourth (if given) if the
+	version expression is false.
+
+	Example::
+
+		#fs/reiserfs/Makefile
+		ccflags-y := $(call cc-ifversion, -lt, 0402, -O1)
+
+	In this example, ccflags-y will be assigned the value -O1 if the
+	$(CC) version is less than 4.2.
+	cc-ifversion takes all the shell integer comparison operators:
+	-eq, -ne, -lt, -le, -gt, and -ge.
+	The third parameter may be a text as in this example, but it may also
+	be an expanded variable or a macro.
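+
+	A sketch of the form with both result values (the -DFOO_* macro
+	names are hypothetical):
+
+	Example::
+
+		# hypothetical kbuild makefile fragment
+		ccflags-y += $(call cc-ifversion, -ge, 0500, -DFOO_MODERN, -DFOO_LEGACY)
+
+	Here -DFOO_MODERN is added if the $(CC) version is at least 5.0,
+	and -DFOO_LEGACY otherwise.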
+
+    cc-cross-prefix
+	cc-cross-prefix is used to check if there exists a $(CC) in path with
+	one of the listed prefixes. The first prefix for which a
+	prefix$(CC) exists in the PATH is returned; if no prefix$(CC) is
+	found, nothing is returned.
+	Additional prefixes are separated by a single space in the
+	call of cc-cross-prefix.
+	This functionality is useful for architecture Makefiles that try
+	to set CROSS_COMPILE to well-known values but may have several
+	values to select between.
+	It is recommended to try to set CROSS_COMPILE only for a cross
+	build (host arch is different from target arch). If CROSS_COMPILE
+	is already set, leave it with the old value.
+
+	Example::
+
+		#arch/m68k/Makefile
+		ifneq ($(SUBARCH),$(ARCH))
+		        ifeq ($(CROSS_COMPILE),)
+		               CROSS_COMPILE := $(call cc-cross-prefix, m68k-linux-gnu-)
+			endif
+		endif
+
+3.12 $(LD) support functions
+----------------------------
+
+    ld-option
+	ld-option is used to check if $(LD) supports the supplied option.
+	ld-option takes up to two options as arguments.
+	The second argument is an optional fallback that is used if the
+	first option is not supported by $(LD).
+
+	Example::
+
+		#Makefile
+		LDFLAGS_vmlinux += $(call ld-option, -X)
+
+
+4 Host Program support
+======================
+
+Kbuild supports building executables on the host for use during the
+compilation stage.
+Two steps are required in order to use a host executable.
+
+The first step is to tell kbuild that a host program exists. This is
+done utilising the variable hostprogs-y.
+
+The second step is to add an explicit dependency to the executable.
+This can be done in two ways. Either add the dependency in a rule,
+or utilise the variable $(always).
+Both possibilities are described in the following.
+
+4.1 Simple Host Program
+-----------------------
+
+	In some cases there is a need to compile and run a program on the
+	computer where the build is running.
+	The following line tells kbuild that the program bin2hex shall be
+	built on the build host.
+
+	Example::
+
+		hostprogs-y := bin2hex
+
+	Kbuild assumes in the above example that bin2hex is made from a single
+	C source file named bin2hex.c located in the same directory as
+	the Makefile.
+
+4.2 Composite Host Programs
+---------------------------
+
+	Host programs can be made up based on composite objects.
+	The syntax used to define composite objects for host programs is
+	similar to the syntax used for kernel objects.
+	$(<executable>-objs) lists all objects used to link the final
+	executable.
+
+	Example::
+
+		#scripts/lxdialog/Makefile
+		hostprogs-y   := lxdialog
+		lxdialog-objs := checklist.o lxdialog.o
+
+	Objects with extension .o are compiled from the corresponding .c
+	files. In the above example, checklist.c is compiled to checklist.o
+	and lxdialog.c is compiled to lxdialog.o.
+
+	Finally, the two .o files are linked to the executable, lxdialog.
+	Note: The syntax <executable>-y is not permitted for host-programs.
+
+4.3 Using C++ for host programs
+-------------------------------
+
+	kbuild offers support for host programs written in C++. This was
+	introduced solely to support kconfig, and is not recommended
+	for general use.
+
+	Example::
+
+		#scripts/kconfig/Makefile
+		hostprogs-y   := qconf
+		qconf-cxxobjs := qconf.o
+
+	In the example above the executable is composed of the C++ file
+	qconf.cc - identified by $(qconf-cxxobjs).
+
+	If qconf is composed of a mixture of .c and .cc files, then an
+	additional line can be used to identify this.
+
+	Example::
+
+		#scripts/kconfig/Makefile
+		hostprogs-y   := qconf
+		qconf-cxxobjs := qconf.o
+		qconf-objs    := check.o
+
+4.4 Controlling compiler options for host programs
+--------------------------------------------------
+
+	When compiling host programs, it is possible to set specific flags.
+	Host programs are always compiled with $(HOSTCC), which is passed
+	the options specified in $(KBUILD_HOSTCFLAGS).
+	To set flags that will take effect for all host programs created
+	in that Makefile, use the variable HOST_EXTRACFLAGS.
+
+	Example::
+
+		#scripts/lxdialog/Makefile
+		HOST_EXTRACFLAGS += -I/usr/include/ncurses
+
+	To set specific flags for a single file the following construction
+	is used:
+
+	Example::
+
+		#arch/ppc64/boot/Makefile
+		HOSTCFLAGS_piggyback.o := -DKERNELBASE=$(KERNELBASE)
+
+	It is also possible to specify additional options to the linker.
+
+	Example::
+
+		#scripts/kconfig/Makefile
+		HOSTLDLIBS_qconf := -L$(QTDIR)/lib
+
+	When linking qconf, it will be passed the extra option
+	"-L$(QTDIR)/lib".
+
+4.5 When host programs are actually built
+-----------------------------------------
+
+	Kbuild will only build host-programs when they are referenced
+	as a prerequisite.
+	This is possible in two ways:
+
+	(1) List the prerequisite explicitly in a special rule.
+
+	Example::
+
+		#drivers/pci/Makefile
+		hostprogs-y := gen-devlist
+		$(obj)/devlist.h: $(src)/pci.ids $(obj)/gen-devlist
+			( cd $(obj); ./gen-devlist ) < $<
+
+	The target $(obj)/devlist.h will not be built before
+	$(obj)/gen-devlist is updated. Note that references to
+	the host programs in special rules must be prefixed with $(obj).
+
+	(2) Use $(always)
+
+	When there is no suitable special rule, and the host program
+	shall be built when a makefile is entered, the $(always)
+	variable shall be used.
+
+	Example::
+
+		#scripts/lxdialog/Makefile
+		hostprogs-y   := lxdialog
+		always        := $(hostprogs-y)
+
+	This will tell kbuild to build lxdialog even if not referenced in
+	any rule.
+
+4.6 Using hostprogs-$(CONFIG_FOO)
+---------------------------------
+
+	A typical pattern in a Kbuild file looks like this:
+
+	Example::
+
+		#scripts/Makefile
+		hostprogs-$(CONFIG_KALLSYMS) += kallsyms
+
+	Kbuild knows about both 'y' for built-in and 'm' for module.
+	So if a config symbol evaluates to 'm', kbuild will still build
+	the binary. In other words, Kbuild handles hostprogs-m exactly
+	like hostprogs-y. However, using only hostprogs-y is recommended
+	when no CONFIG symbols are involved.
+
+5 Kbuild clean infrastructure
+=============================
+
+"make clean" deletes most generated files in the obj tree where the kernel
+is compiled. This includes generated files such as host programs.
+Kbuild knows targets listed in $(hostprogs-y), $(hostprogs-m), $(always),
+$(extra-y) and $(targets). They are all deleted during "make clean".
+Files matching the patterns "*.[oas]", "*.ko", plus some additional files
+generated by kbuild are deleted all over the kernel src tree when
+"make clean" is executed.
+
+Additional files can be specified in kbuild makefiles by use of $(clean-files).
+
+	Example::
+
+		#lib/Makefile
+		clean-files := crc32table.h
+
+When executing "make clean", the file "crc32table.h" will be deleted.
+Kbuild will assume files to be in the same relative directory as the
+Makefile, except if prefixed with $(objtree).
+
+To delete a directory hierarchy use:
+
+	Example::
+
+		#scripts/package/Makefile
+		clean-dirs := $(objtree)/debian/
+
+This will delete the directory debian in the toplevel directory, including all
+subdirectories.
+
+To exclude certain files from make clean, use the $(no-clean-files) variable.
+This is only a special case used in the top level Kbuild file:
+
+	Example::
+
+		#Kbuild
+		no-clean-files := $(bounds-file) $(offsets-file)
+
+Usually kbuild descends down in subdirectories due to "obj-* := dir/",
+but in the architecture makefiles where the kbuild infrastructure
+is not sufficient this sometimes needs to be explicit.
+
+	Example::
+
+		#arch/x86/boot/Makefile
+		subdir- := compressed/
+
+The above assignment instructs kbuild to descend down in the
+directory compressed/ when "make clean" is executed.
+
+To support the clean infrastructure in the Makefiles that build the
+final bootimage there is an optional target named archclean:
+
+	Example::
+
+		#arch/x86/Makefile
+		archclean:
+			$(Q)$(MAKE) $(clean)=arch/x86/boot
+
+When "make clean" is executed, make will descend down in arch/x86/boot,
+and clean as usual. The Makefile located in arch/x86/boot/ may use
+the subdir- trick to descend further down.
+
+Note 1: arch/$(ARCH)/Makefile cannot use "subdir-", because that file is
+included in the top level makefile, and the kbuild infrastructure
+is not operational at that point.
+
+Note 2: All directories listed in core-y, libs-y, drivers-y and net-y will
+be visited during "make clean".
+
+6 Architecture Makefiles
+========================
+
+The top level Makefile sets up the environment and does the preparation,
+before starting to descend down in the individual directories.
+The top level makefile contains the generic part, whereas
+arch/$(ARCH)/Makefile contains what is required to set up kbuild
+for said architecture.
+To do so, arch/$(ARCH)/Makefile sets up a number of variables and defines
+a few targets.
+
+When kbuild executes, the following steps are followed (roughly):
+
+1) Configuration of the kernel => produce .config
+2) Store kernel version in include/linux/version.h
+3) Updating all other prerequisites to the target prepare:
+   - Additional prerequisites are specified in arch/$(ARCH)/Makefile
+4) Recursively descend down in all directories listed in
+   init-* core-* drivers-* net-* libs-* and build all targets.
+   - The values of the above variables are expanded in arch/$(ARCH)/Makefile.
+5) All object files are then linked and the resulting file vmlinux is
+   located at the root of the obj tree.
+   The very first objects linked are listed in head-y, assigned by
+   arch/$(ARCH)/Makefile.
+6) Finally, the architecture-specific part does any required post processing
+   and builds the final bootimage.
+   - This includes building boot records
+   - Preparing initrd images and the like
+
+
+6.1 Set variables to tweak the build to the architecture
+--------------------------------------------------------
+
+    LDFLAGS
+	Generic $(LD) options
+
+	Flags used for all invocations of the linker.
+	Often specifying the emulation is sufficient.
+
+	Example::
+
+		#arch/s390/Makefile
+		LDFLAGS         := -m elf_s390
+
+	Note: ldflags-y can be used to further customise
+	the flags used. See chapter 3.7.
+
+    LDFLAGS_vmlinux
+	Options for $(LD) when linking vmlinux
+
+	LDFLAGS_vmlinux is used to specify additional flags to pass to
+	the linker when linking the final vmlinux image.
+	LDFLAGS_vmlinux uses the LDFLAGS_$@ support.
+
+	Example::
+
+		#arch/x86/Makefile
+		LDFLAGS_vmlinux := -e stext
+
+    OBJCOPYFLAGS
+	objcopy flags
+
+	When $(call if_changed,objcopy) is used to translate a .o file,
+	the flags specified in OBJCOPYFLAGS will be used.
+	$(call if_changed,objcopy) is often used to generate raw binaries from
+	vmlinux.
+
+	Example::
+
+		#arch/s390/Makefile
+		OBJCOPYFLAGS := -O binary
+
+		#arch/s390/boot/Makefile
+		$(obj)/image: vmlinux FORCE
+			$(call if_changed,objcopy)
+
+	In this example, the binary $(obj)/image is a binary version of
+	vmlinux. The usage of $(call if_changed,xxx) will be described later.
+
+    KBUILD_AFLAGS
+	$(AS) assembler flags
+
+	Default value - see top level Makefile
+	Append or modify as required per architecture.
+
+	Example::
+
+		#arch/sparc64/Makefile
+		KBUILD_AFLAGS += -m64 -mcpu=ultrasparc
+
+    KBUILD_CFLAGS
+	$(CC) compiler flags
+
+	Default value - see top level Makefile
+	Append or modify as required per architecture.
+
+	Often, the KBUILD_CFLAGS variable depends on the configuration.
+
+	Example::
+
+		#arch/x86/boot/compressed/Makefile
+		cflags-$(CONFIG_X86_32) := -march=i386
+		cflags-$(CONFIG_X86_64) := -mcmodel=small
+		KBUILD_CFLAGS += $(cflags-y)
+
+	Many arch Makefiles dynamically run the target C compiler to
+	probe supported options::
+
+		#arch/x86/Makefile
+
+		...
+		cflags-$(CONFIG_MPENTIUMII)     += $(call cc-option,\
+						-march=pentium2,-march=i686)
+		...
+		# Disable unit-at-a-time mode ...
+		KBUILD_CFLAGS += $(call cc-option,-fno-unit-at-a-time)
+		...
+
+
+	The first example utilises the trick that a config option expands
+	to 'y' when selected.
+
+    KBUILD_AFLAGS_KERNEL
+	$(AS) options specific for built-in
+
+	$(KBUILD_AFLAGS_KERNEL) contains extra assembler flags used to assemble
+	resident kernel code.
+
+    KBUILD_AFLAGS_MODULE
+	Options for $(AS) when building modules
+
+	$(KBUILD_AFLAGS_MODULE) is used to add arch-specific options that
+	are used for $(AS).
+
+	From commandline AFLAGS_MODULE shall be used (see kbuild.txt).
+
+    KBUILD_CFLAGS_KERNEL
+	$(CC) options specific for built-in
+
+	$(KBUILD_CFLAGS_KERNEL) contains extra C compiler flags used to compile
+	resident kernel code.
+
+    KBUILD_CFLAGS_MODULE
+	Options for $(CC) when building modules
+
+	$(KBUILD_CFLAGS_MODULE) is used to add arch-specific options that
+	are used for $(CC).
+	From commandline CFLAGS_MODULE shall be used (see kbuild.txt).
+
+    KBUILD_LDFLAGS_MODULE
+	Options for $(LD) when linking modules
+
+	$(KBUILD_LDFLAGS_MODULE) is used to add arch-specific options
+	used when linking modules. This is often a linker script.
+
+	From commandline LDFLAGS_MODULE shall be used (see kbuild.txt).
+
+    KBUILD_ARFLAGS   Options for $(AR) when creating archives
+
+	$(KBUILD_ARFLAGS) is set by the top level Makefile to "D" (deterministic
+	mode) if this option is supported by $(AR).
+
+    ARCH_CPPFLAGS, ARCH_AFLAGS, ARCH_CFLAGS   Overrides the kbuild defaults
+
+	These variables are appended to the KBUILD_CPPFLAGS,
+	KBUILD_AFLAGS, and KBUILD_CFLAGS, respectively, after the
+	top-level Makefile has set any other flags. This provides a
+	means for an architecture to override the defaults.
+
+
+6.2 Add prerequisites to archheaders
+------------------------------------
+
+	The archheaders: rule is used to generate header files that
+	may be installed into user space by "make headers_install" or
+	"make headers_install_all".  In order to support
+	"make headers_install_all", this target has to be able to run
+	on an unconfigured tree, or a tree configured for another
+	architecture.
+
+	It is run before "make archprepare" when run on the
+	architecture itself.
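+
+	A minimal sketch of such a rule, modelled on the x86 arch Makefile
+	(the subdirectory shown is an assumption and differs between
+	architectures and kernel versions)::
+
+		#arch/x86/Makefile
+		archheaders:
+			$(Q)$(MAKE) $(build)=arch/x86/entry/syscalls all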
+
+
+6.3 Add prerequisites to archprepare
+------------------------------------
+
+	The archprepare: rule is used to list prerequisites that need to be
+	built before starting to descend down in the subdirectories.
+	This is usually used for header files containing assembler constants.
+
+	Example::
+
+		#arch/arm/Makefile
+		archprepare: maketools
+
+	In this example, the file target maketools will be processed
+	before descending down in the subdirectories.
+	See also chapter XXX-TODO that describes how kbuild supports
+	generating offset header files.
+
+
+6.4 List directories to visit when descending
+---------------------------------------------
+
+	An arch Makefile cooperates with the top Makefile to define variables
+	which specify how to build the vmlinux file.  Note that there is no
+	corresponding arch-specific section for modules; the module-building
+	machinery is all architecture-independent.
+
+
+	head-y, init-y, core-y, libs-y, drivers-y, net-y
+	    $(head-y) lists objects to be linked first in vmlinux.
+
+	    $(libs-y) lists directories where a lib.a archive can be located.
+
+	    The rest list directories where a built-in.a object file can be
+	    located.
+
+	    $(init-y) objects will be located after $(head-y).
+
+	    Then the rest follows in this order:
+
+		$(core-y), $(libs-y), $(drivers-y) and $(net-y).
+
+	    The top level Makefile defines values for all generic directories,
+	    and arch/$(ARCH)/Makefile only adds architecture-specific
+	    directories.
+
+	    Example::
+
+		#arch/sparc64/Makefile
+		core-y += arch/sparc64/kernel/
+		libs-y += arch/sparc64/prom/ arch/sparc64/lib/
+		drivers-$(CONFIG_OPROFILE)  += arch/sparc64/oprofile/
+
+
+6.5 Architecture-specific boot images
+-------------------------------------
+
+	An arch Makefile specifies goals that take the vmlinux file, compress
+	it, wrap it in bootstrapping code, and copy the resulting files
+	somewhere. This includes various kinds of installation commands.
+	The actual goals are not standardized across architectures.
+
+	It is common to locate any additional processing in a boot/
+	directory below arch/$(ARCH)/.
+
+	Kbuild does not provide any smart way to support building a
+	target specified in boot/. Therefore arch/$(ARCH)/Makefile shall
+	call make manually to build a target in boot/.
+
+	The recommended approach is to include shortcuts in
+	arch/$(ARCH)/Makefile, and use the full path when calling down
+	into the arch/$(ARCH)/boot/Makefile.
+
+	Example::
+
+		#arch/x86/Makefile
+		boot := arch/x86/boot
+		bzImage: vmlinux
+			$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
+
+	"$(Q)$(MAKE) $(build)=<dir>" is the recommended way to invoke
+	make in a subdirectory.
+
+	There are no rules for naming architecture-specific targets,
+	but executing "make help" will list all relevant targets.
+	To support this, $(archhelp) must be defined.
+
+	Example::
+
+		#arch/x86/Makefile
+		define archhelp
+		  echo  '* bzImage      - Image (arch/$(ARCH)/boot/bzImage)'
+		endef
+
+	When make is executed without arguments, the first goal encountered
+	will be built. In the top level Makefile the first goal present
+	is all:.
+	An architecture shall always, per default, build a bootable image.
+	In "make help", the default goal is highlighted with a '*'.
+	Add a new prerequisite to all: to select a default goal different
+	from vmlinux.
+
+	Example::
+
+		#arch/x86/Makefile
+		all: bzImage
+
+	When "make" is executed without arguments, bzImage will be built.
+
+6.6 Building non-kbuild targets
+-------------------------------
+
+    extra-y
+	extra-y specifies additional targets created in the current
+	directory, in addition to any targets specified by `obj-*`.
+
+	Listing all targets in extra-y is required for two purposes:
+
+	1) Enable kbuild to check changes in command lines
+
+	   - When $(call if_changed,xxx) is used
+
+	2) kbuild knows what files to delete during "make clean"
+
+	Example::
+
+		#arch/x86/kernel/Makefile
+		extra-y := head.o init_task.o
+
+	In this example, extra-y is used to list object files that
+	shall be built, but shall not be linked as part of built-in.a.
+
+
+6.7 Commands useful for building a boot image
+---------------------------------------------
+
+    Kbuild provides a few macros that are useful when building a
+    boot image.
+
+    if_changed
+	if_changed is the infrastructure used for the following commands.
+
+	Usage::
+
+		target: source(s) FORCE
+			$(call if_changed,ld/objcopy/gzip/...)
+
+	When the rule is evaluated, it is checked to see if any files
+	need an update, or the command line has changed since the last
+	invocation. The latter will force a rebuild if any options
+	to the executable have changed.
+	Any target that utilises if_changed must be listed in $(targets),
+	otherwise the command line check will fail, and the target will
+	always be built.
+	Assignments to $(targets) are without $(obj)/ prefix.
+	if_changed may be used in conjunction with custom commands as
+	defined in 6.8 "Custom kbuild commands".
+
+	Note: It is a typical mistake to forget the FORCE prerequisite.
+	Another common pitfall is that whitespace is sometimes
+	significant; for instance, the below will fail (note the extra space
+	after the comma)::
+
+		target: source(s) FORCE
+
+	**WRONG!**	$(call if_changed, ld/objcopy/gzip/...)
+
+	Note:
+	      if_changed should not be used more than once per target.
+	      It stores the executed command in a corresponding .cmd
+	      file, and multiple calls would result in overwrites and
+	      unwanted results when the target is up to date and only
+	      the tests on changed commands trigger execution of
+	      commands.
+
+    ld
+	Link target. Often, LDFLAGS_$@ is used to set specific options to ld.
+
+	Example::
+
+		#arch/x86/boot/Makefile
+		LDFLAGS_bootsect := -Ttext 0x0 -s --oformat binary
+		LDFLAGS_setup    := -Ttext 0x0 -s --oformat binary -e begtext
+
+		targets += setup setup.o bootsect bootsect.o
+		$(obj)/setup $(obj)/bootsect: %: %.o FORCE
+			$(call if_changed,ld)
+
+	In this example, there are two possible targets, requiring different
+	options to the linker. The linker options are specified using the
+	LDFLAGS_$@ syntax - one for each potential target.
+	$(targets) are assigned all potential targets, by which kbuild knows
+	the targets and will:
+
+		1) check for commandline changes
+		2) delete target during make clean
+
+	The ": %: %.o" part of the prerequisite is a shorthand that
+	frees us from listing the setup.o and bootsect.o files.
+
+	Note:
+	      It is a common mistake to forget the "targets :=" assignment,
+	      resulting in the target file being recompiled for no
+	      obvious reason.
+
+    objcopy
+	Copy binary. Uses OBJCOPYFLAGS usually specified in
+	arch/$(ARCH)/Makefile.
+	OBJCOPYFLAGS_$@ may be used to set additional options.
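+
+	A minimal sketch, assuming a hypothetical target image.bin (the
+	target name and the flags shown are illustrative, not taken from a
+	particular architecture)::
+
+		# Hypothetical target and flags, for illustration only
+		OBJCOPYFLAGS_image.bin := -O binary -R .note -R .comment -S
+
+		targets += image.bin
+		$(obj)/image.bin: vmlinux FORCE
+			$(call if_changed,objcopy)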
+
+    gzip
+	Compress target. Use maximum compression to compress target.
+
+	Example::
+
+		#arch/x86/boot/compressed/Makefile
+		$(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
+			$(call if_changed,gzip)
+
+    dtc
+	Create flattened device tree blob object suitable for linking
+	into vmlinux. Device tree blobs linked into vmlinux are placed
+	in an init section in the image. Platform code *must* copy the
+	blob to non-init memory prior to calling unflatten_device_tree().
+
+	To use this command, simply add `*.dtb` into obj-y or targets, or make
+	some other target depend on `%.dtb`.
+
+	A central rule exists to create `$(obj)/%.dtb` from `$(src)/%.dts`;
+	architecture Makefiles do not need to explicitly write out that rule.
+
+	Example::
+
+		targets += $(dtb-y)
+		DTC_FLAGS ?= -p 1024
+
+6.8 Custom kbuild commands
+--------------------------
+
+	When kbuild is executing with KBUILD_VERBOSE=0, then only a shorthand
+	of a command is normally displayed.
+	To enable this behaviour for custom commands kbuild requires
+	two variables to be set::
+
+		quiet_cmd_<command>	- what shall be echoed
+		      cmd_<command>	- the command to execute
+
+	Example::
+
+		#
+		quiet_cmd_image = BUILD   $@
+		      cmd_image = $(obj)/tools/build $(BUILDFLAGS) \
+		                                     $(obj)/vmlinux.bin > $@
+
+		targets += bzImage
+		$(obj)/bzImage: $(obj)/vmlinux.bin $(obj)/tools/build FORCE
+			$(call if_changed,image)
+			@echo 'Kernel: $@ is ready'
+
+	When updating the $(obj)/bzImage target, the line::
+
+		BUILD    arch/x86/boot/bzImage
+
+	will be displayed with "make KBUILD_VERBOSE=0".
+
+
+6.9 Preprocessing linker scripts
+--------------------------------
+
+	When the vmlinux image is built, the linker script
+	arch/$(ARCH)/kernel/vmlinux.lds is used.
+	The script is a preprocessed variant of the file vmlinux.lds.S
+	located in the same directory.
+	kbuild knows .lds files and includes a rule `*lds.S` -> `*lds`.
+
+	Example::
+
+		#arch/x86/kernel/Makefile
+		always := vmlinux.lds
+
+		#Makefile
+		export CPPFLAGS_vmlinux.lds += -P -C -U$(ARCH)
+
+	The assignment to $(always) is used to tell kbuild to build the
+	target vmlinux.lds.
+	The assignment to $(CPPFLAGS_vmlinux.lds) tells kbuild to use the
+	specified options when building the target vmlinux.lds.
+
+	When building the `*.lds` target, kbuild uses the variables::
+
+		KBUILD_CPPFLAGS	: Set in top-level Makefile
+		cppflags-y	: May be set in the kbuild makefile
+		CPPFLAGS_$(@F)  : Target-specific flags.
+				Note that the full filename is used in this
+				assignment.
+
+	The kbuild infrastructure for `*lds` files is used in several
+	architecture-specific files.
+
+6.10 Generic header files
+-------------------------
+
+	The directory include/asm-generic contains the header files
+	that may be shared between individual architectures.
+	The recommended way to use a generic header file is
+	to list the file in the Kbuild file.
+	See "7.2 generic-y" for further info on syntax etc.
+
+6.11 Post-link pass
+-------------------
+
+	If the file arch/xxx/Makefile.postlink exists, this makefile
+	will be invoked for post-link objects (vmlinux and modules.ko)
+	so that architectures can run post-link passes on them. It must
+	also handle the clean target.
+
+	This pass runs after kallsyms generation. If the architecture
+	needs to modify symbol locations, rather than manipulate the
+	kallsyms, it may be easier to add another postlink target for
+	.tmp_vmlinux? targets to be called from link-vmlinux.sh.
+
+	For example, powerpc uses this to check relocation sanity of
+	the linked vmlinux file.
+
+7 Kbuild syntax for exported headers
+====================================
+
+The kernel includes a set of headers that is exported to userspace.
+Many headers can be exported as-is but other headers require a
+minimal pre-processing before they are ready for user-space.
+The pre-processing does:
+
+- drop kernel-specific annotations
+- drop include of compiler.h
+- drop all sections that are kernel internal (guarded by `ifdef __KERNEL__`)
+
+All headers under include/uapi/, include/generated/uapi/,
+arch/<arch>/include/uapi/ and arch/<arch>/include/generated/uapi/
+are exported.
+
+A Kbuild file may be defined under arch/<arch>/include/uapi/asm/ and
+arch/<arch>/include/asm/ to list asm files coming from asm-generic.
+See the subsequent sections for the syntax of the Kbuild file.
+
+7.1 no-export-headers
+---------------------
+
+	no-export-headers is essentially used by include/uapi/linux/Kbuild to
+	avoid exporting specific headers (e.g. kvm.h) on architectures that do
+	not support it. It should be avoided as much as possible.
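+
+	A sketch of the pattern, modelled on include/uapi/linux/Kbuild (the
+	exact contents of that file vary between kernel versions)::
+
+		#include/uapi/linux/Kbuild
+		ifeq ($(wildcard $(srctree)/arch/$(SRCARCH)/include/uapi/asm/kvm.h),)
+		no-export-headers += kvm.h
+		endif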
+
+7.2 generic-y
+-------------
+
+	If an architecture uses a verbatim copy of a header from
+	include/asm-generic then this is listed in the file
+	arch/$(ARCH)/include/asm/Kbuild like this:
+
+		Example::
+
+			#arch/x86/include/asm/Kbuild
+			generic-y += termios.h
+			generic-y += rtc.h
+
+	During the prepare phase of the build a wrapper include
+	file is generated in the directory::
+
+		arch/$(ARCH)/include/generated/asm
+
+	When a header is exported where the architecture uses
+	the generic header a similar wrapper is generated as part
+	of the set of exported headers in the directory::
+
+		usr/include/asm
+
+	The generated wrapper will in both cases look like the following:
+
+		Example: termios.h::
+
+			#include <asm-generic/termios.h>
+
+7.3 generated-y
+---------------
+
+	If an architecture generates other header files alongside generic-y
+	wrappers, generated-y specifies them.
+
+	This prevents them being treated as stale asm-generic wrappers and
+	removed.
+
+		Example::
+
+			#arch/x86/include/asm/Kbuild
+			generated-y += syscalls_32.h
+
+7.4 mandatory-y
+---------------
+
+	mandatory-y is essentially used by include/(uapi/)asm-generic/Kbuild
+	to define the minimum set of ASM headers that all architectures must have.
+
+	This works like optional generic-y. If a mandatory header is missing
+	in arch/$(ARCH)/include/(uapi/)asm, Kbuild will automatically generate
+	a wrapper of the asm-generic one.
+
+	The convention is to list one subdir per line and
+	preferably in alphabetic order.
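+
+	A sketch of such a list (the header names are illustrative; see
+	the Kbuild file itself for the actual set)::
+
+		#include/asm-generic/Kbuild
+		mandatory-y += atomic.h
+		mandatory-y += barrier.h
+		mandatory-y += bitops.h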
+
+8 Kbuild Variables
+==================
+
+The top Makefile exports the following variables:
+
+    VERSION, PATCHLEVEL, SUBLEVEL, EXTRAVERSION
+	These variables define the current kernel version.  A few arch
+	Makefiles actually use these values directly; they should use
+	$(KERNELRELEASE) instead.
+
+	$(VERSION), $(PATCHLEVEL), and $(SUBLEVEL) define the basic
+	three-part version number, such as "2", "4", and "0".  These three
+	values are always numeric.
+
+	$(EXTRAVERSION) defines an even tinier sublevel for pre-patches
+	or additional patches.	It is usually some non-numeric string
+	such as "-pre4", and is often blank.
+
+    KERNELRELEASE
+	$(KERNELRELEASE) is a single string such as "2.4.0-pre4", suitable
+	for constructing installation directory names or showing in
+	version strings.  Some arch Makefiles use it for this purpose.
+
+    ARCH
+	This variable defines the target architecture, such as "i386",
+	"arm", or "sparc". Some kbuild Makefiles test $(ARCH) to
+	determine which files to compile.
+
+	By default, the top Makefile sets $(ARCH) to be the same as the
+	host system architecture.  For a cross build, a user may
+	override the value of $(ARCH) on the command line::
+
+	    make ARCH=m68k ...
+
+
+    INSTALL_PATH
+	This variable defines a place for the arch Makefiles to install
+	the resident kernel image and System.map file.
+	Use this for architecture-specific install targets.
+
+    INSTALL_MOD_PATH, MODLIB
+	$(INSTALL_MOD_PATH) specifies a prefix to $(MODLIB) for module
+	installation.  This variable is not defined in the Makefile but
+	may be passed in by the user if desired.
+
+	$(MODLIB) specifies the directory for module installation.
+	The top Makefile defines $(MODLIB) to
+	$(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE).  The user may
+	override this value on the command line if desired.
+
+    INSTALL_MOD_STRIP
+	If this variable is specified, it will cause modules to be stripped
+	after they are installed.  If INSTALL_MOD_STRIP is '1', then the
+	default option --strip-debug will be used.  Otherwise, the
+	INSTALL_MOD_STRIP value will be used as the option(s) to the strip
+	command.
+
+
+9 Makefile language
+===================
+
+The kernel Makefiles are designed to be run with GNU Make.  The Makefiles
+use only the documented features of GNU Make, but they do use many
+GNU extensions.
+
+GNU Make supports elementary list-processing functions.  The kernel
+Makefiles use a novel style of list building and manipulation with few
+"if" statements.
+
+GNU Make has two assignment operators, ":=" and "=".  ":=" performs
+immediate evaluation of the right-hand side and stores an actual string
+into the left-hand side.  "=" is like a formula definition; it stores the
+right-hand side in an unevaluated form and then evaluates this form each
+time the left-hand side is used.
+
+There are some cases where "=" is appropriate.  Usually, though, ":="
+is the right choice.
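+
+The difference can be sketched as follows (the variable names are
+hypothetical and used only for illustration):
+
+	Example::
+
+		a := one
+		immediate := $(a)
+		deferred = $(a)
+		a := two
+		# $(immediate) now expands to "one"; $(deferred) expands to "two"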
+
+10 Credits
+==========
+
+- Original version made by Michael Elizabeth Chastain, <mailto:mec@shout.net>
+- Updates by Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>
+- Updates by Sam Ravnborg <sam@ravnborg.org>
+- Language QA by Jan Engelhardt <jengelh@gmx.de>
+
+11 TODO
+=======
+
+- Describe how kbuild supports shipped files with _shipped.
+- Generating offset header files.
+- Add more variables to section 8?
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
deleted file mode 100644
index d65ad5746f94..000000000000
--- a/Documentation/kbuild/makefiles.txt
+++ /dev/null
@@ -1,1369 +0,0 @@
-Linux Kernel Makefiles
-
-This document describes the Linux kernel Makefiles.
-
-=== Table of Contents
-
-	=== 1 Overview
-	=== 2 Who does what
-	=== 3 The kbuild files
-	   --- 3.1 Goal definitions
-	   --- 3.2 Built-in object goals - obj-y
-	   --- 3.3 Loadable module goals - obj-m
-	   --- 3.4 Objects which export symbols
-	   --- 3.5 Library file goals - lib-y
-	   --- 3.6 Descending down in directories
-	   --- 3.7 Compilation flags
-	   --- 3.8 Command line dependency
-	   --- 3.9 Dependency tracking
-	   --- 3.10 Special Rules
-	   --- 3.11 $(CC) support functions
-	   --- 3.12 $(LD) support functions
-
-	=== 4 Host Program support
-	   --- 4.1 Simple Host Program
-	   --- 4.2 Composite Host Programs
-	   --- 4.3 Using C++ for host programs
-	   --- 4.4 Controlling compiler options for host programs
-	   --- 4.5 When host programs are actually built
-	   --- 4.6 Using hostprogs-$(CONFIG_FOO)
-
-	=== 5 Kbuild clean infrastructure
-
-	=== 6 Architecture Makefiles
-	   --- 6.1 Set variables to tweak the build to the architecture
-	   --- 6.2 Add prerequisites to archheaders:
-	   --- 6.3 Add prerequisites to archprepare:
-	   --- 6.4 List directories to visit when descending
-	   --- 6.5 Architecture-specific boot images
-	   --- 6.6 Building non-kbuild targets
-	   --- 6.7 Commands useful for building a boot image
-	   --- 6.8 Custom kbuild commands
-	   --- 6.9 Preprocessing linker scripts
-	   --- 6.10 Generic header files
-	   --- 6.11 Post-link pass
-
-	=== 7 Kbuild syntax for exported headers
-		--- 7.1 no-export-headers
-		--- 7.2 generic-y
-		--- 7.3 generated-y
-		--- 7.4 mandatory-y
-
-	=== 8 Kbuild Variables
-	=== 9 Makefile language
-	=== 10 Credits
-	=== 11 TODO
-
-=== 1 Overview
-
-The Makefiles have five parts:
-
-	Makefile		the top Makefile.
-	.config			the kernel configuration file.
-	arch/$(ARCH)/Makefile	the arch Makefile.
-	scripts/Makefile.*	common rules etc. for all kbuild Makefiles.
-	kbuild Makefiles	there are about 500 of these.
-
-The top Makefile reads the .config file, which comes from the kernel
-configuration process.
-
-The top Makefile is responsible for building two major products: vmlinux
-(the resident kernel image) and modules (any module files).
-It builds these goals by recursively descending into the subdirectories of
-the kernel source tree.
-The list of subdirectories which are visited depends upon the kernel
-configuration. The top Makefile textually includes an arch Makefile
-with the name arch/$(ARCH)/Makefile. The arch Makefile supplies
-architecture-specific information to the top Makefile.
-
-Each subdirectory has a kbuild Makefile which carries out the commands
-passed down from above. The kbuild Makefile uses information from the
-.config file to construct various file lists used by kbuild to build
-any built-in or modular targets.
-
-scripts/Makefile.* contains all the definitions/rules etc. that
-are used to build the kernel based on the kbuild makefiles.
-
-
-=== 2 Who does what
-
-People have four different relationships with the kernel Makefiles.
-
-*Users* are people who build kernels.  These people type commands such as
-"make menuconfig" or "make".  They usually do not read or edit
-any kernel Makefiles (or any other source files).
-
-*Normal developers* are people who work on features such as device
-drivers, file systems, and network protocols.  These people need to
-maintain the kbuild Makefiles for the subsystem they are
-working on.  In order to do this effectively, they need some overall
-knowledge about the kernel Makefiles, plus detailed knowledge about the
-public interface for kbuild.
-
-*Arch developers* are people who work on an entire architecture, such
-as sparc or ia64.  Arch developers need to know about the arch Makefile
-as well as kbuild Makefiles.
-
-*Kbuild developers* are people who work on the kernel build system itself.
-These people need to know about all aspects of the kernel Makefiles.
-
-This document is aimed towards normal developers and arch developers.
-
-
-=== 3 The kbuild files
-
-Most Makefiles within the kernel are kbuild Makefiles that use the
-kbuild infrastructure. This chapter introduces the syntax used in the
-kbuild makefiles.
-The preferred name for the kbuild files are 'Makefile' but 'Kbuild' can
-be used and if both a 'Makefile' and a 'Kbuild' file exists, then the 'Kbuild'
-file will be used.
-
-Section 3.1 "Goal definitions" is a quick intro, further chapters provide
-more details, with real examples.
-
---- 3.1 Goal definitions
-
-	Goal definitions are the main part (heart) of the kbuild Makefile.
-	These lines define the files to be built, any special compilation
-	options, and any subdirectories to be entered recursively.
-
-	The most simple kbuild makefile contains one line:
-
-	Example:
-		obj-y += foo.o
-
-	This tells kbuild that there is one object in that directory, named
-	foo.o. foo.o will be built from foo.c or foo.S.
-
-	If foo.o shall be built as a module, the variable obj-m is used.
-	Therefore the following pattern is often used:
-
-	Example:
-		obj-$(CONFIG_FOO) += foo.o
-
-	$(CONFIG_FOO) evaluates to either y (for built-in) or m (for module).
-	If CONFIG_FOO is neither y nor m, then the file will not be compiled
-	nor linked.
-
---- 3.2 Built-in object goals - obj-y
-
-	The kbuild Makefile specifies object files for vmlinux
-	in the $(obj-y) lists.  These lists depend on the kernel
-	configuration.
-
-	Kbuild compiles all the $(obj-y) files.  It then calls
-	"$(AR) rcSTP" to merge these files into one built-in.a file.
-	This is a thin archive without a symbol table. It will be later
-	linked into vmlinux by scripts/link-vmlinux.sh
-
-	The order of files in $(obj-y) is significant.  Duplicates in
-	the lists are allowed: the first instance will be linked into
-	built-in.a and succeeding instances will be ignored.
-
-	Link order is significant, because certain functions
-	(module_init() / __initcall) will be called during boot in the
-	order they appear. So keep in mind that changing the link
-	order may e.g. change the order in which your SCSI
-	controllers are detected, and thus your disks are renumbered.
-
-	Example:
-		#drivers/isdn/i4l/Makefile
-		# Makefile for the kernel ISDN subsystem and device drivers.
-		# Each configuration option enables a list of files.
-		obj-$(CONFIG_ISDN_I4L)         += isdn.o
-		obj-$(CONFIG_ISDN_PPP_BSDCOMP) += isdn_bsdcomp.o
-
---- 3.3 Loadable module goals - obj-m
-
-	$(obj-m) specifies object files which are built as loadable
-	kernel modules.
-
-	A module may be built from one source file or several source
-	files. In the case of one source file, the kbuild makefile
-	simply adds the file to $(obj-m).
-
-	Example:
-		#drivers/isdn/i4l/Makefile
-		obj-$(CONFIG_ISDN_PPP_BSDCOMP) += isdn_bsdcomp.o
-
-	Note: In this example $(CONFIG_ISDN_PPP_BSDCOMP) evaluates to 'm'
-
-	If a kernel module is built from several source files, you specify
-	that you want to build a module in the same way as above; however,
-	kbuild needs to know which object files you want to build your
-	module from, so you have to tell it by setting a $(<module_name>-y)
-	variable.
-
-	Example:
-		#drivers/isdn/i4l/Makefile
-		obj-$(CONFIG_ISDN_I4L) += isdn.o
-		isdn-y := isdn_net_lib.o isdn_v110.o isdn_common.o
-
-	In this example, the module name will be isdn.o. Kbuild will
-	compile the objects listed in $(isdn-y) and then run
-	"$(LD) -r" on the list of these files to generate isdn.o.
-
-	Due to kbuild recognizing $(<module_name>-y) for composite objects,
-	you can use the value of a CONFIG_ symbol to optionally include an
-	object file as part of a composite object.
-
-	Example:
-		#fs/ext2/Makefile
-	        obj-$(CONFIG_EXT2_FS) += ext2.o
-		ext2-y := balloc.o dir.o file.o ialloc.o inode.o ioctl.o \
-			  namei.o super.o symlink.o
-	        ext2-$(CONFIG_EXT2_FS_XATTR) += xattr.o xattr_user.o \
-						xattr_trusted.o
-
-	In this example, xattr.o, xattr_user.o and xattr_trusted.o are only
-	part of the composite object ext2.o if $(CONFIG_EXT2_FS_XATTR)
-	evaluates to 'y'.
-
-	Note: Of course, when you are building objects into the kernel,
-	the syntax above will also work. So, if you have CONFIG_EXT2_FS=y,
-	kbuild will build an ext2.o file for you out of the individual
-	parts and then link this into built-in.a, as you would expect.
-
---- 3.4 Objects which export symbols
-
-	No special notation is required in the makefiles for
-	modules exporting symbols.
-
---- 3.5 Library file goals - lib-y
-
-	Objects listed with obj-* are used for modules, or
-	combined in a built-in.a for that specific directory.
-	There is also the possibility to list objects that will
-	be included in a library, lib.a.
-	All objects listed with lib-y are combined in a single
-	library for that directory.
-	Objects that are listed in obj-y and additionally listed in
-	lib-y will not be included in the library, since they will
-	be accessible anyway.
-	For consistency, objects listed in lib-m will be included in lib.a.
-
-	Note that the same kbuild makefile may list files to be built-in
-	and to be part of a library. Therefore the same directory
-	may contain both a built-in.a and a lib.a file.
-
-	Example:
-		#arch/x86/lib/Makefile
-		lib-y    := delay.o
-
-	This will create a library lib.a based on delay.o. For kbuild to
-	actually recognize that there is a lib.a being built, the directory
-	shall be listed in libs-y.
-	See also "6.4 List directories to visit when descending".
-
-	Use of lib-y is normally restricted to lib/ and arch/*/lib.
-
---- 3.6 Descending down in directories
-
-	A Makefile is only responsible for building objects in its own
-	directory. Files in subdirectories should be taken care of by
-	Makefiles in these subdirs. The build system will automatically
-	invoke make recursively in subdirectories, provided you let it know of
-	them.
-
-	To do so, obj-y and obj-m are used.
-	ext2 lives in a separate directory, and the Makefile present in fs/
-	tells kbuild to descend down using the following assignment.
-
-	Example:
-		#fs/Makefile
-		obj-$(CONFIG_EXT2_FS) += ext2/
-
-	If CONFIG_EXT2_FS is set to either 'y' (built-in) or 'm' (modular)
-	the corresponding obj- variable will be set, and kbuild will descend
-	down in the ext2 directory.
-	Kbuild only uses this information to decide that it needs to visit
-	the directory, it is the Makefile in the subdirectory that
-	specifies what is modular and what is built-in.
-
-	It is good practice to use a CONFIG_ variable when assigning directory
-	names. This allows kbuild to totally skip the directory if the
-	corresponding CONFIG_ option is neither 'y' nor 'm'.
-
---- 3.7 Compilation flags
-
-    ccflags-y, asflags-y and ldflags-y
-	These three flags apply only to the kbuild makefile in which they
-	are assigned. They are used for all the normal cc, as and ld
-	invocations happening during a recursive build.
-	Note: Flags with the same behaviour were previously named:
-	EXTRA_CFLAGS, EXTRA_AFLAGS and EXTRA_LDFLAGS.
-	They are still supported but their usage is deprecated.
-
-	ccflags-y specifies options for compiling with $(CC).
-
-	Example:
-		# drivers/acpi/acpica/Makefile
-		ccflags-y			:= -Os -D_LINUX -DBUILDING_ACPICA
-		ccflags-$(CONFIG_ACPI_DEBUG)	+= -DACPI_DEBUG_OUTPUT
-
-	This variable is necessary because the top Makefile owns the
-	variable $(KBUILD_CFLAGS) and uses it for compilation flags for the
-	entire tree.
-
-	asflags-y specifies options for assembling with $(AS).
-
-	Example:
-		#arch/sparc/kernel/Makefile
-		asflags-y := -ansi
-
-	ldflags-y specifies options for linking with $(LD).
-
-	Example:
-		#arch/cris/boot/compressed/Makefile
-		ldflags-y += -T $(srctree)/$(src)/decompress_$(arch-y).lds
-
-    subdir-ccflags-y, subdir-asflags-y
-	The two flags listed above are similar to ccflags-y and asflags-y.
-	The difference is that the subdir- variants have effect for the kbuild
-	file where they are present and all subdirectories.
-	Options specified using subdir-* are added to the commandline before
-	the options specified using the non-subdir variants.
-
-	Example:
-		subdir-ccflags-y := -Werror
-
-    CFLAGS_$@, AFLAGS_$@
-
-	CFLAGS_$@ and AFLAGS_$@ only apply to commands in current
-	kbuild makefile.
-
-	$(CFLAGS_$@) specifies per-file options for $(CC).  The $@
-	part has a literal value which specifies the file that it is for.
-
-	Example:
-		# drivers/scsi/Makefile
-		CFLAGS_aha152x.o =   -DAHA152X_STAT -DAUTOCONF
-		CFLAGS_gdth.o    = # -DDEBUG_GDTH=2 -D__SERIAL__ -D__COM2__ \
-				     -DGDTH_STATISTICS
-
-	These two lines specify compilation flags for aha152x.o and gdth.o.
-
-	$(AFLAGS_$@) is a similar feature for source files in assembly
-	languages.
-
-	Example:
-		# arch/arm/kernel/Makefile
-		AFLAGS_head.o        := -DTEXT_OFFSET=$(TEXT_OFFSET)
-		AFLAGS_crunch-bits.o := -Wa,-mcpu=ep9312
-		AFLAGS_iwmmxt.o      := -Wa,-mcpu=iwmmxt
-
-
---- 3.9 Dependency tracking
-
-	Kbuild tracks dependencies on the following:
-	1) All prerequisite files (both *.c and *.h)
-	2) CONFIG_ options used in all prerequisite files
-	3) Command-line used to compile target
-
-	Thus, if you change an option to $(CC) all affected files will
-	be re-compiled.
-
---- 3.10 Special Rules
-
-	Special rules are used when the kbuild infrastructure does
-	not provide the required support. A typical example is
-	header files generated during the build process.
-	Another example is the architecture-specific Makefiles, which
-	need special rules to prepare boot images etc.
-
-	Special rules are written as normal Make rules.
-	Kbuild is not executing in the directory where the Makefile is
-	located, so all special rules shall provide a relative
-	path to prerequisite files and target files.
-
-	Two variables are used when defining special rules:
-
-    $(src)
-	$(src) is a relative path which points to the directory
-	where the Makefile is located. Always use $(src) when
-	referring to files located in the src tree.
-
-    $(obj)
-	$(obj) is a relative path which points to the directory
-	where the target is saved. Always use $(obj) when
-	referring to generated files.
-
-	Example:
-		#drivers/scsi/Makefile
-		$(obj)/53c8xx_d.h: $(src)/53c7,8xx.scr $(src)/script_asm.pl
-			$(CPP) -DCHIP=810 - < $< | ... $(src)/script_asm.pl
-
-	This is a special rule, following the normal syntax
-	required by make.
-	The target file depends on two prerequisite files. References
-	to the target file are prefixed with $(obj), references
-	to prerequisites are referenced with $(src) (because they are not
-	generated files).
-
-    $(kecho)
-	Echoing information to the user in a rule is often good practice,
-	but when executing "make -s" one does not expect to see any output
-	except for warnings/errors.
-	To support this, kbuild defines $(kecho), which will echo out the
-	text following $(kecho) to stdout except if "make -s" is used.
-
-	Example:
-		#arch/blackfin/boot/Makefile
-		$(obj)/vmImage: $(obj)/vmlinux.gz
-			$(call if_changed,uimage)
-			@$(kecho) 'Kernel: $@ is ready'
-
-
---- 3.11 $(CC) support functions
-
-	The kernel may be built with several different versions of
-	$(CC), each supporting a unique set of features and options.
-	kbuild provides basic support to check for valid options for $(CC).
-	$(CC) is usually the gcc compiler, but other alternatives are
-	available.
-
-    as-option
-	as-option is used to check if $(CC) -- when used to compile
-	assembler (*.S) files -- supports the given option. An optional
-	second option may be specified if the first option is not supported.
-
-	Example:
-		#arch/sh/Makefile
-		cflags-y += $(call as-option,-Wa$(comma)-isa=$(isa-y),)
-
-	In the above example, cflags-y will be assigned the option
-	-Wa$(comma)-isa=$(isa-y) if it is supported by $(CC).
-	The second argument is optional, and if supplied will be used
-	if first argument is not supported.
-
-    as-instr
-	as-instr checks if the assembler supports a specific instruction
-	and then outputs either option1 or option2.
-	C escapes are supported in the test instruction.
-	Note: as-instr uses KBUILD_AFLAGS for $(AS) options
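-
-	A minimal sketch of how as-instr might be called. The instruction
-	and the -DHAVE_AS_NOP define are hypothetical, chosen only for
-	illustration:
-
-	Example:
-		#arch/<arch>/Makefile (hypothetical)
-		KBUILD_AFLAGS += $(call as-instr,nop,-DHAVE_AS_NOP,)
-
-	Here KBUILD_AFLAGS would receive -DHAVE_AS_NOP if the assembler
-	accepts the "nop" instruction, and nothing otherwise.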
-
-    cc-option
-	cc-option is used to check if $(CC) supports a given option, and if
-	not supported to use an optional second option.
-
-	Example:
-		#arch/x86/Makefile
-		cflags-y += $(call cc-option,-march=pentium-mmx,-march=i586)
-
-	In the above example, cflags-y will be assigned the option
-	-march=pentium-mmx if supported by $(CC), otherwise -march=i586.
-	The second argument to cc-option is optional, and if omitted,
-	cflags-y will be assigned no value if first option is not supported.
-	Note: cc-option uses KBUILD_CFLAGS for $(CC) options
-
-    cc-option-yn
-	cc-option-yn is used to check if gcc supports a given option
-	and return 'y' if supported, otherwise 'n'.
-
-	Example:
-		#arch/ppc/Makefile
-		biarch := $(call cc-option-yn, -m32)
-		aflags-$(biarch) += -a32
-		cflags-$(biarch) += -m32
-
-	In the above example, $(biarch) is set to y if $(CC) supports the -m32
-	option. When $(biarch) equals 'y', the expanded variables $(aflags-y)
-	and $(cflags-y) will be assigned the values -a32 and -m32,
-	respectively.
-	Note: cc-option-yn uses KBUILD_CFLAGS for $(CC) options
-
-    cc-disable-warning
-	cc-disable-warning checks if gcc supports a given warning and returns
-	the commandline switch to disable it. This special function is needed,
-	because gcc 4.4 and later accept any unknown -Wno-* option and only
-	warn about it if there is another warning in the source file.
-
-	Example:
-		KBUILD_CFLAGS += $(call cc-disable-warning, unused-but-set-variable)
-
-	In the above example, -Wno-unused-but-set-variable will be added to
-	KBUILD_CFLAGS only if gcc really accepts it.
-
-    cc-ifversion
-	cc-ifversion tests the version of $(CC) and equals the third parameter
-	if version expression is true, or the fourth (if given) if the version
-	expression is false.
-
-	Example:
-		#fs/reiserfs/Makefile
-		ccflags-y := $(call cc-ifversion, -lt, 0402, -O1)
-
-	In this example, ccflags-y will be assigned the value -O1 if the
-	$(CC) version is less than 4.2.
-	cc-ifversion takes all the shell operators:
-	-eq, -ne, -lt, -le, -gt, and -ge
-	The third parameter may be a text as in this example, but it may also
-	be an expanded variable or a macro.
-
-    cc-cross-prefix
-	cc-cross-prefix is used to check if there exists a $(CC) in path with
-	one of the listed prefixes. The first prefix where there exists a
-	prefix$(CC) in the PATH is returned - and if no prefix$(CC) is found
-	then nothing is returned.
-	Additional prefixes are separated by a single space in the
-	call of cc-cross-prefix.
-	This functionality is useful for architecture Makefiles that try
-	to set CROSS_COMPILE to well-known values but may have several
-	values to select between.
-	It is recommended only to try to set CROSS_COMPILE if it is a cross
-	build (host arch is different from target arch). And if CROSS_COMPILE
-	is already set then leave it with the old value.
-
-	Example:
-		#arch/m68k/Makefile
-		ifneq ($(SUBARCH),$(ARCH))
-		        ifeq ($(CROSS_COMPILE),)
-		               CROSS_COMPILE := $(call cc-cross-prefix, m68k-linux-gnu-)
-			endif
-		endif
-
---- 3.12 $(LD) support functions
-
-    ld-option
-	ld-option is used to check if $(LD) supports the supplied option.
-	ld-option takes two options as arguments.
-	The second argument is an optional option that can be used if the
-	first option is not supported by $(LD).
-
-	Example:
-		#Makefile
-		LDFLAGS_vmlinux += $(call ld-option, -X)
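-
-	A sketch of the two-argument form, assuming a linker that may or
-	may not accept the first option. The choice of options is
-	illustrative and not taken from any real Makefile:
-
-	Example:
-		#Makefile (hypothetical)
-		LDFLAGS_vmlinux += $(call ld-option,--build-id=sha1,--build-id)
-
-	If $(LD) accepts --build-id=sha1 that option is added; otherwise
-	--build-id is used instead.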
-
-
-=== 4 Host Program support
-
-Kbuild supports building executables on the host for use during the
-compilation stage.
-Two steps are required in order to use a host executable.
-
-The first step is to tell kbuild that a host program exists. This is
-done utilising the variable hostprogs-y.
-
-The second step is to add an explicit dependency to the executable.
-This can be done in two ways. Either add the dependency in a rule,
-or utilise the variable $(always).
-Both possibilities are described in the following.
-
---- 4.1 Simple Host Program
-
-	In some cases there is a need to compile and run a program on the
-	computer where the build is running.
-	The following line tells kbuild that the program bin2hex shall be
-	built on the build host.
-
-	Example:
-		hostprogs-y := bin2hex
-
-	Kbuild assumes in the above example that bin2hex is made from a single
-	C source file named bin2hex.c located in the same directory as
-	the Makefile.
-
---- 4.2 Composite Host Programs
-
-	Host programs can be made up based on composite objects.
-	The syntax used to define composite objects for host programs is
-	similar to the syntax used for kernel objects.
-	$(<executable>-objs) lists all objects used to link the final
-	executable.
-
-	Example:
-		#scripts/lxdialog/Makefile
-		hostprogs-y   := lxdialog
-		lxdialog-objs := checklist.o lxdialog.o
-
-	Objects with extension .o are compiled from the corresponding .c
-	files. In the above example, checklist.c is compiled to checklist.o
-	and lxdialog.c is compiled to lxdialog.o.
-	Finally, the two .o files are linked to the executable, lxdialog.
-	Note: The syntax <executable>-y is not permitted for host-programs.
-
---- 4.3 Using C++ for host programs
-
-	kbuild offers support for host programs written in C++. This was
-	introduced solely to support kconfig, and is not recommended
-	for general use.
-
-	Example:
-		#scripts/kconfig/Makefile
-		hostprogs-y   := qconf
-		qconf-cxxobjs := qconf.o
-
-	In the example above the executable is composed of the C++ file
-	qconf.cc - identified by $(qconf-cxxobjs).
-
-	If qconf is composed of a mixture of .c and .cc files, then an
-	additional line can be used to identify this.
-
-	Example:
-		#scripts/kconfig/Makefile
-		hostprogs-y   := qconf
-		qconf-cxxobjs := qconf.o
-		qconf-objs    := check.o
-
---- 4.4 Controlling compiler options for host programs
-
-	When compiling host programs, it is possible to set specific flags.
-	The programs will always be compiled utilising $(HOSTCC) passed
-	the options specified in $(KBUILD_HOSTCFLAGS).
-	To set flags that will take effect for all host programs created
-	in that Makefile, use the variable HOST_EXTRACFLAGS.
-
-	Example:
-		#scripts/lxdialog/Makefile
-		HOST_EXTRACFLAGS += -I/usr/include/ncurses
-
-	To set specific flags for a single file the following construction
-	is used:
-
-	Example:
-		#arch/ppc64/boot/Makefile
-		HOSTCFLAGS_piggyback.o := -DKERNELBASE=$(KERNELBASE)
-
-	It is also possible to specify additional options to the linker.
-
-	Example:
-		#scripts/kconfig/Makefile
-		HOSTLDLIBS_qconf := -L$(QTDIR)/lib
-
-	When linking qconf, it will be passed the extra option
-	"-L$(QTDIR)/lib".
-
---- 4.5 When host programs are actually built
-
-	Kbuild will only build host-programs when they are referenced
-	as a prerequisite.
-	This is possible in two ways:
-
-	(1) List the prerequisite explicitly in a special rule.
-
-	Example:
-		#drivers/pci/Makefile
-		hostprogs-y := gen-devlist
-		$(obj)/devlist.h: $(src)/pci.ids $(obj)/gen-devlist
-			( cd $(obj); ./gen-devlist ) < $<
-
-	The target $(obj)/devlist.h will not be built before
-	$(obj)/gen-devlist is updated. Note that references to
-	the host programs in special rules must be prefixed with $(obj).
-
-	(2) Use $(always)
-	When there is no suitable special rule, and the host program
-	shall be built when a makefile is entered, the $(always)
-	variable shall be used.
-
-	Example:
-		#scripts/lxdialog/Makefile
-		hostprogs-y   := lxdialog
-		always        := $(hostprogs-y)
-
-	This will tell kbuild to build lxdialog even if not referenced in
-	any rule.
-
---- 4.6 Using hostprogs-$(CONFIG_FOO)
-
-	A typical pattern in a Kbuild file looks like this:
-
-	Example:
-		#scripts/Makefile
-		hostprogs-$(CONFIG_KALLSYMS) += kallsyms
-
-	Kbuild knows about both 'y' for built-in and 'm' for module.
-	So if a config symbol evaluates to 'm', kbuild will still build
-	the binary. In other words, Kbuild handles hostprogs-m exactly
-	like hostprogs-y. But only hostprogs-y is recommended to be used
-	when no CONFIG symbols are involved.
-
-=== 5 Kbuild clean infrastructure
-
-"make clean" deletes most generated files in the obj tree where the kernel
-is compiled. This includes generated files such as host programs.
-Kbuild knows targets listed in $(hostprogs-y), $(hostprogs-m), $(always),
-$(extra-y) and $(targets). They are all deleted during "make clean".
-Files matching the patterns "*.[oas]", "*.ko", plus some additional files
-generated by kbuild are deleted all over the kernel src tree when
-"make clean" is executed.
-
-Additional files can be specified in kbuild makefiles by use of $(clean-files).
-
-	Example:
-		#lib/Makefile
-		clean-files := crc32table.h
-
-When executing "make clean", the file "crc32table.h" will be deleted.
-Kbuild will assume files to be in the same relative directory as the
-Makefile, except if prefixed with $(objtree).
-
-To delete a directory hierarchy use:
-
-	Example:
-		#scripts/package/Makefile
-		clean-dirs := $(objtree)/debian/
-
-This will delete the directory debian in the toplevel directory, including all
-subdirectories.
-
-To exclude certain files from make clean, use the $(no-clean-files) variable.
-This is only a special case used in the top level Kbuild file:
-
-	Example:
-		#Kbuild
-		no-clean-files := $(bounds-file) $(offsets-file)
-
-Usually kbuild descends down in subdirectories due to "obj-* := dir/",
-but in the architecture makefiles where the kbuild infrastructure
-is not sufficient this sometimes needs to be explicit.
-
-	Example:
-		#arch/x86/boot/Makefile
-		subdir- := compressed/
-
-The above assignment instructs kbuild to descend down in the
-directory compressed/ when "make clean" is executed.
-
-To support the clean infrastructure in the Makefiles that build the
-final bootimage there is an optional target named archclean:
-
-	Example:
-		#arch/x86/Makefile
-		archclean:
-			$(Q)$(MAKE) $(clean)=arch/x86/boot
-
-When "make clean" is executed, make will descend down in arch/x86/boot,
-and clean as usual. The Makefile located in arch/x86/boot/ may use
-the subdir- trick to descend further down.
-
-Note 1: arch/$(ARCH)/Makefile cannot use "subdir-", because that file is
-included in the top level makefile, and the kbuild infrastructure
-is not operational at that point.
-
-Note 2: All directories listed in core-y, libs-y, drivers-y and net-y will
-be visited during "make clean".
-
-=== 6 Architecture Makefiles
-
-The top level Makefile sets up the environment and does the preparation,
-before starting to descend down in the individual directories.
-The top level makefile contains the generic part, whereas
-arch/$(ARCH)/Makefile contains what is required to set up kbuild
-for said architecture.
-To do so, arch/$(ARCH)/Makefile sets up a number of variables and defines
-a few targets.
-
-When kbuild executes, the following steps are followed (roughly):
-1) Configuration of the kernel => produce .config
-2) Store kernel version in include/linux/version.h
-3) Updating all other prerequisites to the target prepare:
-   - Additional prerequisites are specified in arch/$(ARCH)/Makefile
-4) Recursively descend down in all directories listed in
-   init-* core-* drivers-* net-* libs-* and build all targets.
-   - The values of the above variables are expanded in arch/$(ARCH)/Makefile.
-5) All object files are then linked and the resulting file vmlinux is
-   located at the root of the obj tree.
-   The very first objects linked are listed in head-y, assigned by
-   arch/$(ARCH)/Makefile.
-6) Finally, the architecture-specific part does any required post processing
-   and builds the final bootimage.
-   - This includes building boot records
-   - Preparing initrd images and the like
-
-
---- 6.1 Set variables to tweak the build to the architecture
-
-    LDFLAGS		Generic $(LD) options
-
-	Flags used for all invocations of the linker.
-	Often specifying the emulation is sufficient.
-
-	Example:
-		#arch/s390/Makefile
-		LDFLAGS         := -m elf_s390
-	Note: ldflags-y can be used to further customise
-	the flags used. See chapter 3.7.
-
-    LDFLAGS_vmlinux	Options for $(LD) when linking vmlinux
-
-	LDFLAGS_vmlinux is used to specify additional flags to pass to
-	the linker when linking the final vmlinux image.
-	LDFLAGS_vmlinux uses the LDFLAGS_$@ support.
-
-	Example:
-		#arch/x86/Makefile
-		LDFLAGS_vmlinux := -e stext
-
-    OBJCOPYFLAGS	objcopy flags
-
-	When $(call if_changed,objcopy) is used to translate a .o file,
-	the flags specified in OBJCOPYFLAGS will be used.
-	$(call if_changed,objcopy) is often used to generate raw binaries from
-	vmlinux.
-
-	Example:
-		#arch/s390/Makefile
-		OBJCOPYFLAGS := -O binary
-
-		#arch/s390/boot/Makefile
-		$(obj)/image: vmlinux FORCE
-			$(call if_changed,objcopy)
-
-	In this example, the binary $(obj)/image is a binary version of
-	vmlinux. The usage of $(call if_changed,xxx) will be described later.
-
-    KBUILD_AFLAGS		$(AS) assembler flags
-
-	Default value - see top level Makefile
-	Append or modify as required per architecture.
-
-	Example:
-		#arch/sparc64/Makefile
-		KBUILD_AFLAGS += -m64 -mcpu=ultrasparc
-
-    KBUILD_CFLAGS		$(CC) compiler flags
-
-	Default value - see top level Makefile
-	Append or modify as required per architecture.
-
-	Often, the KBUILD_CFLAGS variable depends on the configuration.
-
-	Example:
-		#arch/x86/boot/compressed/Makefile
-		cflags-$(CONFIG_X86_32) := -march=i386
-		cflags-$(CONFIG_X86_64) := -mcmodel=small
-		KBUILD_CFLAGS += $(cflags-y)
-
-	Many arch Makefiles dynamically run the target C compiler to
-	probe supported options:
-
-		#arch/x86/Makefile
-
-		...
-		cflags-$(CONFIG_MPENTIUMII)     += $(call cc-option,\
-						-march=pentium2,-march=i686)
-		...
-		# Disable unit-at-a-time mode ...
-		KBUILD_CFLAGS += $(call cc-option,-fno-unit-at-a-time)
-		...
-
-
-	The first example utilises the trick that a config option expands
-	to 'y' when selected.
-
-    KBUILD_AFLAGS_KERNEL	$(AS) options specific for built-in
-
-	$(KBUILD_AFLAGS_KERNEL) contains extra assembler flags used to
-	assemble resident kernel code.
-
-    KBUILD_AFLAGS_MODULE   Options for $(AS) when building modules
-
-	$(KBUILD_AFLAGS_MODULE) is used to add arch-specific options that
-	are used for $(AS).
-	From commandline AFLAGS_MODULE shall be used (see kbuild.txt).
-
-    KBUILD_CFLAGS_KERNEL	$(CC) options specific for built-in
-
-	$(KBUILD_CFLAGS_KERNEL) contains extra C compiler flags used to compile
-	resident kernel code.
-
-    KBUILD_CFLAGS_MODULE   Options for $(CC) when building modules
-
-	$(KBUILD_CFLAGS_MODULE) is used to add arch-specific options that
-	are used for $(CC).
-	From commandline CFLAGS_MODULE shall be used (see kbuild.txt).
-
-    KBUILD_LDFLAGS_MODULE   Options for $(LD) when linking modules
-
-	$(KBUILD_LDFLAGS_MODULE) is used to add arch-specific options
-	used when linking modules. This is often a linker script.
-	From commandline LDFLAGS_MODULE shall be used (see kbuild.txt).
-
-    KBUILD_ARFLAGS   Options for $(AR) when creating archives
-
-	$(KBUILD_ARFLAGS) is set by the top level Makefile to "D"
-	(deterministic mode) if this option is supported by $(AR).
-
-    ARCH_CPPFLAGS, ARCH_AFLAGS, ARCH_CFLAGS   Overrides the kbuild defaults
-
-	These variables are appended to the KBUILD_CPPFLAGS,
-	KBUILD_AFLAGS, and KBUILD_CFLAGS, respectively, after the
-	top-level Makefile has set any other flags. This provides a
-	means for an architecture to override the defaults.
-
-
---- 6.2 Add prerequisites to archheaders:
-
-	The archheaders: rule is used to generate header files that
-	may be installed into user space by "make headers_install" or
-	"make headers_install_all".  In order to support
-	"make headers_install_all", this target has to be able to run
-	on an unconfigured tree, or a tree configured for another
-	architecture.
-
-	It is run before "make archprepare" when run on the
-	architecture itself.
-
-
---- 6.3 Add prerequisites to archprepare:
-
-	The archprepare: rule is used to list prerequisites that need to be
-	built before starting to descend down in the subdirectories.
-	This is usually used for header files containing assembler constants.
-
-	Example:
-		#arch/arm/Makefile
-		archprepare: maketools
-
-	In this example, the file target maketools will be processed
-	before descending down in the subdirectories.
-	See also chapter XXX-TODO that describes how kbuild supports
-	generating offset header files.
-
-
---- 6.4 List directories to visit when descending
-
-	An arch Makefile cooperates with the top Makefile to define variables
-	which specify how to build the vmlinux file.  Note that there is no
-	corresponding arch-specific section for modules; the module-building
-	machinery is all architecture-independent.
-
-
-    head-y, init-y, core-y, libs-y, drivers-y, net-y
-
-	$(head-y) lists objects to be linked first in vmlinux.
-	$(libs-y) lists directories where a lib.a archive can be located.
-	The rest list directories where a built-in.a object file can be
-	located.
-
-	$(init-y) objects will be located after $(head-y).
-	Then the rest follows in this order:
-	$(core-y), $(libs-y), $(drivers-y) and $(net-y).
-
-	The top level Makefile defines values for all generic directories,
-	and arch/$(ARCH)/Makefile only adds architecture-specific directories.
-
-	Example:
-		#arch/sparc64/Makefile
-		core-y += arch/sparc64/kernel/
-		libs-y += arch/sparc64/prom/ arch/sparc64/lib/
-		drivers-$(CONFIG_OPROFILE)  += arch/sparc64/oprofile/
-
-
---- 6.5 Architecture-specific boot images
-
-	An arch Makefile specifies goals that take the vmlinux file, compress
-	it, wrap it in bootstrapping code, and copy the resulting files
-	somewhere. This includes various kinds of installation commands.
-	The actual goals are not standardized across architectures.
-
-	It is common to locate any additional processing in a boot/
-	directory below arch/$(ARCH)/.
-
-	Kbuild does not provide any smart way to support building a
-	target specified in boot/. Therefore arch/$(ARCH)/Makefile shall
-	call make manually to build a target in boot/.
-
-	The recommended approach is to include shortcuts in
-	arch/$(ARCH)/Makefile, and use the full path when calling down
-	into the arch/$(ARCH)/boot/Makefile.
-
-	Example:
-		#arch/x86/Makefile
-		boot := arch/x86/boot
-		bzImage: vmlinux
-			$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
-
-	"$(Q)$(MAKE) $(build)=<dir>" is the recommended way to invoke
-	make in a subdirectory.
-
-	There are no rules for naming architecture-specific targets,
-	but executing "make help" will list all relevant targets.
-	To support this, $(archhelp) must be defined.
-
-	Example:
-		#arch/x86/Makefile
-		define archhelp
-		  echo  '* bzImage      - Image (arch/$(ARCH)/boot/bzImage)'
-		endef
-
-	When make is executed without arguments, the first goal encountered
-	will be built. In the top level Makefile the first goal present
-	is all:.
-	An architecture shall always, per default, build a bootable image.
-	In "make help", the default goal is highlighted with a '*'.
-	Add a new prerequisite to all: to select a default goal different
-	from vmlinux.
-
-	Example:
-		#arch/x86/Makefile
-		all: bzImage
-
-	When "make" is executed without arguments, bzImage will be built.
-
---- 6.6 Building non-kbuild targets
-
-    extra-y
-
-	extra-y specifies additional targets created in the current
-	directory, in addition to any targets specified by obj-*.
-
-	Listing all targets in extra-y is required for two purposes:
-	1) Enable kbuild to check changes in command lines
-	   - When $(call if_changed,xxx) is used
-	2) kbuild knows what files to delete during "make clean"
-
-	Example:
-		#arch/x86/kernel/Makefile
-		extra-y := head.o init_task.o
-
-	In this example, extra-y is used to list object files that
-	shall be built, but shall not be linked as part of built-in.a.
-
-
---- 6.7 Commands useful for building a boot image
-
-	Kbuild provides a few macros that are useful when building a
-	boot image.
-
-    if_changed
-
-	if_changed is the infrastructure used for the following commands.
-
-	Usage:
-		target: source(s) FORCE
-			$(call if_changed,ld/objcopy/gzip/...)
-
-	When the rule is evaluated, it is checked to see if any files
-	need an update, or the command line has changed since the last
-	invocation. The latter will force a rebuild if any options
-	to the executable have changed.
-	Any target that utilises if_changed must be listed in $(targets),
-	otherwise the command line check will fail, and the target will
-	always be built.
-	Assignments to $(targets) are without $(obj)/ prefix.
-	if_changed may be used in conjunction with custom commands as
-	defined in 6.8 "Custom kbuild commands".
-
-	Note: It is a typical mistake to forget the FORCE prerequisite.
-	Another common pitfall is that whitespace is sometimes
-	significant; for instance, the below will fail (note the extra space
-	after the comma):
-		target: source(s) FORCE
-	#WRONG!#	$(call if_changed, ld/objcopy/gzip/...)
-
-	Note: if_changed should not be used more than once per target.
-	It stores the executed command in a corresponding .cmd
-	file and multiple calls would result in overwrites and
-	unwanted results when the target is up to date and only the
-	tests on changed commands trigger execution of commands.
-
-    ld
-	Link target. Often, LDFLAGS_$@ is used to set specific options to ld.
-
-	Example:
-		#arch/x86/boot/Makefile
-		LDFLAGS_bootsect := -Ttext 0x0 -s --oformat binary
-		LDFLAGS_setup    := -Ttext 0x0 -s --oformat binary -e begtext
-
-		targets += setup setup.o bootsect bootsect.o
-		$(obj)/setup $(obj)/bootsect: %: %.o FORCE
-			$(call if_changed,ld)
-
-	In this example, there are two possible targets, requiring different
-	options to the linker. The linker options are specified using the
-	LDFLAGS_$@ syntax - one for each potential target.
-	$(targets) are assigned all potential targets, by which kbuild knows
-	the targets and will:
-		1) check for commandline changes
-		2) delete target during make clean
-
-	The ": %: %.o" part of the prerequisite is a shorthand that
-	frees us from listing the setup.o and bootsect.o files.
-	Note: It is a common mistake to forget the "targets :=" assignment,
-	      resulting in the target file being recompiled for no
-	      obvious reason.
-
-    objcopy
-	Copy binary. Uses OBJCOPYFLAGS usually specified in
-	arch/$(ARCH)/Makefile.
-	OBJCOPYFLAGS_$@ may be used to set additional options.
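-
-	A sketch of a per-target override, assuming a raw image named
-	Image is produced from vmlinux. The file name and the flags are
-	illustrative:
-
-	Example:
-		#arch/<arch>/boot/Makefile (hypothetical)
-		OBJCOPYFLAGS_Image := -O binary -R .note -R .comment -S
-
-		targets += Image
-		$(obj)/Image: vmlinux FORCE
-			$(call if_changed,objcopy)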
-
-    gzip
-	Compress target. Use maximum compression to compress target.
-
-	Example:
-		#arch/x86/boot/compressed/Makefile
-		$(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
-			$(call if_changed,gzip)
-
-    dtc
-	Create flattened device tree blob object suitable for linking
-	into vmlinux. Device tree blobs linked into vmlinux are placed
-	in an init section in the image. Platform code *must* copy the
-	blob to non-init memory prior to calling unflatten_device_tree().
-
-	To use this command, simply add *.dtb into obj-y or targets, or make
-	some other target depend on %.dtb.
-
-	A central rule exists to create $(obj)/%.dtb from $(src)/%.dts;
-	architecture Makefiles do not need to explicitly write out that rule.
-
-	Example:
-		targets += $(dtb-y)
-		DTC_FLAGS ?= -p 1024
-
---- 6.8 Custom kbuild commands
-
-	When kbuild is executing with KBUILD_VERBOSE=0, then only a shorthand
-	of a command is normally displayed.
-	To enable this behaviour for custom commands kbuild requires
-	two variables to be set:
-	quiet_cmd_<command>	- what shall be echoed
-	      cmd_<command>	- the command to execute
-
-	Example:
-		#
-		quiet_cmd_image = BUILD   $@
-		      cmd_image = $(obj)/tools/build $(BUILDFLAGS) \
-		                                     $(obj)/vmlinux.bin > $@
-
-		targets += bzImage
-		$(obj)/bzImage: $(obj)/vmlinux.bin $(obj)/tools/build FORCE
-			$(call if_changed,image)
-			@echo 'Kernel: $@ is ready'
-
-	When updating the $(obj)/bzImage target, the line
-
-	BUILD    arch/x86/boot/bzImage
-
-	will be displayed with "make KBUILD_VERBOSE=0".
-
-
---- 6.9 Preprocessing linker scripts
-
-	When the vmlinux image is built, the linker script
-	arch/$(ARCH)/kernel/vmlinux.lds is used.
-	The script is a preprocessed variant of the file vmlinux.lds.S
-	located in the same directory.
-	kbuild knows .lds files and includes a rule *lds.S -> *lds.
-
-	Example:
-		#arch/x86/kernel/Makefile
-		always := vmlinux.lds
-
-		#Makefile
-		export CPPFLAGS_vmlinux.lds += -P -C -U$(ARCH)
-
-	The assignment to $(always) is used to tell kbuild to build the
-	target vmlinux.lds.
-	The assignment to $(CPPFLAGS_vmlinux.lds) tells kbuild to use the
-	specified options when building the target vmlinux.lds.
-
-	When building the *.lds target, kbuild uses the variables:
-	KBUILD_CPPFLAGS	: Set in top-level Makefile
-	cppflags-y	: May be set in the kbuild makefile
-	CPPFLAGS_$(@F)  : Target-specific flags.
-	                  Note that the full filename is used in this
-	                  assignment.
-
-	The kbuild infrastructure for *lds files is used in several
-	architecture-specific files.
-
---- 6.10 Generic header files
-
-	The directory include/asm-generic contains the header files
-	that may be shared between individual architectures.
-	The recommended way to use a generic header file is
-	to list the file in the Kbuild file.
-	See "7.2 generic-y" for further info on syntax etc.
-
---- 6.11 Post-link pass
-
-	If the file arch/xxx/Makefile.postlink exists, this makefile
-	will be invoked for post-link objects (vmlinux and modules.ko)
-	for architectures to run post-link passes on. Must also handle
-	the clean target.
-
-	This pass runs after kallsyms generation. If the architecture
-	needs to modify symbol locations, rather than manipulate the
-	kallsyms, it may be easier to add another postlink target for
-	.tmp_vmlinux? targets to be called from link-vmlinux.sh.
-
-	For example, powerpc uses this to check relocation sanity of
-	the linked vmlinux file.
-
-=== 7 Kbuild syntax for exported headers
-
-The kernel includes a set of headers that is exported to userspace.
-Many headers can be exported as-is but other headers require a
-minimal pre-processing before they are ready for user-space.
-The pre-processing does:
-- drop kernel-specific annotations
-- drop include of compiler.h
-- drop all sections that are kernel internal (guarded by ifdef __KERNEL__)
-
-All headers under include/uapi/, include/generated/uapi/,
-arch/<arch>/include/uapi/ and arch/<arch>/include/generated/uapi/
-are exported.
-
-A Kbuild file may be defined under arch/<arch>/include/uapi/asm/ and
-arch/<arch>/include/asm/ to list asm files coming from asm-generic.
-See subsequent chapter for the syntax of the Kbuild file.
-
---- 7.1 no-export-headers
-
-	no-export-headers is essentially used by include/uapi/linux/Kbuild to
-	avoid exporting specific headers (e.g. kvm.h) on architectures that do
-	not support it. It should be avoided as much as possible.
-
---- 7.2 generic-y
-
-	If an architecture uses a verbatim copy of a header from
-	include/asm-generic then this is listed in the file
-	arch/$(ARCH)/include/asm/Kbuild like this:
-
-		Example:
-			#arch/x86/include/asm/Kbuild
-			generic-y += termios.h
-			generic-y += rtc.h
-
-	During the prepare phase of the build a wrapper include
-	file is generated in the directory:
-
-		arch/$(ARCH)/include/generated/asm
-
-	When a header is exported where the architecture uses
-	the generic header a similar wrapper is generated as part
-	of the set of exported headers in the directory:
-
-		usr/include/asm
-
-	The generated wrapper will in both cases look like the following:
-
-		Example: termios.h
-			#include <asm-generic/termios.h>
-
---- 7.3 generated-y
-
-	If an architecture generates other header files alongside generic-y
-	wrappers, generated-y specifies them.
-
-	This prevents them being treated as stale asm-generic wrappers and
-	removed.
-
-		Example:
-			#arch/x86/include/asm/Kbuild
-			generated-y += syscalls_32.h
-
---- 7.4 mandatory-y
-
-	mandatory-y is essentially used by include/(uapi/)asm-generic/Kbuild
-	to define the minimum set of ASM headers that all architectures must have.
-
-	This works like optional generic-y. If a mandatory header is missing
-	in arch/$(ARCH)/include/(uapi/)/asm, Kbuild will automatically generate
-	a wrapper of the asm-generic one.
-
-	The convention is to list one subdir per line and
-	preferably in alphabetic order.
-
-=== 8 Kbuild Variables
-
-The top Makefile exports the following variables:
-
-    VERSION, PATCHLEVEL, SUBLEVEL, EXTRAVERSION
-
-	These variables define the current kernel version.  A few arch
-	Makefiles actually use these values directly; they should use
-	$(KERNELRELEASE) instead.
-
-	$(VERSION), $(PATCHLEVEL), and $(SUBLEVEL) define the basic
-	three-part version number, such as "2", "4", and "0".  These three
-	values are always numeric.
-
-	$(EXTRAVERSION) defines an even tinier sublevel for pre-patches
-	or additional patches.	It is usually some non-numeric string
-	such as "-pre4", and is often blank.
-
-    KERNELRELEASE
-
-	$(KERNELRELEASE) is a single string such as "2.4.0-pre4", suitable
-	for constructing installation directory names or showing in
-	version strings.  Some arch Makefiles use it for this purpose.
-
-    ARCH
-
-	This variable defines the target architecture, such as "i386",
-	"arm", or "sparc". Some kbuild Makefiles test $(ARCH) to
-	determine which files to compile.
-
-	By default, the top Makefile sets $(ARCH) to be the same as the
-	host system architecture.  For a cross build, a user may
-	override the value of $(ARCH) on the command line:
-
-	    make ARCH=m68k ...
-
-
-    INSTALL_PATH
-
-	This variable defines a place for the arch Makefiles to install
-	the resident kernel image and System.map file.
-	Use this for architecture-specific install targets.
-
-    INSTALL_MOD_PATH, MODLIB
-
-	$(INSTALL_MOD_PATH) specifies a prefix to $(MODLIB) for module
-	installation.  This variable is not defined in the Makefile but
-	may be passed in by the user if desired.
-
-	$(MODLIB) specifies the directory for module installation.
-	The top Makefile defines $(MODLIB) to
-	$(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE).  The user may
-	override this value on the command line if desired.
-
-    INSTALL_MOD_STRIP
-
-	If this variable is specified, it will cause modules to be stripped
-	after they are installed.  If INSTALL_MOD_STRIP is '1', then the
-	default option --strip-debug will be used.  Otherwise, the
-	INSTALL_MOD_STRIP value will be used as the option(s) to the strip
-	command.
-
-
-=== 9 Makefile language
-
-The kernel Makefiles are designed to be run with GNU Make.  The Makefiles
-use only the documented features of GNU Make, but they do use many
-GNU extensions.
-
-GNU Make supports elementary list-processing functions.  The kernel
-Makefiles use a novel style of list building and manipulation with few
-"if" statements.
-
-GNU Make has two assignment operators, ":=" and "=".  ":=" performs
-immediate evaluation of the right-hand side and stores an actual string
-into the left-hand side.  "=" is like a formula definition; it stores the
-right-hand side in an unevaluated form and then evaluates this form each
-time the left-hand side is used.
-
-There are some cases where "=" is appropriate.  Usually, though, ":="
-is the right choice.
-
-=== 10 Credits
-
-Original version made by Michael Elizabeth Chastain, <mailto:mec@shout.net>
-Updates by Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>
-Updates by Sam Ravnborg <sam@ravnborg.org>
-Language QA by Jan Engelhardt <jengelh@gmx.de>
-
-=== 11 TODO
-
-- Describe how kbuild supports shipped files with _shipped.
-- Generating offset header files.
-- Add more variables to section 7?
-
-
-
diff --git a/Documentation/kbuild/modules.rst b/Documentation/kbuild/modules.rst
new file mode 100644
index 000000000000..24e763482650
--- /dev/null
+++ b/Documentation/kbuild/modules.rst
@@ -0,0 +1,571 @@
+=========================
+Building External Modules
+=========================
+
+This document describes how to build an out-of-tree kernel module.
+
+.. Table of Contents
+
+	=== 1. Introduction
+	=== 2. How to Build External Modules
+	   --- 2.1 Command Syntax
+	   --- 2.2 Options
+	   --- 2.3 Targets
+	   --- 2.4 Building Separate Files
+	=== 3. Creating a Kbuild File for an External Module
+	   --- 3.1 Shared Makefile
+	   --- 3.2 Separate Kbuild File and Makefile
+	   --- 3.3 Binary Blobs
+	   --- 3.4 Building Multiple Modules
+	=== 4. Include Files
+	   --- 4.1 Kernel Includes
+	   --- 4.2 Single Subdirectory
+	   --- 4.3 Several Subdirectories
+	=== 5. Module Installation
+	   --- 5.1 INSTALL_MOD_PATH
+	   --- 5.2 INSTALL_MOD_DIR
+	=== 6. Module Versioning
+	   --- 6.1 Symbols From the Kernel (vmlinux + modules)
+	   --- 6.2 Symbols and External Modules
+	   --- 6.3 Symbols From Another External Module
+	=== 7. Tips & Tricks
+	   --- 7.1 Testing for CONFIG_FOO_BAR
+
+
+
+1. Introduction
+===============
+
+"kbuild" is the build system used by the Linux kernel. Modules must use
+kbuild to stay compatible with changes in the build infrastructure and
+to pick up the right flags to "gcc". Functionality for building modules
+both in-tree and out-of-tree is provided. The method for building
+either is similar, and all modules are initially developed and built
+out-of-tree.
+
+This document is aimed at developers interested in building
+out-of-tree (or "external") modules. The author of an external module
+should supply a makefile that hides most of the complexity, so one
+only has to type "make" to build the module. This is easily
+accomplished, and a complete example is presented in section 3.
+
+
+2. How to Build External Modules
+================================
+
+To build external modules, you must have a prebuilt kernel available
+that contains the configuration and header files used in the build.
+Also, the kernel must have been built with modules enabled. If you are
+using a distribution kernel, there will be a package for the kernel you
+are running provided by your distribution.
+
+An alternative is to use the "make" target "modules_prepare". This will
+make sure the kernel contains the information required. The target
+exists solely as a simple way to prepare a kernel source tree for
+building external modules.
+
+NOTE: "modules_prepare" will not build Module.symvers even if
+CONFIG_MODVERSIONS is set; therefore, a full kernel build needs to be
+executed to make module versioning work.
+
+2.1 Command Syntax
+------------------
+
+	The command to build an external module is::
+
+		$ make -C <path_to_kernel_src> M=$PWD
+
+	The kbuild system knows that an external module is being built
+	due to the "M=<dir>" option given in the command.
+
+	To build against the running kernel use::
+
+		$ make -C /lib/modules/`uname -r`/build M=$PWD
+
+	Then to install the module(s) just built, add the target
+	"modules_install" to the command::
+
+		$ make -C /lib/modules/`uname -r`/build M=$PWD modules_install
+
+2.2 Options
+-----------
+
+	($KDIR refers to the path of the kernel source directory.)
+
+	make -C $KDIR M=$PWD
+
+	-C $KDIR
+		The directory where the kernel source is located.
+		"make" will actually change to the specified directory
+		when executing and will change back when finished.
+
+	M=$PWD
+		Informs kbuild that an external module is being built.
+		The value given to "M" is the absolute path of the
+		directory where the external module (kbuild file) is
+		located.
+
+2.3 Targets
+-----------
+
+	When building an external module, only a subset of the "make"
+	targets are available.
+
+	make -C $KDIR M=$PWD [target]
+
+	The default will build the module(s) located in the current
+	directory, so a target does not need to be specified. All
+	output files will also be generated in this directory. No
+	attempts are made to update the kernel source, and it is a
+	precondition that a successful "make" has been executed for the
+	kernel.
+
+	modules
+		The default target for external modules. It has the
+		same functionality as if no target was specified. See
+		description above.
+
+	modules_install
+		Install the external module(s). The default location is
+		/lib/modules/<kernel_release>/extra/, but a prefix may
+		be added with INSTALL_MOD_PATH (discussed in section 5).
+
+	clean
+		Remove all generated files in the module directory only.
+
+	help
+		List the available targets for external modules.
+
+2.4 Building Separate Files
+---------------------------
+
+	It is possible to build single files that are part of a module.
+	This works equally well for the kernel, a module, and even for
+	external modules.
+
+	Example (the module foo.ko consists of bar.o and baz.o)::
+
+		make -C $KDIR M=$PWD bar.lst
+		make -C $KDIR M=$PWD baz.o
+		make -C $KDIR M=$PWD foo.ko
+		make -C $KDIR M=$PWD ./
+
+
+3. Creating a Kbuild File for an External Module
+================================================
+
+In the last section we saw the command to build a module for the
+running kernel. The module is not actually built, however, because a
+build file is required. Contained in this file will be the name of
+the module(s) being built, along with the list of requisite source
+files. The file may be as simple as a single line::
+
+	obj-m := <module_name>.o
+
+The kbuild system will build <module_name>.o from <module_name>.c,
+and, after linking, will result in the kernel module <module_name>.ko.
+The above line can be put in either a "Kbuild" file or a "Makefile".
+When the module is built from multiple sources, an additional line is
+needed listing the files::
+
+	<module_name>-y := <src1>.o <src2>.o ...
+
+NOTE: Further documentation describing the syntax used by kbuild is
+located in Documentation/kbuild/makefiles.rst.
+
+The examples below demonstrate how to create a build file for the
+module 8123.ko, which is built from the following files::
+
+	8123_if.c
+	8123_if.h
+	8123_pci.c
+	8123_bin.o_shipped	<= Binary blob
+
+3.1 Shared Makefile
+-------------------
+
+	An external module always includes a wrapper makefile that
+	supports building the module using "make" with no arguments.
+	This target is not used by kbuild; it is only for convenience.
+	Additional functionality, such as test targets, can be included
+	but should be filtered out from kbuild due to possible name
+	clashes.
+
+	Example 1::
+
+		--> filename: Makefile
+		ifneq ($(KERNELRELEASE),)
+		# kbuild part of makefile
+		obj-m  := 8123.o
+		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
+
+		else
+		# normal makefile
+		KDIR ?= /lib/modules/`uname -r`/build
+
+		default:
+			$(MAKE) -C $(KDIR) M=$$PWD
+
+		# Module specific targets
+		genbin:
+			echo "X" > 8123_bin.o_shipped
+
+		endif
+
+	The check for KERNELRELEASE is used to separate the two parts
+	of the makefile. In the example, kbuild will only see the two
+	assignments, whereas "make" will see everything except these
+	two assignments. This is due to two passes made on the file:
+	the first pass is by the "make" instance run on the command
+	line; the second pass is by the kbuild system, which is
+	initiated by the parameterized "make" in the default target.
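+
+	The "normal makefile" part can carry further convenience
+	targets that forward to kbuild in the same way. A minimal
+	sketch using the targets from section 2.3; the wrapper target
+	names "clean" and "install" are only this sketch's choice, and
+	the fragment is assumed to sit in the "else" branch of
+	example 1::
+
+		# normal makefile part, not seen by kbuild
+		KDIR ?= /lib/modules/`uname -r`/build
+
+		# Remove generated files in the module directory
+		clean:
+			$(MAKE) -C $(KDIR) M=$$PWD clean
+
+		# Install the module(s) that were just built
+		install:
+			$(MAKE) -C $(KDIR) M=$$PWD modules_install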
+
+3.2 Separate Kbuild File and Makefile
+-------------------------------------
+
+	In newer versions of the kernel, kbuild will first look for a
+	file named "Kbuild", and only if that is not found will it
+	then look for a makefile. Utilizing a "Kbuild" file allows us
+	to split up the makefile from example 1 into two files:
+
+	Example 2::
+
+		--> filename: Kbuild
+		obj-m  := 8123.o
+		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
+
+		--> filename: Makefile
+		KDIR ?= /lib/modules/`uname -r`/build
+
+		default:
+			$(MAKE) -C $(KDIR) M=$$PWD
+
+		# Module specific targets
+		genbin:
+			echo "X" > 8123_bin.o_shipped
+
+	The split in example 2 is questionable due to the simplicity of
+	each file; however, some external modules use makefiles
+	consisting of several hundred lines, and here it really pays
+	off to separate the kbuild part from the rest.
+
+	The next example shows a backward compatible version.
+
+	Example 3::
+
+		--> filename: Kbuild
+		obj-m  := 8123.o
+		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
+
+		--> filename: Makefile
+		ifneq ($(KERNELRELEASE),)
+		# kbuild part of makefile
+		include Kbuild
+
+		else
+		# normal makefile
+		KDIR ?= /lib/modules/`uname -r`/build
+
+		default:
+			$(MAKE) -C $(KDIR) M=$$PWD
+
+		# Module specific targets
+		genbin:
+			echo "X" > 8123_bin.o_shipped
+
+		endif
+
+	Here the "Kbuild" file is included from the makefile. This
+	allows an older version of kbuild, which only knows of
+	makefiles, to be used when the "make" and kbuild parts are
+	split into separate files.
+
+3.3 Binary Blobs
+----------------
+
+	Some external modules need to include an object file as a blob.
+	kbuild has support for this, but requires the blob file to be
+	named <filename>_shipped. When the kbuild rules kick in, a copy
+	of <filename>_shipped is created with _shipped stripped off,
+	giving us <filename>. This shortened filename can be used in
+	the assignment to the module.
+
+	Throughout this section, 8123_bin.o_shipped has been used to
+	build the kernel module 8123.ko; it has been included as
+	8123_bin.o::
+
+		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
+
+	Although there is no distinction between the ordinary source
+	files and the binary file, kbuild will pick up different rules
+	when creating the object file for the module.
+
+3.4 Building Multiple Modules
+-----------------------------
+
+	kbuild supports building multiple modules with a single build
+	file. For example, if you wanted to build two modules, foo.ko
+	and bar.ko, the kbuild lines would be::
+
+		obj-m := foo.o bar.o
+		foo-y := <foo_srcs>
+		bar-y := <bar_srcs>
+
+	It is that simple!
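+
+	As a concrete sketch, assume foo.ko is built from foo_main.c
+	and foo_util.c, and bar.ko from bar_core.c and bar_hw.c (all
+	four file names are hypothetical)::
+
+		# Two modules from one build file
+		obj-m := foo.o bar.o
+		# Objects linked into foo.ko
+		foo-y := foo_main.o foo_util.o
+		# Objects linked into bar.ko
+		bar-y := bar_core.o bar_hw.o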
+
+
+4. Include Files
+================
+
+Within the kernel, header files are kept in standard locations
+according to the following rule:
+
+	* If the header file only describes the internal interface of a
+	  module, then the file is placed in the same directory as the
+	  source files.
+	* If the header file describes an interface used by other parts
+	  of the kernel that are located in different directories, then
+	  the file is placed in include/linux/.
+
+	  NOTE:
+	      There are two notable exceptions to this rule: larger
+	      subsystems have their own directory under include/, such as
+	      include/scsi; and architecture specific headers are located
+	      under arch/$(ARCH)/include/.
+
+4.1 Kernel Includes
+-------------------
+
+	To include a header file located under include/linux/, simply
+	use::
+
+		#include <linux/module.h>
+
+	kbuild will add options to "gcc" so the relevant directories
+	are searched.
+
+4.2 Single Subdirectory
+-----------------------
+
+	External modules tend to place header files in a separate
+	include/ directory where their source is located, although this
+	is not the usual kernel style. To inform kbuild of the
+	directory, use either ccflags-y or CFLAGS_<filename>.o.
+
+	Using the example from section 3, if we moved 8123_if.h to a
+	subdirectory named include, the resulting kbuild file would
+	look like::
+
+		--> filename: Kbuild
+		obj-m := 8123.o
+
+		ccflags-y := -Iinclude
+		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
+
+	Note that in the assignment there is no space between -I and
+	the path. This is a limitation of kbuild: there must be no
+	space present.
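+
+	The per-file alternative mentioned above, CFLAGS_<filename>.o,
+	limits the flag to a single object. A minimal sketch, assuming
+	(for illustration only) that 8123_if.c is the only source file
+	that includes 8123_if.h::
+
+		--> filename: Kbuild
+		obj-m := 8123.o
+
+		# Applied only when compiling 8123_if.c
+		CFLAGS_8123_if.o := -Iinclude
+		8123-y := 8123_if.o 8123_pci.o 8123_bin.o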
+
+4.3 Several Subdirectories
+--------------------------
+
+	kbuild can handle files that are spread over several directories.
+	Consider the following example::
+
+		.
+		|__ src
+		|   |__ complex_main.c
+		|   |__ hal
+		|       |__ hardwareif.c
+		|       |__ include
+		|           |__ hardwareif.h
+		|__ include
+		    |__ complex.h
+
+	To build the module complex.ko, we then need the following
+	kbuild file::
+
+		--> filename: Kbuild
+		obj-m := complex.o
+		complex-y := src/complex_main.o
+		complex-y += src/hal/hardwareif.o
+
+		ccflags-y := -I$(src)/include
+		ccflags-y += -I$(src)/src/hal/include
+
+	As you can see, kbuild knows how to handle object files located
+	in other directories. The trick is to specify the directory
+	relative to the kbuild file's location. That being said, this
+	is NOT recommended practice.
+
+	For the header files, kbuild must be explicitly told where to
+	look. When kbuild executes, the current directory is always the
+	root of the kernel tree (the argument to "-C") and therefore an
+	absolute path is needed. $(src) provides the absolute path by
+	pointing to the directory where the currently executing kbuild
+	file is located.
+
+
+5. Module Installation
+======================
+
+Modules which are included in the kernel are installed in the
+directory::
+
+	/lib/modules/$(KERNELRELEASE)/kernel/
+
+And external modules are installed in::
+
+	/lib/modules/$(KERNELRELEASE)/extra/
+
+5.1 INSTALL_MOD_PATH
+--------------------
+
+	Above are the default directories but as always some level of
+	customization is possible. A prefix can be added to the
+	installation path using the variable INSTALL_MOD_PATH::
+
+		$ make INSTALL_MOD_PATH=/frodo modules_install
+		=> Install dir: /frodo/lib/modules/$(KERNELRELEASE)/kernel/
+
+	INSTALL_MOD_PATH may be set as an ordinary shell variable or,
+	as shown above, can be specified on the command line when
+	calling "make". This takes effect when installing both in-tree
+	and out-of-tree modules.
+
+5.2 INSTALL_MOD_DIR
+-------------------
+
+	External modules are by default installed to a directory under
+	/lib/modules/$(KERNELRELEASE)/extra/, but you may wish to
+	locate modules for a specific functionality in a separate
+	directory. For this purpose, use INSTALL_MOD_DIR to specify an
+	alternative name to "extra"::
+
+		$ make INSTALL_MOD_DIR=gandalf -C $KDIR \
+		       M=$PWD modules_install
+		=> Install dir: /lib/modules/$(KERNELRELEASE)/gandalf/
+
+
+6. Module Versioning
+====================
+
+Module versioning is enabled by the CONFIG_MODVERSIONS option, and is used
+as a simple ABI consistency check. A CRC value of the full prototype
+for an exported symbol is created. When a module is loaded/used, the
+CRC values contained in the kernel are compared with similar values in
+the module; if they are not equal, the kernel refuses to load the
+module.
+
+Module.symvers contains a list of all exported symbols from a kernel
+build.
+
+6.1 Symbols From the Kernel (vmlinux + modules)
+-----------------------------------------------
+
+	During a kernel build, a file named Module.symvers will be
+	generated. Module.symvers contains all exported symbols from
+	the kernel and compiled modules. For each symbol, the
+	corresponding CRC value is also stored.
+
+	The syntax of the Module.symvers file is::
+
+		<CRC>	    <Symbol>	       <module>
+
+		0x2d036834  scsi_remove_host   drivers/scsi/scsi_mod
+
+	For a kernel build without CONFIG_MODVERSIONS enabled, the CRC
+	would read 0x00000000.
+
+	Module.symvers serves two purposes:
+
+	1) It lists all exported symbols from vmlinux and all modules.
+	2) It lists the CRC if CONFIG_MODVERSIONS is enabled.
+
+6.2 Symbols and External Modules
+--------------------------------
+
+	When building an external module, the build system needs access
+	to the symbols from the kernel to check if all external symbols
+	are defined. This is done in the MODPOST step. modpost obtains
+	the symbols by reading Module.symvers from the kernel source
+	tree. If a Module.symvers file is present in the directory
+	where the external module is being built, this file will be
+	read too. During the MODPOST step, a new Module.symvers file
+	will be written containing all exported symbols that were not
+	defined in the kernel.
+
+6.3 Symbols From Another External Module
+----------------------------------------
+
+	Sometimes, an external module uses exported symbols from
+	another external module. kbuild needs to have full knowledge of
+	all symbols to avoid spitting out warnings about undefined
+	symbols. Three solutions exist for this situation.
+
+	NOTE: The method with a top-level kbuild file is recommended
+	but may be impractical in certain situations.
+
+	Use a top-level kbuild file
+		If you have two modules, foo.ko and bar.ko, where
+		foo.ko needs symbols from bar.ko, you can use a
+		common top-level kbuild file so both modules are
+		compiled in the same build. Consider the following
+		directory layout::
+
+			./foo/ <= contains foo.ko
+			./bar/ <= contains bar.ko
+
+		The top-level kbuild file would then look like::
+
+			#./Kbuild (or ./Makefile):
+				obj-y := foo/ bar/
+
+		And executing::
+
+			$ make -C $KDIR M=$PWD
+
+		will then do the expected and compile both modules with
+		full knowledge of symbols from either module.
+
+	Use an extra Module.symvers file
+		When an external module is built, a Module.symvers file
+		is generated containing all exported symbols which are
+		not defined in the kernel. To get access to symbols
+		from bar.ko, copy the Module.symvers file from the
+		compilation of bar.ko to the directory where foo.ko is
+		built. During the module build, kbuild will read the
+		Module.symvers file in the directory of the external
+		module, and when the build is finished, a new
+		Module.symvers file is created containing the sum of
+		all symbols defined and not part of the kernel.
+
+	Use "make" variable KBUILD_EXTRA_SYMBOLS
+		If it is impractical to copy Module.symvers from
+		another module, you can assign a space separated list
+		of files to KBUILD_EXTRA_SYMBOLS in your build file.
+		These files will be loaded by modpost during the
+		initialization of its symbol tables.
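+
+		A minimal sketch, assuming foo.ko needs symbols from
+		bar.ko; <path_to_bar> is a placeholder for the directory
+		where bar.ko was built (an absolute path is assumed
+		here)::
+
+			#./foo/Kbuild (or ./foo/Makefile):
+			# Extra symbol file(s) for modpost to load
+			KBUILD_EXTRA_SYMBOLS := <path_to_bar>/Module.symvers
+			obj-m := foo.o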
+
+
+7. Tips & Tricks
+================
+
+7.1 Testing for CONFIG_FOO_BAR
+------------------------------
+
+	Modules often need to check for certain `CONFIG_` options to
+	decide if a specific feature is included in the module. In
+	kbuild this is done by referencing the `CONFIG_` variable
+	directly::
+
+		#fs/ext2/Makefile
+		obj-$(CONFIG_EXT2_FS) += ext2.o
+
+		ext2-y := balloc.o bitmap.o dir.o
+		ext2-$(CONFIG_EXT2_FS_XATTR) += xattr.o
+
+	External modules have traditionally used "grep" to check for
+	specific `CONFIG_` settings directly in .config. This usage is
+	broken. As described earlier, external modules should use
+	kbuild for building and can therefore use the same methods as
+	in-tree modules when testing for `CONFIG_` definitions.
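+
+	A minimal sketch of the same technique in an external module,
+	assuming (for illustration only) that the PCI part of the 8123
+	module from section 3 is optional and depends on CONFIG_PCI::
+
+		--> filename: Kbuild
+		obj-m  := 8123.o
+		8123-y := 8123_if.o 8123_bin.o
+
+		# Added to 8123-y only when CONFIG_PCI=y in the kernel
+		# configuration; otherwise the object is left out
+		8123-$(CONFIG_PCI) += 8123_pci.o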
diff --git a/Documentation/kbuild/modules.txt b/Documentation/kbuild/modules.txt
deleted file mode 100644
index 80295c613e37..000000000000
--- a/Documentation/kbuild/modules.txt
+++ /dev/null
@@ -1,541 +0,0 @@
-Building External Modules
-
-This document describes how to build an out-of-tree kernel module.
-
-=== Table of Contents
-
-	=== 1 Introduction
-	=== 2 How to Build External Modules
-	   --- 2.1 Command Syntax
-	   --- 2.2 Options
-	   --- 2.3 Targets
-	   --- 2.4 Building Separate Files
-	=== 3. Creating a Kbuild File for an External Module
-	   --- 3.1 Shared Makefile
-	   --- 3.2 Separate Kbuild file and Makefile
-	   --- 3.3 Binary Blobs
-	   --- 3.4 Building Multiple Modules
-	=== 4. Include Files
-	   --- 4.1 Kernel Includes
-	   --- 4.2 Single Subdirectory
-	   --- 4.3 Several Subdirectories
-	=== 5. Module Installation
-	   --- 5.1 INSTALL_MOD_PATH
-	   --- 5.2 INSTALL_MOD_DIR
-	=== 6. Module Versioning
-	   --- 6.1 Symbols From the Kernel (vmlinux + modules)
-	   --- 6.2 Symbols and External Modules
-	   --- 6.3 Symbols From Another External Module
-	=== 7. Tips & Tricks
-	   --- 7.1 Testing for CONFIG_FOO_BAR
-
-
-
-=== 1. Introduction
-
-"kbuild" is the build system used by the Linux kernel. Modules must use
-kbuild to stay compatible with changes in the build infrastructure and
-to pick up the right flags to "gcc." Functionality for building modules
-both in-tree and out-of-tree is provided. The method for building
-either is similar, and all modules are initially developed and built
-out-of-tree.
-
-Covered in this document is information aimed at developers interested
-in building out-of-tree (or "external") modules. The author of an
-external module should supply a makefile that hides most of the
-complexity, so one only has to type "make" to build the module. This is
-easily accomplished, and a complete example will be presented in
-section 3.
-
-
-=== 2. How to Build External Modules
-
-To build external modules, you must have a prebuilt kernel available
-that contains the configuration and header files used in the build.
-Also, the kernel must have been built with modules enabled. If you are
-using a distribution kernel, there will be a package for the kernel you
-are running provided by your distribution.
-
-An alternative is to use the "make" target "modules_prepare." This will
-make sure the kernel contains the information required. The target
-exists solely as a simple way to prepare a kernel source tree for
-building external modules.
-
-NOTE: "modules_prepare" will not build Module.symvers even if
-CONFIG_MODVERSIONS is set; therefore, a full kernel build needs to be
-executed to make module versioning work.
-
---- 2.1 Command Syntax
-
-	The command to build an external module is:
-
-		$ make -C <path_to_kernel_src> M=$PWD
-
-	The kbuild system knows that an external module is being built
-	due to the "M=<dir>" option given in the command.
-
-	To build against the running kernel use:
-
-		$ make -C /lib/modules/`uname -r`/build M=$PWD
-
-	Then to install the module(s) just built, add the target
-	"modules_install" to the command:
-
-		$ make -C /lib/modules/`uname -r`/build M=$PWD modules_install
-
---- 2.2 Options
-
-	($KDIR refers to the path of the kernel source directory.)
-
-	make -C $KDIR M=$PWD
-
-	-C $KDIR
-		The directory where the kernel source is located.
-		"make" will actually change to the specified directory
-		when executing and will change back when finished.
-
-	M=$PWD
-		Informs kbuild that an external module is being built.
-		The value given to "M" is the absolute path of the
-		directory where the external module (kbuild file) is
-		located.
-
---- 2.3 Targets
-
-	When building an external module, only a subset of the "make"
-	targets are available.
-
-	make -C $KDIR M=$PWD [target]
-
-	The default will build the module(s) located in the current
-	directory, so a target does not need to be specified. All
-	output files will also be generated in this directory. No
-	attempts are made to update the kernel source, and it is a
-	precondition that a successful "make" has been executed for the
-	kernel.
-
-	modules
-		The default target for external modules. It has the
-		same functionality as if no target was specified. See
-		description above.
-
-	modules_install
-		Install the external module(s). The default location is
-		/lib/modules/<kernel_release>/extra/, but a prefix may
-		be added with INSTALL_MOD_PATH (discussed in section 5).
-
-	clean
-		Remove all generated files in the module directory only.
-
-	help
-		List the available targets for external modules.
-
---- 2.4 Building Separate Files
-
-	It is possible to build single files that are part of a module.
-	This works equally well for the kernel, a module, and even for
-	external modules.
-
-	Example (The module foo.ko, consist of bar.o and baz.o):
-		make -C $KDIR M=$PWD bar.lst
-		make -C $KDIR M=$PWD baz.o
-		make -C $KDIR M=$PWD foo.ko
-		make -C $KDIR M=$PWD ./
-
-
-=== 3. Creating a Kbuild File for an External Module
-
-In the last section we saw the command to build a module for the
-running kernel. The module is not actually built, however, because a
-build file is required. Contained in this file will be the name of
-the module(s) being built, along with the list of requisite source
-files. The file may be as simple as a single line:
-
-	obj-m := <module_name>.o
-
-The kbuild system will build <module_name>.o from <module_name>.c,
-and, after linking, will result in the kernel module <module_name>.ko.
-The above line can be put in either a "Kbuild" file or a "Makefile."
-When the module is built from multiple sources, an additional line is
-needed listing the files:
-
-	<module_name>-y := <src1>.o <src2>.o ...
-
-NOTE: Further documentation describing the syntax used by kbuild is
-located in Documentation/kbuild/makefiles.txt.
-
-The examples below demonstrate how to create a build file for the
-module 8123.ko, which is built from the following files:
-
-	8123_if.c
-	8123_if.h
-	8123_pci.c
-	8123_bin.o_shipped	<= Binary blob
-
---- 3.1 Shared Makefile
-
-	An external module always includes a wrapper makefile that
-	supports building the module using "make" with no arguments.
-	This target is not used by kbuild; it is only for convenience.
-	Additional functionality, such as test targets, can be included
-	but should be filtered out from kbuild due to possible name
-	clashes.
-
-	Example 1:
-		--> filename: Makefile
-		ifneq ($(KERNELRELEASE),)
-		# kbuild part of makefile
-		obj-m  := 8123.o
-		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
-
-		else
-		# normal makefile
-		KDIR ?= /lib/modules/`uname -r`/build
-
-		default:
-			$(MAKE) -C $(KDIR) M=$$PWD
-
-		# Module specific targets
-		genbin:
-			echo "X" > 8123_bin.o_shipped
-
-		endif
-
-	The check for KERNELRELEASE is used to separate the two parts
-	of the makefile. In the example, kbuild will only see the two
-	assignments, whereas "make" will see everything except these
-	two assignments. This is due to two passes made on the file:
-	the first pass is by the "make" instance run on the command
-	line; the second pass is by the kbuild system, which is
-	initiated by the parameterized "make" in the default target.
-
---- 3.2 Separate Kbuild File and Makefile
-
-	In newer versions of the kernel, kbuild will first look for a
-	file named "Kbuild," and only if that is not found, will it
-	then look for a makefile. Utilizing a "Kbuild" file allows us
-	to split up the makefile from example 1 into two files:
-
-	Example 2:
-		--> filename: Kbuild
-		obj-m  := 8123.o
-		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
-
-		--> filename: Makefile
-		KDIR ?= /lib/modules/`uname -r`/build
-
-		default:
-			$(MAKE) -C $(KDIR) M=$$PWD
-
-		# Module specific targets
-		genbin:
-			echo "X" > 8123_bin.o_shipped
-
-	The split in example 2 is questionable due to the simplicity of
-	each file; however, some external modules use makefiles
-	consisting of several hundred lines, and here it really pays
-	off to separate the kbuild part from the rest.
-
-	The next example shows a backward compatible version.
-
-	Example 3:
-		--> filename: Kbuild
-		obj-m  := 8123.o
-		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
-
-		--> filename: Makefile
-		ifneq ($(KERNELRELEASE),)
-		# kbuild part of makefile
-		include Kbuild
-
-		else
-		# normal makefile
-		KDIR ?= /lib/modules/`uname -r`/build
-
-		default:
-			$(MAKE) -C $(KDIR) M=$$PWD
-
-		# Module specific targets
-		genbin:
-			echo "X" > 8123_bin.o_shipped
-
-		endif
-
-	Here the "Kbuild" file is included from the makefile. This
-	allows an older version of kbuild, which only knows of
-	makefiles, to be used when the "make" and kbuild parts are
-	split into separate files.
-
---- 3.3 Binary Blobs
-
-	Some external modules need to include an object file as a blob.
-	kbuild has support for this, but requires the blob file to be
-	named <filename>_shipped. When the kbuild rules kick in, a copy
-	of <filename>_shipped is created with _shipped stripped off,
-	giving us <filename>. This shortened filename can be used in
-	the assignment to the module.
-
-	Throughout this section, 8123_bin.o_shipped has been used to
-	build the kernel module 8123.ko; it has been included as
-	8123_bin.o.
-
-		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
-
-	Although there is no distinction between the ordinary source
-	files and the binary file, kbuild will pick up different rules
-	when creating the object file for the module.
-
---- 3.4 Building Multiple Modules
-
-	kbuild supports building multiple modules with a single build
-	file. For example, if you wanted to build two modules, foo.ko
-	and bar.ko, the kbuild lines would be:
-
-		obj-m := foo.o bar.o
-		foo-y := <foo_srcs>
-		bar-y := <bar_srcs>
-
-	It is that simple!
-
-
-=== 4. Include Files
-
-Within the kernel, header files are kept in standard locations
-according to the following rule:
-
-	* If the header file only describes the internal interface of a
-	  module, then the file is placed in the same directory as the
-	  source files.
-	* If the header file describes an interface used by other parts
-	  of the kernel that are located in different directories, then
-	  the file is placed in include/linux/.
-
-	  NOTE: There are two notable exceptions to this rule: larger
-	  subsystems have their own directory under include/, such as
-	  include/scsi; and architecture specific headers are located
-	  under arch/$(ARCH)/include/.
-
---- 4.1 Kernel Includes
-
-	To include a header file located under include/linux/, simply
-	use:
-
-		#include <linux/module.h>
-
-	kbuild will add options to "gcc" so the relevant directories
-	are searched.
-
---- 4.2 Single Subdirectory
-
-	External modules tend to place header files in a separate
-	include/ directory where their source is located, although this
-	is not the usual kernel style. To inform kbuild of the
-	directory, use either ccflags-y or CFLAGS_<filename>.o.
-
-	Using the example from section 3, if we moved 8123_if.h to a
-	subdirectory named include, the resulting kbuild file would
-	look like:
-
-		--> filename: Kbuild
-		obj-m := 8123.o
-
-		ccflags-y := -Iinclude
-		8123-y := 8123_if.o 8123_pci.o 8123_bin.o
-
-	Note that in the assignment there is no space between -I and
-	the path. This is a limitation of kbuild: there must be no
-	space present.
-
---- 4.3 Several Subdirectories
-
-	kbuild can handle files that are spread over several directories.
-	Consider the following example:
-
-	.
-	|__ src
-	|   |__ complex_main.c
-	|   |__ hal
-	|	|__ hardwareif.c
-	|	|__ include
-	|	    |__ hardwareif.h
-	|__ include
-	    |__ complex.h
-
-	To build the module complex.ko, we then need the following
-	kbuild file:
-
-		--> filename: Kbuild
-		obj-m := complex.o
-		complex-y := src/complex_main.o
-		complex-y += src/hal/hardwareif.o
-
-		ccflags-y := -I$(src)/include
-		ccflags-y += -I$(src)/src/hal/include
-
-	As you can see, kbuild knows how to handle object files located
-	in other directories. The trick is to specify the directory
-	relative to the kbuild file's location. That being said, this
-	is NOT recommended practice.
-
-	For the header files, kbuild must be explicitly told where to
-	look. When kbuild executes, the current directory is always the
-	root of the kernel tree (the argument to "-C") and therefore an
-	absolute path is needed. $(src) provides the absolute path by
-	pointing to the directory where the currently executing kbuild
-	file is located.
-
-
-=== 5. Module Installation
-
-Modules which are included in the kernel are installed in the
-directory:
-
-	/lib/modules/$(KERNELRELEASE)/kernel/
-
-And external modules are installed in:
-
-	/lib/modules/$(KERNELRELEASE)/extra/
-
---- 5.1 INSTALL_MOD_PATH
-
-	Above are the default directories but as always some level of
-	customization is possible. A prefix can be added to the
-	installation path using the variable INSTALL_MOD_PATH:
-
-		$ make INSTALL_MOD_PATH=/frodo modules_install
-		=> Install dir: /frodo/lib/modules/$(KERNELRELEASE)/kernel/
-
-	INSTALL_MOD_PATH may be set as an ordinary shell variable or,
-	as shown above, can be specified on the command line when
-	calling "make." This has effect when installing both in-tree
-	and out-of-tree modules.
-
---- 5.2 INSTALL_MOD_DIR
-
-	External modules are by default installed to a directory under
-	/lib/modules/$(KERNELRELEASE)/extra/, but you may wish to
-	locate modules for a specific functionality in a separate
-	directory. For this purpose, use INSTALL_MOD_DIR to specify an
-	alternative name to "extra."
-
-		$ make INSTALL_MOD_DIR=gandalf -C $KDIR \
-		       M=$PWD modules_install
-		=> Install dir: /lib/modules/$(KERNELRELEASE)/gandalf/
-
-
-=== 6. Module Versioning
-
-Module versioning is enabled by the CONFIG_MODVERSIONS tag, and is used
-as a simple ABI consistency check. A CRC value of the full prototype
-for an exported symbol is created. When a module is loaded/used, the
-CRC values contained in the kernel are compared with similar values in
-the module; if they are not equal, the kernel refuses to load the
-module.
-
-Module.symvers contains a list of all exported symbols from a kernel
-build.
-
---- 6.1 Symbols From the Kernel (vmlinux + modules)
-
-	During a kernel build, a file named Module.symvers will be
-	generated. Module.symvers contains all exported symbols from
-	the kernel and compiled modules. For each symbol, the
-	corresponding CRC value is also stored.
-
-	The syntax of the Module.symvers file is:
-		<CRC>	    <Symbol>	       <module>
-
-		0x2d036834  scsi_remove_host   drivers/scsi/scsi_mod
-
-	For a kernel build without CONFIG_MODVERSIONS enabled, the CRC
-	would read 0x00000000.
-
-	Module.symvers serves two purposes:
-	1) It lists all exported symbols from vmlinux and all modules.
-	2) It lists the CRC if CONFIG_MODVERSIONS is enabled.
-
---- 6.2 Symbols and External Modules
-
-	When building an external module, the build system needs access
-	to the symbols from the kernel to check if all external symbols
-	are defined. This is done in the MODPOST step. modpost obtains
-	the symbols by reading Module.symvers from the kernel source
-	tree. If a Module.symvers file is present in the directory
-	where the external module is being built, this file will be
-	read too. During the MODPOST step, a new Module.symvers file
-	will be written containing all exported symbols that were not
-	defined in the kernel.
-
---- 6.3 Symbols From Another External Module
-
-	Sometimes, an external module uses exported symbols from
-	another external module. kbuild needs to have full knowledge of
-	all symbols to avoid spitting out warnings about undefined
-	symbols. Three solutions exist for this situation.
-
-	NOTE: The method with a top-level kbuild file is recommended
-	but may be impractical in certain situations.
-
-	Use a top-level kbuild file
-		If you have two modules, foo.ko and bar.ko, where
-		foo.ko needs symbols from bar.ko, you can use a
-		common top-level kbuild file so both modules are
-		compiled in the same build. Consider the following
-		directory layout:
-
-		./foo/ <= contains foo.ko
-		./bar/ <= contains bar.ko
-
-		The top-level kbuild file would then look like:
-
-		#./Kbuild (or ./Makefile):
-			obj-y := foo/ bar/
-
-		And executing
-
-			$ make -C $KDIR M=$PWD
-
-		will then do the expected and compile both modules with
-		full knowledge of symbols from either module.
-
-	Use an extra Module.symvers file
-		When an external module is built, a Module.symvers file
-		is generated containing all exported symbols which are
-		not defined in the kernel. To get access to symbols
-		from bar.ko, copy the Module.symvers file from the
-		compilation of bar.ko to the directory where foo.ko is
-		built. During the module build, kbuild will read the
-		Module.symvers file in the directory of the external
-		module, and when the build is finished, a new
-		Module.symvers file is created containing the sum of
-		all symbols defined and not part of the kernel.
-
-	Use "make" variable KBUILD_EXTRA_SYMBOLS
-		If it is impractical to copy Module.symvers from
-		another module, you can assign a space separated list
-		of files to KBUILD_EXTRA_SYMBOLS in your build file.
-		These files will be loaded by modpost during the
-		initialization of its symbol tables.
-
-
-=== 7. Tips & Tricks
-
---- 7.1 Testing for CONFIG_FOO_BAR
-
-	Modules often need to check for certain CONFIG_ options to
-	decide if a specific feature is included in the module. In
-	kbuild this is done by referencing the CONFIG_ variable
-	directly.
-
-		#fs/ext2/Makefile
-		obj-$(CONFIG_EXT2_FS) += ext2.o
-
-		ext2-y := balloc.o bitmap.o dir.o
-		ext2-$(CONFIG_EXT2_FS_XATTR) += xattr.o
-
-	External modules have traditionally used "grep" to check for
-	specific CONFIG_ settings directly in .config. This usage is
-	broken. As introduced before, external modules should use
-	kbuild for building and can therefore use the same methods as
-	in-tree modules when testing for CONFIG_ definitions.
-
diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst
index d824e4feaff3..5891a701a159 100644
--- a/Documentation/kernel-hacking/hacking.rst
+++ b/Documentation/kernel-hacking/hacking.rst
@@ -718,7 +718,7 @@ make a neat patch, there's administrative work to be done:
 -  Usually you want a configuration option for your kernel hack. Edit
    ``Kconfig`` in the appropriate directory. The Config language is
    simple to use by cut and paste, and there's complete documentation in
-   ``Documentation/kbuild/kconfig-language.txt``.
+   ``Documentation/kbuild/kconfig-language.rst``.
 
    In your description of the option, make sure you address both the
    expert user and the user who knows nothing about your feature.
@@ -728,7 +728,7 @@ make a neat patch, there's administrative work to be done:
 
 -  Edit the ``Makefile``: the CONFIG variables are exported here so you
    can usually just add a "obj-$(CONFIG_xxx) += xxx.o" line. The syntax
-   is documented in ``Documentation/kbuild/makefiles.txt``.
+   is documented in ``Documentation/kbuild/makefiles.rst``.
 
 -  Put yourself in ``CREDITS`` if you've done something noteworthy,
    usually beyond a single file (your name should be at the top of the
diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index fa864a51e6ea..f4a2198187f9 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -686,7 +686,7 @@ filesystems) should advertise this prominently in their prompt string::
 	...
 
 For full documentation on the configuration files, see the file
-Documentation/kbuild/kconfig-language.txt.
+Documentation/kbuild/kconfig-language.rst.
 
 
 11) Data structures
diff --git a/Documentation/process/submit-checklist.rst b/Documentation/process/submit-checklist.rst
index c88867b173d9..365efc9e4aa8 100644
--- a/Documentation/process/submit-checklist.rst
+++ b/Documentation/process/submit-checklist.rst
@@ -39,7 +39,7 @@ and elsewhere regarding submitting Linux kernel patches.
 
 6) Any new or modified ``CONFIG`` options do not muck up the config menu and
    default to off unless they meet the exception criteria documented in
-   ``Documentation/kbuild/kconfig-language.txt`` Menu attributes: default value.
+   ``Documentation/kbuild/kconfig-language.rst`` Menu attributes: default value.
 
 7) All new ``Kconfig`` options have help text.
 
diff --git a/Documentation/translations/it_IT/kernel-hacking/hacking.rst b/Documentation/translations/it_IT/kernel-hacking/hacking.rst
index 7178e517af0a..24c592852bf1 100644
--- a/Documentation/translations/it_IT/kernel-hacking/hacking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/hacking.rst
@@ -755,7 +755,7 @@ anche per avere patch pulite, c'è del lavoro amministrativo da fare:
 -  Solitamente vorrete un'opzione di configurazione per la vostra modifica
    al kernel. Modificate ``Kconfig`` nella cartella giusta. Il linguaggio
    Config è facile con copia ed incolla, e c'è una completa documentazione
-   nel file ``Documentation/kbuild/kconfig-language.txt``.
+   nel file ``Documentation/kbuild/kconfig-language.rst``.
 
    Nella descrizione della vostra opzione, assicuratevi di parlare sia agli
    utenti esperti sia agli utente che non sanno nulla del vostro lavoro.
@@ -767,7 +767,7 @@ anche per avere patch pulite, c'è del lavoro amministrativo da fare:
 -  Modificate il file ``Makefile``: le variabili CONFIG sono esportate qui,
    quindi potete solitamente aggiungere una riga come la seguete
    "obj-$(CONFIG_xxx) += xxx.o". La sintassi è documentata nel file
-   ``Documentation/kbuild/makefiles.txt``.
+   ``Documentation/kbuild/makefiles.rst``.
 
 -  Aggiungete voi stessi in ``CREDITS`` se avete fatto qualcosa di notevole,
    solitamente qualcosa che supera il singolo file (comunque il vostro nome
diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst
index a6559d25a23d..8995d2d19f20 100644
--- a/Documentation/translations/it_IT/process/coding-style.rst
+++ b/Documentation/translations/it_IT/process/coding-style.rst
@@ -696,7 +696,7 @@ nella stringa di titolo::
 	...
 
 Per la documentazione completa sui file di configurazione, consultate
-il documento Documentation/kbuild/kconfig-language.txt
+il documento Documentation/kbuild/kconfig-language.rst
 
 
 11) Strutture dati
diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst
index 70e65a7b3620..ea74cae958d7 100644
--- a/Documentation/translations/it_IT/process/submit-checklist.rst
+++ b/Documentation/translations/it_IT/process/submit-checklist.rst
@@ -43,7 +43,7 @@ sottomissione delle patch, in particolare
 
 6) Le opzioni ``CONFIG``, nuove o modificate, non scombussolano il menu
    di configurazione e sono preimpostate come disabilitate a meno che non
-   soddisfino i criteri descritti in ``Documentation/kbuild/kconfig-language.txt``
+   soddisfino i criteri descritti in ``Documentation/kbuild/kconfig-language.rst``
    alla punto "Voci di menu: valori predefiniti".
 
 7) Tutte le nuove opzioni ``Kconfig`` hanno un messaggio di aiuto.
diff --git a/Documentation/translations/zh_CN/process/coding-style.rst b/Documentation/translations/zh_CN/process/coding-style.rst
index 5479c591c2f7..4f6237392e65 100644
--- a/Documentation/translations/zh_CN/process/coding-style.rst
+++ b/Documentation/translations/zh_CN/process/coding-style.rst
@@ -599,7 +599,7 @@ Documentation/doc-guide/ 和 scripts/kernel-doc 以获得详细信息。
 	depends on ADFS_FS
 	...
 
-要查看配置文件的完整文档，请看 Documentation/kbuild/kconfig-language.txt。
+要查看配置文件的完整文档，请看 Documentation/kbuild/kconfig-language.rst。
 
 
 11) 数据结构
diff --git a/Documentation/translations/zh_CN/process/submit-checklist.rst b/Documentation/translations/zh_CN/process/submit-checklist.rst
index 89061aa8fdbe..f4785d2b0491 100644
--- a/Documentation/translations/zh_CN/process/submit-checklist.rst
+++ b/Documentation/translations/zh_CN/process/submit-checklist.rst
@@ -38,7 +38,7 @@ Linux内核补丁提交清单
    违规行为。
 
 6) 任何新的或修改过的 ``CONFIG`` 选项都不会弄脏配置菜单，并默认为关闭，除非
-   它们符合 ``Documentation/kbuild/kconfig-language.txt`` 中记录的异常条件,
+   它们符合 ``Documentation/kbuild/kconfig-language.rst`` 中记录的异常条件,
    菜单属性：默认值.
 
 7) 所有新的 ``kconfig`` 选项都有帮助文本。
diff --git a/Kconfig b/Kconfig
index 990b0c390dfc..e10b3ee084d4 100644
--- a/Kconfig
+++ b/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 #
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 mainmenu "Linux/$(ARCH) $(KERNELVERSION) Kernel Configuration"
 
diff --git a/arch/arc/plat-eznps/Kconfig b/arch/arc/plat-eznps/Kconfig
index 2eaecfb063a7..a376a50d3fea 100644
--- a/arch/arc/plat-eznps/Kconfig
+++ b/arch/arc/plat-eznps/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 #
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 
 menuconfig ARC_PLAT_EZNPS
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index eeb0471268a0..c5e6b70e1510 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 #
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 
 config C6X
diff --git a/arch/microblaze/Kconfig.debug b/arch/microblaze/Kconfig.debug
index 3a343188d86c..865527ac332a 100644
--- a/arch/microblaze/Kconfig.debug
+++ b/arch/microblaze/Kconfig.debug
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 
 config TRACE_IRQFLAGS_SUPPORT
 	def_bool y
diff --git a/arch/microblaze/Kconfig.platform b/arch/microblaze/Kconfig.platform
index 5bf54c1d4f60..7795f90dad86 100644
--- a/arch/microblaze/Kconfig.platform
+++ b/arch/microblaze/Kconfig.platform
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 # Platform selection Kconfig menu for MicroBlaze targets
 #
diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index 3299e287a477..fd0d0639454f 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 #
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 
 config NDS32
diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 7cfb20555b10..bf326f0edd2f 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 #
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 
 config OPENRISC
diff --git a/arch/powerpc/sysdev/Kconfig b/arch/powerpc/sysdev/Kconfig
index e0dbec780fe9..d23288c4abf6 100644
--- a/arch/powerpc/sysdev/Kconfig
+++ b/arch/powerpc/sysdev/Kconfig
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 
 config PPC4xx_PCI_EXPRESS
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 0c4b12205632..be713da93946 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 #
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 
 config 64BIT
diff --git a/drivers/auxdisplay/Kconfig b/drivers/auxdisplay/Kconfig
index c52c738e554a..dd61fdd400f0 100644
--- a/drivers/auxdisplay/Kconfig
+++ b/drivers/auxdisplay/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 #
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 # Auxiliary display drivers configuration.
 #
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 9026df923542..35078c6f334a 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 #
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 
 menu "Firmware Drivers"
diff --git a/drivers/mtd/devices/Kconfig b/drivers/mtd/devices/Kconfig
index ef0e476b2525..49abbc52457d 100644
--- a/drivers/mtd/devices/Kconfig
+++ b/drivers/mtd/devices/Kconfig
@@ -48,7 +48,7 @@ config MTD_MS02NV
 
 	  If you want to compile this driver as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you want),
-	  say M here and read <file:Documentation/kbuild/modules.txt>.
+	  say M here and read <file:Documentation/kbuild/modules.rst>.
 	  The module will be called ms02-nv.
 
 config MTD_DATAFLASH
diff --git a/drivers/net/ethernet/smsc/Kconfig b/drivers/net/ethernet/smsc/Kconfig
index d1b6a78557ec..9e1c3752b200 100644
--- a/drivers/net/ethernet/smsc/Kconfig
+++ b/drivers/net/ethernet/smsc/Kconfig
@@ -49,7 +49,7 @@ config SMC91X
 	  This driver is also available as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you want).
 	  The module will be called smc91x.  If you want to compile it as a
-	  module, say M here and read <file:Documentation/kbuild/modules.txt>.
+	  module, say M here and read <file:Documentation/kbuild/modules.rst>.
 
 config PCMCIA_SMC91C92
 	tristate "SMC 91Cxx PCMCIA support"
@@ -86,7 +86,7 @@ config SMC911X
 
 	  This driver is also available as a module. The module will be
 	  called smc911x.  If you want to compile it as a module, say M
-	  here and read <file:Documentation/kbuild/modules.txt>
+	  here and read <file:Documentation/kbuild/modules.rst>
 
 config SMSC911X
 	tristate "SMSC LAN911x/LAN921x families embedded ethernet support"
@@ -121,6 +121,6 @@ config SMSC9420
 
 	  This driver is also available as a module. The module will be
 	  called smsc9420.  If you want to compile it as a module, say M
-	  here and read <file:Documentation/kbuild/modules.txt>
+	  here and read <file:Documentation/kbuild/modules.rst>
 
 endif # NET_VENDOR_SMSC
diff --git a/drivers/net/wireless/intel/iwlegacy/Kconfig b/drivers/net/wireless/intel/iwlegacy/Kconfig
index aa01c83e0060..e329fd7b09c0 100644
--- a/drivers/net/wireless/intel/iwlegacy/Kconfig
+++ b/drivers/net/wireless/intel/iwlegacy/Kconfig
@@ -32,7 +32,7 @@ config IWL4965
 
 	  If you want to compile the driver as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you want),
-	  say M here and read <file:Documentation/kbuild/modules.txt>.  The
+	  say M here and read <file:Documentation/kbuild/modules.rst>.  The
 	  module will be called iwl4965.
 
 config IWL3945
@@ -58,7 +58,7 @@ config IWL3945
 
 	  If you want to compile the driver as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you want),
-	  say M here and read <file:Documentation/kbuild/modules.txt>.  The
+	  say M here and read <file:Documentation/kbuild/modules.rst>.  The
 	  module will be called iwl3945.
 
 menu "iwl3945 / iwl4965 Debugging Options"
diff --git a/drivers/net/wireless/intel/iwlwifi/Kconfig b/drivers/net/wireless/intel/iwlwifi/Kconfig
index e5528189163f..235349a33a3c 100644
--- a/drivers/net/wireless/intel/iwlwifi/Kconfig
+++ b/drivers/net/wireless/intel/iwlwifi/Kconfig
@@ -40,7 +40,7 @@ config IWLWIFI
 
 	  If you want to compile the driver as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you want),
-	  say M here and read <file:Documentation/kbuild/modules.txt>.  The
+	  say M here and read <file:Documentation/kbuild/modules.rst>.  The
 	  module will be called iwlwifi.
 
 if IWLWIFI
diff --git a/drivers/parport/Kconfig b/drivers/parport/Kconfig
index 24189c3399e0..1791830e7a71 100644
--- a/drivers/parport/Kconfig
+++ b/drivers/parport/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 #
 # For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
 #
 # Parport configuration.
 #
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 61da513fc0ed..f31b6b780eaf 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -183,7 +183,7 @@ config CHR_DEV_SCH
 	
 	  If you want to compile this as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you want),
-	  say M here and read <file:Documentation/kbuild/modules.txt> and
+	  say M here and read <file:Documentation/kbuild/modules.rst> and
 	  <file:Documentation/scsi/scsi.txt>. The module will be called ch.o.
 	  If unsure, say N.
 
@@ -1474,7 +1474,7 @@ config ZFCP
 
           This driver is also available as a module. This module will be
           called zfcp. If you want to compile it as a module, say M here
-          and read <file:Documentation/kbuild/modules.txt>.
+          and read <file:Documentation/kbuild/modules.rst>.
 
 config SCSI_PMCRAID
 	tristate "PMC SIERRA Linux MaxRAID adapter support"
diff --git a/drivers/staging/sm750fb/Kconfig b/drivers/staging/sm750fb/Kconfig
index fb5a086bf9b1..8c0d8a873d5b 100644
--- a/drivers/staging/sm750fb/Kconfig
+++ b/drivers/staging/sm750fb/Kconfig
@@ -12,4 +12,4 @@ config FB_SM750
 
 	  This driver is also available as a module. The module will be
 	  called sm750fb. If you want to compile it as a module, say M
-	  here and read <file:Documentation/kbuild/modules.txt>.
+	  here and read <file:Documentation/kbuild/modules.rst>.
diff --git a/drivers/usb/misc/Kconfig b/drivers/usb/misc/Kconfig
index c97f270338bf..4a88e1ca25c0 100644
--- a/drivers/usb/misc/Kconfig
+++ b/drivers/usb/misc/Kconfig
@@ -16,7 +16,7 @@ config USB_EMI62
 	  This code is also available as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you want).
 	  The module will be called audio. If you want to compile it as a
-	  module, say M here and read <file:Documentation/kbuild/modules.txt>.
+	  module, say M here and read <file:Documentation/kbuild/modules.rst>.
 
 config USB_EMI26
 	tristate "EMI 2|6 USB Audio interface support"
@@ -67,7 +67,7 @@ config USB_LEGOTOWER
 	  inserted in and removed from the running kernel whenever you want).
 	  The module will be called legousbtower. If you want to compile it as
 	  a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.
+	  <file:Documentation/kbuild/modules.rst>.
 
 config USB_LCD
 	tristate "USB LCD driver support"
diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
index 737b86328c9e..31ba91cb916a 100644
--- a/drivers/video/fbdev/Kconfig
+++ b/drivers/video/fbdev/Kconfig
@@ -289,7 +289,7 @@ config FB_ARMCLCD
 
 	  If you want to compile this as a module (=code which can be
 	  inserted into and removed from the running kernel), say M
-	  here and read <file:Documentation/kbuild/modules.txt>.  The module
+	  here and read <file:Documentation/kbuild/modules.rst>.  The module
 	  will be called amba-clcd.
 
 config FB_ACORN
@@ -1752,7 +1752,7 @@ config FB_PXA
 	  This driver is also available as a module ( = code which can be
 	  inserted and removed from the running kernel whenever you want). The
 	  module will be called pxafb. If you want to compile it as a module,
-	  say M here and read <file:Documentation/kbuild/modules.txt>.
+	  say M here and read <file:Documentation/kbuild/modules.rst>.
 
 	  If unsure, say N.
 
@@ -1833,7 +1833,7 @@ config FB_W100
 	  This driver is also available as a module ( = code which can be
 	  inserted and removed from the running kernel whenever you want). The
 	  module will be called w100fb. If you want to compile it as a module,
-	  say M here and read <file:Documentation/kbuild/modules.txt>.
+	  say M here and read <file:Documentation/kbuild/modules.rst>.
 
 	  If unsure, say N.
 
@@ -1862,7 +1862,7 @@ config FB_TMIO
 	  This driver is also available as a module ( = code which can be
 	  inserted and removed from the running kernel whenever you want). The
 	  module will be called tmiofb. If you want to compile it as a module,
-	  say M here and read <file:Documentation/kbuild/modules.txt>.
+	  say M here and read <file:Documentation/kbuild/modules.rst>.
 
 	  If unsure, say N.
 
@@ -1908,7 +1908,7 @@ config FB_S3C2410
 	  This driver is also available as a module ( = code which can be
 	  inserted and removed from the running kernel whenever you want). The
 	  module will be called s3c2410fb. If you want to compile it as a module,
-	  say M here and read <file:Documentation/kbuild/modules.txt>.
+	  say M here and read <file:Documentation/kbuild/modules.rst>.
 
 	  If unsure, say N.
 config FB_S3C2410_DEBUG
@@ -1945,7 +1945,7 @@ config FB_SM501
 	  This driver is also available as a module ( = code which can be
 	  inserted and removed from the running kernel whenever you want). The
 	  module will be called sm501fb. If you want to compile it as a module,
-	  say M here and read <file:Documentation/kbuild/modules.txt>.
+	  say M here and read <file:Documentation/kbuild/modules.rst>.
 
 	  If unsure, say N.
 
@@ -2288,7 +2288,7 @@ config FB_SM712
 
 	  This driver is also available as a module. The module will be
 	  called sm712fb. If you want to compile it as a module, say M
-	  here and read <file:Documentation/kbuild/modules.txt>.
+	  here and read <file:Documentation/kbuild/modules.rst>.
 
 source "drivers/video/fbdev/omap/Kconfig"
 source "drivers/video/fbdev/omap2/Kconfig"
diff --git a/net/bridge/netfilter/Kconfig b/net/bridge/netfilter/Kconfig
index c3ad90c43801..36a98d36d339 100644
--- a/net/bridge/netfilter/Kconfig
+++ b/net/bridge/netfilter/Kconfig
@@ -114,7 +114,7 @@ config BRIDGE_EBT_LIMIT
 	  equivalent of the iptables limit match.
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 config BRIDGE_EBT_MARK
 	tristate "ebt: mark filter support"
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 3e6494269501..69e76d677f9e 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -308,7 +308,7 @@ config IP_NF_RAW
 	  and OUTPUT chains.
 	
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 # security table for MAC policy
 config IP_NF_SECURITY
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index f7c6f5be9f76..6120a7800975 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -241,7 +241,7 @@ config IP6_NF_RAW
 	  and OUTPUT chains.
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 # security table for MAC policy
 config IP6_NF_SECURITY
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 21025c2c605b..dd2af7be3eea 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1056,7 +1056,7 @@ config NETFILTER_XT_TARGET_TRACE
 	  the tables, chains, rules.
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 config NETFILTER_XT_TARGET_SECMARK
 	tristate '"SECMARK" target support'
@@ -1115,7 +1115,7 @@ config NETFILTER_XT_MATCH_ADDRTYPE
 	  eg. UNICAST, LOCAL, BROADCAST, ...
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 config NETFILTER_XT_MATCH_BPF
 	tristate '"bpf" match support'
@@ -1160,7 +1160,7 @@ config NETFILTER_XT_MATCH_COMMENT
 	  comments in your iptables ruleset.
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 config NETFILTER_XT_MATCH_CONNBYTES
 	tristate  '"connbytes" per-connection counter match support'
@@ -1171,7 +1171,7 @@ config NETFILTER_XT_MATCH_CONNBYTES
 	  number of bytes and/or packets for each direction within a connection.
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 config NETFILTER_XT_MATCH_CONNLABEL
 	tristate '"connlabel" match support'
@@ -1237,7 +1237,7 @@ config NETFILTER_XT_MATCH_DCCP
 	  and DCCP flags.
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 config NETFILTER_XT_MATCH_DEVGROUP
 	tristate '"devgroup" match support'
@@ -1473,7 +1473,7 @@ config NETFILTER_XT_MATCH_QUOTA
 	  byte counter.
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 config NETFILTER_XT_MATCH_RATEEST
 	tristate '"rateest" match support'
@@ -1497,7 +1497,7 @@ config NETFILTER_XT_MATCH_REALM
 	  in tc world.
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 config NETFILTER_XT_MATCH_RECENT
 	tristate '"recent" match support'
@@ -1519,7 +1519,7 @@ config NETFILTER_XT_MATCH_SCTP
 	  and SCTP chunk types.
 
 	  If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
+	  <file:Documentation/kbuild/modules.rst>.  If unsure, say `N'.
 
 config NETFILTER_XT_MATCH_SOCKET
 	tristate '"socket" match support'
diff --git a/net/tipc/Kconfig b/net/tipc/Kconfig
index b93bb7bdb04a..b83e16ade4d2 100644
--- a/net/tipc/Kconfig
+++ b/net/tipc/Kconfig
@@ -17,7 +17,7 @@ menuconfig TIPC
 	  This protocol support is also available as a module ( = code which
 	  can be inserted in and removed from the running kernel whenever you
 	  want). The module will be called tipc. If you want to compile it
-	  as a module, say M here and read <file:Documentation/kbuild/modules.txt>.
+	  as a module, say M here and read <file:Documentation/kbuild/modules.rst>.
 
 	  If in doubt, say N.
 
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index f641bb0aa63f..ee58cde8ee3b 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -68,7 +68,7 @@ endef
 
 ######
 # gcc support functions
-# See documentation in Documentation/kbuild/makefiles.txt
+# See documentation in Documentation/kbuild/makefiles.rst
 
 # cc-cross-prefix
 # Usage: CROSS_COMPILE := $(call cc-cross-prefix, m68k-linux-gnu- m68k-linux-)
@@ -210,7 +210,7 @@ objectify = $(foreach o,$(1),$(if $(filter /%,$(o)),$(o),$(obj)/$(o)))
 # if_changed_dep  - as if_changed, but uses fixdep to reveal dependencies
 #                   including used config symbols
 # if_changed_rule - as if_changed but execute rule instead
-# See Documentation/kbuild/makefiles.txt for more info
+# See Documentation/kbuild/makefiles.rst for more info
 
 ifneq ($(KBUILD_NOCMDDEP),1)
 # Check if both arguments are the same including their order. Result is empty
diff --git a/scripts/Makefile.host b/scripts/Makefile.host
index b6a54bdf0965..a316d368b697 100644
--- a/scripts/Makefile.host
+++ b/scripts/Makefile.host
@@ -6,7 +6,7 @@
 #
 # Both C and C++ are supported, but preferred language is C for such utilities.
 #
-# Sample syntax (see Documentation/kbuild/makefiles.txt for reference)
+# Sample syntax (see Documentation/kbuild/makefiles.rst for reference)
 # hostprogs-y := bin2hex
 # Will compile bin2hex.c and create an executable named bin2hex
 #
diff --git a/scripts/kconfig/symbol.c b/scripts/kconfig/symbol.c
index 1f9266dadedf..09fd6fa18e1a 100644
--- a/scripts/kconfig/symbol.c
+++ b/scripts/kconfig/symbol.c
@@ -1114,7 +1114,7 @@ static void sym_check_print_recursive(struct symbol *last_sym)
 	}
 
 	fprintf(stderr,
-		"For a resolution refer to Documentation/kbuild/kconfig-language.txt\n"
+		"For a resolution refer to Documentation/kbuild/kconfig-language.rst\n"
 		"subsection \"Kconfig recursive dependency limitations\"\n"
 		"\n");
 
diff --git a/scripts/kconfig/tests/err_recursive_dep/expected_stderr b/scripts/kconfig/tests/err_recursive_dep/expected_stderr
index 84679b104655..c9f4abf9a791 100644
--- a/scripts/kconfig/tests/err_recursive_dep/expected_stderr
+++ b/scripts/kconfig/tests/err_recursive_dep/expected_stderr
@@ -1,38 +1,38 @@
 Kconfig:11:error: recursive dependency detected!
 Kconfig:11:	symbol B is selected by B
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
 subsection "Kconfig recursive dependency limitations"
 
 Kconfig:5:error: recursive dependency detected!
 Kconfig:5:	symbol A depends on A
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
 subsection "Kconfig recursive dependency limitations"
 
 Kconfig:17:error: recursive dependency detected!
 Kconfig:17:	symbol C1 depends on C2
 Kconfig:21:	symbol C2 depends on C1
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
 subsection "Kconfig recursive dependency limitations"
 
 Kconfig:32:error: recursive dependency detected!
 Kconfig:32:	symbol D2 is selected by D1
 Kconfig:27:	symbol D1 depends on D2
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
 subsection "Kconfig recursive dependency limitations"
 
 Kconfig:37:error: recursive dependency detected!
 Kconfig:37:	symbol E1 depends on E2
 Kconfig:42:	symbol E2 is implied by E1
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
 subsection "Kconfig recursive dependency limitations"
 
 Kconfig:60:error: recursive dependency detected!
 Kconfig:60:	symbol G depends on G
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
 subsection "Kconfig recursive dependency limitations"
 
 Kconfig:51:error: recursive dependency detected!
 Kconfig:51:	symbol F2 depends on F1
 Kconfig:49:	symbol F1 default value contains F2
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
 subsection "Kconfig recursive dependency limitations"
diff --git a/sound/oss/dmasound/Kconfig b/sound/oss/dmasound/Kconfig
index 12e42165b4a5..1a3339859840 100644
--- a/sound/oss/dmasound/Kconfig
+++ b/sound/oss/dmasound/Kconfig
@@ -11,7 +11,7 @@ config DMASOUND_ATARI
 	  This driver is also available as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you
 	  want). If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.
+	  <file:Documentation/kbuild/modules.rst>.
 
 config DMASOUND_PAULA
 	tristate "Amiga DMA sound support"
@@ -25,7 +25,7 @@ config DMASOUND_PAULA
 	  This driver is also available as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you
 	  want). If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.
+	  <file:Documentation/kbuild/modules.rst>.
 
 config DMASOUND_Q40
 	tristate "Q40 sound support"
@@ -39,7 +39,7 @@ config DMASOUND_Q40
 	  This driver is also available as a module ( = code which can be
 	  inserted in and removed from the running kernel whenever you
 	  want). If you want to compile it as a module, say M here and read
-	  <file:Documentation/kbuild/modules.txt>.
+	  <file:Documentation/kbuild/modules.rst>.
 
 config DMASOUND
 	tristate
-- 
cgit v1.2.3-59-g8ed1b


From d67297ad343ec02a88f947b45526c92d2870aed3 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:49 -0300
Subject: docs: kdump: convert docs to ReST and rename to *.rst

Convert kdump documentation to ReST and add it to the
user-facing manual, as the documents are mainly aimed at
sysadmins who would be enabling kdump.

Note: the vmcoreinfo.rst has one very long title on one of its
sub-sections:

	PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)

I opted to break this one into two entries with the same content,
in order to make it easier to display after being parsed into HTML and PDF.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix table markup;
  - add some list markup;
  - mark literal blocks;
  - adjust title markup.

In the new index.rst, add :orphan: for as long as it is not linked from
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/bug-hunting.rst         |   2 +-
 Documentation/admin-guide/kernel-parameters.txt   |   6 +-
 Documentation/kdump/index.rst                     |  21 +
 Documentation/kdump/kdump.rst                     | 534 ++++++++++++++++++++++
 Documentation/kdump/kdump.txt                     | 509 ---------------------
 Documentation/kdump/vmcoreinfo.rst                | 488 ++++++++++++++++++++
 Documentation/kdump/vmcoreinfo.txt                | 495 --------------------
 Documentation/powerpc/firmware-assisted-dump.txt  |   2 +-
 Documentation/translations/zh_CN/oops-tracing.txt |   2 +-
 Documentation/watchdog/hpwdt.txt                  |   2 +-
 arch/arm/Kconfig                                  |   2 +-
 arch/arm64/Kconfig                                |   2 +-
 arch/sh/Kconfig                                   |   2 +-
 arch/x86/Kconfig                                  |   4 +-
 14 files changed, 1055 insertions(+), 1016 deletions(-)
 create mode 100644 Documentation/kdump/index.rst
 create mode 100644 Documentation/kdump/kdump.rst
 delete mode 100644 Documentation/kdump/kdump.txt
 create mode 100644 Documentation/kdump/vmcoreinfo.rst
 delete mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
index f278b289e260..b761aa2a51d2 100644
--- a/Documentation/admin-guide/bug-hunting.rst
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -90,7 +90,7 @@ the disk is not available then you have three options:
     run a null modem to a second machine and capture the output there
     using your favourite communication program.  Minicom works well.
 
-(3) Use Kdump (see Documentation/kdump/kdump.txt),
+(3) Use Kdump (see Documentation/kdump/kdump.rst),
     extract the kernel ring buffer from old memory with using dmesg
     gdbmacro in Documentation/kdump/gdbmacros.txt.
 
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 81c168b25b20..2148fd289851 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -708,14 +708,14 @@
 			[KNL, x86_64] select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
-			See Documentation/kdump/kdump.txt for further details.
+			See Documentation/kdump/kdump.rst for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
 			[KNL] Same as above, but depends on the memory
 			in the running system. The syntax of range is
 			start-[end] where start and end are both
 			a memory unit (amount[KMG]). See also
-			Documentation/kdump/kdump.txt for an example.
+			Documentation/kdump/kdump.rst for an example.
 
 	crashkernel=size[KMG],high
 			[KNL, x86_64] range could be above 4G. Allow kernel
@@ -1209,7 +1209,7 @@
 			Specifies physical address of start of kernel core
 			image elf header and optionally the size. Generally
 			kexec loader will pass this option to capture kernel.
-			See Documentation/kdump/kdump.txt for details.
+			See Documentation/kdump/kdump.rst for details.
 
 	enable_mtrr_cleanup [X86]
 			The kernel tries to adjust MTRR layout from continuous
diff --git a/Documentation/kdump/index.rst b/Documentation/kdump/index.rst
new file mode 100644
index 000000000000..2b17fcf6867a
--- /dev/null
+++ b/Documentation/kdump/index.rst
@@ -0,0 +1,21 @@
+:orphan:
+
+================================================================
+Documentation for Kdump - The kexec-based Crash Dumping Solution
+================================================================
+
+This document includes overview, setup and installation, and analysis
+information.
+
+.. toctree::
+    :maxdepth: 1
+
+    kdump
+    vmcoreinfo
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/kdump/kdump.rst b/Documentation/kdump/kdump.rst
new file mode 100644
index 000000000000..ac7e131d2935
--- /dev/null
+++ b/Documentation/kdump/kdump.rst
@@ -0,0 +1,534 @@
+================================================================
+Documentation for Kdump - The kexec-based Crash Dumping Solution
+================================================================
+
+This document includes overview, setup and installation, and analysis
+information.
+
+Overview
+========
+
+Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
+dump of the system kernel's memory needs to be taken (for example, when
+the system panics). The system kernel's memory image is preserved across
+the reboot and is accessible to the dump-capture kernel.
+
+You can use common commands, such as cp and scp, to copy the
+memory image to a dump file on the local disk, or across the network to
+a remote system.
+
+Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
+s390x, arm and arm64 architectures.
+
+When the system kernel boots, it reserves a small section of memory for
+the dump-capture kernel. This ensures that ongoing Direct Memory Access
+(DMA) from the system kernel does not corrupt the dump-capture kernel.
+The kexec -p command loads the dump-capture kernel into this reserved
+memory.
+
+On x86 machines, the first 640 KB of physical memory is needed to boot,
+regardless of where the kernel loads. Therefore, kexec backs up this
+region just before rebooting into the dump-capture kernel.
+
+Similarly, on PPC64 machines, the first 32KB of physical memory is needed
+for booting, regardless of where the kernel is loaded. To support a 64K page
+size, kexec backs up the first 64KB of memory.
+
+For s390x, when kdump is triggered, the crashkernel region is exchanged
+with the region [0, crashkernel region size] and then the kdump kernel
+runs in [0, crashkernel region size]. Therefore no relocatable kernel is
+needed for s390x.
+
+All of the necessary information about the system kernel's core image is
+encoded in the ELF format, and stored in a reserved area of memory
+before a crash. The physical address of the start of the ELF header is
+passed to the dump-capture kernel through the elfcorehdr= boot
+parameter. Optionally the size of the ELF header can also be passed
+when using the elfcorehdr=[size[KMG]@]offset[KMG] syntax.
+
+
+With the dump-capture kernel, you can access the memory image through
+/proc/vmcore. This exports the dump as an ELF-format file that you can
+write out using file copy commands such as cp or scp. Further, you can
+use analysis tools such as the GNU Debugger (GDB) and the Crash tool to
+debug the dump file. This method ensures that the dump pages are correctly
+ordered.
+
+
+Setup and Installation
+======================
+
+Install kexec-tools
+-------------------
+
+1) Log in as the root user.
+
+2) Download the kexec-tools user-space package from the following URL:
+
+http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
+
+This is a symlink to the latest version.
+
+The latest kexec-tools git tree is available at:
+
+- git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
+- http://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
+
+There is also a gitweb interface available at
+http://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
+
+More information about kexec-tools can be found at
+http://horms.net/projects/kexec/
+
+3) Unpack the tarball with the tar command, as follows::
+
+	tar xvpzf kexec-tools.tar.gz
+
+4) Change to the kexec-tools directory, as follows::
+
+	cd kexec-tools-VERSION
+
+5) Configure the package, as follows::
+
+	./configure
+
+6) Compile the package, as follows::
+
+	make
+
+7) Install the package, as follows::
+
+	make install
+
+
+Build the system and dump-capture kernels
+-----------------------------------------
+There are two possible methods of using Kdump.
+
+1) Build a separate custom dump-capture kernel for capturing the
+   kernel core dump.
+
+2) Or use the system kernel binary itself as the dump-capture kernel, so
+   that there is no need to build a separate dump-capture kernel. This is
+   possible only on architectures that support a relocatable kernel. As
+   of today, the i386, x86_64, ppc64, ia64, arm and arm64 architectures
+   support a relocatable kernel.
+
+Building a relocatable kernel is advantageous in that one does not have
+to build a second kernel for capturing the dump. At the same time, one
+might still want to build a custom dump-capture kernel suited to one's
+needs.
+
+The following configuration settings are required for the system and
+dump-capture kernels to enable kdump support.
+
+System kernel config options
+----------------------------
+
+1) Enable "kexec system call" in "Processor type and features."::
+
+	CONFIG_KEXEC=y
+
+2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
+   filesystems." This is usually enabled by default::
+
+	CONFIG_SYSFS=y
+
+   Note that "sysfs file system support" might not appear in the "Pseudo
+   filesystems" menu if "Configure standard kernel features (for small
+   systems)" is not enabled in "General Setup." In this case, check the
+   .config file itself to ensure that sysfs is turned on, as follows::
+
+	grep 'CONFIG_SYSFS' .config
+
+3) Enable "Compile the kernel with debug info" in "Kernel hacking."::
+
+	CONFIG_DEBUG_INFO=y
+
+   This causes the kernel to be built with debug symbols. The dump
+   analysis tools require a vmlinux with debug symbols in order to read
+   and analyze a dump file.
+
+Dump-capture kernel config options (Arch Independent)
+-----------------------------------------------------
+
+1) Enable "kernel crash dumps" support under "Processor type and
+   features"::
+
+	CONFIG_CRASH_DUMP=y
+
+2) Enable "/proc/vmcore support" under "Filesystems" -> "Pseudo filesystems"::
+
+	CONFIG_PROC_VMCORE=y
+
+   (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)
+
+Dump-capture kernel config options (Arch Dependent, i386 and x86_64)
+--------------------------------------------------------------------
+
+1) On i386, enable high memory support under "Processor type and
+   features"::
+
+	CONFIG_HIGHMEM64G=y
+
+   or::
+
+	CONFIG_HIGHMEM4G=y
+
+2) On i386 and x86_64, disable symmetric multi-processing support
+   under "Processor type and features"::
+
+	CONFIG_SMP=n
+
+   (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
+   when loading the dump-capture kernel, see section "Load the Dump-capture
+   Kernel".)
+
+3) If one wants to build and use a relocatable kernel,
+   enable "Build a relocatable kernel" support under "Processor type and
+   features"::
+
+	CONFIG_RELOCATABLE=y
+
+4) Use a suitable value for "Physical address where the kernel is
+   loaded" (under "Processor type and features"). This only appears when
+   "kernel crash dumps" is enabled. A suitable value depends upon
+   whether the kernel is relocatable or not.
+
+   If you are using a relocatable kernel, use CONFIG_PHYSICAL_START=0x100000.
+   This will compile the kernel for physical address 1MB, but because the
+   kernel is relocatable, it can be run from any physical address; hence the
+   kexec boot loader will load it in the memory region reserved for the
+   dump-capture kernel.
+
+   Otherwise it should be the start of the memory region reserved for the
+   second kernel using the boot parameter "crashkernel=Y@X". Here X is the
+   start of the memory region reserved for the dump-capture kernel.
+   Generally X is 16MB (0x1000000), so you can set
+   CONFIG_PHYSICAL_START=0x1000000.
+
+5) Make and install the kernel and its modules. DO NOT add this kernel
+   to the boot loader configuration files.
+
+Dump-capture kernel config options (Arch Dependent, ppc64)
+----------------------------------------------------------
+
+1) Enable "Build a kdump crash kernel" support under "Kernel" options::
+
+	CONFIG_CRASH_DUMP=y
+
+2) Enable "Build a relocatable kernel" support::
+
+	CONFIG_RELOCATABLE=y
+
+   Make and install the kernel and its modules.
+
+Dump-capture kernel config options (Arch Dependent, ia64)
+----------------------------------------------------------
+
+- No specific options are required to create a dump-capture kernel
+  for ia64, other than those specified in the arch independent section
+  above. This means that it is possible to use the system kernel
+  as a dump-capture kernel if desired.
+
+  The crashkernel region can be automatically placed by the system
+  kernel at run time. This is done by specifying the base address as 0,
+  or omitting it altogether::
+
+	crashkernel=256M@0
+
+  or::
+
+	crashkernel=256M
+
+  If the start address is specified, note that the start address of the
+  kernel will be aligned to 64MB, so if the specified address is not
+  aligned, any space below the alignment point will be wasted.
+
+Dump-capture kernel config options (Arch Dependent, arm)
+----------------------------------------------------------
+
+-   To use a relocatable kernel,
+    enable "AUTO_ZRELADDR" support under "Boot" options::
+
+	AUTO_ZRELADDR=y
+
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that kvm of the dump-capture kernel will not be enabled
+  on non-VHE systems even if it is configured. This is because the CPU
+  will not be reset to EL2 on panic.
+
+Extended crashkernel syntax
+===========================
+
+While the "crashkernel=size[@offset]" syntax is sufficient for most
+configurations, sometimes it's handy to have the reserved memory depend
+on the amount of System RAM. This is mostly useful for distributors that
+pre-set the kernel command line, to avoid an unbootable system after some
+memory has been removed from the machine.
+
+The syntax is::
+
+    crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+    range=start-[end]
+
+For example::
+
+    crashkernel=512M-2G:64M,2G-:128M
+
+This would mean:
+
+    1) if the RAM is smaller than 512M, then don't reserve anything
+       (this is the "rescue" case)
+    2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+    3) if the RAM size is 2G or larger, then reserve 128M
+
+
+
+Boot into System Kernel
+=======================
+
+1) Update the boot loader (such as grub, yaboot, or lilo) configuration
+   files as necessary.
+
+2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
+   where Y specifies how much memory to reserve for the dump-capture kernel
+   and X specifies the beginning of this reserved memory. For example,
+   "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
+   starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
+
+   On x86 and x86_64, use "crashkernel=64M@16M".
+
+   On ppc64, use "crashkernel=128M@32M".
+
+   On ia64, 256M@256M is a generous value that typically works.
+   The region may be automatically placed on ia64, see the
+   dump-capture kernel config option notes above.
+   If sparse memory is used, the size should be rounded to GRANULE boundaries.
+
+   On s390x, typically use "crashkernel=xxM". The value of xx is dependent
+   on the memory consumption of the kdump system. In general this is not
+   dependent on the memory size of the production system.
+
+   On arm, the use of "crashkernel=Y@X" is no longer necessary; the
+   kernel will automatically locate the crash kernel image within the
+   first 512MB of RAM if X is not given.
+
+   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+
+Load the Dump-capture Kernel
+============================
+
+After booting to the system kernel, the dump-capture kernel needs to be
+loaded.
+
+Based on the architecture and type of image (relocatable or not), one
+can choose to load the uncompressed vmlinux or compressed bzImage/vmlinuz
+of the dump-capture kernel. The following is a summary.
+
+For i386 and x86_64:
+
+	- Use vmlinux if kernel is not relocatable.
+	- Use bzImage/vmlinuz if kernel is relocatable.
+
+For ppc64:
+
+	- Use vmlinux
+
+For ia64:
+
+	- Use vmlinux or vmlinuz.gz
+
+For s390x:
+
+	- Use image or bzImage
+
+For arm:
+
+	- Use zImage
+
+For arm64:
+
+	- Use vmlinux or Image
+
+If you are using an uncompressed vmlinux image, then use the following
+command to load the dump-capture kernel::
+
+   kexec -p <dump-capture-kernel-vmlinux-image> \
+   --initrd=<initrd-for-dump-capture-kernel> --args-linux \
+   --append="root=<root-dev> <arch-specific-options>"
+
+If you are using a compressed bzImage/vmlinuz, then use the following
+command to load the dump-capture kernel::
+
+   kexec -p <dump-capture-kernel-bzImage> \
+   --initrd=<initrd-for-dump-capture-kernel> \
+   --append="root=<root-dev> <arch-specific-options>"
+
+If you are using a compressed zImage, then use the following command
+to load the dump-capture kernel::
+
+   kexec --type zImage -p <dump-capture-kernel-zImage> \
+   --initrd=<initrd-for-dump-capture-kernel> \
+   --dtb=<dtb-for-dump-capture-kernel> \
+   --append="root=<root-dev> <arch-specific-options>"
+
+If you are using an uncompressed Image, then use the following command
+to load the dump-capture kernel::
+
+   kexec -p <dump-capture-kernel-Image> \
+   --initrd=<initrd-for-dump-capture-kernel> \
+   --append="root=<root-dev> <arch-specific-options>"
+
+Please note that --args-linux does not need to be specified for ia64.
+It is planned to make this a no-op on that architecture, but for now
+it should be omitted.
+
+The following are the arch-specific command line options to be used while
+loading the dump-capture kernel.
+
+For i386, x86_64 and ia64:
+
+	"1 irqpoll maxcpus=1 reset_devices"
+
+For ppc64:
+
+	"1 maxcpus=1 noirqdistrib reset_devices"
+
+For s390x:
+
+	"1 maxcpus=1 cgroup_disable=memory"
+
+For arm:
+
+	"1 maxcpus=1 reset_devices"
+
+For arm64:
+
+	"1 maxcpus=1 reset_devices"
+
+Notes on loading the dump-capture kernel:
+
+* By default, the ELF headers are stored in ELF64 format to support
+  systems with more than 4GB memory. On i386, kexec automatically checks if
+  the physical RAM size exceeds the 4 GB limit and if not, uses ELF32.
+  So, on non-PAE systems, ELF32 is always used.
+
+  The --elf32-core-headers option can be used to force the generation of ELF32
+  headers. This is necessary because GDB currently cannot open vmcore files
+  with ELF64 headers on 32-bit systems.
+
+* The "irqpoll" boot parameter reduces driver initialization failures
+  due to shared interrupts in the dump-capture kernel.
+
+* You must specify <root-dev> in the format corresponding to the root
+  device name in the output of the mount command.
+
+* Boot parameter "1" boots the dump-capture kernel into single-user
+  mode without networking. If you want networking, use "3".
+
+* We generally don't have to bring up an SMP kernel just to capture the
+  dump. Hence it is generally useful either to build a UP dump-capture
+  kernel or to specify the maxcpus=1 option while loading the dump-capture
+  kernel. Note that, although maxcpus always works, it is better to replace
+  it with nr_cpus to save memory if the current ARCH, such as x86, supports it.
+
+* You should enable multi-cpu support in the dump-capture kernel if you intend
+  to use multi-threaded programs with it, such as the parallel dump feature of
+  makedumpfile. Otherwise, the multi-threaded program may suffer a significant
+  performance degradation. To enable multi-cpu support, you should bring up an
+  SMP dump-capture kernel and specify the maxcpus/nr_cpus and
+  disable_cpu_apicid=[X] options while loading it.
+
+* For s390x there are two kdump modes: If an ELF header is specified with
+  the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
+  is done on all other architectures. If no elfcorehdr= kernel parameter is
+  specified, the s390x kdump kernel dynamically creates the header. The
+  second mode has the advantage that for CPU and memory hotplug, kdump does
+  not have to be reloaded with kexec_load().
+
+* For s390x systems with many attached devices the "cio_ignore" kernel
+  parameter should be used for the kdump kernel in order to prevent allocation
+  of kernel memory for devices that are not relevant for kdump. The same
+  applies to systems that use SCSI/FCP devices. In that case the
+  "allow_lun_scan" zfcp module parameter should be set to zero before
+  setting FCP devices online.
+
+Kernel Panic
+============
+
+After successfully loading the dump-capture kernel as previously
+described, the system will reboot into the dump-capture kernel if a
+system crash is triggered.  Trigger points are located in panic(),
+die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).
+
+The following conditions will execute a crash trigger point:
+
+If a hard lockup is detected and "NMI watchdog" is configured, the system
+will boot into the dump-capture kernel ( die_nmi() ).
+
+If die() is called, and it happens to be a thread with pid 0 or 1, or die()
+is called inside interrupt context or die() is called and panic_on_oops is set,
+the system will boot into the dump-capture kernel.
+
+On powerpc systems when a soft-reset is generated, die() is called by all cpus
+and the system will boot into the dump-capture kernel.
+
+For testing purposes, you can trigger a crash by using "ALT-SysRq-c",
+"echo c > /proc/sysrq-trigger" or write a module to force the panic.
+
+Write Out the Dump File
+=======================
+
+After the dump-capture kernel is booted, write out the dump file with
+the following command::
+
+   cp /proc/vmcore <dump-file>
+
+
+Analysis
+========
+
+Before analyzing the dump image, you should reboot into a stable kernel.
+
+You can do limited analysis using GDB on the dump file copied out of
+/proc/vmcore. Use the debug vmlinux built with -g and run the following
+command::
+
+   gdb vmlinux <dump-file>
+
+Stack trace for the task on processor 0, register display, and memory
+display work fine.
+
+Note: GDB cannot analyze core files generated in ELF64 format for x86.
+On systems with a maximum of 4GB of memory, you can generate
+ELF32-format headers by passing the --elf32-core-headers option to kexec
+when loading the dump kernel.
+
+You can also use the Crash utility to analyze dump files in Kdump
+format. Crash is available on Dave Anderson's site at the following URL:
+
+   http://people.redhat.com/~anderson/
+
+Trigger Kdump on WARN()
+=======================
+
+The kernel parameter panic_on_warn causes panic() to be called in all WARN()
+paths.  This will cause a kdump to occur at the panic() call.  To enable this
+at runtime instead, set /proc/sys/kernel/panic_on_warn to 1, which achieves
+the same behaviour.
+
+Contact
+=======
+
+- Vivek Goyal (vgoyal@redhat.com)
+- Maneesh Soni (maneesh@in.ibm.com)
+
+GDB macros
+==========
+
+.. include:: gdbmacros.txt
+   :literal:
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
deleted file mode 100644
index 3162eeb8c262..000000000000
--- a/Documentation/kdump/kdump.txt
+++ /dev/null
@@ -1,509 +0,0 @@
-================================================================
-Documentation for Kdump - The kexec-based Crash Dumping Solution
-================================================================
-
-This document includes overview, setup and installation, and analysis
-information.
-
-Overview
-========
-
-Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
-dump of the system kernel's memory needs to be taken (for example, when
-the system panics). The system kernel's memory image is preserved across
-the reboot and is accessible to the dump-capture kernel.
-
-You can use common commands, such as cp and scp, to copy the
-memory image to a dump file on the local disk, or across the network to
-a remote system.
-
-Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x, arm and arm64 architectures.
-
-When the system kernel boots, it reserves a small section of memory for
-the dump-capture kernel. This ensures that ongoing Direct Memory Access
-(DMA) from the system kernel does not corrupt the dump-capture kernel.
-The kexec -p command loads the dump-capture kernel into this reserved
-memory.
-
-On x86 machines, the first 640 KB of physical memory is needed to boot,
-regardless of where the kernel loads. Therefore, kexec backs up this
-region just before rebooting into the dump-capture kernel.
-
-Similarly on PPC64 machines first 32KB of physical memory is needed for
-booting regardless of where the kernel is loaded and to support 64K page
-size kexec backs up the first 64KB memory.
-
-For s390x, when kdump is triggered, the crashkernel region is exchanged
-with the region [0, crashkernel region size] and then the kdump kernel
-runs in [0, crashkernel region size]. Therefore no relocatable kernel is
-needed for s390x.
-
-All of the necessary information about the system kernel's core image is
-encoded in the ELF format, and stored in a reserved area of memory
-before a crash. The physical address of the start of the ELF header is
-passed to the dump-capture kernel through the elfcorehdr= boot
-parameter. Optionally the size of the ELF header can also be passed
-when using the elfcorehdr=[size[KMG]@]offset[KMG] syntax.
-
-
-With the dump-capture kernel, you can access the memory image through
-/proc/vmcore. This exports the dump as an ELF-format file that you can
-write out using file copy commands such as cp or scp. Further, you can
-use analysis tools such as the GNU Debugger (GDB) and the Crash tool to
-debug the dump file. This method ensures that the dump pages are correctly
-ordered.
-
-
-Setup and Installation
-======================
-
-Install kexec-tools
--------------------
-
-1) Login as the root user.
-
-2) Download the kexec-tools user-space package from the following URL:
-
-http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
-
-This is a symlink to the latest version.
-
-The latest kexec-tools git tree is available at:
-
-git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
-and
-http://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
-
-There is also a gitweb interface available at
-http://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
-
-More information about kexec-tools can be found at
-http://horms.net/projects/kexec/
-
-3) Unpack the tarball with the tar command, as follows:
-
-   tar xvpzf kexec-tools.tar.gz
-
-4) Change to the kexec-tools directory, as follows:
-
-   cd kexec-tools-VERSION
-
-5) Configure the package, as follows:
-
-   ./configure
-
-6) Compile the package, as follows:
-
-   make
-
-7) Install the package, as follows:
-
-   make install
-
-
-Build the system and dump-capture kernels
------------------------------------------
-There are two possible methods of using Kdump.
-
-1) Build a separate custom dump-capture kernel for capturing the
-   kernel core dump.
-
-2) Or use the system kernel binary itself as dump-capture kernel and there is
-   no need to build a separate dump-capture kernel. This is possible
-   only with the architectures which support a relocatable kernel. As
-   of today, i386, x86_64, ppc64, ia64, arm and arm64 architectures support
-   relocatable kernel.
-
-Building a relocatable kernel is advantageous from the point of view that
-one does not have to build a second kernel for capturing the dump. But
-at the same time one might want to build a custom dump capture kernel
-suitable to his needs.
-
-Following are the configuration setting required for system and
-dump-capture kernels for enabling kdump support.
-
-System kernel config options
-----------------------------
-
-1) Enable "kexec system call" in "Processor type and features."
-
-   CONFIG_KEXEC=y
-
-2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
-   filesystems." This is usually enabled by default.
-
-   CONFIG_SYSFS=y
-
-   Note that "sysfs file system support" might not appear in the "Pseudo
-   filesystems" menu if "Configure standard kernel features (for small
-   systems)" is not enabled in "General Setup." In this case, check the
-   .config file itself to ensure that sysfs is turned on, as follows:
-
-   grep 'CONFIG_SYSFS' .config
-
-3) Enable "Compile the kernel with debug info" in "Kernel hacking."
-
-   CONFIG_DEBUG_INFO=Y
-
-   This causes the kernel to be built with debug symbols. The dump
-   analysis tools require a vmlinux with debug symbols in order to read
-   and analyze a dump file.
-
-Dump-capture kernel config options (Arch Independent)
------------------------------------------------------
-
-1) Enable "kernel crash dumps" support under "Processor type and
-   features":
-
-   CONFIG_CRASH_DUMP=y
-
-2) Enable "/proc/vmcore support" under "Filesystems" -> "Pseudo filesystems".
-
-   CONFIG_PROC_VMCORE=y
-   (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)
-
-Dump-capture kernel config options (Arch Dependent, i386 and x86_64)
---------------------------------------------------------------------
-
-1) On i386, enable high memory support under "Processor type and
-   features":
-
-   CONFIG_HIGHMEM64G=y
-   or
-   CONFIG_HIGHMEM4G
-
-2) On i386 and x86_64, disable symmetric multi-processing support
-   under "Processor type and features":
-
-   CONFIG_SMP=n
-
-   (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
-   when loading the dump-capture kernel, see section "Load the Dump-capture
-   Kernel".)
-
-3) If one wants to build and use a relocatable kernel,
-   Enable "Build a relocatable kernel" support under "Processor type and
-   features"
-
-   CONFIG_RELOCATABLE=y
-
-4) Use a suitable value for "Physical address where the kernel is
-   loaded" (under "Processor type and features"). This only appears when
-   "kernel crash dumps" is enabled. A suitable value depends upon
-   whether kernel is relocatable or not.
-
-   If you are using a relocatable kernel use CONFIG_PHYSICAL_START=0x100000
-   This will compile the kernel for physical address 1MB, but given the fact
-   kernel is relocatable, it can be run from any physical address hence
-   kexec boot loader will load it in memory region reserved for dump-capture
-   kernel.
-
-   Otherwise it should be the start of memory region reserved for
-   second kernel using boot parameter "crashkernel=Y@X". Here X is
-   start of memory region reserved for dump-capture kernel.
-   Generally X is 16MB (0x1000000). So you can set
-   CONFIG_PHYSICAL_START=0x1000000
-
-5) Make and install the kernel and its modules. DO NOT add this kernel
-   to the boot loader configuration files.
-
-Dump-capture kernel config options (Arch Dependent, ppc64)
-----------------------------------------------------------
-
-1) Enable "Build a kdump crash kernel" support under "Kernel" options:
-
-   CONFIG_CRASH_DUMP=y
-
-2)   Enable "Build a relocatable kernel" support
-
-   CONFIG_RELOCATABLE=y
-
-   Make and install the kernel and its modules.
-
-Dump-capture kernel config options (Arch Dependent, ia64)
-----------------------------------------------------------
-
-- No specific options are required to create a dump-capture kernel
-  for ia64, other than those specified in the arch independent section
-  above. This means that it is possible to use the system kernel
-  as a dump-capture kernel if desired.
-
-  The crashkernel region can be automatically placed by the system
-  kernel at run time. This is done by specifying the base address as 0,
-  or omitting it all together.
-
-  crashkernel=256M@0
-  or
-  crashkernel=256M
-
-  If the start address is specified, note that the start address of the
-  kernel will be aligned to 64Mb, so if the start address is not then
-  any space below the alignment point will be wasted.
-
-Dump-capture kernel config options (Arch Dependent, arm)
-----------------------------------------------------------
-
--   To use a relocatable kernel,
-    Enable "AUTO_ZRELADDR" support under "Boot" options:
-
-    AUTO_ZRELADDR=y
-
-Dump-capture kernel config options (Arch Dependent, arm64)
-----------------------------------------------------------
-
-- Please note that kvm of the dump-capture kernel will not be enabled
-  on non-VHE systems even if it is configured. This is because the CPU
-  will not be reset to EL2 on panic.
-
-Extended crashkernel syntax
-===========================
-
-While the "crashkernel=size[@offset]" syntax is sufficient for most
-configurations, sometimes it's handy to have the reserved memory dependent
-on the value of System RAM -- that's mostly for distributors that pre-setup
-the kernel command line to avoid a unbootable system after some memory has
-been removed from the machine.
-
-The syntax is:
-
-    crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
-    range=start-[end]
-
-For example:
-
-    crashkernel=512M-2G:64M,2G-:128M
-
-This would mean:
-
-    1) if the RAM is smaller than 512M, then don't reserve anything
-       (this is the "rescue" case)
-    2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
-    3) if the RAM size is larger than 2G, then reserve 128M
-
-
-
-Boot into System Kernel
-=======================
-
-1) Update the boot loader (such as grub, yaboot, or lilo) configuration
-   files as necessary.
-
-2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
-   where Y specifies how much memory to reserve for the dump-capture kernel
-   and X specifies the beginning of this reserved memory. For example,
-   "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
-   starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
-
-   On x86 and x86_64, use "crashkernel=64M@16M".
-
-   On ppc64, use "crashkernel=128M@32M".
-
-   On ia64, 256M@256M is a generous value that typically works.
-   The region may be automatically placed on ia64, see the
-   dump-capture kernel config option notes above.
-   If use sparse memory, the size should be rounded to GRANULE boundaries.
-
-   On s390x, typically use "crashkernel=xxM". The value of xx is dependent
-   on the memory consumption of the kdump system. In general this is not
-   dependent on the memory size of the production system.
-
-   On arm, the use of "crashkernel=Y@X" is no longer necessary; the
-   kernel will automatically locate the crash kernel image within the
-   first 512MB of RAM if X is not given.
-
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
-   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
-
-Load the Dump-capture Kernel
-============================
-
-After booting to the system kernel, dump-capture kernel needs to be
-loaded.
-
-Based on the architecture and type of image (relocatable or not), one
-can choose to load the uncompressed vmlinux or compressed bzImage/vmlinuz
-of dump-capture kernel. Following is the summary.
-
-For i386 and x86_64:
-	- Use vmlinux if kernel is not relocatable.
-	- Use bzImage/vmlinuz if kernel is relocatable.
-For ppc64:
-	- Use vmlinux
-For ia64:
-	- Use vmlinux or vmlinuz.gz
-For s390x:
-	- Use image or bzImage
-For arm:
-	- Use zImage
-For arm64:
-	- Use vmlinux or Image
-
-If you are using an uncompressed vmlinux image then use following command
-to load dump-capture kernel.
-
-   kexec -p <dump-capture-kernel-vmlinux-image> \
-   --initrd=<initrd-for-dump-capture-kernel> --args-linux \
-   --append="root=<root-dev> <arch-specific-options>"
-
-If you are using a compressed bzImage/vmlinuz, then use following command
-to load dump-capture kernel.
-
-   kexec -p <dump-capture-kernel-bzImage> \
-   --initrd=<initrd-for-dump-capture-kernel> \
-   --append="root=<root-dev> <arch-specific-options>"
-
-If you are using a compressed zImage, then use following command
-to load dump-capture kernel.
-
-   kexec --type zImage -p <dump-capture-kernel-bzImage> \
-   --initrd=<initrd-for-dump-capture-kernel> \
-   --dtb=<dtb-for-dump-capture-kernel> \
-   --append="root=<root-dev> <arch-specific-options>"
-
-If you are using an uncompressed Image, then use following command
-to load dump-capture kernel.
-
-   kexec -p <dump-capture-kernel-Image> \
-   --initrd=<initrd-for-dump-capture-kernel> \
-   --append="root=<root-dev> <arch-specific-options>"
-
-Please note, that --args-linux does not need to be specified for ia64.
-It is planned to make this a no-op on that architecture, but for now
-it should be omitted
-
-Following are the arch specific command line options to be used while
-loading dump-capture kernel.
-
-For i386, x86_64 and ia64:
-	"1 irqpoll maxcpus=1 reset_devices"
-
-For ppc64:
-	"1 maxcpus=1 noirqdistrib reset_devices"
-
-For s390x:
-	"1 maxcpus=1 cgroup_disable=memory"
-
-For arm:
-	"1 maxcpus=1 reset_devices"
-
-For arm64:
-	"1 maxcpus=1 reset_devices"
-
-Notes on loading the dump-capture kernel:
-
-* By default, the ELF headers are stored in ELF64 format to support
-  systems with more than 4GB memory. On i386, kexec automatically checks if
-  the physical RAM size exceeds the 4 GB limit and if not, uses ELF32.
-  So, on non-PAE systems, ELF32 is always used.
-
-  The --elf32-core-headers option can be used to force the generation of ELF32
-  headers. This is necessary because GDB currently cannot open vmcore files
-  with ELF64 headers on 32-bit systems.
-
-* The "irqpoll" boot parameter reduces driver initialization failures
-  due to shared interrupts in the dump-capture kernel.
-
-* You must specify <root-dev> in the format corresponding to the root
-  device name in the output of mount command.
-
-* Boot parameter "1" boots the dump-capture kernel into single-user
-  mode without networking. If you want networking, use "3".
-
-* We generally don't have to bring up a SMP kernel just to capture the
-  dump. Hence generally it is useful either to build a UP dump-capture
-  kernel or specify maxcpus=1 option while loading dump-capture kernel.
-  Note, though maxcpus always works, you had better replace it with
-  nr_cpus to save memory if supported by the current ARCH, such as x86.
-
-* You should enable multi-cpu support in dump-capture kernel if you intend
-  to use multi-thread programs with it, such as parallel dump feature of
-  makedumpfile. Otherwise, the multi-thread program may have a great
-  performance degradation. To enable multi-cpu support, you should bring up an
-  SMP dump-capture kernel and specify maxcpus/nr_cpus, disable_cpu_apicid=[X]
-  options while loading it.
-
-* For s390x there are two kdump modes: If a ELF header is specified with
-  the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
-  is done on all other architectures. If no elfcorehdr= kernel parameter is
-  specified, the s390x kdump kernel dynamically creates the header. The
-  second mode has the advantage that for CPU and memory hotplug, kdump has
-  not to be reloaded with kexec_load().
-
-* For s390x systems with many attached devices the "cio_ignore" kernel
-  parameter should be used for the kdump kernel in order to prevent allocation
-  of kernel memory for devices that are not relevant for kdump. The same
-  applies to systems that use SCSI/FCP devices. In that case the
-  "allow_lun_scan" zfcp module parameter should be set to zero before
-  setting FCP devices online.
-
-Kernel Panic
-============
-
-After successfully loading the dump-capture kernel as previously
-described, the system will reboot into the dump-capture kernel if a
-system crash is triggered.  Trigger points are located in panic(),
-die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).
-
-The following conditions will execute a crash trigger point:
-
-If a hard lockup is detected and "NMI watchdog" is configured, the system
-will boot into the dump-capture kernel ( die_nmi() ).
-
-If die() is called, and it happens to be a thread with pid 0 or 1, or die()
-is called inside interrupt context or die() is called and panic_on_oops is set,
-the system will boot into the dump-capture kernel.
-
-On powerpc systems when a soft-reset is generated, die() is called by all cpus
-and the system will boot into the dump-capture kernel.
-
-For testing purposes, you can trigger a crash by using "ALT-SysRq-c",
-"echo c > /proc/sysrq-trigger" or write a module to force the panic.
-
-Write Out the Dump File
-=======================
-
-After the dump-capture kernel is booted, write out the dump file with
-the following command:
-
-   cp /proc/vmcore <dump-file>
-
-
-Analysis
-========
-
-Before analyzing the dump image, you should reboot into a stable kernel.
-
-You can do limited analysis using GDB on the dump file copied out of
-/proc/vmcore. Use the debug vmlinux built with -g and run the following
-command:
-
-   gdb vmlinux <dump-file>
-
-Stack trace for the task on processor 0, register display, and memory
-display work fine.
-
-Note: GDB cannot analyze core files generated in ELF64 format for x86.
-On systems with a maximum of 4GB of memory, you can generate
-ELF32-format headers using the --elf32-core-headers kernel option on the
-dump kernel.
-
-You can also use the Crash utility to analyze dump files in Kdump
-format. Crash is available on Dave Anderson's site at the following URL:
-
-   http://people.redhat.com/~anderson/
-
-Trigger Kdump on WARN()
-=======================
-
-The kernel parameter, panic_on_warn, calls panic() in all WARN() paths.  This
-will cause a kdump to occur at the panic() call.  In cases where a user wants
-to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1
-to achieve the same behaviour.
-
-Contact
-=======
-
-Vivek Goyal (vgoyal@redhat.com)
-Maneesh Soni (maneesh@in.ibm.com)
-
diff --git a/Documentation/kdump/vmcoreinfo.rst b/Documentation/kdump/vmcoreinfo.rst
new file mode 100644
index 000000000000..007a6b86e0ee
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.rst
@@ -0,0 +1,488 @@
+==========
+VMCOREINFO
+==========
+
+What is it?
+===========
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+Common variables
+================
+
+init_uts_ns.name.release
+------------------------
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built. For example, crash uses it to
+find the corresponding vmlinux in order to process vmcore.
+
+PAGE_SIZE
+---------
+
+The size of a page. It is the smallest unit of data used by the memory
+management facilities. It is usually 4096 bytes in size and a page is
+aligned on 4096 bytes. Used for computing page addresses.
+
+init_uts_ns
+-----------
+
+The UTS namespace which is used to isolate two specific elements of the
+system that relate to the uname(2) system call. It is named after the
+data structure used to store information returned by the uname(2) system
+call.
+
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
+
+node_online_map
+---------------
+
+An array node_states[N_ONLINE] which represents the set of online nodes
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
+
+swapper_pg_dir
+--------------
+
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
+
+_stext
+------
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
+
+vmap_area_list
+--------------
+
+Stores the virtual area list. makedumpfile gets the vmalloc start value
+from this variable and its value is necessary for vmalloc translation.
+
+mem_map
+-------
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
+
+Used to map an address to the corresponding struct page.
+
+contig_page_data
+----------------
+
+Makedumpfile gets the pglist_data structure from this symbol, which is
+used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+--------------------------------------------------------------------------
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
+
+It exists in the sparse memory mapping model and is somewhat similar to
+the mem_map variable; both of them are used to translate an
+address.
+
+page
+----
+
+The size of a page structure. struct page is an important data structure
+and it is widely used to compute contiguous memory.
+
+pglist_data
+-----------
+
+The size of a pglist_data structure. This value is used to check if the
+pglist_data structure is valid. It is also used for checking the memory
+type.
+
+zone
+----
+
+The size of a zone structure. This value is used to check if the zone
+structure has been found. It is also used for excluding free pages.
+
+free_area
+---------
+
+The size of a free_area structure. It indicates whether the free_area
+structure is valid or not. Useful when excluding free pages.
+
+list_head
+---------
+
+The size of a list_head structure. Used when iterating lists in a
+post-mortem analysis session.
+
+nodemask_t
+----------
+
+The size of a nodemask_t type. Used to compute the number of online
+nodes.
+
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head)
+-------------------------------------------------------------------------------------------------
+
+User-space tools compute their values based on the offset of these
+variables. The variables are used when excluding unnecessary pages.
+
+(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_spanned_pages|node_id)
+-----------------------------------------------------------------------------------------
+
+On NUMA machines, each NUMA node has a pg_data_t to describe its memory
+layout. On UMA machines there is a single pglist_data which describes the
+whole memory.
+
+These values are used to check the memory type and to compute the
+virtual address for memory map.
+
+(zone, free_area|vm_stat|spanned_pages)
+---------------------------------------
+
+Each node is divided into a number of blocks called zones which
+represent ranges within memory. A zone is described by a structure zone.
+
+User-space tools compute required values based on the offset of these
+variables.
+
+(free_area, free_list)
+----------------------
+
+Offset of the free_list's member. This value is used to compute the number
+of free pages.
+
+Each zone has a free_area structure array called free_area[MAX_ORDER].
+The free_list represents a linked list of free page blocks.
+
+(list_head, next|prev)
+----------------------
+
+Offsets of the list_head's members. list_head is used to define a
+circular linked list. User-space tools need these in order to traverse
+lists.
+
+(vmap_area, va_start|list)
+--------------------------
+
+Offsets of the vmap_area's members. They carry vmalloc-specific
+information. Makedumpfile gets the start address of the vmalloc region
+from this.
+
+(zone.free_area, MAX_ORDER)
+---------------------------
+
+Free areas descriptor. User-space tools use this value to iterate the
+free_area ranges. MAX_ORDER is used by the zone buddy allocator.
+
+log_first_idx
+-------------
+
+Index of the first record stored in the buffer log_buf. Used by
+user-space tools to read the strings in the log_buf.
+
+log_buf
+-------
+
+Console output is written to the ring buffer log_buf at index
+log_first_idx. Used to get the kernel log.
+
+log_buf_len
+-----------
+
+log_buf's length.
+
+clear_idx
+---------
+
+The index of the next printk() record to read after the last clear
+command. It indicates the first record after the last
+SYSLOG_ACTION_CLEAR, such as one issued by 'dmesg -c'. Used by user-space
+tools to dump the dmesg log.
+
+log_next_idx
+------------
+
+The index of the next record to store in the buffer log_buf. Used to
+compute the index of the current buffer position.
+
+printk_log
+----------
+
+The size of a structure printk_log. Used to compute the size of
+messages, and extract dmesg log. It encapsulates header information for
+log_buf, such as timestamp, syslog level, etc.
+
+(printk_log, ts_nsec|len|text_len|dict_len)
+-------------------------------------------
+
+It represents field offsets in struct printk_log. User-space tools
+parse it and check whether the values of printk_log's members have been
+changed.
+
+(free_area.free_list, MIGRATE_TYPES)
+------------------------------------
+
+The number of migrate types for pages. The free_list is described by the
+array. Used by tools to compute the number of free pages.
+
+NR_FREE_PAGES
+-------------
+
+On linux-2.6.21 or later, the number of free pages is in
+vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
+
+PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoison|PG_head_mask
+-----------------------------------------------------------------------------
+
+Page attributes. These flags are used to filter out various pages that are
+unnecessary for dumping.
+
+PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
+-----------------------------------------------------------------------------
+
+More page attributes. These flags are used to filter out various pages that
+are unnecessary for dumping.
+
+
+HUGETLB_PAGE_DTOR
+-----------------
+
+The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
+excludes these pages.
+
+x86_64
+======
+
+phys_base
+---------
+
+Used to convert the virtual address of an exported kernel symbol to its
+corresponding physical address.
+
+init_top_pgt
+------------
+
+Used to walk through the whole page table and convert virtual addresses
+to physical addresses. The init_top_pgt is somewhat similar to
+swapper_pg_dir, but it is only used in x86_64.
+
+pgtable_l5_enabled
+------------------
+
+User-space tools need to know whether the crash kernel was in 5-level
+paging mode.
+
+node_data
+---------
+
+This is a struct pglist_data array that stores information about all NUMA
+nodes. Makedumpfile gets the pglist_data structure from it.
+
+(node_data, MAX_NUMNODES)
+-------------------------
+
+The maximum number of nodes in the system.
+
+KERNELOFFSET
+------------
+
+The kernel randomization offset. Used to compute the page offset. If
+KASLR is disabled, this value is zero.
+
+KERNEL_IMAGE_SIZE
+-----------------
+
+Currently unused by Makedumpfile. Used to compute the module virtual
+address by Crash.
+
+sme_mask
+--------
+
+AMD-specific with SME support: it indicates the secure memory encryption
+mask. Makedumpfile tools need to know whether the crash kernel was
+encrypted. If SME is enabled in the first kernel, the crash kernel's
+page table entries (pgd/pud/pmd/pte) contain the memory encryption
+mask. This is used to remove the SME mask and obtain the true physical
+address.
+
+Currently, sme_mask stores the value of the C-bit position. If needed,
+additional SME-relevant info can be placed in that variable.
+
+For example::
+
+  [ misc	        ][ enc bit  ][ other misc SME info       ]
+  0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000
+  63   59   55   51   47   43   39   35   31   27   ... 3
+
+x86_32
+======
+
+X86_PAE
+-------
+
+Denotes whether physical address extensions are enabled. It has the cost
+of a higher page table lookup overhead, and also consumes more page
+table space per process. Used to check whether PAE was enabled in the
+crash kernel when converting virtual addresses to physical addresses.
+
+ia64
+====
+
+pgdat_list|(pgdat_list, MAX_NUMNODES)
+-------------------------------------
+
+A pg_data_t array storing information about all NUMA nodes. MAX_NUMNODES
+indicates the number of nodes.
+
+node_memblk|(node_memblk, NR_NODE_MEMBLKS)
+------------------------------------------
+
+List of node memory chunks. Filled when parsing the SRAT table to obtain
+information about memory nodes. NR_NODE_MEMBLKS indicates the number of
+node memory chunks.
+
+These values are used to compute the number of nodes the crashed kernel used.
+
+node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
+----------------------------------------------------------------
+
+The size of a struct node_memblk_s and the offsets of the
+node_memblk_s's members. Used to compute the number of nodes.
+
+PGTABLE_3|PGTABLE_4
+-------------------
+
+User-space tools need to know whether the crash kernel was in 3-level or
+4-level paging mode. Used to distinguish the page table.
+
+ARM64
+=====
+
+VA_BITS
+-------
+
+The maximum number of bits for virtual addresses. Used to compute the
+virtual memory ranges.
+
+kimage_voffset
+--------------
+
+The offset between the kernel virtual and physical mappings. Used to
+translate virtual to physical addresses.
+
+PHYS_OFFSET
+-----------
+
+Indicates the physical address of the start of memory. Like
+kimage_voffset, it is used to translate virtual to physical
+addresses.
+
+KERNELOFFSET
+------------
+
+The kernel randomization offset. Used to compute the page offset. If
+KASLR is disabled, this value is zero.
+
+arm
+===
+
+ARM_LPAE
+--------
+
+It indicates whether the crash kernel supports large physical address
+extensions. Used to translate virtual to physical addresses.
+
+s390
+====
+
+lowcore_ptr
+-----------
+
+An array with a pointer to the lowcore of every CPU. Used to print the
+PSW and all register information.
+
+high_memory
+-----------
+
+Used to get the vmalloc_start address from the high_memory symbol.
+
+(lowcore_ptr, NR_CPUS)
+----------------------
+
+The maximum number of CPUs.
+
+powerpc
+=======
+
+
+node_data|(node_data, MAX_NUMNODES)
+-----------------------------------
+
+See above.
+
+contig_page_data
+----------------
+
+See above.
+
+vmemmap_list
+------------
+
+The vmemmap_list maintains the entire vmemmap physical mapping. Used
+to get vmemmap list count and populated vmemmap regions info. If the
+vmemmap address translation information is stored in the crash kernel,
+it is used to translate vmemmap kernel virtual addresses.
+
+mmu_vmemmap_psize
+-----------------
+
+The size of a page. Used to translate virtual to physical addresses.
+
+mmu_psize_defs
+--------------
+
+Page size definitions, e.g. 4k, 64k, or 16M.
+
+Used to make vtop translations.
+
+vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)|(vmemmap_backing, virt_addr)
+--------------------------------------------------------------------------------------------
+
+The vmemmap virtual address space management does not have a traditional
+page table to track which virtual struct pages are backed by a physical
+mapping. The virtual to physical mappings are tracked in a simple linked
+list format.
+
+User-space tools need to know the offset of list, phys and virt_addr
+when computing the count of vmemmap regions.
+
+mmu_psize_def|(mmu_psize_def, shift)
+------------------------------------
+
+The size of a struct mmu_psize_def and the offset of mmu_psize_def's
+member.
+
+Used in vtop translations.
+
+sh
+==
+
+node_data|(node_data, MAX_NUMNODES)
+-----------------------------------
+
+See above.
+
+X2TLB
+-----
+
+Indicates whether the crashed kernel enabled SH extended mode.
diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
deleted file mode 100644
index bb94a4bd597a..000000000000
--- a/Documentation/kdump/vmcoreinfo.txt
+++ /dev/null
@@ -1,495 +0,0 @@
-================================================================
-			VMCOREINFO
-================================================================
-
-===========
-What is it?
-===========
-
-VMCOREINFO is a special ELF note section. It contains various
-information from the kernel like structure size, page size, symbol
-values, field offsets, etc. These data are packed into an ELF note
-section and used by user-space tools like crash and makedumpfile to
-analyze a kernel's memory layout.
-
-================
-Common variables
-================
-
-init_uts_ns.name.release
-------------------------
-
-The version of the Linux kernel. Used to find the corresponding source
-code from which the kernel has been built. For example, crash uses it to
-find the corresponding vmlinux in order to process vmcore.
-
-PAGE_SIZE
----------
-
-The size of a page. It is the smallest unit of data used by the memory
-management facilities. It is usually 4096 bytes of size and a page is
-aligned on 4096 bytes. Used for computing page addresses.
-
-init_uts_ns
------------
-
-The UTS namespace which is used to isolate two specific elements of the
-system that relate to the uname(2) system call. It is named after the
-data structure used to store information returned by the uname(2) system
-call.
-
-User-space tools can get the kernel name, host name, kernel release
-number, kernel version, architecture name and OS type from it.
-
-node_online_map
----------------
-
-An array node_states[N_ONLINE] which represents the set of online nodes
-in a system, one bit position per node number. Used to keep track of
-which nodes are in the system and online.
-
-swapper_pg_dir
--------------
-
-The global page directory pointer of the kernel. Used to translate
-virtual to physical addresses.
-
-_stext
-------
-
-Defines the beginning of the text section. In general, _stext indicates
-the kernel start address. Used to convert a virtual address from the
-direct kernel map to a physical address.
-
-vmap_area_list
---------------
-
-Stores the virtual area list. makedumpfile gets the vmalloc start value
-from this variable and its value is necessary for vmalloc translation.
-
-mem_map
--------
-
-Physical addresses are translated to struct pages by treating them as
-an index into the mem_map array. Right-shifting a physical address
-PAGE_SHIFT bits converts it into a page frame number which is an index
-into that mem_map array.
-
-Used to map an address to the corresponding struct page.
-
-contig_page_data
-----------------
-
-Makedumpfile gets the pglist_data structure from this symbol, which is
-used to describe the memory layout.
-
-User-space tools use this to exclude free pages when dumping memory.
-
-mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
---------------------------------------------------------------------------
-
-The address of the mem_section array, its length, structure size, and
-the section_mem_map offset.
-
-It exists in the sparse memory mapping model, and it is also somewhat
-similar to the mem_map variable, both of them are used to translate an
-address.
-
-page
-----
-
-The size of a page structure. struct page is an important data structure
-and it is widely used to compute contiguous memory.
-
-pglist_data
------------
-
-The size of a pglist_data structure. This value is used to check if the
-pglist_data structure is valid. It is also used for checking the memory
-type.
-
-zone
-----
-
-The size of a zone structure. This value is used to check if the zone
-structure has been found. It is also used for excluding free pages.
-
-free_area
----------
-
-The size of a free_area structure. It indicates whether the free_area
-structure is valid or not. Useful when excluding free pages.
-
-list_head
----------
-
-The size of a list_head structure. Used when iterating lists in a
-post-mortem analysis session.
-
-nodemask_t
-----------
-
-The size of a nodemask_t type. Used to compute the number of online
-nodes.
-
-(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
-       compound_order|compound_head)
--------------------------------------------------------------------
-
-User-space tools compute their values based on the offset of these
-variables. The variables are used when excluding unnecessary pages.
-
-(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_
-              spanned_pages|node_id)
--------------------------------------------------------------------
-
-On NUMA machines, each NUMA node has a pg_data_t to describe its memory
-layout. On UMA machines there is a single pglist_data which describes the
-whole memory.
-
-These values are used to check the memory type and to compute the
-virtual address for memory map.
-
-(zone, free_area|vm_stat|spanned_pages)
----------------------------------------
-
-Each node is divided into a number of blocks called zones which
-represent ranges within memory. A zone is described by a structure zone.
-
-User-space tools compute required values based on the offset of these
-variables.
-
-(free_area, free_list)
-----------------------
-
-Offset of the free_list's member. This value is used to compute the number
-of free pages.
-
-Each zone has a free_area structure array called free_area[MAX_ORDER].
-The free_list represents a linked list of free page blocks.
-
-(list_head, next|prev)
-----------------------
-
-Offsets of the list_head's members. list_head is used to define a
-circular linked list. User-space tools need these in order to traverse
-lists.
-
-(vmap_area, va_start|list)
---------------------------
-
-Offsets of the vmap_area's members. They carry vmalloc-specific
-information. Makedumpfile gets the start address of the vmalloc region
-from this.
-
-(zone.free_area, MAX_ORDER)
----------------------------
-
-Free areas descriptor. User-space tools use this value to iterate the
-free_area ranges. MAX_ORDER is used by the zone buddy allocator.
-
-log_first_idx
--------------
-
-Index of the first record stored in the buffer log_buf. Used by
-user-space tools to read the strings in the log_buf.
-
-log_buf
--------
-
-Console output is written to the ring buffer log_buf at index
-log_first_idx. Used to get the kernel log.
-
-log_buf_len
------------
-
-log_buf's length.
-
-clear_idx
----------
-
-The index that the next printk() record to read after the last clear
-command. It indicates the first record after the last SYSLOG_ACTION
-_CLEAR, like issued by 'dmesg -c'. Used by user-space tools to dump
-the dmesg log.
-
-log_next_idx
-------------
-
-The index of the next record to store in the buffer log_buf. Used to
-compute the index of the current buffer position.
-
-printk_log
-----------
-
-The size of a structure printk_log. Used to compute the size of
-messages, and extract dmesg log. It encapsulates header information for
-log_buf, such as timestamp, syslog level, etc.
-
-(printk_log, ts_nsec|len|text_len|dict_len)
--------------------------------------------
-
-It represents field offsets in struct printk_log. User space tools
-parse it and check whether the values of printk_log's members have been
-changed.
-
-(free_area.free_list, MIGRATE_TYPES)
-------------------------------------
-
-The number of migrate types for pages. The free_list is described by the
-array. Used by tools to compute the number of free pages.
-
-NR_FREE_PAGES
--------------
-
-On linux-2.6.21 or later, the number of free pages is in
-vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
-
-PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision
-|PG_head_mask|PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)
-|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
------------------------------------------------------------------
-
-Page attributes. These flags are used to filter various unnecessary for
-dumping pages.
-
-HUGETLB_PAGE_DTOR
------------------
-
-The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
-excludes these pages.
-
-======
-x86_64
-======
-
-phys_base
----------
-
-Used to convert the virtual address of an exported kernel symbol to its
-corresponding physical address.
-
-init_top_pgt
-------------
-
-Used to walk through the whole page table and convert virtual addresses
-to physical addresses. The init_top_pgt is somewhat similar to
-swapper_pg_dir, but it is only used in x86_64.
-
-pgtable_l5_enabled
-------------------
-
-User-space tools need to know whether the crash kernel was in 5-level
-paging mode.
-
-node_data
----------
-
-This is a struct pglist_data array and stores all NUMA nodes
-information. Makedumpfile gets the pglist_data structure from it.
-
-(node_data, MAX_NUMNODES)
--------------------------
-
-The maximum number of nodes in system.
-
-KERNELOFFSET
-------------
-
-The kernel randomization offset. Used to compute the page offset. If
-KASLR is disabled, this value is zero.
-
-KERNEL_IMAGE_SIZE
------------------
-
-Currently unused by Makedumpfile. Used to compute the module virtual
-address by Crash.
-
-sme_mask
---------
-
-AMD-specific with SME support: it indicates the secure memory encryption
-mask. Makedumpfile tools need to know whether the crash kernel was
-encrypted. If SME is enabled in the first kernel, the crash kernel's
-page table entries (pgd/pud/pmd/pte) contain the memory encryption
-mask. This is used to remove the SME mask and obtain the true physical
-address.
-
-Currently, sme_mask stores the value of the C-bit position. If needed,
-additional SME-relevant info can be placed in that variable.
-
-For example:
-[ misc	        ][ enc bit  ][ other misc SME info       ]
-0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000
-63   59   55   51   47   43   39   35   31   27   ... 3
-
-======
-x86_32
-======
-
-X86_PAE
--------
-
-Denotes whether physical address extensions are enabled. It has the cost
-of a higher page table lookup overhead, and also consumes more page
-table space per process. Used to check whether PAE was enabled in the
-crash kernel when converting virtual addresses to physical addresses.
-
-====
-ia64
-====
-
-pgdat_list|(pgdat_list, MAX_NUMNODES)
--------------------------------------
-
-pg_data_t array storing all NUMA nodes information. MAX_NUMNODES
-indicates the number of the nodes.
-
-node_memblk|(node_memblk, NR_NODE_MEMBLKS)
-------------------------------------------
-
-List of node memory chunks. Filled when parsing the SRAT table to obtain
-information about memory nodes. NR_NODE_MEMBLKS indicates the number of
-node memory chunks.
-
-These values are used to compute the number of nodes the crashed kernel used.
-
-node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
-----------------------------------------------------------------
-
-The size of a struct node_memblk_s and the offsets of the
-node_memblk_s's members. Used to compute the number of nodes.
-
-PGTABLE_3|PGTABLE_4
--------------------
-
-User-space tools need to know whether the crash kernel was in 3-level or
-4-level paging mode. Used to distinguish the page table.
-
-=====
-ARM64
-=====
-
-VA_BITS
--------
-
-The maximum number of bits for virtual addresses. Used to compute the
-virtual memory ranges.
-
-kimage_voffset
---------------
-
-The offset between the kernel virtual and physical mappings. Used to
-translate virtual to physical addresses.
-
-PHYS_OFFSET
------------
-
-Indicates the physical address of the start of memory. Similar to
-kimage_voffset, which is used to translate virtual to physical
-addresses.
-
-KERNELOFFSET
-------------
-
-The kernel randomization offset. Used to compute the page offset. If
-KASLR is disabled, this value is zero.
-
-====
-arm
-====
-
-ARM_LPAE
---------
-
-It indicates whether the crash kernel supports large physical address
-extensions. Used to translate virtual to physical addresses.
-
-====
-s390
-====
-
-lowcore_ptr
-----------
-
-An array with a pointer to the lowcore of every CPU. Used to print the
-psw and all registers information.
-
-high_memory
------------
-
-Used to get the vmalloc_start address from the high_memory symbol.
-
-(lowcore_ptr, NR_CPUS)
-----------------------
-
-The maximum number of CPUs.
-
-=======
-powerpc
-=======
-
-
-node_data|(node_data, MAX_NUMNODES)
------------------------------------
-
-See above.
-
-contig_page_data
-----------------
-
-See above.
-
-vmemmap_list
-------------
-
-The vmemmap_list maintains the entire vmemmap physical mapping. Used
-to get vmemmap list count and populated vmemmap regions info. If the
-vmemmap address translation information is stored in the crash kernel,
-it is used to translate vmemmap kernel virtual addresses.
-
-mmu_vmemmap_psize
------------------
-
-The size of a page. Used to translate virtual to physical addresses.
-
-mmu_psize_defs
---------------
-
-Page size definitions, i.e. 4k, 64k, or 16M.
-
-Used to make vtop translations.
-
-vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)|
-(vmemmap_backing, virt_addr)
-----------------------------------------------------------------
-
-The vmemmap virtual address space management does not have a traditional
-page table to track which virtual struct pages are backed by a physical
-mapping. The virtual to physical mappings are tracked in a simple linked
-list format.
-
-User-space tools need to know the offset of list, phys and virt_addr
-when computing the count of vmemmap regions.
-
-mmu_psize_def|(mmu_psize_def, shift)
-------------------------------------
-
-The size of a struct mmu_psize_def and the offset of mmu_psize_def's
-member.
-
-Used in vtop translations.
-
-==
-sh
-==
-
-node_data|(node_data, MAX_NUMNODES)
------------------------------------
-
-See above.
-
-X2TLB
------
-
-Indicates whether the crashed kernel enabled SH extended mode.
diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index 18c5feef2577..0c41d6d463f3 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -59,7 +59,7 @@ as follows:
          the default calculated size. Use this option if default
          boot memory size is not sufficient for second kernel to
          boot successfully. For syntax of crashkernel= parameter,
-         refer to Documentation/kdump/kdump.txt. If any offset is
+         refer to Documentation/kdump/kdump.rst. If any offset is
          provided in crashkernel= parameter, it will be ignored
          as fadump uses a predefined offset to reserve memory
          for boot memory dump preservation in case of a crash.
diff --git a/Documentation/translations/zh_CN/oops-tracing.txt b/Documentation/translations/zh_CN/oops-tracing.txt
index 93fa061cf9e4..368ddd05b304 100644
--- a/Documentation/translations/zh_CN/oops-tracing.txt
+++ b/Documentation/translations/zh_CN/oops-tracing.txt
@@ -53,7 +53,7 @@ cat /proc/kmsg > file， 然而你必须介入中止传输， kmsg是一个“
 （2）用串口终端启动（请参看Documentation/admin-guide/serial-console.rst），运行一个null
 modem到另一台机器并用你喜欢的通讯工具获取输出。Minicom工作地很好。
 
-（3）使用Kdump（请参看Documentation/kdump/kdump.txt），
+（3）使用Kdump（请参看Documentation/kdump/kdump.rst），
 使用在Documentation/kdump/gdbmacros.txt中定义的dmesg gdb宏，从旧的内存中提取内核
 环形缓冲区。
 
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
index 55df692c5595..aaa9e4b4bdcd 100644
--- a/Documentation/watchdog/hpwdt.txt
+++ b/Documentation/watchdog/hpwdt.txt
@@ -51,7 +51,7 @@ Last reviewed: 08/20/2018
  and loop forever.  This is generally not what a watchdog user wants.
 
  For those wishing to learn more please see:
-	Documentation/kdump/kdump.txt
+	Documentation/kdump/kdump.rst
 	Documentation/admin-guide/kernel-parameters.txt (panic=)
 	Your Linux Distribution specific documentation.
 
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0f220264cc23..249d788f3124 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2010,7 +2010,7 @@ config CRASH_DUMP
 	  kdump/kexec. The crash dump kernel must be compiled to a
 	  memory address not used by the main kernel
 
-	  For more details see Documentation/kdump/kdump.txt
+	  For more details see Documentation/kdump/kdump.rst
 
 config AUTO_ZRELADDR
 	bool "Auto calculation of the decompressed kernel image address"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 697ea0510729..27568506e1eb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -994,7 +994,7 @@ config CRASH_DUMP
 	  reserved region and then later executed after a crash by
 	  kdump/kexec.
 
-	  For more details see Documentation/kdump/kdump.txt
+	  For more details see Documentation/kdump/kdump.rst
 
 config XEN_DOM0
 	def_bool y
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index b77f512bb176..ce1a28654507 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -623,7 +623,7 @@ config CRASH_DUMP
 	  to a memory address not used by the main kernel using
 	  PHYSICAL_START.
 
-	  For more details see Documentation/kdump/kdump.txt
+	  For more details see Documentation/kdump/kdump.rst
 
 config KEXEC_JUMP
 	bool "kexec jump (EXPERIMENTAL)"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9f1f7b47621c..8fbd685dd984 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2037,7 +2037,7 @@ config CRASH_DUMP
 	  to a memory address not used by the main kernel or BIOS using
 	  PHYSICAL_START, or it must be built as a relocatable image
 	  (CONFIG_RELOCATABLE=y).
-	  For more details see Documentation/kdump/kdump.txt
+	  For more details see Documentation/kdump/kdump.rst
 
 config KEXEC_JUMP
 	bool "kexec jump"
@@ -2074,7 +2074,7 @@ config PHYSICAL_START
 	  the reserved region.  In other words, it can be set based on
 	  the "X" value as specified in the "crashkernel=YM@XM"
 	  command line boot parameter passed to the panic-ed
-	  kernel. Please take a look at Documentation/kdump/kdump.txt
+	  kernel. Please take a look at Documentation/kdump/kdump.rst
 	  for more details about crash dumps.
 
 	  Usage of bzImage for capturing the crash dump is recommended as
-- 
cgit v1.2.3-59-g8ed1b


From 09bbf055c3329008522b4a9814afe412c202daa7 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:51 -0300
Subject: docs: mic: convert docs to ReST and rename to *.rst

Convert Intel Many Integrated Core architecture docs to ReST.

The conversion is trivial: just add title and literal block
markups, and adjust some indentation.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/mic/index.rst         |  18 ++++++
 Documentation/mic/mic_overview.rst  |  85 ++++++++++++++++++++++++++++
 Documentation/mic/mic_overview.txt  |  81 ---------------------------
 Documentation/mic/scif_overview.rst | 108 ++++++++++++++++++++++++++++++++++++
 Documentation/mic/scif_overview.txt |  98 --------------------------------
 5 files changed, 211 insertions(+), 179 deletions(-)
 create mode 100644 Documentation/mic/index.rst
 create mode 100644 Documentation/mic/mic_overview.rst
 delete mode 100644 Documentation/mic/mic_overview.txt
 create mode 100644 Documentation/mic/scif_overview.rst
 delete mode 100644 Documentation/mic/scif_overview.txt

diff --git a/Documentation/mic/index.rst b/Documentation/mic/index.rst
new file mode 100644
index 000000000000..082fa8f6a260
--- /dev/null
+++ b/Documentation/mic/index.rst
@@ -0,0 +1,18 @@
+:orphan:
+
+=============================================
+Intel Many Integrated Core (MIC) architecture
+=============================================
+
+.. toctree::
+    :maxdepth: 1
+
+    mic_overview
+    scif_overview
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/mic/mic_overview.rst b/Documentation/mic/mic_overview.rst
new file mode 100644
index 000000000000..17d956bdaf7c
--- /dev/null
+++ b/Documentation/mic/mic_overview.rst
@@ -0,0 +1,85 @@
+======================================================
+Intel Many Integrated Core (MIC) architecture overview
+======================================================
+
+An Intel MIC X100 device is a PCIe form factor add-in coprocessor
+card based on the Intel Many Integrated Core (MIC) architecture
+that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
+implements the three required standard address spaces i.e. configuration,
+memory and I/O. The host OS loads a device driver as is typical for
+PCIe devices. The card itself runs a bootstrap after reset that
+transfers control to the card OS downloaded from the host driver. The
+host driver supports OSPM suspend and resume operations. It shuts down
+the card during suspend and reboots the card OS during resume.
+The card OS as shipped by Intel is a Linux kernel with modifications
+for the X100 devices.
+
+Since it is a PCIe card, it does not have the ability to host hardware
+devices for networking, storage and console. We provide these devices
+on X100 coprocessors thus enabling a self-bootable equivalent
+environment for applications. A key benefit of our solution is that it
+leverages the standard virtio framework for network, disk and console
+devices, though in our case the virtio framework is used across a PCIe
+bus. A Virtio Over PCIe (VOP) driver allows creating user space
+backends or devices on the host which are used to probe virtio drivers
+for these devices on the MIC card. The existing VRINGH infrastructure
+in the kernel is used to access virtio rings from the host. The card
+VOP driver allows card virtio drivers to communicate with their user
+space backends on the host via a device page. Ring 3 apps on the host
+can add, remove and configure virtio devices. A thin MIC specific
+virtio_config_ops is implemented which is borrowed heavily from
+previous similar implementations in lguest and s390.
+
+The MIC PCIe card has a DMA controller with 8 channels. These channels are
+shared between the host s/w and the card s/w: channels 0 to 3 are used by
+the host and 4 to 7 by the card. As the DMA device doesn't show up as a PCIe
+device, a virtual bus called mic bus is created and virtual DMA devices are
+created on it by the host/card drivers. On the host the channels are private
+and used only by the host driver to transfer data for the virtio devices.
+
+The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a
+low level communications API across PCIe currently implemented for MIC.
+More details are available at scif_overview.rst.
+
+The Coprocessor State Management (COSM) driver on the host allows for
+boot, shutdown and reset of Intel MIC devices. It communicates with a COSM
+"client" driver on the MIC cards over SCIF to perform these functions.
+
+Here is a block diagram of the various components described above. The
+virtio backends are situated on the host rather than the card given better
+single threaded performance for the host compared to MIC, the ability of
+the host to initiate DMAs to/from the card using the MIC DMA engine and
+the fact that the virtio block storage backend can only be on the host::
+
+               +----------+           |             +----------+
+               | Card OS  |           |             | Host OS  |
+               +----------+           |             +----------+
+                                      |
+        +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
+        | Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
+        | Net   | |Console | |Block | | |Net      |  |Console | |Block   |
+        | Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
+        +---+---+ +---+----+ +--+---+ | +---------+  +----+---+ +--------+
+            |         |         |     |      |            |         |
+            |         |         |     |User  |            |         |
+            |         |         |     |------|------------|--+------|-------
+            +---------+---------+     |Kernel                |
+                      |               |                      |
+  +---------+     +---+----+ +------+ | +------+ +------+ +--+---+  +-------+
+  |MIC DMA  |     |  VOP   | | SCIF | | | SCIF | | COSM | | VOP  |  |MIC DMA|
+  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
+      |               |         |     |    |        |                    |
+  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
+  |MIC      |     |  VOP   | |SCIF  | | |SCIF  | | COSM | | VOP  |  | MIC   |
+  |HW Bus   |     |  HW Bus| |HW Bus| | |HW Bus| | Bus  | |HW Bus|  |HW Bus |
+  +---------+     +--------+ +--+---+ | +--+---+ +------+ +------+  +-------+
+      |               |         |     |       |     |                    |
+      |   +-----------+--+      |     |       |    +---------------+     |
+      |   |Intel MIC     |      |     |       |    |Intel MIC      |     |
+      |   |Card Driver   |      |     |       |    |Host Driver    |     |
+      +---+--------------+------+     |       +----+---------------+-----+
+                 |                    |                   |
+             +-------------------------------------------------------------+
+             |                                                             |
+             |                    PCIe Bus                                 |
+             +-------------------------------------------------------------+
diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt
deleted file mode 100644
index 074adbdf83a4..000000000000
--- a/Documentation/mic/mic_overview.txt
+++ /dev/null
@@ -1,81 +0,0 @@
-An Intel MIC X100 device is a PCIe form factor add-in coprocessor
-card based on the Intel Many Integrated Core (MIC) architecture
-that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
-implements the three required standard address spaces i.e. configuration,
-memory and I/O. The host OS loads a device driver as is typical for
-PCIe devices. The card itself runs a bootstrap after reset that
-transfers control to the card OS downloaded from the host driver. The
-host driver supports OSPM suspend and resume operations. It shuts down
-the card during suspend and reboots the card OS during resume.
-The card OS as shipped by Intel is a Linux kernel with modifications
-for the X100 devices.
-
-Since it is a PCIe card, it does not have the ability to host hardware
-devices for networking, storage and console. We provide these devices
-on X100 coprocessors thus enabling a self-bootable equivalent
-environment for applications. A key benefit of our solution is that it
-leverages the standard virtio framework for network, disk and console
-devices, though in our case the virtio framework is used across a PCIe
-bus. A Virtio Over PCIe (VOP) driver allows creating user space
-backends or devices on the host which are used to probe virtio drivers
-for these devices on the MIC card. The existing VRINGH infrastructure
-in the kernel is used to access virtio rings from the host. The card
-VOP driver allows card virtio drivers to communicate with their user
-space backends on the host via a device page. Ring 3 apps on the host
-can add, remove and configure virtio devices. A thin MIC specific
-virtio_config_ops is implemented which is borrowed heavily from
-previous similar implementations in lguest and s390.
-
-MIC PCIe card has a dma controller with 8 channels. These channels are
-shared between the host s/w and the card s/w. 0 to 3 are used by host
-and 4 to 7 by card. As the dma device doesn't show up as PCIe device,
-a virtual bus called mic bus is created and virtual dma devices are
-created on it by the host/card drivers. On host the channels are private
-and used only by the host driver to transfer data for the virtio devices.
-
-The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a
-low level communications API across PCIe currently implemented for MIC.
-More details are available at scif_overview.txt.
-
-The Coprocessor State Management (COSM) driver on the host allows for
-boot, shutdown and reset of Intel MIC devices. It communicates with a COSM
-"client" driver on the MIC cards over SCIF to perform these functions.
-
-Here is a block diagram of the various components described above. The
-virtio backends are situated on the host rather than the card given better
-single threaded performance for the host compared to MIC, the ability of
-the host to initiate DMA's to/from the card using the MIC DMA engine and
-the fact that the virtio block storage backend can only be on the host.
-
-               +----------+           |             +----------+
-               | Card OS  |           |             | Host OS  |
-               +----------+           |             +----------+
-                                      |
-        +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
-        | Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
-        | Net   | |Console | |Block | | |Net      |  |Console | |Block   |
-        | Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
-        +---+---+ +---+----+ +--+---+ | +---------+  +----+---+ +--------+
-            |         |         |     |      |            |         |
-            |         |         |     |User  |            |         |
-            |         |         |     |------|------------|--+------|-------
-            +---------+---------+     |Kernel                |
-                      |               |                      |
-  +---------+     +---+----+ +------+ | +------+ +------+ +--+---+  +-------+
-  |MIC DMA  |     |  VOP   | | SCIF | | | SCIF | | COSM | | VOP  |  |MIC DMA|
-  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
-      |               |         |     |    |        |                    |
-  +---+-----+     +---+----+ +--+---+ | +--+---+ +--+---+ +------+  +----+--+
-  |MIC      |     |  VOP   | |SCIF  | | |SCIF  | | COSM | | VOP  |  | MIC   |
-  |HW Bus   |     |  HW Bus| |HW Bus| | |HW Bus| | Bus  | |HW Bus|  |HW Bus |
-  +---------+     +--------+ +--+---+ | +--+---+ +------+ +------+  +-------+
-      |               |         |     |       |     |                    |
-      |   +-----------+--+      |     |       |    +---------------+     |
-      |   |Intel MIC     |      |     |       |    |Intel MIC      |     |
-      |   |Card Driver   |      |     |       |    |Host Driver    |     |
-      +---+--------------+------+     |       +----+---------------+-----+
-                 |                    |                   |
-             +-------------------------------------------------------------+
-             |                                                             |
-             |                    PCIe Bus                                 |
-             +-------------------------------------------------------------+
diff --git a/Documentation/mic/scif_overview.rst b/Documentation/mic/scif_overview.rst
new file mode 100644
index 000000000000..4c8ad9e43706
--- /dev/null
+++ b/Documentation/mic/scif_overview.rst
@@ -0,0 +1,108 @@
+========================================
+Symmetric Communication Interface (SCIF)
+========================================
+
+The Symmetric Communication Interface (SCIF, pronounced "skiff") is a low
+level communications API across PCIe currently implemented for MIC. Currently
+SCIF provides inter-node communication within a single host platform, where a
+node is a MIC Coprocessor or Xeon based host. SCIF abstracts the details of
+communicating over the PCIe bus while providing an API that is symmetric
+across all the nodes in the PCIe network. An important design objective for SCIF
+is to deliver the maximum possible performance given the communication
+abilities of the hardware. SCIF has been used to implement an offload compiler
+runtime and OFED support for MPI implementations for MIC coprocessors.
+
+SCIF API Components
+===================
+
+The SCIF API has the following parts:
+
+1. Connection establishment using a client server model
+2. Byte stream messaging intended for short messages
+3. Node enumeration to determine online nodes
+4. Poll semantics for detection of incoming connections and messages
+5. Memory registration to pin down pages
+6. Remote memory mapping for low latency CPU accesses via mmap
+7. Remote DMA (RDMA) for high bandwidth DMA transfers
+8. Fence APIs for RDMA synchronization
+
+SCIF exposes the notion of a connection which can be used by peer processes on
+nodes in a SCIF PCIe "network" to share memory "windows" and to communicate. A
+process in a SCIF node initiates a SCIF connection to a peer process on a
+different node via a SCIF "endpoint". SCIF endpoints support messaging APIs
+which are similar to connection oriented socket APIs. Connected SCIF endpoints
+can also register local memory which is followed by data transfer using either
+DMA, CPU copies or remote memory mapping via mmap. SCIF supports both user and
+kernel mode clients which are functionally equivalent.
+
+SCIF Performance for MIC
+========================
+
+A DMA bandwidth comparison between the TCP (over Ethernet over PCIe) stack and
+SCIF shows the performance advantages of SCIF for HPC applications and
+runtimes::
+
+             Comparison of TCP and SCIF based BW
+
+  Throughput (GB/sec)
+    8 +                                             PCIe Bandwidth ******
+      +                                                        TCP ######
+    7 +    **************************************             SCIF %%%%%%
+      |                       %%%%%%%%%%%%%%%%%%%
+    6 +                   %%%%
+      |                 %%
+      |               %%%
+    5 +              %%
+      |            %%
+    4 +           %%
+      |          %%
+    3 +         %%
+      |        %
+    2 +      %%
+      |     %%
+      |    %
+    1 +
+      +    ######################################
+    0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+-
+      1       10     100      1000   10000   100000
+                   Transfer Size (KBytes)
+
+SCIF allows memory sharing via mmap(..) between processes on different PCIe
+nodes and thus provides bare-metal PCIe latency. The round trip SCIF mmap
+latency from the host to an x100 MIC for an 8 byte message is 0.44 usecs.
+
+SCIF has a user space library which is a thin IOCTL wrapper providing a user
+space API similar to the kernel API in scif.h. The SCIF user space library
+is distributed at https://software.intel.com/en-us/mic-developer
+
+Here is some pseudo code for an example of how two applications on two PCIe
+nodes would typically use the SCIF API::
+
+  Process A (on node A)			Process B (on node B)
+
+  /* get online node information */
+  scif_get_node_ids(..)			scif_get_node_ids(..)
+  scif_open(..)				scif_open(..)
+  scif_bind(..)				scif_bind(..)
+  scif_listen(..)
+  scif_accept(..)				scif_connect(..)
+  /* SCIF connection established */
+
+  /* Send and receive short messages */
+  scif_send(..)/scif_recv(..)		scif_send(..)/scif_recv(..)
+
+  /* Register memory */
+  scif_register(..)			scif_register(..)
+
+  /* RDMA */
+  scif_readfrom(..)/scif_writeto(..)	scif_readfrom(..)/scif_writeto(..)
+
+  /* Fence DMAs */
+  scif_fence_signal(..)			scif_fence_signal(..)
+
+  mmap(..)				mmap(..)
+
+  /* Access remote registered memory */
+
+  /* Close the endpoints */
+  scif_close(..)				scif_close(..)
diff --git a/Documentation/mic/scif_overview.txt b/Documentation/mic/scif_overview.txt
deleted file mode 100644
index 0a280d986731..000000000000
--- a/Documentation/mic/scif_overview.txt
+++ /dev/null
@@ -1,98 +0,0 @@
-The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a low
-level communications API across PCIe currently implemented for MIC. Currently
-SCIF provides inter-node communication within a single host platform, where a
-node is a MIC Coprocessor or Xeon based host. SCIF abstracts the details of
-communicating over the PCIe bus while providing an API that is symmetric
-across all the nodes in the PCIe network. An important design objective for SCIF
-is to deliver the maximum possible performance given the communication
-abilities of the hardware. SCIF has been used to implement an offload compiler
-runtime and OFED support for MPI implementations for MIC coprocessors.
-
-==== SCIF API Components ====
-The SCIF API has the following parts:
-1. Connection establishment using a client server model
-2. Byte stream messaging intended for short messages
-3. Node enumeration to determine online nodes
-4. Poll semantics for detection of incoming connections and messages
-5. Memory registration to pin down pages
-6. Remote memory mapping for low latency CPU accesses via mmap
-7. Remote DMA (RDMA) for high bandwidth DMA transfers
-8. Fence APIs for RDMA synchronization
-
-SCIF exposes the notion of a connection which can be used by peer processes on
-nodes in a SCIF PCIe "network" to share memory "windows" and to communicate. A
-process in a SCIF node initiates a SCIF connection to a peer process on a
-different node via a SCIF "endpoint". SCIF endpoints support messaging APIs
-which are similar to connection oriented socket APIs. Connected SCIF endpoints
-can also register local memory which is followed by data transfer using either
-DMA, CPU copies or remote memory mapping via mmap. SCIF supports both user and
-kernel mode clients which are functionally equivalent.
-
-==== SCIF Performance for MIC ====
-DMA bandwidth comparison between the TCP (over ethernet over PCIe) stack versus
-SCIF shows the performance advantages of SCIF for HPC applications and runtimes.
-
-             Comparison of TCP and SCIF based BW
-
-  Throughput (GB/sec)
-    8 +                                             PCIe Bandwidth ******
-      +                                                        TCP ######
-    7 +    **************************************             SCIF %%%%%%
-      |                       %%%%%%%%%%%%%%%%%%%
-    6 +                   %%%%
-      |                 %%
-      |               %%%
-    5 +              %%
-      |            %%
-    4 +           %%
-      |          %%
-    3 +         %%
-      |        %
-    2 +      %%
-      |     %%
-      |    %
-    1 +
-      +    ######################################
-    0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+-
-      1       10     100      1000   10000   100000
-                   Transfer Size (KBytes)
-
-SCIF allows memory sharing via mmap(..) between processes on different PCIe
-nodes and thus provides bare-metal PCIe latency. The round trip SCIF mmap
-latency from the host to an x100 MIC for an 8 byte message is 0.44 usecs.
-
-SCIF has a user space library which is a thin IOCTL wrapper providing a user
-space API similar to the kernel API in scif.h. The SCIF user space library
-is distributed @ https://software.intel.com/en-us/mic-developer
-
-Here is some pseudo code for an example of how two applications on two PCIe
-nodes would typically use the SCIF API:
-
-Process A (on node A)			Process B (on node B)
-
-/* get online node information */
-scif_get_node_ids(..)			scif_get_node_ids(..)
-scif_open(..)				scif_open(..)
-scif_bind(..)				scif_bind(..)
-scif_listen(..)
-scif_accept(..)				scif_connect(..)
-/* SCIF connection established */
-
-/* Send and receive short messages */
-scif_send(..)/scif_recv(..)		scif_send(..)/scif_recv(..)
-
-/* Register memory */
-scif_register(..)			scif_register(..)
-
-/* RDMA */
-scif_readfrom(..)/scif_writeto(..)	scif_readfrom(..)/scif_writeto(..)
-
-/* Fence DMAs */
-scif_fence_signal(..)			scif_fence_signal(..)
-
-mmap(..)				mmap(..)
-
-/* Access remote registered memory */
-
-/* Close the endpoints */
-scif_close(..)				scif_close(..)
-- 
cgit v1.2.3-59-g8ed1b


From 593733ab80ac2c607acc1fc3fbaba5031d38253a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:52 -0300
Subject: docs: netlabel: convert docs to ReST and rename to *.rst

Convert netlabel documentation to ReST.

This was trivial: just add proper title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/netlabel/cipso_ipv4.rst    | 56 ++++++++++++++++++++++++++++++++
 Documentation/netlabel/cipso_ipv4.txt    | 49 ----------------------------
 Documentation/netlabel/draft_ietf.rst    |  5 +++
 Documentation/netlabel/index.rst         | 21 ++++++++++++
 Documentation/netlabel/introduction.rst  | 52 +++++++++++++++++++++++++++++
 Documentation/netlabel/introduction.txt  | 46 --------------------------
 Documentation/netlabel/lsm_interface.rst | 53 ++++++++++++++++++++++++++++++
 Documentation/netlabel/lsm_interface.txt | 47 ---------------------------
 8 files changed, 187 insertions(+), 142 deletions(-)
 create mode 100644 Documentation/netlabel/cipso_ipv4.rst
 delete mode 100644 Documentation/netlabel/cipso_ipv4.txt
 create mode 100644 Documentation/netlabel/draft_ietf.rst
 create mode 100644 Documentation/netlabel/index.rst
 create mode 100644 Documentation/netlabel/introduction.rst
 delete mode 100644 Documentation/netlabel/introduction.txt
 create mode 100644 Documentation/netlabel/lsm_interface.rst
 delete mode 100644 Documentation/netlabel/lsm_interface.txt

diff --git a/Documentation/netlabel/cipso_ipv4.rst b/Documentation/netlabel/cipso_ipv4.rst
new file mode 100644
index 000000000000..cbd3f3231221
--- /dev/null
+++ b/Documentation/netlabel/cipso_ipv4.rst
@@ -0,0 +1,56 @@
+===================================
+NetLabel CIPSO/IPv4 Protocol Engine
+===================================
+
+Paul Moore, paul.moore@hp.com
+
+May 17, 2006
+
+Overview
+========
+
+The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial
+IP Security Option (CIPSO) draft from July 16, 1992.  A copy of this
+draft can be found in this directory
+(draft-ietf-cipso-ipsecurity-01.txt).  While the IETF draft never made
+it to an RFC standard, it has become a de facto standard for labeled
+networking and is used in many trusted operating systems.
+
+Outbound Packet Processing
+==========================
+
+The CIPSO/IPv4 protocol engine applies the CIPSO IP option to packets by
+adding the CIPSO label to the socket.  This causes all packets leaving the
+system through the socket to have the CIPSO IP option applied.  The socket's
+CIPSO label can be changed at any point in time; however, it is recommended
+that it be set upon the socket's creation.  The LSM can set the socket's CIPSO
+label by using the NetLabel security module API; if the NetLabel "domain" is
+configured to use CIPSO for packet labeling then a CIPSO IP option will be
+generated and attached to the socket.
+
+Inbound Packet Processing
+=========================
+
+The CIPSO/IPv4 protocol engine validates every CIPSO IP option it finds at the
+IP layer without any special handling required by the LSM.  However, in order
+to decode and translate the CIPSO label on the packet the LSM must use the
+NetLabel security module API to extract the security attributes of the packet.
+This is typically done at the socket layer using the 'socket_sock_rcv_skb()'
+LSM hook.
+
+Label Translation
+=================
+
+The CIPSO/IPv4 protocol engine contains a mechanism to translate CIPSO security
+attributes such as sensitivity level and category to values which are
+appropriate for the host.  These mappings are defined as part of a CIPSO
+Domain Of Interpretation (DOI) definition and are configured through the
+NetLabel user space communication layer.  Each DOI definition can have a
+different security attribute mapping table.
+
+Label Translation Cache
+=======================
+
+The NetLabel system provides a framework for caching security attribute
+mappings from the network labels to the corresponding LSM identifiers.  The
+CIPSO/IPv4 protocol engine supports this caching mechanism.
diff --git a/Documentation/netlabel/cipso_ipv4.txt b/Documentation/netlabel/cipso_ipv4.txt
deleted file mode 100644
index a6075481fd60..000000000000
--- a/Documentation/netlabel/cipso_ipv4.txt
+++ /dev/null
@@ -1,49 +0,0 @@
-NetLabel CIPSO/IPv4 Protocol Engine
-==============================================================================
-Paul Moore, paul.moore@hp.com
-
-May 17, 2006
-
- * Overview
-
-The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial
-IP Security Option (CIPSO) draft from July 16, 1992.  A copy of this
-draft can be found in this directory
-(draft-ietf-cipso-ipsecurity-01.txt).  While the IETF draft never made
-it to an RFC standard it has become a de-facto standard for labeled
-networking and is used in many trusted operating systems.
-
- * Outbound Packet Processing
-
-The CIPSO/IPv4 protocol engine applies the CIPSO IP option to packets by
-adding the CIPSO label to the socket.  This causes all packets leaving the
-system through the socket to have the CIPSO IP option applied.  The socket's
-CIPSO label can be changed at any point in time, however, it is recommended
-that it is set upon the socket's creation.  The LSM can set the socket's CIPSO
-label by using the NetLabel security module API; if the NetLabel "domain" is
-configured to use CIPSO for packet labeling then a CIPSO IP option will be
-generated and attached to the socket.
-
- * Inbound Packet Processing
-
-The CIPSO/IPv4 protocol engine validates every CIPSO IP option it finds at the
-IP layer without any special handling required by the LSM.  However, in order
-to decode and translate the CIPSO label on the packet the LSM must use the
-NetLabel security module API to extract the security attributes of the packet.
-This is typically done at the socket layer using the 'socket_sock_rcv_skb()'
-LSM hook.
-
- * Label Translation
-
-The CIPSO/IPv4 protocol engine contains a mechanism to translate CIPSO security
-attributes such as sensitivity level and category to values which are
-appropriate for the host.  These mappings are defined as part of a CIPSO
-Domain Of Interpretation (DOI) definition and are configured through the
-NetLabel user space communication layer.  Each DOI definition can have a
-different security attribute mapping table.
-
- * Label Translation Cache
-
-The NetLabel system provides a framework for caching security attribute
-mappings from the network labels to the corresponding LSM identifiers.  The
-CIPSO/IPv4 protocol engine supports this caching mechanism.
diff --git a/Documentation/netlabel/draft_ietf.rst b/Documentation/netlabel/draft_ietf.rst
new file mode 100644
index 000000000000..5ed39ab8234b
--- /dev/null
+++ b/Documentation/netlabel/draft_ietf.rst
@@ -0,0 +1,5 @@
+Draft IETF CIPSO IP Security
+----------------------------
+
+ .. include:: draft-ietf-cipso-ipsecurity-01.txt
+    :literal:
diff --git a/Documentation/netlabel/index.rst b/Documentation/netlabel/index.rst
new file mode 100644
index 000000000000..47f1e0e5acd1
--- /dev/null
+++ b/Documentation/netlabel/index.rst
@@ -0,0 +1,21 @@
+:orphan:
+
+========
+NetLabel
+========
+
+.. toctree::
+    :maxdepth: 1
+
+    introduction
+    cipso_ipv4
+    lsm_interface
+
+    draft_ietf
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/netlabel/introduction.rst b/Documentation/netlabel/introduction.rst
new file mode 100644
index 000000000000..9333bbb0adc1
--- /dev/null
+++ b/Documentation/netlabel/introduction.rst
@@ -0,0 +1,52 @@
+=====================
+NetLabel Introduction
+=====================
+
+Paul Moore, paul.moore@hp.com
+
+August 2, 2006
+
+Overview
+========
+
+NetLabel is a mechanism which can be used by kernel security modules to attach
+security attributes to outgoing network packets generated from user space
+applications and read security attributes from incoming network packets.  It
+is composed of three main components: the protocol engines, the communication
+layer, and the kernel security module API.
+
+Protocol Engines
+================
+
+The protocol engines are responsible for both applying and retrieving the
+network packet's security attributes.  If any translation between the network
+security attributes and those on the host is required then the protocol
+engine will handle those tasks as well.  Other kernel subsystems should
+refrain from calling the protocol engines directly; instead they should use
+the NetLabel kernel security module API described below.
+
+Detailed information about each NetLabel protocol engine can be found in this
+directory.
+
+Communication Layer
+===================
+
+The communication layer exists to allow NetLabel configuration and monitoring
+from user space.  The NetLabel communication layer uses a message based
+protocol built on top of the Generic NETLINK transport mechanism.  The exact
+formatting of these NetLabel messages as well as the Generic NETLINK family
+names can be found in the 'net/netlabel/' directory as comments in the
+header files as well as in 'include/net/netlabel.h'.
+
+Security Module API
+===================
+
+The purpose of the NetLabel security module API is to provide a protocol
+independent interface to the underlying NetLabel protocol engines.  In addition
+to protocol independence, the security module API is designed to be completely
+LSM independent which should allow multiple LSMs to leverage the same code
+base.
+
+Detailed information about the NetLabel security module API can be found in the
+'include/net/netlabel.h' header file as well as the 'lsm_interface.rst' file
+found in this directory.
diff --git a/Documentation/netlabel/introduction.txt b/Documentation/netlabel/introduction.txt
deleted file mode 100644
index 3caf77bcff0f..000000000000
--- a/Documentation/netlabel/introduction.txt
+++ /dev/null
@@ -1,46 +0,0 @@
-NetLabel Introduction
-==============================================================================
-Paul Moore, paul.moore@hp.com
-
-August 2, 2006
-
- * Overview
-
-NetLabel is a mechanism which can be used by kernel security modules to attach
-security attributes to outgoing network packets generated from user space
-applications and read security attributes from incoming network packets.  It
-is composed of three main components, the protocol engines, the communication
-layer, and the kernel security module API.
-
- * Protocol Engines
-
-The protocol engines are responsible for both applying and retrieving the
-network packet's security attributes.  If any translation between the network
-security attributes and those on the host are required then the protocol
-engine will handle those tasks as well.  Other kernel subsystems should
-refrain from calling the protocol engines directly, instead they should use
-the NetLabel kernel security module API described below.
-
-Detailed information about each NetLabel protocol engine can be found in this
-directory.
-
- * Communication Layer
-
-The communication layer exists to allow NetLabel configuration and monitoring
-from user space.  The NetLabel communication layer uses a message based
-protocol built on top of the Generic NETLINK transport mechanism.  The exact
-formatting of these NetLabel messages as well as the Generic NETLINK family
-names can be found in the 'net/netlabel/' directory as comments in the
-header files as well as in 'include/net/netlabel.h'.
-
- * Security Module API
-
-The purpose of the NetLabel security module API is to provide a protocol
-independent interface to the underlying NetLabel protocol engines.  In addition
-to protocol independence, the security module API is designed to be completely
-LSM independent which should allow multiple LSMs to leverage the same code
-base.
-
-Detailed information about the NetLabel security module API can be found in the
-'include/net/netlabel.h' header file as well as the 'lsm_interface.txt' file
-found in this directory.
diff --git a/Documentation/netlabel/lsm_interface.rst b/Documentation/netlabel/lsm_interface.rst
new file mode 100644
index 000000000000..026fc267f798
--- /dev/null
+++ b/Documentation/netlabel/lsm_interface.rst
@@ -0,0 +1,53 @@
+========================================
+NetLabel Linux Security Module Interface
+========================================
+
+Paul Moore, paul.moore@hp.com
+
+May 17, 2006
+
+Overview
+========
+
+NetLabel is a mechanism which can set and retrieve security attributes from
+network packets.  It is intended to be used by LSM developers who want to make
+use of a common code base for several different packet labeling protocols.
+The NetLabel security module API is defined in 'include/net/netlabel.h' but a
+brief overview is given below.
+
+NetLabel Security Attributes
+============================
+
+Since NetLabel supports multiple different packet labeling protocols and LSMs
+it uses the concept of security attributes to refer to the packet's security
+labels.  The NetLabel security attributes are defined by the
+'netlbl_lsm_secattr' structure in the NetLabel header file.  Internally the
+NetLabel subsystem converts the security attributes to and from the correct
+low-level packet label depending on the NetLabel build time and run time
+configuration.  It is up to the LSM developer to translate the NetLabel
+security attributes into whatever security identifiers are in use for their
+particular LSM.
+
+NetLabel LSM Protocol Operations
+================================
+
+These are the functions which allow the LSM developer to manipulate the labels
+on outgoing packets as well as read the labels on incoming packets.  Functions
+exist to operate both on sockets and on the sk_buffs directly.  These high
+level functions are translated into low level protocol operations based on how
+the administrator has configured the NetLabel subsystem.
+
+NetLabel Label Mapping Cache Operations
+=======================================
+
+Depending on the exact configuration, translation between the network packet
+label and the internal LSM security identifier can be time consuming.  The
+NetLabel label mapping cache is a caching mechanism which can be used to
+sidestep much of this overhead once a mapping has been established.  Once the
+LSM has received a packet, used NetLabel to decode its security attributes,
+and translated the security attributes into an LSM internal identifier the LSM
+can use the NetLabel caching functions to associate the LSM internal
+identifier with the network packet's label.  This means that in the future
+when an incoming packet matches a cached value not only are the internal
+NetLabel translation mechanisms bypassed but the LSM translation mechanisms are
+bypassed as well which should result in a significant reduction in overhead.
diff --git a/Documentation/netlabel/lsm_interface.txt b/Documentation/netlabel/lsm_interface.txt
deleted file mode 100644
index 638c74f7de7f..000000000000
--- a/Documentation/netlabel/lsm_interface.txt
+++ /dev/null
@@ -1,47 +0,0 @@
-NetLabel Linux Security Module Interface
-==============================================================================
-Paul Moore, paul.moore@hp.com
-
-May 17, 2006
-
- * Overview
-
-NetLabel is a mechanism which can set and retrieve security attributes from
-network packets.  It is intended to be used by LSM developers who want to make
-use of a common code base for several different packet labeling protocols.
-The NetLabel security module API is defined in 'include/net/netlabel.h' but a
-brief overview is given below.
-
- * NetLabel Security Attributes
-
-Since NetLabel supports multiple different packet labeling protocols and LSMs
-it uses the concept of security attributes to refer to the packet's security
-labels.  The NetLabel security attributes are defined by the
-'netlbl_lsm_secattr' structure in the NetLabel header file.  Internally the
-NetLabel subsystem converts the security attributes to and from the correct
-low-level packet label depending on the NetLabel build time and run time
-configuration.  It is up to the LSM developer to translate the NetLabel
-security attributes into whatever security identifiers are in use for their
-particular LSM.
-
- * NetLabel LSM Protocol Operations
-
-These are the functions which allow the LSM developer to manipulate the labels
-on outgoing packets as well as read the labels on incoming packets.  Functions
-exist to operate both on sockets as well as the sk_buffs directly.  These high
-level functions are translated into low level protocol operations based on how
-the administrator has configured the NetLabel subsystem.
-
- * NetLabel Label Mapping Cache Operations
-
-Depending on the exact configuration, translation between the network packet
-label and the internal LSM security identifier can be time consuming.  The
-NetLabel label mapping cache is a caching mechanism which can be used to
-sidestep much of this overhead once a mapping has been established.  Once the
-LSM has received a packet, used NetLabel to decode its security attributes,
-and translated the security attributes into a LSM internal identifier the LSM
-can use the NetLabel caching functions to associate the LSM internal
-identifier with the network packet's label.  This means that in the future
-when a incoming packet matches a cached value not only are the internal
-NetLabel translation mechanisms bypassed but the LSM translation mechanisms are
-bypassed as well which should result in a significant reduction in overhead.
-- 
cgit v1.2.3-59-g8ed1b


From 3bdab16c55f57a24245c97d707241dd9b48d1a91 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:53 -0300
Subject: docs: pcmcia: convert docs to ReST and rename to *.rst

Convert the pcmcia docs to ReST format. Most of the changes here
are trivial.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/pcmcia/devicetable.rst    |  37 ++++++++
 Documentation/pcmcia/devicetable.txt    |  33 -------
 Documentation/pcmcia/driver-changes.rst | 160 ++++++++++++++++++++++++++++++++
 Documentation/pcmcia/driver-changes.txt | 149 -----------------------------
 Documentation/pcmcia/driver.rst         |  30 ++++++
 Documentation/pcmcia/driver.txt         |  30 ------
 Documentation/pcmcia/index.rst          |  20 ++++
 Documentation/pcmcia/locking.rst        | 133 ++++++++++++++++++++++++++
 Documentation/pcmcia/locking.txt        | 118 -----------------------
 drivers/pcmcia/ds.c                     |   2 +-
 include/pcmcia/ds.h                     |   2 +-
 include/pcmcia/ss.h                     |   2 +-
 12 files changed, 383 insertions(+), 333 deletions(-)
 create mode 100644 Documentation/pcmcia/devicetable.rst
 delete mode 100644 Documentation/pcmcia/devicetable.txt
 create mode 100644 Documentation/pcmcia/driver-changes.rst
 delete mode 100644 Documentation/pcmcia/driver-changes.txt
 create mode 100644 Documentation/pcmcia/driver.rst
 delete mode 100644 Documentation/pcmcia/driver.txt
 create mode 100644 Documentation/pcmcia/index.rst
 create mode 100644 Documentation/pcmcia/locking.rst
 delete mode 100644 Documentation/pcmcia/locking.txt

diff --git a/Documentation/pcmcia/devicetable.rst b/Documentation/pcmcia/devicetable.rst
new file mode 100644
index 000000000000..fd1d60d12ca1
--- /dev/null
+++ b/Documentation/pcmcia/devicetable.rst
@@ -0,0 +1,37 @@
+============
+Device table
+============
+
+Matching of PCMCIA devices to drivers is done using one or more of the
+following criteria:
+
+- manufacturer ID
+- card ID
+- product ID strings _and_ hashes of these strings
+- function ID
+- device function (actual and pseudo)
+
+You should use the helpers in include/pcmcia/device_id.h for generating the
+struct pcmcia_device_id[] entries which match devices to drivers.
+
+If you want to match product ID strings, you also need to pass the crc32
+hashes of the string to the macro, e.g. if you want to match the product ID
+string 1, you need to use::
+
+  PCMCIA_DEVICE_PROD_ID1("some_string", 0x(hash_of_some_string)),
+
+If the hash is incorrect, the kernel will inform you about this in "dmesg"
+upon module initialization, and tell you of the correct hash.
+
+You can determine the hash of the product ID strings by reading the "modalias"
+file in the PCMCIA device's sysfs directory, which has the following form::
+
+  pcmcia:m0149cC1ABf06pfn00fn00pa725B842DpbF1EFEE84pc0877B627pd00000000
+
+The hex value after "pa" is the hash of product ID string 1, after "pb" for
+string 2 and so on.
+
+Alternatively, use crc32hash (see tools/pcmcia/crc32hash.c) to determine the
+crc32 hash; pass the string you want to evaluate as its argument, e.g.::
+
+  $ tools/pcmcia/crc32hash "Dual Speed"
diff --git a/Documentation/pcmcia/devicetable.txt b/Documentation/pcmcia/devicetable.txt
deleted file mode 100644
index 5f3e00ab54c4..000000000000
--- a/Documentation/pcmcia/devicetable.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-Matching of PCMCIA devices to drivers is done using one or more of the
-following criteria:
-
-- manufactor ID
-- card ID
-- product ID strings _and_ hashes of these strings
-- function ID
-- device function (actual and pseudo)
-
-You should use the helpers in include/pcmcia/device_id.h for generating the
-struct pcmcia_device_id[] entries which match devices to drivers.
-
-If you want to match product ID strings, you also need to pass the crc32
-hashes of the string to the macro, e.g. if you want to match the product ID
-string 1, you need to use
-
-PCMCIA_DEVICE_PROD_ID1("some_string", 0x(hash_of_some_string)),
-
-If the hash is incorrect, the kernel will inform you about this in "dmesg"
-upon module initialization, and tell you of the correct hash.
-
-You can determine the hash of the product ID strings by catting the file
-"modalias" in the sysfs directory of the PCMCIA device. It generates a string
-in the following form:
-pcmcia:m0149cC1ABf06pfn00fn00pa725B842DpbF1EFEE84pc0877B627pd00000000
-
-The hex value after "pa" is the hash of product ID string 1, after "pb" for
-string 2 and so on.
-
-Alternatively, you can use crc32hash (see tools/pcmcia/crc32hash.c)
-to determine the crc32 hash.  Simply pass the string you want to evaluate
-as argument to this program, e.g.:
-$ tools/pcmcia/crc32hash "Dual Speed"
diff --git a/Documentation/pcmcia/driver-changes.rst b/Documentation/pcmcia/driver-changes.rst
new file mode 100644
index 000000000000..33fe9ebec049
--- /dev/null
+++ b/Documentation/pcmcia/driver-changes.rst
@@ -0,0 +1,160 @@
+==============
+Driver changes
+==============
+
+This file details changes in 2.6 which affect PCMCIA card driver authors:
+
+* pcmcia_loop_config() and autoconfiguration (as of 2.6.36)
+   If `struct pcmcia_device *p_dev->config_flags` is set accordingly,
+   pcmcia_loop_config() now sets up certain configuration values
+   automatically, though the driver may still override the settings
+   in the callback function. The following autoconfiguration options
+   are provided at the moment:
+
+	- CONF_AUTO_CHECK_VCC : check for matching Vcc
+	- CONF_AUTO_SET_VPP   : set Vpp
+	- CONF_AUTO_AUDIO     : auto-enable audio line, if required
+	- CONF_AUTO_SET_IO    : set ioport resources (->resource[0,1])
+	- CONF_AUTO_SET_IOMEM : set first iomem resource (->resource[2])
+
+* pcmcia_request_configuration -> pcmcia_enable_device (as of 2.6.36)
+   pcmcia_request_configuration() got renamed to pcmcia_enable_device(),
+   as it mirrors pcmcia_disable_device(). Configuration settings are now
+   stored in struct pcmcia_device, e.g. in the fields config_flags,
+   config_index, config_base, vpp.
+
+* pcmcia_request_window changes (as of 2.6.36)
+   Instead of win_req_t, drivers are now requested to fill out
+   `struct pcmcia_device *p_dev->resource[2,3,4,5]` for up to four ioport
+   ranges. After a call to pcmcia_request_window(), the regions found there
+   are reserved and may be used immediately -- until pcmcia_release_window()
+   is called.
+
+* pcmcia_request_io changes (as of 2.6.36)
+   Instead of io_req_t, drivers are now requested to fill out
+   `struct pcmcia_device *p_dev->resource[0,1]` for up to two ioport
+   ranges. After a call to pcmcia_request_io(), the ports found there
+   are reserved; they may be used once pcmcia_enable_device() (formerly
+   pcmcia_request_configuration()) has been called.
+
+* No dev_info_t, no cs_types.h (as of 2.6.36)
+   dev_info_t and a few other typedefs are removed. Do not use them
+   in PCMCIA device drivers. Also, do not include pcmcia/cs_types.h, as
+   this file is gone.
+
+* No dev_node_t (as of 2.6.35)
+   There is no more need to fill out a "dev_node_t" structure.
+
+* New IRQ request rules (as of 2.6.35)
+   Instead of the old pcmcia_request_irq() interface, drivers may now
+   choose between:
+
+   - calling request_irq/free_irq directly. Use the IRQ from `*p_dev->irq`.
+   - using pcmcia_request_irq(p_dev, handler_t); the PCMCIA core will
+     clean up automatically on calls to pcmcia_disable_device() or
+     device ejection.
+
+* no cs_error / CS_CHECK / CONFIG_PCMCIA_DEBUG (as of 2.6.33)
+   Instead of the cs_error() callback or the CS_CHECK() macro, please use
+   Linux-style checking of return values, and -- if necessary -- debug
+   messages using "dev_dbg()" or "pr_debug()".
+
+* New CIS tuple access (as of 2.6.33)
+   Instead of pcmcia_get_{first,next}_tuple(), pcmcia_get_tuple_data() and
+   pcmcia_parse_tuple(), a driver shall use "pcmcia_get_tuple()" if it is
+   only interested in one (raw) tuple, or "pcmcia_loop_tuple()" if it is
+   interested in all tuples of one type. To decode the MAC from CISTPL_FUNCE,
+   a new helper "pcmcia_get_mac_from_cis()" was added.
+
+* New configuration loop helper (as of 2.6.28)
+   By calling pcmcia_loop_config(), a driver can iterate over all available
+   configuration options. During a driver's probe() phase, one doesn't need
+   to use pcmcia_get_{first,next}_tuple, pcmcia_get_tuple_data and
+   pcmcia_parse_tuple directly in most if not all cases.
+
+* New release helper (as of 2.6.17)
+   Instead of calling pcmcia_release_{configuration,io,irq,win}, all that's
+   necessary now is calling pcmcia_disable_device. As there is no valid
+   reason left to call pcmcia_release_io and pcmcia_release_irq, the
+   exports for them were removed.
+
+* Unify detach and REMOVAL event code, as well as attach and INSERTION
+  code (as of 2.6.16)::
+
+       void (*remove)          (struct pcmcia_device *dev);
+       int (*probe)            (struct pcmcia_device *dev);
+
+* Move suspend, resume and reset out of event handler (as of 2.6.16)::
+
+       int (*suspend)          (struct pcmcia_device *dev);
+       int (*resume)           (struct pcmcia_device *dev);
+
+  These callbacks should be initialized in struct pcmcia_driver; they handle
+  the (SUSPEND == RESET_PHYSICAL) and (RESUME == CARD_RESET) events.
+
+* event handler initialization in struct pcmcia_driver (as of 2.6.13)
+   The event handler is notified of all events, and must be initialized
+   as the event() callback in the driver's struct pcmcia_driver.
+
+* pcmcia/version.h should not be used (as of 2.6.13)
+   This file will be removed eventually.
+
+* in-kernel device<->driver matching (as of 2.6.13)
+   PCMCIA devices and their correct drivers can now be matched in
+   kernelspace. See 'devicetable.rst' for details.
+
+* Device model integration (as of 2.6.11)
+   A struct pcmcia_device is registered with the device model core,
+   and can be used (e.g. for SET_NETDEV_DEV) by using
+   handle_to_dev(client_handle_t * handle).
+
+* Convert internal I/O port addresses to unsigned int (as of 2.6.11)
+   ioaddr_t should be replaced by unsigned int in PCMCIA card drivers.
+
+* irq_mask and irq_list parameters (as of 2.6.11)
+   The irq_mask and irq_list parameters should no longer be used in
+   PCMCIA card drivers. Instead, it is the job of the PCMCIA core to
+   determine which IRQ should be used. Therefore, link->irq.IRQInfo2
+   is ignored.
+
+* client->PendingEvents is gone (as of 2.6.11)
+   client->PendingEvents is no longer available.
+
+* client->Attributes are gone (as of 2.6.11)
+   client->Attributes is unused and has therefore been removed from all
+   PCMCIA card drivers.
+
+* core functions no longer available (as of 2.6.11)
+   The following functions have been removed from the kernel source
+   because they are unused by all in-kernel drivers, and no external
+   driver was reported to rely on them::
+
+	pcmcia_get_first_region()
+	pcmcia_get_next_region()
+	pcmcia_modify_window()
+	pcmcia_set_event_mask()
+	pcmcia_get_first_window()
+	pcmcia_get_next_window()
+
+* device list iteration upon module removal (as of 2.6.10)
+   It is no longer necessary to iterate on the driver's internal
+   client list and call the ->detach() function upon module removal.
+
+* Resource management (as of 2.6.8)
+   Although the PCMCIA subsystem will allocate resources for cards,
+   it no longer marks these resources busy. This means that driver
+   authors are now responsible for claiming their resources, as in
+   other Linux drivers. You should use request_region() to mark
+   your IO regions in-use, and request_mem_region() to mark your
+   memory regions in-use. The name argument should be a pointer to
+   your driver name. E.g., for pcnet_cs, name should point to the
+   string "pcnet_cs".
+
+* CardServices is gone
+   CardServices() in 2.4 is just a big switch statement to call various
+   services.  In 2.6, all of those entry points are exported and called
+   directly (except for pcmcia_report_error(); just use cs_error() instead).
+
+* struct pcmcia_driver
+   You need to use struct pcmcia_driver and pcmcia_{un,}register_driver
+   instead of {un,}register_pccard_driver.
diff --git a/Documentation/pcmcia/driver-changes.txt b/Documentation/pcmcia/driver-changes.txt
deleted file mode 100644
index 78355c4c268a..000000000000
--- a/Documentation/pcmcia/driver-changes.txt
+++ /dev/null
@@ -1,149 +0,0 @@
-This file details changes in 2.6 which affect PCMCIA card driver authors:
-* pcmcia_loop_config() and autoconfiguration (as of 2.6.36)
-   If struct pcmcia_device *p_dev->config_flags is set accordingly,
-   pcmcia_loop_config() now sets up certain configuration values
-   automatically, though the driver may still override the settings
-   in the callback function. The following autoconfiguration options
-   are provided at the moment:
-	CONF_AUTO_CHECK_VCC : check for matching Vcc
-	CONF_AUTO_SET_VPP   : set Vpp
-	CONF_AUTO_AUDIO     : auto-enable audio line, if required
-	CONF_AUTO_SET_IO    : set ioport resources (->resource[0,1])
-	CONF_AUTO_SET_IOMEM : set first iomem resource (->resource[2])
-
-* pcmcia_request_configuration -> pcmcia_enable_device (as of 2.6.36)
-   pcmcia_request_configuration() got renamed to pcmcia_enable_device(),
-   as it mirrors pcmcia_disable_device(). Configuration settings are now
-   stored in struct pcmcia_device, e.g. in the fields config_flags,
-   config_index, config_base, vpp.
-
-* pcmcia_request_window changes (as of 2.6.36)
-   Instead of win_req_t, drivers are now requested to fill out
-   struct pcmcia_device *p_dev->resource[2,3,4,5] for up to four ioport
-   ranges. After a call to pcmcia_request_window(), the regions found there
-   are reserved and may be used immediately -- until pcmcia_release_window()
-   is called.
-
-* pcmcia_request_io changes (as of 2.6.36)
-   Instead of io_req_t, drivers are now requested to fill out
-   struct pcmcia_device *p_dev->resource[0,1] for up to two ioport
-   ranges. After a call to pcmcia_request_io(), the ports found there
-   are reserved, after calling pcmcia_request_configuration(), they may
-   be used.
-
-* No dev_info_t, no cs_types.h (as of 2.6.36)
-   dev_info_t and a few other typedefs are removed. No longer use them
-   in PCMCIA device drivers. Also, do not include pcmcia/cs_types.h, as
-   this file is gone.
-
-* No dev_node_t (as of 2.6.35)
-   There is no more need to fill out a "dev_node_t" structure.
-
-* New IRQ request rules (as of 2.6.35)
-   Instead of the old pcmcia_request_irq() interface, drivers may now
-   choose between:
-   - calling request_irq/free_irq directly. Use the IRQ from *p_dev->irq.
-   - use pcmcia_request_irq(p_dev, handler_t); the PCMCIA core will
-     clean up automatically on calls to pcmcia_disable_device() or
-     device ejection.
-
-* no cs_error / CS_CHECK / CONFIG_PCMCIA_DEBUG (as of 2.6.33)
-   Instead of the cs_error() callback or the CS_CHECK() macro, please use
-   Linux-style checking of return values, and -- if necessary -- debug
-   messages using "dev_dbg()" or "pr_debug()".
-
-* New CIS tuple access (as of 2.6.33)
-   Instead of pcmcia_get_{first,next}_tuple(), pcmcia_get_tuple_data() and
-   pcmcia_parse_tuple(), a driver shall use "pcmcia_get_tuple()" if it is
-   only interested in one (raw) tuple, or "pcmcia_loop_tuple()" if it is
-   interested in all tuples of one type. To decode the MAC from CISTPL_FUNCE,
-   a new helper "pcmcia_get_mac_from_cis()" was added.
-
-* New configuration loop helper (as of 2.6.28)
-   By calling pcmcia_loop_config(), a driver can iterate over all available
-   configuration options. During a driver's probe() phase, one doesn't need
-   to use pcmcia_get_{first,next}_tuple, pcmcia_get_tuple_data and
-   pcmcia_parse_tuple directly in most if not all cases.
-
-* New release helper (as of 2.6.17)
-   Instead of calling pcmcia_release_{configuration,io,irq,win}, all that's
-   necessary now is calling pcmcia_disable_device. As there is no valid
-   reason left to call pcmcia_release_io and pcmcia_release_irq, the
-   exports for them were removed.
-
-* Unify detach and REMOVAL event code, as well as attach and INSERTION
-  code (as of 2.6.16)
-       void (*remove)          (struct pcmcia_device *dev);
-       int (*probe)            (struct pcmcia_device *dev);
-
-* Move suspend, resume and reset out of event handler (as of 2.6.16)
-       int (*suspend)          (struct pcmcia_device *dev);
-       int (*resume)           (struct pcmcia_device *dev);
-  should be initialized in struct pcmcia_driver, and handle
-  (SUSPEND == RESET_PHYSICAL) and (RESUME == CARD_RESET) events
-
-* event handler initialization in struct pcmcia_driver (as of 2.6.13)
-   The event handler is notified of all events, and must be initialized
-   as the event() callback in the driver's struct pcmcia_driver.
-
-* pcmcia/version.h should not be used (as of 2.6.13)
-   This file will be removed eventually.
-
-* in-kernel device<->driver matching (as of 2.6.13)
-   PCMCIA devices and their correct drivers can now be matched in
-   kernelspace. See 'devicetable.txt' for details.
-
-* Device model integration (as of 2.6.11)
-   A struct pcmcia_device is registered with the device model core,
-   and can be used (e.g. for SET_NETDEV_DEV) by using
-   handle_to_dev(client_handle_t * handle).
-
-* Convert internal I/O port addresses to unsigned int (as of 2.6.11)
-   ioaddr_t should be replaced by unsigned int in PCMCIA card drivers.
-
-* irq_mask and irq_list parameters (as of 2.6.11)
-   The irq_mask and irq_list parameters should no longer be used in
-   PCMCIA card drivers. Instead, it is the job of the PCMCIA core to
-   determine which IRQ should be used. Therefore, link->irq.IRQInfo2
-   is ignored.
-
-* client->PendingEvents is gone (as of 2.6.11)
-   client->PendingEvents is no longer available.
-
-* client->Attributes are gone (as of 2.6.11)
-   client->Attributes is unused, therefore it is removed from all
-   PCMCIA card drivers
-
-* core functions no longer available (as of 2.6.11)
-   The following functions have been removed from the kernel source
-   because they are unused by all in-kernel drivers, and no external
-   driver was reported to rely on them:
-	pcmcia_get_first_region()
-	pcmcia_get_next_region()
-	pcmcia_modify_window()
-	pcmcia_set_event_mask()
-	pcmcia_get_first_window()
-	pcmcia_get_next_window()
-
-* device list iteration upon module removal (as of 2.6.10)
-   It is no longer necessary to iterate on the driver's internal
-   client list and call the ->detach() function upon module removal.
-
-* Resource management. (as of 2.6.8)
-   Although the PCMCIA subsystem will allocate resources for cards,
-   it no longer marks these resources busy. This means that driver
-   authors are now responsible for claiming your resources as per
-   other drivers in Linux. You should use request_region() to mark
-   your IO regions in-use, and request_mem_region() to mark your
-   memory regions in-use. The name argument should be a pointer to
-   your driver name. Eg, for pcnet_cs, name should point to the
-   string "pcnet_cs".
-
-* CardServices is gone
-  CardServices() in 2.4 is just a big switch statement to call various
-  services.  In 2.6, all of those entry points are exported and called
-  directly (except for pcmcia_report_error(), just use cs_error() instead).
-
-* struct pcmcia_driver
-  You need to use struct pcmcia_driver and pcmcia_{un,}register_driver
-  instead of {un,}register_pccard_driver
diff --git a/Documentation/pcmcia/driver.rst b/Documentation/pcmcia/driver.rst
new file mode 100644
index 000000000000..5c4fe84d51c1
--- /dev/null
+++ b/Documentation/pcmcia/driver.rst
@@ -0,0 +1,30 @@
+=============
+PCMCIA Driver
+=============
+
+sysfs
+-----
+
+New PCMCIA IDs may be added to a device driver's pcmcia_device_id table at
+runtime as shown below::
+
+  echo "match_flags manf_id card_id func_id function device_no \
+  prod_id_hash[0] prod_id_hash[1] prod_id_hash[2] prod_id_hash[3]" > \
+  /sys/bus/pcmcia/drivers/{driver}/new_id
+
+All fields are passed in as hexadecimal values (no leading 0x).
+Their meaning is described in the PCMCIA specification; match_flags is
+a bitwise OR-ed combination of PCMCIA_DEV_ID_MATCH_* constants
+defined in include/linux/mod_devicetable.h.
+
+Once added, the driver probe routine will be invoked for any unclaimed
+PCMCIA device listed in its (newly updated) pcmcia_device_id list.
+
+A common use-case is to add a new device according to the manufacturer ID
+and the card ID (from the manf_id and card_id files in the device tree).
+For this, just use::
+
+  echo "3 manf_id card_id 0 0 0 0 0 0 0" > \
+    /sys/bus/pcmcia/drivers/{driver}/new_id
+
+after loading the driver.
diff --git a/Documentation/pcmcia/driver.txt b/Documentation/pcmcia/driver.txt
deleted file mode 100644
index 0ac167920778..000000000000
--- a/Documentation/pcmcia/driver.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-PCMCIA Driver
--------------
-
-
-sysfs
------
-
-New PCMCIA IDs may be added to a device driver pcmcia_device_id table at
-runtime as shown below:
-
-echo "match_flags manf_id card_id func_id function device_no \
-prod_id_hash[0] prod_id_hash[1] prod_id_hash[2] prod_id_hash[3]" > \
-/sys/bus/pcmcia/drivers/{driver}/new_id
-
-All fields are passed in as hexadecimal values (no leading 0x).
-The meaning is described in the PCMCIA specification, the match_flags is
-a bitwise or-ed combination from PCMCIA_DEV_ID_MATCH_* constants
-defined in include/linux/mod_devicetable.h.
-
-Once added, the driver probe routine will be invoked for any unclaimed
-PCMCIA device listed in its (newly updated) pcmcia_device_id list.
-
-A common use-case is to add a new device according to the manufacturer ID
-and the card ID (form the manf_id and card_id file in the device tree).
-For this, just use:
-
-echo "0x3 manf_id card_id 0 0 0 0 0 0 0" > \
-        /sys/bus/pcmcia/drivers/{driver}/new_id
-
-after loading the driver.
diff --git a/Documentation/pcmcia/index.rst b/Documentation/pcmcia/index.rst
new file mode 100644
index 000000000000..779c8527109e
--- /dev/null
+++ b/Documentation/pcmcia/index.rst
@@ -0,0 +1,20 @@
+:orphan:
+
+======
+pcmcia
+======
+
+.. toctree::
+    :maxdepth: 1
+
+    driver
+    devicetable
+    locking
+    driver-changes
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/pcmcia/locking.rst b/Documentation/pcmcia/locking.rst
new file mode 100644
index 000000000000..e35257139c89
--- /dev/null
+++ b/Documentation/pcmcia/locking.rst
@@ -0,0 +1,133 @@
+=======
+Locking
+=======
+
+This file explains the locking and exclusion scheme used in the PCCARD
+and PCMCIA subsystems.
+
+
+A) Overview, Locking Hierarchy:
+===============================
+
+pcmcia_socket_list_rwsem
+	- protects only the list of sockets
+
+- skt_mutex
+	- serializes card insert / ejection
+
+  - ops_mutex
+	- serializes socket operation
+
+
+B) Exclusion
+============
+
+The following functions and callbacks to struct pcmcia_socket must
+be called with "skt_mutex" held::
+
+	socket_detect_change()
+	send_event()
+	socket_reset()
+	socket_shutdown()
+	socket_setup()
+	socket_remove()
+	socket_insert()
+	socket_early_resume()
+	socket_late_resume()
+	socket_resume()
+	socket_suspend()
+
+	struct pcmcia_callback	*callback
+
+The following functions and callbacks to struct pcmcia_socket must
+be called with "ops_mutex" held::
+
+	socket_reset()
+	socket_setup()
+
+	struct pccard_operations	*ops
+	struct pccard_resource_ops	*resource_ops;
+
+Note that send_event() and `struct pcmcia_callback *callback` must not be
+called with "ops_mutex" held.
+
+
+C) Protection
+=============
+
+1. Global Data:
+---------------
+struct list_head	pcmcia_socket_list;
+
+protected by pcmcia_socket_list_rwsem;
+
+
+2. Per-Socket Data:
+-------------------
+The resource_ops and their data are protected by ops_mutex.
+
+The "main" struct pcmcia_socket is protected as follows (read-only fields
+or single-use fields not mentioned):
+
+- by pcmcia_socket_list_rwsem::
+
+	struct list_head	socket_list;
+
+- by thread_lock::
+
+	unsigned int		thread_events;
+
+- by skt_mutex::
+
+	u_int			suspended_state;
+	void			(*tune_bridge);
+	struct pcmcia_callback	*callback;
+	int			resume_status;
+
+- by ops_mutex::
+
+	socket_state_t		socket;
+	u_int			state;
+	u_short			lock_count;
+	pccard_mem_map		cis_mem;
+	void __iomem 		*cis_virt;
+	struct { }		irq;
+	io_window_t		io[];
+	pccard_mem_map		win[];
+	struct list_head	cis_cache;
+	size_t			fake_cis_len;
+	u8			*fake_cis;
+	u_int			irq_mask;
+	void 			(*zoom_video);
+	int 			(*power_hook);
+	u8			resource...;
+	struct list_head	devices_list;
+	u8			device_count;
+	struct 			pcmcia_state;
+
+
+3. Per PCMCIA-device Data:
+--------------------------
+
+The "main" struct pcmcia_device is protected as follows (read-only fields
+or single-use fields not mentioned):
+
+
+- by pcmcia_socket->ops_mutex::
+
+	struct list_head	socket_device_list;
+	struct config_t		*function_config;
+	u16			_irq:1;
+	u16			_io:1;
+	u16			_win:4;
+	u16			_locked:1;
+	u16			allow_func_id_match:1;
+	u16			suspended:1;
+	u16			_removed:1;
+
+- by the PCMCIA driver::
+
+	io_req_t		io;
+	irq_req_t		irq;
+	config_req_t		conf;
+	window_handle_t		win;
diff --git a/Documentation/pcmcia/locking.txt b/Documentation/pcmcia/locking.txt
deleted file mode 100644
index b2c9b478906b..000000000000
--- a/Documentation/pcmcia/locking.txt
+++ /dev/null
@@ -1,118 +0,0 @@
-This file explains the locking and exclusion scheme used in the PCCARD
-and PCMCIA subsystems.
-
-
-A) Overview, Locking Hierarchy:
-===============================
-
-pcmcia_socket_list_rwsem	- protects only the list of sockets
-- skt_mutex			- serializes card insert / ejection
-  - ops_mutex			- serializes socket operation
-
-
-B) Exclusion
-============
-
-The following functions and callbacks to struct pcmcia_socket must
-be called with "skt_mutex" held:
-
-	socket_detect_change()
-	send_event()
-	socket_reset()
-	socket_shutdown()
-	socket_setup()
-	socket_remove()
-	socket_insert()
-	socket_early_resume()
-	socket_late_resume()
-	socket_resume()
-	socket_suspend()
-
-	struct pcmcia_callback	*callback
-
-The following functions and callbacks to struct pcmcia_socket must
-be called with "ops_mutex" held:
-
-	socket_reset()
-	socket_setup()
-
-	struct pccard_operations	*ops
-	struct pccard_resource_ops	*resource_ops;
-
-Note that send_event() and struct pcmcia_callback *callback must not be
-called with "ops_mutex" held.
-
-
-C) Protection
-=============
-
-1. Global Data:
----------------
-struct list_head	pcmcia_socket_list;
-
-protected by pcmcia_socket_list_rwsem;
-
-
-2. Per-Socket Data:
--------------------
-The resource_ops and their data are protected by ops_mutex.
-
-The "main" struct pcmcia_socket is protected as follows (read-only fields
-or single-use fields not mentioned):
-
-- by pcmcia_socket_list_rwsem:
-	struct list_head	socket_list;
-
-- by thread_lock:
-	unsigned int		thread_events;
-
-- by skt_mutex:
-	u_int			suspended_state;
-	void			(*tune_bridge);
-	struct pcmcia_callback	*callback;
-	int			resume_status;
-
-- by ops_mutex:
-	socket_state_t		socket;
-	u_int			state;
-	u_short			lock_count;
-	pccard_mem_map		cis_mem;
-	void __iomem 		*cis_virt;
-	struct { }		irq;
-	io_window_t		io[];
-	pccard_mem_map		win[];
-	struct list_head	cis_cache;
-	size_t			fake_cis_len;
-	u8			*fake_cis;
-	u_int			irq_mask;
-	void 			(*zoom_video);
-	int 			(*power_hook);
-	u8			resource...;
-	struct list_head	devices_list;
-	u8			device_count;
-	struct 			pcmcia_state;
-
-
-3. Per PCMCIA-device Data:
---------------------------
-
-The "main" struct pcmcia_device is protected as follows (read-only fields
-or single-use fields not mentioned):
-
-
-- by pcmcia_socket->ops_mutex:
-	struct list_head	socket_device_list;
-	struct config_t		*function_config;
-	u16			_irq:1;
-	u16			_io:1;
-	u16			_win:4;
-	u16			_locked:1;
-	u16			allow_func_id_match:1;
-	u16			suspended:1;
-	u16			_removed:1;
-
-- by the PCMCIA driver:
-	io_req_t		io;
-	irq_req_t		irq;
-	config_req_t		conf;
-	window_handle_t		win;
diff --git a/drivers/pcmcia/ds.c b/drivers/pcmcia/ds.c
index a9258f641cee..5230e284bb20 100644
--- a/drivers/pcmcia/ds.c
+++ b/drivers/pcmcia/ds.c
@@ -67,7 +67,7 @@ static void pcmcia_check_driver(struct pcmcia_driver *p_drv)
 			       "be 0x%x\n", p_drv->name, did->prod_id[i],
 			       did->prod_id_hash[i], hash);
 			printk(KERN_DEBUG "pcmcia: see "
-				"Documentation/pcmcia/devicetable.txt for "
+				"Documentation/pcmcia/devicetable.rst for "
 				"details\n");
 		}
 		did++;
diff --git a/include/pcmcia/ds.h b/include/pcmcia/ds.h
index 3037157855f0..4e58c20dabcb 100644
--- a/include/pcmcia/ds.h
+++ b/include/pcmcia/ds.h
@@ -39,7 +39,7 @@ struct config_t;
 struct net_device;
 
 /* dynamic device IDs for PCMCIA device drivers. See
- * Documentation/pcmcia/driver.txt for details.
+ * Documentation/pcmcia/driver.rst for details.
 */
 struct pcmcia_dynids {
 	struct mutex		lock;
diff --git a/include/pcmcia/ss.h b/include/pcmcia/ss.h
index 731cde010f42..89629ee57840 100644
--- a/include/pcmcia/ss.h
+++ b/include/pcmcia/ss.h
@@ -190,7 +190,7 @@ struct pcmcia_socket {
 	unsigned int			sysfs_events;
 
 	/* For the non-trivial interaction between these locks,
-	 * see Documentation/pcmcia/locking.txt */
+	 * see Documentation/pcmcia/locking.rst */
 	struct mutex			skt_mutex;
 	struct mutex			ops_mutex;
 
-- 
cgit v1.2.3-59-g8ed1b


From 28aedd7ee214eb63a2e6924b5ec2b081aa7b3953 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:56 -0300
Subject: docs: pps.txt: convert to ReST and rename to pps.rst

This file is already in good shape: only its title and
some literal block markup need to be added for it to be
part of the documentation.

While it has a small chapter with sysfs stuff, most of
the document is focused on driver development.

As it describes a kernel API, move it to the driver-api
directory.

In order to avoid conflicts, let's add an :orphan: tag
to it, to be removed when added to the driver-api book.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/pps.rst | 242 +++++++++++++++++++++++++++++++++++++++
 Documentation/pps/pps.txt        | 239 --------------------------------------
 MAINTAINERS                      |   2 +-
 3 files changed, 243 insertions(+), 240 deletions(-)
 create mode 100644 Documentation/driver-api/pps.rst
 delete mode 100644 Documentation/pps/pps.txt

diff --git a/Documentation/driver-api/pps.rst b/Documentation/driver-api/pps.rst
new file mode 100644
index 000000000000..1456d2c32ebd
--- /dev/null
+++ b/Documentation/driver-api/pps.rst
@@ -0,0 +1,242 @@
+:orphan:
+
+======================
+PPS - Pulse Per Second
+======================
+
+Copyright (C) 2007 Rodolfo Giometti <giometti@enneenne.com>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+
+
+Overview
+--------
+
+LinuxPPS provides a programming interface (API) to define several
+PPS sources in the system.
+
+PPS means "pulse per second" and a PPS source is just a device which
+provides a high precision signal each second so that an application
+can use it to adjust system clock time.
+
+A PPS source can be connected to a serial port (usually to the Data
+Carrier Detect pin), to a parallel port (ACK pin), or to a CPU's
+GPIOs (the common case in embedded systems). In each case, when a new
+pulse arrives, the system must timestamp it and record the timestamp
+for userland.
+
+A common use is combining NTPD as the userland program with a GPS
+receiver as the PPS source to obtain wall-clock time with
+sub-millisecond synchronisation to UTC.
+
+
+RFC considerations
+------------------
+
+While implementing a PPS API as RFC 2783 defines and using an embedded
+CPU GPIO-Pin as physical link to the signal, I encountered a deeper
+problem:
+
+   At startup it needs a file descriptor as argument for the function
+   time_pps_create().
+
+This implies that the source has a /dev/... entry. This assumption is
+OK for the serial and parallel port, where you can do something
+useful besides(!) the gathering of timestamps as it is the central
+task for a PPS API. But this assumption does not work for a single
+purpose GPIO line. In this case even basic file-related functionality
+(like read() and write()) makes no sense at all and should not be a
+precondition for the use of a PPS API.
+
+The problem can be simply solved if you consider that a PPS source is
+not always connected with a GPS data source.
+
+So your programs should check if the GPS data source (the serial port
+for instance) is a PPS source too, and if not they should provide the
+possibility to open another device as PPS source.
+
+In LinuxPPS the PPS sources are simply char devices usually mapped
+into files /dev/pps0, /dev/pps1, etc.
+
+
+PPS with USB to serial devices
+------------------------------
+
+It is possible to grab the PPS from a USB-to-serial device. However,
+you should take into account the latencies and jitter introduced by
+the USB stack. Users have reported clock instability around +-1ms when
+synchronized with PPS through USB. With USB 2.0, jitter may decrease
+down to the order of 125 microseconds.
+
+This may be suitable for time server synchronization with NTP because
+of its undersampling and algorithms.
+
+If your device doesn't report PPS, you can check that the feature is
+supported by its driver. Most of the time, you only need to add a call
+to usb_serial_handle_dcd_change after checking the DCD status (see
+ch341 and pl2303 examples).
+
+
+Coding example
+--------------
+
+To register a PPS source into the kernel you should define a struct
+pps_source_info as follows::
+
+    static struct pps_source_info pps_ktimer_info = {
+	    .name         = "ktimer",
+	    .path         = "",
+	    .mode         = PPS_CAPTUREASSERT | PPS_OFFSETASSERT |
+			    PPS_ECHOASSERT |
+			    PPS_CANWAIT | PPS_TSFMT_TSPEC,
+	    .echo         = pps_ktimer_echo,
+	    .owner        = THIS_MODULE,
+    };
+
+and then call the function pps_register_source() in your
+initialization routine as follows::
+
+    source = pps_register_source(&pps_ktimer_info,
+			PPS_CAPTUREASSERT | PPS_OFFSETASSERT);
+
+The pps_register_source() prototype is::
+
+  struct pps_device *pps_register_source(struct pps_source_info *info, int default_params)
+
+where "info" is a pointer to a structure that describes a particular
+PPS source, and "default_params" tells the system what the initial default
+parameters for the device should be. These parameters must be a subset
+of the ones defined in the struct pps_source_info, which describes the
+capabilities of the driver.
+
+Once you have registered a new PPS source into the system you can
+signal an assert event (for example in the interrupt handler routine)
+just using::
+
+    pps_event(source, &ts, PPS_CAPTUREASSERT, ptr)
+
+where "ts" is the event's timestamp.
+
+The same function may also run the defined echo function
+(pps_ktimer_echo(), passing it the "ptr" pointer) if the user
+asked for that.
+
+Please see the file drivers/pps/clients/pps-ktimer.c for example code.
+
+
+SYSFS support
+-------------
+
+If the SYSFS filesystem is enabled in the kernel it provides a new class::
+
+   $ ls /sys/class/pps/
+   pps0/  pps1/  pps2/
+
+Each directory is the ID of a PPS source defined in the system, and
+inside it you find several files::
+
+   $ ls -F /sys/class/pps/pps0/
+   assert     dev        mode       path       subsystem@
+   clear      echo       name       power/     uevent
+
+
+Inside each "assert" and "clear" file you can find the timestamp and a
+sequence number::
+
+   $ cat /sys/class/pps/pps0/assert
+   1170026870.983207967#8
+
+The value before the "#" is the timestamp in seconds; the value after it
+is the sequence number. Other files are:
+
+ * echo: reports if the PPS source has an echo function or not;
+
+ * mode: reports available PPS functioning modes;
+
+ * name: reports the PPS source's name;
+
+ * path: reports the PPS source's device path, that is the device the
+   PPS source is connected to (if it exists).
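+
+As a minimal userspace sketch of reading the "assert" or "clear" file, the
+"seconds.nanoseconds#sequence" string can be split with sscanf(). The
+helper name parse_pps_stamp() and the default path are assumptions made
+for illustration only; they are not part of the PPS API::
+
+    #include <stdio.h>
+
+    /*
+     * Hypothetical helper: split "sec.nsec#seq" into its three fields.
+     * Returns 0 on success, -1 if the string does not match.
+     */
+    static int parse_pps_stamp(const char *s, long long *sec, long *nsec,
+                               unsigned int *seq)
+    {
+            return sscanf(s, "%lld.%ld#%u", sec, nsec, seq) == 3 ? 0 : -1;
+    }
+
+    int main(int argc, char **argv)
+    {
+            /* Assumed default; pass another sysfs file as argv[1]. */
+            const char *path = argc > 1 ? argv[1] :
+                               "/sys/class/pps/pps0/assert";
+            char buf[64];
+            long long sec;
+            long nsec;
+            unsigned int seq;
+            FILE *f = fopen(path, "r");
+
+            if (!f) {
+                    perror(path);
+                    return 1;
+            }
+            /* The file may be empty if no event has been captured yet. */
+            if (!fgets(buf, sizeof(buf), f) ||
+                parse_pps_stamp(buf, &sec, &nsec, &seq) < 0) {
+                    fprintf(stderr, "%s: unexpected format\n", path);
+                    fclose(f);
+                    return 1;
+            }
+            fclose(f);
+            printf("sec=%lld nsec=%09ld seq=%u\n", sec, nsec, seq);
+            return 0;
+    }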
+
+
+Testing the PPS support
+-----------------------
+
+In order to test the PPS support even without specific hardware you can use
+the pps-ktimer driver (see the client subsection in the PPS configuration menu)
+and the userland tools available in your distribution's pps-tools package,
+http://linuxpps.org , or https://github.com/redlab-i/pps-tools.
+
+Once you have enabled the compilation of pps-ktimer just modprobe it (if
+not statically compiled)::
+
+   # modprobe pps-ktimer
+
+and then run ppstest as follows::
+
+   $ ./ppstest /dev/pps1
+   trying PPS source "/dev/pps1"
+   found PPS source "/dev/pps1"
+   ok, found 1 source(s), now start fetching data...
+   source 0 - assert 1186592699.388832443, sequence: 364 - clear  0.000000000, sequence: 0
+   source 0 - assert 1186592700.388931295, sequence: 365 - clear  0.000000000, sequence: 0
+   source 0 - assert 1186592701.389032765, sequence: 366 - clear  0.000000000, sequence: 0
+
+Please note that to compile userland programs, you need the file timepps.h.
+This is available in the pps-tools repository mentioned above.
+
+
+Generators
+----------
+
+Sometimes one needs not only to capture PPS signals but also to produce
+them. An example is a distributed simulation, which requires the
+computers' clocks to be synchronized very tightly. One way to do this is to
+invent some complicated hardware solution, but that may be neither necessary
+nor affordable. The cheap way is to load a PPS generator on one of the
+computers (master) and PPS clients on the others (slaves), and use very simple
+cables to deliver signals using parallel ports, for example.
+
+Parallel port cable pinout::
+
+	pin	name	master      slave
+	1	STROBE	  *------     *
+	2	D0	  *     |     *
+	3	D1	  *     |     *
+	4	D2	  *     |     *
+	5	D3	  *     |     *
+	6	D4	  *     |     *
+	7	D5	  *     |     *
+	8	D6	  *     |     *
+	9	D7	  *     |     *
+	10	ACK	  *     ------*
+	11	BUSY	  *           *
+	12	PE	  *           *
+	13	SEL	  *           *
+	14	AUTOFD	  *           *
+	15	ERROR	  *           *
+	16	INIT	  *           *
+	17	SELIN	  *           *
+	18-25	GND	  *-----------*
+
+Please note that the parallel port interrupt occurs only on a high->low
+transition, so it is used for the PPS assert edge. The PPS clear edge can be
+determined only by polling in the interrupt handler, which can actually be
+done far more precisely than relying on the interrupt, because interrupt
+handling delays can be large and random. So the current parport PPS generator
+implementation (pps_gen_parport module) is geared towards using the clear edge
+for time synchronization.
+
+Clear edge polling is done with interrupts disabled, so it is better to keep
+the delay between the assert and clear edges as small as possible to reduce
+system latencies. But if it is too small, the slave won't be able to capture
+the clear edge transition. The default of 30us should be good enough in most
+situations. The delay can be selected using the 'delay' pps_gen_parport
+module parameter.
diff --git a/Documentation/pps/pps.txt b/Documentation/pps/pps.txt
deleted file mode 100644
index 99f5d8c4c652..000000000000
--- a/Documentation/pps/pps.txt
+++ /dev/null
@@ -1,239 +0,0 @@
-
-			PPS - Pulse Per Second
-			----------------------
-
-(C) Copyright 2007 Rodolfo Giometti <giometti@enneenne.com>
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-
-
-Overview
---------
-
-LinuxPPS provides a programming interface (API) to define in the
-system several PPS sources.
-
-PPS means "pulse per second" and a PPS source is just a device which
-provides a high precision signal each second so that an application
-can use it to adjust system clock time.
-
-A PPS source can be connected to a serial port (usually to the Data
-Carrier Detect pin) or to a parallel port (ACK-pin) or to a special
-CPU's GPIOs (this is the common case in embedded systems) but in each
-case when a new pulse arrives the system must apply to it a timestamp
-and record it for userland.
-
-Common use is the combination of the NTPD as userland program, with a
-GPS receiver as PPS source, to obtain a wallclock-time with
-sub-millisecond synchronisation to UTC.
-
-
-RFC considerations
-------------------
-
-While implementing a PPS API as RFC 2783 defines and using an embedded
-CPU GPIO-Pin as physical link to the signal, I encountered a deeper
-problem:
-
-   At startup it needs a file descriptor as argument for the function
-   time_pps_create().
-
-This implies that the source has a /dev/... entry. This assumption is
-OK for the serial and parallel port, where you can do something
-useful besides(!) the gathering of timestamps as it is the central
-task for a PPS API. But this assumption does not work for a single
-purpose GPIO line. In this case even basic file-related functionality
-(like read() and write()) makes no sense at all and should not be a
-precondition for the use of a PPS API.
-
-The problem can be simply solved if you consider that a PPS source is
-not always connected with a GPS data source.
-
-So your programs should check if the GPS data source (the serial port
-for instance) is a PPS source too, and if not they should provide the
-possibility to open another device as PPS source.
-
-In LinuxPPS the PPS sources are simply char devices usually mapped
-into files /dev/pps0, /dev/pps1, etc.
-
-
-PPS with USB to serial devices
-------------------------------
-
-It is possible to grab the PPS from an USB to serial device. However,
-you should take into account the latencies and jitter introduced by
-the USB stack. Users have reported clock instability around +-1ms when
-synchronized with PPS through USB. With USB 2.0, jitter may decrease
-down to the order of 125 microseconds.
-
-This may be suitable for time server synchronization with NTP because
-of its undersampling and algorithms.
-
-If your device doesn't report PPS, you can check that the feature is
-supported by its driver. Most of the time, you only need to add a call
-to usb_serial_handle_dcd_change after checking the DCD status (see
-ch341 and pl2303 examples).
-
-
-Coding example
---------------
-
-To register a PPS source into the kernel you should define a struct
-pps_source_info as follows:
-
-    static struct pps_source_info pps_ktimer_info = {
-	    .name         = "ktimer",
-	    .path         = "",
-	    .mode         = PPS_CAPTUREASSERT | PPS_OFFSETASSERT |
-			    PPS_ECHOASSERT |
-			    PPS_CANWAIT | PPS_TSFMT_TSPEC,
-	    .echo         = pps_ktimer_echo,
-	    .owner        = THIS_MODULE,
-    };
-
-and then calling the function pps_register_source() in your
-initialization routine as follows:
-
-    source = pps_register_source(&pps_ktimer_info,
-			PPS_CAPTUREASSERT | PPS_OFFSETASSERT);
-
-The pps_register_source() prototype is:
-
-  int pps_register_source(struct pps_source_info *info, int default_params)
-
-where "info" is a pointer to a structure that describes a particular
-PPS source, "default_params" tells the system what the initial default
-parameters for the device should be (it is obvious that these parameters
-must be a subset of ones defined in the struct
-pps_source_info which describe the capabilities of the driver).
-
-Once you have registered a new PPS source into the system you can
-signal an assert event (for example in the interrupt handler routine)
-just using:
-
-    pps_event(source, &ts, PPS_CAPTUREASSERT, ptr)
-
-where "ts" is the event's timestamp.
-
-The same function may also run the defined echo function
-(pps_ktimer_echo(), passing to it the "ptr" pointer) if the user
-asked for that... etc..
-
-Please see the file drivers/pps/clients/pps-ktimer.c for example code.
-
-
-SYSFS support
--------------
-
-If the SYSFS filesystem is enabled in the kernel it provides a new class:
-
-   $ ls /sys/class/pps/
-   pps0/  pps1/  pps2/
-
-Every directory is the ID of a PPS sources defined in the system and
-inside you find several files:
-
-   $ ls -F /sys/class/pps/pps0/
-   assert     dev        mode       path       subsystem@
-   clear      echo       name       power/     uevent
-
-
-Inside each "assert" and "clear" file you can find the timestamp and a
-sequence number:
-
-   $ cat /sys/class/pps/pps0/assert
-   1170026870.983207967#8
-
-Where before the "#" is the timestamp in seconds; after it is the
-sequence number. Other files are:
-
- * echo: reports if the PPS source has an echo function or not;
-
- * mode: reports available PPS functioning modes;
-
- * name: reports the PPS source's name;
-
- * path: reports the PPS source's device path, that is the device the
-   PPS source is connected to (if it exists).
-
-
-Testing the PPS support
------------------------
-
-In order to test the PPS support even without specific hardware you can use
-the pps-ktimer driver (see the client subsection in the PPS configuration menu)
-and the userland tools available in your distribution's pps-tools package,
-http://linuxpps.org , or https://github.com/redlab-i/pps-tools.
-
-Once you have enabled the compilation of pps-ktimer just modprobe it (if
-not statically compiled):
-
-   # modprobe pps-ktimer
-
-and the run ppstest as follow:
-
-   $ ./ppstest /dev/pps1
-   trying PPS source "/dev/pps1"
-   found PPS source "/dev/pps1"
-   ok, found 1 source(s), now start fetching data...
-   source 0 - assert 1186592699.388832443, sequence: 364 - clear  0.000000000, sequence: 0
-   source 0 - assert 1186592700.388931295, sequence: 365 - clear  0.000000000, sequence: 0
-   source 0 - assert 1186592701.389032765, sequence: 366 - clear  0.000000000, sequence: 0
-
-Please note that to compile userland programs, you need the file timepps.h.
-This is available in the pps-tools repository mentioned above.
-
-
-Generators
-----------
-
-Sometimes one needs to be able not only to catch PPS signals but to produce
-them also. For example, running a distributed simulation, which requires
-computers' clock to be synchronized very tightly. One way to do this is to
-invent some complicated hardware solutions but it may be neither necessary
-nor affordable. The cheap way is to load a PPS generator on one of the
-computers (master) and PPS clients on others (slaves), and use very simple
-cables to deliver signals using parallel ports, for example.
-
-Parallel port cable pinout:
-pin	name	master      slave
-1	STROBE	  *------     *
-2	D0	  *     |     *
-3	D1	  *     |     *
-4	D2	  *     |     *
-5	D3	  *     |     *
-6	D4	  *     |     *
-7	D5	  *     |     *
-8	D6	  *     |     *
-9	D7	  *     |     *
-10	ACK	  *     ------*
-11	BUSY	  *           *
-12	PE	  *           *
-13	SEL	  *           *
-14	AUTOFD	  *           *
-15	ERROR	  *           *
-16	INIT	  *           *
-17	SELIN	  *           *
-18-25	GND	  *-----------*
-
-Please note that parallel port interrupt occurs only on high->low transition,
-so it is used for PPS assert edge. PPS clear edge can be determined only
-using polling in the interrupt handler which actually can be done way more
-precisely because interrupt handling delays can be quite big and random. So
-current parport PPS generator implementation (pps_gen_parport module) is
-geared towards using the clear edge for time synchronization.
-
-Clear edge polling is done with disabled interrupts so it's better to select
-delay between assert and clear edge as small as possible to reduce system
-latencies. But if it is too small slave won't be able to capture clear edge
-transition. The default of 30us should be good enough in most situations.
-The delay can be selected using 'delay' pps_gen_parport module parameter.
diff --git a/MAINTAINERS b/MAINTAINERS
index ac88ed99fca5..aae3bd8a19f4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12659,7 +12659,7 @@ M:	Rodolfo Giometti <giometti@enneenne.com>
 W:	http://wiki.enneenne.com/index.php/LinuxPPS_support
 L:	linuxpps@ml.enneenne.com (subscribers-only)
 S:	Maintained
-F:	Documentation/pps/
+F:	Documentation/driver-api/pps.rst
 F:	Documentation/devicetree/bindings/pps/pps-gpio.txt
 F:	Documentation/ABI/testing/sysfs-pps
 F:	drivers/pps/
-- 
cgit v1.2.3-59-g8ed1b


From 329f00415a424063c23f75ff77f7d9c67916324d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:57 -0300
Subject: docs: ptp.txt: convert to ReST and move to driver-api

The conversion is trivial: just adjust title markups.

In order to avoid conflicts, let's add an :orphan: tag
to it, to be removed when this file gets added to the
driver-api book.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/driver-api/ptp.rst          | 96 +++++++++++++++++++++++++++++++
 Documentation/networking/timestamping.txt |  2 +-
 Documentation/ptp/ptp.txt                 | 86 ---------------------------
 MAINTAINERS                               |  2 +-
 4 files changed, 98 insertions(+), 88 deletions(-)
 create mode 100644 Documentation/driver-api/ptp.rst
 delete mode 100644 Documentation/ptp/ptp.txt

diff --git a/Documentation/driver-api/ptp.rst b/Documentation/driver-api/ptp.rst
new file mode 100644
index 000000000000..b6e65d66d37a
--- /dev/null
+++ b/Documentation/driver-api/ptp.rst
@@ -0,0 +1,96 @@
+:orphan:
+
+===========================================
+PTP hardware clock infrastructure for Linux
+===========================================
+
+  This patch set introduces support for IEEE 1588 PTP clocks in
+  Linux. Together with the SO_TIMESTAMPING socket options, this
+  presents a standardized method for developing PTP user space
+  programs, synchronizing Linux with external clocks, and using the
+  ancillary features of PTP hardware clocks.
+
+  A new class driver exports a kernel interface for specific clock
+  drivers and a user space interface. The infrastructure supports a
+  complete set of PTP hardware clock functionality.
+
+  + Basic clock operations
+    - Set time
+    - Get time
+    - Shift the clock by a given offset atomically
+    - Adjust clock frequency
+
+  + Ancillary clock features
+    - Time stamp external events
+    - Periodic output signals configurable from user space
+    - Synchronization of the Linux system time via the PPS subsystem
+
+PTP hardware clock kernel API
+=============================
+
+   A PTP clock driver registers itself with the class driver. The
+   class driver handles all of the dealings with user space. The
+   author of a clock driver need only implement the details of
+   programming the clock hardware. The clock driver notifies the class
+   driver of asynchronous events (alarms and external time stamps) via
+   a simple message passing interface.
+
+   The class driver supports multiple PTP clock drivers. In normal use
+   cases, only one PTP clock is needed. However, for testing and
+   development, it can be useful to have more than one clock in a
+   single system, in order to allow performance comparisons.
+
+PTP hardware clock user space API
+=================================
+
+   The class driver also creates a character device for each
+   registered clock. User space can use an open file descriptor from
+   the character device as a POSIX clock id and may call
+   clock_gettime, clock_settime, and clock_adjtime.  These calls
+   implement the basic clock operations.
+
+   User space programs may control the clock using standardized
+   ioctls. A program may query, enable, configure, and disable the
+   ancillary clock features. User space can receive time stamped
+   events via blocking read() and poll().
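+
+   As a minimal sketch of the first paragraph, an open file descriptor
+   can be turned into a POSIX clock id using the dynamic clock encoding
+   shown in the clock_gettime(2) manual page. The device path /dev/ptp0
+   is an assumption; use whichever character device the class driver
+   created on your system::
+
+      #include <fcntl.h>
+      #include <stdio.h>
+      #include <time.h>
+      #include <unistd.h>
+
+      /* Same encoding as the FD_TO_CLOCKID example in clock_gettime(2). */
+      #define CLOCKFD 3
+      #define FD_TO_CLOCKID(fd) \
+              ((clockid_t)((((unsigned int)~(fd)) << 3) | CLOCKFD))
+
+      int main(int argc, char **argv)
+      {
+              /* Assumed default device; override with argv[1]. */
+              const char *dev = argc > 1 ? argv[1] : "/dev/ptp0";
+              struct timespec ts;
+              int fd = open(dev, O_RDONLY);
+
+              if (fd < 0) {
+                      perror(dev);
+                      return 1;
+              }
+              /* "Get time" from the list of basic clock operations. */
+              if (clock_gettime(FD_TO_CLOCKID(fd), &ts)) {
+                      perror("clock_gettime");
+                      close(fd);
+                      return 1;
+              }
+              printf("%s: %lld.%09ld\n", dev, (long long)ts.tv_sec,
+                     ts.tv_nsec);
+              close(fd);
+              return 0;
+      }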
+
+Writing clock drivers
+=====================
+
+   Clock drivers include include/linux/ptp_clock_kernel.h and register
+   themselves by presenting a 'struct ptp_clock_info' to the
+   registration method. Clock drivers must implement all of the
+   functions in the interface. If a clock does not offer a particular
+   ancillary feature, then the driver should just return -EOPNOTSUPP
+   from those functions.
+
+   Drivers must ensure that all of the methods in the interface are
+   reentrant. Since most hardware implementations treat the time value
+   as a 64 bit integer accessed as two 32 bit registers, drivers
+   should use spin_lock_irqsave/spin_unlock_irqrestore to protect
+   against concurrent access. This locking cannot be accomplished in
+   the class driver, since the lock may also be needed by the clock
+   driver's interrupt service routine.
+
+Supported hardware
+==================
+
+   * Freescale eTSEC gianfar
+
+     - 2 Time stamp external triggers, programmable polarity (opt. interrupt)
+     - 2 Alarm registers (optional interrupt)
+     - 3 Periodic signals (optional interrupt)
+
+   * National DP83640
+
+     - 6 GPIOs programmable as inputs or outputs
+     - 6 GPIOs with dedicated functions (LED/JTAG/clock) can also be
+       used as general inputs or outputs
+     - GPIO inputs can time stamp external triggers
+     - GPIO outputs can produce periodic signals
+     - 1 interrupt pin
+
+   * Intel IXP465
+
+     - Auxiliary Slave/Master Mode Snapshot (optional interrupt)
+     - Target Time (optional interrupt)
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index bbdaf8990031..8dd6333c3270 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -368,7 +368,7 @@ ts[1] used to hold hardware timestamps converted to system time.
 Instead, expose the hardware clock device on the NIC directly as
 a HW PTP clock source, to allow time conversion in userspace and
 optionally synchronize system time with a userspace PTP stack such
-as linuxptp. For the PTP clock API, see Documentation/ptp/ptp.txt.
+as linuxptp. For the PTP clock API, see Documentation/driver-api/ptp.rst.
 
 Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
 together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
diff --git a/Documentation/ptp/ptp.txt b/Documentation/ptp/ptp.txt
deleted file mode 100644
index 11e904ee073f..000000000000
--- a/Documentation/ptp/ptp.txt
+++ /dev/null
@@ -1,86 +0,0 @@
-
-* PTP hardware clock infrastructure for Linux
-
-  This patch set introduces support for IEEE 1588 PTP clocks in
-  Linux. Together with the SO_TIMESTAMPING socket options, this
-  presents a standardized method for developing PTP user space
-  programs, synchronizing Linux with external clocks, and using the
-  ancillary features of PTP hardware clocks.
-
-  A new class driver exports a kernel interface for specific clock
-  drivers and a user space interface. The infrastructure supports a
-  complete set of PTP hardware clock functionality.
-
-  + Basic clock operations
-    - Set time
-    - Get time
-    - Shift the clock by a given offset atomically
-    - Adjust clock frequency
-
-  + Ancillary clock features
-    - Time stamp external events
-    - Period output signals configurable from user space
-    - Synchronization of the Linux system time via the PPS subsystem
-
-** PTP hardware clock kernel API
-
-   A PTP clock driver registers itself with the class driver. The
-   class driver handles all of the dealings with user space. The
-   author of a clock driver need only implement the details of
-   programming the clock hardware. The clock driver notifies the class
-   driver of asynchronous events (alarms and external time stamps) via
-   a simple message passing interface.
-
-   The class driver supports multiple PTP clock drivers. In normal use
-   cases, only one PTP clock is needed. However, for testing and
-   development, it can be useful to have more than one clock in a
-   single system, in order to allow performance comparisons.
-
-** PTP hardware clock user space API
-
-   The class driver also creates a character device for each
-   registered clock. User space can use an open file descriptor from
-   the character device as a POSIX clock id and may call
-   clock_gettime, clock_settime, and clock_adjtime.  These calls
-   implement the basic clock operations.
-
-   User space programs may control the clock using standardized
-   ioctls. A program may query, enable, configure, and disable the
-   ancillary clock features. User space can receive time stamped
-   events via blocking read() and poll().
-
-** Writing clock drivers
-
-   Clock drivers include include/linux/ptp_clock_kernel.h and register
-   themselves by presenting a 'struct ptp_clock_info' to the
-   registration method. Clock drivers must implement all of the
-   functions in the interface. If a clock does not offer a particular
-   ancillary feature, then the driver should just return -EOPNOTSUPP
-   from those functions.
-
-   Drivers must ensure that all of the methods in interface are
-   reentrant. Since most hardware implementations treat the time value
-   as a 64 bit integer accessed as two 32 bit registers, drivers
-   should use spin_lock_irqsave/spin_unlock_irqrestore to protect
-   against concurrent access. This locking cannot be accomplished in
-   class driver, since the lock may also be needed by the clock
-   driver's interrupt service routine.
-
-** Supported hardware
-
-   + Freescale eTSEC gianfar
-     - 2 Time stamp external triggers, programmable polarity (opt. interrupt)
-     - 2 Alarm registers (optional interrupt)
-     - 3 Periodic signals (optional interrupt)
-
-   + National DP83640
-     - 6 GPIOs programmable as inputs or outputs
-     - 6 GPIOs with dedicated functions (LED/JTAG/clock) can also be
-       used as general inputs or outputs
-     - GPIO inputs can time stamp external triggers
-     - GPIO outputs can produce periodic signals
-     - 1 interrupt pin
-
-   + Intel IXP465
-     - Auxiliary Slave/Master Mode Snapshot (optional interrupt)
-     - Target Time (optional interrupt)
diff --git a/MAINTAINERS b/MAINTAINERS
index aae3bd8a19f4..5fe44d5d82b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12765,7 +12765,7 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 W:	http://linuxptp.sourceforge.net/
 F:	Documentation/ABI/testing/sysfs-ptp
-F:	Documentation/ptp/*
+F:	Documentation/driver-api/ptp.rst
 F:	drivers/net/phy/dp83640*
 F:	drivers/ptp/*
 F:	include/linux/ptp_cl*
-- 
cgit v1.2.3-59-g8ed1b


From bdf3a950fb46600a2c34e5f3d9810c274e59a031 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:58 -0300
Subject: docs: riscv: convert docs to ReST and rename to *.rst

The conversion here is trivial:
 - Adjust the document title's markup
 - Do some whitespace alignment;
 - mark literal blocks;
 - Use ReST way to markup indented lists.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/riscv/index.rst |  17 +++
 Documentation/riscv/pmu.rst   | 255 ++++++++++++++++++++++++++++++++++++++++++
 Documentation/riscv/pmu.txt   | 249 -----------------------------------------
 3 files changed, 272 insertions(+), 249 deletions(-)
 create mode 100644 Documentation/riscv/index.rst
 create mode 100644 Documentation/riscv/pmu.rst
 delete mode 100644 Documentation/riscv/pmu.txt

diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst
new file mode 100644
index 000000000000..c4b906d9b5a7
--- /dev/null
+++ b/Documentation/riscv/index.rst
@@ -0,0 +1,17 @@
+:orphan:
+
+===================
+RISC-V architecture
+===================
+
+.. toctree::
+    :maxdepth: 1
+
+    pmu
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/riscv/pmu.rst b/Documentation/riscv/pmu.rst
new file mode 100644
index 000000000000..acb216b99c26
--- /dev/null
+++ b/Documentation/riscv/pmu.rst
@@ -0,0 +1,255 @@
+===================================
+Supporting PMUs on RISC-V platforms
+===================================
+
+Alan Kao <alankao@andestech.com>, Mar 2018
+
+Introduction
+------------
+
+As of this writing, perf_event-related features mentioned in The RISC-V ISA
+Privileged Version 1.10 are as follows:
+(please check the manual for more details)
+
+* [m|s]counteren
+* mcycle[h], cycle[h]
+* minstret[h], instret[h]
+* mhpeventx, mhpcounterx[h]
+
+With only this set of functions, porting perf would require a lot of work, due
+to the lack of the following general architectural performance monitoring
+features:
+
+* Enabling/Disabling counters
+  Counters are just free-running all the time in our case.
+* Interrupt caused by counter overflow
+  No such feature in the spec.
+* Interrupt indicator
+  It is not possible to have many interrupt ports for all counters, so an
+  interrupt indicator is required for software to tell which counter has
+  just overflowed.
+* Writing to counters
+  There will be an SBI to support this since the kernel cannot modify the
+  counters [1].  Alternatively, some vendors are considering implementing a
+  hardware extension for M-S-U model machines to write counters directly.
+
+This document aims to provide developers a quick guide on supporting their
+PMUs in the kernel.  The following sections briefly explain perf's mechanism
+and todos.
+
+You may check previous discussions here [1][2].  Also, it might be helpful
+to check the appendix for related kernel structures.
+
+
+1. Initialization
+-----------------
+
+*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains
+various methods according to perf's internal convention and PMU-specific
+parameters.  One should declare such an instance to represent the PMU.  By
+default, *riscv_pmu* points to a constant structure *riscv_base_pmu*, which has
+very basic support for a baseline QEMU model.
+
+One can then either assign the instance's pointer to *riscv_pmu* so that
+the minimal and already-implemented logic can be leveraged, or write a custom
+*riscv_init_platform_pmu* implementation.
+
+In other words, existing sources of *riscv_base_pmu* merely provide a
+reference implementation.  Developers can flexibly decide how many parts they
+can leverage, and in the most extreme case, they can customize every function
+according to their needs.
+
+
+2. Event Initialization
+-----------------------
+
+When a user launches a perf command to monitor some events, it is first
+interpreted by the userspace perf tool into multiple *perf_event_open*
+system calls, and each of them then calls the *event_init* member
+function that was assigned in the previous step.  In *riscv_base_pmu*'s
+case, it is *riscv_event_init*.
+
+The main purpose of this function is to translate the event provided by the
+user into a bitmap, so that HW-related control registers or counters can be
+manipulated directly.  The translation is based on the mappings and methods
+provided in *riscv_pmu*.
+
+Note that some features can be done in this stage as well:
+
+(1) interrupt setting, which is stated in the next section;
+(2) privilege level setting (user space only, kernel space only, both);
+(3) destructor setting.  Normally it is sufficient to apply *riscv_destroy_event*;
+(4) tweaks for non-sampling events, which will be utilized by functions such as
+    *perf_adjust_period*, usually something like the following::
+
+      if (!is_sampling_event(event)) {
+              hwc->sample_period = x86_pmu.max_period;
+              hwc->last_period = hwc->sample_period;
+              local64_set(&hwc->period_left, hwc->sample_period);
+      }
+
+In the case of *riscv_base_pmu*, only (3) is provided for now.
+
+
+3. Interrupt
+------------
+
+3.1. Interrupt Initialization
+
+This often occurs at the beginning of the *event_init* method. In common
+practice, this should be a code segment like::
+
+  int x86_reserve_hardware(void)
+  {
+        int err = 0;
+
+        if (!atomic_inc_not_zero(&pmc_refcount)) {
+                mutex_lock(&pmc_reserve_mutex);
+                if (atomic_read(&pmc_refcount) == 0) {
+                        if (!reserve_pmc_hardware())
+                                err = -EBUSY;
+                        else
+                                reserve_ds_buffers();
+                }
+                if (!err)
+                        atomic_inc(&pmc_refcount);
+                mutex_unlock(&pmc_reserve_mutex);
+        }
+
+        return err;
+  }
+
+And the magic is in *reserve_pmc_hardware*, which usually does atomic
+operations to make the implemented IRQ accessible from some global function
+pointer.  *release_pmc_hardware* serves the opposite purpose, and it is used in
+the event destructors mentioned in the previous section.
+
+(Note: In the implementations of all the architectures, the *reserve/release*
+pair always deals with IRQ settings, so the name *pmc_hardware* is somewhat
+misleading.  It does NOT deal with the binding between an event and a physical
+counter, which will be introduced in the next section.)
+
+3.2. IRQ Structure
+
+Basically, an IRQ runs the following pseudo code::
+
+  for each hardware counter that triggered this overflow
+
+      get the event of this counter
+
+      // following two steps are defined as *read()*,
+      // check the section Reading/Writing Counters for details.
+      count the delta value since previous interrupt
+      update the event->count (# event occurs) by adding delta, and
+                 event->hw.period_left by subtracting delta
+
+      if the event overflows
+          sample data
+          set the counter appropriately for the next overflow
+
+          if the event overflows again
+              too frequently, throttle this event
+          fi
+      fi
+
+  end for
+
+However as of this writing, none of the RISC-V implementations have designed an
+interrupt for perf, so the details are to be completed in the future.
+
+4. Reading/Writing Counters
+---------------------------
+
+They seem symmetric but perf treats them quite differently.  For reading, there
+is a *read* interface in *struct pmu*, but it serves more than just reading.
+According to the context, the *read* function not only reads the content of the
+counter (event->count), but also updates the left period to the next interrupt
+(event->hw.period_left).
+
+But the core of perf does not need to write to counters directly.  Writing
+counters is hidden behind two abstractions: 1) *pmu->start*, which literally
+starts counting, so one has to set the counter to a good value for the next
+interrupt; 2) the IRQ handler, which should set the counter to the same
+reasonable value.
+
+Reading is not a problem in RISC-V but writing would need some effort, since
+counters are not allowed to be written by S-mode.
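+
+A minimal userspace sketch of the bookkeeping that *read* performs is shown
+below.  The *fake_event* structure, its *prev_raw* field and the 48-bit
+counter width are assumptions made for illustration; they are not the
+kernel's definitions::
+
+  #include <stdint.h>
+  #include <stdio.h>
+
+  #define COUNTER_BITS 48   /* assumed counter width */
+  #define COUNTER_MASK ((UINT64_C(1) << COUNTER_BITS) - 1)
+
+  struct fake_event {
+          uint64_t count;        /* stands in for event->count */
+          int64_t  period_left;  /* stands in for event->hw.period_left */
+          uint64_t prev_raw;     /* raw counter value at the last read */
+  };
+
+  /* Fold the raw counter value into the event, tolerating wrap-around. */
+  static void fake_read(struct fake_event *ev, uint64_t raw)
+  {
+          uint64_t delta = (raw - ev->prev_raw) & COUNTER_MASK;
+
+          ev->prev_raw = raw;
+          ev->count += delta;                 /* events seen so far */
+          ev->period_left -= (int64_t)delta;  /* distance to next interrupt */
+  }
+
+  int main(void)
+  {
+          /* Start 10 ticks before the counter wraps around. */
+          struct fake_event ev = { 0, 1000, COUNTER_MASK - 9 };
+
+          fake_read(&ev, 20);   /* 10 ticks to wrap + 20 after = 30 */
+          printf("count=%llu period_left=%lld\n",
+                 (unsigned long long)ev.count, (long long)ev.period_left);
+          return 0;
+  }
+
+Masking the difference keeps the delta correct when the hardware counter
+wraps between two reads, which is why the previous raw value is stored.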
+
+
+5. add()/del()/start()/stop()
+-----------------------------
+
+Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop()
+starts/stops the counter of some event in the PMU.  All of them take the same
+arguments: *struct perf_event *event* and *int flag*.
+
+If you consider perf as a state machine, you will find that these functions
+serve as the transitions between its states.
+Three states (event->hw.state) are defined:
+
+* PERF_HES_STOPPED:	the counter is stopped
+* PERF_HES_UPTODATE:	the event->count is up-to-date
+* PERF_HES_ARCH:	arch-dependent usage ... we don't need this for now
+
+A normal flow of these state transitions is as follows:
+
+* A user launches a perf event, resulting in a call to *event_init*.
+* When being context-switched in, *add* is called by the perf core, with a flag
+  PERF_EF_START, which means that the event should be started after it is added.
+  At this stage, a general event is bound to a physical counter, if any.
+  The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now
+  stopped, and the (software) event count does not need updating.
+
+  - *start* is then called, and the counter is enabled.
+    With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check
+    previous section for detail).
+    Nothing is written if the flag does not contain PERF_EF_RELOAD.
+    The state is now reset to none, because the counter is neither stopped nor
+    up to date (counting has already started).
+
+* When being context-switched out, *del* is called.  It then checks out all the
+  events in the PMU and calls *stop* to update their counts.
+
+  - *stop* is called by *del*
+    and the perf core with flag PERF_EF_UPDATE, and it often shares the same
+    subroutine as *read* with the same logic.
+    The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.
+
+  - Life cycle of these two pairs: *add* and *del* are called repeatedly as
+    tasks switch in-and-out; *start* and *stop* are also called when the perf core
+    needs a quick stop-and-start, for instance, when the interrupt period is being
+    adjusted.
+
+The current implementation is sufficient for now and can easily be extended with
+new features in the future.
+
+A. Related Structures
+---------------------
+
+* struct pmu: include/linux/perf_event.h
+* struct riscv_pmu: arch/riscv/include/asm/perf_event.h
+
+  Both structures are designed to be read-only.
+
+  *struct pmu* defines some function pointer interfaces, and most of them take
+  *struct perf_event* as a main argument, dealing with perf events according to
+  perf's internal state machine (check kernel/events/core.c for details).
+
+  *struct riscv_pmu* defines PMU-specific parameters.  The naming follows the
+  convention of all other architectures.
+
+* struct perf_event: include/linux/perf_event.h
+* struct hw_perf_event
+
+  The generic structure that represents perf events, and the hardware-related
+  details.
+
+* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h
+
+  The structure that holds the status of events.  It has two fixed members:
+  the number of events and the array of the events.
+
+References
+----------
+
+[1] https://github.com/riscv/riscv-linux/pull/124
+
+[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA
diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt
deleted file mode 100644
index b29f03a6d82f..000000000000
--- a/Documentation/riscv/pmu.txt
+++ /dev/null
@@ -1,249 +0,0 @@
-Supporting PMUs on RISC-V platforms
-==========================================
-Alan Kao <alankao@andestech.com>, Mar 2018
-
-Introduction
-------------
-
-As of this writing, perf_event-related features mentioned in The RISC-V ISA
-Privileged Version 1.10 are as follows:
-(please check the manual for more details)
-
-* [m|s]counteren
-* mcycle[h], cycle[h]
-* minstret[h], instret[h]
-* mhpeventx, mhpcounterx[h]
-
-With such function set only, porting perf would require a lot of work, due to
-the lack of the following general architectural performance monitoring features:
-
-* Enabling/Disabling counters
-  Counters are just free-running all the time in our case.
-* Interrupt caused by counter overflow
-  No such feature in the spec.
-* Interrupt indicator
-  It is not possible to have many interrupt ports for all counters, so an
-  interrupt indicator is required for software to tell which counter has
-  just overflowed.
-* Writing to counters
-  There will be an SBI to support this since the kernel cannot modify the
-  counters [1].  Alternatively, some vendor considers to implement
-  hardware-extension for M-S-U model machines to write counters directly.
-
-This document aims to provide developers a quick guide on supporting their
-PMUs in the kernel.  The following sections briefly explain perf' mechanism
-and todos.
-
-You may check previous discussions here [1][2].  Also, it might be helpful
-to check the appendix for related kernel structures.
-
-
-1. Initialization
------------------
-
-*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains
-various methods according to perf's internal convention and PMU-specific
-parameters.  One should declare such instance to represent the PMU.  By default,
-*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very
-basic support to a baseline QEMU model.
-
-Then he/she can either assign the instance's pointer to *riscv_pmu* so that
-the minimal and already-implemented logic can be leveraged, or invent his/her
-own *riscv_init_platform_pmu* implementation.
-
-In other words, existing sources of *riscv_base_pmu* merely provide a
-reference implementation.  Developers can flexibly decide how many parts they
-can leverage, and in the most extreme case, they can customize every function
-according to their needs.
-
-
-2. Event Initialization
------------------------
-
-When a user launches a perf command to monitor some events, it is first
-interpreted by the userspace perf tool into multiple *perf_event_open*
-system calls, and then each of them calls to the body of *event_init*
-member function that was assigned in the previous step.  In *riscv_base_pmu*'s
-case, it is *riscv_event_init*.
-
-The main purpose of this function is to translate the event provided by user
-into bitmap, so that HW-related control registers or counters can directly be
-manipulated.  The translation is based on the mappings and methods provided in
-*riscv_pmu*.
-
-Note that some features can be done in this stage as well:
-
-(1) interrupt setting, which is stated in the next section;
-(2) privilege level setting (user space only, kernel space only, both);
-(3) destructor setting.  Normally it is sufficient to apply *riscv_destroy_event*;
-(4) tweaks for non-sampling events, which will be utilized by functions such as
-*perf_adjust_period*, usually something like the follows:
-
-if (!is_sampling_event(event)) {
-        hwc->sample_period = x86_pmu.max_period;
-        hwc->last_period = hwc->sample_period;
-        local64_set(&hwc->period_left, hwc->sample_period);
-}
-
-In the case of *riscv_base_pmu*, only (3) is provided for now.
-
-
-3. Interrupt
-------------
-
-3.1. Interrupt Initialization
-
-This often occurs at the beginning of the *event_init* method. In common
-practice, this should be a code segment like
-
-int x86_reserve_hardware(void)
-{
-        int err = 0;
-
-        if (!atomic_inc_not_zero(&pmc_refcount)) {
-                mutex_lock(&pmc_reserve_mutex);
-                if (atomic_read(&pmc_refcount) == 0) {
-                        if (!reserve_pmc_hardware())
-                                err = -EBUSY;
-                        else
-                                reserve_ds_buffers();
-                }
-                if (!err)
-                        atomic_inc(&pmc_refcount);
-                mutex_unlock(&pmc_reserve_mutex);
-        }
-
-        return err;
-}
-
-And the magic is in *reserve_pmc_hardware*, which usually does atomic
-operations to make implemented IRQ accessible from some global function pointer.
-*release_pmc_hardware* serves the opposite purpose, and it is used in event
-destructors mentioned in previous section.
-
-(Note: From the implementations in all the architectures, the *reserve/release*
-pair are always IRQ settings, so the *pmc_hardware* seems somehow misleading.
-It does NOT deal with the binding between an event and a physical counter,
-which will be introduced in the next section.)
-
-3.2. IRQ Structure
-
-Basically, a IRQ runs the following pseudo code:
-
-for each hardware counter that triggered this overflow
-
-    get the event of this counter
-
-    // following two steps are defined as *read()*,
-    // check the section Reading/Writing Counters for details.
-    count the delta value since previous interrupt
-    update the event->count (# event occurs) by adding delta, and
-               event->hw.period_left by subtracting delta
-
-    if the event overflows
-        sample data
-        set the counter appropriately for the next overflow
-
-        if the event overflows again
-            too frequently, throttle this event
-        fi
-    fi
-
-end for
-
-However as of this writing, none of the RISC-V implementations have designed an
-interrupt for perf, so the details are to be completed in the future.
-
-4. Reading/Writing Counters
----------------------------
-
-They seem symmetric but perf treats them quite differently.  For reading, there
-is a *read* interface in *struct pmu*, but it serves more than just reading.
-According to the context, the *read* function not only reads the content of the
-counter (event->count), but also updates the left period to the next interrupt
-(event->hw.period_left).
-
-But the core of perf does not need direct write to counters.  Writing counters
-is hidden behind the abstraction of 1) *pmu->start*, literally start counting so one
-has to set the counter to a good value for the next interrupt; 2) inside the IRQ
-it should set the counter to the same resonable value.
-
-Reading is not a problem in RISC-V but writing would need some effort, since
-counters are not allowed to be written by S-mode.
-
-
-5. add()/del()/start()/stop()
------------------------------
-
-Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop()
-starts/stop the counter of some event in the PMU.  All of them take the same
-arguments: *struct perf_event *event* and *int flag*.
-
-Consider perf as a state machine, then you will find that these functions serve
-as the state transition process between those states.
-Three states (event->hw.state) are defined:
-
-* PERF_HES_STOPPED:	the counter is stopped
-* PERF_HES_UPTODATE:	the event->count is up-to-date
-* PERF_HES_ARCH:	arch-dependent usage ... we don't need this for now
-
-A normal flow of these state transitions are as follows:
-
-* A user launches a perf event, resulting in calling to *event_init*.
-* When being context-switched in, *add* is called by the perf core, with a flag
-  PERF_EF_START, which means that the event should be started after it is added.
-  At this stage, a general event is bound to a physical counter, if any.
-  The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now
-  stopped, and the (software) event count does not need updating.
-** *start* is then called, and the counter is enabled.
-   With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check
-   previous section for detail).
-   Nothing is written if the flag does not contain PERF_EF_RELOAD.
-   The state now is reset to none, because it is neither stopped nor updated
-   (the counting already started)
-* When being context-switched out, *del* is called.  It then checks out all the
-  events in the PMU and calls *stop* to update their counts.
-** *stop* is called by *del*
-   and the perf core with flag PERF_EF_UPDATE, and it often shares the same
-   subroutine as *read* with the same logic.
-   The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.
-
-** Life cycle of these two pairs: *add* and *del* are called repeatedly as
-  tasks switch in-and-out; *start* and *stop* is also called when the perf core
-  needs a quick stop-and-start, for instance, when the interrupt period is being
-  adjusted.
-
-Current implementation is sufficient for now and can be easily extended to
-features in the future.
-
-A. Related Structures
----------------------
-
-* struct pmu: include/linux/perf_event.h
-* struct riscv_pmu: arch/riscv/include/asm/perf_event.h
-
-  Both structures are designed to be read-only.
-
-  *struct pmu* defines some function pointer interfaces, and most of them take
-*struct perf_event* as a main argument, dealing with perf events according to
-perf's internal state machine (check kernel/events/core.c for details).
-
-  *struct riscv_pmu* defines PMU-specific parameters.  The naming follows the
-convention of all other architectures.
-
-* struct perf_event: include/linux/perf_event.h
-* struct hw_perf_event
-
-  The generic structure that represents perf events, and the hardware-related
-details.
-
-* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h
-
-  The structure that holds the status of events, has two fixed members:
-the number of events and the array of the events.
-
-References
-----------
-
-[1] https://github.com/riscv/riscv-linux/pull/124
-[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA
-- 
cgit v1.2.3-59-g8ed1b


From 4ca9bc225e46eb7bc040dd948be7cb68975d80d3 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:52:59 -0300
Subject: docs: target: convert docs to ReST and rename to *.rst

Convert the TCM docs to ReST format and add them to the
bookset.

This has a mix of userspace-facing and kernelspace-facing
docs. Still, it sounds like a better candidate to be added to
the kernel API set of docs.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/target/index.rst           |  19 ++
 Documentation/target/scripts.rst         |  11 +
 Documentation/target/tcm_mod_builder.rst | 149 ++++++++++++
 Documentation/target/tcm_mod_builder.txt | 145 -----------
 Documentation/target/tcmu-design.rst     | 405 +++++++++++++++++++++++++++++++
 Documentation/target/tcmu-design.txt     | 381 -----------------------------
 scripts/documentation-file-ref-check     |   2 +-
 7 files changed, 585 insertions(+), 527 deletions(-)
 create mode 100644 Documentation/target/index.rst
 create mode 100644 Documentation/target/scripts.rst
 create mode 100644 Documentation/target/tcm_mod_builder.rst
 delete mode 100644 Documentation/target/tcm_mod_builder.txt
 create mode 100644 Documentation/target/tcmu-design.rst
 delete mode 100644 Documentation/target/tcmu-design.txt

diff --git a/Documentation/target/index.rst b/Documentation/target/index.rst
new file mode 100644
index 000000000000..b68f48982392
--- /dev/null
+++ b/Documentation/target/index.rst
@@ -0,0 +1,19 @@
+:orphan:
+
+==================
+TCM Virtual Device
+==================
+
+.. toctree::
+    :maxdepth: 1
+
+    tcmu-design
+    tcm_mod_builder
+    scripts
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/target/scripts.rst b/Documentation/target/scripts.rst
new file mode 100644
index 000000000000..172d42b522e4
--- /dev/null
+++ b/Documentation/target/scripts.rst
@@ -0,0 +1,11 @@
+TCM mod builder script
+----------------------
+
+.. literalinclude:: tcm_mod_builder.py
+    :language: python
+
+Target export device script
+---------------------------
+
+.. literalinclude:: target-export-device
+    :language: shell
diff --git a/Documentation/target/tcm_mod_builder.rst b/Documentation/target/tcm_mod_builder.rst
new file mode 100644
index 000000000000..9bfc9822e2bd
--- /dev/null
+++ b/Documentation/target/tcm_mod_builder.rst
@@ -0,0 +1,149 @@
+=========================================
+The TCM v4 fabric module script generator
+=========================================
+
+Greetings all,
+
+This document is intended to be a mini-HOWTO for using the tcm_mod_builder.py
+script to generate a brand new functional TCM v4 fabric .ko module of your very own,
+that once built can immediately be loaded to start accessing the new TCM/ConfigFS
+fabric skeleton, by simply using::
+
+	modprobe $TCM_NEW_MOD
+	mkdir -p /sys/kernel/config/target/$TCM_NEW_MOD
+
+This script will create a new drivers/target/$TCM_NEW_MOD/, and will do the following:
+
+	1) Generate new API callers for drivers/target/target_core_fabric_configs.c logic
+	   ->make_tpg(), ->drop_tpg(), ->make_wwn(), ->drop_wwn().  These are created
+	   into $TCM_NEW_MOD/$TCM_NEW_MOD_configfs.c
+	2) Generate basic infrastructure for loading/unloading LKMs and TCM/ConfigFS fabric module
+	   using a skeleton struct target_core_fabric_ops API template.
+	3) Based on user defined T10 Proto_Ident for the new fabric module being built,
+	   the TransportID / Initiator and Target WWPN related handlers for
+	   SPC-3 persistent reservation are automatically generated in $TCM_NEW_MOD/$TCM_NEW_MOD_fabric.c
+	   using drivers/target/target_core_fabric_lib.c logic.
+	4) NOP API calls for all other Data I/O path and fabric dependent attribute logic
+	   in $TCM_NEW_MOD/$TCM_NEW_MOD_fabric.c
+
+tcm_mod_builder.py depends upon the mandatory '-p $PROTO_IDENT' and '-m
+$FABRIC_MOD_name' parameters, and actually running the script looks like::
+
+  target:/mnt/sdb/lio-core-2.6.git/Documentation/target# python tcm_mod_builder.py -p iSCSI -m tcm_nab5000
+  tcm_dir: /mnt/sdb/lio-core-2.6.git/Documentation/target/../../
+  Set fabric_mod_name: tcm_nab5000
+  Set fabric_mod_dir:
+  /mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000
+  Using proto_ident: iSCSI
+  Creating fabric_mod_dir:
+  /mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000
+  Writing file:
+  /mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/tcm_nab5000_base.h
+  Using tcm_mod_scan_fabric_ops:
+  /mnt/sdb/lio-core-2.6.git/Documentation/target/../../include/target/target_core_fabric_ops.h
+  Writing file:
+  /mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/tcm_nab5000_fabric.c
+  Writing file:
+  /mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/tcm_nab5000_fabric.h
+  Writing file:
+  /mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/tcm_nab5000_configfs.c
+  Writing file:
+  /mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/Kbuild
+  Writing file:
+  /mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/Kconfig
+  Would you like to add tcm_nab5000to drivers/target/Kbuild..? [yes,no]: yes
+  Would you like to add tcm_nab5000to drivers/target/Kconfig..? [yes,no]: yes
+
+At the end of tcm_mod_builder.py, the script will ask to add the following
+line to drivers/target/Kbuild::
+
+	obj-$(CONFIG_TCM_NAB5000)       += tcm_nab5000/
+
+and the same for drivers/target/Kconfig::
+
+	source "drivers/target/tcm_nab5000/Kconfig"
+
+#) Run 'make menuconfig' and select the new CONFIG_TCM_NAB5000 item::
+
+	<M>   TCM_NAB5000 fabric module
+
+#) Build using 'make modules', once completed you will have::
+
+    target:/mnt/sdb/lio-core-2.6.git# ls -la drivers/target/tcm_nab5000/
+    total 1348
+    drwxr-xr-x 2 root root   4096 2010-10-05 03:23 .
+    drwxr-xr-x 9 root root   4096 2010-10-05 03:22 ..
+    -rw-r--r-- 1 root root    282 2010-10-05 03:22 Kbuild
+    -rw-r--r-- 1 root root    171 2010-10-05 03:22 Kconfig
+    -rw-r--r-- 1 root root     49 2010-10-05 03:23 modules.order
+    -rw-r--r-- 1 root root    738 2010-10-05 03:22 tcm_nab5000_base.h
+    -rw-r--r-- 1 root root   9096 2010-10-05 03:22 tcm_nab5000_configfs.c
+    -rw-r--r-- 1 root root 191200 2010-10-05 03:23 tcm_nab5000_configfs.o
+    -rw-r--r-- 1 root root  40504 2010-10-05 03:23 .tcm_nab5000_configfs.o.cmd
+    -rw-r--r-- 1 root root   5414 2010-10-05 03:22 tcm_nab5000_fabric.c
+    -rw-r--r-- 1 root root   2016 2010-10-05 03:22 tcm_nab5000_fabric.h
+    -rw-r--r-- 1 root root 190932 2010-10-05 03:23 tcm_nab5000_fabric.o
+    -rw-r--r-- 1 root root  40713 2010-10-05 03:23 .tcm_nab5000_fabric.o.cmd
+    -rw-r--r-- 1 root root 401861 2010-10-05 03:23 tcm_nab5000.ko
+    -rw-r--r-- 1 root root    265 2010-10-05 03:23 .tcm_nab5000.ko.cmd
+    -rw-r--r-- 1 root root    459 2010-10-05 03:23 tcm_nab5000.mod.c
+    -rw-r--r-- 1 root root  23896 2010-10-05 03:23 tcm_nab5000.mod.o
+    -rw-r--r-- 1 root root  22655 2010-10-05 03:23 .tcm_nab5000.mod.o.cmd
+    -rw-r--r-- 1 root root 379022 2010-10-05 03:23 tcm_nab5000.o
+    -rw-r--r-- 1 root root    211 2010-10-05 03:23 .tcm_nab5000.o.cmd
+
+#) Load the new module, create a lun_0 configfs group, and add new TCM Core
+   IBLOCK backstore symlink to port::
+
+    target:/mnt/sdb/lio-core-2.6.git# insmod drivers/target/tcm_nab5000.ko
+    target:/mnt/sdb/lio-core-2.6.git# mkdir -p /sys/kernel/config/target/nab5000/iqn.foo/tpgt_1/lun/lun_0
+    target:/mnt/sdb/lio-core-2.6.git# cd /sys/kernel/config/target/nab5000/iqn.foo/tpgt_1/lun/lun_0/
+    target:/sys/kernel/config/target/nab5000/iqn.foo/tpgt_1/lun/lun_0# ln -s /sys/kernel/config/target/core/iblock_0/lvm_test0 nab5000_port
+
+    target:/sys/kernel/config/target/nab5000/iqn.foo/tpgt_1/lun/lun_0# cd -
+    target:/mnt/sdb/lio-core-2.6.git# tree /sys/kernel/config/target/nab5000/
+    /sys/kernel/config/target/nab5000/
+    |-- discovery_auth
+    |-- iqn.foo
+    |   `-- tpgt_1
+    |       |-- acls
+    |       |-- attrib
+    |       |-- lun
+    |       |   `-- lun_0
+    |       |       |-- alua_tg_pt_gp
+    |       |       |-- alua_tg_pt_offline
+    |       |       |-- alua_tg_pt_status
+    |       |       |-- alua_tg_pt_write_md
+    |       |       `-- nab5000_port -> ../../../../../../target/core/iblock_0/lvm_test0
+    |       |-- np
+    |       `-- param
+    `-- version
+
+    target:/mnt/sdb/lio-core-2.6.git# lsmod
+    Module                  Size  Used by
+    tcm_nab5000             3935  4
+    iscsi_target_mod      193211  0
+    target_core_stgt        8090  0
+    target_core_pscsi      11122  1
+    target_core_file        9172  2
+    target_core_iblock      9280  1
+    target_core_mod       228575  31
+    tcm_nab5000,iscsi_target_mod,target_core_stgt,target_core_pscsi,target_core_file,target_core_iblock
+    libfc                  73681  0
+    scsi_debug             56265  0
+    scsi_tgt                8666  1 target_core_stgt
+    configfs               20644  2 target_core_mod
+
+----------------------------------------------------------------------
+
+Future TODO items
+=================
+
+	1) Add more T10 proto_idents
+	2) Make tcm_mod_dump_fabric_ops() smarter and generate function pointer
+	   defs directly from include/target/target_core_fabric_ops.h:struct target_core_fabric_ops
+	   structure members.
+
+October 5th, 2010
+
+Nicholas A. Bellinger <nab@linux-iscsi.org>
diff --git a/Documentation/target/tcm_mod_builder.txt b/Documentation/target/tcm_mod_builder.txt
deleted file mode 100644
index ae22f7005540..000000000000
--- a/Documentation/target/tcm_mod_builder.txt
+++ /dev/null
@@ -1,145 +0,0 @@
->>>>>>>>>> The TCM v4 fabric module script generator <<<<<<<<<<
-
-Greetings all,
-
-This document is intended to be a mini-HOWTO for using the tcm_mod_builder.py
-script to generate a brand new functional TCM v4 fabric .ko module of your very own,
-that once built can be immediately be loaded to start access the new TCM/ConfigFS
-fabric skeleton, by simply using:
-
-	modprobe $TCM_NEW_MOD
-	mkdir -p /sys/kernel/config/target/$TCM_NEW_MOD
-
-This script will create a new drivers/target/$TCM_NEW_MOD/, and will do the following
-
-	*) Generate new API callers for drivers/target/target_core_fabric_configs.c logic
-	   ->make_tpg(), ->drop_tpg(), ->make_wwn(), ->drop_wwn().  These are created
-	   into $TCM_NEW_MOD/$TCM_NEW_MOD_configfs.c
-	*) Generate basic infrastructure for loading/unloading LKMs and TCM/ConfigFS fabric module
-	   using a skeleton struct target_core_fabric_ops API template.
-	*) Based on user defined T10 Proto_Ident for the new fabric module being built,
-	   the TransportID / Initiator and Target WWPN related handlers for
-	   SPC-3 persistent reservation are automatically generated in $TCM_NEW_MOD/$TCM_NEW_MOD_fabric.c
-	   using drivers/target/target_core_fabric_lib.c logic.
-	*) NOP API calls for all other Data I/O path and fabric dependent attribute logic
-	   in $TCM_NEW_MOD/$TCM_NEW_MOD_fabric.c
-
-tcm_mod_builder.py depends upon the mandatory '-p $PROTO_IDENT' and '-m
-$FABRIC_MOD_name' parameters, and actually running the script looks like:
-
-target:/mnt/sdb/lio-core-2.6.git/Documentation/target# python tcm_mod_builder.py -p iSCSI -m tcm_nab5000
-tcm_dir: /mnt/sdb/lio-core-2.6.git/Documentation/target/../../
-Set fabric_mod_name: tcm_nab5000
-Set fabric_mod_dir:
-/mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000
-Using proto_ident: iSCSI
-Creating fabric_mod_dir:
-/mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000
-Writing file:
-/mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/tcm_nab5000_base.h
-Using tcm_mod_scan_fabric_ops:
-/mnt/sdb/lio-core-2.6.git/Documentation/target/../../include/target/target_core_fabric_ops.h
-Writing file:
-/mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/tcm_nab5000_fabric.c
-Writing file:
-/mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/tcm_nab5000_fabric.h
-Writing file:
-/mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/tcm_nab5000_configfs.c
-Writing file:
-/mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/Kbuild
-Writing file:
-/mnt/sdb/lio-core-2.6.git/Documentation/target/../../drivers/target/tcm_nab5000/Kconfig
-Would you like to add tcm_nab5000to drivers/target/Kbuild..? [yes,no]: yes
-Would you like to add tcm_nab5000to drivers/target/Kconfig..? [yes,no]: yes
-
-At the end of tcm_mod_builder.py. the script will ask to add the following
-line to drivers/target/Kbuild:
-
-	obj-$(CONFIG_TCM_NAB5000)       += tcm_nab5000/
-
-and the same for drivers/target/Kconfig:
-
-	source "drivers/target/tcm_nab5000/Kconfig"
-
-*) Run 'make menuconfig' and select the new CONFIG_TCM_NAB5000 item:
-
-	<M>   TCM_NAB5000 fabric module
-
-*) Build using 'make modules', once completed you will have:
-
-target:/mnt/sdb/lio-core-2.6.git# ls -la drivers/target/tcm_nab5000/
-total 1348
-drwxr-xr-x 2 root root   4096 2010-10-05 03:23 .
-drwxr-xr-x 9 root root   4096 2010-10-05 03:22 ..
--rw-r--r-- 1 root root    282 2010-10-05 03:22 Kbuild
--rw-r--r-- 1 root root    171 2010-10-05 03:22 Kconfig
--rw-r--r-- 1 root root     49 2010-10-05 03:23 modules.order
--rw-r--r-- 1 root root    738 2010-10-05 03:22 tcm_nab5000_base.h
--rw-r--r-- 1 root root   9096 2010-10-05 03:22 tcm_nab5000_configfs.c
--rw-r--r-- 1 root root 191200 2010-10-05 03:23 tcm_nab5000_configfs.o
--rw-r--r-- 1 root root  40504 2010-10-05 03:23 .tcm_nab5000_configfs.o.cmd
--rw-r--r-- 1 root root   5414 2010-10-05 03:22 tcm_nab5000_fabric.c
--rw-r--r-- 1 root root   2016 2010-10-05 03:22 tcm_nab5000_fabric.h
--rw-r--r-- 1 root root 190932 2010-10-05 03:23 tcm_nab5000_fabric.o
--rw-r--r-- 1 root root  40713 2010-10-05 03:23 .tcm_nab5000_fabric.o.cmd
--rw-r--r-- 1 root root 401861 2010-10-05 03:23 tcm_nab5000.ko
--rw-r--r-- 1 root root    265 2010-10-05 03:23 .tcm_nab5000.ko.cmd
--rw-r--r-- 1 root root    459 2010-10-05 03:23 tcm_nab5000.mod.c
--rw-r--r-- 1 root root  23896 2010-10-05 03:23 tcm_nab5000.mod.o
--rw-r--r-- 1 root root  22655 2010-10-05 03:23 .tcm_nab5000.mod.o.cmd
--rw-r--r-- 1 root root 379022 2010-10-05 03:23 tcm_nab5000.o
--rw-r--r-- 1 root root    211 2010-10-05 03:23 .tcm_nab5000.o.cmd
-
-*) Load the new module, create a lun_0 configfs group, and add new TCM Core
-   IBLOCK backstore symlink to port:
-
-target:/mnt/sdb/lio-core-2.6.git# insmod drivers/target/tcm_nab5000.ko
-target:/mnt/sdb/lio-core-2.6.git# mkdir -p /sys/kernel/config/target/nab5000/iqn.foo/tpgt_1/lun/lun_0
-target:/mnt/sdb/lio-core-2.6.git# cd /sys/kernel/config/target/nab5000/iqn.foo/tpgt_1/lun/lun_0/
-target:/sys/kernel/config/target/nab5000/iqn.foo/tpgt_1/lun/lun_0# ln -s /sys/kernel/config/target/core/iblock_0/lvm_test0 nab5000_port
-
-target:/sys/kernel/config/target/nab5000/iqn.foo/tpgt_1/lun/lun_0# cd -
-target:/mnt/sdb/lio-core-2.6.git# tree /sys/kernel/config/target/nab5000/
-/sys/kernel/config/target/nab5000/
-|-- discovery_auth
-|-- iqn.foo
-|   `-- tpgt_1
-|       |-- acls
-|       |-- attrib
-|       |-- lun
-|       |   `-- lun_0
-|       |       |-- alua_tg_pt_gp
-|       |       |-- alua_tg_pt_offline
-|       |       |-- alua_tg_pt_status
-|       |       |-- alua_tg_pt_write_md
-|	|	`-- nab5000_port -> ../../../../../../target/core/iblock_0/lvm_test0
-|       |-- np
-|       `-- param
-`-- version
-
-target:/mnt/sdb/lio-core-2.6.git# lsmod
-Module                  Size  Used by
-tcm_nab5000             3935  4
-iscsi_target_mod      193211  0
-target_core_stgt        8090  0
-target_core_pscsi      11122  1
-target_core_file        9172  2
-target_core_iblock      9280  1
-target_core_mod       228575  31
-tcm_nab5000,iscsi_target_mod,target_core_stgt,target_core_pscsi,target_core_file,target_core_iblock
-libfc                  73681  0
-scsi_debug             56265  0
-scsi_tgt                8666  1 target_core_stgt
-configfs               20644  2 target_core_mod
-
-----------------------------------------------------------------------
-
-Future TODO items:
-
-	*) Add more T10 proto_idents
-	*) Make tcm_mod_dump_fabric_ops() smarter and generate function pointer
-	   defs directly from include/target/target_core_fabric_ops.h:struct target_core_fabric_ops
-	   structure members.
-
-October 5th, 2010
-Nicholas A. Bellinger <nab@linux-iscsi.org>
diff --git a/Documentation/target/tcmu-design.rst b/Documentation/target/tcmu-design.rst
new file mode 100644
index 000000000000..a7b426707bf6
--- /dev/null
+++ b/Documentation/target/tcmu-design.rst
@@ -0,0 +1,405 @@
+====================
+TCM Userspace Design
+====================
+
+
+.. Contents:
+
+   1) TCM Userspace Design
+     a) Background
+     b) Benefits
+     c) Design constraints
+     d) Implementation overview
+        i. Mailbox
+        ii. Command ring
+        iii. Data Area
+     e) Device discovery
+     f) Device events
+     g) Other contingencies
+   2) Writing a user pass-through handler
+     a) Discovering and configuring TCMU uio devices
+     b) Waiting for events on the device(s)
+     c) Managing the command ring
+   3) A final note
+
+
+TCM Userspace Design
+====================
+
+TCM is another name for LIO, an in-kernel iSCSI target (server).
+Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
+allows userspace programs to be written which act as iSCSI targets.
+This document describes the design.
+
+The existing kernel provides modules for different SCSI transport
+protocols.  TCM also modularizes the data storage.  There are existing
+modules for file, block device, RAM or using another SCSI device as
+storage.  These are called "backstores" or "storage engines".  These
+built-in modules are implemented entirely as kernel code.
+
+Background
+----------
+
+In addition to modularizing the transport protocol used for carrying
+SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes
+the actual data storage as well. These are referred to as "backstores"
+or "storage engines". The target comes with backstores that allow a
+file, a block device, RAM, or another SCSI device to be used for the
+local storage needed for the exported SCSI LUN. Like the rest of LIO,
+these are implemented entirely as kernel code.
+
+These backstores cover the most common use cases, but not all. One new
+use case that other non-kernel target solutions, such as tgt, are able
+to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
+target then serves as a translator, allowing initiators to store data
+in these non-traditional networked storage systems, while still only
+using standard protocols themselves.
+
+If the target is a userspace process, supporting these is easy. tgt,
+for example, needs only a small adapter module for each, because the
+modules just use the available userspace libraries for RBD and GLFS.
+
+Adding support for these backstores in LIO is considerably more
+difficult, because LIO is entirely kernel code. Instead of undertaking
+the significant work to port the GLFS or RBD APIs and protocols to the
+kernel, another approach is to create a userspace pass-through
+backstore for LIO, "TCMU".
+
+
+Benefits
+--------
+
+In addition to allowing relatively easy support for RBD and GLFS, TCMU
+will also allow easier development of new backstores. TCMU combines
+with the LIO loopback fabric to become something similar to FUSE
+(Filesystem in Userspace), but at the SCSI layer instead of the
+filesystem layer. A SUSE, if you will.
+
+The disadvantage is that there are more distinct components to configure, and
+potentially to malfunction. This is unavoidable, but hopefully not
+fatal if we're careful to keep things as simple as possible.
+
+Design constraints
+------------------
+
+- Good performance: high throughput, low latency
+- Cleanly handle the cases where userspace:
+
+   1) never attaches
+   2) hangs
+   3) dies
+   4) misbehaves
+
+- Allow future flexibility in user & kernel implementations
+- Be reasonably memory-efficient
+- Simple to configure & run
+- Simple to write a userspace backend
+
+
+Implementation overview
+-----------------------
+
+The core of the TCMU interface is a memory region that is shared
+between kernel and userspace. Within this region are: a control area
+(mailbox); a lockless producer/consumer circular buffer for commands
+to be passed up, and status returned; and an in/out data buffer area.
+
+TCMU uses the pre-existing UIO subsystem. UIO allows device driver
+development in userspace, and this is conceptually very close to the
+TCMU use case, except instead of a physical device, TCMU implements a
+memory-mapped layout designed for SCSI commands. Using UIO also
+benefits TCMU by handling device introspection (e.g. a way for
+userspace to determine how large the shared region is) and signaling
+mechanisms in both directions.
+
+There are no embedded pointers in the memory region. Everything is
+expressed as an offset from the region's starting address. This allows
+the ring to still work if the user process dies and is restarted with
+the region mapped at a different virtual address.
+
+See target_core_user.h for the struct definitions.
+
+The Mailbox
+-----------
+
+The mailbox is always at the start of the shared memory region, and
+contains a version, details about the starting offset and size of the
+command ring, and head and tail pointers to be used by the kernel and
+userspace (respectively) to put commands on the ring, and indicate
+when the commands are completed.
+
+version - 1 (userspace should abort if otherwise)
+
+flags:
+    - TCMU_MAILBOX_FLAG_CAP_OOOC:
+	indicates out-of-order completion is supported.
+	See "The Command Ring" for details.
+
+cmdr_off
+	The offset of the start of the command ring from the start
+	of the memory region, to account for the mailbox size.
+cmdr_size
+	The size of the command ring. This does *not* need to be a
+	power of two.
+cmd_head
+	Modified by the kernel to indicate when a command has been
+	placed on the ring.
+cmd_tail
+	Modified by userspace to indicate when it has completed
+	processing of a command.
+
+The Command Ring
+----------------
+
+Commands are placed on the ring by the kernel incrementing
+mailbox.cmd_head by the size of the command, modulo cmdr_size, and
+then signaling userspace via uio_event_notify(). Once the command is
+completed, userspace updates mailbox.cmd_tail in the same way and
+signals the kernel via a 4-byte write(). When cmd_head equals
+cmd_tail, the ring is empty -- no commands are currently waiting to be
+processed by userspace.
+
+TCMU commands are 8-byte aligned. They start with a common header
+containing "len_op", a 32-bit value that stores the length, as well as
+the opcode in the lowest unused bits. It also contains cmd_id and
+flags fields for setting by the kernel (kflags) and userspace
+(uflags).
+
+Currently only two opcodes are defined, TCMU_OP_CMD and TCMU_OP_PAD.
+
+When the opcode is CMD, the entry in the command ring is a struct
+tcmu_cmd_entry. Userspace finds the SCSI CDB (Command Data Block) via
+tcmu_cmd_entry.req.cdb_off. This is an offset from the start of the
+overall shared memory region, not the entry. The data in/out buffers
+are accessible via the req.iov[] array. iov_cnt contains the number of
+entries in iov[] needed to describe either the Data-In or Data-Out
+buffers. For bidirectional commands, iov_cnt specifies how many iovec
+entries cover the Data-Out area, and iov_bidi_cnt specifies how many
+iovec entries immediately after that in iov[] cover the Data-In
+area. Just like other fields, iov.iov_base is an offset from the start
+of the region.
+
+When completing a command, userspace sets rsp.scsi_status, and
+rsp.sense_buffer if necessary. Userspace then increments
+mailbox.cmd_tail by the length stored in hdr.len_op (mod cmdr_size) and
+signals the kernel via the UIO method, a 4-byte write to the file descriptor.
+
+If TCMU_MAILBOX_FLAG_CAP_OOOC is set in mailbox->flags, the kernel is
+capable of handling out-of-order completions. In this case, userspace can
+handle commands in an order different from the original. Since the kernel
+still processes the entries in the order they appear in the command
+ring, userspace needs to update cmd->id when completing a command
+(in effect, stealing the original command's entry).
+
+When the opcode is PAD, userspace only updates cmd_tail as above --
+it's a no-op. (The kernel inserts PAD entries to ensure each CMD entry
+is contiguous within the command ring.)
+
+More opcodes may be added in the future. If userspace encounters an
+opcode it does not handle, it must set the UNKNOWN_OP bit (bit 0) in
+hdr.uflags, update cmd_tail, and proceed with processing additional
+commands, if any.
+
+The Data Area
+-------------
+
+This is shared-memory space after the command ring. The organization
+of this area is not defined in the TCMU interface, and userspace
+should access only the parts referenced by pending iovs.
+
+
+Device Discovery
+----------------
+
+Other devices may be using UIO besides TCMU. Unrelated user processes
+may also be handling different sets of TCMU devices. TCMU userspace
+processes must find their devices by scanning sysfs
+class/uio/uio*/name. For TCMU devices, these names will be of the
+format::
+
+	tcm-user/<hba_num>/<device_name>/<subtype>/<path>
+
+where "tcm-user" is common for all TCMU-backed UIO devices. <hba_num>
+and <device_name> allow userspace to find the device's path in the
+kernel target's configfs tree. Assuming the usual mount point, it is
+found at::
+
+	/sys/kernel/config/target/core/user_<hba_num>/<device_name>
+
+This location contains attributes such as "hw_block_size", that
+userspace needs to know for correct operation.
+
+<subtype> will be a userspace-process-unique string to identify the
+TCMU device as expecting to be backed by a certain handler, and <path>
+will be an additional handler-specific string for the user process to
+configure the device, if needed. The name cannot contain ':', due to
+LIO limitations.
+
+For all devices so discovered, the user handler opens /dev/uioX and
+calls mmap()::
+
+	mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0)
+
+where size must be equal to the value read from
+/sys/class/uio/uioX/maps/map0/size.
+
+
+Device Events
+-------------
+
+If a new device is added or removed, a notification will be broadcast
+over netlink, using a generic netlink family name of "TCM-USER" and a
+multicast group named "config". This will include the UIO name as
+described in the previous section, as well as the UIO minor
+number. This should allow userspace to identify both the UIO device and
+the LIO device, so that after determining the device is supported
+(based on subtype) it can take the appropriate action.
+
+
+Other contingencies
+-------------------
+
+Userspace handler process never attaches:
+
+- TCMU will post commands, and then abort them after a timeout period
+  (30 seconds).
+
+Userspace handler process is killed:
+
+- It is still possible to restart and re-connect to TCMU
+  devices. The command ring is preserved. However, after the timeout period,
+  the kernel will abort pending tasks.
+
+Userspace handler process hangs:
+
+- The kernel will abort pending tasks after a timeout period.
+
+Userspace handler process is malicious:
+
+- The process can trivially break the handling of devices it controls,
+  but should not be able to access kernel memory outside its shared
+  memory areas.
+
+
+Writing a user pass-through handler (with example code)
+=======================================================
+
+A user process handling a TCMU device must support the following:
+
+a) Discovering and configuring TCMU uio devices
+b) Waiting for events on the device(s)
+c) Managing the command ring: Parsing operations and commands,
+   performing work as needed, setting response fields (scsi_status and
+   possibly sense_buffer), updating cmd_tail, and notifying the kernel
+   that work has been finished
+
+First, consider instead writing a plugin for tcmu-runner. tcmu-runner
+implements all of this, and provides a higher-level API for plugin
+authors.
+
+TCMU is designed so that multiple unrelated processes can manage TCMU
+devices separately. All handlers should make sure to only open their
+devices, based upon a known subtype string.
+
+a) Discovering and configuring TCMU UIO devices::
+
+      /* error checking omitted for brevity */
+
+      int fd, dev_fd, ret;
+      char buf[256];
+      unsigned long long map_len;
+      void *map;
+
+      fd = open("/sys/class/uio/uio0/name", O_RDONLY);
+      ret = read(fd, buf, sizeof(buf));
+      close(fd);
+      buf[ret-1] = '\0'; /* null-terminate and chop off the \n */
+
+      /* we only want uio devices whose name is a format we expect */
+      if (strncmp(buf, "tcm-user", 8))
+	exit(-1);
+
+      /* Further checking for subtype also needed here */
+
+      fd = open("/sys/class/uio/uio0/maps/map0/size", O_RDONLY);
+      ret = read(fd, buf, sizeof(buf));
+      close(fd);
+      buf[ret-1] = '\0'; /* null-terminate and chop off the \n */
+
+      map_len = strtoull(buf, NULL, 0);
+
+      dev_fd = open("/dev/uio0", O_RDWR);
+      map = mmap(NULL, map_len, PROT_READ|PROT_WRITE, MAP_SHARED, dev_fd, 0);
+
+
+b) Waiting for events on the device(s)::
+
+      while (1) {
+        char buf[4];
+
+        int ret = read(dev_fd, buf, 4); /* will block */
+
+        handle_device_events(dev_fd, map);
+      }
+
+
+c) Managing the command ring::
+
+      #include <linux/target_core_user.h>
+
+      int handle_device_events(int fd, void *map)
+      {
+        struct tcmu_mailbox *mb = map;
+        struct tcmu_cmd_entry *ent = (void *) mb + mb->cmdr_off + mb->cmd_tail;
+        int did_some_work = 0;
+
+        /* Process events from cmd ring until we catch up with cmd_head */
+        while (ent != (void *)mb + mb->cmdr_off + mb->cmd_head) {
+
+          if (tcmu_hdr_get_op(ent->hdr.len_op) == TCMU_OP_CMD) {
+            uint8_t *cdb = (void *)mb + ent->req.cdb_off;
+            bool success = true;
+
+            /* Handle command here. */
+            printf("SCSI opcode: 0x%x\n", cdb[0]);
+
+            /* Set response fields */
+            if (success)
+              ent->rsp.scsi_status = SCSI_NO_SENSE;
+            else {
+              /* Also fill in ent->rsp.sense_buffer here */
+              ent->rsp.scsi_status = SCSI_CHECK_CONDITION;
+            }
+          }
+          else if (tcmu_hdr_get_op(ent->hdr.len_op) != TCMU_OP_PAD) {
+            /* Tell the kernel we didn't handle unknown opcodes */
+            ent->hdr.uflags |= TCMU_UFLAG_UNKNOWN_OP;
+          }
+          else {
+            /* Do nothing for PAD entries except update cmd_tail */
+          }
+
+          /* update cmd_tail */
+          mb->cmd_tail = (mb->cmd_tail + tcmu_hdr_get_len(ent->hdr.len_op)) % mb->cmdr_size;
+          ent = (void *) mb + mb->cmdr_off + mb->cmd_tail;
+          did_some_work = 1;
+        }
+
+        /* Notify the kernel that work has been finished */
+        if (did_some_work) {
+          uint32_t buf = 0;
+
+          write(fd, &buf, 4);
+        }
+
+        return 0;
+      }
+
+
+A final note
+============
+
+Please be careful to return codes as defined by the SCSI
+specifications. These are different than some values defined in the
+scsi/scsi.h include file. For example, CHECK CONDITION's status code
+is 2, not 1.
diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.txt
deleted file mode 100644
index 4cebc1ebf99a..000000000000
--- a/Documentation/target/tcmu-design.txt
+++ /dev/null
@@ -1,381 +0,0 @@
-Contents:
-
-1) TCM Userspace Design
-  a) Background
-  b) Benefits
-  c) Design constraints
-  d) Implementation overview
-     i. Mailbox
-     ii. Command ring
-     iii. Data Area
-  e) Device discovery
-  f) Device events
-  g) Other contingencies
-2) Writing a user pass-through handler
-  a) Discovering and configuring TCMU uio devices
-  b) Waiting for events on the device(s)
-  c) Managing the command ring
-3) A final note
-
-
-TCM Userspace Design
---------------------
-
-TCM is another name for LIO, an in-kernel iSCSI target (server).
-Existing TCM targets run in the kernel.  TCMU (TCM in Userspace)
-allows userspace programs to be written which act as iSCSI targets.
-This document describes the design.
-
-The existing kernel provides modules for different SCSI transport
-protocols.  TCM also modularizes the data storage.  There are existing
-modules for file, block device, RAM or using another SCSI device as
-storage.  These are called "backstores" or "storage engines".  These
-built-in modules are implemented entirely as kernel code.
-
-Background:
-
-In addition to modularizing the transport protocol used for carrying
-SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes
-the actual data storage as well. These are referred to as "backstores"
-or "storage engines". The target comes with backstores that allow a
-file, a block device, RAM, or another SCSI device to be used for the
-local storage needed for the exported SCSI LUN. Like the rest of LIO,
-these are implemented entirely as kernel code.
-
-These backstores cover the most common use cases, but not all. One new
-use case that other non-kernel target solutions, such as tgt, are able
-to support is using Gluster's GLFS or Ceph's RBD as a backstore. The
-target then serves as a translator, allowing initiators to store data
-in these non-traditional networked storage systems, while still only
-using standard protocols themselves.
-
-If the target is a userspace process, supporting these is easy. tgt,
-for example, needs only a small adapter module for each, because the
-modules just use the available userspace libraries for RBD and GLFS.
-
-Adding support for these backstores in LIO is considerably more
-difficult, because LIO is entirely kernel code. Instead of undertaking
-the significant work to port the GLFS or RBD APIs and protocols to the
-kernel, another approach is to create a userspace pass-through
-backstore for LIO, "TCMU".
-
-
-Benefits:
-
-In addition to allowing relatively easy support for RBD and GLFS, TCMU
-will also allow easier development of new backstores. TCMU combines
-with the LIO loopback fabric to become something similar to FUSE
-(Filesystem in Userspace), but at the SCSI layer instead of the
-filesystem layer. A SUSE, if you will.
-
-The disadvantage is there are more distinct components to configure, and
-potentially to malfunction. This is unavoidable, but hopefully not
-fatal if we're careful to keep things as simple as possible.
-
-Design constraints:
-
-- Good performance: high throughput, low latency
-- Cleanly handle if userspace:
-   1) never attaches
-   2) hangs
-   3) dies
-   4) misbehaves
-- Allow future flexibility in user & kernel implementations
-- Be reasonably memory-efficient
-- Simple to configure & run
-- Simple to write a userspace backend
-
-
-Implementation overview:
-
-The core of the TCMU interface is a memory region that is shared
-between kernel and userspace. Within this region is: a control area
-(mailbox); a lockless producer/consumer circular buffer for commands
-to be passed up, and status returned; and an in/out data buffer area.
-
-TCMU uses the pre-existing UIO subsystem. UIO allows device driver
-development in userspace, and this is conceptually very close to the
-TCMU use case, except instead of a physical device, TCMU implements a
-memory-mapped layout designed for SCSI commands. Using UIO also
-benefits TCMU by handling device introspection (e.g. a way for
-userspace to determine how large the shared region is) and signaling
-mechanisms in both directions.
-
-There are no embedded pointers in the memory region. Everything is
-expressed as an offset from the region's starting address. This allows
-the ring to still work if the user process dies and is restarted with
-the region mapped at a different virtual address.
-
-See target_core_user.h for the struct definitions.
-
-The Mailbox:
-
-The mailbox is always at the start of the shared memory region, and
-contains a version, details about the starting offset and size of the
-command ring, and head and tail pointers to be used by the kernel and
-userspace (respectively) to put commands on the ring, and indicate
-when the commands are completed.
-
-version - 1 (userspace should abort if otherwise)
-flags:
-- TCMU_MAILBOX_FLAG_CAP_OOOC: indicates out-of-order completion is
-  supported.  See "The Command Ring" for details.
-cmdr_off - The offset of the start of the command ring from the start
-of the memory region, to account for the mailbox size.
-cmdr_size - The size of the command ring. This does *not* need to be a
-power of two.
-cmd_head - Modified by the kernel to indicate when a command has been
-placed on the ring.
-cmd_tail - Modified by userspace to indicate when it has completed
-processing of a command.
-
-The Command Ring:
-
-Commands are placed on the ring by the kernel incrementing
-mailbox.cmd_head by the size of the command, modulo cmdr_size, and
-then signaling userspace via uio_event_notify(). Once the command is
-completed, userspace updates mailbox.cmd_tail in the same way and
-signals the kernel via a 4-byte write(). When cmd_head equals
-cmd_tail, the ring is empty -- no commands are currently waiting to be
-processed by userspace.
-
-TCMU commands are 8-byte aligned. They start with a common header
-containing "len_op", a 32-bit value that stores the length, as well as
-the opcode in the lowest unused bits. It also contains cmd_id and
-flags fields for setting by the kernel (kflags) and userspace
-(uflags).
-
-Currently only two opcodes are defined, TCMU_OP_CMD and TCMU_OP_PAD.
-
-When the opcode is CMD, the entry in the command ring is a struct
-tcmu_cmd_entry. Userspace finds the SCSI CDB (Command Data Block) via
-tcmu_cmd_entry.req.cdb_off. This is an offset from the start of the
-overall shared memory region, not the entry. The data in/out buffers
-are accessible via tht req.iov[] array. iov_cnt contains the number of
-entries in iov[] needed to describe either the Data-In or Data-Out
-buffers. For bidirectional commands, iov_cnt specifies how many iovec
-entries cover the Data-Out area, and iov_bidi_cnt specifies how many
-iovec entries immediately after that in iov[] cover the Data-In
-area. Just like other fields, iov.iov_base is an offset from the start
-of the region.
-
-When completing a command, userspace sets rsp.scsi_status, and
-rsp.sense_buffer if necessary. Userspace then increments
-mailbox.cmd_tail by entry.hdr.length (mod cmdr_size) and signals the
-kernel via the UIO method, a 4-byte write to the file descriptor.
-
-If TCMU_MAILBOX_FLAG_CAP_OOOC is set for mailbox->flags, kernel is
-capable of handling out-of-order completions. In this case, userspace can
-handle command in different order other than original. Since kernel would
-still process the commands in the same order it appeared in the command
-ring, userspace need to update the cmd->id when completing the
-command(a.k.a steal the original command's entry).
-
-When the opcode is PAD, userspace only updates cmd_tail as above --
-it's a no-op. (The kernel inserts PAD entries to ensure each CMD entry
-is contiguous within the command ring.)
-
-More opcodes may be added in the future. If userspace encounters an
-opcode it does not handle, it must set UNKNOWN_OP bit (bit 0) in
-hdr.uflags, update cmd_tail, and proceed with processing additional
-commands, if any.
-
-The Data Area:
-
-This is shared-memory space after the command ring. The organization
-of this area is not defined in the TCMU interface, and userspace
-should access only the parts referenced by pending iovs.
-
-
-Device Discovery:
-
-Other devices may be using UIO besides TCMU. Unrelated user processes
-may also be handling different sets of TCMU devices. TCMU userspace
-processes must find their devices by scanning sysfs
-class/uio/uio*/name. For TCMU devices, these names will be of the
-format:
-
-tcm-user/<hba_num>/<device_name>/<subtype>/<path>
-
-where "tcm-user" is common for all TCMU-backed UIO devices. <hba_num>
-and <device_name> allow userspace to find the device's path in the
-kernel target's configfs tree. Assuming the usual mount point, it is
-found at:
-
-/sys/kernel/config/target/core/user_<hba_num>/<device_name>
-
-This location contains attributes such as "hw_block_size", that
-userspace needs to know for correct operation.
-
-<subtype> will be a userspace-process-unique string to identify the
-TCMU device as expecting to be backed by a certain handler, and <path>
-will be an additional handler-specific string for the user process to
-configure the device, if needed. The name cannot contain ':', due to
-LIO limitations.
-
-For all devices so discovered, the user handler opens /dev/uioX and
-calls mmap():
-
-mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0)
-
-where size must be equal to the value read from
-/sys/class/uio/uioX/maps/map0/size.
-
-
-Device Events:
-
-If a new device is added or removed, a notification will be broadcast
-over netlink, using a generic netlink family name of "TCM-USER" and a
-multicast group named "config". This will include the UIO name as
-described in the previous section, as well as the UIO minor
-number. This should allow userspace to identify both the UIO device and
-the LIO device, so that after determining the device is supported
-(based on subtype) it can take the appropriate action.
-
-
-Other contingencies:
-
-Userspace handler process never attaches:
-
-- TCMU will post commands, and then abort them after a timeout period
-  (30 seconds.)
-
-Userspace handler process is killed:
-
-- It is still possible to restart and re-connect to TCMU
-  devices. Command ring is preserved. However, after the timeout period,
-  the kernel will abort pending tasks.
-
-Userspace handler process hangs:
-
-- The kernel will abort pending tasks after a timeout period.
-
-Userspace handler process is malicious:
-
-- The process can trivially break the handling of devices it controls,
-  but should not be able to access kernel memory outside its shared
-  memory areas.
-
-
-Writing a user pass-through handler (with example code)
--------------------------------------------------------
-
-A user process handing a TCMU device must support the following:
-
-a) Discovering and configuring TCMU uio devices
-b) Waiting for events on the device(s)
-c) Managing the command ring: Parsing operations and commands,
-   performing work as needed, setting response fields (scsi_status and
-   possibly sense_buffer), updating cmd_tail, and notifying the kernel
-   that work has been finished
-
-First, consider instead writing a plugin for tcmu-runner. tcmu-runner
-implements all of this, and provides a higher-level API for plugin
-authors.
-
-TCMU is designed so that multiple unrelated processes can manage TCMU
-devices separately. All handlers should make sure to only open their
-devices, based opon a known subtype string.
-
-a) Discovering and configuring TCMU UIO devices:
-
-(error checking omitted for brevity)
-
-int fd, dev_fd;
-char buf[256];
-unsigned long long map_len;
-void *map;
-
-fd = open("/sys/class/uio/uio0/name", O_RDONLY);
-ret = read(fd, buf, sizeof(buf));
-close(fd);
-buf[ret-1] = '\0'; /* null-terminate and chop off the \n */
-
-/* we only want uio devices whose name is a format we expect */
-if (strncmp(buf, "tcm-user", 8))
-	exit(-1);
-
-/* Further checking for subtype also needed here */
-
-fd = open(/sys/class/uio/%s/maps/map0/size, O_RDONLY);
-ret = read(fd, buf, sizeof(buf));
-close(fd);
-str_buf[ret-1] = '\0'; /* null-terminate and chop off the \n */
-
-map_len = strtoull(buf, NULL, 0);
-
-dev_fd = open("/dev/uio0", O_RDWR);
-map = mmap(NULL, map_len, PROT_READ|PROT_WRITE, MAP_SHARED, dev_fd, 0);
-
-
-b) Waiting for events on the device(s)
-
-while (1) {
-  char buf[4];
-
-  int ret = read(dev_fd, buf, 4); /* will block */
-
-  handle_device_events(dev_fd, map);
-}
-
-
-c) Managing the command ring
-
-#include <linux/target_core_user.h>
-
-int handle_device_events(int fd, void *map)
-{
-  struct tcmu_mailbox *mb = map;
-  struct tcmu_cmd_entry *ent = (void *) mb + mb->cmdr_off + mb->cmd_tail;
-  int did_some_work = 0;
-
-  /* Process events from cmd ring until we catch up with cmd_head */
-  while (ent != (void *)mb + mb->cmdr_off + mb->cmd_head) {
-
-    if (tcmu_hdr_get_op(ent->hdr.len_op) == TCMU_OP_CMD) {
-      uint8_t *cdb = (void *)mb + ent->req.cdb_off;
-      bool success = true;
-
-      /* Handle command here. */
-      printf("SCSI opcode: 0x%x\n", cdb[0]);
-
-      /* Set response fields */
-      if (success)
-        ent->rsp.scsi_status = SCSI_NO_SENSE;
-      else {
-        /* Also fill in rsp->sense_buffer here */
-        ent->rsp.scsi_status = SCSI_CHECK_CONDITION;
-      }
-    }
-    else if (tcmu_hdr_get_op(ent->hdr.len_op) != TCMU_OP_PAD) {
-      /* Tell the kernel we didn't handle unknown opcodes */
-      ent->hdr.uflags |= TCMU_UFLAG_UNKNOWN_OP;
-    }
-    else {
-      /* Do nothing for PAD entries except update cmd_tail */
-    }
-
-    /* update cmd_tail */
-    mb->cmd_tail = (mb->cmd_tail + tcmu_hdr_get_len(&ent->hdr)) % mb->cmdr_size;
-    ent = (void *) mb + mb->cmdr_off + mb->cmd_tail;
-    did_some_work = 1;
-  }
-
-  /* Notify the kernel that work has been finished */
-  if (did_some_work) {
-    uint32_t buf = 0;
-
-    write(fd, &buf, 4);
-  }
-
-  return 0;
-}
-
-
-A final note
-------------
-
-Please be careful to return codes as defined by the SCSI
-specifications. These are different than some values defined in the
-scsi/scsi.h include file. For example, CHECK CONDITION's status code
-is 2, not 1.
diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check
index 440227bb55a9..a4139a576726 100755
--- a/scripts/documentation-file-ref-check
+++ b/scripts/documentation-file-ref-check
@@ -124,7 +124,7 @@ while (<IN>) {
 		# Remove sched-pelt false-positive
 		next if ($fulref =~ m,^Documentation/scheduler/sched-pelt$,);
 
-		# Discard some build examples from Documentation/target/tcm_mod_builder.txt
+		# Discard some build examples from Documentation/target/tcm_mod_builder.rst
 		next if ($fulref =~ m,mnt/sdb/lio-core-2.6.git/Documentation/target,);
 
 		# Check if exists, evaluating wildcards
-- 
cgit v1.2.3-59-g8ed1b


From 458f69ef36656dc74679667380422dd8063eabfb Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:53:00 -0300
Subject: docs: timers: convert docs to ReST and rename to *.rst

The conversion here is really trivial: just a bunch of title
markups and very few punctual changes are enough for it to
be parsed by Sphinx and generate nice HTML.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix table markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/timers/NO_HZ.txt        | 318 ---------------------------------
 Documentation/timers/highres.rst      | 250 ++++++++++++++++++++++++++
 Documentation/timers/highres.txt      | 249 --------------------------
 Documentation/timers/hpet.rst         |  30 ++++
 Documentation/timers/hpet.txt         |  28 ---
 Documentation/timers/hrtimers.rst     | 178 +++++++++++++++++++
 Documentation/timers/hrtimers.txt     | 178 -------------------
 Documentation/timers/index.rst        |  22 +++
 Documentation/timers/no_hz.rst        | 326 ++++++++++++++++++++++++++++++++++
 Documentation/timers/timekeeping.rst  | 180 +++++++++++++++++++
 Documentation/timers/timekeeping.txt  | 179 -------------------
 Documentation/timers/timers-howto.rst | 112 ++++++++++++
 Documentation/timers/timers-howto.txt | 105 -----------
 MAINTAINERS                           |   2 +-
 drivers/media/usb/dvb-usb-v2/anysee.c |   2 +-
 drivers/regulator/core.c              |   2 +-
 include/linux/iopoll.h                |   4 +-
 include/linux/regmap.h                |   4 +-
 scripts/checkpatch.pl                 |   8 +-
 sound/soc/sof/ops.h                   |   2 +-
 20 files changed, 1110 insertions(+), 1069 deletions(-)
 delete mode 100644 Documentation/timers/NO_HZ.txt
 create mode 100644 Documentation/timers/highres.rst
 delete mode 100644 Documentation/timers/highres.txt
 create mode 100644 Documentation/timers/hpet.rst
 delete mode 100644 Documentation/timers/hpet.txt
 create mode 100644 Documentation/timers/hrtimers.rst
 delete mode 100644 Documentation/timers/hrtimers.txt
 create mode 100644 Documentation/timers/index.rst
 create mode 100644 Documentation/timers/no_hz.rst
 create mode 100644 Documentation/timers/timekeeping.rst
 delete mode 100644 Documentation/timers/timekeeping.txt
 create mode 100644 Documentation/timers/timers-howto.rst
 delete mode 100644 Documentation/timers/timers-howto.txt

diff --git a/Documentation/timers/NO_HZ.txt b/Documentation/timers/NO_HZ.txt
deleted file mode 100644
index 9591092da5e0..000000000000
--- a/Documentation/timers/NO_HZ.txt
+++ /dev/null
@@ -1,318 +0,0 @@
-		NO_HZ: Reducing Scheduling-Clock Ticks
-
-
-This document describes Kconfig options and boot parameters that can
-reduce the number of scheduling-clock interrupts, thereby improving energy
-efficiency and reducing OS jitter.  Reducing OS jitter is important for
-some types of computationally intensive high-performance computing (HPC)
-applications and for real-time applications.
-
-There are three main ways of managing scheduling-clock interrupts
-(also known as "scheduling-clock ticks" or simply "ticks"):
-
-1.	Never omit scheduling-clock ticks (CONFIG_HZ_PERIODIC=y or
-	CONFIG_NO_HZ=n for older kernels).  You normally will -not-
-	want to choose this option.
-
-2.	Omit scheduling-clock ticks on idle CPUs (CONFIG_NO_HZ_IDLE=y or
-	CONFIG_NO_HZ=y for older kernels).  This is the most common
-	approach, and should be the default.
-
-3.	Omit scheduling-clock ticks on CPUs that are either idle or that
-	have only one runnable task (CONFIG_NO_HZ_FULL=y).  Unless you
-	are running realtime applications or certain types of HPC
-	workloads, you will normally -not- want this option.
-
-These three cases are described in the following three sections, followed
-by a third section on RCU-specific considerations, a fourth section
-discussing testing, and a fifth and final section listing known issues.
-
-
-NEVER OMIT SCHEDULING-CLOCK TICKS
-
-Very old versions of Linux from the 1990s and the very early 2000s
-are incapable of omitting scheduling-clock ticks.  It turns out that
-there are some situations where this old-school approach is still the
-right approach, for example, in heavy workloads with lots of tasks
-that use short bursts of CPU, where there are very frequent idle
-periods, but where these idle periods are also quite short (tens or
-hundreds of microseconds).  For these types of workloads, scheduling
-clock interrupts will normally be delivered any way because there
-will frequently be multiple runnable tasks per CPU.  In these cases,
-attempting to turn off the scheduling clock interrupt will have no effect
-other than increasing the overhead of switching to and from idle and
-transitioning between user and kernel execution.
-
-This mode of operation can be selected using CONFIG_HZ_PERIODIC=y (or
-CONFIG_NO_HZ=n for older kernels).
-
-However, if you are instead running a light workload with long idle
-periods, failing to omit scheduling-clock interrupts will result in
-excessive power consumption.  This is especially bad on battery-powered
-devices, where it results in extremely short battery lifetimes.  If you
-are running light workloads, you should therefore read the following
-section.
-
-In addition, if you are running either a real-time workload or an HPC
-workload with short iterations, the scheduling-clock interrupts can
-degrade your applications performance.  If this describes your workload,
-you should read the following two sections.
-
-
-OMIT SCHEDULING-CLOCK TICKS FOR IDLE CPUs
-
-If a CPU is idle, there is little point in sending it a scheduling-clock
-interrupt.  After all, the primary purpose of a scheduling-clock interrupt
-is to force a busy CPU to shift its attention among multiple duties,
-and an idle CPU has no duties to shift its attention among.
-
-The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
-scheduling-clock interrupts to idle CPUs, which is critically important
-both to battery-powered devices and to highly virtualized mainframes.
-A battery-powered device running a CONFIG_HZ_PERIODIC=y kernel would
-drain its battery very quickly, easily 2-3 times as fast as would the
-same device running a CONFIG_NO_HZ_IDLE=y kernel.  A mainframe running
-1,500 OS instances might find that half of its CPU time was consumed by
-unnecessary scheduling-clock interrupts.  In these situations, there
-is strong motivation to avoid sending scheduling-clock interrupts to
-idle CPUs.  That said, dyntick-idle mode is not free:
-
-1.	It increases the number of instructions executed on the path
-	to and from the idle loop.
-
-2.	On many architectures, dyntick-idle mode also increases the
-	number of expensive clock-reprogramming operations.
-
-Therefore, systems with aggressive real-time response constraints often
-run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
-in order to avoid degrading from-idle transition latencies.
-
-An idle CPU that is not receiving scheduling-clock interrupts is said to
-be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
-tickless".  The remainder of this document will use "dyntick-idle mode".
-
-There is also a boot parameter "nohz=" that can be used to disable
-dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
-By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
-dyntick-idle mode.
-
-
-OMIT SCHEDULING-CLOCK TICKS FOR CPUs WITH ONLY ONE RUNNABLE TASK
-
-If a CPU has only one runnable task, there is little point in sending it
-a scheduling-clock interrupt because there is no other task to switch to.
-Note that omitting scheduling-clock ticks for CPUs with only one runnable
-task implies also omitting them for idle CPUs.
-
-The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid
-sending scheduling-clock interrupts to CPUs with a single runnable task,
-and such CPUs are said to be "adaptive-ticks CPUs".  This is important
-for applications with aggressive real-time response constraints because
-it allows them to improve their worst-case response times by the maximum
-duration of a scheduling-clock interrupt.  It is also important for
-computationally intensive short-iteration workloads:  If any CPU is
-delayed during a given iteration, all the other CPUs will be forced to
-wait idle while the delayed CPU finishes.  Thus, the delay is multiplied
-by one less than the number of CPUs.  In these situations, there is
-again strong motivation to avoid sending scheduling-clock interrupts.
-
-By default, no CPU will be an adaptive-ticks CPU.  The "nohz_full="
-boot parameter specifies the adaptive-ticks CPUs.  For example,
-"nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks
-CPUs.  Note that you are prohibited from marking all of the CPUs as
-adaptive-tick CPUs:  At least one non-adaptive-tick CPU must remain
-online to handle timekeeping tasks in order to ensure that system
-calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
-(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
-user processes to observe slight drifts in clock rate.)  Therefore, the
-boot CPU is prohibited from entering adaptive-ticks mode.  Specifying a
-"nohz_full=" mask that includes the boot CPU will result in a boot-time
-error message, and the boot CPU will be removed from the mask.  Note that
-this means that your system must have at least two CPUs in order for
-CONFIG_NO_HZ_FULL=y to do anything for you.
-
-Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
-This is covered in the "RCU IMPLICATIONS" section below.
-
-Normally, a CPU remains in adaptive-ticks mode as long as possible.
-In particular, transitioning to kernel mode does not automatically change
-the mode.  Instead, the CPU will exit adaptive-ticks mode only if needed,
-for example, if that CPU enqueues an RCU callback.
-
-Just as with dyntick-idle mode, the benefits of adaptive-tick mode do
-not come for free:
-
-1.	CONFIG_NO_HZ_FULL selects CONFIG_NO_HZ_COMMON, so you cannot run
-	adaptive ticks without also running dyntick idle.  This dependency
-	extends down into the implementation, so that all of the costs
-	of CONFIG_NO_HZ_IDLE are also incurred by CONFIG_NO_HZ_FULL.
-
-2.	The user/kernel transitions are slightly more expensive due
-	to the need to inform kernel subsystems (such as RCU) about
-	the change in mode.
-
-3.	POSIX CPU timers prevent CPUs from entering adaptive-tick mode.
-	Real-time applications needing to take actions based on CPU time
-	consumption need to use other means of doing so.
-
-4.	If there are more perf events pending than the hardware can
-	accommodate, they are normally round-robined so as to collect
-	all of them over time.  Adaptive-tick mode may prevent this
-	round-robining from happening.  This will likely be fixed by
-	preventing CPUs with large numbers of perf events pending from
-	entering adaptive-tick mode.
-
-5.	Scheduler statistics for adaptive-tick CPUs may be computed
-	slightly differently than those for non-adaptive-tick CPUs.
-	This might in turn perturb load-balancing of real-time tasks.
-
-6.	The LB_BIAS scheduler feature is disabled by adaptive ticks.
-
-Although improvements are expected over time, adaptive ticks is quite
-useful for many types of real-time and compute-intensive applications.
-However, the drawbacks listed above mean that adaptive ticks should not
-(yet) be enabled by default.
-
-
-RCU IMPLICATIONS
-
-There are situations in which idle CPUs cannot be permitted to
-enter either dyntick-idle mode or adaptive-tick mode, the most
-common being when that CPU has RCU callbacks pending.
-
-The CONFIG_RCU_FAST_NO_HZ=y Kconfig option may be used to cause such CPUs
-to enter dyntick-idle mode or adaptive-tick mode anyway.  In this case,
-a timer will awaken these CPUs every four jiffies in order to ensure
-that the RCU callbacks are processed in a timely fashion.
-
-Another approach is to offload RCU callback processing to "rcuo" kthreads
-using the CONFIG_RCU_NOCB_CPU=y Kconfig option.  The specific CPUs to
-offload may be selected using The "rcu_nocbs=" kernel boot parameter,
-which takes a comma-separated list of CPUs and CPU ranges, for example,
-"1,3-5" selects CPUs 1, 3, 4, and 5.
-
-The offloaded CPUs will never queue RCU callbacks, and therefore RCU
-never prevents offloaded CPUs from entering either dyntick-idle mode
-or adaptive-tick mode.  That said, note that it is up to userspace to
-pin the "rcuo" kthreads to specific CPUs if desired.  Otherwise, the
-scheduler will decide where to run them, which might or might not be
-where you want them to run.
-
-
-TESTING
-
-So you enable all the OS-jitter features described in this document,
-but do not see any change in your workload's behavior.  Is this because
-your workload isn't affected that much by OS jitter, or is it because
-something else is in the way?  This section helps answer this question
-by providing a simple OS-jitter test suite, which is available on branch
-master of the following git archive:
-
-git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git
-
-Clone this archive and follow the instructions in the README file.
-This test procedure will produce a trace that will allow you to evaluate
-whether or not you have succeeded in removing OS jitter from your system.
-If this trace shows that you have removed OS jitter as much as is
-possible, then you can conclude that your workload is not all that
-sensitive to OS jitter.
-
-Note: this test requires that your system have at least two CPUs.
-We do not currently have a good way to remove OS jitter from single-CPU
-systems.
-
-
-KNOWN ISSUES
-
-o	Dyntick-idle slows transitions to and from idle slightly.
-	In practice, this has not been a problem except for the most
-	aggressive real-time workloads, which have the option of disabling
-	dyntick-idle mode, an option that most of them take.  However,
-	some workloads will no doubt want to use adaptive ticks to
-	eliminate scheduling-clock interrupt latencies.  Here are some
-	options for these workloads:
-
-	a.	Use PMQOS from userspace to inform the kernel of your
-		latency requirements (preferred).
-
-	b.	On x86 systems, use the "idle=mwait" boot parameter.
-
-	c.	On x86 systems, use the "intel_idle.max_cstate=" to limit
-	`	the maximum C-state depth.
-
-	d.	On x86 systems, use the "idle=poll" boot parameter.
-		However, please note that use of this parameter can cause
-		your CPU to overheat, which may cause thermal throttling
-		to degrade your latencies -- and that this degradation can
-		be even worse than that of dyntick-idle.  Furthermore,
-		this parameter effectively disables Turbo Mode on Intel
-		CPUs, which can significantly reduce maximum performance.
-
-o	Adaptive-ticks slows user/kernel transitions slightly.
-	This is not expected to be a problem for computationally intensive
-	workloads, which have few such transitions.  Careful benchmarking
-	will be required to determine whether or not other workloads
-	are significantly affected by this effect.
-
-o	Adaptive-ticks does not do anything unless there is only one
-	runnable task for a given CPU, even though there are a number
-	of other situations where the scheduling-clock tick is not
-	needed.  To give but one example, consider a CPU that has one
-	runnable high-priority SCHED_FIFO task and an arbitrary number
-	of low-priority SCHED_OTHER tasks.  In this case, the CPU is
-	required to run the SCHED_FIFO task until it either blocks or
-	some other higher-priority task awakens on (or is assigned to)
-	this CPU, so there is no point in sending a scheduling-clock
-	interrupt to this CPU.	However, the current implementation
-	nevertheless sends scheduling-clock interrupts to CPUs having a
-	single runnable SCHED_FIFO task and multiple runnable SCHED_OTHER
-	tasks, even though these interrupts are unnecessary.
-
-	And even when there are multiple runnable tasks on a given CPU,
-	there is little point in interrupting that CPU until the current
-	running task's timeslice expires, which is almost always way
-	longer than the time of the next scheduling-clock interrupt.
-
-	Better handling of these sorts of situations is future work.
-
-o	A reboot is required to reconfigure both adaptive idle and RCU
-	callback offloading.  Runtime reconfiguration could be provided
-	if needed, however, due to the complexity of reconfiguring RCU at
-	runtime, there would need to be an earthshakingly good reason.
-	Especially given that you have the straightforward option of
-	simply offloading RCU callbacks from all CPUs and pinning them
-	where you want them whenever you want them pinned.
-
-o	Additional configuration is required to deal with other sources
-	of OS jitter, including interrupts and system-utility tasks
-	and processes.  This configuration normally involves binding
-	interrupts and tasks to particular CPUs.
-
-o	Some sources of OS jitter can currently be eliminated only by
-	constraining the workload.  For example, the only way to eliminate
-	OS jitter due to global TLB shootdowns is to avoid the unmapping
-	operations (such as kernel module unload operations) that
-	result in these shootdowns.  For another example, page faults
-	and TLB misses can be reduced (and in some cases eliminated) by
-	using huge pages and by constraining the amount of memory used
-	by the application.  Pre-faulting the working set can also be
-	helpful, especially when combined with the mlock() and mlockall()
-	system calls.
-
-o	Unless all CPUs are idle, at least one CPU must keep the
-	scheduling-clock interrupt going in order to support accurate
-	timekeeping.
-
-o	If there might potentially be some adaptive-ticks CPUs, there
-	will be at least one CPU keeping the scheduling-clock interrupt
-	going, even if all CPUs are otherwise idle.
-
-	Better handling of this situation is ongoing work.
-
-o	Some process-handling operations still require the occasional
-	scheduling-clock tick.	These operations include calculating CPU
-	load, maintaining sched average, computing CFS entity vruntime,
-	computing avenrun, and carrying out load balancing.  They are
-	currently accommodated by scheduling-clock tick every second
-	or so.	On-going work will eliminate the need even for these
-	infrequent scheduling-clock ticks.
diff --git a/Documentation/timers/highres.rst b/Documentation/timers/highres.rst
new file mode 100644
index 000000000000..bde5eb7e5c9e
--- /dev/null
+++ b/Documentation/timers/highres.rst
@@ -0,0 +1,250 @@
+=====================================================
+High resolution timers and dynamic ticks design notes
+=====================================================
+
+Further information can be found in the paper of the OLS 2006 talk "hrtimers
+and beyond". The paper is part of the OLS 2006 Proceedings Volume 1, which can
+be found on the OLS website:
+https://www.kernel.org/doc/ols/2006/ols2006v1-pages-333-346.pdf
+
+The slides to this talk are available from:
+http://www.cs.columbia.edu/~nahum/w6998/papers/ols2006-hrtimers-slides.pdf
+
+The slides contain five figures (pages 2, 15, 18, 20, 22), which illustrate the
+changes in the time(r) related Linux subsystems. Figure #1 (p. 2) shows the
+design of the Linux time(r) system before hrtimers and other building blocks
+got merged into mainline.
+
+Note: the paper and the slides talk about "clock event source", while we have
+switched to the name "clock event devices" in the meantime.
+
+The design contains the following basic building blocks:
+
+- hrtimer base infrastructure
+- timeofday and clock source management
+- clock event management
+- high resolution timer functionality
+- dynamic ticks
+
+
+hrtimer base infrastructure
+---------------------------
+
+The hrtimer base infrastructure was merged into the 2.6.16 kernel. Details of
+the base implementation are covered in Documentation/timers/hrtimers.rst. See
+also figure #2 (OLS slides p. 15).
+
+The main differences from the timer wheel, which holds the armed timer_list
+type timers, are:
+
+       - time ordered enqueueing into a rb-tree
+       - independent of ticks (the processing is based on nanoseconds)
+
+
+timeofday and clock source management
+-------------------------------------
+
+John Stultz's Generic Time Of Day (GTOD) framework moves a large portion of
+code out of the architecture-specific areas into a generic management
+framework, as illustrated in figure #3 (OLS slides p. 18). The architecture
+specific portion is reduced to the low level hardware details of the clock
+sources, which are registered in the framework and selected on a quality based
+decision. The low level code provides hardware setup and readout routines and
+initializes data structures, which are used by the generic time keeping code to
+convert the clock ticks to nanosecond based time values. All other time keeping
+related functionality is moved into the generic code. The GTOD base patch got
+merged into the 2.6.18 kernel.
+
+Further information about the Generic Time Of Day framework is available in the
+OLS 2005 Proceedings Volume 1:
+
+	http://www.linuxsymposium.org/2005/linuxsymposium_procv1.pdf
+
+The paper "We Are Not Getting Any Younger: A New Approach to Time and
+Timers" was written by J. Stultz, D.V. Hart, & N. Aravamudan.
+
+Figure #3 (OLS slides p.18) illustrates the transformation.
+
+
+clock event management
+----------------------
+
+While clock sources provide read access to the monotonically increasing time
+value, clock event devices are used to schedule the next event
+interrupt(s). The next event is currently defined to be periodic, with its
+period defined at compile time. The setup and selection of the event device
+for various event driven functionalities is hardwired into the architecture
+dependent code. This results in duplicated code across all architectures and
+makes it extremely difficult to change the configuration of the system to use
+event interrupt devices other than those already built into the
+architecture. Another implication of the current design is that it is necessary
+to touch all the architecture-specific implementations in order to provide new
+functionality like high resolution timers or dynamic ticks.
+
+The clock events subsystem tries to address this problem by providing a generic
+solution to manage clock event devices and their usage for the various clock
+event driven kernel functionalities. The goal of the clock event subsystem is
+to minimize the clock event related architecture dependent code to the pure
+hardware related handling and to allow easy addition and utilization of new
+clock event devices. It also minimizes the duplicated code across the
+architectures as it provides generic functionality down to the interrupt
+service handler, which is almost inherently hardware dependent.
+
+Clock event devices are registered either by the architecture dependent boot
+code or at module insertion time. Each clock event device fills a data
+structure with clock-specific property parameters and callback functions. The
+clock event management decides, by using the specified property parameters, the
+set of system functions a clock event device will be used to support. This
+includes the distinction of per-CPU and per-system global event devices.
+
+System-level global event devices are used for the Linux periodic tick. Per-CPU
+event devices are used to provide local CPU functionality such as process
+accounting, profiling, and high resolution timers.
+
+The management layer assigns one or more of the following functions to a clock
+event device:
+
+      - system global periodic tick (jiffies update)
+      - cpu local update_process_times
+      - cpu local profiling
+      - cpu local next event interrupt (non periodic mode)
+
+The clock event device delegates the selection of those timer interrupt related
+functions completely to the management layer. The clock management layer stores
+a function pointer in the device description structure, which has to be called
+from the hardware level handler. This removes a lot of duplicated code from the
+architecture specific timer interrupt handlers and hands the control over the
+clock event devices and the assignment of timer interrupt related functionality
+to the core code.
+
+The clock event layer API is rather small. Aside from the clock event device
+registration interface it provides functions to schedule the next event
+interrupt, clock event device notification service and support for suspend and
+resume.
+
+The framework adds about 700 lines of code which results in a 2KB increase of
+the kernel binary size. The conversion of i386 removes about 100 lines of
+code. The binary size decrease is in the range of 400 bytes. We believe that the
+increase of flexibility and the avoidance of duplicated code across
+architectures justifies the slight increase of the binary size.
+
+The conversion of an architecture has no functional impact, but allows the
+high resolution and dynamic tick functionalities to be used without any change
+to the clock event device and timer interrupt code. After the conversion the
+enabling of high resolution timers and dynamic ticks is simply provided by
+adding the kernel/time/Kconfig file to the architecture specific Kconfig and
+adding the dynamic tick specific calls to the idle routine (a total of 3 lines
+added to the idle function and the Kconfig file).
+
+Figure #4 (OLS slides p.20) illustrates the transformation.
+
+
+high resolution timer functionality
+-----------------------------------
+
+During system boot it is not possible to use the high resolution timer
+functionality, while making it possible would be difficult and would serve no
+useful function. The initialization of the clock event device framework, the
+clock source framework (GTOD) and hrtimers itself has to be done and
+appropriate clock sources and clock event devices have to be registered before
+the high resolution functionality can work. Up to the point where hrtimers are
+initialized, the system works in the usual low resolution periodic mode. The
+clock source and the clock event device layers provide notification functions
+which inform hrtimers about availability of new hardware. hrtimers validates
+the usability of the registered clock sources and clock event devices before
+switching to high resolution mode. This also ensures that a kernel which is
+configured for high resolution timers can run on a system which lacks the
+necessary hardware support.
+
+The high resolution timer code does not support SMP machines which have only
+global clock event devices. The support of such hardware would involve IPI
+calls when an interrupt happens. The overhead would be much larger than the
+benefit. This is the reason why we currently disable high resolution and
+dynamic ticks on i386 SMP systems which stop the local APIC in C3 power
+state. A workaround is available as an idea, but the problem has not been
+tackled yet.
+
+The time ordered insertion of timers provides all the infrastructure to decide
+whether the event device has to be reprogrammed when a timer is added. The
+decision is made per timer base and synchronized across per-cpu timer bases in
+a support function. The design allows the system to utilize separate per-CPU
+clock event devices for the per-CPU timer bases, but currently only one
+reprogrammable clock event device per-CPU is utilized.
+
+When the timer interrupt happens, the next event interrupt handler is called
+from the clock event distribution code and moves expired timers from the
+red-black tree to a separate doubly linked list and invokes the softirq
+handler. An additional mode field in the hrtimer structure allows the system to
+execute callback functions directly from the next event interrupt handler. This
+is restricted to code which can safely be executed in the hard interrupt
+context. This applies, for example, to the common case of a wakeup function as
+used by nanosleep. The advantage of executing the handler in the interrupt
+context is the avoidance of up to two context switches - from the interrupted
+context to the softirq and to the task which is woken up by the expired
+timer.
+
+Once a system has switched to high resolution mode, the periodic tick is
+switched off. This disables the per system global periodic clock event device -
+e.g. the PIT on i386 SMP systems.
+
+The periodic tick functionality is provided by a per-cpu hrtimer. The callback
+function is executed in the next event interrupt context and updates jiffies
+and calls update_process_times and profiling. The implementation of the hrtimer
+based periodic tick is designed to be extended with dynamic tick functionality.
+This allows a single clock event device to schedule high resolution
+timer and periodic events (jiffies tick, profiling, process accounting) on UP
+systems. This has been proven to work with the PIT on i386 and the Incrementer
+on PPC.
+
+The softirq for running the hrtimer queues and executing the callbacks has been
+separated from the tick bound timer softirq to allow accurate delivery of high
+resolution timer signals which are used by itimer and POSIX interval
+timers. The execution of this softirq can still be delayed by other softirqs,
+but the overall latencies have been significantly improved by this separation.
+
+Figure #5 (OLS slides p.22) illustrates the transformation.
+
+
+dynamic ticks
+-------------
+
+Dynamic ticks are the logical consequence of the hrtimer based periodic tick
+replacement (sched_tick). The functionality of the sched_tick hrtimer is
+extended by three functions:
+
+- hrtimer_stop_sched_tick
+- hrtimer_restart_sched_tick
+- hrtimer_update_jiffies
+
+hrtimer_stop_sched_tick() is called when a CPU goes into idle state. The code
+evaluates the next scheduled timer event (from both hrtimers and the timer
+wheel) and, if the next event is further away than the next tick, it
+reprograms the sched_tick to this future event, to allow longer idle sleeps
+without needless interruption by the periodic tick. The function is also
+called when an interrupt that does not cause a reschedule happens during the
+idle period. The call is necessary as the interrupt handler might have armed a
+new timer whose expiry time is before the time which was identified as the
+nearest event in the previous call to hrtimer_stop_sched_tick().
+
+hrtimer_restart_sched_tick() is called when the CPU leaves the idle state before
+it calls schedule(). hrtimer_restart_sched_tick() resumes the periodic tick,
+which is kept active until the next call to hrtimer_stop_sched_tick().
+
+hrtimer_update_jiffies() is called from irq_enter() when an interrupt happens
+in the idle period to make sure that jiffies are up to date and the interrupt
+handler does not have to deal with a possibly stale jiffies value.
+
+The dynamic tick feature provides statistical values which are exported to
+userspace via /proc/stat and can be made available for enhanced power
+management control.
+
+The implementation leaves room for further development like full tickless
+systems, where the time slice is controlled by the scheduler, variable
+frequency profiling, and a complete removal of jiffies in the future.
+
+
+Aside from the current initial submission of i386 support, the patchset has
+been extended to x86_64 and ARM already. Initial (work in progress) support is
+also available for MIPS and PowerPC.
+
+	  Thomas, Ingo
diff --git a/Documentation/timers/highres.txt b/Documentation/timers/highres.txt
deleted file mode 100644
index 8f9741592123..000000000000
--- a/Documentation/timers/highres.txt
+++ /dev/null
@@ -1,249 +0,0 @@
-High resolution timers and dynamic ticks design notes
------------------------------------------------------
-
-Further information can be found in the paper of the OLS 2006 talk "hrtimers
-and beyond". The paper is part of the OLS 2006 Proceedings Volume 1, which can
-be found on the OLS website:
-https://www.kernel.org/doc/ols/2006/ols2006v1-pages-333-346.pdf
-
-The slides to this talk are available from:
-http://www.cs.columbia.edu/~nahum/w6998/papers/ols2006-hrtimers-slides.pdf
-
-The slides contain five figures (pages 2, 15, 18, 20, 22), which illustrate the
-changes in the time(r) related Linux subsystems. Figure #1 (p. 2) shows the
-design of the Linux time(r) system before hrtimers and other building blocks
-got merged into mainline.
-
-Note: the paper and the slides are talking about "clock event source", while we
-switched to the name "clock event devices" in meantime.
-
-The design contains the following basic building blocks:
-
-- hrtimer base infrastructure
-- timeofday and clock source management
-- clock event management
-- high resolution timer functionality
-- dynamic ticks
-
-
-hrtimer base infrastructure
----------------------------
-
-The hrtimer base infrastructure was merged into the 2.6.16 kernel. Details of
-the base implementation are covered in Documentation/timers/hrtimers.txt. See
-also figure #2 (OLS slides p. 15)
-
-The main differences to the timer wheel, which holds the armed timer_list type
-timers are:
-       - time ordered enqueueing into a rb-tree
-       - independent of ticks (the processing is based on nanoseconds)
-
-
-timeofday and clock source management
--------------------------------------
-
-John Stultz's Generic Time Of Day (GTOD) framework moves a large portion of
-code out of the architecture-specific areas into a generic management
-framework, as illustrated in figure #3 (OLS slides p. 18). The architecture
-specific portion is reduced to the low level hardware details of the clock
-sources, which are registered in the framework and selected on a quality based
-decision. The low level code provides hardware setup and readout routines and
-initializes data structures, which are used by the generic time keeping code to
-convert the clock ticks to nanosecond based time values. All other time keeping
-related functionality is moved into the generic code. The GTOD base patch got
-merged into the 2.6.18 kernel.
-
-Further information about the Generic Time Of Day framework is available in the
-OLS 2005 Proceedings Volume 1:
-http://www.linuxsymposium.org/2005/linuxsymposium_procv1.pdf
-
-The paper "We Are Not Getting Any Younger: A New Approach to Time and
-Timers" was written by J. Stultz, D.V. Hart, & N. Aravamudan.
-
-Figure #3 (OLS slides p.18) illustrates the transformation.
-
-
-clock event management
-----------------------
-
-While clock sources provide read access to the monotonically increasing time
-value, clock event devices are used to schedule the next event
-interrupt(s). The next event is currently defined to be periodic, with its
-period defined at compile time. The setup and selection of the event device
-for various event driven functionalities is hardwired into the architecture
-dependent code. This results in duplicated code across all architectures and
-makes it extremely difficult to change the configuration of the system to use
-event interrupt devices other than those already built into the
-architecture. Another implication of the current design is that it is necessary
-to touch all the architecture-specific implementations in order to provide new
-functionality like high resolution timers or dynamic ticks.
-
-The clock events subsystem tries to address this problem by providing a generic
-solution to manage clock event devices and their usage for the various clock
-event driven kernel functionalities. The goal of the clock event subsystem is
-to minimize the clock event related architecture dependent code to the pure
-hardware related handling and to allow easy addition and utilization of new
-clock event devices. It also minimizes the duplicated code across the
-architectures as it provides generic functionality down to the interrupt
-service handler, which is almost inherently hardware dependent.
-
-Clock event devices are registered either by the architecture dependent boot
-code or at module insertion time. Each clock event device fills a data
-structure with clock-specific property parameters and callback functions. The
-clock event management decides, by using the specified property parameters, the
-set of system functions a clock event device will be used to support. This
-includes the distinction of per-CPU and per-system global event devices.
-
-System-level global event devices are used for the Linux periodic tick. Per-CPU
-event devices are used to provide local CPU functionality such as process
-accounting, profiling, and high resolution timers.
-
-The management layer assigns one or more of the following functions to a clock
-event device:
-      - system global periodic tick (jiffies update)
-      - cpu local update_process_times
-      - cpu local profiling
-      - cpu local next event interrupt (non periodic mode)
-
-The clock event device delegates the selection of those timer interrupt related
-functions completely to the management layer. The clock management layer stores
-a function pointer in the device description structure, which has to be called
-from the hardware level handler. This removes a lot of duplicated code from the
-architecture specific timer interrupt handlers and hands the control over the
-clock event devices and the assignment of timer interrupt related functionality
-to the core code.
-
-The clock event layer API is rather small. Aside from the clock event device
-registration interface it provides functions to schedule the next event
-interrupt, clock event device notification service and support for suspend and
-resume.
-
-The framework adds about 700 lines of code which results in a 2KB increase of
-the kernel binary size. The conversion of i386 removes about 100 lines of
-code. The binary size decrease is in the range of 400 byte. We believe that the
-increase of flexibility and the avoidance of duplicated code across
-architectures justifies the slight increase of the binary size.
-
-The conversion of an architecture has no functional impact, but allows to
-utilize the high resolution and dynamic tick functionalities without any change
-to the clock event device and timer interrupt code. After the conversion the
-enabling of high resolution timers and dynamic ticks is simply provided by
-adding the kernel/time/Kconfig file to the architecture specific Kconfig and
-adding the dynamic tick specific calls to the idle routine (a total of 3 lines
-added to the idle function and the Kconfig file)
-
-Figure #4 (OLS slides p.20) illustrates the transformation.
-
-
-high resolution timer functionality
------------------------------------
-
-During system boot it is not possible to use the high resolution timer
-functionality, while making it possible would be difficult and would serve no
-useful function. The initialization of the clock event device framework, the
-clock source framework (GTOD) and hrtimers itself has to be done and
-appropriate clock sources and clock event devices have to be registered before
-the high resolution functionality can work. Up to the point where hrtimers are
-initialized, the system works in the usual low resolution periodic mode. The
-clock source and the clock event device layers provide notification functions
-which inform hrtimers about availability of new hardware. hrtimers validates
-the usability of the registered clock sources and clock event devices before
-switching to high resolution mode. This ensures also that a kernel which is
-configured for high resolution timers can run on a system which lacks the
-necessary hardware support.
-
-The high resolution timer code does not support SMP machines which have only
-global clock event devices. The support of such hardware would involve IPI
-calls when an interrupt happens. The overhead would be much larger than the
-benefit. This is the reason why we currently disable high resolution and
-dynamic ticks on i386 SMP systems which stop the local APIC in C3 power
-state. A workaround is available as an idea, but the problem has not been
-tackled yet.
-
-The time ordered insertion of timers provides all the infrastructure to decide
-whether the event device has to be reprogrammed when a timer is added. The
-decision is made per timer base and synchronized across per-cpu timer bases in
-a support function. The design allows the system to utilize separate per-CPU
-clock event devices for the per-CPU timer bases, but currently only one
-reprogrammable clock event device per-CPU is utilized.
-
-When the timer interrupt happens, the next event interrupt handler is called
-from the clock event distribution code and moves expired timers from the
-red-black tree to a separate double linked list and invokes the softirq
-handler. An additional mode field in the hrtimer structure allows the system to
-execute callback functions directly from the next event interrupt handler. This
-is restricted to code which can safely be executed in the hard interrupt
-context. This applies, for example, to the common case of a wakeup function as
-used by nanosleep. The advantage of executing the handler in the interrupt
-context is the avoidance of up to two context switches - from the interrupted
-context to the softirq and to the task which is woken up by the expired
-timer.
-
-Once a system has switched to high resolution mode, the periodic tick is
-switched off. This disables the per system global periodic clock event device -
-e.g. the PIT on i386 SMP systems.
-
-The periodic tick functionality is provided by an per-cpu hrtimer. The callback
-function is executed in the next event interrupt context and updates jiffies
-and calls update_process_times and profiling. The implementation of the hrtimer
-based periodic tick is designed to be extended with dynamic tick functionality.
-This allows to use a single clock event device to schedule high resolution
-timer and periodic events (jiffies tick, profiling, process accounting) on UP
-systems. This has been proved to work with the PIT on i386 and the Incrementer
-on PPC.
-
-The softirq for running the hrtimer queues and executing the callbacks has been
-separated from the tick bound timer softirq to allow accurate delivery of high
-resolution timer signals which are used by itimer and POSIX interval
-timers. The execution of this softirq can still be delayed by other softirqs,
-but the overall latencies have been significantly improved by this separation.
-
-Figure #5 (OLS slides p.22) illustrates the transformation.
-
-
-dynamic ticks
--------------
-
-Dynamic ticks are the logical consequence of the hrtimer based periodic tick
-replacement (sched_tick). The functionality of the sched_tick hrtimer is
-extended by three functions:
-
-- hrtimer_stop_sched_tick
-- hrtimer_restart_sched_tick
-- hrtimer_update_jiffies
-
-hrtimer_stop_sched_tick() is called when a CPU goes into idle state. The code
-evaluates the next scheduled timer event (from both hrtimers and the timer
-wheel) and in case that the next event is further away than the next tick it
-reprograms the sched_tick to this future event, to allow longer idle sleeps
-without worthless interruption by the periodic tick. The function is also
-called when an interrupt happens during the idle period, which does not cause a
-reschedule. The call is necessary as the interrupt handler might have armed a
-new timer whose expiry time is before the time which was identified as the
-nearest event in the previous call to hrtimer_stop_sched_tick.
-
-hrtimer_restart_sched_tick() is called when the CPU leaves the idle state before
-it calls schedule(). hrtimer_restart_sched_tick() resumes the periodic tick,
-which is kept active until the next call to hrtimer_stop_sched_tick().
-
-hrtimer_update_jiffies() is called from irq_enter() when an interrupt happens
-in the idle period to make sure that jiffies are up to date and the interrupt
-handler has not to deal with an eventually stale jiffy value.
-
-The dynamic tick feature provides statistical values which are exported to
-userspace via /proc/stat and can be made available for enhanced power
-management control.
-
-The implementation leaves room for further development like full tickless
-systems, where the time slice is controlled by the scheduler, variable
-frequency profiling, and a complete removal of jiffies in the future.
-
-
-Aside the current initial submission of i386 support, the patchset has been
-extended to x86_64 and ARM already. Initial (work in progress) support is also
-available for MIPS and PowerPC.
-
-	  Thomas, Ingo
-
-
-
diff --git a/Documentation/timers/hpet.rst b/Documentation/timers/hpet.rst
new file mode 100644
index 000000000000..c9d05d3caaca
--- /dev/null
+++ b/Documentation/timers/hpet.rst
@@ -0,0 +1,30 @@
+===========================================
+High Precision Event Timer Driver for Linux
+===========================================
+
+The High Precision Event Timer (HPET) hardware follows a specification
+by Intel and Microsoft, revision 1.
+
+Each HPET has one fixed-rate counter (at 10+ MHz, hence "High Precision")
+and up to 32 comparators.  Normally three or more comparators are provided,
+each of which can generate oneshot interrupts and at least one of which has
+additional hardware to support periodic interrupts.  The comparators are
+also called "timers", which can be misleading since usually timers are
+independent of each other ... these share a counter, complicating resets.
+
+HPET devices can support two interrupt routing modes.  In one mode, the
+comparators are additional interrupt sources with no particular system
+role.  Many x86 BIOS writers don't route HPET interrupts at all, which
+prevents use of that mode.  They support the other "legacy replacement"
+mode where the first two comparators block interrupts from 8254 timers
+and from the RTC.
+
+The driver supports detection of HPET driver allocation and initialization
+of the HPET before the driver module_init routine is called.  This enables
+platform code which uses timer 0 or 1 as the main timer to intercept HPET
+initialization.  An example of this initialization can be found in
+arch/x86/kernel/hpet.c.
+
+The driver provides a userspace API which resembles the API found in the
+RTC driver framework.  An example user space program is provided in
+file:samples/timers/hpet_example.c
diff --git a/Documentation/timers/hpet.txt b/Documentation/timers/hpet.txt
deleted file mode 100644
index 895345ec513b..000000000000
--- a/Documentation/timers/hpet.txt
+++ /dev/null
@@ -1,28 +0,0 @@
-		High Precision Event Timer Driver for Linux
-
-The High Precision Event Timer (HPET) hardware follows a specification
-by Intel and Microsoft, revision 1.
-
-Each HPET has one fixed-rate counter (at 10+ MHz, hence "High Precision")
-and up to 32 comparators.  Normally three or more comparators are provided,
-each of which can generate oneshot interrupts and at least one of which has
-additional hardware to support periodic interrupts.  The comparators are
-also called "timers", which can be misleading since usually timers are
-independent of each other ... these share a counter, complicating resets.
-
-HPET devices can support two interrupt routing modes.  In one mode, the
-comparators are additional interrupt sources with no particular system
-role.  Many x86 BIOS writers don't route HPET interrupts at all, which
-prevents use of that mode.  They support the other "legacy replacement"
-mode where the first two comparators block interrupts from 8254 timers
-and from the RTC.
-
-The driver supports detection of HPET driver allocation and initialization
-of the HPET before the driver module_init routine is called.  This enables
-platform code which uses timer 0 or 1 as the main timer to intercept HPET
-initialization.  An example of this initialization can be found in
-arch/x86/kernel/hpet.c.
-
-The driver provides a userspace API which resembles the API found in the
-RTC driver framework.  An example user space program is provided in
-file:samples/timers/hpet_example.c
diff --git a/Documentation/timers/hrtimers.rst b/Documentation/timers/hrtimers.rst
new file mode 100644
index 000000000000..c1c20a693e8f
--- /dev/null
+++ b/Documentation/timers/hrtimers.rst
@@ -0,0 +1,178 @@
+======================================================
+hrtimers - subsystem for high-resolution kernel timers
+======================================================
+
+This patch introduces a new subsystem for high-resolution kernel timers.
+
+One might ask the question: we already have a timer subsystem
+(kernel/timers.c), why do we need two timer subsystems? After a lot of
+back and forth trying to integrate high-resolution and high-precision
+features into the existing timer framework, and after testing various
+such high-resolution timer implementations in practice, we came to the
+conclusion that the timer wheel code is fundamentally not suitable for
+such an approach. We initially didn't believe this ('there must be a way
+to solve this'), and spent a considerable effort trying to integrate
+things into the timer wheel, but we failed. In hindsight, there are
+several reasons why such integration is hard/impossible:
+
+- the forced handling of low-resolution and high-resolution timers in
+  the same way leads to a lot of compromises, macro magic and #ifdef
+  mess. The timers.c code is very "tightly coded" around jiffies and
+  32-bitness assumptions, and has been honed and micro-optimized for a
+  relatively narrow use case (jiffies in a relatively narrow HZ range)
+  for many years - and thus even small extensions to it easily break
+  the wheel concept, leading to even worse compromises. The timer wheel
+  code is very good and tight code, there's zero problems with it in its
+  current usage - but it is simply not suitable to be extended for
+  high-res timers.
+
+- the unpredictable [O(N)] overhead of cascading leads to delays which
+  necessitate a more complex handling of high resolution timers, which
+  in turn decreases robustness. Such a design still leads to rather large
+  timing inaccuracies. Cascading is a fundamental property of the timer
+  wheel concept, it cannot be 'designed out' without inevitably
+  degrading other portions of the timers.c code in an unacceptable way.
+
+- the implementation of the current posix-timer subsystem on top of
+  the timer wheel has already introduced a quite complex handling of
+  the required readjusting of absolute CLOCK_REALTIME timers at
+  settimeofday or NTP time - further underlining our experience by
+  example: that the timer wheel data structure is too rigid for high-res
+  timers.
+
+- the timer wheel code is best suited to use cases which can be
+  identified as "timeouts". Such timeouts are usually set up to cover
+  error conditions in various I/O paths, such as networking and block
+  I/O. The vast majority of those timers never expire and are rarely
+  recascaded because the expected correct event arrives in time so they
+  can be removed from the timer wheel before any further processing of
+  them becomes necessary. Thus the users of these timeouts can accept
+  the granularity and precision tradeoffs of the timer wheel, and
+  largely expect the timer subsystem to have near-zero overhead.
+  Accurate timing for them is not a core purpose - in fact most of the
+  timeout values used are ad-hoc. For them it is at most a necessary
+  evil to guarantee the processing of actual timeout completions
+  (because most of the timeouts are deleted before completion), which
+  should thus be as cheap and unintrusive as possible.
+
+The primary users of precision timers are user-space applications that
+utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel
+users like drivers and subsystems which require precise timed events
+(e.g. multimedia) can benefit from the availability of a separate
+high-resolution timer subsystem as well.
+
+While this subsystem does not offer high-resolution clock sources just
+yet, the hrtimer subsystem can be easily extended with high-resolution
+clock capabilities, and patches for that exist and are maturing quickly.
+The increasing demand for realtime and multimedia applications along
+with other potential users for precise timers gives another reason to
+separate the "timeout" and "precise timer" subsystems.
+
+Another potential benefit is that such a separation allows even more
+special-purpose optimization of the existing timer wheel for the low
+resolution and low precision use cases - once the precision-sensitive
+APIs are separated from the timer wheel and are migrated over to
+hrtimers. E.g. we could decrease the frequency of the timeout subsystem
+from 250 Hz to 100 Hz (or even lower).
+
+hrtimer subsystem implementation details
+----------------------------------------
+
+The basic design considerations were:
+
+- simplicity
+
+- data structure not bound to jiffies or any other granularity. All the
+  kernel logic works at 64-bit nanoseconds resolution - no compromises.
+
+- simplification of existing, timing related kernel code
+
+Another basic requirement was the immediate enqueueing and ordering of
+timers at activation time. After looking at several possible solutions
+such as radix trees and hashes, we chose the red black tree as the basic
+data structure. Rbtrees are available as a library in the kernel and are
+used in various performance-critical areas of e.g. memory management and
+file systems. The rbtree is solely used for time sorted ordering, while
+a separate list is used to give the expiry code fast access to the
+queued timers, without having to walk the rbtree.
+
+(This separate list is also useful for later when we'll introduce
+high-resolution clocks, where we need separate pending and expired
+queues while keeping the time-order intact.)
+
+Time-ordered enqueueing is not purely for the purposes of
+high-resolution clocks though, it also simplifies the handling of
+absolute timers based on a low-resolution CLOCK_REALTIME. The existing
+implementation needed to keep an extra list of all armed absolute
+CLOCK_REALTIME timers along with complex locking. In case of
+settimeofday and NTP, all the timers (!) had to be dequeued, the
+time-changing code had to fix them up one by one, and all of them had to
+be enqueued again. The time-ordered enqueueing and the storage of the
+expiry time in absolute time units removes all this complex and poorly
+scaling code from the posix-timer implementation - the clock can simply
+be set without having to touch the rbtree. This also makes the handling
+of posix-timers simpler in general.
+
+The locking and per-CPU behavior of hrtimers was mostly taken from the
+existing timer wheel code, as it is mature and well suited. Sharing code
+was not really a win, due to the different data structures. Also, the
+hrtimer functions now have clearer behavior and clearer names - such as
+hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly
+equivalent to del_timer() and del_timer_sync()] - so there's no direct
+1:1 mapping between them on the algorithmic level, and thus no real
+potential for code sharing either.
+
+Basic data types: every time value, absolute or relative, is in a
+special nanosecond-resolution type: ktime_t. The kernel-internal
+representation of ktime_t values and operations is implemented via
+macros and inline functions, and can be switched between a "hybrid
+union" type and a plain "scalar" 64bit nanoseconds representation (at
+compile time). The hybrid union type optimizes time conversions on 32bit
+CPUs. This build-time-selectable ktime_t storage format was implemented
+to avoid the performance impact of 64-bit multiplications and divisions
+on 32bit CPUs. Such operations are frequently necessary to convert
+between the storage formats provided by kernel and userspace interfaces
+and the internal time format. (See include/linux/ktime.h for further
+details.)
+
+hrtimers - rounding of timer values
+-----------------------------------
+
+The hrtimer code will round timer events to lower-resolution clocks
+because it has to. Otherwise it will do no artificial rounding at all.
+
+One question is what resolution value should be returned to the user by
+the clock_getres() interface. This will return whatever real resolution
+a given clock has - be it low-res, high-res, or artificially-low-res.
+
+hrtimers - testing and verification
+-----------------------------------
+
+We used the high-resolution clock subsystem on top of hrtimers to verify
+the hrtimer implementation details in practice, and we also ran the posix
+timer tests in order to ensure specification compliance. We also ran
+tests on low-resolution clocks.
+
+The hrtimer patch converts the following kernel functionality to use
+hrtimers:
+
+ - nanosleep
+ - itimers
+ - posix-timers
+
+The conversion of nanosleep and posix-timers enabled the unification of
+nanosleep and clock_nanosleep.
+
+The code was successfully compiled for the following platforms:
+
+ i386, x86_64, ARM, PPC, PPC64, IA64
+
+The code was run-tested on the following platforms:
+
+ i386(UP/SMP), x86_64(UP/SMP), ARM, PPC
+
+hrtimers were also integrated into the -rt tree, along with a
+hrtimers-based high-resolution clock implementation, so the hrtimers
+code got a healthy amount of testing and use in practice.
+
+	Thomas Gleixner, Ingo Molnar
diff --git a/Documentation/timers/hrtimers.txt b/Documentation/timers/hrtimers.txt
deleted file mode 100644
index 588d85724f10..000000000000
--- a/Documentation/timers/hrtimers.txt
+++ /dev/null
@@ -1,178 +0,0 @@
-
-hrtimers - subsystem for high-resolution kernel timers
-----------------------------------------------------
-
-This patch introduces a new subsystem for high-resolution kernel timers.
-
-One might ask the question: we already have a timer subsystem
-(kernel/timers.c), why do we need two timer subsystems? After a lot of
-back and forth trying to integrate high-resolution and high-precision
-features into the existing timer framework, and after testing various
-such high-resolution timer implementations in practice, we came to the
-conclusion that the timer wheel code is fundamentally not suitable for
-such an approach. We initially didn't believe this ('there must be a way
-to solve this'), and spent a considerable effort trying to integrate
-things into the timer wheel, but we failed. In hindsight, there are
-several reasons why such integration is hard/impossible:
-
-- the forced handling of low-resolution and high-resolution timers in
-  the same way leads to a lot of compromises, macro magic and #ifdef
-  mess. The timers.c code is very "tightly coded" around jiffies and
-  32-bitness assumptions, and has been honed and micro-optimized for a
-  relatively narrow use case (jiffies in a relatively narrow HZ range)
-  for many years - and thus even small extensions to it easily break
-  the wheel concept, leading to even worse compromises. The timer wheel
-  code is very good and tight code, there's zero problems with it in its
-  current usage - but it is simply not suitable to be extended for
-  high-res timers.
-
-- the unpredictable [O(N)] overhead of cascading leads to delays which
-  necessitate a more complex handling of high resolution timers, which
-  in turn decreases robustness. Such a design still leads to rather large
-  timing inaccuracies. Cascading is a fundamental property of the timer
-  wheel concept, it cannot be 'designed out' without inevitably
-  degrading other portions of the timers.c code in an unacceptable way.
-
-- the implementation of the current posix-timer subsystem on top of
-  the timer wheel has already introduced a quite complex handling of
-  the required readjusting of absolute CLOCK_REALTIME timers at
-  settimeofday or NTP time - further underlying our experience by
-  example: that the timer wheel data structure is too rigid for high-res
-  timers.
-
-- the timer wheel code is most optimal for use cases which can be
-  identified as "timeouts". Such timeouts are usually set up to cover
-  error conditions in various I/O paths, such as networking and block
-  I/O. The vast majority of those timers never expire and are rarely
-  recascaded because the expected correct event arrives in time so they
-  can be removed from the timer wheel before any further processing of
-  them becomes necessary. Thus the users of these timeouts can accept
-  the granularity and precision tradeoffs of the timer wheel, and
-  largely expect the timer subsystem to have near-zero overhead.
-  Accurate timing for them is not a core purpose - in fact most of the
-  timeout values used are ad-hoc. For them it is at most a necessary
-  evil to guarantee the processing of actual timeout completions
-  (because most of the timeouts are deleted before completion), which
-  should thus be as cheap and unintrusive as possible.
-
-The primary users of precision timers are user-space applications that
-utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel
-users like drivers and subsystems which require precise timed events
-(e.g. multimedia) can benefit from the availability of a separate
-high-resolution timer subsystem as well.
-
-While this subsystem does not offer high-resolution clock sources just
-yet, the hrtimer subsystem can be easily extended with high-resolution
-clock capabilities, and patches for that exist and are maturing quickly.
-The increasing demand for realtime and multimedia applications along
-with other potential users for precise timers gives another reason to
-separate the "timeout" and "precise timer" subsystems.
-
-Another potential benefit is that such a separation allows even more
-special-purpose optimization of the existing timer wheel for the low
-resolution and low precision use cases - once the precision-sensitive
-APIs are separated from the timer wheel and are migrated over to
-hrtimers. E.g. we could decrease the frequency of the timeout subsystem
-from 250 Hz to 100 HZ (or even smaller).
-
-hrtimer subsystem implementation details
-----------------------------------------
-
-the basic design considerations were:
-
-- simplicity
-
-- data structure not bound to jiffies or any other granularity. All the
-  kernel logic works at 64-bit nanoseconds resolution - no compromises.
-
-- simplification of existing, timing related kernel code
-
-another basic requirement was the immediate enqueueing and ordering of
-timers at activation time. After looking at several possible solutions
-such as radix trees and hashes, we chose the red black tree as the basic
-data structure. Rbtrees are available as a library in the kernel and are
-used in various performance-critical areas of e.g. memory management and
-file systems. The rbtree is solely used for time sorted ordering, while
-a separate list is used to give the expiry code fast access to the
-queued timers, without having to walk the rbtree.
-
-(This separate list is also useful for later when we'll introduce
-high-resolution clocks, where we need separate pending and expired
-queues while keeping the time-order intact.)
-
-Time-ordered enqueueing is not purely for the purposes of
-high-resolution clocks though, it also simplifies the handling of
-absolute timers based on a low-resolution CLOCK_REALTIME. The existing
-implementation needed to keep an extra list of all armed absolute
-CLOCK_REALTIME timers along with complex locking. In case of
-settimeofday and NTP, all the timers (!) had to be dequeued, the
-time-changing code had to fix them up one by one, and all of them had to
-be enqueued again. The time-ordered enqueueing and the storage of the
-expiry time in absolute time units removes all this complex and poorly
-scaling code from the posix-timer implementation - the clock can simply
-be set without having to touch the rbtree. This also makes the handling
-of posix-timers simpler in general.
-
-The locking and per-CPU behavior of hrtimers was mostly taken from the
-existing timer wheel code, as it is mature and well suited. Sharing code
-was not really a win, due to the different data structures. Also, the
-hrtimer functions now have clearer behavior and clearer names - such as
-hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly
-equivalent to del_timer() and del_timer_sync()] - so there's no direct
-1:1 mapping between them on the algorithmic level, and thus no real
-potential for code sharing either.
-
-Basic data types: every time value, absolute or relative, is in a
-special nanosecond-resolution type: ktime_t. The kernel-internal
-representation of ktime_t values and operations is implemented via
-macros and inline functions, and can be switched between a "hybrid
-union" type and a plain "scalar" 64bit nanoseconds representation (at
-compile time). The hybrid union type optimizes time conversions on 32bit
-CPUs. This build-time-selectable ktime_t storage format was implemented
-to avoid the performance impact of 64-bit multiplications and divisions
-on 32bit CPUs. Such operations are frequently necessary to convert
-between the storage formats provided by kernel and userspace interfaces
-and the internal time format. (See include/linux/ktime.h for further
-details.)
-
-hrtimers - rounding of timer values
------------------------------------
-
-the hrtimer code will round timer events to lower-resolution clocks
-because it has to. Otherwise it will do no artificial rounding at all.
-
-one question is, what resolution value should be returned to the user by
-the clock_getres() interface. This will return whatever real resolution
-a given clock has - be it low-res, high-res, or artificially-low-res.
-
-hrtimers - testing and verification
-----------------------------------
-
-We used the high-resolution clock subsystem ontop of hrtimers to verify
-the hrtimer implementation details in praxis, and we also ran the posix
-timer tests in order to ensure specification compliance. We also ran
-tests on low-resolution clocks.
-
-The hrtimer patch converts the following kernel functionality to use
-hrtimers:
-
- - nanosleep
- - itimers
- - posix-timers
-
-The conversion of nanosleep and posix-timers enabled the unification of
-nanosleep and clock_nanosleep.
-
-The code was successfully compiled for the following platforms:
-
- i386, x86_64, ARM, PPC, PPC64, IA64
-
-The code was run-tested on the following platforms:
-
- i386(UP/SMP), x86_64(UP/SMP), ARM, PPC
-
-hrtimers were also integrated into the -rt tree, along with a
-hrtimers-based high-resolution clock implementation, so the hrtimers
-code got a healthy amount of testing and use in practice.
-
-	Thomas Gleixner, Ingo Molnar
diff --git a/Documentation/timers/index.rst b/Documentation/timers/index.rst
new file mode 100644
index 000000000000..91f6f8263c48
--- /dev/null
+++ b/Documentation/timers/index.rst
@@ -0,0 +1,22 @@
+:orphan:
+
+======
+timers
+======
+
+.. toctree::
+    :maxdepth: 1
+
+    highres
+    hpet
+    hrtimers
+    no_hz
+    timekeeping
+    timers-howto
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
new file mode 100644
index 000000000000..065db217cb04
--- /dev/null
+++ b/Documentation/timers/no_hz.rst
@@ -0,0 +1,326 @@
+======================================
+NO_HZ: Reducing Scheduling-Clock Ticks
+======================================
+
+
+This document describes Kconfig options and boot parameters that can
+reduce the number of scheduling-clock interrupts, thereby improving energy
+efficiency and reducing OS jitter.  Reducing OS jitter is important for
+some types of computationally intensive high-performance computing (HPC)
+applications and for real-time applications.
+
+There are three main ways of managing scheduling-clock interrupts
+(also known as "scheduling-clock ticks" or simply "ticks"):
+
+1.	Never omit scheduling-clock ticks (CONFIG_HZ_PERIODIC=y or
+	CONFIG_NO_HZ=n for older kernels).  You normally will -not-
+	want to choose this option.
+
+2.	Omit scheduling-clock ticks on idle CPUs (CONFIG_NO_HZ_IDLE=y or
+	CONFIG_NO_HZ=y for older kernels).  This is the most common
+	approach, and should be the default.
+
+3.	Omit scheduling-clock ticks on CPUs that are either idle or that
+	have only one runnable task (CONFIG_NO_HZ_FULL=y).  Unless you
+	are running realtime applications or certain types of HPC
+	workloads, you will normally -not- want this option.
+
+These three cases are described in the following three sections, followed
+by a fourth section on RCU-specific considerations, a fifth section
+discussing testing, and a sixth and final section listing known issues.
+
+
+Never Omit Scheduling-Clock Ticks
+=================================
+
+Very old versions of Linux from the 1990s and the very early 2000s
+are incapable of omitting scheduling-clock ticks.  It turns out that
+there are some situations where this old-school approach is still the
+right approach, for example, in heavy workloads with lots of tasks
+that use short bursts of CPU, where there are very frequent idle
+periods, but where these idle periods are also quite short (tens or
+hundreds of microseconds).  For these types of workloads, scheduling-clock
+interrupts will normally be delivered anyway because there
+will frequently be multiple runnable tasks per CPU.  In these cases,
+attempting to turn off the scheduling-clock interrupt will have no effect
+other than increasing the overhead of switching to and from idle and
+transitioning between user and kernel execution.
+
+This mode of operation can be selected using CONFIG_HZ_PERIODIC=y (or
+CONFIG_NO_HZ=n for older kernels).
+
+However, if you are instead running a light workload with long idle
+periods, failing to omit scheduling-clock interrupts will result in
+excessive power consumption.  This is especially bad on battery-powered
+devices, where it results in extremely short battery lifetimes.  If you
+are running light workloads, you should therefore read the following
+section.
+
+In addition, if you are running either a real-time workload or an HPC
+workload with short iterations, the scheduling-clock interrupts can
+degrade your application's performance.  If this describes your workload,
+you should read the following two sections.
+
+
+Omit Scheduling-Clock Ticks For Idle CPUs
+=========================================
+
+If a CPU is idle, there is little point in sending it a scheduling-clock
+interrupt.  After all, the primary purpose of a scheduling-clock interrupt
+is to force a busy CPU to shift its attention among multiple duties,
+and an idle CPU has no duties to shift its attention among.
+
+The CONFIG_NO_HZ_IDLE=y Kconfig option causes the kernel to avoid sending
+scheduling-clock interrupts to idle CPUs, which is critically important
+both to battery-powered devices and to highly virtualized mainframes.
+A battery-powered device running a CONFIG_HZ_PERIODIC=y kernel would
+drain its battery very quickly, easily 2-3 times as fast as would the
+same device running a CONFIG_NO_HZ_IDLE=y kernel.  A mainframe running
+1,500 OS instances might find that half of its CPU time was consumed by
+unnecessary scheduling-clock interrupts.  In these situations, there
+is strong motivation to avoid sending scheduling-clock interrupts to
+idle CPUs.  That said, dyntick-idle mode is not free:
+
+1.	It increases the number of instructions executed on the path
+	to and from the idle loop.
+
+2.	On many architectures, dyntick-idle mode also increases the
+	number of expensive clock-reprogramming operations.
+
+Therefore, systems with aggressive real-time response constraints often
+run CONFIG_HZ_PERIODIC=y kernels (or CONFIG_NO_HZ=n for older kernels)
+in order to avoid degrading from-idle transition latencies.
+
+An idle CPU that is not receiving scheduling-clock interrupts is said to
+be "dyntick-idle", "in dyntick-idle mode", "in nohz mode", or "running
+tickless".  The remainder of this document will use "dyntick-idle mode".
+
+There is also a boot parameter "nohz=" that can be used to disable
+dyntick-idle mode in CONFIG_NO_HZ_IDLE=y kernels by specifying "nohz=off".
+By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
+dyntick-idle mode.
+
+
+Omit Scheduling-Clock Ticks For CPUs With Only One Runnable Task
+================================================================
+
+If a CPU has only one runnable task, there is little point in sending it
+a scheduling-clock interrupt because there is no other task to switch to.
+Note that omitting scheduling-clock ticks for CPUs with only one runnable
+task implies also omitting them for idle CPUs.
+
+The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid
+sending scheduling-clock interrupts to CPUs with a single runnable task,
+and such CPUs are said to be "adaptive-ticks CPUs".  This is important
+for applications with aggressive real-time response constraints because
+it allows them to improve their worst-case response times by the maximum
+duration of a scheduling-clock interrupt.  It is also important for
+computationally intensive short-iteration workloads:  If any CPU is
+delayed during a given iteration, all the other CPUs will be forced to
+wait idle while the delayed CPU finishes.  Thus, the delay is multiplied
+by one less than the number of CPUs.  In these situations, there is
+again strong motivation to avoid sending scheduling-clock interrupts.
+
+By default, no CPU will be an adaptive-ticks CPU.  The "nohz_full="
+boot parameter specifies the adaptive-ticks CPUs.  For example,
+"nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks
+CPUs.  Note that you are prohibited from marking all of the CPUs as
+adaptive-tick CPUs:  At least one non-adaptive-tick CPU must remain
+online to handle timekeeping tasks in order to ensure that system
+calls like gettimeofday() return accurate values on adaptive-tick CPUs.
+(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
+user processes to observe slight drifts in clock rate.)  Therefore, the
+boot CPU is prohibited from entering adaptive-ticks mode.  Specifying a
+"nohz_full=" mask that includes the boot CPU will result in a boot-time
+error message, and the boot CPU will be removed from the mask.  Note that
+this means that your system must have at least two CPUs in order for
+CONFIG_NO_HZ_FULL=y to do anything for you.
+
+Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
+This is covered in the "RCU Implications" section below.
+
+Normally, a CPU remains in adaptive-ticks mode as long as possible.
+In particular, transitioning to kernel mode does not automatically change
+the mode.  Instead, the CPU will exit adaptive-ticks mode only if needed,
+for example, if that CPU enqueues an RCU callback.
+
+Just as with dyntick-idle mode, the benefits of adaptive-tick mode do
+not come for free:
+
+1.	CONFIG_NO_HZ_FULL selects CONFIG_NO_HZ_COMMON, so you cannot run
+	adaptive ticks without also running dyntick idle.  This dependency
+	extends down into the implementation, so that all of the costs
+	of CONFIG_NO_HZ_IDLE are also incurred by CONFIG_NO_HZ_FULL.
+
+2.	The user/kernel transitions are slightly more expensive due
+	to the need to inform kernel subsystems (such as RCU) about
+	the change in mode.
+
+3.	POSIX CPU timers prevent CPUs from entering adaptive-tick mode.
+	Real-time applications needing to take actions based on CPU time
+	consumption need to use other means of doing so.
+
+4.	If there are more perf events pending than the hardware can
+	accommodate, they are normally round-robined so as to collect
+	all of them over time.  Adaptive-tick mode may prevent this
+	round-robining from happening.  This will likely be fixed by
+	preventing CPUs with large numbers of perf events pending from
+	entering adaptive-tick mode.
+
+5.	Scheduler statistics for adaptive-tick CPUs may be computed
+	slightly differently than those for non-adaptive-tick CPUs.
+	This might in turn perturb load-balancing of real-time tasks.
+
+6.	The LB_BIAS scheduler feature is disabled by adaptive ticks.
+
+Although improvements are expected over time, adaptive ticks is quite
+useful for many types of real-time and compute-intensive applications.
+However, the drawbacks listed above mean that adaptive ticks should not
+(yet) be enabled by default.
+
+
+RCU Implications
+================
+
+There are situations in which idle CPUs cannot be permitted to
+enter either dyntick-idle mode or adaptive-tick mode, the most
+common being when that CPU has RCU callbacks pending.
+
+The CONFIG_RCU_FAST_NO_HZ=y Kconfig option may be used to cause such CPUs
+to enter dyntick-idle mode or adaptive-tick mode anyway.  In this case,
+a timer will awaken these CPUs every four jiffies in order to ensure
+that the RCU callbacks are processed in a timely fashion.
+
+Another approach is to offload RCU callback processing to "rcuo" kthreads
+using the CONFIG_RCU_NOCB_CPU=y Kconfig option.  The specific CPUs to
+offload may be selected using the "rcu_nocbs=" kernel boot parameter,
+which takes a comma-separated list of CPUs and CPU ranges, for example,
+"1,3-5" selects CPUs 1, 3, 4, and 5.
+
+The offloaded CPUs will never queue RCU callbacks, and therefore RCU
+never prevents offloaded CPUs from entering either dyntick-idle mode
+or adaptive-tick mode.  That said, note that it is up to userspace to
+pin the "rcuo" kthreads to specific CPUs if desired.  Otherwise, the
+scheduler will decide where to run them, which might or might not be
+where you want them to run.
+
+
+Testing
+=======
+
+So you enable all the OS-jitter features described in this document,
+but do not see any change in your workload's behavior.  Is this because
+your workload isn't affected that much by OS jitter, or is it because
+something else is in the way?  This section helps answer this question
+by providing a simple OS-jitter test suite, which is available on branch
+master of the following git archive:
+
+git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git
+
+Clone this archive and follow the instructions in the README file.
+This test procedure will produce a trace that will allow you to evaluate
+whether or not you have succeeded in removing OS jitter from your system.
+If this trace shows that you have removed OS jitter as much as is
+possible, then you can conclude that your workload is not all that
+sensitive to OS jitter.
+
+Note: this test requires that your system have at least two CPUs.
+We do not currently have a good way to remove OS jitter from single-CPU
+systems.
+
+
+Known Issues
+============
+
+*	Dyntick-idle slows transitions to and from idle slightly.
+	In practice, this has not been a problem except for the most
+	aggressive real-time workloads, which have the option of disabling
+	dyntick-idle mode, an option that most of them take.  However,
+	some workloads will no doubt want to use adaptive ticks to
+	eliminate scheduling-clock interrupt latencies.  Here are some
+	options for these workloads:
+
+	a.	Use PMQOS from userspace to inform the kernel of your
+		latency requirements (preferred).
+
+	b.	On x86 systems, use the "idle=mwait" boot parameter.
+
+	c.	On x86 systems, use the "intel_idle.max_cstate=" boot parameter
+		to limit the maximum C-state depth.
+
+	d.	On x86 systems, use the "idle=poll" boot parameter.
+		However, please note that use of this parameter can cause
+		your CPU to overheat, which may cause thermal throttling
+		to degrade your latencies -- and that this degradation can
+		be even worse than that of dyntick-idle.  Furthermore,
+		this parameter effectively disables Turbo Mode on Intel
+		CPUs, which can significantly reduce maximum performance.
+
+*	Adaptive-ticks slows user/kernel transitions slightly.
+	This is not expected to be a problem for computationally intensive
+	workloads, which have few such transitions.  Careful benchmarking
+	will be required to determine whether or not other workloads
+	are significantly affected by this effect.
+
+*	Adaptive-ticks does not do anything unless there is only one
+	runnable task for a given CPU, even though there are a number
+	of other situations where the scheduling-clock tick is not
+	needed.  To give but one example, consider a CPU that has one
+	runnable high-priority SCHED_FIFO task and an arbitrary number
+	of low-priority SCHED_OTHER tasks.  In this case, the CPU is
+	required to run the SCHED_FIFO task until it either blocks or
+	some other higher-priority task awakens on (or is assigned to)
+	this CPU, so there is no point in sending a scheduling-clock
+	interrupt to this CPU.	However, the current implementation
+	nevertheless sends scheduling-clock interrupts to CPUs having a
+	single runnable SCHED_FIFO task and multiple runnable SCHED_OTHER
+	tasks, even though these interrupts are unnecessary.
+
+	And even when there are multiple runnable tasks on a given CPU,
+	there is little point in interrupting that CPU until the current
+	running task's timeslice expires, which is almost always way
+	longer than the time of the next scheduling-clock interrupt.
+
+	Better handling of these sorts of situations is future work.
+
+*	A reboot is required to reconfigure both adaptive ticks and RCU
+	callback offloading.  Runtime reconfiguration could be provided
+	if needed; however, due to the complexity of reconfiguring RCU at
+	runtime, there would need to be an earthshakingly good reason.
+	Especially given that you have the straightforward option of
+	simply offloading RCU callbacks from all CPUs and pinning them
+	where you want them whenever you want them pinned.
+
+*	Additional configuration is required to deal with other sources
+	of OS jitter, including interrupts and system-utility tasks
+	and processes.  This configuration normally involves binding
+	interrupts and tasks to particular CPUs.
+
+*	Some sources of OS jitter can currently be eliminated only by
+	constraining the workload.  For example, the only way to eliminate
+	OS jitter due to global TLB shootdowns is to avoid the unmapping
+	operations (such as kernel module unload operations) that
+	result in these shootdowns.  For another example, page faults
+	and TLB misses can be reduced (and in some cases eliminated) by
+	using huge pages and by constraining the amount of memory used
+	by the application.  Pre-faulting the working set can also be
+	helpful, especially when combined with the mlock() and mlockall()
+	system calls.
+
+*	Unless all CPUs are idle, at least one CPU must keep the
+	scheduling-clock interrupt going in order to support accurate
+	timekeeping.
+
+*	If there might potentially be some adaptive-ticks CPUs, there
+	will be at least one CPU keeping the scheduling-clock interrupt
+	going, even if all CPUs are otherwise idle.
+
+	Better handling of this situation is ongoing work.
+
+*	Some process-handling operations still require the occasional
+	scheduling-clock tick.  These operations include calculating CPU
+	load, maintaining sched average, computing CFS entity vruntime,
+	computing avenrun, and carrying out load balancing.  They are
+	currently accommodated by a scheduling-clock tick every second
+	or so.  Ongoing work will eliminate the need even for these
+	infrequent scheduling-clock ticks.
diff --git a/Documentation/timers/timekeeping.rst b/Documentation/timers/timekeeping.rst
new file mode 100644
index 000000000000..f83e98852e2c
--- /dev/null
+++ b/Documentation/timers/timekeeping.rst
@@ -0,0 +1,180 @@
+===========================================================
+Clock sources, Clock events, sched_clock() and delay timers
+===========================================================
+
+This document tries to briefly explain some basic kernel timekeeping
+abstractions. It partly pertains to the drivers usually found in
+drivers/clocksource in the kernel tree, but the code may be spread out
+across the kernel.
+
+If you grep through the kernel source you will find a number of architecture-
+specific implementations of clock sources, clockevents and several likewise
+architecture-specific overrides of the sched_clock() function and some
+delay timers.
+
+To provide timekeeping for your platform, the clock source provides
+the basic timeline, whereas clock events shoot interrupts on certain points
+on this timeline, providing facilities such as high-resolution timers.
+sched_clock() is used for scheduling and timestamping, and delay timers
+provide an accurate delay source using hardware counters.
+
+
+Clock sources
+-------------
+
+The purpose of the clock source is to provide a timeline for the system that
+tells you where you are in time. For example issuing the command 'date' on
+a Linux system will eventually read the clock source to determine exactly
+what time it is.
+
+Typically the clock source is a monotonic, atomic counter which will provide
+n bits which count from 0 to (2^n)-1 and then wrap around to 0 and start over.
+It will ideally NEVER stop ticking as long as the system is running. It
+may stop during system suspend.
+
+The clock source shall have as high resolution as possible, and the frequency
+shall be as stable and correct as possible as compared to a real-world wall
+clock. It should not move unpredictably back and forth in time or miss a few
+cycles here and there.
+
+It must be immune to the kind of effects that occur in hardware where, e.g.,
+the counter register is read in two phases on the bus (the lowest 16 bits
+first and the higher 16 bits in a second bus cycle), with the counter bits
+potentially being updated in between, leading to the risk of very strange
+values from the counter.
+
+When the wall-clock accuracy of the clock source isn't satisfactory, there
+are various quirks and layers in the timekeeping code for e.g. synchronizing
+the user-visible time to RTC clocks in the system or against networked time
+servers using NTP, but all they do basically is update an offset against
+the clock source, which provides the fundamental timeline for the system.
+These measures do not affect the clock source per se, they only adapt the
+system to its shortcomings.
+
+The clock source struct shall provide means to translate the provided counter
+into a nanosecond value as an unsigned long long (unsigned 64 bit) number.
+Since this operation may be invoked very often, doing this in a strict
+mathematical sense is not desirable: instead the number is taken as close as
+possible to a nanosecond value using only the arithmetic operations
+multiply and shift, so in clocksource_cyc2ns() you find:
+
+  ns ~= (clocksource * mult) >> shift
+
+You will find a number of helper functions in the clock source code intended
+to aid in providing these mult and shift values, such as
+clocksource_khz2mult(), clocksource_hz2mult() that help determine the
+mult factor from a fixed shift, and clocksource_register_hz() and
+clocksource_register_khz() which will help out assigning both shift and mult
+factors using the frequency of the clock source as the only input.
+
+For really simple clock sources accessed from a single I/O memory location
+there is nowadays even clocksource_mmio_init() which will take a memory
+location, bit width, a parameter telling whether the counter in the
+register counts up or down, and the timer clock rate, and then conjure all
+necessary parameters.
+
+Since a 32-bit counter at say 100 MHz will wrap around to zero after some 43
+seconds, the code handling the clock source will have to compensate for this.
+That is the reason why the clock source struct also contains a 'mask'
+member telling how many bits of the source are valid. This way the timekeeping
+code knows when the counter will wrap around and can insert the necessary
+compensation code on both sides of the wrap point so that the system timeline
+remains monotonic.
+
+
+Clock events
+------------
+
+Clock events are the conceptual reverse of clock sources: they take a
+desired time specification value and calculate the values to poke into
+hardware timer registers.
+
+Clock events are orthogonal to clock sources. The same hardware
+and register range may be used for the clock event, but it is essentially
+a different thing. The hardware driving clock events has to be able to
+fire interrupts, so as to trigger events on the system timeline. On an SMP
+system, it is ideal (and customary) to have one such event driving timer per
+CPU core, so that each core can trigger events independently of any other
+core.
+
+You will notice that the clock event device code is based on the same basic
+idea about translating counters to nanoseconds using mult and shift
+arithmetic, and you find the same family of helper functions again for
+assigning these values. The clock event driver does not need a 'mask'
+attribute however: the system will not try to plan events beyond the time
+horizon of the clock event.
+
+
+sched_clock()
+-------------
+
+In addition to the clock sources and clock events there is a special weak
+function in the kernel called sched_clock(). This function shall return the
+number of nanoseconds since the system was started. An architecture may or
+may not provide an implementation of sched_clock() on its own. If a local
+implementation is not provided, the system jiffy counter will be used as
+sched_clock().
+
+As the name suggests, sched_clock() is used for scheduling the system,
+determining the absolute timeslice for a certain process in the CFS scheduler
+for example. It is also used for printk timestamps when you have selected to
+include time information in printk for things like bootcharts.
+
+Compared to clock sources, sched_clock() has to be very fast: it is called
+much more often, especially by the scheduler. If you have to trade off
+accuracy against speed, you may sacrifice accuracy relative to the clock
+source for speed in sched_clock(). It however requires some of the same basic
+characteristics as the clock source, i.e. it should be monotonic.
+
+The sched_clock() function may wrap only on unsigned long long boundaries,
+i.e. after 64 bits. Since this is a nanosecond value this will mean it wraps
+after circa 585 years. (For most practical systems this means "never".)
+
+If an architecture does not provide its own implementation of this function,
+it will fall back to using jiffies, limiting its resolution to one jiffy
+(1/HZ seconds) for the architecture. This will affect scheduling accuracy
+and will likely show up in system benchmarks.
+
+The clock driving sched_clock() may stop or reset to zero during system
+suspend/sleep. This does not matter to the function it serves of scheduling
+events on the system. However it may result in interesting timestamps in
+printk().
+
+The sched_clock() function should be callable in any context, be IRQ- and
+NMI-safe, and return a sane value in any context.
+
+Some architectures may have a limited set of time sources and lack a nice
+counter to derive a 64-bit nanosecond value, so for example on the ARM
+architecture, special helper functions have been created to provide a
+sched_clock() nanosecond base from a 16- or 32-bit counter. Sometimes the
+same counter that is also used as clock source is used for this purpose.
+
+On SMP systems, it is crucial for performance that sched_clock() can be called
+independently on each CPU without any synchronization performance hits.
+Some hardware (such as the x86 TSC) will cause the sched_clock() function to
+drift between the CPUs on the system. The kernel can work around this by
+enabling the CONFIG_HAVE_UNSTABLE_SCHED_CLOCK option. This is another aspect
+that makes sched_clock() different from the ordinary clock source.
+
+
+Delay timers (some architectures only)
+--------------------------------------
+
+On systems with variable CPU frequency, the various kernel delay() functions
+will sometimes behave strangely. Basically these delays usually use a hard
+loop to delay a certain number of jiffy fractions using a "lpj" (loops per
+jiffy) value, calibrated on boot.
+
+Let's hope that your system is running at maximum frequency when this value
+is calibrated: in that case, when the frequency is geared down to half the
+full frequency, any delay() will be twice as long. Usually this does not
+hurt, as you're commonly requesting that amount of delay *or more*. But
+basically the semantics are quite unpredictable on such systems.
+
+Enter timer-based delays. Using these, a timer read may be used instead of
+a hard-coded loop for providing the desired delay.
+
+This is done by declaring a struct delay_timer and assigning the appropriate
+function pointers and rate settings for this delay timer.
+
+This is available on some architectures like OpenRISC or ARM.
diff --git a/Documentation/timers/timekeeping.txt b/Documentation/timers/timekeeping.txt
deleted file mode 100644
index 2d1732b0a868..000000000000
--- a/Documentation/timers/timekeeping.txt
+++ /dev/null
@@ -1,179 +0,0 @@
-Clock sources, Clock events, sched_clock() and delay timers
------------------------------------------------------------
-
-This document tries to briefly explain some basic kernel timekeeping
-abstractions. It partly pertains to the drivers usually found in
-drivers/clocksource in the kernel tree, but the code may be spread out
-across the kernel.
-
-If you grep through the kernel source you will find a number of architecture-
-specific implementations of clock sources, clockevents and several likewise
-architecture-specific overrides of the sched_clock() function and some
-delay timers.
-
-To provide timekeeping for your platform, the clock source provides
-the basic timeline, whereas clock events shoot interrupts on certain points
-on this timeline, providing facilities such as high-resolution timers.
-sched_clock() is used for scheduling and timestamping, and delay timers
-provide an accurate delay source using hardware counters.
-
-
-Clock sources
--------------
-
-The purpose of the clock source is to provide a timeline for the system that
-tells you where you are in time. For example issuing the command 'date' on
-a Linux system will eventually read the clock source to determine exactly
-what time it is.
-
-Typically the clock source is a monotonic, atomic counter which will provide
-n bits which count from 0 to (2^n)-1 and then wraps around to 0 and start over.
-It will ideally NEVER stop ticking as long as the system is running. It
-may stop during system suspend.
-
-The clock source shall have as high resolution as possible, and the frequency
-shall be as stable and correct as possible as compared to a real-world wall
-clock. It should not move unpredictably back and forth in time or miss a few
-cycles here and there.
-
-It must be immune to the kind of effects that occur in hardware where e.g.
-the counter register is read in two phases on the bus lowest 16 bits first
-and the higher 16 bits in a second bus cycle with the counter bits
-potentially being updated in between leading to the risk of very strange
-values from the counter.
-
-When the wall-clock accuracy of the clock source isn't satisfactory, there
-are various quirks and layers in the timekeeping code for e.g. synchronizing
-the user-visible time to RTC clocks in the system or against networked time
-servers using NTP, but all they do basically is update an offset against
-the clock source, which provides the fundamental timeline for the system.
-These measures does not affect the clock source per se, they only adapt the
-system to the shortcomings of it.
-
-The clock source struct shall provide means to translate the provided counter
-into a nanosecond value as an unsigned long long (unsigned 64 bit) number.
-Since this operation may be invoked very often, doing this in a strict
-mathematical sense is not desirable: instead the number is taken as close as
-possible to a nanosecond value using only the arithmetic operations
-multiply and shift, so in clocksource_cyc2ns() you find:
-
-  ns ~= (clocksource * mult) >> shift
-
-You will find a number of helper functions in the clock source code intended
-to aid in providing these mult and shift values, such as
-clocksource_khz2mult(), clocksource_hz2mult() that help determine the
-mult factor from a fixed shift, and clocksource_register_hz() and
-clocksource_register_khz() which will help out assigning both shift and mult
-factors using the frequency of the clock source as the only input.
-
-For real simple clock sources accessed from a single I/O memory location
-there is nowadays even clocksource_mmio_init() which will take a memory
-location, bit width, a parameter telling whether the counter in the
-register counts up or down, and the timer clock rate, and then conjure all
-necessary parameters.
-
-Since a 32-bit counter at say 100 MHz will wrap around to zero after some 43
-seconds, the code handling the clock source will have to compensate for this.
-That is the reason why the clock source struct also contains a 'mask'
-member telling how many bits of the source are valid. This way the timekeeping
-code knows when the counter will wrap around and can insert the necessary
-compensation code on both sides of the wrap point so that the system timeline
-remains monotonic.
-
-
-Clock events
-------------
-
-Clock events are the conceptual reverse of clock sources: they take a
-desired time specification value and calculate the values to poke into
-hardware timer registers.
-
-Clock events are orthogonal to clock sources. The same hardware
-and register range may be used for the clock event, but it is essentially
-a different thing. The hardware driving clock events has to be able to
-fire interrupts, so as to trigger events on the system timeline. On an SMP
-system, it is ideal (and customary) to have one such event driving timer per
-CPU core, so that each core can trigger events independently of any other
-core.
-
-You will notice that the clock event device code is based on the same basic
-idea about translating counters to nanoseconds using mult and shift
-arithmetic, and you find the same family of helper functions again for
-assigning these values. The clock event driver does not need a 'mask'
-attribute however: the system will not try to plan events beyond the time
-horizon of the clock event.
-
-
-sched_clock()
--------------
-
-In addition to the clock sources and clock events there is a special weak
-function in the kernel called sched_clock(). This function shall return the
-number of nanoseconds since the system was started. An architecture may or
-may not provide an implementation of sched_clock() on its own. If a local
-implementation is not provided, the system jiffy counter will be used as
-sched_clock().
-
-As the name suggests, sched_clock() is used for scheduling the system,
-determining the absolute timeslice for a certain process in the CFS scheduler
-for example. It is also used for printk timestamps when you have selected to
-include time information in printk for things like bootcharts.
-
-Compared to clock sources, sched_clock() has to be very fast: it is called
-much more often, especially by the scheduler. If you have to do trade-offs
-between accuracy compared to the clock source, you may sacrifice accuracy
-for speed in sched_clock(). It however requires some of the same basic
-characteristics as the clock source, i.e. it should be monotonic.
-
-The sched_clock() function may wrap only on unsigned long long boundaries,
-i.e. after 64 bits. Since this is a nanosecond value this will mean it wraps
-after circa 585 years. (For most practical systems this means "never".)
-
-If an architecture does not provide its own implementation of this function,
-it will fall back to using jiffies, making its maximum resolution 1/HZ of the
-jiffy frequency for the architecture. This will affect scheduling accuracy
-and will likely show up in system benchmarks.
-
-The clock driving sched_clock() may stop or reset to zero during system
-suspend/sleep. This does not matter to the function it serves of scheduling
-events on the system. However it may result in interesting timestamps in
-printk().
-
-The sched_clock() function should be callable in any context, IRQ- and
-NMI-safe and return a sane value in any context.
-
-Some architectures may have a limited set of time sources and lack a nice
-counter to derive a 64-bit nanosecond value, so for example on the ARM
-architecture, special helper functions have been created to provide a
-sched_clock() nanosecond base from a 16- or 32-bit counter. Sometimes the
-same counter that is also used as clock source is used for this purpose.
-
-On SMP systems, it is crucial for performance that sched_clock() can be called
-independently on each CPU without any synchronization performance hits.
-Some hardware (such as the x86 TSC) will cause the sched_clock() function to
-drift between the CPUs on the system. The kernel can work around this by
-enabling the CONFIG_HAVE_UNSTABLE_SCHED_CLOCK option. This is another aspect
-that makes sched_clock() different from the ordinary clock source.
-
-
-Delay timers (some architectures only)
---------------------------------------
-
-On systems with variable CPU frequency, the various kernel delay() functions
-will sometimes behave strangely. Basically these delays usually use a hard
-loop to delay a certain number of jiffy fractions using a "lpj" (loops per
-jiffy) value, calibrated on boot.
-
-Let's hope that your system is running on maximum frequency when this value
-is calibrated: as an effect when the frequency is geared down to half the
-full frequency, any delay() will be twice as long. Usually this does not
-hurt, as you're commonly requesting that amount of delay *or more*. But
-basically the semantics are quite unpredictable on such systems.
-
-Enter timer-based delays. Using these, a timer read may be used instead of
-a hard-coded loop for providing the desired delay.
-
-This is done by declaring a struct delay_timer and assigning the appropriate
-function pointers and rate settings for this delay timer.
-
-This is available on some architectures like OpenRISC or ARM.
diff --git a/Documentation/timers/timers-howto.rst b/Documentation/timers/timers-howto.rst
new file mode 100644
index 000000000000..7e3167bec2b1
--- /dev/null
+++ b/Documentation/timers/timers-howto.rst
@@ -0,0 +1,112 @@
+===================================================================
+delays - Information on the various kernel delay / sleep mechanisms
+===================================================================
+
+This document seeks to answer the common question: "What is the
+RightWay (TM) to insert a delay?"
+
+This question is most often faced by driver writers who have to
+deal with hardware delays and who may not be the most intimately
+familiar with the inner workings of the Linux Kernel.
+
+
+Inserting Delays
+----------------
+
+The first, and most important, question you need to ask is "Is my
+code in an atomic context?"  This should be followed closely by "Does
+it really need to delay in atomic context?" If so...
+
+ATOMIC CONTEXT:
+	You must use the `*delay` family of functions. These
+	functions use the jiffy-based estimate of clock speed
+	and will busy wait for enough loop cycles to achieve
+	the desired delay:
+
+	ndelay(unsigned long nsecs)
+	udelay(unsigned long usecs)
+	mdelay(unsigned long msecs)
+
+	udelay is the generally preferred API; ndelay-level
+	precision may not actually exist on many non-PC devices.
+
+	mdelay is a macro wrapper around udelay, to account for
+	possible overflow when passing large arguments to udelay.
+	In general, use of mdelay is discouraged and code should
+	be refactored to allow for the use of msleep.
+
+NON-ATOMIC CONTEXT:
+	You should use the `*sleep[_range]` family of functions.
+	There are a few more options here. While any of them may
+	work correctly, using the "right" sleep function will
+	help the scheduler, power management, and just make your
+	driver better :)
+
+	-- Backed by busy-wait loop:
+
+		udelay(unsigned long usecs)
+
+	-- Backed by hrtimers:
+
+		usleep_range(unsigned long min, unsigned long max)
+
+	-- Backed by jiffies / legacy_timers:
+
+		msleep(unsigned long msecs)
+		msleep_interruptible(unsigned long msecs)
+
+	Unlike the `*delay` family, the underlying mechanism
+	driving each of these calls varies, thus there are
+	quirks you should be aware of.
+
+
+	SLEEPING FOR "A FEW" USECS ( < ~10us? ):
+		* Use udelay
+
+		- Why not usleep?
+			On slower systems, (embedded, OR perhaps a speed-
+			stepped PC!) the overhead of setting up the hrtimers
+			for usleep *may* not be worth it. Such an evaluation
+			will obviously depend on your specific situation, but
+			it is something to be aware of.
+
+	SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms):
+		* Use usleep_range
+
+		- Why not msleep for (1ms - 20ms)?
+			Explained originally here:
+				http://lkml.org/lkml/2007/8/3/250
+
+			msleep(1~20) may not do what the caller intends, and
+			will often sleep longer (~20 ms actual sleep for any
+			value given in the 1~20ms range). In many cases this
+			is not the desired behavior.
+
+		- Why is there no "usleep" / What is a good range?
+			Since usleep_range is built on top of hrtimers, the
+			wakeup will be very precise (ish), thus a simple
+			usleep function would likely introduce a large number
+			of undesired interrupts.
+
+			With the introduction of a range, the scheduler is
+			free to coalesce your wakeup with any other wakeup
+			that may have happened for other reasons, or at the
+			worst case, fire an interrupt for your upper bound.
+
+			The larger a range you supply, the greater a chance
+			that you will not trigger an interrupt; this should
+			be balanced with what is an acceptable upper bound on
+			delay / performance for your specific code path. Exact
+			tolerances here are very situation specific, thus it
+			is left to the caller to determine a reasonable range.
+
+	SLEEPING FOR LARGER MSECS ( 10ms+ ):
+		* Use msleep or possibly msleep_interruptible
+
+		- What's the difference?
+			msleep sets the current task to TASK_UNINTERRUPTIBLE
+			whereas msleep_interruptible sets the current task to
+			TASK_INTERRUPTIBLE before scheduling the sleep. In
+			short, the difference is whether the sleep can be ended
+			early by a signal. In general, just use msleep unless
+			you know you have a need for the interruptible variant.
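
Reviewer note (not part of the patch): the decision table in timers-howto.rst can be summarised as a small userspace model. This is only a sketch. The names `enum delay_api` and `pick_delay_api()` are hypothetical and are not kernel APIs, the 10us and 20ms cut-offs are the approximate figures quoted in the document, and the 1ms switch from udelay to mdelay in atomic context is an assumption made for illustration. ndelay is left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; none of this is a kernel API. */
enum delay_api {
	DELAY_UDELAY,		/* busy-wait: very short delays, or atomic context */
	DELAY_MDELAY,		/* busy-wait: long delays in atomic context (discouraged) */
	DELAY_USLEEP_RANGE,	/* hrtimer-backed sleep */
	DELAY_MSLEEP,		/* jiffies-backed sleep */
};

/*
 * Map a requested delay to the function family the howto recommends.
 * The 10us and 20ms thresholds are the approximate values from the text;
 * the 1000us threshold for mdelay is an assumption of this sketch.
 */
static enum delay_api pick_delay_api(unsigned long usecs, bool atomic)
{
	if (atomic)
		return usecs < 1000 ? DELAY_UDELAY : DELAY_MDELAY;
	if (usecs < 10)
		return DELAY_UDELAY;
	if (usecs <= 20000)
		return DELAY_USLEEP_RANGE;
	return DELAY_MSLEEP;
}
```

In a real driver the result would correspond to a direct call such as usleep_range(min, max), with the range chosen by the caller as the document explains; the model only captures which family applies.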
diff --git a/Documentation/timers/timers-howto.txt b/Documentation/timers/timers-howto.txt
deleted file mode 100644
index 038f8c77a076..000000000000
--- a/Documentation/timers/timers-howto.txt
+++ /dev/null
@@ -1,105 +0,0 @@
-delays - Information on the various kernel delay / sleep mechanisms
--------------------------------------------------------------------
-
-This document seeks to answer the common question: "What is the
-RightWay (TM) to insert a delay?"
-
-This question is most often faced by driver writers who have to
-deal with hardware delays and who may not be the most intimately
-familiar with the inner workings of the Linux Kernel.
-
-
-Inserting Delays
-----------------
-
-The first, and most important, question you need to ask is "Is my
-code in an atomic context?"  This should be followed closely by "Does
-it really need to delay in atomic context?" If so...
-
-ATOMIC CONTEXT:
-	You must use the *delay family of functions. These
-	functions use the jiffie estimation of clock speed
-	and will busy wait for enough loop cycles to achieve
-	the desired delay:
-
-	ndelay(unsigned long nsecs)
-	udelay(unsigned long usecs)
-	mdelay(unsigned long msecs)
-
-	udelay is the generally preferred API; ndelay-level
-	precision may not actually exist on many non-PC devices.
-
-	mdelay is macro wrapper around udelay, to account for
-	possible overflow when passing large arguments to udelay.
-	In general, use of mdelay is discouraged and code should
-	be refactored to allow for the use of msleep.
-
-NON-ATOMIC CONTEXT:
-	You should use the *sleep[_range] family of functions.
-	There are a few more options here, while any of them may
-	work correctly, using the "right" sleep function will
-	help the scheduler, power management, and just make your
-	driver better :)
-
-	-- Backed by busy-wait loop:
-		udelay(unsigned long usecs)
-	-- Backed by hrtimers:
-		usleep_range(unsigned long min, unsigned long max)
-	-- Backed by jiffies / legacy_timers
-		msleep(unsigned long msecs)
-		msleep_interruptible(unsigned long msecs)
-
-	Unlike the *delay family, the underlying mechanism
-	driving each of these calls varies, thus there are
-	quirks you should be aware of.
-
-
-	SLEEPING FOR "A FEW" USECS ( < ~10us? ):
-		* Use udelay
-
-		- Why not usleep?
-			On slower systems, (embedded, OR perhaps a speed-
-			stepped PC!) the overhead of setting up the hrtimers
-			for usleep *may* not be worth it. Such an evaluation
-			will obviously depend on your specific situation, but
-			it is something to be aware of.
-
-	SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms):
-		* Use usleep_range
-
-		- Why not msleep for (1ms - 20ms)?
-			Explained originally here:
-				http://lkml.org/lkml/2007/8/3/250
-			msleep(1~20) may not do what the caller intends, and
-			will often sleep longer (~20 ms actual sleep for any
-			value given in the 1~20ms range). In many cases this
-			is not the desired behavior.
-
-		- Why is there no "usleep" / What is a good range?
-			Since usleep_range is built on top of hrtimers, the
-			wakeup will be very precise (ish), thus a simple
-			usleep function would likely introduce a large number
-			of undesired interrupts.
-
-			With the introduction of a range, the scheduler is
-			free to coalesce your wakeup with any other wakeup
-			that may have happened for other reasons, or at the
-			worst case, fire an interrupt for your upper bound.
-
-			The larger a range you supply, the greater a chance
-			that you will not trigger an interrupt; this should
-			be balanced with what is an acceptable upper bound on
-			delay / performance for your specific code path. Exact
-			tolerances here are very situation specific, thus it
-			is left to the caller to determine a reasonable range.
-
-	SLEEPING FOR LARGER MSECS ( 10ms+ )
-		* Use msleep or possibly msleep_interruptible
-
-		- What's the difference?
-			msleep sets the current task to TASK_UNINTERRUPTIBLE
-			whereas msleep_interruptible sets the current task to
-			TASK_INTERRUPTIBLE before scheduling the sleep. In
-			short, the difference is whether the sleep can be ended
-			early by a signal. In general, just use msleep unless
-			you know you have a need for the interruptible variant.
diff --git a/MAINTAINERS b/MAINTAINERS
index 5fe44d5d82b4..0db7f12439f7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7192,7 +7192,7 @@ F:	drivers/net/ethernet/hp/hp100.*
 HPET:	High Precision Event Timers driver
 M:	Clemens Ladisch <clemens@ladisch.de>
 S:	Maintained
-F:	Documentation/timers/hpet.txt
+F:	Documentation/timers/hpet.rst
 F:	drivers/char/hpet.c
 F:	include/linux/hpet.h
 F:	include/uapi/linux/hpet.h
diff --git a/drivers/media/usb/dvb-usb-v2/anysee.c b/drivers/media/usb/dvb-usb-v2/anysee.c
index 48fb0d41e03b..fb6d99dea31a 100644
--- a/drivers/media/usb/dvb-usb-v2/anysee.c
+++ b/drivers/media/usb/dvb-usb-v2/anysee.c
@@ -56,7 +56,7 @@ static int anysee_ctrl_msg(struct dvb_usb_device *d,
 	/* TODO FIXME: dvb_usb_generic_rw() fails rarely with error code -32
 	 * (EPIPE, Broken pipe). Function supports currently msleep() as a
 	 * parameter but I would not like to use it, since according to
-	 * Documentation/timers/timers-howto.txt it should not be used such
+	 * Documentation/timers/timers-howto.rst it should not be used such
 	 * short, under < 20ms, sleeps. Repeating failed message would be
 	 * better choice as not to add unwanted delays...
 	 * Fixing that correctly is one of those or both;
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index c894cf0d8a28..c5d8996d5165 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -2304,7 +2304,7 @@ static int regulator_ena_gpio_ctrl(struct regulator_dev *rdev, bool enable)
  *
  * Delay for the requested amount of time as per the guidelines in:
  *
- *     Documentation/timers/timers-howto.txt
+ *     Documentation/timers/timers-howto.rst
  *
  * The assumption here is that regulators will never be enabled in
  * atomic context and therefore sleeping functions can be used.
diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
index 3908353deec6..35e15dfd4155 100644
--- a/include/linux/iopoll.h
+++ b/include/linux/iopoll.h
@@ -21,7 +21,7 @@
  * @cond: Break condition (usually involving @val)
  * @sleep_us: Maximum time to sleep between reads in us (0
  *            tight-loops).  Should be less than ~20ms since usleep_range
- *            is used (see Documentation/timers/timers-howto.txt).
+ *            is used (see Documentation/timers/timers-howto.rst).
  * @timeout_us: Timeout in us, 0 means never timeout
  *
  * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
@@ -60,7 +60,7 @@
  * @cond: Break condition (usually involving @val)
  * @delay_us: Time to udelay between reads in us (0 tight-loops).  Should
  *            be less than ~10us since udelay is used (see
- *            Documentation/timers/timers-howto.txt).
+ *            Documentation/timers/timers-howto.rst).
  * @timeout_us: Timeout in us, 0 means never timeout
  *
  * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
diff --git a/include/linux/regmap.h b/include/linux/regmap.h
index daeec7dbd65c..ed5e9d0a1285 100644
--- a/include/linux/regmap.h
+++ b/include/linux/regmap.h
@@ -112,7 +112,7 @@ struct reg_sequence {
  * @cond: Break condition (usually involving @val)
  * @sleep_us: Maximum time to sleep between reads in us (0
  *            tight-loops).  Should be less than ~20ms since usleep_range
- *            is used (see Documentation/timers/timers-howto.txt).
+ *            is used (see Documentation/timers/timers-howto.rst).
  * @timeout_us: Timeout in us, 0 means never timeout
  *
  * Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_read
@@ -154,7 +154,7 @@ struct reg_sequence {
  * @cond: Break condition (usually involving @val)
  * @sleep_us: Maximum time to sleep between reads in us (0
  *            tight-loops).  Should be less than ~20ms since usleep_range
- *            is used (see Documentation/timers/timers-howto.txt).
+ *            is used (see Documentation/timers/timers-howto.rst).
  * @timeout_us: Timeout in us, 0 means never timeout
  *
  * Returns 0 on success and -ETIMEDOUT upon a timeout or the regmap_field_read
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 342c7c781ba5..a6d436809bf5 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5712,7 +5712,7 @@ sub process {
 			# ignore udelay's < 10, however
 			if (! ($delay < 10) ) {
 				CHK("USLEEP_RANGE",
-				    "usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt\n" . $herecurr);
+				    "usleep_range is preferred over udelay; see Documentation/timers/timers-howto.rst\n" . $herecurr);
 			}
 			if ($delay > 2000) {
 				WARN("LONG_UDELAY",
@@ -5724,7 +5724,7 @@ sub process {
 		if ($line =~ /\bmsleep\s*\((\d+)\);/) {
 			if ($1 < 20) {
 				WARN("MSLEEP",
-				     "msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.txt\n" . $herecurr);
+				     "msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.rst\n" . $herecurr);
 			}
 		}
 
@@ -6115,11 +6115,11 @@ sub process {
 			my $max = $7;
 			if ($min eq $max) {
 				WARN("USLEEP_RANGE",
-				     "usleep_range should not use min == max args; see Documentation/timers/timers-howto.txt\n" . "$here\n$stat\n");
+				     "usleep_range should not use min == max args; see Documentation/timers/timers-howto.rst\n" . "$here\n$stat\n");
 			} elsif ($min =~ /^\d+$/ && $max =~ /^\d+$/ &&
 				 $min > $max) {
 				WARN("USLEEP_RANGE",
-				     "usleep_range args reversed, use min then max; see Documentation/timers/timers-howto.txt\n" . "$here\n$stat\n");
+				     "usleep_range args reversed, use min then max; see Documentation/timers/timers-howto.rst\n" . "$here\n$stat\n");
 			}
 		}
 
diff --git a/sound/soc/sof/ops.h b/sound/soc/sof/ops.h
index 80fc3b374c2b..8058a6c73082 100644
--- a/sound/soc/sof/ops.h
+++ b/sound/soc/sof/ops.h
@@ -349,7 +349,7 @@ static inline const struct snd_sof_dsp_ops
  * @cond: Break condition (usually involving @val)
  * @sleep_us: Maximum time to sleep between reads in us (0
  *            tight-loops).  Should be less than ~20ms since usleep_range
- *            is used (see Documentation/timers/timers-howto.txt).
+ *            is used (see Documentation/timers/timers-howto.rst).
  * @timeout_us: Timeout in us, 0 means never timeout
  *
  * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
-- 
cgit v1.2.3-59-g8ed1b


From cc2a2d19f896d174cad16c2348100bec49c00958 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:53:01 -0300
Subject: docs: watchdog: convert docs to ReST and rename to *.rst

Convert those documents and prepare them to be part of the kernel
API book, as most of the stuff there is related to the
Kernel interfaces.

Still, in the future, it would make sense to split the docs,
as some of the stuff is clearly focused on sysadmin tasks.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.txt    |   2 +-
 Documentation/kernel-per-CPU-kthreads.txt          |   2 +-
 .../watchdog/convert_drivers_to_kernel_api.rst     | 219 ++++++
 .../watchdog/convert_drivers_to_kernel_api.txt     | 218 ------
 Documentation/watchdog/hpwdt.rst                   |  73 ++
 Documentation/watchdog/hpwdt.txt                   |  66 --
 Documentation/watchdog/index.rst                   |  25 +
 Documentation/watchdog/mlx-wdt.rst                 |  56 ++
 Documentation/watchdog/mlx-wdt.txt                 |  52 --
 Documentation/watchdog/pcwd-watchdog.rst           |  71 ++
 Documentation/watchdog/pcwd-watchdog.txt           |  66 --
 Documentation/watchdog/watchdog-api.rst            | 271 ++++++++
 Documentation/watchdog/watchdog-api.txt            | 237 -------
 Documentation/watchdog/watchdog-kernel-api.rst     | 338 ++++++++++
 Documentation/watchdog/watchdog-kernel-api.txt     | 305 ---------
 Documentation/watchdog/watchdog-parameters.rst     | 736 +++++++++++++++++++++
 Documentation/watchdog/watchdog-parameters.txt     | 410 ------------
 Documentation/watchdog/watchdog-pm.rst             |  22 +
 Documentation/watchdog/watchdog-pm.txt             |  19 -
 Documentation/watchdog/wdt.rst                     |  63 ++
 Documentation/watchdog/wdt.txt                     |  50 --
 MAINTAINERS                                        |   2 +-
 drivers/watchdog/Kconfig                           |   6 +-
 drivers/watchdog/smsc37b787_wdt.c                  |   2 +-
 24 files changed, 1881 insertions(+), 1430 deletions(-)
 create mode 100644 Documentation/watchdog/convert_drivers_to_kernel_api.rst
 delete mode 100644 Documentation/watchdog/convert_drivers_to_kernel_api.txt
 create mode 100644 Documentation/watchdog/hpwdt.rst
 delete mode 100644 Documentation/watchdog/hpwdt.txt
 create mode 100644 Documentation/watchdog/index.rst
 create mode 100644 Documentation/watchdog/mlx-wdt.rst
 delete mode 100644 Documentation/watchdog/mlx-wdt.txt
 create mode 100644 Documentation/watchdog/pcwd-watchdog.rst
 delete mode 100644 Documentation/watchdog/pcwd-watchdog.txt
 create mode 100644 Documentation/watchdog/watchdog-api.rst
 delete mode 100644 Documentation/watchdog/watchdog-api.txt
 create mode 100644 Documentation/watchdog/watchdog-kernel-api.rst
 delete mode 100644 Documentation/watchdog/watchdog-kernel-api.txt
 create mode 100644 Documentation/watchdog/watchdog-parameters.rst
 delete mode 100644 Documentation/watchdog/watchdog-parameters.txt
 create mode 100644 Documentation/watchdog/watchdog-pm.rst
 delete mode 100644 Documentation/watchdog/watchdog-pm.txt
 create mode 100644 Documentation/watchdog/wdt.rst
 delete mode 100644 Documentation/watchdog/wdt.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2148fd289851..9ac37fcca3ee 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5160,7 +5160,7 @@
 			Default: 3 = cyan.
 
 	watchdog timers	[HW,WDT] For information on watchdog timers,
-			see Documentation/watchdog/watchdog-parameters.txt
+			see Documentation/watchdog/watchdog-parameters.rst
 			or other driver-specific files in the
 			Documentation/watchdog/ directory.
 
diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
index 23b0c8b20cd1..5623b9916411 100644
--- a/Documentation/kernel-per-CPU-kthreads.txt
+++ b/Documentation/kernel-per-CPU-kthreads.txt
@@ -348,7 +348,7 @@ To reduce its OS jitter, do at least one of the following:
 2.	Boot with "nosoftlockup=0", which will also prevent these kthreads
 	from being created.  Other related watchdog and softlockup boot
 	parameters may be found in Documentation/admin-guide/kernel-parameters.rst
-	and Documentation/watchdog/watchdog-parameters.txt.
+	and Documentation/watchdog/watchdog-parameters.rst.
 3.	Echo a zero to /proc/sys/kernel/watchdog to disable the
 	watchdog timer.
 4.	Echo a large number of /proc/sys/kernel/watchdog_thresh in
diff --git a/Documentation/watchdog/convert_drivers_to_kernel_api.rst b/Documentation/watchdog/convert_drivers_to_kernel_api.rst
new file mode 100644
index 000000000000..dd934cc08e40
--- /dev/null
+++ b/Documentation/watchdog/convert_drivers_to_kernel_api.rst
@@ -0,0 +1,219 @@
+=========================================================
+Converting old watchdog drivers to the watchdog framework
+=========================================================
+
+by Wolfram Sang <w.sang@pengutronix.de>
+
+Before the watchdog framework came into the kernel, every driver had to
+implement the API on its own. Now, as the framework factored out the common
+components, those drivers can be lightened by making them users of the framework.
+This document guides you through this task. The necessary steps are described
+as well as things to look out for.
+
+
+Remove the file_operations struct
+---------------------------------
+
+Old drivers define their own file_operations for actions like open(), write(),
+etc... These are now handled by the framework and just call the driver when
+needed. So, in general, the 'file_operations' struct and assorted functions can
+go. Only very few driver-specific details have to be moved to other functions.
+Here is an overview of the functions and the actions they probably need:
+
+- open: Everything dealing with resource management (file-open checks, magic
+  close preparations) can simply go. Device specific stuff needs to go to the
+  driver specific start-function. Note that for some drivers, the start-function
+  also serves as the ping-function. If that is the case and you need start/stop
+  to be balanced (clocks!), you are better off refactoring a separate start-function.
+
+- close: Same hints as for open apply.
+
+- write: Can simply go, all defined behaviour is taken care of by the framework,
+  i.e. ping on write and magic char ('V') handling.
+
+- ioctl: While the driver is allowed to have extensions to the IOCTL interface,
+  the most common ones are handled by the framework, supported by some assistance
+  from the driver:
+
+	WDIOC_GETSUPPORT:
+		Returns the mandatory watchdog_info struct from the driver
+
+	WDIOC_GETSTATUS:
+		Needs the status-callback defined, otherwise returns 0
+
+	WDIOC_GETBOOTSTATUS:
+		Needs the bootstatus member properly set. Make sure it is 0 if you
+		don't have further support!
+
+	WDIOC_SETOPTIONS:
+		No preparations needed
+
+	WDIOC_KEEPALIVE:
+		If wanted, options in watchdog_info need to have WDIOF_KEEPALIVEPING
+		set
+
+	WDIOC_SETTIMEOUT:
+		Options in watchdog_info need to have WDIOF_SETTIMEOUT set
+		and a set_timeout-callback has to be defined. The core will also
+		do limit-checking, if min_timeout and max_timeout in the watchdog
+		device are set. All is optional.
+
+	WDIOC_GETTIMEOUT:
+		No preparations needed
+
+	WDIOC_GETTIMELEFT:
+		It needs the get_timeleft() callback to be defined. Otherwise it
+		will return EOPNOTSUPP
+
+  Other IOCTLs can be served using the ioctl-callback. Note that this is mainly
+  intended for porting old drivers; new drivers should not invent private IOCTLs.
+  Private IOCTLs are processed first. When the callback returns with
+  -ENOIOCTLCMD, the IOCTLs of the framework will be tried, too. Any other error
+  is directly given to the user.
+
+Example conversion::
+
+  -static const struct file_operations s3c2410wdt_fops = {
+  -       .owner          = THIS_MODULE,
+  -       .llseek         = no_llseek,
+  -       .write          = s3c2410wdt_write,
+  -       .unlocked_ioctl = s3c2410wdt_ioctl,
+  -       .open           = s3c2410wdt_open,
+  -       .release        = s3c2410wdt_release,
+  -};
+
+Check the functions for device-specific stuff and keep it for later
+refactoring. The rest can go.
+
+
+Remove the miscdevice
+---------------------
+
+Since the file_operations are gone now, you can also remove the 'struct
+miscdevice'. The framework will create it on watchdog_dev_register() called by
+watchdog_register_device()::
+
+  -static struct miscdevice s3c2410wdt_miscdev = {
+  -       .minor          = WATCHDOG_MINOR,
+  -       .name           = "watchdog",
+  -       .fops           = &s3c2410wdt_fops,
+  -};
+
+
+Remove obsolete includes and defines
+------------------------------------
+
+Because of the simplifications, a few defines are probably unused now. Remove
+them. Includes can be removed, too. For example::
+
+  - #include <linux/fs.h>
+  - #include <linux/miscdevice.h> (if MODULE_ALIAS_MISCDEV is not used)
+  - #include <linux/uaccess.h> (if no custom IOCTLs are used)
+
+
+Add the watchdog operations
+---------------------------
+
+All possible callbacks are defined in 'struct watchdog_ops'. You can find it
+explained in 'watchdog-kernel-api.rst' in this directory. start(), stop() and
+owner must be set, the rest are optional. You will easily find corresponding
+functions in the old driver. Note that you will now get a pointer to the
+watchdog_device as a parameter to these functions, so you probably have to
+change the function header. Other changes are most likely not needed, because
+this is simply where the direct hardware access happens. If you have device-specific
+code left from the above steps, it should be refactored into these callbacks.
+
+Here is a simple example::
+
+  +static struct watchdog_ops s3c2410wdt_ops = {
+  +       .owner = THIS_MODULE,
+  +       .start = s3c2410wdt_start,
+  +       .stop = s3c2410wdt_stop,
+  +       .ping = s3c2410wdt_keepalive,
+  +       .set_timeout = s3c2410wdt_set_heartbeat,
+  +};
+
+A typical function-header change looks like::
+
+  -static void s3c2410wdt_keepalive(void)
+  +static int s3c2410wdt_keepalive(struct watchdog_device *wdd)
+   {
+  ...
+  +
+  +       return 0;
+   }
+
+  ...
+
+  -       s3c2410wdt_keepalive();
+  +       s3c2410wdt_keepalive(&s3c2410_wdd);
+
+
+Add the watchdog device
+-----------------------
+
+Now we need to create a 'struct watchdog_device' and populate it with the
+necessary information for the framework. The struct is also explained in detail
+in 'watchdog-kernel-api.rst' in this directory. We pass it the mandatory
+watchdog_info struct and the newly created watchdog_ops. Often, old drivers
+have their own record-keeping for things like bootstatus and timeout using
+static variables. Those have to be converted to use the members in
+watchdog_device. Note that the timeout values are unsigned int. Some drivers
+use signed int, so this has to be converted, too.
+
+Here is a simple example for a watchdog device::
+
+  +static struct watchdog_device s3c2410_wdd = {
+  +       .info = &s3c2410_wdt_ident,
+  +       .ops = &s3c2410wdt_ops,
+  +};
+
+
+Handle the 'nowayout' feature
+-----------------------------
+
+A few drivers use nowayout statically, i.e. there is no module parameter for it
+and only CONFIG_WATCHDOG_NOWAYOUT determines if the feature is going to be
+used. This needs to be converted by initializing the status variable of the
+watchdog_device like this::
+
+        .status = WATCHDOG_NOWAYOUT_INIT_STATUS,
+
+Most drivers, however, also allow runtime configuration of nowayout, usually
+by adding a module parameter. The conversion for this would be something like::
+
+	watchdog_set_nowayout(&s3c2410_wdd, nowayout);
+
+The module parameter itself needs to stay, everything else related to nowayout
+can go, though. This will likely be some code in open(), close() or write().
+
+
+Register the watchdog device
+----------------------------
+
+Replace misc_register(&miscdev) with watchdog_register_device(&watchdog_dev).
+Make sure the return value gets checked and the error message, if present,
+still fits. Also convert the unregister case::
+
+  -       ret = misc_register(&s3c2410wdt_miscdev);
+  +       ret = watchdog_register_device(&s3c2410_wdd);
+
+  ...
+
+  -       misc_deregister(&s3c2410wdt_miscdev);
+  +       watchdog_unregister_device(&s3c2410_wdd);
+
+
+Update the Kconfig-entry
+------------------------
+
+The entry for the driver now needs to select WATCHDOG_CORE::
+
+  +       select WATCHDOG_CORE
+
+
+Create a patch and send it to upstream
+--------------------------------------
+
+Make sure you understand Documentation/process/submitting-patches.rst and send your patch to
+linux-watchdog@vger.kernel.org. We are looking forward to it :)
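
Reviewer note (not part of the patch): the ioctl ordering described in convert_drivers_to_kernel_api.rst (the private ioctl-callback runs first, and the framework only takes over when it returns -ENOIOCTLCMD) can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the framework's code: `dispatch_ioctl()`, `framework_ioctl()`, `demo_private_ioctl()` and `DEMO_PRIVATE_CMD` are hypothetical names, and ENOIOCTLCMD is defined locally because it is a kernel-internal value that userspace errno.h does not provide.

```c
#include <errno.h>
#include <stddef.h>

/* Kernel-internal error code, defined here because userspace lacks it. */
#define ENOIOCTLCMD 515

/* Hypothetical command number used only by this demonstration. */
#define DEMO_PRIVATE_CMD 0x1234u

/* Hypothetical driver hook, modelled on the ioctl-callback in the text. */
typedef long (*private_ioctl_fn)(unsigned int cmd, unsigned long arg);

/* Stand-in for the framework's handling of the common WDIOC_* commands. */
static long framework_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	(void)arg;
	return 0;
}

/* Example private handler: claims one command, declines everything else. */
static long demo_private_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)arg;
	if (cmd == DEMO_PRIVATE_CMD)
		return -EINVAL;		/* passed straight back to the caller */
	return -ENOIOCTLCMD;		/* let the framework try */
}

/* Private IOCTLs are tried first; only -ENOIOCTLCMD falls through. */
static long dispatch_ioctl(private_ioctl_fn private_ioctl,
			   unsigned int cmd, unsigned long arg)
{
	if (private_ioctl) {
		long ret = private_ioctl(cmd, arg);

		if (ret != -ENOIOCTLCMD)
			return ret;
	}
	return framework_ioctl(cmd, arg);
}
```

Any result other than -ENOIOCTLCMD from the private handler, whether success or an error, reaches the caller unchanged, which is why a converted driver should return -ENOIOCTLCMD for every command it does not own.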
diff --git a/Documentation/watchdog/convert_drivers_to_kernel_api.txt b/Documentation/watchdog/convert_drivers_to_kernel_api.txt
deleted file mode 100644
index 9fffb2958d13..000000000000
--- a/Documentation/watchdog/convert_drivers_to_kernel_api.txt
+++ /dev/null
@@ -1,218 +0,0 @@
-Converting old watchdog drivers to the watchdog framework
-by Wolfram Sang <w.sang@pengutronix.de>
-=========================================================
-
-Before the watchdog framework came into the kernel, every driver had to
-implement the API on its own. Now, as the framework factored out the common
-components, those drivers can be lightened making it a user of the framework.
-This document shall guide you for this task. The necessary steps are described
-as well as things to look out for.
-
-
-Remove the file_operations struct
----------------------------------
-
-Old drivers define their own file_operations for actions like open(), write(),
-etc... These are now handled by the framework and just call the driver when
-needed. So, in general, the 'file_operations' struct and assorted functions can
-go. Only very few driver-specific details have to be moved to other functions.
-Here is a overview of the functions and probably needed actions:
-
-- open: Everything dealing with resource management (file-open checks, magic
-  close preparations) can simply go. Device specific stuff needs to go to the
-  driver specific start-function. Note that for some drivers, the start-function
-  also serves as the ping-function. If that is the case and you need start/stop
-  to be balanced (clocks!), you are better off refactoring a separate start-function.
-
-- close: Same hints as for open apply.
-
-- write: Can simply go, all defined behaviour is taken care of by the framework,
-  i.e. ping on write and magic char ('V') handling.
-
-- ioctl: While the driver is allowed to have extensions to the IOCTL interface,
-  the most common ones are handled by the framework, supported by some assistance
-  from the driver:
-
-	WDIOC_GETSUPPORT:
-		Returns the mandatory watchdog_info struct from the driver
-
-	WDIOC_GETSTATUS:
-		Needs the status-callback defined, otherwise returns 0
-
-	WDIOC_GETBOOTSTATUS:
-		Needs the bootstatus member properly set. Make sure it is 0 if you
-		don't have further support!
-
-	WDIOC_SETOPTIONS:
-		No preparations needed
-
-	WDIOC_KEEPALIVE:
-		If wanted, options in watchdog_info need to have WDIOF_KEEPALIVEPING
-		set
-
-	WDIOC_SETTIMEOUT:
-		Options in watchdog_info need to have WDIOF_SETTIMEOUT set
-		and a set_timeout-callback has to be defined. The core will also
-		do limit-checking, if min_timeout and max_timeout in the watchdog
-		device are set. All is optional.
-
-	WDIOC_GETTIMEOUT:
-		No preparations needed
-
-	WDIOC_GETTIMELEFT:
-		It needs get_timeleft() callback to be defined. Otherwise it
-		will return EOPNOTSUPP
-
-  Other IOCTLs can be served using the ioctl-callback. Note that this is mainly
-  intended for porting old drivers; new drivers should not invent private IOCTLs.
-  Private IOCTLs are processed first. When the callback returns with
-  -ENOIOCTLCMD, the IOCTLs of the framework will be tried, too. Any other error
-  is directly given to the user.
-
-Example conversion:
-
--static const struct file_operations s3c2410wdt_fops = {
--       .owner          = THIS_MODULE,
--       .llseek         = no_llseek,
--       .write          = s3c2410wdt_write,
--       .unlocked_ioctl = s3c2410wdt_ioctl,
--       .open           = s3c2410wdt_open,
--       .release        = s3c2410wdt_release,
--};
-
-Check the functions for device-specific stuff and keep it for later
-refactoring. The rest can go.
-
-
-Remove the miscdevice
----------------------
-
-Since the file_operations are gone now, you can also remove the 'struct
-miscdevice'. The framework will create it on watchdog_dev_register() called by
-watchdog_register_device().
-
--static struct miscdevice s3c2410wdt_miscdev = {
--       .minor          = WATCHDOG_MINOR,
--       .name           = "watchdog",
--       .fops           = &s3c2410wdt_fops,
--};
-
-
-Remove obsolete includes and defines
-------------------------------------
-
-Because of the simplifications, a few defines are probably unused now. Remove
-them. Includes can be removed, too. For example:
-
-- #include <linux/fs.h>
-- #include <linux/miscdevice.h> (if MODULE_ALIAS_MISCDEV is not used)
-- #include <linux/uaccess.h> (if no custom IOCTLs are used)
-
-
-Add the watchdog operations
----------------------------
-
-All possible callbacks are defined in 'struct watchdog_ops'. You can find it
-explained in 'watchdog-kernel-api.txt' in this directory. start(), stop() and
-owner must be set, the rest are optional. You will easily find corresponding
-functions in the old driver. Note that you will now get a pointer to the
-watchdog_device as a parameter to these functions, so you probably have to
-change the function header. Other changes are most likely not needed, because
-here simply happens the direct hardware access. If you have device-specific
-code left from the above steps, it should be refactored into these callbacks.
-
-Here is a simple example:
-
-+static struct watchdog_ops s3c2410wdt_ops = {
-+       .owner = THIS_MODULE,
-+       .start = s3c2410wdt_start,
-+       .stop = s3c2410wdt_stop,
-+       .ping = s3c2410wdt_keepalive,
-+       .set_timeout = s3c2410wdt_set_heartbeat,
-+};
-
-A typical function-header change looks like:
-
--static void s3c2410wdt_keepalive(void)
-+static int s3c2410wdt_keepalive(struct watchdog_device *wdd)
- {
-...
-+
-+       return 0;
- }
-
-...
-
--       s3c2410wdt_keepalive();
-+       s3c2410wdt_keepalive(&s3c2410_wdd);
-
-
-Add the watchdog device
------------------------
-
-Now we need to create a 'struct watchdog_device' and populate it with the
-necessary information for the framework. The struct is also explained in detail
-in 'watchdog-kernel-api.txt' in this directory. We pass it the mandatory
-watchdog_info struct and the newly created watchdog_ops. Often, old drivers
-have their own record-keeping for things like bootstatus and timeout using
-static variables. Those have to be converted to use the members in
-watchdog_device. Note that the timeout values are unsigned int. Some drivers
-use signed int, so this has to be converted, too.
-
-Here is a simple example for a watchdog device:
-
-+static struct watchdog_device s3c2410_wdd = {
-+       .info = &s3c2410_wdt_ident,
-+       .ops = &s3c2410wdt_ops,
-+};
-
-
-Handle the 'nowayout' feature
------------------------------
-
-A few drivers use nowayout statically, i.e. there is no module parameter for it
-and only CONFIG_WATCHDOG_NOWAYOUT determines if the feature is going to be
-used. This needs to be converted by initializing the status variable of the
-watchdog_device like this:
-
-        .status = WATCHDOG_NOWAYOUT_INIT_STATUS,
-
-Most drivers, however, also allow runtime configuration of nowayout, usually
-by adding a module parameter. The conversion for this would be something like:
-
-	watchdog_set_nowayout(&s3c2410_wdd, nowayout);
-
-The module parameter itself needs to stay, everything else related to nowayout
-can go, though. This will likely be some code in open(), close() or write().
-
-
-Register the watchdog device
-----------------------------
-
-Replace misc_register(&miscdev) with watchdog_register_device(&watchdog_dev).
-Make sure the return value gets checked and the error message, if present,
-still fits. Also convert the unregister case.
-
--       ret = misc_register(&s3c2410wdt_miscdev);
-+       ret = watchdog_register_device(&s3c2410_wdd);
-
-...
-
--       misc_deregister(&s3c2410wdt_miscdev);
-+       watchdog_unregister_device(&s3c2410_wdd);
-
-
-Update the Kconfig-entry
-------------------------
-
-The entry for the driver now needs to select WATCHDOG_CORE:
-
-+       select WATCHDOG_CORE
-
-
-Create a patch and send it to upstream
---------------------------------------
-
-Make sure you understood Documentation/process/submitting-patches.rst and send your patch to
-linux-watchdog@vger.kernel.org. We are looking forward to it :)
-
diff --git a/Documentation/watchdog/hpwdt.rst b/Documentation/watchdog/hpwdt.rst
new file mode 100644
index 000000000000..94a96371113e
--- /dev/null
+++ b/Documentation/watchdog/hpwdt.rst
@@ -0,0 +1,73 @@
+===========================
+HPE iLO NMI Watchdog Driver
+===========================
+
+for iLO based ProLiant Servers
+==============================
+
+Last reviewed: 08/20/2018
+
+
+ The HPE iLO NMI Watchdog driver is a kernel module that provides basic
+ watchdog functionality and a handler for the iLO "Generate NMI to System"
+ virtual button.
+
+ All references to iLO in this document imply it also works on iLO2 and all
+ subsequent generations.
+
+ Watchdog functionality is enabled like any other common watchdog driver. That
+ is, an application needs to be started that kicks off the watchdog timer. A
+ basic application exists in tools/testing/selftests/watchdog/ named
+ watchdog-test.c. Simply compile the C file and kick it off. If the system
+ gets into a bad state and hangs, the HPE ProLiant iLO timer register will
+ not be updated in a timely fashion and a hardware system reset (also known as
+ an Automatic Server Recovery (ASR)) event will occur.
+
+ The hpwdt driver also has the following module parameters:
+
+ ============  ================================================================
+ soft_margin   allows the user to set the watchdog timer value.
+               Default value is 30 seconds.
+ timeout       an alias of soft_margin.
+ pretimeout    allows the user to set the watchdog pretimeout value.
+               This is the number of seconds before timeout when an
+               NMI is delivered to the system. Setting the value to
+               zero disables the pretimeout NMI.
+               Default value is 9 seconds.
+ nowayout      basic watchdog parameter that does not allow the timer to
+               be restarted or an impending ASR to be escaped.
+               Default value is set when compiling the kernel. If it is set
+               to "Y", then there is no way of disabling the watchdog once
+               it has been started.
+ ============  ================================================================
+
+ NOTE:
+       More information about watchdog drivers in general, including the ioctl
+       interface to /dev/watchdog can be found in
+       Documentation/watchdog/watchdog-api.rst and Documentation/IPMI.txt.
+
+ Due to limitations in the iLO hardware, the NMI pretimeout, if enabled,
+ can only be set to 9 seconds.  Attempts to set pretimeout to other
+ non-zero values will be rounded, possibly to zero.  Users should verify
+ the pretimeout value after attempting to set pretimeout or timeout.
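+
+ A minimal sketch of the verification step recommended above, using the
+ generic pretimeout ioctls from linux/watchdog.h. The helper name
+ wdt_set_pretimeout_checked is an assumption made for this example and is
+ not part of hpwdt or the watchdog core.
+

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Illustrative helper (hypothetical name): request a pretimeout and read
 * back the value the driver actually applied. Returns 0 on success and
 * stores the effective value in *actual; returns -1 with errno set if
 * either ioctl fails, leaving *actual untouched. */
static int wdt_set_pretimeout_checked(int fd, int requested, int *actual)
{
	int value = requested;

	if (ioctl(fd, WDIOC_SETPRETIMEOUT, &value) < 0)
		return -1;
	if (ioctl(fd, WDIOC_GETPRETIMEOUT, &value) < 0)
		return -1;

	*actual = value;
	return 0;
}
```

+
+ A caller would compare the value stored in actual against the requested
+ one and decide whether a rounded (possibly zero) pretimeout is acceptable.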
+
+ Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
+ panic. This is to allow for a crash dump to be collected.  It is incumbent
+ upon the user to have properly configured the system for kdump.
+
+ The default Linux kernel behavior upon panic is to print a kernel tombstone
+ and loop forever.  This is generally not what a watchdog user wants.
+
+ For those wishing to learn more please see:
+	Documentation/kdump/kdump.rst
+	Documentation/admin-guide/kernel-parameters.txt (panic=)
+	Your Linux Distribution specific documentation.
+
+ If the hpwdt does not receive the NMI associated with an expiring timer,
+ the iLO will proceed to reset the system at timeout if the timer hasn't
+ been updated.
+
+--
+
+ The HPE iLO NMI Watchdog Driver and documentation were originally developed
+ by Tom Mingarelli.
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
deleted file mode 100644
index aaa9e4b4bdcd..000000000000
--- a/Documentation/watchdog/hpwdt.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-Last reviewed: 08/20/2018
-
-                     HPE iLO NMI Watchdog Driver
-                   for iLO based ProLiant Servers
-
- The HPE iLO NMI Watchdog driver is a kernel module that provides basic
- watchdog functionality and handler for the iLO "Generate NMI to System"
- virtual button.
-
- All references to iLO in this document imply it also works on iLO2 and all
- subsequent generations.
-
- Watchdog functionality is enabled like any other common watchdog driver. That
- is, an application needs to be started that kicks off the watchdog timer. A
- basic application exists in tools/testing/selftests/watchdog/ named
- watchdog-test.c. Simply compile the C file and kick it off. If the system
- gets into a bad state and hangs, the HPE ProLiant iLO timer register will
- not be updated in a timely fashion and a hardware system reset (also known as
- an Automatic Server Recovery (ASR)) event will occur.
-
- The hpwdt driver also has the following module parameters:
-
- soft_margin - allows the user to set the watchdog timer value.
-               Default value is 30 seconds.
- timeout     - an alias of soft_margin.
- pretimeout  - allows the user to set the watchdog pretimeout value.
-               This is the number of seconds before timeout when an
-               NMI is delivered to the system. Setting the value to
-               zero disables the pretimeout NMI.
-               Default value is 9 seconds.
- nowayout    - basic watchdog parameter that does not allow the timer to
-               be restarted or an impending ASR to be escaped.
-               Default value is set when compiling the kernel. If it is set
-               to "Y", then there is no way of disabling the watchdog once
-               it has been started.
-
- NOTE: More information about watchdog drivers in general, including the ioctl
-       interface to /dev/watchdog can be found in
-       Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
-
- Due to limitations in the iLO hardware, the NMI pretimeout if enabled,
- can only be set to 9 seconds.  Attempts to set pretimeout to other
- non-zero values will be rounded, possibly to zero.  Users should verify
- the pretimeout value after attempting to set pretimeout or timeout.
-
- Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
- panic. This is to allow for a crash dump to be collected.  It is incumbent
- upon the user to have properly configured the system for kdump.
-
- The default Linux kernel behavior upon panic is to print a kernel tombstone
- and loop forever.  This is generally not what a watchdog user wants.
-
- For those wishing to learn more please see:
-	Documentation/kdump/kdump.rst
-	Documentation/admin-guide/kernel-parameters.txt (panic=)
-	Your Linux Distribution specific documentation.
-
- If the hpwdt does not receive the NMI associated with an expiring timer,
- the iLO will proceed to reset the system at timeout if the timer hasn't
- been updated.
-
---
-
- The HPE iLO NMI Watchdog Driver and documentation were originally developed
- by Tom Mingarelli.
-
diff --git a/Documentation/watchdog/index.rst b/Documentation/watchdog/index.rst
new file mode 100644
index 000000000000..33a0de631e84
--- /dev/null
+++ b/Documentation/watchdog/index.rst
@@ -0,0 +1,25 @@
+:orphan:
+
+======================
+Linux Watchdog Support
+======================
+
+.. toctree::
+    :maxdepth: 1
+
+    hpwdt
+    mlx-wdt
+    pcwd-watchdog
+    watchdog-api
+    watchdog-kernel-api
+    watchdog-parameters
+    watchdog-pm
+    wdt
+    convert_drivers_to_kernel_api
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/watchdog/mlx-wdt.rst b/Documentation/watchdog/mlx-wdt.rst
new file mode 100644
index 000000000000..bf5bafac47f0
--- /dev/null
+++ b/Documentation/watchdog/mlx-wdt.rst
@@ -0,0 +1,56 @@
+=========================
+Mellanox watchdog drivers
+=========================
+
+for x86 based system switches
+=============================
+
+This driver provides watchdog functionality for various Mellanox
+Ethernet and Infiniband switch systems.
+
+Mellanox watchdog device is implemented in a programmable logic device.
+
+There are 2 types of HW watchdog implementations.
+
+Type 1:
+  Actual HW timeout can be defined as a power of 2 msec.
+  e.g. timeout 20 sec will be rounded up to 32768 msec.
+  The maximum timeout period is 32 sec (32768 msec.).
+  Get time-left isn't supported.
+
+Type 2:
+  Actual HW timeout is defined in sec. and it's the same as
+  a user-defined timeout.
+  Maximum timeout is 255 sec.
+  Get time-left is supported.
+
+Type 1 HW watchdog implementation exists in old systems and
+all new systems have type 2 HW watchdog.
+The two types of HW implementation also have different register maps.
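+
+As a worked example of the type 1 rounding described above, the sketch
+below rounds a timeout given in seconds up to the next power of 2 in
+milliseconds and caps it at 32768 msec. The helper name
+mlxwdt_type1_round_msec is hypothetical and is not taken from the driver,
+whose actual register encoding may differ.
+

```c
#include <stdint.h>

/* Hypothetical helper, not part of the mlx-wdt driver: round a timeout
 * given in seconds up to the next power of 2 in milliseconds, capped at
 * the 32768 msec maximum stated for type 1 hardware. */
static uint32_t mlxwdt_type1_round_msec(uint32_t timeout_sec)
{
	const uint32_t max_msec = 32768;
	uint64_t want = (uint64_t)timeout_sec * 1000;
	uint32_t msec = 1;

	/* Double until the requested value is covered or the cap is hit. */
	while (msec < want && msec < max_msec)
		msec <<= 1;

	return msec;
}
```

+
+With this sketch, a 20 sec request (20000 msec) lands on 32768 msec,
+matching the example given for type 1 hardware.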
+
+A Mellanox system can have 2 watchdogs: main and auxiliary.
+Main and auxiliary watchdog devices can be enabled together
+on the same system.
+There are several actions that can be defined in the watchdog:
+system reset, start fans at full speed and increment a register counter.
+The last 2 actions are performed without a system reset.
+Actions without reset are provided for the auxiliary watchdog device,
+which is optional.
+The watchdog can be started during probe; in that case it will be
+pinged by the watchdog core until the watchdog device is opened by a
+user space application.
+The watchdog can be initialised in nowayout mode, i.e. once started
+it can't be stopped.
+
+This mlx-wdt driver supports both HW watchdog implementations.
+
+The watchdog driver is probed from the common mlx_platform driver.
+The mlx_platform driver provides an appropriate set of registers for the
+Mellanox watchdog device, an identity name (mlx-wdt-main or mlx-wdt-aux),
+the initial timeout, the action performed on expiration and configuration flags.
+Watchdog configuration flags: nowayout and start_at_boot; HW watchdog
+version: type1 or type2.
+The driver checks during initialization whether the previous system reset
+was done by the watchdog. If so, it reports this event.
+
+Access to HW registers is performed through a generic regmap interface.
diff --git a/Documentation/watchdog/mlx-wdt.txt b/Documentation/watchdog/mlx-wdt.txt
deleted file mode 100644
index 66eeb78505c3..000000000000
--- a/Documentation/watchdog/mlx-wdt.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-		Mellanox watchdog drivers
-		for x86 based system switches
-
-This driver provides watchdog functionality for various Mellanox
-Ethernet and Infiniband switch systems.
-
-Mellanox watchdog device is implemented in a programmable logic device.
-
-There are 2 types of HW watchdog implementations.
-
-Type 1:
-Actual HW timeout can be defined as a power of 2 msec.
-e.g. timeout 20 sec will be rounded up to 32768 msec.
-The maximum timeout period is 32 sec (32768 msec.),
-Get time-left isn't supported
-
-Type 2:
-Actual HW timeout is defined in sec. and it's the same as
-a user-defined timeout.
-Maximum timeout is 255 sec.
-Get time-left is supported.
-
-Type 1 HW watchdog implementation exist in old systems and
-all new systems have type 2 HW watchdog.
-Two types of HW implementation have also different register map.
-
-Mellanox system can have 2 watchdogs: main and auxiliary.
-Main and auxiliary watchdog devices can be enabled together
-on the same system.
-There are several actions that can be defined in the watchdog:
-system reset, start fans on full speed and increase register counter.
-The last 2 actions are performed without a system reset.
-Actions without reset are provided for auxiliary watchdog device,
-which is optional.
-Watchdog can be started during a probe, in this case it will be
-pinged by watchdog core before watchdog device will be opened by
-user space application.
-Watchdog can be initialised in nowayout way, i.e. oncse started
-it can't be stopped.
-
-This mlx-wdt driver supports both HW watchdog implementations.
-
-Watchdog driver is probed from the common mlx_platform driver.
-Mlx_platform driver provides an appropriate set of registers for
-Mellanox watchdog device, identity name (mlx-wdt-main or mlx-wdt-aux),
-initial timeout, performed action in expiration and configuration flags.
-watchdog configuration flags: nowayout and start_at_boot, hw watchdog
-version - type1 or type2.
-The driver checks during initialization if the previous system reset
-was done by the watchdog. If yes, it makes a notification about this event.
-
-Access to HW registers is performed through a generic regmap interface.
diff --git a/Documentation/watchdog/pcwd-watchdog.rst b/Documentation/watchdog/pcwd-watchdog.rst
new file mode 100644
index 000000000000..405e2a370082
--- /dev/null
+++ b/Documentation/watchdog/pcwd-watchdog.rst
@@ -0,0 +1,71 @@
+===================================
+Berkshire Products PC Watchdog Card
+===================================
+
+Last reviewed: 10/05/2007
+
+Support for ISA Cards  Revision A and C
+=======================================
+
+Documentation and Driver by Ken Hollis <kenji@bitgate.com>
+
+ The PC Watchdog is a card that offers the same type of functionality that
+ the WDT card does, only it doesn't require an IRQ to run.  Furthermore,
+ the Revision C card allows you to monitor any IO Port to automatically
+ trigger the card into being reset.  This way you can make the card
+ monitor hard drive status, or anything else you need.
+
+ The Watchdog Driver has one basic role: to talk to the card and send
+ signals to it so it doesn't reset your computer ... at least during
+ normal operation.
+
+ The Watchdog Driver will automatically find your watchdog card, and will
+ attach a running driver for use with that card.  After the watchdog
+ drivers have initialized, you can then talk to the card using a PC
+ Watchdog program.
+
+ I suggest putting a "watchdog -d" before the beginning of an fsck, and
+ a "watchdog -e -t 1" immediately after the end of an fsck.  (Remember
+ to run the program with an "&" to run it in the background!)
+
+ If you want to write a program to be compatible with the PC Watchdog
+ driver, simply use or modify the watchdog test program:
+ tools/testing/selftests/watchdog/watchdog-test.c
+
+
+ Other IOCTL functions include:
+
+	WDIOC_GETSUPPORT
+		This returns the capabilities of the card itself, in the
+		structure "PCWDS", which contains:
+
+			options = WDIOS_TEMPPANIC
+				  (This card supports temperature)
+			firmware_version = xxxx
+				  (Firmware version of the card)
+
+	WDIOC_GETSTATUS
+		This returns the status of the card, with the bits of
+		WDIOF_* bitwise-ORed into the value.  (The comments
+		are in linux/pcwd.h)
+
+	WDIOC_GETBOOTSTATUS
+		This returns the status of the card that was reported
+		at bootup.
+
+	WDIOC_GETTEMP
+		This returns the temperature of the card.  (You can also
+		read /dev/watchdog, which gives a temperature update
+		every second.)
+
+	WDIOC_SETOPTIONS
+		This lets you set the options of the card.  You can either
+		enable or disable the card this way.
+
+	WDIOC_KEEPALIVE
+		This pings the card to tell it not to reset your computer.
+
+ And that's all she wrote!
+
+ -- Ken Hollis
+    (kenji@bitgate.com)
diff --git a/Documentation/watchdog/pcwd-watchdog.txt b/Documentation/watchdog/pcwd-watchdog.txt
deleted file mode 100644
index b8e60a441a43..000000000000
--- a/Documentation/watchdog/pcwd-watchdog.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-Last reviewed: 10/05/2007
-
-                     Berkshire Products PC Watchdog Card
-                   Support for ISA Cards  Revision A and C
-           Documentation and Driver by Ken Hollis <kenji@bitgate.com>
-
- The PC Watchdog is a card that offers the same type of functionality that
- the WDT card does, only it doesn't require an IRQ to run.  Furthermore,
- the Revision C card allows you to monitor any IO Port to automatically
- trigger the card into being reset.  This way you can make the card
- monitor hard drive status, or anything else you need.
-
- The Watchdog Driver has one basic role: to talk to the card and send
- signals to it so it doesn't reset your computer ... at least during
- normal operation.
-
- The Watchdog Driver will automatically find your watchdog card, and will
- attach a running driver for use with that card.  After the watchdog
- drivers have initialized, you can then talk to the card using a PC
- Watchdog program.
-
- I suggest putting a "watchdog -d" before the beginning of an fsck, and
- a "watchdog -e -t 1" immediately after the end of an fsck.  (Remember
- to run the program with an "&" to run it in the background!)
-
- If you want to write a program to be compatible with the PC Watchdog
- driver, simply use of modify the watchdog test program:
- tools/testing/selftests/watchdog/watchdog-test.c
-
-
- Other IOCTL functions include:
-
-	WDIOC_GETSUPPORT
-		This returns the support of the card itself.  This
-		returns in structure "PCWDS" which returns:
-			options = WDIOS_TEMPPANIC
-				  (This card supports temperature)
-			firmware_version = xxxx
-				  (Firmware version of the card)
-
-	WDIOC_GETSTATUS
-		This returns the status of the card, with the bits of
-		WDIOF_* bitwise-anded into the value.  (The comments
-		are in linux/pcwd.h)
-
-	WDIOC_GETBOOTSTATUS
-		This returns the status of the card that was reported
-		at bootup.
-
-	WDIOC_GETTEMP
-		This returns the temperature of the card.  (You can also
-		read /dev/watchdog, which gives a temperature update
-		every second.)
-
-	WDIOC_SETOPTIONS
-		This lets you set the options of the card.  You can either
-		enable or disable the card this way.
-
-	WDIOC_KEEPALIVE
-		This pings the card to tell it not to reset your computer.
-
- And that's all she wrote!
-
- -- Ken Hollis
-    (kenji@bitgate.com)
-
diff --git a/Documentation/watchdog/watchdog-api.rst b/Documentation/watchdog/watchdog-api.rst
new file mode 100644
index 000000000000..c6c1e9fa9f73
--- /dev/null
+++ b/Documentation/watchdog/watchdog-api.rst
@@ -0,0 +1,271 @@
+=============================
+The Linux Watchdog driver API
+=============================
+
+Last reviewed: 10/05/2007
+
+
+
+Copyright 2002 Christer Weingel <wingel@nano-system.com>
+
+Some parts of this document are copied verbatim from the sbc60xxwdt
+driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>
+
+This document describes the state of the Linux 2.4.18 kernel.
+
+Introduction
+============
+
+A Watchdog Timer (WDT) is a hardware circuit that can reset the
+computer system in case of a software fault.  You probably knew that
+already.
+
+Usually a userspace daemon will notify the kernel watchdog driver via the
+/dev/watchdog special device file that userspace is still alive, at
+regular intervals.  When such a notification occurs, the driver will
+usually tell the hardware watchdog that everything is in order, and
+that the watchdog should wait for yet another little while to reset
+the system.  If userspace fails (RAM error, kernel bug, whatever), the
+notifications cease to occur, and the hardware watchdog will reset the
+system (causing a reboot) after the timeout occurs.
+
+The Linux watchdog API is a rather ad-hoc construction and different
+drivers implement different, and sometimes incompatible, parts of it.
+This file is an attempt to document the existing usage and allow
+future driver writers to use it as a reference.
+
+The simplest API
+================
+
+All drivers support the basic mode of operation, where the watchdog
+activates as soon as /dev/watchdog is opened and will reboot unless
+the watchdog is pinged within a certain time; this time is called the
+timeout or margin.  The simplest way to ping the watchdog is to write
+some data to the device.  So a very simple watchdog daemon would look
+like this source file:  see samples/watchdog/watchdog-simple.c
+
+A more advanced daemon could, for example, check that an HTTP server is
+still responding before doing the write call to ping the watchdog.
+
+When the device is closed, the watchdog is disabled, unless the "Magic
+Close" feature is supported (see below).  This is not always such a
+good idea, since if there is a bug in the watchdog daemon and it
+crashes, the system will not reboot.  Because of this, some of the
+drivers support the configuration option "Disable watchdog shutdown on
+close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when compiling
+the kernel, there is no way of disabling the watchdog once it has been
+started.  So, if the watchdog daemon crashes, the system will reboot
+after the timeout has passed. Watchdog devices also usually support
+the nowayout module parameter so that this option can be controlled at
+runtime.
+
+Magic Close feature
+===================
+
+If a driver supports "Magic Close", the driver will not disable the
+watchdog unless a specific magic character 'V' has been sent to
+/dev/watchdog just before closing the file.  If the userspace daemon
+closes the file without sending this special character, the driver
+will assume that the daemon (and userspace in general) died, and will
+stop pinging the watchdog without disabling it first.  This will then
+cause a reboot if the watchdog is not re-opened in sufficient time.
+
+The ioctl API
+=============
+
+All conforming drivers also support an ioctl API.
+
+Pinging the watchdog using an ioctl:
+
+All drivers that have an ioctl interface support at least one ioctl,
+KEEPALIVE.  This ioctl does exactly the same thing as a write to the
+watchdog device, so the main loop in the above program could be
+replaced with::
+
+	while (1) {
+		ioctl(fd, WDIOC_KEEPALIVE, 0);
+		sleep(10);
+	}
+
+The argument to the ioctl is ignored.
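+
+A minimal sketch of wrapping the KEEPALIVE ioctl in a helper that reports
+failure, so that a daemon can notice a bad file descriptor instead of
+looping silently. The name wdt_ping is an assumption made for this
+example; only the ioctl itself comes from linux/watchdog.h.
+

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Illustrative helper (hypothetical name): ping the watchdog through
 * the KEEPALIVE ioctl. Returns 0 on success, or -1 with errno set. */
static int wdt_ping(int fd)
{
	/* The ioctl argument is ignored by the driver, so pass 0. */
	return ioctl(fd, WDIOC_KEEPALIVE, 0) < 0 ? -1 : 0;
}
```

+
+A daemon loop would call wdt_ping(fd) before each sleep and log or exit
+when it returns -1.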
+
+Setting and getting the timeout
+===============================
+
+For some drivers it is possible to modify the watchdog timeout on the
+fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT
+flag set in their option field.  The argument is an integer
+representing the timeout in seconds.  The driver returns the real
+timeout used in the same variable, and this timeout might differ from
+the requested one due to limitations of the hardware::
+
+    int timeout = 45;
+    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
+    printf("The timeout was set to %d seconds\n", timeout);
+
+This example might actually print "The timeout was set to 60 seconds"
+if the device has a granularity of minutes for its timeout.
+
+Starting with the Linux 2.4.18 kernel, it is possible to query the
+current timeout using the GETTIMEOUT ioctl::
+
+    ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
+    printf("The timeout is %d seconds\n", timeout);
+
+Pretimeouts
+===========
+
+Some watchdog timers can be set to have a trigger go off before the
+actual time they will reset the system.  This can be done with an NMI,
+interrupt, or other mechanism.  This allows Linux to record useful
+information (like panic information and kernel coredumps) before it
+resets::
+
+    pretimeout = 10;
+    ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);
+
+Note that the pretimeout is the number of seconds before the time
+when the timeout will go off.  It is not the number of seconds until
+the pretimeout.  So, for instance, if you set the timeout to 60 seconds
+and the pretimeout to 10 seconds, the pretimeout will go off in 50
+seconds.  Setting a pretimeout to zero disables it.
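+
+The arithmetic above can be sketched as a small helper. The function name
+is hypothetical, and treating a pretimeout that is not smaller than the
+timeout as unusable is an assumption of this sketch, not something this
+document states.
+

```c
/* Illustrative helper (hypothetical name): seconds from the last ping
 * until the pretimeout event fires, or -1 if no pretimeout applies.
 * Both arguments are in seconds, as passed to the SETTIMEOUT and
 * SETPRETIMEOUT ioctls. */
static int wdt_seconds_until_pretimeout(int timeout, int pretimeout)
{
	/* Zero disables the pretimeout; a value that is not smaller than
	 * the timeout is treated as unusable (assumption of this sketch). */
	if (pretimeout <= 0 || pretimeout >= timeout)
		return -1;

	return timeout - pretimeout;
}
```

+
+For the values used in the text, a timeout of 60 and a pretimeout of 10
+give 50 seconds.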
+
+There is also a get function for getting the pretimeout::
+
+    ioctl(fd, WDIOC_GETPRETIMEOUT, &timeout);
+    printf("The pretimeout is %d seconds\n", timeout);
+
+Not all watchdog drivers will support a pretimeout.
+
+Get the number of seconds before reboot
+=======================================
+
+Some watchdog drivers have the ability to report the remaining time
+before the system will reboot. WDIOC_GETTIMELEFT is the ioctl
+that returns the number of seconds before reboot::
+
+    ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
+    printf("The time left is %d seconds\n", timeleft);
+
+Environmental monitoring
+========================
+
+All watchdog drivers are required to return more information about the system;
+some do temperature, fan and power level monitoring, some can tell you
+the reason for the last reboot of the system.  The GETSUPPORT ioctl is
+available to ask what the device can do::
+
+	struct watchdog_info ident;
+	ioctl(fd, WDIOC_GETSUPPORT, &ident);
+
+The fields returned in the ident struct are:
+
+	================	=============================================
+        identity		a string identifying the watchdog driver
+	firmware_version	the firmware version of the card if available
+	options			flags describing what the device supports
+	================	=============================================
+
+The options field can have the following bits set, and describes what
+kind of information the GETSTATUS and GETBOOTSTATUS ioctls can
+return.   [FIXME -- Is this correct?]
+
+	================	=========================
+	WDIOF_OVERHEAT		Reset due to CPU overheat
+	================	=========================
+
+The machine was last rebooted by the watchdog because the thermal limit was
+exceeded.
+
+	==============		==========
+	WDIOF_FANFAULT		Fan failed
+	==============		==========
+
+A system fan monitored by the watchdog card has failed.
+
+	=============		================
+	WDIOF_EXTERN1		External relay 1
+	=============		================
+
+External monitoring relay/source 1 was triggered. Controllers intended for
+real world applications include external monitoring pins that will trigger
+a reset.
+
+	=============		================
+	WDIOF_EXTERN2		External relay 2
+	=============		================
+
+External monitoring relay/source 2 was triggered.
+
+	================	=====================
+	WDIOF_POWERUNDER	Power bad/power fault
+	================	=====================
+
+The machine is showing an undervoltage status.
+
+	===============		=============================
+	WDIOF_CARDRESET		Card previously reset the CPU
+	===============		=============================
+
+The last reboot was caused by the watchdog card.
+
+	================	=====================
+	WDIOF_POWEROVER		Power over voltage
+	================	=====================
+
+The machine is showing an overvoltage status. Note that if one level is
+under and one is over, both bits will be set. This may seem odd but makes
+sense.
+
+	===================	=====================
+	WDIOF_KEEPALIVEPING	Keep alive ping reply
+	===================	=====================
+
+The watchdog saw a keepalive ping since it was last queried.
+
+	================	=======================
+	WDIOF_SETTIMEOUT	Can set/get the timeout
+	================	=======================
+
+The watchdog timeout can be set and queried.
+
+	================	================================
+	WDIOF_PRETIMEOUT	Pretimeout (in seconds), get/set
+	================	================================
+
+The watchdog can do pretimeouts.
+
+
+For those drivers that return any bits set in the options field, the
+GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
+status, and the status at the last reboot, respectively::
+
+    int flags;
+    ioctl(fd, WDIOC_GETSTATUS, &flags);
+
+    or
+
+    ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);
+
+Note that not all devices support these two calls, and some only
+support the GETBOOTSTATUS call.
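+
+As a rough sketch, the two queries above can be combined in a small program
+that prints what the driver supports and whether the watchdog caused the last
+reboot. Be aware that opening /dev/watchdog starts the timer; the sketch sends
+the magic character 'V' before closing when the driver advertises
+WDIOF_MAGICCLOSE, which does not help if nowayout is in effect::
+
+    #include <fcntl.h>
+    #include <stdio.h>
+    #include <sys/ioctl.h>
+    #include <unistd.h>
+    #include <linux/watchdog.h>
+
+    int main(void)
+    {
+        struct watchdog_info ident = { 0 };
+        int flags;
+        int fd = open("/dev/watchdog", O_WRONLY);
+
+        if (fd < 0) {
+            perror("open /dev/watchdog");
+            return 1;
+        }
+
+        if (ioctl(fd, WDIOC_GETSUPPORT, &ident) == 0) {
+            printf("driver: %.32s\n", (const char *)ident.identity);
+            if (ident.options & WDIOF_SETTIMEOUT)
+                printf("timeout can be set\n");
+            if (ident.options & WDIOF_PRETIMEOUT)
+                printf("pretimeout is supported\n");
+        }
+
+        /* Not every driver implements this call, so check the result. */
+        if (ioctl(fd, WDIOC_GETBOOTSTATUS, &flags) == 0 &&
+            (flags & WDIOF_CARDRESET))
+            printf("last reboot was caused by the watchdog\n");
+
+        /* Magic close, if supported, so that close() disables the timer. */
+        if ((ident.options & WDIOF_MAGICCLOSE) && write(fd, "V", 1) != 1)
+            perror("write");
+
+        close(fd);
+        return 0;
+    }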
+
+Some drivers can measure the temperature using the GETTEMP ioctl.  The
+returned value is the temperature in degrees Fahrenheit::
+
+    int temperature;
+    ioctl(fd, WDIOC_GETTEMP, &temperature);
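+
+If the value is wanted in degrees Celsius, it can be converted in user space.
+A small sketch (the helper name is hypothetical); a reading of 212, for
+instance, corresponds to 100 degrees Celsius::
+
+    /* Convert a WDIOC_GETTEMP reading (degrees Fahrenheit) to Celsius. */
+    static double fahrenheit_to_celsius(int fahrenheit)
+    {
+        return (fahrenheit - 32) * 5.0 / 9.0;
+    }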
+
+Finally, the SETOPTIONS ioctl can be used to control some aspects of
+the card's operation::
+
+    int options = 0;
+    ioctl(fd, WDIOC_SETOPTIONS, &options);
+
+The following options are available:
+
+	=================	================================
+	WDIOS_DISABLECARD	Turn off the watchdog timer
+	WDIOS_ENABLECARD	Turn on the watchdog timer
+	WDIOS_TEMPPANIC		Kernel panic on temperature trip
+	=================	================================
+
+[FIXME -- better explanations]
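+
+As an illustration only, the options above could be wrapped as follows,
+assuming fd is an open /dev/watchdog descriptor and that the driver implements
+SETOPTIONS at all (the helper name is made up)::
+
+    #include <stdio.h>
+    #include <sys/ioctl.h>
+    #include <linux/watchdog.h>
+
+    /* Hypothetical helper: enable != 0 turns the timer on, 0 turns it off. */
+    static int watchdog_set_enabled(int fd, int enable)
+    {
+        int options = enable ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;
+
+        if (ioctl(fd, WDIOC_SETOPTIONS, &options) < 0) {
+            perror("WDIOC_SETOPTIONS");
+            return -1;
+        }
+        return 0;
+    }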
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt
deleted file mode 100644
index 0e62ba33b7fb..000000000000
--- a/Documentation/watchdog/watchdog-api.txt
+++ /dev/null
@@ -1,237 +0,0 @@
-Last reviewed: 10/05/2007
-
-
-The Linux Watchdog driver API.
-
-Copyright 2002 Christer Weingel <wingel@nano-system.com>
-
-Some parts of this document are copied verbatim from the sbc60xxwdt
-driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>
-
-This document describes the state of the Linux 2.4.18 kernel.
-
-Introduction:
-
-A Watchdog Timer (WDT) is a hardware circuit that can reset the
-computer system in case of a software fault.  You probably knew that
-already.
-
-Usually a userspace daemon will notify the kernel watchdog driver via the
-/dev/watchdog special device file that userspace is still alive, at
-regular intervals.  When such a notification occurs, the driver will
-usually tell the hardware watchdog that everything is in order, and
-that the watchdog should wait for yet another little while to reset
-the system.  If userspace fails (RAM error, kernel bug, whatever), the
-notifications cease to occur, and the hardware watchdog will reset the
-system (causing a reboot) after the timeout occurs.
-
-The Linux watchdog API is a rather ad-hoc construction and different
-drivers implement different, and sometimes incompatible, parts of it.
-This file is an attempt to document the existing usage and allow
-future driver writers to use it as a reference.
-
-The simplest API:
-
-All drivers support the basic mode of operation, where the watchdog
-activates as soon as /dev/watchdog is opened and will reboot unless
-the watchdog is pinged within a certain time, this time is called the
-timeout or margin.  The simplest way to ping the watchdog is to write
-some data to the device.  So a very simple watchdog daemon would look
-like this source file:  see samples/watchdog/watchdog-simple.c
-
-A more advanced driver could for example check that a HTTP server is
-still responding before doing the write call to ping the watchdog.
-
-When the device is closed, the watchdog is disabled, unless the "Magic
-Close" feature is supported (see below).  This is not always such a
-good idea, since if there is a bug in the watchdog daemon and it
-crashes the system will not reboot.  Because of this, some of the
-drivers support the configuration option "Disable watchdog shutdown on
-close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when compiling
-the kernel, there is no way of disabling the watchdog once it has been
-started.  So, if the watchdog daemon crashes, the system will reboot
-after the timeout has passed. Watchdog devices also usually support
-the nowayout module parameter so that this option can be controlled at
-runtime.
-
-Magic Close feature:
-
-If a driver supports "Magic Close", the driver will not disable the
-watchdog unless a specific magic character 'V' has been sent to
-/dev/watchdog just before closing the file.  If the userspace daemon
-closes the file without sending this special character, the driver
-will assume that the daemon (and userspace in general) died, and will
-stop pinging the watchdog without disabling it first.  This will then
-cause a reboot if the watchdog is not re-opened in sufficient time.
-
-The ioctl API:
-
-All conforming drivers also support an ioctl API.
-
-Pinging the watchdog using an ioctl:
-
-All drivers that have an ioctl interface support at least one ioctl,
-KEEPALIVE.  This ioctl does exactly the same thing as a write to the
-watchdog device, so the main loop in the above program could be
-replaced with:
-
-	while (1) {
-		ioctl(fd, WDIOC_KEEPALIVE, 0);
-		sleep(10);
-	}
-
-the argument to the ioctl is ignored.
-
-Setting and getting the timeout:
-
-For some drivers it is possible to modify the watchdog timeout on the
-fly with the SETTIMEOUT ioctl, those drivers have the WDIOF_SETTIMEOUT
-flag set in their option field.  The argument is an integer
-representing the timeout in seconds.  The driver returns the real
-timeout used in the same variable, and this timeout might differ from
-the requested one due to limitation of the hardware.
-
-    int timeout = 45;
-    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
-    printf("The timeout was set to %d seconds\n", timeout);
-
-This example might actually print "The timeout was set to 60 seconds"
-if the device has a granularity of minutes for its timeout.
-
-Starting with the Linux 2.4.18 kernel, it is possible to query the
-current timeout using the GETTIMEOUT ioctl.
-
-    ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
-    printf("The timeout was is %d seconds\n", timeout);
-
-Pretimeouts:
-
-Some watchdog timers can be set to have a trigger go off before the
-actual time they will reset the system.  This can be done with an NMI,
-interrupt, or other mechanism.  This allows Linux to record useful
-information (like panic information and kernel coredumps) before it
-resets.
-
-    pretimeout = 10;
-    ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);
-
-Note that the pretimeout is the number of seconds before the time
-when the timeout will go off.  It is not the number of seconds until
-the pretimeout.  So, for instance, if you set the timeout to 60 seconds
-and the pretimeout to 10 seconds, the pretimeout will go off in 50
-seconds.  Setting a pretimeout to zero disables it.
-
-There is also a get function for getting the pretimeout:
-
-    ioctl(fd, WDIOC_GETPRETIMEOUT, &timeout);
-    printf("The pretimeout was is %d seconds\n", timeout);
-
-Not all watchdog drivers will support a pretimeout.
-
-Get the number of seconds before reboot:
-
-Some watchdog drivers have the ability to report the remaining time
-before the system will reboot. The WDIOC_GETTIMELEFT is the ioctl
-that returns the number of seconds before reboot.
-
-    ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
-    printf("The timeout was is %d seconds\n", timeleft);
-
-Environmental monitoring:
-
-All watchdog drivers are required return more information about the system,
-some do temperature, fan and power level monitoring, some can tell you
-the reason for the last reboot of the system.  The GETSUPPORT ioctl is
-available to ask what the device can do:
-
-	struct watchdog_info ident;
-	ioctl(fd, WDIOC_GETSUPPORT, &ident);
-
-the fields returned in the ident struct are:
-
-        identity		a string identifying the watchdog driver
-	firmware_version	the firmware version of the card if available
-	options			a flags describing what the device supports
-
-the options field can have the following bits set, and describes what
-kind of information that the GET_STATUS and GET_BOOT_STATUS ioctls can
-return.   [FIXME -- Is this correct?]
-
-	WDIOF_OVERHEAT		Reset due to CPU overheat
-
-The machine was last rebooted by the watchdog because the thermal limit was
-exceeded
-
-	WDIOF_FANFAULT		Fan failed
-
-A system fan monitored by the watchdog card has failed
-
-	WDIOF_EXTERN1		External relay 1
-
-External monitoring relay/source 1 was triggered. Controllers intended for
-real world applications include external monitoring pins that will trigger
-a reset.
-
-	WDIOF_EXTERN2		External relay 2
-
-External monitoring relay/source 2 was triggered
-
-	WDIOF_POWERUNDER	Power bad/power fault
-
-The machine is showing an undervoltage status
-
-	WDIOF_CARDRESET		Card previously reset the CPU
-
-The last reboot was caused by the watchdog card
-
-	WDIOF_POWEROVER		Power over voltage
-
-The machine is showing an overvoltage status. Note that if one level is
-under and one over both bits will be set - this may seem odd but makes
-sense.
-
-	WDIOF_KEEPALIVEPING	Keep alive ping reply
-
-The watchdog saw a keepalive ping since it was last queried.
-
-	WDIOF_SETTIMEOUT	Can set/get the timeout
-
-The watchdog can do pretimeouts.
-
-	WDIOF_PRETIMEOUT	Pretimeout (in seconds), get/set
-
-
-For those drivers that return any bits set in the option field, the
-GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
-status, and the status at the last reboot, respectively.  
-
-    int flags;
-    ioctl(fd, WDIOC_GETSTATUS, &flags);
-
-    or
-
-    ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);
-
-Note that not all devices support these two calls, and some only
-support the GETBOOTSTATUS call.
-
-Some drivers can measure the temperature using the GETTEMP ioctl.  The
-returned value is the temperature in degrees fahrenheit.
-
-    int temperature;
-    ioctl(fd, WDIOC_GETTEMP, &temperature);
-
-Finally the SETOPTIONS ioctl can be used to control some aspects of
-the cards operation.
-
-    int options = 0;
-    ioctl(fd, WDIOC_SETOPTIONS, &options);
-
-The following options are available:
-
-	WDIOS_DISABLECARD	Turn off the watchdog timer
-	WDIOS_ENABLECARD	Turn on the watchdog timer
-	WDIOS_TEMPPANIC		Kernel panic on temperature trip
-
-[FIXME -- better explanations]
-
diff --git a/Documentation/watchdog/watchdog-kernel-api.rst b/Documentation/watchdog/watchdog-kernel-api.rst
new file mode 100644
index 000000000000..864edbe932c1
--- /dev/null
+++ b/Documentation/watchdog/watchdog-kernel-api.rst
@@ -0,0 +1,338 @@
+===============================================
+The Linux WatchDog Timer Driver Core kernel API
+===============================================
+
+Last reviewed: 12-Feb-2013
+
+Wim Van Sebroeck <wim@iguana.be>
+
+Introduction
+------------
+This document does not describe what a WatchDog Timer (WDT) Driver or Device is.
+It also does not describe the API which can be used by user space to communicate
+with a WatchDog Timer. If you want to know this then please read the following
+file: Documentation/watchdog/watchdog-api.rst .
+
+So what does this document describe? It describes the API that can be used by
+WatchDog Timer Drivers that want to use the WatchDog Timer Driver Core
+Framework. This framework provides all interfacing towards user space so that
+the same code does not have to be reproduced each time. This also means that
+a watchdog timer driver then only needs to provide the different routines
+(operations) that control the watchdog timer (WDT).
+
+The API
+-------
+Each watchdog timer driver that wants to use the WatchDog Timer Driver Core
+must #include <linux/watchdog.h> (you would have to do this anyway when
+writing a watchdog device driver). This include file contains the following
+register/unregister routines::
+
+	extern int watchdog_register_device(struct watchdog_device *);
+	extern void watchdog_unregister_device(struct watchdog_device *);
+
+The watchdog_register_device routine registers a watchdog timer device.
+The parameter of this routine is a pointer to a watchdog_device structure.
+This routine returns zero on success and a negative errno code for failure.
+
+The watchdog_unregister_device routine deregisters a registered watchdog timer
+device. The parameter of this routine is the pointer to the registered
+watchdog_device structure.
+
+The watchdog subsystem includes a registration deferral mechanism,
+which allows you to register a watchdog as early as you wish during
+the boot process.
+
+The watchdog device structure looks like this::
+
+  struct watchdog_device {
+	int id;
+	struct device *parent;
+	const struct attribute_group **groups;
+	const struct watchdog_info *info;
+	const struct watchdog_ops *ops;
+	const struct watchdog_governor *gov;
+	unsigned int bootstatus;
+	unsigned int timeout;
+	unsigned int pretimeout;
+	unsigned int min_timeout;
+	unsigned int max_timeout;
+	unsigned int min_hw_heartbeat_ms;
+	unsigned int max_hw_heartbeat_ms;
+	struct notifier_block reboot_nb;
+	struct notifier_block restart_nb;
+	void *driver_data;
+	struct watchdog_core_data *wd_data;
+	unsigned long status;
+	struct list_head deferred;
+  };
+
+It contains the following fields:
+
+* id: set automatically by watchdog_register_device. id 0 is special: it
+  has both a /dev/watchdog0 cdev (dynamic major, minor 0) and the old
+  /dev/watchdog miscdev.
+* parent: set this to the parent device (or NULL) before calling
+  watchdog_register_device.
+* groups: List of sysfs attribute groups to create when creating the watchdog
+  device.
+* info: a pointer to a watchdog_info structure. This structure gives some
+  additional information about the watchdog timer itself (like its unique
+  name).
+* ops: a pointer to the list of watchdog operations that the watchdog supports.
+* gov: a pointer to the assigned watchdog device pretimeout governor or NULL.
+* timeout: the watchdog timer's timeout value (in seconds).
+  This is the time after which the system will reboot if WDOG_ACTIVE is set
+  and user space does not send a heartbeat request.
+* pretimeout: the watchdog timer's pretimeout value (in seconds).
+* min_timeout: the watchdog timer's minimum timeout value (in seconds).
+  If set, the minimum configurable value for 'timeout'.
+* max_timeout: the watchdog timer's maximum timeout value (in seconds),
+  as seen from userspace. If set, the maximum configurable value for
+  'timeout'. Not used if max_hw_heartbeat_ms is non-zero.
+* min_hw_heartbeat_ms: Hardware limit for minimum time between heartbeats,
+  in milliseconds. This value is normally 0; it should only be provided
+  if the hardware cannot tolerate lower intervals between heartbeats.
+* max_hw_heartbeat_ms: Maximum hardware heartbeat, in milliseconds.
+  If set, the infrastructure will send heartbeats to the watchdog driver
+  if 'timeout' is larger than max_hw_heartbeat_ms, unless WDOG_ACTIVE
+  is set and userspace failed to send a heartbeat for at least 'timeout'
+  seconds. max_hw_heartbeat_ms must be set if a driver does not implement
+  the stop function.
+* reboot_nb: notifier block that is registered for reboot notifications, for
+  internal use only. If the driver calls watchdog_stop_on_reboot, watchdog core
+  will stop the watchdog on such notifications.
+* restart_nb: notifier block that is registered for machine restart, for
+  internal use only. If a watchdog is capable of restarting the machine, it
+  should define ops->restart. Priority can be changed through
+  watchdog_set_restart_priority.
+* bootstatus: status of the device after booting (reported with watchdog
+  WDIOF_* status bits).
+* driver_data: a pointer to the driver's private data of a watchdog device.
+  This data should only be accessed via the watchdog_set_drvdata and
+  watchdog_get_drvdata routines.
+* wd_data: a pointer to watchdog core internal data.
+* status: this field contains a number of status bits that give extra
+  information about the status of the device (for example, whether the
+  watchdog timer is running/active, or whether the nowayout bit is set).
+* deferred: entry in wtd_deferred_reg_list which is used to
+  register early initialized watchdogs.
+
+The list of watchdog operations is defined as::
+
+  struct watchdog_ops {
+	struct module *owner;
+	/* mandatory operations */
+	int (*start)(struct watchdog_device *);
+	int (*stop)(struct watchdog_device *);
+	/* optional operations */
+	int (*ping)(struct watchdog_device *);
+	unsigned int (*status)(struct watchdog_device *);
+	int (*set_timeout)(struct watchdog_device *, unsigned int);
+	int (*set_pretimeout)(struct watchdog_device *, unsigned int);
+	unsigned int (*get_timeleft)(struct watchdog_device *);
+	int (*restart)(struct watchdog_device *);
+	long (*ioctl)(struct watchdog_device *, unsigned int, unsigned long);
+  };
+
+It is important that you first define the module owner of the watchdog timer
+driver's operations. This module owner will be used to lock the module when
+the watchdog is active. (This to avoid a system crash when you unload the
+module and /dev/watchdog is still open).
+
+Some operations are mandatory and some are optional. The only mandatory
+operation is:
+
+* start: this is a pointer to the routine that starts the watchdog timer
+  device.
+  The routine needs a pointer to the watchdog timer device structure as a
+  parameter. It returns zero on success or a negative errno code for failure.
+
+Not all watchdog timer hardware supports the same functionality. That's why
+all other routines/operations are optional. They only need to be provided if
+they are supported. These optional routines/operations are:
+
+* stop: this routine stops the watchdog timer device.
+
+  The routine needs a pointer to the watchdog timer device structure as a
+  parameter. It returns zero on success or a negative errno code for failure.
+  Some watchdog timer hardware can only be started and not be stopped. A
+  driver supporting such hardware does not have to implement the stop routine.
+
+  If a driver has no stop function, the watchdog core will set WDOG_HW_RUNNING
+  and start calling the driver's keepalive ping function after the watchdog
+  device is closed.
+
+  If a watchdog driver does not implement the stop function, it must set
+  max_hw_heartbeat_ms.
+* ping: this is the routine that sends a keepalive ping to the watchdog timer
+  hardware.
+
+  The routine needs a pointer to the watchdog timer device structure as a
+  parameter. It returns zero on success or a negative errno code for failure.
+
+  Most hardware that does not support this as a separate function uses the
+  start function to restart the watchdog timer hardware. And that's also what
+  the watchdog timer driver core does: to send a keepalive ping to the watchdog
+  timer hardware it will either use the ping operation (when available) or the
+  start operation (when the ping operation is not available).
+
+  (Note: the WDIOC_KEEPALIVE ioctl call will only be active when the
+  WDIOF_KEEPALIVEPING bit has been set in the option field on the watchdog's
+  info structure).
+* status: this routine checks the status of the watchdog timer device. The
+  status of the device is reported with watchdog WDIOF_* status flags/bits.
+
+  WDIOF_MAGICCLOSE and WDIOF_KEEPALIVEPING are reported by the watchdog core;
+  it is not necessary to report those bits from the driver. Also, if no status
+  function is provided by the driver, the watchdog core reports the status bits
+  provided in the bootstatus variable of struct watchdog_device.
+
+* set_timeout: this routine checks and changes the timeout of the watchdog
+  timer device. It returns 0 on success, -EINVAL for "parameter out of range"
+  and -EIO for "could not write value to the watchdog". On success this
+  routine should set the timeout value of the watchdog_device to the
+  achieved timeout value (which may be different from the requested one
+  because the watchdog does not necessarily have a 1 second resolution).
+
+  Drivers implementing max_hw_heartbeat_ms set the hardware watchdog heartbeat
+  to the minimum of timeout and max_hw_heartbeat_ms. Those drivers set the
+  timeout value of the watchdog_device either to the requested timeout value
+  (if it is larger than max_hw_heartbeat_ms), or to the achieved timeout value.
+  (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
+  watchdog's info structure).
+
+  If the watchdog driver does not have to perform any action but setting the
+  watchdog_device.timeout, this callback can be omitted.
+
+  If set_timeout is not provided but WDIOF_SETTIMEOUT is set, the watchdog
+  infrastructure updates the timeout value of the watchdog_device internally
+  to the requested value.
+
+  If the pretimeout feature is used (WDIOF_PRETIMEOUT), then set_timeout must
+  also take care of checking if pretimeout is still valid and set up the timer
+  accordingly. This can't be done in the core without races, so it is the
+  duty of the driver.
+* set_pretimeout: this routine checks and changes the pretimeout value of
+  the watchdog. It is optional because not all watchdogs support pretimeout
+  notification. The pretimeout value is not an absolute time, but the number
+  of seconds before the actual timeout would happen. It returns 0 on success,
+  -EINVAL for "parameter out of range" and -EIO for "could not write value to
+  the watchdog". A value of 0 disables pretimeout notification.
+
+  (Note: the WDIOF_PRETIMEOUT needs to be set in the options field of the
+  watchdog's info structure).
+
+  If the watchdog driver does not have to perform any action but setting the
+  watchdog_device.pretimeout, this callback can be omitted. That means if
+  set_pretimeout is not provided but WDIOF_PRETIMEOUT is set, the watchdog
+  infrastructure updates the pretimeout value of the watchdog_device internally
+  to the requested value.
+
+* get_timeleft: this routine returns the time that's left before a reset.
+* restart: this routine restarts the machine. It returns 0 on success or a
+  negative errno code for failure.
+* ioctl: if this routine is present then it will be called first before we do
+  our own internal ioctl call handling. This routine should return -ENOIOCTLCMD
+  if a command is not supported. The parameters that are passed to the ioctl
+  call are: watchdog_device, cmd and arg.
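+
+To make the structures above concrete, here is a minimal sketch of a driver
+built on the core. Everything named foo_* is hypothetical, the hardware
+accesses are left as comments, and the timeout values are arbitrary; it only
+illustrates how the info, ops and device structures fit together::
+
+  #include <linux/module.h>
+  #include <linux/watchdog.h>
+
+  static int foo_wdt_start(struct watchdog_device *wdd)
+  {
+	/* Program wdd->timeout into the (hypothetical) hardware and arm it. */
+	return 0;
+  }
+
+  static int foo_wdt_stop(struct watchdog_device *wdd)
+  {
+	/* Disarm the (hypothetical) hardware. */
+	return 0;
+  }
+
+  static const struct watchdog_info foo_wdt_info = {
+	.options = WDIOF_KEEPALIVEPING | WDIOF_MAGICCLOSE,
+	.identity = "foo watchdog (example)",
+  };
+
+  /* No ping operation: the core falls back to start for keepalives. */
+  static const struct watchdog_ops foo_wdt_ops = {
+	.owner = THIS_MODULE,
+	.start = foo_wdt_start,
+	.stop = foo_wdt_stop,
+  };
+
+  static struct watchdog_device foo_wdd = {
+	.info = &foo_wdt_info,
+	.ops = &foo_wdt_ops,
+	.timeout = 30,
+	.min_timeout = 1,
+	.max_timeout = 60,
+  };
+
+  static int __init foo_wdt_init(void)
+  {
+	return watchdog_register_device(&foo_wdd);
+  }
+
+  static void __exit foo_wdt_exit(void)
+  {
+	watchdog_unregister_device(&foo_wdd);
+  }
+
+  module_init(foo_wdt_init);
+  module_exit(foo_wdt_exit);
+  MODULE_LICENSE("GPL");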
+
+The status bits should (preferably) be set with the set_bit and clear_bit
+family of bit operations. The status bits that are defined are:
+
+* WDOG_ACTIVE: this status bit indicates whether or not a watchdog timer device
+  is active from the user's perspective. User space is expected to send
+  heartbeat requests to the driver while this flag is set.
+* WDOG_NO_WAY_OUT: this bit stores the nowayout setting for the watchdog.
+  If this bit is set then the watchdog timer will not be able to stop.
+* WDOG_HW_RUNNING: Set by the watchdog driver if the hardware watchdog is
+  running. The bit must be set if the watchdog timer hardware can not be
+  stopped. The bit may also be set if the watchdog timer is running after
+  booting, before the watchdog device is opened. If set, the watchdog
+  infrastructure will send keepalives to the watchdog hardware while
+  WDOG_ACTIVE is not set.
+  Note: when you register the watchdog timer device with this bit set,
+  then opening /dev/watchdog will skip the start operation but send a keepalive
+  request instead.
+
+  To set the WDOG_NO_WAY_OUT status bit (before registering your watchdog
+  timer device) you can either:
+
+  * set it statically in your watchdog_device struct with::
+
+	.status = WATCHDOG_NOWAYOUT_INIT_STATUS,
+
+    (this will set the value the same as CONFIG_WATCHDOG_NOWAYOUT) or
+  * use the following helper function::
+
+	static inline void watchdog_set_nowayout(struct watchdog_device *wdd,
+						 int nowayout)
+
+Note:
+   The WatchDog Timer Driver Core supports the magic close feature and
+   the nowayout feature. To use the magic close feature you must set the
+   WDIOF_MAGICCLOSE bit in the options field of the watchdog's info structure.
+
+The nowayout feature will overrule the magic close feature.
+
+To get or set driver specific data the following two helper functions should be
+used::
+
+  static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
+					  void *data)
+  static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
+
+The watchdog_set_drvdata function allows you to add driver specific data. The
+arguments of this function are the watchdog device to which you want to add
+the driver specific data and a pointer to the data itself.
+
+The watchdog_get_drvdata function allows you to retrieve driver specific data.
+The argument of this function is the watchdog device from which you want to
+retrieve data. The function returns the pointer to the driver specific data.
+
+To initialize the timeout field, the following function can be used::
+
+  extern int watchdog_init_timeout(struct watchdog_device *wdd,
+                                   unsigned int timeout_parm,
+                                   struct device *dev);
+
+The watchdog_init_timeout function allows you to initialize the timeout field
+using the module timeout parameter or by retrieving the timeout-sec property from
+the device tree (if the module timeout parameter is invalid). Best practice is
+to set the default timeout value as timeout value in the watchdog_device and
+then use this function to set the user "preferred" timeout value.
+This routine returns zero on success and a negative errno code for failure.
+
+To disable the watchdog on reboot, the user must call the following helper::
+
+  static inline void watchdog_stop_on_reboot(struct watchdog_device *wdd);
+
+To disable the watchdog when unregistering the watchdog, the user must call
+the following helper. Note that this will only stop the watchdog if the
+nowayout flag is not set.
+
+::
+
+  static inline void watchdog_stop_on_unregister(struct watchdog_device *wdd);
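+
+The helpers described so far are typically called together before
+registration. The following is only a sketch: struct foo_wdt, foo_wdt_setup
+and the two module parameters are hypothetical, and the info, ops and
+min/max timeout fields of the watchdog_device are assumed to have been
+filled in elsewhere::
+
+  #include <linux/module.h>
+  #include <linux/watchdog.h>
+
+  static unsigned int timeout;		/* 0 means "not given" */
+  module_param(timeout, uint, 0);
+
+  static bool nowayout = WATCHDOG_NOWAYOUT;
+  module_param(nowayout, bool, 0);
+
+  struct foo_wdt {
+	struct watchdog_device wdd;
+	void __iomem *base;		/* hypothetical register window */
+  };
+
+  static int foo_wdt_setup(struct foo_wdt *foo, struct device *dev)
+  {
+	struct watchdog_device *wdd = &foo->wdd;
+
+	wdd->parent = dev;
+	wdd->timeout = 30;		/* driver default, arbitrary here */
+
+	/* Override the default from the module parameter or device tree;
+	 * on failure the default set above stays in place. */
+	watchdog_init_timeout(wdd, timeout, dev);
+
+	watchdog_set_drvdata(wdd, foo);
+	watchdog_set_nowayout(wdd, nowayout);
+	watchdog_stop_on_reboot(wdd);
+	watchdog_stop_on_unregister(wdd);
+
+	return watchdog_register_device(wdd);
+  }
+
+An operation can later get back to the private structure with
+watchdog_get_drvdata(wdd).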
+
+To change the priority of the restart handler the following helper should be
+used::
+
+  void watchdog_set_restart_priority(struct watchdog_device *wdd, int priority);
+
+Users should follow these guidelines for setting the priority:
+
+* 0: should be called in last resort, has limited restart capabilities
+* 128: default restart handler, use if no other handler is expected to be
+  available, and/or if restart is sufficient to restart the entire system
+* 255: highest priority, will preempt all other restart handlers
+
+To raise a pretimeout notification, the following function should be used::
+
+  void watchdog_notify_pretimeout(struct watchdog_device *wdd)
+
+The function can be called in interrupt context. If the watchdog pretimeout
+governor framework (kbuild CONFIG_WATCHDOG_PRETIMEOUT_GOV symbol) is enabled,
+an action is taken by the preconfigured pretimeout governor assigned to
+the watchdog device. If the watchdog pretimeout governor framework is not
+enabled, watchdog_notify_pretimeout() prints a notification message to
+the kernel log buffer.
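+
+A sketch of how a driver might raise the notification from its pretimeout
+interrupt, assuming the handler was registered with the watchdog_device as
+its dev_id cookie (the handler name and the hardware acknowledge step are
+hypothetical)::
+
+  #include <linux/interrupt.h>
+  #include <linux/watchdog.h>
+
+  static irqreturn_t foo_wdt_pretimeout_irq(int irq, void *dev_id)
+  {
+	struct watchdog_device *wdd = dev_id;
+
+	/* Acknowledge the interrupt in the (hypothetical) hardware here. */
+
+	watchdog_notify_pretimeout(wdd);
+	return IRQ_HANDLED;
+  }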
diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
deleted file mode 100644
index 3a91ef5af044..000000000000
--- a/Documentation/watchdog/watchdog-kernel-api.txt
+++ /dev/null
@@ -1,305 +0,0 @@
-The Linux WatchDog Timer Driver Core kernel API.
-===============================================
-Last reviewed: 12-Feb-2013
-
-Wim Van Sebroeck <wim@iguana.be>
-
-Introduction
-------------
-This document does not describe what a WatchDog Timer (WDT) Driver or Device is.
-It also does not describe the API which can be used by user space to communicate
-with a WatchDog Timer. If you want to know this then please read the following
-file: Documentation/watchdog/watchdog-api.txt .
-
-So what does this document describe? It describes the API that can be used by
-WatchDog Timer Drivers that want to use the WatchDog Timer Driver Core
-Framework. This framework provides all interfacing towards user space so that
-the same code does not have to be reproduced each time. This also means that
-a watchdog timer driver then only needs to provide the different routines
-(operations) that control the watchdog timer (WDT).
-
-The API
--------
-Each watchdog timer driver that wants to use the WatchDog Timer Driver Core
-must #include <linux/watchdog.h> (you would have to do this anyway when
-writing a watchdog device driver). This include file contains following
-register/unregister routines:
-
-extern int watchdog_register_device(struct watchdog_device *);
-extern void watchdog_unregister_device(struct watchdog_device *);
-
-The watchdog_register_device routine registers a watchdog timer device.
-The parameter of this routine is a pointer to a watchdog_device structure.
-This routine returns zero on success and a negative errno code for failure.
-
-The watchdog_unregister_device routine deregisters a registered watchdog timer
-device. The parameter of this routine is the pointer to the registered
-watchdog_device structure.
-
-The watchdog subsystem includes an registration deferral mechanism,
-which allows you to register an watchdog as early as you wish during
-the boot process.
-
-The watchdog device structure looks like this:
-
-struct watchdog_device {
-	int id;
-	struct device *parent;
-	const struct attribute_group **groups;
-	const struct watchdog_info *info;
-	const struct watchdog_ops *ops;
-	const struct watchdog_governor *gov;
-	unsigned int bootstatus;
-	unsigned int timeout;
-	unsigned int pretimeout;
-	unsigned int min_timeout;
-	unsigned int max_timeout;
-	unsigned int min_hw_heartbeat_ms;
-	unsigned int max_hw_heartbeat_ms;
-	struct notifier_block reboot_nb;
-	struct notifier_block restart_nb;
-	void *driver_data;
-	struct watchdog_core_data *wd_data;
-	unsigned long status;
-	struct list_head deferred;
-};
-
-It contains following fields:
-* id: set by watchdog_register_device, id 0 is special. It has both a
-  /dev/watchdog0 cdev (dynamic major, minor 0) as well as the old
-  /dev/watchdog miscdev. The id is set automatically when calling
-  watchdog_register_device.
-* parent: set this to the parent device (or NULL) before calling
-  watchdog_register_device.
-* groups: List of sysfs attribute groups to create when creating the watchdog
-  device.
-* info: a pointer to a watchdog_info structure. This structure gives some
-  additional information about the watchdog timer itself. (Like it's unique name)
-* ops: a pointer to the list of watchdog operations that the watchdog supports.
-* gov: a pointer to the assigned watchdog device pretimeout governor or NULL.
-* timeout: the watchdog timer's timeout value (in seconds).
-  This is the time after which the system will reboot if user space does
-  not send a heartbeat request if WDOG_ACTIVE is set.
-* pretimeout: the watchdog timer's pretimeout value (in seconds).
-* min_timeout: the watchdog timer's minimum timeout value (in seconds).
-  If set, the minimum configurable value for 'timeout'.
-* max_timeout: the watchdog timer's maximum timeout value (in seconds),
-  as seen from userspace. If set, the maximum configurable value for
-  'timeout'. Not used if max_hw_heartbeat_ms is non-zero.
-* min_hw_heartbeat_ms: Hardware limit for minimum time between heartbeats,
-  in milli-seconds. This value is normally 0; it should only be provided
-  if the hardware can not tolerate lower intervals between heartbeats.
-* max_hw_heartbeat_ms: Maximum hardware heartbeat, in milli-seconds.
-  If set, the infrastructure will send heartbeats to the watchdog driver
-  if 'timeout' is larger than max_hw_heartbeat_ms, unless WDOG_ACTIVE
-  is set and userspace failed to send a heartbeat for at least 'timeout'
-  seconds. max_hw_heartbeat_ms must be set if a driver does not implement
-  the stop function.
-* reboot_nb: notifier block that is registered for reboot notifications, for
-  internal use only. If the driver calls watchdog_stop_on_reboot, watchdog core
-  will stop the watchdog on such notifications.
-* restart_nb: notifier block that is registered for machine restart, for
-  internal use only. If a watchdog is capable of restarting the machine, it
-  should define ops->restart. Priority can be changed through
-  watchdog_set_restart_priority.
-* bootstatus: status of the device after booting (reported with watchdog
-  WDIOF_* status bits).
-* driver_data: a pointer to the drivers private data of a watchdog device.
-  This data should only be accessed via the watchdog_set_drvdata and
-  watchdog_get_drvdata routines.
-* wd_data: a pointer to watchdog core internal data.
-* status: this field contains a number of status bits that give extra
-  information about the status of the device (Like: is the watchdog timer
-  running/active, or is the nowayout bit set).
-* deferred: entry in wtd_deferred_reg_list which is used to
-  register early initialized watchdogs.
-
-The list of watchdog operations is defined as:
-
-struct watchdog_ops {
-	struct module *owner;
-	/* mandatory operations */
-	int (*start)(struct watchdog_device *);
-	int (*stop)(struct watchdog_device *);
-	/* optional operations */
-	int (*ping)(struct watchdog_device *);
-	unsigned int (*status)(struct watchdog_device *);
-	int (*set_timeout)(struct watchdog_device *, unsigned int);
-	int (*set_pretimeout)(struct watchdog_device *, unsigned int);
-	unsigned int (*get_timeleft)(struct watchdog_device *);
-	int (*restart)(struct watchdog_device *);
-	long (*ioctl)(struct watchdog_device *, unsigned int, unsigned long);
-};
-
-It is important that you first define the module owner of the watchdog timer
-driver's operations. This module owner will be used to lock the module when
-the watchdog is active. (This to avoid a system crash when you unload the
-module and /dev/watchdog is still open).
-
-Some operations are mandatory and some are optional. The mandatory operations
-are:
-* start: this is a pointer to the routine that starts the watchdog timer
-  device.
-  The routine needs a pointer to the watchdog timer device structure as a
-  parameter. It returns zero on success or a negative errno code for failure.
-
-Not all watchdog timer hardware supports the same functionality. That's why
-all other routines/operations are optional. They only need to be provided if
-they are supported. These optional routines/operations are:
-* stop: with this routine the watchdog timer device is being stopped.
-  The routine needs a pointer to the watchdog timer device structure as a
-  parameter. It returns zero on success or a negative errno code for failure.
-  Some watchdog timer hardware can only be started and not be stopped. A
-  driver supporting such hardware does not have to implement the stop routine.
-  If a driver has no stop function, the watchdog core will set WDOG_HW_RUNNING
-  and start calling the driver's keepalive pings function after the watchdog
-  device is closed.
-  If a watchdog driver does not implement the stop function, it must set
-  max_hw_heartbeat_ms.
-* ping: this is the routine that sends a keepalive ping to the watchdog timer
-  hardware.
-  The routine needs a pointer to the watchdog timer device structure as a
-  parameter. It returns zero on success or a negative errno code for failure.
-  Most hardware that does not support this as a separate function uses the
-  start function to restart the watchdog timer hardware. And that's also what
-  the watchdog timer driver core does: to send a keepalive ping to the watchdog
-  timer hardware it will either use the ping operation (when available) or the
-  start operation (when the ping operation is not available).
-  (Note: the WDIOC_KEEPALIVE ioctl call will only be active when the
-  WDIOF_KEEPALIVEPING bit has been set in the option field on the watchdog's
-  info structure).
-* status: this routine checks the status of the watchdog timer device. The
-  status of the device is reported with watchdog WDIOF_* status flags/bits.
-  WDIOF_MAGICCLOSE and WDIOF_KEEPALIVEPING are reported by the watchdog core;
-  it is not necessary to report those bits from the driver. Also, if no status
-  function is provided by the driver, the watchdog core reports the status bits
-  provided in the bootstatus variable of struct watchdog_device.
-* set_timeout: this routine checks and changes the timeout of the watchdog
-  timer device. It returns 0 on success, -EINVAL for "parameter out of range"
-  and -EIO for "could not write value to the watchdog". On success this
-  routine should set the timeout value of the watchdog_device to the
-  achieved timeout value (which may be different from the requested one
-  because the watchdog does not necessarily have a 1 second resolution).
-  Drivers implementing max_hw_heartbeat_ms set the hardware watchdog heartbeat
-  to the minimum of timeout and max_hw_heartbeat_ms. Those drivers set the
-  timeout value of the watchdog_device either to the requested timeout value
-  (if it is larger than max_hw_heartbeat_ms), or to the achieved timeout value.
-  (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
-  watchdog's info structure).
-  If the watchdog driver does not have to perform any action but setting the
-  watchdog_device.timeout, this callback can be omitted.
-  If set_timeout is not provided but, WDIOF_SETTIMEOUT is set, the watchdog
-  infrastructure updates the timeout value of the watchdog_device internally
-  to the requested value.
-  If the pretimeout feature is used (WDIOF_PRETIMEOUT), then set_timeout must
-  also take care of checking if pretimeout is still valid and set up the timer
-  accordingly. This can't be done in the core without races, so it is the
-  duty of the driver.
-* set_pretimeout: this routine checks and changes the pretimeout value of
-  the watchdog. It is optional because not all watchdogs support pretimeout
-  notification. The timeout value is not an absolute time, but the number of
-  seconds before the actual timeout would happen. It returns 0 on success,
-  -EINVAL for "parameter out of range" and -EIO for "could not write value to
-  the watchdog". A value of 0 disables pretimeout notification.
-  (Note: the WDIOF_PRETIMEOUT needs to be set in the options field of the
-  watchdog's info structure).
-  If the watchdog driver does not have to perform any action but setting the
-  watchdog_device.pretimeout, this callback can be omitted. That means if
-  set_pretimeout is not provided but WDIOF_PRETIMEOUT is set, the watchdog
-  infrastructure updates the pretimeout value of the watchdog_device internally
-  to the requested value.
-* get_timeleft: this routines returns the time that's left before a reset.
-* restart: this routine restarts the machine. It returns 0 on success or a
-  negative errno code for failure.
-* ioctl: if this routine is present then it will be called first before we do
-  our own internal ioctl call handling. This routine should return -ENOIOCTLCMD
-  if a command is not supported. The parameters that are passed to the ioctl
-  call are: watchdog_device, cmd and arg.
-
-The status bits should (preferably) be set with the set_bit and clear_bit alike
-bit-operations. The status bits that are defined are:
-* WDOG_ACTIVE: this status bit indicates whether or not a watchdog timer device
-  is active or not from user perspective. User space is expected to send
-  heartbeat requests to the driver while this flag is set.
-* WDOG_NO_WAY_OUT: this bit stores the nowayout setting for the watchdog.
-  If this bit is set then the watchdog timer will not be able to stop.
-* WDOG_HW_RUNNING: Set by the watchdog driver if the hardware watchdog is
-  running. The bit must be set if the watchdog timer hardware can not be
-  stopped. The bit may also be set if the watchdog timer is running after
-  booting, before the watchdog device is opened. If set, the watchdog
-  infrastructure will send keepalives to the watchdog hardware while
-  WDOG_ACTIVE is not set.
-  Note: when you register the watchdog timer device with this bit set,
-  then opening /dev/watchdog will skip the start operation but send a keepalive
-  request instead.
-
-  To set the WDOG_NO_WAY_OUT status bit (before registering your watchdog
-  timer device) you can either:
-  * set it statically in your watchdog_device struct with
-	.status = WATCHDOG_NOWAYOUT_INIT_STATUS,
-    (this will set the value the same as CONFIG_WATCHDOG_NOWAYOUT) or
-  * use the following helper function:
-  static inline void watchdog_set_nowayout(struct watchdog_device *wdd, int nowayout)
-
-Note: The WatchDog Timer Driver Core supports the magic close feature and
-the nowayout feature. To use the magic close feature you must set the
-WDIOF_MAGICCLOSE bit in the options field of the watchdog's info structure.
-The nowayout feature will overrule the magic close feature.
-
-To get or set driver specific data the following two helper functions should be
-used:
-
-static inline void watchdog_set_drvdata(struct watchdog_device *wdd, void *data)
-static inline void *watchdog_get_drvdata(struct watchdog_device *wdd)
-
-The watchdog_set_drvdata function allows you to add driver specific data. The
-arguments of this function are the watchdog device where you want to add the
-driver specific data to and a pointer to the data itself.
-
-The watchdog_get_drvdata function allows you to retrieve driver specific data.
-The argument of this function is the watchdog device where you want to retrieve
-data from. The function returns the pointer to the driver specific data.
-
-To initialize the timeout field, the following function can be used:
-
-extern int watchdog_init_timeout(struct watchdog_device *wdd,
-                                  unsigned int timeout_parm, struct device *dev);
-
-The watchdog_init_timeout function allows you to initialize the timeout field
-using the module timeout parameter or by retrieving the timeout-sec property from
-the device tree (if the module timeout parameter is invalid). Best practice is
-to set the default timeout value as timeout value in the watchdog_device and
-then use this function to set the user "preferred" timeout value.
-This routine returns zero on success and a negative errno code for failure.
-
-To disable the watchdog on reboot, the user must call the following helper:
-
-static inline void watchdog_stop_on_reboot(struct watchdog_device *wdd);
-
-To disable the watchdog when unregistering the watchdog, the user must call
-the following helper. Note that this will only stop the watchdog if the
-nowayout flag is not set.
-
-static inline void watchdog_stop_on_unregister(struct watchdog_device *wdd);
-
-To change the priority of the restart handler the following helper should be
-used:
-
-void watchdog_set_restart_priority(struct watchdog_device *wdd, int priority);
-
-User should follow the following guidelines for setting the priority:
-* 0: should be called in last resort, has limited restart capabilities
-* 128: default restart handler, use if no other handler is expected to be
-  available, and/or if restart is sufficient to restart the entire system
-* 255: highest priority, will preempt all other restart handlers
-
-To raise a pretimeout notification, the following function should be used:
-
-void watchdog_notify_pretimeout(struct watchdog_device *wdd)
-
-The function can be called in the interrupt context. If watchdog pretimeout
-governor framework (kbuild CONFIG_WATCHDOG_PRETIMEOUT_GOV symbol) is enabled,
-an action is taken by a preconfigured pretimeout governor preassigned to
-the watchdog device. If watchdog pretimeout governor framework is not
-enabled, watchdog_notify_pretimeout() prints a notification message to
-the kernel log buffer.
diff --git a/Documentation/watchdog/watchdog-parameters.rst b/Documentation/watchdog/watchdog-parameters.rst
new file mode 100644
index 000000000000..b121caae7798
--- /dev/null
+++ b/Documentation/watchdog/watchdog-parameters.rst
@@ -0,0 +1,736 @@
+==========================
+WatchDog Module Parameters
+==========================
+
+This file provides information on the module parameters of many of
+the Linux watchdog drivers.  Watchdog driver parameter specs should
+be listed here unless the driver has its own driver-specific information
+file.
+
+See Documentation/admin-guide/kernel-parameters.rst for information on
+providing kernel parameters for builtin drivers versus loadable
+modules.
+
+-------------------------------------------------
+
+acquirewdt:
+    wdt_stop:
+	Acquire WDT 'stop' io port (default 0x43)
+    wdt_start:
+	Acquire WDT 'start' io port (default 0x443)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+advantechwdt:
+    wdt_stop:
+	Advantech WDT 'stop' io port (default 0x443)
+    wdt_start:
+	Advantech WDT 'start' io port (default 0x443)
+    timeout:
+	Watchdog timeout in seconds. 1<= timeout <=63, default=60.
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+alim1535_wdt:
+    timeout:
+	Watchdog timeout in seconds. (0 < timeout < 18000, default=60)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+alim7101_wdt:
+    timeout:
+	Watchdog timeout in seconds. (1<=timeout<=3600, default=30)
+    use_gpio:
+	Use the gpio watchdog (required by old cobalt boards).
+	default=0/off/no
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+ar7_wdt:
+    margin:
+	Watchdog margin in seconds (default=60)
+    nowayout:
+	Disable watchdog shutdown on close
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+armada_37xx_wdt:
+    timeout:
+	Watchdog timeout in seconds. (default=120)
+    nowayout:
+	Disable watchdog shutdown on close
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+at91rm9200_wdt:
+    wdt_time:
+	Watchdog time in seconds. (default=5)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+at91sam9_wdt:
+    heartbeat:
+	Watchdog heartbeats in seconds. (default = 15)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+bcm47xx_wdt:
+    wdt_time:
+	Watchdog time in seconds. (default=30)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+coh901327_wdt:
+    margin:
+	Watchdog margin in seconds (default 60s)
+
+-------------------------------------------------
+
+cpu5wdt:
+    port:
+	base address of watchdog card, default is 0x91
+    verbose:
+	be verbose, default is 0 (no)
+    ticks:
+	count down ticks, default is 10000
+
+-------------------------------------------------
+
+cpwd:
+    wd0_timeout:
+	Default watchdog0 timeout in 1/10secs
+    wd1_timeout:
+	Default watchdog1 timeout in 1/10secs
+    wd2_timeout:
+	Default watchdog2 timeout in 1/10secs
+
+-------------------------------------------------
+
+da9052wdt:
+    timeout:
+	Watchdog timeout in seconds. 2<= timeout <=131, default=2.048s
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+davinci_wdt:
+    heartbeat:
+	Watchdog heartbeat period in seconds from 1 to 600, default 60
+
+-------------------------------------------------
+
+ebc-c384_wdt:
+    timeout:
+	Watchdog timeout in seconds. (1<=timeout<=15300, default=60)
+    nowayout:
+	Watchdog cannot be stopped once started
+
+-------------------------------------------------
+
+ep93xx_wdt:
+    nowayout:
+	Watchdog cannot be stopped once started
+    timeout:
+	Watchdog timeout in seconds. (1<=timeout<=3600, default=TBD)
+
+-------------------------------------------------
+
+eurotechwdt:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+    io:
+	Eurotech WDT io port (default=0x3f0)
+    irq:
+	Eurotech WDT irq (default=10)
+    ev:
+	Eurotech WDT event type (default is `int`)
+
+-------------------------------------------------
+
+gef_wdt:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+geodewdt:
+    timeout:
+	Watchdog timeout in seconds. 1<= timeout <=131, default=60.
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+i6300esb:
+    heartbeat:
+	Watchdog heartbeat in seconds. (1<heartbeat<2046, default=30)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+iTCO_wdt:
+    heartbeat:
+	Watchdog heartbeat in seconds.
+	(2<heartbeat<39 (TCO v1) or 613 (TCO v2), default=30)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+iTCO_vendor_support:
+    vendorsupport:
+	iTCO vendor specific support mode, default=0 (none),
+	1=SuperMicro Pent3, 2=SuperMicro Pent4+, 911=Broken SMI BIOS
+
+-------------------------------------------------
+
+ib700wdt:
+    timeout:
+	Watchdog timeout in seconds. 0<= timeout <=30, default=30.
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+ibmasr:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+imx2_wdt:
+    timeout:
+	Watchdog timeout in seconds (default 60 s)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+indydog:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+iop_wdt:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+it8712f_wdt:
+    margin:
+	Watchdog margin in seconds (default 60)
+    nowayout:
+	Disable watchdog shutdown on close
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+it87_wdt:
+    nogameport:
+	Forbid the activation of game port, default=0
+    nocir:
+	Forbid the use of CIR (workaround for some buggy setups); set to 1 if
+	system resets despite watchdog daemon running, default=0
+    exclusive:
+	Watchdog exclusive device open, default=1
+    timeout:
+	Watchdog timeout in seconds, default=60
+    testmode:
+	Watchdog test mode (1 = no reboot), default=0
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+ixp4xx_wdt:
+    heartbeat:
+	Watchdog heartbeat in seconds (default 60s)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+ks8695_wdt:
+    wdt_time:
+	Watchdog time in seconds. (default=5)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+machzwd:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+    action:
+	after watchdog resets, generate:
+	0 = RESET(*)  1 = SMI  2 = NMI  3 = SCI
+
+-------------------------------------------------
+
+max63xx_wdt:
+    heartbeat:
+	Watchdog heartbeat period in seconds from 1 to 60, default 60
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+    nodelay:
+	Force selection of a timeout setting without initial delay
+	(max6373/74 only, default=0)
+
+-------------------------------------------------
+
+mixcomwd:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+mpc8xxx_wdt:
+    timeout:
+	Watchdog timeout in ticks. (0<timeout<65536, default=65535)
+    reset:
+	Watchdog Interrupt/Reset Mode. 0 = interrupt, 1 = reset
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+mv64x60_wdt:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+ni903x_wdt:
+    timeout:
+	Initial watchdog timeout in seconds (0<timeout<516, default=60)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+nic7018_wdt:
+    timeout:
+	Initial watchdog timeout in seconds (0<timeout<464, default=80)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+nuc900_wdt:
+    heartbeat:
+	Watchdog heartbeats in seconds.
+	(default = 15)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+omap_wdt:
+    timer_margin:
+	initial watchdog timeout (in seconds)
+    early_enable:
+	Watchdog is started on module insertion (default=0)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+orion_wdt:
+    heartbeat:
+	Initial watchdog heartbeat in seconds
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+pc87413_wdt:
+    io:
+	pc87413 WDT I/O port (default: io).
+    timeout:
+	Watchdog timeout in minutes (default=timeout).
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+pika_wdt:
+    heartbeat:
+	Watchdog heartbeats in seconds. (default = 15)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+pnx4008_wdt:
+    heartbeat:
+	Watchdog heartbeat period in seconds from 1 to 60, default 19
+    nowayout:
+	Set to 1 to keep watchdog running after device release
+
+-------------------------------------------------
+
+pnx833x_wdt:
+    timeout:
+	Watchdog timeout in ticks of the 68 MHz clock, default=2040000000 (30 seconds)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+    start_enabled:
+	Watchdog is started on module insertion (default=1)
+
+-------------------------------------------------
+
+rc32434_wdt:
+    timeout:
+	Watchdog timeout value, in seconds (default=20)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+riowd:
+    riowd_timeout:
+	Watchdog timeout in minutes (default=1)
+
+-------------------------------------------------
+
+s3c2410_wdt:
+    tmr_margin:
+	Watchdog tmr_margin in seconds. (default=15)
+    tmr_atboot:
+	Watchdog is started at boot time if set to 1, default=0
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+    soft_noboot:
+	Watchdog action, set to 1 to ignore reboots, 0 to reboot
+    debug:
+	Watchdog debug, set to >1 for debug (default 0)
+
+-------------------------------------------------
+
+sa1100_wdt:
+    margin:
+	Watchdog margin in seconds (default 60s)
+
+-------------------------------------------------
+
+sb_wdog:
+    timeout:
+	Watchdog timeout in microseconds (max/default 8388607 or 8.3ish secs)
+
+-------------------------------------------------
+
+sbc60xxwdt:
+    wdt_stop:
+	SBC60xx WDT 'stop' io port (default 0x45)
+    wdt_start:
+	SBC60xx WDT 'start' io port (default 0x443)
+    timeout:
+	Watchdog timeout in seconds. (1<=timeout<=3600, default=30)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+sbc7240_wdt:
+    timeout:
+	Watchdog timeout in seconds. (1<=timeout<=255, default=30)
+    nowayout:
+	Disable watchdog when closing device file
+
+-------------------------------------------------
+
+sbc8360:
+    timeout:
+	Index into timeout table (0-63) (default=27 (60s))
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+sbc_epx_c3:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+sbc_fitpc2_wdt:
+    margin:
+	Watchdog margin in seconds (default 60s)
+    nowayout:
+	Watchdog cannot be stopped once started
+
+-------------------------------------------------
+
+sbsa_gwdt:
+    timeout:
+	Watchdog timeout in seconds. (default 10s)
+    action:
+	Watchdog action at the first stage timeout,
+	set to 0 to ignore, 1 to panic. (default=0)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+sc1200wdt:
+    isapnp:
+	When set to 0 driver ISA PnP support will be disabled (default=1)
+    io:
+	io port
+    timeout:
+	range is 0-255 minutes, default is 1
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+sc520_wdt:
+    timeout:
+	Watchdog timeout in seconds. (1 <= timeout <= 3600, default=30)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+sch311x_wdt:
+    force_id:
+	Override the detected device ID
+    therm_trip:
+	Should a ThermTrip trigger the reset generator
+    timeout:
+	Watchdog timeout in seconds. 1<= timeout <=15300, default=60
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+scx200_wdt:
+    margin:
+	Watchdog margin in seconds
+    nowayout:
+	Disable watchdog shutdown on close
+
+-------------------------------------------------
+
+shwdt:
+    clock_division_ratio:
+	Clock division ratio. Valid ranges are from 0x5 (1.31ms)
+	to 0x7 (5.25ms). (default=7)
+    heartbeat:
+	Watchdog heartbeat in seconds. (1 <= heartbeat <= 3600, default=30)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+smsc37b787_wdt:
+    timeout:
+	range is 1-255 units, default is 60
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+softdog:
+    soft_margin:
+	Watchdog soft_margin in seconds.
+	(0 < soft_margin < 65536, default=60)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+    soft_noboot:
+	Softdog action, set to 1 to ignore reboots, 0 to reboot
+	(default=0)
+
+-------------------------------------------------
+
+stmp3xxx_wdt:
+    heartbeat:
+	Watchdog heartbeat period in seconds from 1 to 4194304, default 19
+
+-------------------------------------------------
+
+tegra_wdt:
+    heartbeat:
+	Watchdog heartbeats in seconds. (default = 120)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+ts72xx_wdt:
+    timeout:
+	Watchdog timeout in seconds. (1 <= timeout <= 8, default=8)
+    nowayout:
+	Disable watchdog shutdown on close
+
+-------------------------------------------------
+
+twl4030_wdt:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+txx9wdt:
+    timeout:
+	Watchdog timeout in seconds. (0<timeout<N, default=60)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+uniphier_wdt:
+    timeout:
+	Watchdog timeout in power of two seconds.
+	(1 <= timeout <= 128, default=64)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+w83627hf_wdt:
+    wdt_io:
+	w83627hf/thf WDT io port (default 0x2E)
+    timeout:
+	Watchdog timeout in seconds. 1 <= timeout <= 255, default=60.
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+w83877f_wdt:
+    timeout:
+	Watchdog timeout in seconds. (1<=timeout<=3600, default=30)
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+w83977f_wdt:
+    timeout:
+	Watchdog timeout in seconds (15..7635, default=45)
+    testmode:
+	Watchdog testmode (1 = no reboot), default=0
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+wafer5823wdt:
+    timeout:
+	Watchdog timeout in seconds. 1 <= timeout <= 255, default=60.
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+wdt285:
+    soft_margin:
+	Watchdog timeout in seconds (default=60)
+
+-------------------------------------------------
+
+wdt977:
+    timeout:
+	Watchdog timeout in seconds (60..15300, default=60)
+    testmode:
+	Watchdog testmode (1 = no reboot), default=0
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+wm831x_wdt:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+wm8350_wdt:
+    nowayout:
+	Watchdog cannot be stopped once started
+	(default=kernel config parameter)
+
+-------------------------------------------------
+
+sun4v_wdt:
+    timeout_ms:
+	Watchdog timeout in milliseconds (1..180000, default=60000)
+    nowayout:
+	Watchdog cannot be stopped once started
diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
deleted file mode 100644
index 0b88e333f9e1..000000000000
--- a/Documentation/watchdog/watchdog-parameters.txt
+++ /dev/null
@@ -1,410 +0,0 @@
-This file provides information on the module parameters of many of
-the Linux watchdog drivers.  Watchdog driver parameter specs should
-be listed here unless the driver has its own driver-specific information
-file.
-
-
-See Documentation/admin-guide/kernel-parameters.rst for information on
-providing kernel parameters for builtin drivers versus loadable
-modules.
-
-
--------------------------------------------------
-acquirewdt:
-wdt_stop: Acquire WDT 'stop' io port (default 0x43)
-wdt_start: Acquire WDT 'start' io port (default 0x443)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-advantechwdt:
-wdt_stop: Advantech WDT 'stop' io port (default 0x443)
-wdt_start: Advantech WDT 'start' io port (default 0x443)
-timeout: Watchdog timeout in seconds. 1<= timeout <=63, default=60.
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-alim1535_wdt:
-timeout: Watchdog timeout in seconds. (0 < timeout < 18000, default=60
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-alim7101_wdt:
-timeout: Watchdog timeout in seconds. (1<=timeout<=3600, default=30
-use_gpio: Use the gpio watchdog (required by old cobalt boards).
-	default=0/off/no
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-ar7_wdt:
-margin: Watchdog margin in seconds (default=60)
-nowayout: Disable watchdog shutdown on close
-	(default=kernel config parameter)
--------------------------------------------------
-armada_37xx_wdt:
-timeout: Watchdog timeout in seconds. (default=120)
-nowayout: Disable watchdog shutdown on close
-	(default=kernel config parameter)
--------------------------------------------------
-at91rm9200_wdt:
-wdt_time: Watchdog time in seconds. (default=5)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-at91sam9_wdt:
-heartbeat: Watchdog heartbeats in seconds. (default = 15)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-bcm47xx_wdt:
-wdt_time: Watchdog time in seconds. (default=30)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-coh901327_wdt:
-margin: Watchdog margin in seconds (default 60s)
--------------------------------------------------
-cpu5wdt:
-port: base address of watchdog card, default is 0x91
-verbose: be verbose, default is 0 (no)
-ticks: count down ticks, default is 10000
--------------------------------------------------
-cpwd:
-wd0_timeout: Default watchdog0 timeout in 1/10secs
-wd1_timeout: Default watchdog1 timeout in 1/10secs
-wd2_timeout: Default watchdog2 timeout in 1/10secs
--------------------------------------------------
-da9052wdt:
-timeout: Watchdog timeout in seconds. 2<= timeout <=131, default=2.048s
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-davinci_wdt:
-heartbeat: Watchdog heartbeat period in seconds from 1 to 600, default 60
--------------------------------------------------
-ebc-c384_wdt:
-timeout: Watchdog timeout in seconds. (1<=timeout<=15300, default=60)
-nowayout: Watchdog cannot be stopped once started
--------------------------------------------------
-ep93xx_wdt:
-nowayout: Watchdog cannot be stopped once started
-timeout: Watchdog timeout in seconds. (1<=timeout<=3600, default=TBD)
--------------------------------------------------
-eurotechwdt:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
-io: Eurotech WDT io port (default=0x3f0)
-irq: Eurotech WDT irq (default=10)
-ev: Eurotech WDT event type (default is `int')
--------------------------------------------------
-gef_wdt:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-geodewdt:
-timeout: Watchdog timeout in seconds. 1<= timeout <=131, default=60.
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-i6300esb:
-heartbeat: Watchdog heartbeat in seconds. (1<heartbeat<2046, default=30)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-iTCO_wdt:
-heartbeat: Watchdog heartbeat in seconds.
-	(2<heartbeat<39 (TCO v1) or 613 (TCO v2), default=30)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-iTCO_vendor_support:
-vendorsupport: iTCO vendor specific support mode, default=0 (none),
-	1=SuperMicro Pent3, 2=SuperMicro Pent4+, 911=Broken SMI BIOS
--------------------------------------------------
-ib700wdt:
-timeout: Watchdog timeout in seconds. 0<= timeout <=30, default=30.
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-ibmasr:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-imx2_wdt:
-timeout: Watchdog timeout in seconds (default 60 s)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-indydog:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-iop_wdt:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-it8712f_wdt:
-margin: Watchdog margin in seconds (default 60)
-nowayout: Disable watchdog shutdown on close
-	(default=kernel config parameter)
--------------------------------------------------
-it87_wdt:
-nogameport: Forbid the activation of game port, default=0
-nocir: Forbid the use of CIR (workaround for some buggy setups); set to 1 if
-system resets despite watchdog daemon running, default=0
-exclusive: Watchdog exclusive device open, default=1
-timeout: Watchdog timeout in seconds, default=60
-testmode: Watchdog test mode (1 = no reboot), default=0
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-ixp4xx_wdt:
-heartbeat: Watchdog heartbeat in seconds (default 60s)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-ks8695_wdt:
-wdt_time: Watchdog time in seconds. (default=5)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-machzwd:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
-action: after watchdog resets, generate:
-	0 = RESET(*)  1 = SMI  2 = NMI  3 = SCI
--------------------------------------------------
-max63xx_wdt:
-heartbeat: Watchdog heartbeat period in seconds from 1 to 60, default 60
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
-nodelay: Force selection of a timeout setting without initial delay
-	(max6373/74 only, default=0)
--------------------------------------------------
-mixcomwd:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-mpc8xxx_wdt:
-timeout: Watchdog timeout in ticks. (0<timeout<65536, default=65535)
-reset: Watchdog Interrupt/Reset Mode. 0 = interrupt, 1 = reset
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-mv64x60_wdt:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-ni903x_wdt:
-timeout: Initial watchdog timeout in seconds (0<timeout<516, default=60)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-nic7018_wdt:
-timeout: Initial watchdog timeout in seconds (0<timeout<464, default=80)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-nuc900_wdt:
-heartbeat: Watchdog heartbeats in seconds.
-	(default = 15)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-omap_wdt:
-timer_margin: initial watchdog timeout (in seconds)
-early_enable: Watchdog is started on module insertion (default=0
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-orion_wdt:
-heartbeat: Initial watchdog heartbeat in seconds
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-pc87413_wdt:
-io: pc87413 WDT I/O port (default: io).
-timeout: Watchdog timeout in minutes (default=timeout).
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-pika_wdt:
-heartbeat: Watchdog heartbeats in seconds. (default = 15)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-pnx4008_wdt:
-heartbeat: Watchdog heartbeat period in seconds from 1 to 60, default 19
-nowayout: Set to 1 to keep watchdog running after device release
--------------------------------------------------
-pnx833x_wdt:
-timeout: Watchdog timeout in Mhz. (68Mhz clock), default=2040000000 (30 seconds)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
-start_enabled: Watchdog is started on module insertion (default=1)
--------------------------------------------------
-rc32434_wdt:
-timeout: Watchdog timeout value, in seconds (default=20)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-riowd:
-riowd_timeout: Watchdog timeout in minutes (default=1)
--------------------------------------------------
-s3c2410_wdt:
-tmr_margin: Watchdog tmr_margin in seconds. (default=15)
-tmr_atboot: Watchdog is started at boot time if set to 1, default=0
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
-soft_noboot: Watchdog action, set to 1 to ignore reboots, 0 to reboot
-debug: Watchdog debug, set to >1 for debug, (default 0)
--------------------------------------------------
-sa1100_wdt:
-margin: Watchdog margin in seconds (default 60s)
--------------------------------------------------
-sb_wdog:
-timeout: Watchdog timeout in microseconds (max/default 8388607 or 8.3ish secs)
--------------------------------------------------
-sbc60xxwdt:
-wdt_stop: SBC60xx WDT 'stop' io port (default 0x45)
-wdt_start: SBC60xx WDT 'start' io port (default 0x443)
-timeout: Watchdog timeout in seconds. (1<=timeout<=3600, default=30)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-sbc7240_wdt:
-timeout: Watchdog timeout in seconds. (1<=timeout<=255, default=30)
-nowayout: Disable watchdog when closing device file
--------------------------------------------------
-sbc8360:
-timeout: Index into timeout table (0-63) (default=27 (60s))
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-sbc_epx_c3:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-sbc_fitpc2_wdt:
-margin: Watchdog margin in seconds (default 60s)
-nowayout: Watchdog cannot be stopped once started
--------------------------------------------------
-sbsa_gwdt:
-timeout: Watchdog timeout in seconds. (default 10s)
-action: Watchdog action at the first stage timeout,
-	set to 0 to ignore, 1 to panic. (default=0)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-sc1200wdt:
-isapnp: When set to 0 driver ISA PnP support will be disabled (default=1)
-io: io port
-timeout: range is 0-255 minutes, default is 1
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-sc520_wdt:
-timeout: Watchdog timeout in seconds. (1 <= timeout <= 3600, default=30)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-sch311x_wdt:
-force_id: Override the detected device ID
-therm_trip: Should a ThermTrip trigger the reset generator
-timeout: Watchdog timeout in seconds. 1<= timeout <=15300, default=60
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-scx200_wdt:
-margin: Watchdog margin in seconds
-nowayout: Disable watchdog shutdown on close
--------------------------------------------------
-shwdt:
-clock_division_ratio: Clock division ratio. Valid ranges are from 0x5 (1.31ms)
-	to 0x7 (5.25ms). (default=7)
-heartbeat: Watchdog heartbeat in seconds. (1 <= heartbeat <= 3600, default=30
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-smsc37b787_wdt:
-timeout: range is 1-255 units, default is 60
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-softdog:
-soft_margin: Watchdog soft_margin in seconds.
-	(0 < soft_margin < 65536, default=60)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
-soft_noboot: Softdog action, set to 1 to ignore reboots, 0 to reboot
-	(default=0)
--------------------------------------------------
-stmp3xxx_wdt:
-heartbeat: Watchdog heartbeat period in seconds from 1 to 4194304, default 19
--------------------------------------------------
-tegra_wdt:
-heartbeat: Watchdog heartbeats in seconds. (default = 120)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-ts72xx_wdt:
-timeout: Watchdog timeout in seconds. (1 <= timeout <= 8, default=8)
-nowayout: Disable watchdog shutdown on close
--------------------------------------------------
-twl4030_wdt:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-txx9wdt:
-timeout: Watchdog timeout in seconds. (0<timeout<N, default=60)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-uniphier_wdt:
-timeout: Watchdog timeout in power of two seconds.
-	(1 <= timeout <= 128, default=64)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-w83627hf_wdt:
-wdt_io: w83627hf/thf WDT io port (default 0x2E)
-timeout: Watchdog timeout in seconds. 1 <= timeout <= 255, default=60.
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-w83877f_wdt:
-timeout: Watchdog timeout in seconds. (1<=timeout<=3600, default=30)
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-w83977f_wdt:
-timeout: Watchdog timeout in seconds (15..7635), default=45)
-testmode: Watchdog testmode (1 = no reboot), default=0
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-wafer5823wdt:
-timeout: Watchdog timeout in seconds. 1 <= timeout <= 255, default=60.
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-wdt285:
-soft_margin: Watchdog timeout in seconds (default=60)
--------------------------------------------------
-wdt977:
-timeout: Watchdog timeout in seconds (60..15300, default=60)
-testmode: Watchdog testmode (1 = no reboot), default=0
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-wm831x_wdt:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-wm8350_wdt:
-nowayout: Watchdog cannot be stopped once started
-	(default=kernel config parameter)
--------------------------------------------------
-sun4v_wdt:
-timeout_ms: Watchdog timeout in milliseconds 1..180000, default=60000)
-nowayout: Watchdog cannot be stopped once started
--------------------------------------------------
diff --git a/Documentation/watchdog/watchdog-pm.rst b/Documentation/watchdog/watchdog-pm.rst
new file mode 100644
index 000000000000..646e1f28f31f
--- /dev/null
+++ b/Documentation/watchdog/watchdog-pm.rst
@@ -0,0 +1,22 @@
+===============================================
+The Linux WatchDog Timer Power Management Guide
+===============================================
+
+Last reviewed: 17-Dec-2018
+
+Wolfram Sang <wsa+renesas@sang-engineering.com>
+
+Introduction
+------------
+This document states rules about watchdog devices and their power management
+handling to ensure a uniform behaviour for Linux systems.
+
+
+Ping on resume
+--------------
+On resume, a watchdog timer shall be reset to its selected value to give
+userspace enough time to resume. [1] [2]
+
+[1] https://patchwork.kernel.org/patch/10252209/
+
+[2] https://patchwork.kernel.org/patch/10711625/
diff --git a/Documentation/watchdog/watchdog-pm.txt b/Documentation/watchdog/watchdog-pm.txt
deleted file mode 100644
index 7a4dd46e0d24..000000000000
--- a/Documentation/watchdog/watchdog-pm.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-The Linux WatchDog Timer Power Management Guide
-===============================================
-Last reviewed: 17-Dec-2018
-
-Wolfram Sang <wsa+renesas@sang-engineering.com>
-
-Introduction
-------------
-This document states rules about watchdog devices and their power management
-handling to ensure a uniform behaviour for Linux systems.
-
-
-Ping on resume
---------------
-On resume, a watchdog timer shall be reset to its selected value to give
-userspace enough time to resume. [1] [2]
-
-[1] https://patchwork.kernel.org/patch/10252209/
-[2] https://patchwork.kernel.org/patch/10711625/
diff --git a/Documentation/watchdog/wdt.rst b/Documentation/watchdog/wdt.rst
new file mode 100644
index 000000000000..d97b0361535b
--- /dev/null
+++ b/Documentation/watchdog/wdt.rst
@@ -0,0 +1,63 @@
+============================================================
+WDT Watchdog Timer Interfaces For The Linux Operating System
+============================================================
+
+Last Reviewed: 10/05/2007
+
+Alan Cox <alan@lxorguk.ukuu.org.uk>
+
+	- ICS	WDT501-P
+	- ICS	WDT501-P (no fan tachometer)
+	- ICS	WDT500-P
+
+All the interfaces provide /dev/watchdog, which when open must be written
+to within a timeout or the machine will reboot. Each write delays the reboot
+time another timeout. In the case of the software watchdog the ability to
+reboot will depend on the state of the machines and interrupts. The hardware
+boards physically pull the machine down off their own onboard timers and
+will reboot from almost anything.
+
+A second temperature monitoring interface is available on the WDT501P cards.
+This provides /dev/temperature. This is the machine internal temperature in
+degrees Fahrenheit. Each read returns a single byte giving the temperature.
+
+The third interface logs kernel messages on additional alert events.
+
+The ICS ISA-bus wdt card cannot be safely probed for. Instead you need to
+pass IO address and IRQ boot parameters.  E.g.::
+
+	wdt.io=0x240 wdt.irq=11
+
+Other "wdt" driver parameters are:
+
+	===========	======================================================
+	heartbeat	Watchdog heartbeat in seconds (default 60)
+	nowayout	Watchdog cannot be stopped once started (kernel
+			build parameter)
+	tachometer	WDT501-P Fan Tachometer support (0=disable, default=0)
+	type		WDT501-P Card type (500 or 501, default=500)
+	===========	======================================================
+
+Features
+--------
+
+================   =======	   =======
+		   WDT501P	   WDT500P
+================   =======	   =======
+Reboot Timer	   X               X
+External Reboot	   X	           X
+I/O Port Monitor   o		   o
+Temperature	   X		   o
+Fan Speed          X		   o
+Power Under	   X               o
+Power Over         X               o
+Overheat           X               o
+================   =======	   =======
+
+The external event interfaces on the WDT boards are not currently supported.
+Minor numbers are, however, allocated for them.
+
+
+Example Watchdog Driver:
+
+	see samples/watchdog/watchdog-simple.c
diff --git a/Documentation/watchdog/wdt.txt b/Documentation/watchdog/wdt.txt
deleted file mode 100644
index ed2f0b860869..000000000000
--- a/Documentation/watchdog/wdt.txt
+++ /dev/null
@@ -1,50 +0,0 @@
-Last Reviewed: 10/05/2007
-
-	WDT Watchdog Timer Interfaces For The Linux Operating System
-		Alan Cox <alan@lxorguk.ukuu.org.uk>
-
-	ICS	WDT501-P
-	ICS	WDT501-P (no fan tachometer)
-	ICS	WDT500-P
-
-All the interfaces provide /dev/watchdog, which when open must be written
-to within a timeout or the machine will reboot. Each write delays the reboot
-time another timeout. In the case of the software watchdog the ability to
-reboot will depend on the state of the machines and interrupts. The hardware
-boards physically pull the machine down off their own onboard timers and
-will reboot from almost anything.
-
-A second temperature monitoring interface is available on the WDT501P cards.
-This provides /dev/temperature. This is the machine internal temperature in
-degrees Fahrenheit. Each read returns a single byte giving the temperature.
-
-The third interface logs kernel messages on additional alert events.
-
-The ICS ISA-bus wdt card cannot be safely probed for. Instead you need to
-pass IO address and IRQ boot parameters.  E.g.:
-	wdt.io=0x240 wdt.irq=11
-
-Other "wdt" driver parameters are:
-	heartbeat	Watchdog heartbeat in seconds (default 60)
-	nowayout	Watchdog cannot be stopped once started (kernel
-				build parameter)
-	tachometer	WDT501-P Fan Tachometer support (0=disable, default=0)
-	type		WDT501-P Card type (500 or 501, default=500)
-
-Features
---------
-		WDT501P		WDT500P
-Reboot Timer	   X               X
-External Reboot	   X	           X
-I/O Port Monitor   o		   o
-Temperature	   X		   o
-Fan Speed          X		   o
-Power Under	   X               o
-Power Over         X               o
-Overheat           X               o
-
-The external event interfaces on the WDT boards are not currently supported.
-Minor numbers are however allocated for it.
-
-
-Example Watchdog Driver:  see samples/watchdog/watchdog-simple.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 0db7f12439f7..ab7949a7782f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7009,7 +7009,7 @@ F:	drivers/media/usb/hdpvr/
 HEWLETT PACKARD ENTERPRISE ILO NMI WATCHDOG DRIVER
 M:	Jerry Hoemann <jerry.hoemann@hpe.com>
 S:	Supported
-F:	Documentation/watchdog/hpwdt.txt
+F:	Documentation/watchdog/hpwdt.rst
 F:	drivers/watchdog/hpwdt.c
 
 HEWLETT-PACKARD SMART ARRAY RAID DRIVER (hpsa)
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index ffe754539f5a..6cad0b33d7ad 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -18,7 +18,7 @@ menuconfig WATCHDOG
 	  reboot the machine) and a driver for hardware watchdog boards, which
 	  are more robust and can also keep track of the temperature inside
 	  your computer. For details, read
-	  <file:Documentation/watchdog/watchdog-api.txt> in the kernel source.
+	  <file:Documentation/watchdog/watchdog-api.rst> in the kernel source.
 
 	  The watchdog is usually used together with the watchdog daemon
 	  which is available from
@@ -1870,7 +1870,7 @@ config BOOKE_WDT
 	  Watchdog driver for PowerPC Book-E chips, such as the Freescale
 	  MPC85xx SOCs and the IBM PowerPC 440.
 
-	  Please see Documentation/watchdog/watchdog-api.txt for
+	  Please see Documentation/watchdog/watchdog-api.rst for
 	  more information.
 
 config BOOKE_WDT_DEFAULT_TIMEOUT
@@ -2019,7 +2019,7 @@ config PCWATCHDOG
 	  This card simply watches your kernel to make sure it doesn't freeze,
 	  and if it does, it reboots your computer after a certain amount of
 	  time. This driver is like the WDT501 driver but for different
-	  hardware. Please read <file:Documentation/watchdog/pcwd-watchdog.txt>. The PC
+	  hardware. Please read <file:Documentation/watchdog/pcwd-watchdog.rst>. The PC
 	  watchdog cards can be ordered from <http://www.berkprod.com/>.
 
 	  To compile this driver as a module, choose M here: the
diff --git a/drivers/watchdog/smsc37b787_wdt.c b/drivers/watchdog/smsc37b787_wdt.c
index 13c817ea1d6a..f5713030d0f7 100644
--- a/drivers/watchdog/smsc37b787_wdt.c
+++ b/drivers/watchdog/smsc37b787_wdt.c
@@ -36,7 +36,7 @@
  *  mknod /dev/watchdog c 10 130
  *
  * For an example userspace keep-alive daemon, see:
- *   Documentation/watchdog/wdt.txt
+ *   Documentation/watchdog/wdt.rst
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-- 
cgit v1.2.3-59-g8ed1b


From d223884089734cc637c4e5458870d69f6ded9f89 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:53:02 -0300
Subject: docs: xilinx: convert eemi.txt to eemi.rst

This is a very trivial conversion: adjust the title markup
and add a few literal block markups to produce a better
visual when parsed and avoid warnings.

As newer documents related to xilinx could be added in the future,
create a new index file for it.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/xilinx/eemi.rst  | 67 ++++++++++++++++++++++++++++++++++++++++++
 Documentation/xilinx/eemi.txt  | 67 ------------------------------------------
 Documentation/xilinx/index.rst | 17 +++++++++++
 3 files changed, 84 insertions(+), 67 deletions(-)
 create mode 100644 Documentation/xilinx/eemi.rst
 delete mode 100644 Documentation/xilinx/eemi.txt
 create mode 100644 Documentation/xilinx/index.rst

diff --git a/Documentation/xilinx/eemi.rst b/Documentation/xilinx/eemi.rst
new file mode 100644
index 000000000000..9dcbc6f18d75
--- /dev/null
+++ b/Documentation/xilinx/eemi.rst
@@ -0,0 +1,67 @@
+====================================
+Xilinx Zynq MPSoC EEMI Documentation
+====================================
+
+Xilinx Zynq MPSoC Firmware Interface
+-------------------------------------
+The zynqmp-firmware node describes the interface to platform firmware.
+ZynqMP has an interface to communicate with secure firmware. The firmware
+driver provides an interface to the firmware APIs, which any driver can
+use to communicate with the PMC (Platform Management Controller).
+
+Embedded Energy Management Interface (EEMI)
+----------------------------------------------
+The embedded energy management interface is used to allow software
+components running across different processing clusters on a chip or
+device to communicate with a power management controller (PMC) on a
+device to issue or respond to power management requests.
+
+EEMI ops is a structure containing all EEMI APIs supported by Zynq MPSoC.
+The zynqmp-firmware driver maintains all EEMI APIs in the zynqmp_eemi_ops
+structure. Any driver that wants to communicate with the PMC using EEMI
+APIs can call zynqmp_pm_get_eemi_ops().
+
+Example of EEMI ops::
+
+	/* zynqmp-firmware driver maintains all EEMI APIs */
+	struct zynqmp_eemi_ops {
+		int (*get_api_version)(u32 *version);
+		int (*query_data)(struct zynqmp_pm_query_data qdata, u32 *out);
+	};
+
+	static const struct zynqmp_eemi_ops eemi_ops = {
+		.get_api_version = zynqmp_pm_get_api_version,
+		.query_data = zynqmp_pm_query_data,
+	};
+
+Example of EEMI ops usage::
+
+	static const struct zynqmp_eemi_ops *eemi_ops;
+	u32 ret_payload[PAYLOAD_ARG_CNT];
+	int ret;
+
+	eemi_ops = zynqmp_pm_get_eemi_ops();
+	if (IS_ERR(eemi_ops))
+		return PTR_ERR(eemi_ops);
+
+	ret = eemi_ops->query_data(qdata, ret_payload);
+
+IOCTL
+------
+The IOCTL API is for device control and configuration. It is not a system
+IOCTL but an EEMI API. A master can use this API to control any
+device-specific configuration. IOCTL definitions can be platform
+specific. This API also manages shared device configuration.
+
+The following IOCTL IDs are valid for device control:
+- IOCTL_SET_PLL_FRAC_MODE	8
+- IOCTL_GET_PLL_FRAC_MODE	9
+- IOCTL_SET_PLL_FRAC_DATA	10
+- IOCTL_GET_PLL_FRAC_DATA	11
+
+Refer to the EEMI API guide [0] for IOCTL-specific parameters and other EEMI APIs.
+
+References
+----------
+[0] Embedded Energy Management Interface (EEMI) API guide:
+    https://www.xilinx.com/support/documentation/user_guides/ug1200-eemi-api.pdf
diff --git a/Documentation/xilinx/eemi.txt b/Documentation/xilinx/eemi.txt
deleted file mode 100644
index 5f39b4ffdcd4..000000000000
--- a/Documentation/xilinx/eemi.txt
+++ /dev/null
@@ -1,67 +0,0 @@
----------------------------------------------------------------------
-Xilinx Zynq MPSoC EEMI Documentation
----------------------------------------------------------------------
-
-Xilinx Zynq MPSoC Firmware Interface
--------------------------------------
-The zynqmp-firmware node describes the interface to platform firmware.
-ZynqMP has an interface to communicate with secure firmware. Firmware
-driver provides an interface to firmware APIs. Interface APIs can be
-used by any driver to communicate with PMC(Platform Management Controller).
-
-Embedded Energy Management Interface (EEMI)
-----------------------------------------------
-The embedded energy management interface is used to allow software
-components running across different processing clusters on a chip or
-device to communicate with a power management controller (PMC) on a
-device to issue or respond to power management requests.
-
-EEMI ops is a structure containing all eemi APIs supported by Zynq MPSoC.
-The zynqmp-firmware driver maintain all EEMI APIs in zynqmp_eemi_ops
-structure. Any driver who want to communicate with PMC using EEMI APIs
-can call zynqmp_pm_get_eemi_ops().
-
-Example of EEMI ops:
-
-	/* zynqmp-firmware driver maintain all EEMI APIs */
-	struct zynqmp_eemi_ops {
-		int (*get_api_version)(u32 *version);
-		int (*query_data)(struct zynqmp_pm_query_data qdata, u32 *out);
-	};
-
-	static const struct zynqmp_eemi_ops eemi_ops = {
-		.get_api_version = zynqmp_pm_get_api_version,
-		.query_data = zynqmp_pm_query_data,
-	};
-
-Example of EEMI ops usage:
-
-	static const struct zynqmp_eemi_ops *eemi_ops;
-	u32 ret_payload[PAYLOAD_ARG_CNT];
-	int ret;
-
-	eemi_ops = zynqmp_pm_get_eemi_ops();
-	if (IS_ERR(eemi_ops))
-		return PTR_ERR(eemi_ops);
-
-	ret = eemi_ops->query_data(qdata, ret_payload);
-
-IOCTL
-------
-IOCTL API is for device control and configuration. It is not a system
-IOCTL but it is an EEMI API. This API can be used by master to control
-any device specific configuration. IOCTL definitions can be platform
-specific. This API also manage shared device configuration.
-
-The following IOCTL IDs are valid for device control:
-- IOCTL_SET_PLL_FRAC_MODE	8
-- IOCTL_GET_PLL_FRAC_MODE	9
-- IOCTL_SET_PLL_FRAC_DATA	10
-- IOCTL_GET_PLL_FRAC_DATA	11
-
-Refer EEMI API guide [0] for IOCTL specific parameters and other EEMI APIs.
-
-References
-----------
-[0] Embedded Energy Management Interface (EEMI) API guide:
-    https://www.xilinx.com/support/documentation/user_guides/ug1200-eemi-api.pdf
diff --git a/Documentation/xilinx/index.rst b/Documentation/xilinx/index.rst
new file mode 100644
index 000000000000..01cc1a0714df
--- /dev/null
+++ b/Documentation/xilinx/index.rst
@@ -0,0 +1,17 @@
+:orphan:
+
+===========
+Xilinx FPGA
+===========
+
+.. toctree::
+    :maxdepth: 1
+
+    eemi
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
-- 
cgit v1.2.3-59-g8ed1b


From d6a3b247627a3bc0551504eb305d624cc6fb5453 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:53:03 -0300
Subject: docs: scheduler: convert docs to ReST and rename to *.rst

In order to prepare to add them to the Kernel API book,
convert the files to ReST format.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/ABI/testing/sysfs-kernel-uids   |   2 +-
 Documentation/scheduler/completion.rst        | 293 +++++++++
 Documentation/scheduler/completion.txt        | 291 ---------
 Documentation/scheduler/index.rst             |  29 +
 Documentation/scheduler/sched-arch.rst        |  76 +++
 Documentation/scheduler/sched-arch.txt        |  74 ---
 Documentation/scheduler/sched-bwc.rst         | 128 ++++
 Documentation/scheduler/sched-bwc.txt         | 122 ----
 Documentation/scheduler/sched-deadline.rst    | 888 ++++++++++++++++++++++++++
 Documentation/scheduler/sched-deadline.txt    | 871 -------------------------
 Documentation/scheduler/sched-design-CFS.rst  | 249 ++++++++
 Documentation/scheduler/sched-design-CFS.txt  | 242 -------
 Documentation/scheduler/sched-domains.rst     |  83 +++
 Documentation/scheduler/sched-domains.txt     |  77 ---
 Documentation/scheduler/sched-energy.rst      | 430 +++++++++++++
 Documentation/scheduler/sched-energy.txt      | 425 ------------
 Documentation/scheduler/sched-nice-design.rst | 112 ++++
 Documentation/scheduler/sched-nice-design.txt | 108 ----
 Documentation/scheduler/sched-rt-group.rst    | 185 ++++++
 Documentation/scheduler/sched-rt-group.txt    | 183 ------
 Documentation/scheduler/sched-stats.rst       | 167 +++++
 Documentation/scheduler/sched-stats.txt       | 154 -----
 Documentation/scheduler/text_files.rst        |   5 +
 Documentation/vm/numa.rst                     |   2 +-
 init/Kconfig                                  |   6 +-
 kernel/sched/deadline.c                       |   2 +-
 26 files changed, 2651 insertions(+), 2553 deletions(-)
 create mode 100644 Documentation/scheduler/completion.rst
 delete mode 100644 Documentation/scheduler/completion.txt
 create mode 100644 Documentation/scheduler/index.rst
 create mode 100644 Documentation/scheduler/sched-arch.rst
 delete mode 100644 Documentation/scheduler/sched-arch.txt
 create mode 100644 Documentation/scheduler/sched-bwc.rst
 delete mode 100644 Documentation/scheduler/sched-bwc.txt
 create mode 100644 Documentation/scheduler/sched-deadline.rst
 delete mode 100644 Documentation/scheduler/sched-deadline.txt
 create mode 100644 Documentation/scheduler/sched-design-CFS.rst
 delete mode 100644 Documentation/scheduler/sched-design-CFS.txt
 create mode 100644 Documentation/scheduler/sched-domains.rst
 delete mode 100644 Documentation/scheduler/sched-domains.txt
 create mode 100644 Documentation/scheduler/sched-energy.rst
 delete mode 100644 Documentation/scheduler/sched-energy.txt
 create mode 100644 Documentation/scheduler/sched-nice-design.rst
 delete mode 100644 Documentation/scheduler/sched-nice-design.txt
 create mode 100644 Documentation/scheduler/sched-rt-group.rst
 delete mode 100644 Documentation/scheduler/sched-rt-group.txt
 create mode 100644 Documentation/scheduler/sched-stats.rst
 delete mode 100644 Documentation/scheduler/sched-stats.txt
 create mode 100644 Documentation/scheduler/text_files.rst

diff --git a/Documentation/ABI/testing/sysfs-kernel-uids b/Documentation/ABI/testing/sysfs-kernel-uids
index 28f14695a852..4182b7061816 100644
--- a/Documentation/ABI/testing/sysfs-kernel-uids
+++ b/Documentation/ABI/testing/sysfs-kernel-uids
@@ -11,4 +11,4 @@ Description:
 		example would be, if User A has shares = 1024 and user
 		B has shares = 2048, User B will get twice the CPU
 		bandwidth user A will. For more details refer
-		Documentation/scheduler/sched-design-CFS.txt
+		Documentation/scheduler/sched-design-CFS.rst
diff --git a/Documentation/scheduler/completion.rst b/Documentation/scheduler/completion.rst
new file mode 100644
index 000000000000..9f039b4f4b09
--- /dev/null
+++ b/Documentation/scheduler/completion.rst
@@ -0,0 +1,293 @@
+================================================
+Completions - "wait for completion" barrier APIs
+================================================
+
+Introduction:
+-------------
+
+If you have one or more threads that must wait for some kernel activity
+to have reached a point or a specific state, completions can provide a
+race-free solution to this problem. Semantically they are somewhat like a
+pthread_barrier() and have similar use-cases.
+
+Completions are a code synchronization mechanism which is preferable to any
+misuse of locks/semaphores and busy-loops. Any time you think of using
+yield() or some quirky msleep(1) loop to allow something else to proceed,
+you probably want to look into using one of the wait_for_completion*()
+calls and complete() instead.
+
+The advantage of using completions is that they have a well defined, focused
+purpose which makes it very easy to see the intent of the code, but they
+also result in more efficient code as all threads can continue execution
+until the result is actually needed, and both the waiting and the signalling
+are highly efficient using low level scheduler sleep/wakeup facilities.
+
+Completions are built on top of the waitqueue and wakeup infrastructure of
+the Linux scheduler. The event the threads on the waitqueue are waiting for
+is reduced to a simple flag in 'struct completion', appropriately called "done".
+
+As completions are scheduling related, the code can be found in
+kernel/sched/completion.c.
+
+
+Usage:
+------
+
+There are three main parts to using completions:
+
+ - the initialization of the 'struct completion' synchronization object
+ - the waiting part through a call to one of the variants of wait_for_completion(),
+ - the signaling side through a call to complete() or complete_all().
+
+There are also some helper functions for checking the state of completions.
+Note that while initialization must happen first, the waiting and signaling
+part can happen in any order. I.e. it's entirely normal for a thread
+to have marked a completion as 'done' before another thread checks whether
+it has to wait for it.
+
+To use completions you need to #include <linux/completion.h> and
+create a static or dynamic variable of type 'struct completion',
+which has only two fields::
+
+	struct completion {
+		unsigned int done;
+		wait_queue_head_t wait;
+	};
+
+This provides the ->wait waitqueue to place tasks on for waiting (if any), and
+the ->done completion flag for indicating whether it's completed or not.
+
+Completions should be named to refer to the event that is being synchronized on.
+A good example is::
+
+	wait_for_completion(&early_console_added);
+
+	complete(&early_console_added);
+
+Good, intuitive naming (as always) helps code readability. Naming a completion
+'complete' is not helpful unless the purpose is super obvious...
+
+
+Initializing completions:
+-------------------------
+
+Dynamically allocated completion objects should preferably be embedded in data
+structures that are assured to be alive for the life-time of the function/driver,
+to prevent races with asynchronous complete() calls from occurring.
+
+Particular care should be taken when using the _timeout() or _killable()/_interruptible()
+variants of wait_for_completion(), as it must be assured that memory de-allocation
+does not happen until all related activities (complete() or reinit_completion())
+have taken place, even if these wait functions return prematurely due to a timeout
+or a signal triggering.
+
+Initializing of dynamically allocated completion objects is done via a call to
+init_completion()::
+
+	init_completion(&dynamic_object->done);
+
+In this call we initialize the waitqueue and set ->done to 0, i.e. "not completed"
+or "not done".
+
+The re-initialization function, reinit_completion(), simply resets the
+->done field to 0 ("not done"), without touching the waitqueue.
+Callers of this function must make sure that there are no racy
+wait_for_completion() calls going on in parallel.
+
+Calling init_completion() on the same completion object twice is
+most likely a bug as it re-initializes the queue to an empty queue and
+enqueued tasks could get "lost" - use reinit_completion() in that case,
+but be aware of other races.
+
+For static declaration and initialization, macros are available.
+
+For static (or global) declarations in file scope you can use
+DECLARE_COMPLETION()::
+
+	static DECLARE_COMPLETION(setup_done);
+	DECLARE_COMPLETION(setup_done);
+
+Note that in this case the completion is boot time (or module load time)
+initialized to 'not done' and doesn't require an init_completion() call.
+
+When a completion is declared as a local variable within a function,
+then the initialization should always use DECLARE_COMPLETION_ONSTACK()
+explicitly, not just to make lockdep happy, but also to make it clear
+that limited scope had been considered and is intentional::
+
+	DECLARE_COMPLETION_ONSTACK(setup_done);
+
+Note that when using completion objects as local variables you must be
+acutely aware of the short life time of the function stack: the function
+must not return to a calling context until all activities (such as waiting
+threads) have ceased and the completion object is completely unused.
+
+To emphasise this again: in particular when using some of the waiting API variants
+with more complex outcomes, such as the timeout or signalling (_timeout(),
+_killable() and _interruptible()) variants, the wait might complete
+prematurely while the object might still be in use by another thread - and a return
+from the wait_for_completion*() caller function will deallocate the function
+stack and cause subtle data corruption if a complete() is done in some
+other thread. Simple testing might not trigger these kinds of races.
+
+If unsure, use dynamically allocated completion objects, preferably embedded
+in some other long lived object that has a boringly long life time which
+exceeds the life time of any helper threads using the completion object,
+or has a lock or other synchronization mechanism to make sure complete()
+is not called on a freed object.
+
+A naive DECLARE_COMPLETION() on the stack triggers a lockdep warning.
+
+Waiting for completions:
+------------------------
+
+For a thread to wait for some concurrent activity to finish, it
+calls wait_for_completion() on the initialized completion structure::
+
+	void wait_for_completion(struct completion *done)
+
+A typical usage scenario is::
+
+	CPU#1					CPU#2
+
+	struct completion setup_done;
+
+	init_completion(&setup_done);
+	initialize_work(...,&setup_done,...);
+
+	/* run non-dependent code */		/* do setup */
+
+	wait_for_completion(&setup_done);	complete(setup_done);
+
+This is not implying any particular order between wait_for_completion() and
+the call to complete() - if the call to complete() happened before the call
+to wait_for_completion() then the waiting side simply will continue
+immediately as all dependencies are satisfied; if not, it will block until
+completion is signaled by complete().
+
+Note that wait_for_completion() is calling spin_lock_irq()/spin_unlock_irq(),
+so it can only be called safely when you know that interrupts are enabled.
+Calling it from IRQs-off atomic contexts will result in hard-to-detect
+spurious enabling of interrupts.
+
+The default behavior is to wait without a timeout and to mark the task as
+uninterruptible. wait_for_completion() and its variants are only safe
+in process context (as they can sleep) but not in atomic context,
+interrupt context, with disabled IRQs, or with preemption disabled - see also
+try_wait_for_completion() below for handling completion in atomic/interrupt
+context.
+
+All variants of wait_for_completion() can (obviously) block for a long
+time depending on the nature of the activity they are waiting for, so in
+most cases you probably don't want to call them with held mutexes.
+
+
+wait_for_completion*() variants available:
+------------------------------------------
+
+The below variants all return status and this status should be checked in
+most(/all) cases - in cases where the status is deliberately not checked you
+probably want to make a note explaining this (e.g. see
+arch/arm/kernel/smp.c:__cpu_up()).
+
+A common problem that occurs is to have unclean assignment of return types,
+so take care to assign return-values to variables of the proper type.
+
+Checking for the specific meaning of return values also has been found
+to be quite inaccurate, e.g. constructs like::
+
+	if (!wait_for_completion_interruptible_timeout(...))
+
+... would execute the same code path for successful completion and for the
+interrupted case - which is probably not what you want::
+
+	int wait_for_completion_interruptible(struct completion *done)
+
+This function marks the task TASK_INTERRUPTIBLE while it is waiting.
+If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise::
+
+	unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout)
+
+The task is marked as TASK_UNINTERRUPTIBLE and will wait at most 'timeout'
+jiffies. If a timeout occurs it returns 0, else the remaining time in
+jiffies (but at least 1).
+
+Timeouts are preferably calculated with msecs_to_jiffies() or usecs_to_jiffies(),
+to make the code largely HZ-invariant.
+
+If the returned timeout value is deliberately ignored a comment should probably explain
+why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc())::
+
+	long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout)
+
+This function passes a timeout in jiffies and marks the task as
+TASK_INTERRUPTIBLE. If a signal was received it will return -ERESTARTSYS;
+otherwise it returns 0 if the completion timed out, or the remaining time in
+jiffies if completion occurred.
+
+Further variants include _killable which uses TASK_KILLABLE as the
+designated task state and will return -ERESTARTSYS if it is interrupted,
+or 0 if completion was achieved.  There is a _timeout variant as well::
+
+	long wait_for_completion_killable(struct completion *done)
+	long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout)
+
+The _io variants wait_for_completion_io() behave the same as the non-_io
+variants, except for accounting waiting time as 'waiting on IO', which has
+an impact on how the task is accounted in scheduling/IO stats::
+
+	void wait_for_completion_io(struct completion *done)
+	unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout)
+
+
+Signaling completions:
+----------------------
+
+A thread that wants to signal that the conditions for continuation have been
+achieved calls complete() to signal exactly one of the waiters that it can
+continue::
+
+	void complete(struct completion *done)
+
+... or calls complete_all() to signal all current and future waiters::
+
+	void complete_all(struct completion *done)
+
+The signaling will work as expected even if completions are signaled before
+a thread starts waiting. This is achieved by the waiter "consuming"
+(decrementing) the done field of 'struct completion'. Waiting threads
+are woken up in the same order in which they were enqueued (FIFO order).
+
+If complete() is called multiple times then this will allow for that number
+of waiters to continue - each call to complete() will simply increment the
+done field. Calling complete_all() multiple times is a bug though. Both
+complete() and complete_all() can be called in IRQ/atomic context safely.
+
+There can only be one thread calling complete() or complete_all() on a
+particular 'struct completion' at any time - serialized through the wait
+queue spinlock. Any such concurrent calls to complete() or complete_all()
+probably are a design bug.
+
+Signaling completion from IRQ context is fine as it will appropriately
+lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never
+sleep.
+
+
+try_wait_for_completion()/completion_done():
+--------------------------------------------
+
+The try_wait_for_completion() function will not put the thread on the wait
+queue but rather returns false if it would need to enqueue (block) the thread,
+else it consumes one posted completion and returns true::
+
+	bool try_wait_for_completion(struct completion *done)
+
+Finally, to check the state of a completion without changing it in any way,
+call completion_done(), which returns false if there are no posted
+completions that were not yet consumed by waiters (implying that there are
+waiters) and true otherwise::
+
+	bool completion_done(struct completion *done)
+
+Both try_wait_for_completion() and completion_done() are safe to be called in
+IRQ or atomic context.
diff --git a/Documentation/scheduler/completion.txt b/Documentation/scheduler/completion.txt
deleted file mode 100644
index e5b9df4d8078..000000000000
--- a/Documentation/scheduler/completion.txt
+++ /dev/null
@@ -1,291 +0,0 @@
-Completions - "wait for completion" barrier APIs
-================================================
-
-Introduction:
--------------
-
-If you have one or more threads that must wait for some kernel activity
-to have reached a point or a specific state, completions can provide a
-race-free solution to this problem. Semantically they are somewhat like a
-pthread_barrier() and have similar use-cases.
-
-Completions are a code synchronization mechanism which is preferable to any
-misuse of locks/semaphores and busy-loops. Any time you think of using
-yield() or some quirky msleep(1) loop to allow something else to proceed,
-you probably want to look into using one of the wait_for_completion*()
-calls and complete() instead.
-
-The advantage of using completions is that they have a well defined, focused
-purpose which makes it very easy to see the intent of the code, but they
-also result in more efficient code as all threads can continue execution
-until the result is actually needed, and both the waiting and the signalling
-is highly efficient using low level scheduler sleep/wakeup facilities.
-
-Completions are built on top of the waitqueue and wakeup infrastructure of
-the Linux scheduler. The event the threads on the waitqueue are waiting for
-is reduced to a simple flag in 'struct completion', appropriately called "done".
-
-As completions are scheduling related, the code can be found in
-kernel/sched/completion.c.
-
-
-Usage:
-------
-
-There are three main parts to using completions:
-
- - the initialization of the 'struct completion' synchronization object
- - the waiting part through a call to one of the variants of wait_for_completion(),
- - the signaling side through a call to complete() or complete_all().
-
-There are also some helper functions for checking the state of completions.
-Note that while initialization must happen first, the waiting and signaling
-part can happen in any order. I.e. it's entirely normal for a thread
-to have marked a completion as 'done' before another thread checks whether
-it has to wait for it.
-
-To use completions you need to #include <linux/completion.h> and
-create a static or dynamic variable of type 'struct completion',
-which has only two fields:
-
-	struct completion {
-		unsigned int done;
-		wait_queue_head_t wait;
-	};
-
-This provides the ->wait waitqueue to place tasks on for waiting (if any), and
-the ->done completion flag for indicating whether it's completed or not.
-
-Completions should be named to refer to the event that is being synchronized on.
-A good example is:
-
-	wait_for_completion(&early_console_added);
-
-	complete(&early_console_added);
-
-Good, intuitive naming (as always) helps code readability. Naming a completion
-'complete' is not helpful unless the purpose is super obvious...
-
-
-Initializing completions:
--------------------------
-
-Dynamically allocated completion objects should preferably be embedded in data
-structures that are assured to be alive for the life-time of the function/driver,
-to prevent races with asynchronous complete() calls from occurring.
-
-Particular care should be taken when using the _timeout() or _killable()/_interruptible()
-variants of wait_for_completion(), as it must be assured that memory de-allocation
-does not happen until all related activities (complete() or reinit_completion())
-have taken place, even if these wait functions return prematurely due to a timeout
-or a signal triggering.
-
-Initializing of dynamically allocated completion objects is done via a call to
-init_completion():
-
-	init_completion(&dynamic_object->done);
-
-In this call we initialize the waitqueue and set ->done to 0, i.e. "not completed"
-or "not done".
-
-The re-initialization function, reinit_completion(), simply resets the
-->done field to 0 ("not done"), without touching the waitqueue.
-Callers of this function must make sure that there are no racy
-wait_for_completion() calls going on in parallel.
-
-Calling init_completion() on the same completion object twice is
-most likely a bug as it re-initializes the queue to an empty queue and
-enqueued tasks could get "lost" - use reinit_completion() in that case,
-but be aware of other races.
-
-For static declaration and initialization, macros are available.
-
-For static (or global) declarations in file scope you can use DECLARE_COMPLETION():
-
-	static DECLARE_COMPLETION(setup_done);
-	DECLARE_COMPLETION(setup_done);
-
-Note that in this case the completion is boot time (or module load time)
-initialized to 'not done' and doesn't require an init_completion() call.
-
-When a completion is declared as a local variable within a function,
-then the initialization should always use DECLARE_COMPLETION_ONSTACK()
-explicitly, not just to make lockdep happy, but also to make it clear
-that limited scope had been considered and is intentional:
-
-	DECLARE_COMPLETION_ONSTACK(setup_done)
-
-Note that when using completion objects as local variables you must be
-acutely aware of the short life time of the function stack: the function
-must not return to a calling context until all activities (such as waiting
-threads) have ceased and the completion object is completely unused.
-
-To emphasise this again: in particular when using some of the waiting API variants
-with more complex outcomes, such as the timeout or signalling (_timeout(),
-_killable() and _interruptible()) variants, the wait might complete
-prematurely while the object might still be in use by another thread - and a return
-from the wait_on_completion*() caller function will deallocate the function
-stack and cause subtle data corruption if a complete() is done in some
-other thread. Simple testing might not trigger these kinds of races.
-
-If unsure, use dynamically allocated completion objects, preferably embedded
-in some other long lived object that has a boringly long life time which
-exceeds the life time of any helper threads using the completion object,
-or has a lock or other synchronization mechanism to make sure complete()
-is not called on a freed object.
-
-A naive DECLARE_COMPLETION() on the stack triggers a lockdep warning.
-
-Waiting for completions:
-------------------------
-
-For a thread to wait for some concurrent activity to finish, it
-calls wait_for_completion() on the initialized completion structure:
-
-	void wait_for_completion(struct completion *done)
-
-A typical usage scenario is:
-
-	CPU#1					CPU#2
-
-	struct completion setup_done;
-
-	init_completion(&setup_done);
-	initialize_work(...,&setup_done,...);
-
-	/* run non-dependent code */		/* do setup */
-
-	wait_for_completion(&setup_done);	complete(setup_done);
-
-This is not implying any particular order between wait_for_completion() and
-the call to complete() - if the call to complete() happened before the call
-to wait_for_completion() then the waiting side simply will continue
-immediately as all dependencies are satisfied; if not, it will block until
-completion is signaled by complete().
-
-Note that wait_for_completion() is calling spin_lock_irq()/spin_unlock_irq(),
-so it can only be called safely when you know that interrupts are enabled.
-Calling it from IRQs-off atomic contexts will result in hard-to-detect
-spurious enabling of interrupts.
-
-The default behavior is to wait without a timeout and to mark the task as
-uninterruptible. wait_for_completion() and its variants are only safe
-in process context (as they can sleep) but not in atomic context,
-interrupt context, with disabled IRQs, or preemption is disabled - see also
-try_wait_for_completion() below for handling completion in atomic/interrupt
-context.
-
-As all variants of wait_for_completion() can (obviously) block for a long
-time depending on the nature of the activity they are waiting for, so in
-most cases you probably don't want to call this with held mutexes.
-
-
-wait_for_completion*() variants available:
-------------------------------------------
-
-The below variants all return status and this status should be checked in
-most(/all) cases - in cases where the status is deliberately not checked you
-probably want to make a note explaining this (e.g. see
-arch/arm/kernel/smp.c:__cpu_up()).
-
-A common problem that occurs is to have unclean assignment of return types,
-so take care to assign return-values to variables of the proper type.
-
-Checking for the specific meaning of return values also has been found
-to be quite inaccurate, e.g. constructs like:
-
-	if (!wait_for_completion_interruptible_timeout(...))
-
-... would execute the same code path for successful completion and for the
-interrupted case - which is probably not what you want.
-
-	int wait_for_completion_interruptible(struct completion *done)
-
-This function marks the task TASK_INTERRUPTIBLE while it is waiting.
-If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise.
-
-	unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout)
-
-The task is marked as TASK_UNINTERRUPTIBLE and will wait at most 'timeout'
-jiffies. If a timeout occurs it returns 0, else the remaining time in
-jiffies (but at least 1).
-
-Timeouts are preferably calculated with msecs_to_jiffies() or usecs_to_jiffies(),
-to make the code largely HZ-invariant.
-
-If the returned timeout value is deliberately ignored a comment should probably explain
-why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc()).
-
-	long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout)
-
-This function passes a timeout in jiffies and marks the task as
-TASK_INTERRUPTIBLE. If a signal was received it will return -ERESTARTSYS;
-otherwise it returns 0 if the completion timed out, or the remaining time in
-jiffies if completion occurred.
-
-Further variants include _killable which uses TASK_KILLABLE as the
-designated tasks state and will return -ERESTARTSYS if it is interrupted,
-or 0 if completion was achieved.  There is a _timeout variant as well:
-
-	long wait_for_completion_killable(struct completion *done)
-	long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout)
-
-The _io variants wait_for_completion_io() behave the same as the non-_io
-variants, except for accounting waiting time as 'waiting on IO', which has
-an impact on how the task is accounted in scheduling/IO stats:
-
-	void wait_for_completion_io(struct completion *done)
-	unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout)
-
-
-Signaling completions:
-----------------------
-
-A thread that wants to signal that the conditions for continuation have been
-achieved calls complete() to signal exactly one of the waiters that it can
-continue:
-
-	void complete(struct completion *done)
-
-... or calls complete_all() to signal all current and future waiters:
-
-	void complete_all(struct completion *done)
-
-The signaling will work as expected even if completions are signaled before
-a thread starts waiting. This is achieved by the waiter "consuming"
-(decrementing) the done field of 'struct completion'. Waiting threads
-wakeup order is the same in which they were enqueued (FIFO order).
-
-If complete() is called multiple times then this will allow for that number
-of waiters to continue - each call to complete() will simply increment the
-done field. Calling complete_all() multiple times is a bug though. Both
-complete() and complete_all() can be called in IRQ/atomic context safely.
-
-There can only be one thread calling complete() or complete_all() on a
-particular 'struct completion' at any time - serialized through the wait
-queue spinlock. Any such concurrent calls to complete() or complete_all()
-probably are a design bug.
-
-Signaling completion from IRQ context is fine as it will appropriately
-lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never
-sleep. 
-
-
-try_wait_for_completion()/completion_done():
---------------------------------------------
-
-The try_wait_for_completion() function will not put the thread on the wait
-queue but rather returns false if it would need to enqueue (block) the thread,
-else it consumes one posted completion and returns true.
-
-	bool try_wait_for_completion(struct completion *done)
-
-Finally, to check the state of a completion without changing it in any way,
-call completion_done(), which returns false if there are no posted
-completions that were not yet consumed by waiters (implying that there are
-waiters) and true otherwise;
-
-	bool completion_done(struct completion *done)
-
-Both try_wait_for_completion() and completion_done() are safe to be called in
-IRQ or atomic context.
diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
new file mode 100644
index 000000000000..058be77a4c34
--- /dev/null
+++ b/Documentation/scheduler/index.rst
@@ -0,0 +1,29 @@
+:orphan:
+
+===============
+Linux Scheduler
+===============
+
+.. toctree::
+    :maxdepth: 1
+
+
+    completion
+    sched-arch
+    sched-bwc
+    sched-deadline
+    sched-design-CFS
+    sched-domains
+    sched-energy
+    sched-nice-design
+    sched-rt-group
+    sched-stats
+
+    text_files
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/scheduler/sched-arch.rst b/Documentation/scheduler/sched-arch.rst
new file mode 100644
index 000000000000..0eaec669790a
--- /dev/null
+++ b/Documentation/scheduler/sched-arch.rst
@@ -0,0 +1,76 @@
+=================================================================
+CPU Scheduler implementation hints for architecture specific code
+=================================================================
+
+	Nick Piggin, 2005
+
+Context switch
+==============
+1. Runqueue locking
+By default, the switch_to arch function is called with the runqueue
+locked. This is usually not a problem unless switch_to may need to
+take the runqueue lock. This is usually due to a wake up operation in
+the context switch. See arch/ia64/include/asm/switch_to.h for an example.
+
+To request the scheduler call switch_to with the runqueue unlocked,
+you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file
+(typically the one where switch_to is defined).
+
+Unlocked context switches introduce only a very minor performance
+penalty to the core scheduler implementation in the CONFIG_SMP case.
+
+CPU idle
+========
+Your cpu_idle routines need to obey the following rules:
+
+1. Preemption should now be disabled over idle routines. It should only
+   be enabled to call schedule() and then disabled again.
+
+2. need_resched/TIF_NEED_RESCHED is only ever set, and will never
+   be cleared until the running task has called schedule(). Idle
+   threads need only ever query need_resched, and may never set or
+   clear it.
+
+3. When cpu_idle finds (need_resched() == 'true'), it should call
+   schedule(). It should not call schedule() otherwise.
+
+4. The only time interrupts need to be disabled when checking
+   need_resched is if we are about to sleep the processor until
+   the next interrupt (this doesn't provide any protection of
+   need_resched, it prevents losing an interrupt):
+
+	4a. Common problem with this type of sleep appears to be::
+
+	        local_irq_disable();
+	        if (!need_resched()) {
+	                local_irq_enable();
+	                *** resched interrupt arrives here ***
+	                __asm__("sleep until next interrupt");
+	        }
+
+5. TIF_POLLING_NRFLAG can be set by idle routines that do not
+   need an interrupt to wake them up when need_resched goes high.
+   In other words, they must be periodically polling need_resched,
+   although it may be reasonable to do some background work or enter
+   a low CPU priority.
+
+      - 5a. If TIF_POLLING_NRFLAG is set, and we do decide to enter
+	an interrupt sleep, it needs to be cleared then a memory
+	barrier issued (followed by a test of need_resched with
+	interrupts disabled, as explained in 4).
+
+arch/x86/kernel/process.c has examples of both polling and
+sleeping idle functions.
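+
+A minimal sketch of a race-free variant of the sequence shown in 4a,
+assuming the architecture provides a primitive that re-enables interrupts
+and sleeps atomically (safe_halt() on x86 is assumed here to behave this
+way; other architectures need their own equivalent). It is an illustration
+only, not the code of any particular architecture::
+
+	local_irq_disable();
+	if (!need_resched()) {
+		/* enables interrupts and sleeps with no window in between */
+		safe_halt();
+	} else {
+		local_irq_enable();
+	}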
+
+
+Possible arch/ problems
+=======================
+
+Possible arch problems I found (and either tried to fix or didn't):
+
+ia64 - is safe_halt call racy vs interrupts? (does it sleep?) (See #4a)
+
+sh64 - Is sleeping racy vs interrupts? (See #4a)
+
+sparc - IRQs on at this point(?), change local_irq_save to _disable.
+      - TODO: needs secondary CPUs to disable preempt (See #1)
diff --git a/Documentation/scheduler/sched-arch.txt b/Documentation/scheduler/sched-arch.txt
deleted file mode 100644
index a2f27bbf2cba..000000000000
--- a/Documentation/scheduler/sched-arch.txt
+++ /dev/null
@@ -1,74 +0,0 @@
-	CPU Scheduler implementation hints for architecture specific code
-
-	Nick Piggin, 2005
-
-Context switch
-==============
-1. Runqueue locking
-By default, the switch_to arch function is called with the runqueue
-locked. This is usually not a problem unless switch_to may need to
-take the runqueue lock. This is usually due to a wake up operation in
-the context switch. See arch/ia64/include/asm/switch_to.h for an example.
-
-To request the scheduler call switch_to with the runqueue unlocked,
-you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file
-(typically the one where switch_to is defined).
-
-Unlocked context switches introduce only a very minor performance
-penalty to the core scheduler implementation in the CONFIG_SMP case.
-
-CPU idle
-========
-Your cpu_idle routines need to obey the following rules:
-
-1. Preempt should now disabled over idle routines. Should only
-   be enabled to call schedule() then disabled again.
-
-2. need_resched/TIF_NEED_RESCHED is only ever set, and will never
-   be cleared until the running task has called schedule(). Idle
-   threads need only ever query need_resched, and may never set or
-   clear it.
-
-3. When cpu_idle finds (need_resched() == 'true'), it should call
-   schedule(). It should not call schedule() otherwise.
-
-4. The only time interrupts need to be disabled when checking
-   need_resched is if we are about to sleep the processor until
-   the next interrupt (this doesn't provide any protection of
-   need_resched, it prevents losing an interrupt).
-
-	4a. Common problem with this type of sleep appears to be:
-	        local_irq_disable();
-	        if (!need_resched()) {
-	                local_irq_enable();
-	                *** resched interrupt arrives here ***
-	                __asm__("sleep until next interrupt");
-	        }
-
-5. TIF_POLLING_NRFLAG can be set by idle routines that do not
-   need an interrupt to wake them up when need_resched goes high.
-   In other words, they must be periodically polling need_resched,
-   although it may be reasonable to do some background work or enter
-   a low CPU priority.
-
-   	5a. If TIF_POLLING_NRFLAG is set, and we do decide to enter
-	    an interrupt sleep, it needs to be cleared then a memory
-	    barrier issued (followed by a test of need_resched with
-	    interrupts disabled, as explained in 3).
-
-arch/x86/kernel/process.c has examples of both polling and
-sleeping idle functions.
-
-
-Possible arch/ problems
-=======================
-
-Possible arch problems I found (and either tried to fix or didn't):
-
-ia64 - is safe_halt call racy vs interrupts? (does it sleep?) (See #4a)
-
-sh64 - Is sleeping racy vs interrupts? (See #4a)
-
-sparc - IRQs on at this point(?), change local_irq_save to _disable.
-      - TODO: needs secondary CPUs to disable preempt (See #1)
-
diff --git a/Documentation/scheduler/sched-bwc.rst b/Documentation/scheduler/sched-bwc.rst
new file mode 100644
index 000000000000..3a9064219656
--- /dev/null
+++ b/Documentation/scheduler/sched-bwc.rst
@@ -0,0 +1,128 @@
+=====================
+CFS Bandwidth Control
+=====================
+
+[ This document only discusses CPU bandwidth control for SCHED_NORMAL.
+  The SCHED_RT case is covered in Documentation/scheduler/sched-rt-group.rst ]
+
+CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension which allows the
+specification of the maximum CPU bandwidth available to a group or hierarchy.
+
+The bandwidth allowed for a group is specified using a quota and period. Within
+each given "period" (microseconds), a group is allowed to consume only up to
+"quota" microseconds of CPU time.  When the CPU bandwidth consumption of a
+group exceeds this limit (for that period), the tasks belonging to its
+hierarchy will be throttled and are not allowed to run again until the next
+period.
+
+A group's unused runtime is globally tracked, being refreshed with quota units
+above at each period boundary.  As threads consume this bandwidth it is
+transferred to cpu-local "silos" on a demand basis.  The amount transferred
+within each of these updates is tunable and described as the "slice".
+
+Management
+----------
+Quota and period are managed within the cpu subsystem via cgroupfs.
+
+cpu.cfs_quota_us: the total available run-time within a period (in microseconds)
+cpu.cfs_period_us: the length of a period (in microseconds)
+cpu.stat: exports throttling statistics [explained further below]
+
+The default values are::
+
+	cpu.cfs_period_us=100ms
+	cpu.cfs_quota_us=-1
+
+A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
+bandwidth restriction in place; such a group is described as an unconstrained
+bandwidth group.  This represents the traditional work-conserving behavior for
+CFS.
+
+Writing any (valid) positive value(s) will enact the specified bandwidth limit.
+The minimum value allowed for the quota or period is 1ms.  There is also an
+upper bound on the period length of 1s.  Additional restrictions exist when
+bandwidth limits are used in a hierarchical fashion; these are explained in
+more detail below.
+
+Writing any negative value to cpu.cfs_quota_us will remove the bandwidth limit
+and return the group to an unconstrained state once more.
+
+Any updates to a group's bandwidth specification will result in it becoming
+unthrottled if it is in a constrained state.
+
+System wide settings
+--------------------
+For efficiency run-time is transferred between the global pool and CPU local
+"silos" in a batch fashion.  This greatly reduces global accounting pressure
+on large systems.  The amount transferred each time such an update is required
+is described as the "slice".
+
+This is tunable via procfs::
+
+	/proc/sys/kernel/sched_cfs_bandwidth_slice_us (default=5ms)
+
+Larger slice values will reduce transfer overheads, while smaller values allow
+for more fine-grained consumption.
+
+Statistics
+----------
+A group's bandwidth statistics are exported via 3 fields in cpu.stat.
+
+cpu.stat:
+
+- nr_periods: Number of enforcement intervals that have elapsed.
+- nr_throttled: Number of times the group has been throttled/limited.
+- throttled_time: The total time duration (in nanoseconds) for which entities
+  of the group have been throttled.
+
+This interface is read-only.
+
+Hierarchical considerations
+---------------------------
+The interface enforces that an individual entity's bandwidth is always
+attainable, that is: max(c_i) <= C. However, over-subscription in the
+aggregate case is explicitly allowed to enable work-conserving semantics
+within a hierarchy:
+
+  e.g. \Sum (c_i) may exceed C
+
+[ Where C is the parent's bandwidth, and c_i its children ]
+
+
+There are two ways in which a group may become throttled:
+
+	a. it fully consumes its own quota within a period
+	b. a parent's quota is fully consumed within its period
+
+In case b) above, even though the child may have runtime remaining it will not
+be allowed to run until the parent's runtime is refreshed.
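+
+As a hypothetical illustration (the numbers are made up for this example),
+take a parent group and two children that all use a 100ms period::
+
+	parent:  C   = 100ms quota
+	child 1: c_1 =  80ms quota
+	child 2: c_2 =  60ms quota
+
+	max(c_i)   =  80ms <= C    (each child's limit is attainable)
+	\Sum (c_i) = 140ms >  C    (over-subscribed, which is allowed)
+
+If child 1 consumes its 80ms and child 2 consumes 20ms within the same
+period, the parent's quota is exhausted: child 2 is throttled under case b)
+with 40ms of its own quota unused, until the parent's runtime is refreshed.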
+
+Examples
+--------
+1. Limit a group to 1 CPU worth of runtime.
+
+   If period is 250ms and quota is also 250ms, the group will get
+   1 CPU worth of runtime every 250ms::
+
+	# echo 250000 > cpu.cfs_quota_us /* quota = 250ms */
+	# echo 250000 > cpu.cfs_period_us /* period = 250ms */
+
+2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine.
+
+   With 500ms period and 1000ms quota, the group can get 2 CPUs worth of
+   runtime every 500ms::
+
+	# echo 1000000 > cpu.cfs_quota_us /* quota = 1000ms */
+	# echo 500000 > cpu.cfs_period_us /* period = 500ms */
+
+   The larger period here allows for increased burst capacity.
+
+3. Limit a group to 20% of 1 CPU.
+
+   With 50ms period, 10ms quota will be equivalent to 20% of 1 CPU::
+
+	# echo 10000 > cpu.cfs_quota_us /* quota = 10ms */
+	# echo 50000 > cpu.cfs_period_us /* period = 50ms */
+
+   By using a small period here we are ensuring a consistent latency
+   response at the expense of burst capacity.
diff --git a/Documentation/scheduler/sched-bwc.txt b/Documentation/scheduler/sched-bwc.txt
deleted file mode 100644
index f6b1873f68ab..000000000000
--- a/Documentation/scheduler/sched-bwc.txt
+++ /dev/null
@@ -1,122 +0,0 @@
-CFS Bandwidth Control
-=====================
-
-[ This document only discusses CPU bandwidth control for SCHED_NORMAL.
-  The SCHED_RT case is covered in Documentation/scheduler/sched-rt-group.txt ]
-
-CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension which allows the
-specification of the maximum CPU bandwidth available to a group or hierarchy.
-
-The bandwidth allowed for a group is specified using a quota and period. Within
-each given "period" (microseconds), a group is allowed to consume only up to
-"quota" microseconds of CPU time.  When the CPU bandwidth consumption of a
-group exceeds this limit (for that period), the tasks belonging to its
-hierarchy will be throttled and are not allowed to run again until the next
-period.
-
-A group's unused runtime is globally tracked, being refreshed with quota units
-above at each period boundary.  As threads consume this bandwidth it is
-transferred to cpu-local "silos" on a demand basis.  The amount transferred
-within each of these updates is tunable and described as the "slice".
-
-Management
-----------
-Quota and period are managed within the cpu subsystem via cgroupfs.
-
-cpu.cfs_quota_us: the total available run-time within a period (in microseconds)
-cpu.cfs_period_us: the length of a period (in microseconds)
-cpu.stat: exports throttling statistics [explained further below]
-
-The default values are:
-	cpu.cfs_period_us=100ms
-	cpu.cfs_quota=-1
-
-A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
-bandwidth restriction in place, such a group is described as an unconstrained
-bandwidth group.  This represents the traditional work-conserving behavior for
-CFS.
-
-Writing any (valid) positive value(s) will enact the specified bandwidth limit.
-The minimum quota allowed for the quota or period is 1ms.  There is also an
-upper bound on the period length of 1s.  Additional restrictions exist when
-bandwidth limits are used in a hierarchical fashion, these are explained in
-more detail below.
-
-Writing any negative value to cpu.cfs_quota_us will remove the bandwidth limit
-and return the group to an unconstrained state once more.
-
-Any updates to a group's bandwidth specification will result in it becoming
-unthrottled if it is in a constrained state.
-
-System wide settings
---------------------
-For efficiency run-time is transferred between the global pool and CPU local
-"silos" in a batch fashion.  This greatly reduces global accounting pressure
-on large systems.  The amount transferred each time such an update is required
-is described as the "slice".
-
-This is tunable via procfs:
-	/proc/sys/kernel/sched_cfs_bandwidth_slice_us (default=5ms)
-
-Larger slice values will reduce transfer overheads, while smaller values allow
-for more fine-grained consumption.
-
-Statistics
-----------
-A group's bandwidth statistics are exported via 3 fields in cpu.stat.
-
-cpu.stat:
-- nr_periods: Number of enforcement intervals that have elapsed.
-- nr_throttled: Number of times the group has been throttled/limited.
-- throttled_time: The total time duration (in nanoseconds) for which entities
-  of the group have been throttled.
-
-This interface is read-only.
-
-Hierarchical considerations
----------------------------
-The interface enforces that an individual entity's bandwidth is always
-attainable, that is: max(c_i) <= C. However, over-subscription in the
-aggregate case is explicitly allowed to enable work-conserving semantics
-within a hierarchy.
-  e.g. \Sum (c_i) may exceed C
-[ Where C is the parent's bandwidth, and c_i its children ]
-
-
-There are two ways in which a group may become throttled:
-	a. it fully consumes its own quota within a period
-	b. a parent's quota is fully consumed within its period
-
-In case b) above, even though the child may have runtime remaining it will not
-be allowed to until the parent's runtime is refreshed.
-
-Examples
---------
-1. Limit a group to 1 CPU worth of runtime.
-
-	If period is 250ms and quota is also 250ms, the group will get
-	1 CPU worth of runtime every 250ms.
-
-	# echo 250000 > cpu.cfs_quota_us /* quota = 250ms */
-	# echo 250000 > cpu.cfs_period_us /* period = 250ms */
-
-2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine.
-
-	With 500ms period and 1000ms quota, the group can get 2 CPUs worth of
-	runtime every 500ms.
-
-	# echo 1000000 > cpu.cfs_quota_us /* quota = 1000ms */
-	# echo 500000 > cpu.cfs_period_us /* period = 500ms */
-
-	The larger period here allows for increased burst capacity.
-
-3. Limit a group to 20% of 1 CPU.
-
-	With 50ms period, 10ms quota will be equivalent to 20% of 1 CPU.
-
-	# echo 10000 > cpu.cfs_quota_us /* quota = 10ms */
-	# echo 50000 > cpu.cfs_period_us /* period = 50ms */
-
-	By using a small period here we are ensuring a consistent latency
-	response at the expense of burst capacity.
-
diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
new file mode 100644
index 000000000000..873fb2775ca6
--- /dev/null
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -0,0 +1,888 @@
+========================
+Deadline Task Scheduling
+========================
+
+.. CONTENTS
+
+    0. WARNING
+    1. Overview
+    2. Scheduling algorithm
+      2.1 Main algorithm
+      2.2 Bandwidth reclaiming
+      2.3 Energy-aware scheduling
+    3. Scheduling Real-Time Tasks
+      3.1 Definitions
+      3.2 Schedulability Analysis for Uniprocessor Systems
+      3.3 Schedulability Analysis for Multiprocessor Systems
+      3.4 Relationship with SCHED_DEADLINE Parameters
+    4. Bandwidth management
+      4.1 System-wide settings
+      4.2 Task interface
+      4.3 Default behavior
+      4.4 Behavior of sched_yield()
+    5. Tasks CPU affinity
+      5.1 SCHED_DEADLINE and cpusets HOWTO
+    6. Future plans
+    A. Test suite
+    B. Minimal main()
+
+
+0. WARNING
+==========
+
+ Fiddling with these settings can result in unpredictable or even unstable
+ system behavior. As for -rt (group) scheduling, it is assumed that root users
+ know what they're doing.
+
+
+1. Overview
+===========
+
+ The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
+ basically an implementation of the Earliest Deadline First (EDF) scheduling
+ algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
+ that makes it possible to isolate the behavior of tasks from one another.
+
+
+2. Scheduling algorithm
+=======================
+
+2.1 Main algorithm
+------------------
+
+ SCHED_DEADLINE [18] uses three parameters, named "runtime", "period", and
+ "deadline", to schedule tasks. A SCHED_DEADLINE task should receive
+ "runtime" microseconds of execution time every "period" microseconds, and
+ these "runtime" microseconds are available within "deadline" microseconds
+ from the beginning of the period.  In order to implement this behavior,
+ every time the task wakes up, the scheduler computes a "scheduling deadline"
+ consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
+ scheduled using EDF[1] on these scheduling deadlines (the task with the
+ earliest scheduling deadline is selected for execution). Notice that the
+ task actually receives "runtime" time units within "deadline" if a proper
+ "admission control" strategy (see Section "4. Bandwidth management") is used
+ (clearly, if the system is overloaded this guarantee cannot be respected).
+
+ Summing up, the CBS[2,3] algorithm assigns scheduling deadlines to tasks so
+ that each task runs for at most its runtime every period, avoiding any
+ interference between different tasks (bandwidth isolation), while the EDF[1]
+ algorithm selects the task with the earliest scheduling deadline as the one
+ to be executed next. Thanks to this feature, tasks that do not strictly comply
+ with the "traditional" real-time task model (see Section 3) can effectively
+ use the new policy.
+
+ In more detail, the CBS algorithm assigns scheduling deadlines to
+ tasks in the following way:
+
+  - Each SCHED_DEADLINE task is characterized by the "runtime",
+    "deadline", and "period" parameters;
+
+  - The state of the task is described by a "scheduling deadline", and
+    a "remaining runtime". These two parameters are initially set to 0;
+
+  - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
+    the scheduler checks if::
+
+                 remaining runtime                  runtime
+        ----------------------------------    >    ---------
+        scheduling deadline - current time           period
+
+    then, if the scheduling deadline is smaller than the current time, or
+    this condition is verified, the scheduling deadline and the
+    remaining runtime are re-initialized as::
+
+         scheduling deadline = current time + deadline
+         remaining runtime = runtime
+
+    otherwise, the scheduling deadline and the remaining runtime are
+    left unchanged;
+
+  - When a SCHED_DEADLINE task executes for an amount of time t, its
+    remaining runtime is decreased as::
+
+         remaining runtime = remaining runtime - t
+
+    (technically, the runtime is decreased at every tick, or when the
+    task is descheduled / preempted);
+
+  - When the remaining runtime becomes less than or equal to 0, the task is
+    said to be "throttled" (also known as "depleted" in real-time literature)
+    and cannot be scheduled until its scheduling deadline. The "replenishment
+    time" for this task (see next item) is set to be equal to the current
+    value of the scheduling deadline;
+
+  - When the current time is equal to the replenishment time of a
+    throttled task, the scheduling deadline and the remaining runtime are
+    updated as::
+
+         scheduling deadline = scheduling deadline + period
+         remaining runtime = remaining runtime + runtime
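+
+ As a worked illustration of the wake-up rule (all values are hypothetical),
+ consider a task with runtime = 10 and deadline = period = 100 that wakes up
+ at current time 130 with scheduling deadline 150::
+
+      remaining runtime = 4:   4 / (150 - 130) = 0.20 >  10 / 100 = 0.10
+          -> scheduling deadline = 130 + 100 = 230, remaining runtime = 10
+
+      remaining runtime = 1:   1 / (150 - 130) = 0.05 <= 10 / 100 = 0.10
+          -> scheduling deadline and remaining runtime are left unchanged
+
+ If a task with these parameters depletes its runtime (remaining runtime = 0)
+ while its scheduling deadline is 230, it is throttled until time 230, when
+ the scheduling deadline becomes 230 + 100 = 330 and the remaining runtime
+ becomes 0 + 10 = 10.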
+
+ The SCHED_FLAG_DL_OVERRUN flag in sched_attr's sched_flags field allows a task
+ to get informed about runtime overruns through the delivery of SIGXCPU
+ signals.
+
+
+2.2 Bandwidth reclaiming
+------------------------
+
+ Bandwidth reclaiming for deadline tasks is based on the GRUB (Greedy
+ Reclamation of Unused Bandwidth) algorithm [15, 16, 17] and it is enabled
+ when flag SCHED_FLAG_RECLAIM is set.
+
+ The following diagram illustrates the state names for tasks handled by GRUB::
+
+                             ------------
+                 (d)        |   Active   |
+              ------------->|            |
+              |             | Contending |
+              |              ------------
+              |                A      |
+          ----------           |      |
+         |          |          |      |
+         | Inactive |          |(b)   | (a)
+         |          |          |      |
+          ----------           |      |
+              A                |      V
+              |              ------------
+              |             |   Active   |
+              --------------|     Non    |
+                 (c)        | Contending |
+                             ------------
+
+ A task can be in one of the following states:
+
+  - ActiveContending: if it is ready for execution (or executing);
+
+  - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag
+    time;
+
+  - Inactive: if it is blocked and has surpassed the 0-lag time.
+
+ State transitions:
+
+  (a) When a task blocks, it does not become immediately inactive since its
+      bandwidth cannot be immediately reclaimed without breaking the
+      real-time guarantees. It therefore enters a transitional state called
+      ActiveNonContending. The scheduler arms the "inactive timer" to fire at
+      the 0-lag time, when the task's bandwidth can be reclaimed without
+      breaking the real-time guarantees.
+
+      The 0-lag time for a task entering the ActiveNonContending state is
+      computed as::
+
+                        (runtime * dl_period)
+             deadline - ---------------------
+                             dl_runtime
+
+      where runtime is the remaining runtime, while dl_runtime and dl_period
+      are the reservation parameters.
+
+  (b) If the task wakes up before the inactive timer fires, the task re-enters
+      the ActiveContending state and the "inactive timer" is canceled.
+      In addition, if the task wakes up on a different runqueue, then
+      the task's utilization must be removed from the previous runqueue's active
+      utilization and must be added to the new runqueue's active utilization.
+      In order to avoid races between a task waking up on a runqueue while the
+      "inactive timer" is running on a different CPU, the "dl_non_contending"
+      flag is used to indicate that a task is not on a runqueue but is active
+      (so, the flag is set when the task blocks and is cleared when the
+      "inactive timer" fires or when the task wakes up).
+
+  (c) When the "inactive timer" fires, the task enters the Inactive state and
+      its utilization is removed from the runqueue's active utilization.
+
+  (d) When an inactive task wakes up, it enters the ActiveContending state and
+      its utilization is added to the active utilization of the runqueue where
+      it has been enqueued.
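+
+ As a worked illustration of the 0-lag computation in (a), take the
+ hypothetical values dl_runtime = 4 and dl_period = 8, with a scheduling
+ deadline of 8 and a remaining runtime of 2 at the moment the task blocks::
+
+             0-lag time = 8 - (2 * 8) / 4 = 8 - 4 = 4
+
+ The "inactive timer" would therefore be armed to fire at time 4.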
+
+ For each runqueue, the GRUB algorithm keeps track of two different bandwidths:
+
+  - Active bandwidth (running_bw): this is the sum of the bandwidths of all
+    tasks in active state (i.e., ActiveContending or ActiveNonContending);
+
+  - Total bandwidth (this_bw): this is the sum of the bandwidths of all tasks
+    "belonging" to the runqueue, including the tasks in Inactive state.
+
+
+ The algorithm reclaims the bandwidth of the tasks in Inactive state.
+ It does so by decrementing the runtime of the executing task Ti at a pace equal
+ to::
+
+           dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
+
+ where:
+
+  - Ui is the bandwidth of task Ti;
+  - Umax is the maximum reclaimable utilization (subject to RT throttling
+    limits);
+  - Uinact is the (per runqueue) inactive utilization, computed as
+    (this_bw - running_bw);
+  - Uextra is the (per runqueue) extra reclaimable utilization
+    (subject to RT throttling limits).
+
+
+ Let's now see a trivial example of two deadline tasks with runtime equal
+ to 4 and period equal to 8 (i.e., bandwidth equal to 0.5)::
+
+         A            Task T1
+         |
+         |                               |
+         |                               |
+         |--------                       |----
+         |       |                       V
+         |---|---|---|---|---|---|---|---|--------->t
+         0   1   2   3   4   5   6   7   8
+
+
+         A            Task T2
+         |
+         |                               |
+         |                               |
+         |       ------------------------|
+         |       |                       V
+         |---|---|---|---|---|---|---|---|--------->t
+         0   1   2   3   4   5   6   7   8
+
+
+         A            running_bw
+         |
+       1 -----------------               ------
+         |               |               |
+      0.5-               -----------------
+         |                               |
+         |---|---|---|---|---|---|---|---|--------->t
+         0   1   2   3   4   5   6   7   8
+
+
+  - Time t = 0:
+
+    Both tasks are ready for execution and therefore in ActiveContending state.
+    Suppose Task T1 is the first task to start execution.
+    Since there are no inactive tasks, its runtime is decreased as dq = -1 dt.
+
+  - Time t = 2:
+
+    Suppose that Task T1 blocks.
+    Task T1 therefore enters the ActiveNonContending state. Since its remaining
+    runtime is equal to 2, its 0-lag time is equal to t = 4.
+    Task T2 starts execution, with runtime still decreased as dq = -1 dt since
+    there are no inactive tasks.
+
+  - Time t = 4:
+
+    This is the 0-lag time for Task T1. Since it has not woken up in the
+    meantime, it enters the Inactive state. Its bandwidth is removed from
+    running_bw.
+    Task T2 continues its execution. However, its runtime is now decreased as
+    dq = - 0.5 dt because Uinact = 0.5.
+    Task T2 therefore reclaims the bandwidth unused by Task T1.
+
+  - Time t = 8:
+
+    Task T1 wakes up. It enters the ActiveContending state again, and the
+    running_bw is incremented.
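+
+ The rates used in this example follow from the reclaiming formula under
+ the simplifying assumptions (made here for illustration only) that
+ Umax = 1 and Uextra = 0, with Ui = 0.5 for both tasks::
+
+      Uinact = 0:     dq = -max{ 0.5 / 1, 1 - 0   - 0 } dt = -1   dt
+      Uinact = 0.5:   dq = -max{ 0.5 / 1, 1 - 0.5 - 0 } dt = -0.5 dt
+
+ Between t = 2 and t = 8 Task T2 thus executes for 2 + 4 = 6 time units while
+ consuming only 2 * 1 + 4 * 0.5 = 4 units of runtime.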
+
+
+2.3 Energy-aware scheduling
+---------------------------
+
+ When cpufreq's schedutil governor is selected, SCHED_DEADLINE implements the
+ GRUB-PA [19] algorithm, reducing the CPU operating frequency to the minimum
+ value that still allows the deadlines to be met. This behavior is currently
+ implemented only for ARM architectures.
+
+ Particular care must be taken when the time needed for changing frequency
+ is of the same order of magnitude as the reservation period. In such cases,
+ setting a fixed CPU frequency results in fewer deadline misses.
+
+
+3. Scheduling Real-Time Tasks
+=============================
+
+
+
+ ..  BIG FAT WARNING ******************************************************
+
+ .. warning::
+
+   This section contains a (not-thorough) summary of classical deadline
+   scheduling theory, and how it applies to SCHED_DEADLINE.
+   The reader can "safely" skip to Section 4 if only interested in seeing
+   how the scheduling policy can be used. Anyway, we strongly recommend
+   coming back here and continuing to read (once the urge for testing is
+   satisfied :P) to be sure of fully understanding all technical details.
+
+ .. ************************************************************************
+
+ There are no limitations on what kind of task can exploit this new
+ scheduling discipline, even if it must be said that it is particularly
+ suited for periodic or sporadic real-time tasks that need guarantees on their
+ timing behavior, e.g., multimedia, streaming, control applications, etc.
+
+3.1 Definitions
+------------------------
+
+ A typical real-time task is composed of a repetition of computation phases
+ (task instances, or jobs) which are activated in a periodic or sporadic
+ fashion.
+ Each job J_j (where J_j is the j^th job of the task) is characterized by an
+ arrival time r_j (the time when the job starts), an amount of computation
+ time c_j needed to finish the job, and a job absolute deadline d_j, which
+ is the time within which the job should be finished. The maximum execution
+ time max{c_j} is called "Worst Case Execution Time" (WCET) for the task.
+ A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
+ sporadic with minimum inter-arrival time P if r_{j+1} >= r_j + P. Finally,
+ d_j = r_j + D, where D is the task's relative deadline.
+ Summing up, a real-time task can be described as
+
+	Task = (WCET, D, P)
+
+ The utilization of a real-time task is defined as the ratio between its
+ WCET and its period (or minimum inter-arrival time), and represents
+ the fraction of CPU time needed to execute the task.
+
+ If the total utilization U=sum(WCET_i/P_i) is larger than M (with M equal
+ to the number of CPUs), then the scheduler is unable to respect all the
+ deadlines.
+ Note that total utilization is defined as the sum of the utilizations
+ WCET_i/P_i over all the real-time tasks in the system. When considering
+ multiple real-time tasks, the parameters of the i-th task are indicated
+ with the "_i" suffix.
+ Moreover, if the total utilization is larger than M, then we risk starving
+ non-real-time tasks by real-time tasks.
+ If, instead, the total utilization is smaller than M, then non-real-time
+ tasks will not be starved and the system might be able to respect all the
+ deadlines.
+ As a matter of fact, in this case it is possible to provide an upper bound
+ for tardiness (defined as the maximum between 0 and the difference
+ between the finishing time of a job and its absolute deadline).
+ More precisely, it can be proven that using a global EDF scheduler the
+ maximum tardiness of each task is smaller than or equal to
+
+	((M − 1) · WCET_max − WCET_min)/(M − (M − 2) · U_max) + WCET_max
+
+ where WCET_max = max{WCET_i} is the maximum WCET, WCET_min=min{WCET_i}
+ is the minimum WCET, and U_max = max{WCET_i/P_i} is the maximum
+ utilization[12].
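+
+ As an illustration only (this is not kernel code), the total utilization
+ and the tardiness bound above can be computed with a few lines of C. The
+ struct rt_task type and the function names are hypothetical helpers
+ introduced for this sketch, and all times are assumed to be expressed in
+ the same unit:
+

```c
#include <assert.h>
#include <stddef.h>

/* One real-time task, Task = (WCET, D, P); all times in the same unit. */
struct rt_task {
	double wcet;		/* worst-case execution time */
	double deadline;	/* relative deadline D */
	double period;		/* period or minimum inter-arrival time P */
};

/* Total utilization U = sum(WCET_i / P_i). */
double total_utilization(const struct rt_task *tasks, size_t n)
{
	double u = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(tasks[i].period > 0.0);
		u += tasks[i].wcet / tasks[i].period;
	}
	return u;
}

/*
 * Tardiness bound quoted above for global EDF on m CPUs:
 *   ((m - 1) * WCET_max - WCET_min) / (m - (m - 2) * U_max) + WCET_max
 * The caller is expected to have checked total_utilization() <= m.
 */
double gedf_tardiness_bound(const struct rt_task *tasks, size_t n,
			    unsigned int m)
{
	double cpus = m;
	double wcet_max, wcet_min, u_max;
	size_t i;

	assert(n > 0 && m > 0);
	wcet_max = wcet_min = tasks[0].wcet;
	u_max = tasks[0].wcet / tasks[0].period;
	for (i = 1; i < n; i++) {
		double u = tasks[i].wcet / tasks[i].period;

		if (tasks[i].wcet > wcet_max)
			wcet_max = tasks[i].wcet;
		if (tasks[i].wcet < wcet_min)
			wcet_min = tasks[i].wcet;
		if (u > u_max)
			u_max = u;
	}
	return ((cpus - 1.0) * wcet_max - wcet_min) /
	       (cpus - (cpus - 2.0) * u_max) + wcet_max;
}
```
+
+ For two tasks (2, 10, 10) and (3, 10, 10) on M = 2 CPUs, this gives U = 0.5
+ and a tardiness bound of (1 · 3 − 2)/2 + 3 = 3.5 time units.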
+
+3.2 Schedulability Analysis for Uniprocessor Systems
+----------------------------------------------------
+
+ If M=1 (uniprocessor system), or in case of partitioned scheduling (each
+ real-time task is statically assigned to one and only one CPU), it is
+ possible to formally check if all the deadlines are respected.
+ If D_i = P_i for all tasks, then EDF is able to respect all the deadlines
+ of all the tasks executing on a CPU if and only if the total utilization
+ of the tasks running on such a CPU is smaller than or equal to 1.
+ If D_i != P_i for some task, then it is possible to define the density of
+ a task as WCET_i/min{D_i,P_i}, and EDF is able to respect all the deadlines
+ of all the tasks running on a CPU if the sum of the densities of the tasks
+ running on such a CPU is smaller than or equal to 1:
+
+	sum(WCET_i / min{D_i, P_i}) <= 1
+
+ It is important to notice that this condition is only sufficient, and not
+ necessary: there are task sets that are schedulable, but do not respect the
+ condition. For example, consider the task set {Task_1,Task_2} composed of
+ Task_1=(50ms,50ms,100ms) and Task_2=(10ms,100ms,100ms).
+ EDF is clearly able to schedule the two tasks without missing any deadline
+ (Task_1 is scheduled as soon as it is released, and finishes just in time
+ to respect its deadline; Task_2 is scheduled immediately after Task_1, hence
+ its response time cannot be larger than 50ms + 10ms = 60ms) even if
+
+	50 / min{50,100} + 10 / min{100, 100} = 50 / 50 + 10 / 100 = 1.1
+
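+ The density test and the counterexample above can be reproduced with a
+ short C sketch (hypothetical helper names, not kernel code; times in
+ milliseconds):
+

```c
#include <assert.h>
#include <stddef.h>

/* Task = (WCET, D, P), all in the same time unit. */
struct rt_task {
	double wcet;
	double deadline;
	double period;
};

/* Density of one task: WCET / min{D, P}. */
double task_density(const struct rt_task *t)
{
	double window = t->deadline < t->period ? t->deadline : t->period;

	assert(window > 0.0);
	return t->wcet / window;
}

/*
 * Sufficient (but not necessary) uniprocessor EDF test:
 * returns 1 if the sum of the densities is <= 1, 0 otherwise.
 */
int edf_density_test(const struct rt_task *tasks, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += task_density(&tasks[i]);
	return sum <= 1.0;
}
```
+
+ With Task_1 = (50, 50, 100) and Task_2 = (10, 100, 100), the sum of the
+ densities is 1.1 and edf_density_test() returns 0, although the task set is
+ schedulable by EDF.
+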
+ Of course it is possible to test the exact schedulability of tasks with
+ D_i != P_i (checking a condition that is both sufficient and necessary),
+ but this cannot be done by comparing the total utilization or density with
+ a constant. Instead, the so called "processor demand" approach can be used,
+ computing the total amount of CPU time h(t) needed by all the tasks to
+ respect all of their deadlines in a time interval of size t, and comparing
+ such a time with the interval size t. If h(t) does not exceed t (that is,
+ the amount of time needed by the tasks in a time interval of size t is
+ not larger than the size of the interval) for all the possible values of t, then
+ EDF is able to schedule the tasks respecting all of their deadlines. Since
+ performing this check for all possible values of t is impossible, it has been
+ proven[4,5,6] that it is sufficient to perform the test for values of t
+ between 0 and a maximum value L. The cited papers contain all of the
+ mathematical details and explain how to compute h(t) and L.
+ In any case, this kind of analysis is too complex as well as too
+ time-consuming to be performed on-line. Hence, as explained in Section
+ 4, Linux uses an admission test based on the tasks' utilizations.
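+
+ For reference, a minimal sketch of the processor demand test follows. It
+ assumes that all the tasks are released together at time 0, uses the
+ demand formula from the cited literature,
+ h(t) = sum(max{0, floor((t - D_i) / P_i) + 1} · WCET_i), and checks
+ h(t) <= t at every absolute deadline up to a caller-supplied horizon. The
+ helper names are hypothetical, and computing a valid horizon L is left to
+ the cited papers:
+

```c
#include <assert.h>
#include <stddef.h>

/* Task = (WCET, D, P), integer times in the same unit. */
struct rt_task {
	long wcet;
	long deadline;
	long period;
};

/* CPU time needed by jobs released at or after 0 with deadline <= t. */
long processor_demand(const struct rt_task *tasks, size_t n, long t)
{
	long h = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(tasks[i].period > 0);
		if (t >= tasks[i].deadline)
			h += ((t - tasks[i].deadline) / tasks[i].period + 1) *
			     tasks[i].wcet;
	}
	return h;
}

/*
 * h(t) only changes at absolute deadlines, so it is enough to check
 * h(d) <= d for every absolute deadline d up to the horizon.
 * Returns 1 if no check fails, 0 otherwise.
 */
int edf_demand_test(const struct rt_task *tasks, size_t n, long horizon)
{
	size_t i;
	long d;

	for (i = 0; i < n; i++)
		for (d = tasks[i].deadline; d <= horizon; d += tasks[i].period)
			if (processor_demand(tasks, n, d) > d)
				return 0;
	return 1;
}
```
+
+ For the task set {(50, 50, 100), (10, 100, 100)} and a horizon of 200ms,
+ h(50) = 50, h(100) = 60, h(150) = 110 and h(200) = 120, so the check
+ succeeds even though the density test fails.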
+
+3.3 Schedulability Analysis for Multiprocessor Systems
+------------------------------------------------------
+
+ On multiprocessor systems with global EDF scheduling (non-partitioned
+ systems), a sufficient test for schedulability cannot be based on the
+ utilizations or densities: it can be shown that, even if D_i = P_i, task
+ sets with utilizations slightly larger than 1 can miss deadlines regardless
+ of the number of CPUs.
+
+ Consider a set {Task_1,...Task_{M+1}} of M+1 tasks on a system with M
+ CPUs, with the first task Task_1=(P,P,P) having period, relative deadline
+ and WCET equal to P. The remaining M tasks Task_i=(e,P-1,P-1) have an
+ arbitrarily small worst case execution time (indicated as "e" here) and a
+ period smaller than the one of the first task. Hence, if all the tasks
+ activate at the same time t, global EDF schedules these M tasks first
+ (because their absolute deadlines are equal to t + P - 1, hence they are
+ smaller than the absolute deadline of Task_1, which is t + P). As a
+ result, Task_1 can be scheduled only at time t + e, and will finish at
+ time t + e + P, after its absolute deadline. The total utilization of the
+ task set is U = M · e / (P - 1) + P / P = M · e / (P - 1) + 1, and for small
+ values of e this can become very close to 1. This is known as "Dhall's
+ effect"[7]. Note: the example in the original paper by Dhall has been
+ slightly simplified here (for example, Dhall more correctly computed
+ lim_{e->0}U).
+
+ More complex schedulability tests for global EDF have been developed in
+ real-time literature[8,9], but they are not based on a simple comparison
+ between total utilization (or density) and a fixed constant. If all tasks
+ have D_i = P_i, a sufficient schedulability condition can be expressed in
+ a simple way:
+
+	sum(WCET_i / P_i) <= M - (M - 1) · U_max
+
+ where U_max = max{WCET_i / P_i}[10]. Notice that for U_max = 1,
+ M - (M - 1) · U_max becomes M - M + 1 = 1 and this schedulability condition
+ just confirms Dhall's effect. A more complete survey of the literature
+ about schedulability tests for multi-processor real-time scheduling can be
+ found in [11].
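+
+ The condition above is simple enough to be checked in a few lines of C.
+ The sketch below (hypothetical names; D_i = P_i is assumed for all tasks)
+ also shows that a Dhall-like task set fails the test:
+

```c
#include <assert.h>
#include <stddef.h>

/* Implicit-deadline task: D = P, so only WCET and P are needed. */
struct rt_task {
	double wcet;
	double period;
};

/*
 * Sufficient global EDF test on m CPUs (D_i = P_i for all tasks):
 *   sum(WCET_i / P_i) <= m - (m - 1) * U_max
 * Returns 1 if the condition holds, 0 otherwise.
 */
int gedf_utilization_test(const struct rt_task *tasks, size_t n,
			  unsigned int m)
{
	double cpus = m, sum = 0.0, u_max = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		double u;

		assert(tasks[i].period > 0.0);
		u = tasks[i].wcet / tasks[i].period;
		sum += u;
		if (u > u_max)
			u_max = u;
	}
	return sum <= cpus - (cpus - 1.0) * u_max;
}
```
+
+ With M = 2, Task_1 = (100, 100, 100) and two tasks (1, 99, 99), the total
+ utilization is about 1.02 while M - (M - 1) · U_max = 1, so
+ gedf_utilization_test() returns 0.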
+
+ As seen, enforcing that the total utilization is smaller than M does not
+ guarantee that global EDF schedules the tasks without missing any deadline
+ (in other words, global EDF is not an optimal scheduling algorithm). However,
+ a total utilization smaller than M is enough to guarantee that non real-time
+ tasks are not starved and that the tardiness of real-time tasks has an upper
+ bound[12] (as previously noted). Different bounds on the maximum tardiness
+ experienced by real-time tasks have been developed in various papers[13,14],
+ but the theoretical result that is important for SCHED_DEADLINE is that if
+ the total utilization is smaller than or equal to M then the response times of
+ the tasks are limited.
+
+3.4 Relationship with SCHED_DEADLINE Parameters
+-----------------------------------------------
+
+ Finally, it is important to understand the relationship between the
+ SCHED_DEADLINE scheduling parameters described in Section 2 (runtime,
+ deadline and period) and the real-time task parameters (WCET, D, P)
+ described in this section. Note that a task's temporal constraints are
+ represented by its absolute deadlines d_j = r_j + D described above, while
+ SCHED_DEADLINE schedules the tasks according to scheduling deadlines (see
+ Section 2).
+ If an admission test is used to guarantee that the scheduling deadlines
+ are respected, then SCHED_DEADLINE can be used to schedule real-time tasks
+ guaranteeing that all the jobs' deadlines of a task are respected.
+ In order to do this, a task must be scheduled by setting:
+
+  - runtime >= WCET
+  - deadline = D
+  - period <= P
+
+ In other words, if runtime >= WCET and period <= P, then the scheduling
+ deadlines and the absolute deadlines (d_j) coincide, so proper admission
+ control makes it possible to respect the jobs' absolute deadlines for this
+ task (this is what is called the "hard schedulability property" and is an
+ extension of Lemma 1 of [2]).
+ Notice that if runtime > deadline the admission control will surely reject
+ this task, as it is not possible to respect its temporal constraints.
+
+ References:
+
+  1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming
+      in a hard-real-time environment. Journal of the Association for
+      Computing Machinery, 20(1), 1973.
+  2 - L. Abeni, G. Buttazzo. Integrating Multimedia Applications in Hard
+      Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
+      Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
+  3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
+      Technical Report. http://disi.unitn.it/~abeni/tr-98-01.pdf
+  4 - J. Y. Leung and M. L. Merrill. A Note on Preemptive Scheduling of
+      Periodic, Real-Time Tasks. Information Processing Letters, vol. 11,
+      no. 3, pp. 115-118, 1980.
+  5 - S. K. Baruah, A. K. Mok and L. E. Rosier. Preemptively Scheduling
+      Hard-Real-Time Sporadic Tasks on One Processor. Proceedings of the
+      11th IEEE Real-time Systems Symposium, 1990.
+  6 - S. K. Baruah, L. E. Rosier and R. R. Howell. Algorithms and Complexity
+      Concerning the Preemptive Scheduling of Periodic Real-Time tasks on
+      One Processor. Real-Time Systems Journal, vol. 4, no. 2, pp 301-324,
+      1990.
+  7 - S. J. Dhall and C. L. Liu. On a real-time scheduling problem. Operations
+      research, vol. 26, no. 1, pp 127-140, 1978.
+  8 - T. Baker. Multiprocessor EDF and Deadline Monotonic Schedulability
+      Analysis. Proceedings of the 24th IEEE Real-Time Systems Symposium, 2003.
+  9 - T. Baker. An Analysis of EDF Schedulability on a Multiprocessor.
+      IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8,
+      pp 760-768, 2005.
+  10 - J. Goossens, S. Funk and S. Baruah, Priority-Driven Scheduling of
+       Periodic Task Systems on Multiprocessors. Real-Time Systems Journal,
+       vol. 25, no. 2–3, pp. 187–205, 2003.
+  11 - R. Davis and A. Burns. A Survey of Hard Real-Time Scheduling for
+       Multiprocessor Systems. ACM Computing Surveys, vol. 43, no. 4, 2011.
+       http://www-users.cs.york.ac.uk/~robdavis/papers/MPSurveyv5.0.pdf
+  12 - U. C. Devi and J. H. Anderson. Tardiness Bounds under Global EDF
+       Scheduling on a Multiprocessor. Real-Time Systems Journal, vol. 32,
+       no. 2, pp 133-189, 2008.
+  13 - P. Valente and G. Lipari. An Upper Bound to the Lateness of Soft
+       Real-Time Tasks Scheduled by EDF on Multiprocessors. Proceedings of
+       the 26th IEEE Real-Time Systems Symposium, 2005.
+  14 - J. Erickson, U. Devi and S. Baruah. Improved tardiness bounds for
+       Global EDF. Proceedings of the 22nd Euromicro Conference on
+       Real-Time Systems, 2010.
+  15 - G. Lipari, S. Baruah, Greedy reclamation of unused bandwidth in
+       constant-bandwidth servers, 12th IEEE Euromicro Conference on Real-Time
+       Systems, 2000.
+  16 - L. Abeni, J. Lelli, C. Scordino, L. Palopoli, Greedy CPU reclaiming for
+       SCHED_DEADLINE. In Proceedings of the Real-Time Linux Workshop (RTLWS),
+       Dusseldorf, Germany, 2014.
+  17 - L. Abeni, G. Lipari, A. Parri, Y. Sun, Multicore CPU reclaiming: parallel
+       or sequential?. In Proceedings of the 31st Annual ACM Symposium on Applied
+       Computing, 2016.
+  18 - J. Lelli, C. Scordino, L. Abeni, D. Faggioli, Deadline scheduling in the
+       Linux kernel, Software: Practice and Experience, 46(6): 821-839, June
+       2016.
+  19 - C. Scordino, L. Abeni, J. Lelli, Energy-Aware Real-Time Scheduling in
+       the Linux Kernel, 33rd ACM/SIGAPP Symposium On Applied Computing (SAC
+       2018), Pau, France, April 2018.
+
+
+4. Bandwidth management
+=======================
+
+ As previously mentioned, in order for -deadline scheduling to be
+ effective and useful (that is, to be able to provide "runtime" time units
+ within "deadline"), it is important to have some method to keep the allocation
+ of the available fractions of CPU time to the various tasks under control.
+ This is usually called "admission control" and if it is not performed, then
+ no guarantee can be given on the actual scheduling of the -deadline tasks.
+
+ As already stated in Section 3, a necessary condition to be respected to
+ correctly schedule a set of real-time tasks is that the total utilization
+ is smaller than M. When talking about -deadline tasks, this requires that
+ the sum of the ratios between runtime and period for all tasks is smaller
+ than M. Notice that the ratio runtime/period is equivalent to the utilization
+ of a "traditional" real-time task, and is also often referred to as
+ "bandwidth".
+ The interface used to control the CPU bandwidth that can be allocated
+ to -deadline tasks is similar to the one already used for -rt
+ tasks with real-time group scheduling (a.k.a. RT-throttling - see
+ Documentation/scheduler/sched-rt-group.rst), and is based on readable/
+ writable control files located in procfs (for system wide settings).
+ Notice that per-group settings (controlled through cgroupfs) are still not
+ defined for -deadline tasks, because more discussion is needed in order to
+ figure out how we want to manage SCHED_DEADLINE bandwidth at the task group
+ level.
+
+ A main difference between deadline bandwidth management and RT-throttling
+ is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
+ and thus we don't need a higher level throttling mechanism to enforce the
+ desired bandwidth. In other words, this means that interface parameters are
+ only used at admission control time (i.e., when the user calls
+ sched_setattr()). Scheduling is then performed considering actual tasks'
+ parameters, so that CPU bandwidth is allocated to SCHED_DEADLINE tasks
+ respecting their needs in terms of granularity. Therefore, using this simple
+ interface we can put a cap on total utilization of -deadline tasks (i.e.,
+ \Sum (runtime_i / period_i) < global_dl_utilization_cap).
+
+4.1 System wide settings
+------------------------
+
+ The system wide settings are configured under the /proc virtual file system.
+
+ For now the -rt knobs are used for -deadline admission control and the
+ -deadline runtime is accounted against the -rt runtime. We realize that this
+ isn't entirely desirable; however, it is better to have a small interface for
+ now, and be able to change it easily later. The ideal situation (see 5.) is to
+ run -rt tasks from a -deadline server; in which case the -rt bandwidth is a
+ direct subset of dl_bw.
+
+ This means that, for a root_domain comprising M CPUs, -deadline tasks
+ can be created while the sum of their bandwidths stays below:
+
+   M * (sched_rt_runtime_us / sched_rt_period_us)
+
+ It is also possible to disable this bandwidth management logic, and
+ thus be free to oversubscribe the system up to any arbitrary level.
+ This is done by writing -1 in /proc/sys/kernel/sched_rt_runtime_us.
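+
+ The resulting cap can be sketched as follows. This is only an illustration
+ of the formula above, not the kernel's admission control code: the function
+ names are hypothetical, and the strict comparison mirrors the "stays below"
+ wording of this section:
+

```c
#include <assert.h>

/*
 * Bandwidth available to -deadline tasks on a root_domain of m CPUs:
 *   m * (sched_rt_runtime_us / sched_rt_period_us)
 * A negative runtime (-1) means that the check is disabled; in that case
 * a negative value is returned.
 */
double dl_bandwidth_cap(unsigned int m, long rt_runtime_us, long rt_period_us)
{
	if (rt_runtime_us < 0)
		return -1.0;
	assert(rt_period_us > 0);
	return m * ((double)rt_runtime_us / (double)rt_period_us);
}

/*
 * Returns 1 if a task with bandwidth new_bw (runtime / period) can be
 * added to tasks already using used_bw, 0 otherwise.
 */
int dl_admit(double used_bw, double new_bw, double cap)
{
	if (cap < 0.0)		/* bandwidth management disabled */
		return 1;
	return used_bw + new_bw < cap;
}
```
+
+ With sched_rt_runtime_us = 950000, sched_rt_period_us = 1000000 and M = 4,
+ dl_bandwidth_cap() returns 3.8.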
+
+
+4.2 Task interface
+------------------
+
+ Specifying a periodic/sporadic task that executes for a given amount of
+ runtime at each instance, and that is scheduled according to the urgency of
+ its own timing constraints needs, in general, a way of declaring:
+
+  - a (maximum/typical) instance execution time,
+  - a minimum interval between consecutive instances,
+  - a time constraint by which each instance must be completed.
+
+ Therefore:
+
+  * a new struct sched_attr, containing all the necessary fields, is
+    provided;
+  * the new scheduling related syscalls that manipulate it, i.e.,
+    sched_setattr() and sched_getattr(), are implemented.
+
+ For debugging purposes, the leftover runtime and absolute deadline of a
+ SCHED_DEADLINE task can be retrieved through /proc/<pid>/sched (entries
+ dl.runtime and dl.deadline, both values in ns). A programmatic way to
+ retrieve these values from production code is under discussion.
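+
+ Until such an interface exists, a debugging tool can extract the two
+ entries from the text of /proc/<pid>/sched. The sketch below assumes that
+ each entry is printed on its own line as "<name> : <value>", with optional
+ padding before the colon; this layout is an assumption and may differ
+ between kernel versions. parse_sched_field() is a hypothetical helper that
+ works on a buffer already read from the file:
+

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look for a line starting with key, followed by optional padding, a
 * colon and a decimal value. On success store the value in *out and
 * return 0; return -1 if no such line is found.
 */
int parse_sched_field(const char *text, const char *key, long long *out)
{
	const char *line = text;
	size_t klen;

	assert(text && key && out);
	klen = strlen(key);
	while (line && *line) {
		const char *next = strchr(line, '\n');

		if (strncmp(line, key, klen) == 0) {
			const char *p = line + klen;
			char *end;
			long long v;

			/* Skip the padding between the name and the colon. */
			while (*p == ' ' || *p == '\t')
				p++;
			if (*p == ':') {
				v = strtoll(p + 1, &end, 10);
				if (end != p + 1) {
					*out = v;
					return 0;
				}
			}
		}
		line = next ? next + 1 : NULL;
	}
	return -1;
}
```
+
+ Both values are reported in ns, so a caller would typically look up
+ "dl.runtime" and "dl.deadline" and convert them as needed.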
+
+
+4.3 Default behavior
+---------------------
+
+ The default value for SCHED_DEADLINE bandwidth is to have rt_runtime equal to
+ 950000. With rt_period equal to 1000000, by default, it means that -deadline
+ tasks can use at most 95%, multiplied by the number of CPUs that compose the
+ root_domain, for each root_domain.
+ This means that non -deadline tasks will receive at least 5% of the CPU time,
+ and that -deadline tasks will receive their runtime with a guaranteed
+ worst-case delay with respect to the "deadline" parameter. If "deadline" = "period"
+ and the cpuset mechanism is used to implement partitioned scheduling (see
+ Section 5), then this simple setting of the bandwidth management is able to
+ deterministically guarantee that -deadline tasks will receive their runtime
+ in a period.
+
+ Finally, notice that in order not to jeopardize the admission control a
+ -deadline task cannot fork.
+
+
+4.4 Behavior of sched_yield()
+-----------------------------
+
+ When a SCHED_DEADLINE task calls sched_yield(), it gives up its
+ remaining runtime and is immediately throttled, until the next
+ period, when its runtime will be replenished (a special flag
+ dl_yielded is set and used to correctly handle throttling and runtime
+ replenishment after a call to sched_yield()).
+
+ This behavior of sched_yield() allows the task to wake up exactly at
+ the beginning of the next period. Also, this may be useful in the
+ future with bandwidth reclaiming mechanisms, where sched_yield() will
+ make the leftover runtime available for reclamation by other
+ SCHED_DEADLINE tasks.
+
+
+5. Tasks CPU affinity
+=====================
+
+ -deadline tasks cannot have an affinity mask smaller than the entire
+ root_domain they are created on. However, affinities can be specified
+ through the cpuset facility (Documentation/cgroup-v1/cpusets.txt).
+
+5.1 SCHED_DEADLINE and cpusets HOWTO
+------------------------------------
+
+ An example of a simple configuration (pin a -deadline task to CPU0)
+ follows (rt-app is used to create a -deadline task)::
+
+   mkdir /dev/cpuset
+   mount -t cgroup -o cpuset cpuset /dev/cpuset
+   cd /dev/cpuset
+   mkdir cpu0
+   echo 0 > cpu0/cpuset.cpus
+   echo 0 > cpu0/cpuset.mems
+   echo 1 > cpuset.cpu_exclusive
+   echo 0 > cpuset.sched_load_balance
+   echo 1 > cpu0/cpuset.cpu_exclusive
+   echo 1 > cpu0/cpuset.mem_exclusive
+   echo $$ > cpu0/tasks
+   rt-app -t 100000:10000:d:0 -D5 # it is now actually superfluous to specify
+				  # task affinity
+
+6. Future plans
+===============
+
+ Still missing:
+
+  - programmatic way to retrieve current runtime and absolute deadline;
+  - refinements to deadline inheritance, especially regarding the possibility
+    of retaining bandwidth isolation among non-interacting tasks. This is
+    being studied from both theoretical and practical points of view, and
+    hopefully we should be able to produce some demonstrative code soon;
+  - (c)group based bandwidth management, and maybe scheduling;
+  - access control for non-root users (and related security concerns to
+    address): what is the best way to allow unprivileged use of the mechanisms,
+    and how to prevent non-root users from "cheating" the system?
+
+ As already discussed, we are also planning to merge this work with the EDF
+ throttling patches [https://lkml.org/lkml/2010/2/23/239] but we still are in
+ the preliminary phases of the merge and we really seek feedback that would
+ help us decide on the direction it should take.
+
+Appendix A. Test suite
+======================
+
+ The SCHED_DEADLINE policy can be easily tested using two applications that
+ are part of a wider Linux Scheduler validation suite. The suite is
+ available as a GitHub repository: https://github.com/scheduler-tools.
+
+ The first testing application is called rt-app and can be used to
+ start multiple threads with specific parameters. rt-app supports
+ SCHED_{OTHER,FIFO,RR,DEADLINE} scheduling policies and their related
+ parameters (e.g., niceness, priority, runtime/deadline/period). rt-app
+ is a valuable tool, as it can be used to synthetically recreate certain
+ workloads (maybe mimicking real use-cases) and evaluate how the scheduler
+ behaves under such workloads. In this way, results are easily reproducible.
+ rt-app is available at: https://github.com/scheduler-tools/rt-app.
+
+ Thread parameters can be specified from the command line, with something like
+ this::
+
+  # rt-app -t 100000:10000:d -t 150000:20000:f:10 -D5
+
+ The above creates 2 threads. The first one, scheduled by SCHED_DEADLINE,
+ executes for 10ms every 100ms. The second one, scheduled at SCHED_FIFO
+ priority 10, executes for 20ms every 150ms. The test will run for a total
+ of 5 seconds.
+
+ More interestingly, configurations can be described with a JSON file that
+ can be passed as input to rt-app with something like this::
+
+  # rt-app my_config.json
+
+ The parameters that can be specified with the second method are a superset
+ of the command line options. Please refer to rt-app documentation for more
+ details (`<rt-app-sources>/doc/*.json`).
+
+ The second testing application is a modification of schedtool, called
+ schedtool-dl, which can be used to set up SCHED_DEADLINE parameters for a
+ certain pid/application. schedtool-dl is available at:
+ https://github.com/scheduler-tools/schedtool-dl.git.
+
+ The usage is straightforward::
+
+  # schedtool -E -t 10000000:100000000 -e ./my_cpuhog_app
+
+ With this, my_cpuhog_app is put to run inside a SCHED_DEADLINE reservation
+ of 10ms every 100ms (note that parameters are expressed in microseconds).
+ You can also use schedtool to create a reservation for an already running
+ application, given that you know its pid::
+
+  # schedtool -E -t 10000000:100000000 my_app_pid
+
+Appendix B. Minimal main()
+==========================
+
+ We provide in what follows a simple (ugly) self-contained code snippet
+ showing how SCHED_DEADLINE reservations can be created by a real-time
+ application developer::
+
+   #define _GNU_SOURCE
+   #include <unistd.h>
+   #include <stdio.h>
+   #include <stdlib.h>
+   #include <string.h>
+   #include <time.h>
+   #include <linux/unistd.h>
+   #include <linux/kernel.h>
+   #include <linux/types.h>
+   #include <sys/syscall.h>
+   #include <pthread.h>
+
+   #define gettid() syscall(__NR_gettid)
+
+   #define SCHED_DEADLINE	6
+
+   /* XXX use the proper syscall numbers */
+   #ifdef __x86_64__
+   #define __NR_sched_setattr		314
+   #define __NR_sched_getattr		315
+   #endif
+
+   #ifdef __i386__
+   #define __NR_sched_setattr		351
+   #define __NR_sched_getattr		352
+   #endif
+
+   #ifdef __arm__
+   #define __NR_sched_setattr		380
+   #define __NR_sched_getattr		381
+   #endif
+
+   static volatile int done;
+
+   struct sched_attr {
+	__u32 size;
+
+	__u32 sched_policy;
+	__u64 sched_flags;
+
+	/* SCHED_NORMAL, SCHED_BATCH */
+	__s32 sched_nice;
+
+	/* SCHED_FIFO, SCHED_RR */
+	__u32 sched_priority;
+
+	/* SCHED_DEADLINE (nsec) */
+	__u64 sched_runtime;
+	__u64 sched_deadline;
+	__u64 sched_period;
+   };
+
+   int sched_setattr(pid_t pid,
+		  const struct sched_attr *attr,
+		  unsigned int flags)
+   {
+	return syscall(__NR_sched_setattr, pid, attr, flags);
+   }
+
+   int sched_getattr(pid_t pid,
+		  struct sched_attr *attr,
+		  unsigned int size,
+		  unsigned int flags)
+   {
+	return syscall(__NR_sched_getattr, pid, attr, size, flags);
+   }
+
+   void *run_deadline(void *data)
+   {
+	struct sched_attr attr;
+	int x = 0;
+	int ret;
+	unsigned int flags = 0;
+
+	printf("deadline thread started [%ld]\n", gettid());
+
+	attr.size = sizeof(attr);
+	attr.sched_flags = 0;
+	attr.sched_nice = 0;
+	attr.sched_priority = 0;
+
+	/* This creates a 10ms/30ms reservation */
+	attr.sched_policy = SCHED_DEADLINE;
+	attr.sched_runtime = 10 * 1000 * 1000;
+	attr.sched_period = attr.sched_deadline = 30 * 1000 * 1000;
+
+	ret = sched_setattr(0, &attr, flags);
+	if (ret < 0) {
+		done = 0;
+		perror("sched_setattr");
+		exit(-1);
+	}
+
+	while (!done) {
+		x++;
+	}
+
+	printf("deadline thread dies [%ld]\n", gettid());
+	return NULL;
+   }
+
+   int main (int argc, char **argv)
+   {
+	pthread_t thread;
+
+	printf("main thread [%ld]\n", gettid());
+
+	pthread_create(&thread, NULL, run_deadline, NULL);
+
+	sleep(10);
+
+	done = 1;
+	pthread_join(thread, NULL);
+
+	printf("main dies [%ld]\n", gettid());
+	return 0;
+   }
diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
deleted file mode 100644
index b14e03ff3528..000000000000
--- a/Documentation/scheduler/sched-deadline.txt
+++ /dev/null
@@ -1,871 +0,0 @@
-			  Deadline Task Scheduling
-			  ------------------------
-
-CONTENTS
-========
-
- 0. WARNING
- 1. Overview
- 2. Scheduling algorithm
-   2.1 Main algorithm
-   2.2 Bandwidth reclaiming
- 3. Scheduling Real-Time Tasks
-   3.1 Definitions
-   3.2 Schedulability Analysis for Uniprocessor Systems
-   3.3 Schedulability Analysis for Multiprocessor Systems
-   3.4 Relationship with SCHED_DEADLINE Parameters
- 4. Bandwidth management
-   4.1 System-wide settings
-   4.2 Task interface
-   4.3 Default behavior
-   4.4 Behavior of sched_yield()
- 5. Tasks CPU affinity
-   5.1 SCHED_DEADLINE and cpusets HOWTO
- 6. Future plans
- A. Test suite
- B. Minimal main()
-
-
-0. WARNING
-==========
-
- Fiddling with these settings can result in an unpredictable or even unstable
- system behavior. As for -rt (group) scheduling, it is assumed that root users
- know what they're doing.
-
-
-1. Overview
-===========
-
- The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
- basically an implementation of the Earliest Deadline First (EDF) scheduling
- algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
- that makes it possible to isolate the behavior of tasks between each other.
-
-
-2. Scheduling algorithm
-==================
-
-2.1 Main algorithm
-------------------
-
- SCHED_DEADLINE [18] uses three parameters, named "runtime", "period", and
- "deadline", to schedule tasks. A SCHED_DEADLINE task should receive
- "runtime" microseconds of execution time every "period" microseconds, and
- these "runtime" microseconds are available within "deadline" microseconds
- from the beginning of the period.  In order to implement this behavior,
- every time the task wakes up, the scheduler computes a "scheduling deadline"
- consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
- scheduled using EDF[1] on these scheduling deadlines (the task with the
- earliest scheduling deadline is selected for execution). Notice that the
- task actually receives "runtime" time units within "deadline" if a proper
- "admission control" strategy (see Section "4. Bandwidth management") is used
- (clearly, if the system is overloaded this guarantee cannot be respected).
-
- Summing up, the CBS[2,3] algorithm assigns scheduling deadlines to tasks so
- that each task runs for at most its runtime every period, avoiding any
- interference between different tasks (bandwidth isolation), while the EDF[1]
- algorithm selects the task with the earliest scheduling deadline as the one
- to be executed next. Thanks to this feature, tasks that do not strictly comply
- with the "traditional" real-time task model (see Section 3) can effectively
- use the new policy.
-
- In more details, the CBS algorithm assigns scheduling deadlines to
- tasks in the following way:
-
-  - Each SCHED_DEADLINE task is characterized by the "runtime",
-    "deadline", and "period" parameters;
-
-  - The state of the task is described by a "scheduling deadline", and
-    a "remaining runtime". These two parameters are initially set to 0;
-
-  - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
-    the scheduler checks if
-
-                 remaining runtime                  runtime
-        ----------------------------------    >    ---------
-        scheduling deadline - current time           period
-
-    then, if the scheduling deadline is smaller than the current time, or
-    this condition is verified, the scheduling deadline and the
-    remaining runtime are re-initialized as
-
-         scheduling deadline = current time + deadline
-         remaining runtime = runtime
-
-    otherwise, the scheduling deadline and the remaining runtime are
-    left unchanged;
-
-  - When a SCHED_DEADLINE task executes for an amount of time t, its
-    remaining runtime is decreased as
-
-         remaining runtime = remaining runtime - t
-
-    (technically, the runtime is decreased at every tick, or when the
-    task is descheduled / preempted);
-
-  - When the remaining runtime becomes less or equal than 0, the task is
-    said to be "throttled" (also known as "depleted" in real-time literature)
-    and cannot be scheduled until its scheduling deadline. The "replenishment
-    time" for this task (see next item) is set to be equal to the current
-    value of the scheduling deadline;
-
-  - When the current time is equal to the replenishment time of a
-    throttled task, the scheduling deadline and the remaining runtime are
-    updated as
-
-         scheduling deadline = scheduling deadline + period
-         remaining runtime = remaining runtime + runtime
-
- The SCHED_FLAG_DL_OVERRUN flag in sched_attr's sched_flags field allows a task
- to get informed about runtime overruns through the delivery of SIGXCPU
- signals.
-
-
-2.2 Bandwidth reclaiming
-------------------------
-
- Bandwidth reclaiming for deadline tasks is based on the GRUB (Greedy
- Reclamation of Unused Bandwidth) algorithm [15, 16, 17] and it is enabled
- when flag SCHED_FLAG_RECLAIM is set.
-
- The following diagram illustrates the state names for tasks handled by GRUB:
-
-                             ------------
-                 (d)        |   Active   |
-              ------------->|            |
-              |             | Contending |
-              |              ------------
-              |                A      |
-          ----------           |      |
-         |          |          |      |
-         | Inactive |          |(b)   | (a)
-         |          |          |      |
-          ----------           |      |
-              A                |      V
-              |              ------------
-              |             |   Active   |
-              --------------|     Non    |
-                 (c)        | Contending |
-                             ------------
-
- A task can be in one of the following states:
-
-  - ActiveContending: if it is ready for execution (or executing);
-
-  - ActiveNonContending: if it just blocked and has not yet surpassed the 0-lag
-    time;
-
-  - Inactive: if it is blocked and has surpassed the 0-lag time.
-
- State transitions:
-
-  (a) When a task blocks, it does not become immediately inactive since its
-      bandwidth cannot be immediately reclaimed without breaking the
-      real-time guarantees. It therefore enters a transitional state called
-      ActiveNonContending. The scheduler arms the "inactive timer" to fire at
-      the 0-lag time, when the task's bandwidth can be reclaimed without
-      breaking the real-time guarantees.
-
-      The 0-lag time for a task entering the ActiveNonContending state is
-      computed as
-
-                        (runtime * dl_period)
-             deadline - ---------------------
-                             dl_runtime
-
-      where runtime is the remaining runtime, while dl_runtime and dl_period
-      are the reservation parameters.
-
-  (b) If the task wakes up before the inactive timer fires, the task re-enters
-      the ActiveContending state and the "inactive timer" is canceled.
-      In addition, if the task wakes up on a different runqueue, then
-      the task's utilization must be removed from the previous runqueue's active
-      utilization and must be added to the new runqueue's active utilization.
-      In order to avoid races between a task waking up on a runqueue while the
-       "inactive timer" is running on a different CPU, the "dl_non_contending"
-      flag is used to indicate that a task is not on a runqueue but is active
-      (so, the flag is set when the task blocks and is cleared when the
-      "inactive timer" fires or when the task  wakes up).
-
-  (c) When the "inactive timer" fires, the task enters the Inactive state and
-      its utilization is removed from the runqueue's active utilization.
-
-  (d) When an inactive task wakes up, it enters the ActiveContending state and
-      its utilization is added to the active utilization of the runqueue where
-      it has been enqueued.
-
- For each runqueue, the algorithm GRUB keeps track of two different bandwidths:
-
-  - Active bandwidth (running_bw): this is the sum of the bandwidths of all
-    tasks in active state (i.e., ActiveContending or ActiveNonContending);
-
-  - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
-    runqueue, including the tasks in Inactive state.
-
-
- The algorithm reclaims the bandwidth of the tasks in Inactive state.
- It does so by decrementing the runtime of the executing task Ti at a pace equal
- to
-
-           dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
-
- where:
-
-  - Ui is the bandwidth of task Ti;
-  - Umax is the maximum reclaimable utilization (subjected to RT throttling
-    limits);
-  - Uinact is the (per runqueue) inactive utilization, computed as
-    (this_bq - running_bw);
-  - Uextra is the (per runqueue) extra reclaimable utilization
-    (subjected to RT throttling limits).
-
-
- Let's now see a trivial example of two deadline tasks with runtime equal
- to 4 and period equal to 8 (i.e., bandwidth equal to 0.5):
-
-     A            Task T1
-     |
-     |                               |
-     |                               |
-     |--------                       |----
-     |       |                       V
-     |---|---|---|---|---|---|---|---|--------->t
-     0   1   2   3   4   5   6   7   8
-
-
-     A            Task T2
-     |
-     |                               |
-     |                               |
-     |       ------------------------|
-     |       |                       V
-     |---|---|---|---|---|---|---|---|--------->t
-     0   1   2   3   4   5   6   7   8
-
-
-     A            running_bw
-     |
-   1 -----------------               ------
-     |               |               |
-  0.5-               -----------------
-     |                               |
-     |---|---|---|---|---|---|---|---|--------->t
-     0   1   2   3   4   5   6   7   8
-
-
-  - Time t = 0:
-
-    Both tasks are ready for execution and therefore in ActiveContending state.
-    Suppose Task T1 is the first task to start execution.
-    Since there are no inactive tasks, its runtime is decreased as dq = -1 dt.
-
-  - Time t = 2:
-
-    Suppose that task T1 blocks
-    Task T1 therefore enters the ActiveNonContending state. Since its remaining
-    runtime is equal to 2, its 0-lag time is equal to t = 4.
-    Task T2 start execution, with runtime still decreased as dq = -1 dt since
-    there are no inactive tasks.
-
-  - Time t = 4:
-
-    This is the 0-lag time for Task T1. Since it didn't woken up in the
-    meantime, it enters the Inactive state. Its bandwidth is removed from
-    running_bw.
-    Task T2 continues its execution. However, its runtime is now decreased as
-    dq = - 0.5 dt because Uinact = 0.5.
-    Task T2 therefore reclaims the bandwidth unused by Task T1.
-
-  - Time t = 8:
-
-    Task T1 wakes up. It enters the ActiveContending state again, and the
-    running_bw is incremented.
-
-
-2.3 Energy-aware scheduling
-------------------------
-
- When cpufreq's schedutil governor is selected, SCHED_DEADLINE implements the
- GRUB-PA [19] algorithm, reducing the CPU operating frequency to the minimum
- value that still allows to meet the deadlines. This behavior is currently
- implemented only for ARM architectures.
-
- A particular care must be taken in case the time needed for changing frequency
- is of the same order of magnitude of the reservation period. In such cases,
- setting a fixed CPU frequency results in a lower amount of deadline misses.
-
-
-3. Scheduling Real-Time Tasks
-=============================
-
- * BIG FAT WARNING ******************************************************
- *
- * This section contains a (not-thorough) summary on classical deadline
- * scheduling theory, and how it applies to SCHED_DEADLINE.
- * The reader can "safely" skip to Section 4 if only interested in seeing
- * how the scheduling policy can be used. Anyway, we strongly recommend
- * to come back here and continue reading (once the urge for testing is
- * satisfied :P) to be sure of fully understanding all technical details.
- ************************************************************************
-
- There are no limitations on what kind of task can exploit this new
- scheduling discipline, even if it must be said that it is particularly
- suited for periodic or sporadic real-time tasks that need guarantees on their
- timing behavior, e.g., multimedia, streaming, control applications, etc.
-
-3.1 Definitions
-------------------------
-
- A typical real-time task is composed of a repetition of computation phases
- (task instances, or jobs) which are activated on a periodic or sporadic
- fashion.
- Each job J_j (where J_j is the j^th job of the task) is characterized by an
- arrival time r_j (the time when the job starts), an amount of computation
- time c_j needed to finish the job, and a job absolute deadline d_j, which
- is the time within which the job should be finished. The maximum execution
- time max{c_j} is called "Worst Case Execution Time" (WCET) for the task.
- A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
- sporadic with minimum inter-arrival time P is r_{j+1} >= r_j + P. Finally,
- d_j = r_j + D, where D is the task's relative deadline.
- Summing up, a real-time task can be described as
-	Task = (WCET, D, P)
-
- The utilization of a real-time task is defined as the ratio between its
- WCET and its period (or minimum inter-arrival time), and represents
- the fraction of CPU time needed to execute the task.
-
- If the total utilization U=sum(WCET_i/P_i) is larger than M (with M equal
- to the number of CPUs), then the scheduler is unable to respect all the
- deadlines.
- Note that total utilization is defined as the sum of the utilizations
- WCET_i/P_i over all the real-time tasks in the system. When considering
- multiple real-time tasks, the parameters of the i-th task are indicated
- with the "_i" suffix.
- Moreover, if the total utilization is larger than M, then we risk starving
- non- real-time tasks by real-time tasks.
- If, instead, the total utilization is smaller than M, then non real-time
- tasks will not be starved and the system might be able to respect all the
- deadlines.
- As a matter of fact, in this case it is possible to provide an upper bound
- for tardiness (defined as the maximum between 0 and the difference
- between the finishing time of a job and its absolute deadline).
- More precisely, it can be proven that using a global EDF scheduler the
- maximum tardiness of each task is smaller or equal than
-	((M − 1) · WCET_max − WCET_min)/(M − (M − 2) · U_max) + WCET_max
- where WCET_max = max{WCET_i} is the maximum WCET, WCET_min=min{WCET_i}
- is the minimum WCET, and U_max = max{WCET_i/P_i} is the maximum
- utilization[12].
-
-3.2 Schedulability Analysis for Uniprocessor Systems
-------------------------
-
- If M=1 (uniprocessor system), or in case of partitioned scheduling (each
- real-time task is statically assigned to one and only one CPU), it is
- possible to formally check if all the deadlines are respected.
- If D_i = P_i for all tasks, then EDF is able to respect all the deadlines
- of all the tasks executing on a CPU if and only if the total utilization
- of the tasks running on such a CPU is smaller or equal than 1.
- If D_i != P_i for some task, then it is possible to define the density of
- a task as WCET_i/min{D_i,P_i}, and EDF is able to respect all the deadlines
- of all the tasks running on a CPU if the sum of the densities of the tasks
- running on such a CPU is smaller or equal than 1:
-	sum(WCET_i / min{D_i, P_i}) <= 1
- It is important to notice that this condition is only sufficient, and not
- necessary: there are task sets that are schedulable, but do not respect the
- condition. For example, consider the task set {Task_1,Task_2} composed by
- Task_1=(50ms,50ms,100ms) and Task_2=(10ms,100ms,100ms).
- EDF is clearly able to schedule the two tasks without missing any deadline
- (Task_1 is scheduled as soon as it is released, and finishes just in time
- to respect its deadline; Task_2 is scheduled immediately after Task_1, hence
- its response time cannot be larger than 50ms + 10ms = 60ms) even if
-	50 / min{50,100} + 10 / min{100, 100} = 50 / 50 + 10 / 100 = 1.1
- Of course it is possible to test the exact schedulability of tasks with
- D_i != P_i (checking a condition that is both sufficient and necessary),
- but this cannot be done by comparing the total utilization or density with
- a constant. Instead, the so called "processor demand" approach can be used,
- computing the total amount of CPU time h(t) needed by all the tasks to
- respect all of their deadlines in a time interval of size t, and comparing
- such a time with the interval size t. If h(t) is smaller than t (that is,
- the amount of time needed by the tasks in a time interval of size t is
- smaller than the size of the interval) for all the possible values of t, then
- EDF is able to schedule the tasks respecting all of their deadlines. Since
- performing this check for all possible values of t is impossible, it has been
- proven[4,5,6] that it is sufficient to perform the test for values of t
- between 0 and a maximum value L. The cited papers contain all of the
- mathematical details and explain how to compute h(t) and L.
- In any case, this kind of analysis is too complex as well as too
- time-consuming to be performed on-line. Hence, as explained in Section
- 4 Linux uses an admission test based on the tasks' utilizations.
-
-3.3 Schedulability Analysis for Multiprocessor Systems
-------------------------
-
- On multiprocessor systems with global EDF scheduling (non partitioned
- systems), a sufficient test for schedulability can not be based on the
- utilizations or densities: it can be shown that even if D_i = P_i task
- sets with utilizations slightly larger than 1 can miss deadlines regardless
- of the number of CPUs.
-
- Consider a set {Task_1,...Task_{M+1}} of M+1 tasks on a system with M
- CPUs, with the first task Task_1=(P,P,P) having period, relative deadline
- and WCET equal to P. The remaining M tasks Task_i=(e,P-1,P-1) have an
- arbitrarily small worst case execution time (indicated as "e" here) and a
- period smaller than the one of the first task. Hence, if all the tasks
- activate at the same time t, global EDF schedules these M tasks first
- (because their absolute deadlines are equal to t + P - 1, hence they are
- smaller than the absolute deadline of Task_1, which is t + P). As a
- result, Task_1 can be scheduled only at time t + e, and will finish at
- time t + e + P, after its absolute deadline. The total utilization of the
- task set is U = M · e / (P - 1) + P / P = M · e / (P - 1) + 1, and for small
- values of e this can become very close to 1. This is known as "Dhall's
- effect"[7]. Note: the example in the original paper by Dhall has been
- slightly simplified here (for example, Dhall more correctly computed
- lim_{e->0}U).
-
- More complex schedulability tests for global EDF have been developed in
- real-time literature[8,9], but they are not based on a simple comparison
- between total utilization (or density) and a fixed constant. If all tasks
- have D_i = P_i, a sufficient schedulability condition can be expressed in
- a simple way:
-	sum(WCET_i / P_i) <= M - (M - 1) · U_max
- where U_max = max{WCET_i / P_i}[10]. Notice that for U_max = 1,
- M - (M - 1) · U_max becomes M - M + 1 = 1 and this schedulability condition
- just confirms the Dhall's effect. A more complete survey of the literature
- about schedulability tests for multi-processor real-time scheduling can be
- found in [11].
-
- As seen, enforcing that the total utilization is smaller than M does not
- guarantee that global EDF schedules the tasks without missing any deadline
- (in other words, global EDF is not an optimal scheduling algorithm). However,
- a total utilization smaller than M is enough to guarantee that non real-time
- tasks are not starved and that the tardiness of real-time tasks has an upper
- bound[12] (as previously noted). Different bounds on the maximum tardiness
- experienced by real-time tasks have been developed in various papers[13,14],
- but the theoretical result that is important for SCHED_DEADLINE is that if
- the total utilization is smaller or equal than M then the response times of
- the tasks are limited.
-
-3.4 Relationship with SCHED_DEADLINE Parameters
-------------------------
-
- Finally, it is important to understand the relationship between the
- SCHED_DEADLINE scheduling parameters described in Section 2 (runtime,
- deadline and period) and the real-time task parameters (WCET, D, P)
- described in this section. Note that the tasks' temporal constraints are
- represented by its absolute deadlines d_j = r_j + D described above, while
- SCHED_DEADLINE schedules the tasks according to scheduling deadlines (see
- Section 2).
- If an admission test is used to guarantee that the scheduling deadlines
- are respected, then SCHED_DEADLINE can be used to schedule real-time tasks
- guaranteeing that all the jobs' deadlines of a task are respected.
- In order to do this, a task must be scheduled by setting:
-
-  - runtime >= WCET
-  - deadline = D
-  - period <= P
-
- IOW, if runtime >= WCET and if period is <= P, then the scheduling deadlines
- and the absolute deadlines (d_j) coincide, so a proper admission control
- allows to respect the jobs' absolute deadlines for this task (this is what is
- called "hard schedulability property" and is an extension of Lemma 1 of [2]).
- Notice that if runtime > deadline the admission control will surely reject
- this task, as it is not possible to respect its temporal constraints.
-
- References:
-  1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogram-
-      ming in a hard-real-time environment. Journal of the Association for
-      Computing Machinery, 20(1), 1973.
-  2 - L. Abeni , G. Buttazzo. Integrating Multimedia Applications in Hard
-      Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
-      Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
-  3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
-      Technical Report. http://disi.unitn.it/~abeni/tr-98-01.pdf
-  4 - J. Y. Leung and M.L. Merril. A Note on Preemptive Scheduling of
-      Periodic, Real-Time Tasks. Information Processing Letters, vol. 11,
-      no. 3, pp. 115-118, 1980.
-  5 - S. K. Baruah, A. K. Mok and L. E. Rosier. Preemptively Scheduling
-      Hard-Real-Time Sporadic Tasks on One Processor. Proceedings of the
-      11th IEEE Real-time Systems Symposium, 1990.
-  6 - S. K. Baruah, L. E. Rosier and R. R. Howell. Algorithms and Complexity
-      Concerning the Preemptive Scheduling of Periodic Real-Time tasks on
-      One Processor. Real-Time Systems Journal, vol. 4, no. 2, pp 301-324,
-      1990.
-  7 - S. J. Dhall and C. L. Liu. On a real-time scheduling problem. Operations
-      research, vol. 26, no. 1, pp 127-140, 1978.
-  8 - T. Baker. Multiprocessor EDF and Deadline Monotonic Schedulability
-      Analysis. Proceedings of the 24th IEEE Real-Time Systems Symposium, 2003.
-  9 - T. Baker. An Analysis of EDF Schedulability on a Multiprocessor.
-      IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8,
-      pp 760-768, 2005.
-  10 - J. Goossens, S. Funk and S. Baruah, Priority-Driven Scheduling of
-       Periodic Task Systems on Multiprocessors. Real-Time Systems Journal,
-       vol. 25, no. 2–3, pp. 187–205, 2003.
-  11 - R. Davis and A. Burns. A Survey of Hard Real-Time Scheduling for
-       Multiprocessor Systems. ACM Computing Surveys, vol. 43, no. 4, 2011.
-       http://www-users.cs.york.ac.uk/~robdavis/papers/MPSurveyv5.0.pdf
-  12 - U. C. Devi and J. H. Anderson. Tardiness Bounds under Global EDF
-       Scheduling on a Multiprocessor. Real-Time Systems Journal, vol. 32,
-       no. 2, pp 133-189, 2008.
-  13 - P. Valente and G. Lipari. An Upper Bound to the Lateness of Soft
-       Real-Time Tasks Scheduled by EDF on Multiprocessors. Proceedings of
-       the 26th IEEE Real-Time Systems Symposium, 2005.
-  14 - J. Erickson, U. Devi and S. Baruah. Improved tardiness bounds for
-       Global EDF. Proceedings of the 22nd Euromicro Conference on
-       Real-Time Systems, 2010.
-  15 - G. Lipari, S. Baruah, Greedy reclamation of unused bandwidth in
-       constant-bandwidth servers, 12th IEEE Euromicro Conference on Real-Time
-       Systems, 2000.
-  16 - L. Abeni, J. Lelli, C. Scordino, L. Palopoli, Greedy CPU reclaiming for
-       SCHED DEADLINE. In Proceedings of the Real-Time Linux Workshop (RTLWS),
-       Dusseldorf, Germany, 2014.
-  17 - L. Abeni, G. Lipari, A. Parri, Y. Sun, Multicore CPU reclaiming: parallel
-       or sequential?. In Proceedings of the 31st Annual ACM Symposium on Applied
-       Computing, 2016.
-  18 - J. Lelli, C. Scordino, L. Abeni, D. Faggioli, Deadline scheduling in the
-       Linux kernel, Software: Practice and Experience, 46(6): 821-839, June
-       2016.
-  19 - C. Scordino, L. Abeni, J. Lelli, Energy-Aware Real-Time Scheduling in
-       the Linux Kernel, 33rd ACM/SIGAPP Symposium On Applied Computing (SAC
-       2018), Pau, France, April 2018.
-
-
-4. Bandwidth management
-=======================
-
- As previously mentioned, in order for -deadline scheduling to be
- effective and useful (that is, to be able to provide "runtime" time units
- within "deadline"), it is important to have some method to keep the allocation
- of the available fractions of CPU time to the various tasks under control.
- This is usually called "admission control" and if it is not performed, then
- no guarantee can be given on the actual scheduling of the -deadline tasks.
-
- As already stated in Section 3, a necessary condition to be respected to
- correctly schedule a set of real-time tasks is that the total utilization
- is smaller than M. When talking about -deadline tasks, this requires that
- the sum of the ratio between runtime and period for all tasks is smaller
- than M. Notice that the ratio runtime/period is equivalent to the utilization
- of a "traditional" real-time task, and is also often referred to as
- "bandwidth".
- The interface used to control the CPU bandwidth that can be allocated
- to -deadline tasks is similar to the one already used for -rt
- tasks with real-time group scheduling (a.k.a. RT-throttling - see
- Documentation/scheduler/sched-rt-group.txt), and is based on readable/
- writable control files located in procfs (for system wide settings).
- Notice that per-group settings (controlled through cgroupfs) are still not
- defined for -deadline tasks, because more discussion is needed in order to
- figure out how we want to manage SCHED_DEADLINE bandwidth at the task group
- level.
-
- A main difference between deadline bandwidth management and RT-throttling
- is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
- and thus we don't need a higher level throttling mechanism to enforce the
- desired bandwidth. In other words, this means that interface parameters are
- only used at admission control time (i.e., when the user calls
- sched_setattr()). Scheduling is then performed considering actual tasks'
- parameters, so that CPU bandwidth is allocated to SCHED_DEADLINE tasks
- respecting their needs in terms of granularity. Therefore, using this simple
- interface we can put a cap on total utilization of -deadline tasks (i.e.,
- \Sum (runtime_i / period_i) < global_dl_utilization_cap).
-
-4.1 System wide settings
-------------------------
-
- The system wide settings are configured under the /proc virtual file system.
-
- For now the -rt knobs are used for -deadline admission control and the
- -deadline runtime is accounted against the -rt runtime. We realize that this
- isn't entirely desirable; however, it is better to have a small interface for
- now, and be able to change it easily later. The ideal situation (see 5.) is to
- run -rt tasks from a -deadline server; in which case the -rt bandwidth is a
- direct subset of dl_bw.
-
- This means that, for a root_domain comprising M CPUs, -deadline tasks
- can be created while the sum of their bandwidths stays below:
-
-   M * (sched_rt_runtime_us / sched_rt_period_us)
-
- It is also possible to disable this bandwidth management logic, and
- be thus free of oversubscribing the system up to any arbitrary level.
- This is done by writing -1 in /proc/sys/kernel/sched_rt_runtime_us.
-
-
-4.2 Task interface
-------------------
-
- Specifying a periodic/sporadic task that executes for a given amount of
- runtime at each instance, and that is scheduled according to the urgency of
- its own timing constraints needs, in general, a way of declaring:
-  - a (maximum/typical) instance execution time,
-  - a minimum interval between consecutive instances,
-  - a time constraint by which each instance must be completed.
-
- Therefore:
-  * a new struct sched_attr, containing all the necessary fields is
-    provided;
-  * the new scheduling related syscalls that manipulate it, i.e.,
-    sched_setattr() and sched_getattr() are implemented.
-
- For debugging purposes, the leftover runtime and absolute deadline of a
- SCHED_DEADLINE task can be retrieved through /proc/<pid>/sched (entries
- dl.runtime and dl.deadline, both values in ns). A programmatic way to
- retrieve these values from production code is under discussion.
-
-
-4.3 Default behavior
----------------------
-
- The default value for SCHED_DEADLINE bandwidth is to have rt_runtime equal to
- 950000. With rt_period equal to 1000000, by default, it means that -deadline
- tasks can use at most 95%, multiplied by the number of CPUs that compose the
- root_domain, for each root_domain.
- This means that non -deadline tasks will receive at least 5% of the CPU time,
- and that -deadline tasks will receive their runtime with a guaranteed
- worst-case delay respect to the "deadline" parameter. If "deadline" = "period"
- and the cpuset mechanism is used to implement partitioned scheduling (see
- Section 5), then this simple setting of the bandwidth management is able to
- deterministically guarantee that -deadline tasks will receive their runtime
- in a period.
-
- Finally, notice that in order not to jeopardize the admission control a
- -deadline task cannot fork.
-
-
-4.4 Behavior of sched_yield()
------------------------------
-
- When a SCHED_DEADLINE task calls sched_yield(), it gives up its
- remaining runtime and is immediately throttled, until the next
- period, when its runtime will be replenished (a special flag
- dl_yielded is set and used to handle correctly throttling and runtime
- replenishment after a call to sched_yield()).
-
- This behavior of sched_yield() allows the task to wake-up exactly at
- the beginning of the next period. Also, this may be useful in the
- future with bandwidth reclaiming mechanisms, where sched_yield() will
- make the leftoever runtime available for reclamation by other
- SCHED_DEADLINE tasks.
-
-
-5. Tasks CPU affinity
-=====================
-
- -deadline tasks cannot have an affinity mask smaller that the entire
- root_domain they are created on. However, affinities can be specified
- through the cpuset facility (Documentation/cgroup-v1/cpusets.txt).
-
-5.1 SCHED_DEADLINE and cpusets HOWTO
-------------------------------------
-
- An example of a simple configuration (pin a -deadline task to CPU0)
- follows (rt-app is used to create a -deadline task).
-
- mkdir /dev/cpuset
- mount -t cgroup -o cpuset cpuset /dev/cpuset
- cd /dev/cpuset
- mkdir cpu0
- echo 0 > cpu0/cpuset.cpus
- echo 0 > cpu0/cpuset.mems
- echo 1 > cpuset.cpu_exclusive
- echo 0 > cpuset.sched_load_balance
- echo 1 > cpu0/cpuset.cpu_exclusive
- echo 1 > cpu0/cpuset.mem_exclusive
- echo $$ > cpu0/tasks
- rt-app -t 100000:10000:d:0 -D5 (it is now actually superfluous to specify
- task affinity)
-
-6. Future plans
-===============
-
- Still missing:
-
-  - programmatic way to retrieve current runtime and absolute deadline
-  - refinements to deadline inheritance, especially regarding the possibility
-    of retaining bandwidth isolation among non-interacting tasks. This is
-    being studied from both theoretical and practical points of view, and
-    hopefully we should be able to produce some demonstrative code soon;
-  - (c)group based bandwidth management, and maybe scheduling;
-  - access control for non-root users (and related security concerns to
-    address), which is the best way to allow unprivileged use of the mechanisms
-    and how to prevent non-root users "cheat" the system?
-
- As already discussed, we are planning also to merge this work with the EDF
- throttling patches [https://lkml.org/lkml/2010/2/23/239] but we still are in
- the preliminary phases of the merge and we really seek feedback that would
- help us decide on the direction it should take.
-
-Appendix A. Test suite
-======================
-
- The SCHED_DEADLINE policy can be easily tested using two applications that
- are part of a wider Linux Scheduler validation suite. The suite is
- available as a GitHub repository: https://github.com/scheduler-tools.
-
- The first testing application is called rt-app and can be used to
- start multiple threads with specific parameters. rt-app supports
- SCHED_{OTHER,FIFO,RR,DEADLINE} scheduling policies and their related
- parameters (e.g., niceness, priority, runtime/deadline/period). rt-app
- is a valuable tool, as it can be used to synthetically recreate certain
- workloads (maybe mimicking real use-cases) and evaluate how the scheduler
- behaves under such workloads. In this way, results are easily reproducible.
- rt-app is available at: https://github.com/scheduler-tools/rt-app.
-
- Thread parameters can be specified from the command line, with something like
- this:
-
-  # rt-app -t 100000:10000:d -t 150000:20000:f:10 -D5
-
- The above creates 2 threads. The first one, scheduled by SCHED_DEADLINE,
- executes for 10ms every 100ms. The second one, scheduled at SCHED_FIFO
- priority 10, executes for 20ms every 150ms. The test will run for a total
- of 5 seconds.
-
- More interestingly, configurations can be described with a json file that
- can be passed as input to rt-app with something like this:
-
-  # rt-app my_config.json
-
- The parameters that can be specified with the second method are a superset
- of the command line options. Please refer to rt-app documentation for more
- details (<rt-app-sources>/doc/*.json).
-
- The second testing application is a modification of schedtool, called
- schedtool-dl, which can be used to setup SCHED_DEADLINE parameters for a
- certain pid/application. schedtool-dl is available at:
- https://github.com/scheduler-tools/schedtool-dl.git.
-
- The usage is straightforward:
-
-  # schedtool -E -t 10000000:100000000 -e ./my_cpuhog_app
-
- With this, my_cpuhog_app is put to run inside a SCHED_DEADLINE reservation
- of 10ms every 100ms (note that parameters are expressed in microseconds).
- You can also use schedtool to create a reservation for an already running
- application, given that you know its pid:
-
-  # schedtool -E -t 10000000:100000000 my_app_pid
-
-Appendix B. Minimal main()
-==========================
-
- We provide in what follows a simple (ugly) self-contained code snippet
- showing how SCHED_DEADLINE reservations can be created by a real-time
- application developer.
-
- #define _GNU_SOURCE
- #include <unistd.h>
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
- #include <time.h>
- #include <linux/unistd.h>
- #include <linux/kernel.h>
- #include <linux/types.h>
- #include <sys/syscall.h>
- #include <pthread.h>
-
- #define gettid() syscall(__NR_gettid)
-
- #define SCHED_DEADLINE	6
-
- /* XXX use the proper syscall numbers */
- #ifdef __x86_64__
- #define __NR_sched_setattr		314
- #define __NR_sched_getattr		315
- #endif
-
- #ifdef __i386__
- #define __NR_sched_setattr		351
- #define __NR_sched_getattr		352
- #endif
-
- #ifdef __arm__
- #define __NR_sched_setattr		380
- #define __NR_sched_getattr		381
- #endif
-
- static volatile int done;
-
- struct sched_attr {
-	__u32 size;
-
-	__u32 sched_policy;
-	__u64 sched_flags;
-
-	/* SCHED_NORMAL, SCHED_BATCH */
-	__s32 sched_nice;
-
-	/* SCHED_FIFO, SCHED_RR */
-	__u32 sched_priority;
-
-	/* SCHED_DEADLINE (nsec) */
-	__u64 sched_runtime;
-	__u64 sched_deadline;
-	__u64 sched_period;
- };
-
- int sched_setattr(pid_t pid,
-		  const struct sched_attr *attr,
-		  unsigned int flags)
- {
-	return syscall(__NR_sched_setattr, pid, attr, flags);
- }
-
- int sched_getattr(pid_t pid,
-		  struct sched_attr *attr,
-		  unsigned int size,
-		  unsigned int flags)
- {
-	return syscall(__NR_sched_getattr, pid, attr, size, flags);
- }
-
- void *run_deadline(void *data)
- {
-	struct sched_attr attr;
-	int x = 0;
-	int ret;
-	unsigned int flags = 0;
-
-	printf("deadline thread started [%ld]\n", gettid());
-
-	attr.size = sizeof(attr);
-	attr.sched_flags = 0;
-	attr.sched_nice = 0;
-	attr.sched_priority = 0;
-
-	/* This creates a 10ms/30ms reservation */
-	attr.sched_policy = SCHED_DEADLINE;
-	attr.sched_runtime = 10 * 1000 * 1000;
-	attr.sched_period = attr.sched_deadline = 30 * 1000 * 1000;
-
-	ret = sched_setattr(0, &attr, flags);
-	if (ret < 0) {
-		done = 0;
-		perror("sched_setattr");
-		exit(-1);
-	}
-
-	while (!done) {
-		x++;
-	}
-
-	printf("deadline thread dies [%ld]\n", gettid());
-	return NULL;
- }
-
- int main (int argc, char **argv)
- {
-	pthread_t thread;
-
-	printf("main thread [%ld]\n", gettid());
-
-	pthread_create(&thread, NULL, run_deadline, NULL);
-
-	sleep(10);
-
-	done = 1;
-	pthread_join(thread, NULL);
-
-	printf("main dies [%ld]\n", gettid());
-	return 0;
- }
diff --git a/Documentation/scheduler/sched-design-CFS.rst b/Documentation/scheduler/sched-design-CFS.rst
new file mode 100644
index 000000000000..82406685365a
--- /dev/null
+++ b/Documentation/scheduler/sched-design-CFS.rst
@@ -0,0 +1,249 @@
+=============
+CFS Scheduler
+=============
+
+
+1.  OVERVIEW
+============
+
+CFS stands for "Completely Fair Scheduler," and is the new "desktop" process
+scheduler implemented by Ingo Molnar and merged in Linux 2.6.23.  It is the
+replacement for the previous vanilla scheduler's SCHED_OTHER interactivity
+code.
+
+80% of CFS's design can be summed up in a single sentence: CFS basically models
+an "ideal, precise multi-tasking CPU" on real hardware.
+
+"Ideal multi-tasking CPU" is a (non-existent  :-)) CPU that has 100% physical
+power and which can run each task at precise equal speed, in parallel, each at
+1/nr_running speed.  For example: if there are 2 tasks running, then it runs
+each at 50% physical power --- i.e., actually in parallel.
+
+On real hardware, we can run only a single task at once, so we have to
+introduce the concept of "virtual runtime."  The virtual runtime of a task
+specifies when its next timeslice would start execution on the ideal
+multi-tasking CPU described above.  In practice, the virtual runtime of a task
+is its actual runtime normalized to the total number of running tasks.
+
+
+
+2.  A FEW IMPLEMENTATION DETAILS
+================================
+
+In CFS the virtual runtime is expressed and tracked via the per-task
+p->se.vruntime (nanosec-unit) value.  This way, it's possible to accurately
+timestamp and measure the "expected CPU time" a task should have gotten.
+
+[ small detail: on "ideal" hardware, at any time all tasks would have the same
+  p->se.vruntime value --- i.e., tasks would execute simultaneously and no task
+  would ever get "out of balance" from the "ideal" share of CPU time.  ]
+
+CFS's task picking logic is based on this p->se.vruntime value and it is thus
+very simple: it always tries to run the task with the smallest p->se.vruntime
+value (i.e., the task which executed least so far).  CFS always tries to split
+up CPU time between runnable tasks as close to "ideal multitasking hardware" as
+possible.
+
+Most of the rest of CFS's design just falls out of this really simple concept,
+with a few add-on embellishments like nice levels, multiprocessing and various
+algorithm variants to recognize sleepers.
+
+
+
+3.  THE RBTREE
+==============
+
+CFS's design is quite radical: it does not use the old data structures for the
+runqueues, but it uses a time-ordered rbtree to build a "timeline" of future
+task execution, and thus has no "array switch" artifacts (by which both the
+previous vanilla scheduler and RSDL/SD are affected).
+
+CFS also maintains the rq->cfs.min_vruntime value, which is a monotonically
+increasing value tracking the smallest vruntime among all tasks in the
+runqueue.  The total amount of work done by the system is tracked using
+min_vruntime; that value is used to place newly activated entities on the left
+side of the tree as much as possible.
+
+The total number of running tasks in the runqueue is accounted through the
+rq->cfs.load value, which is the sum of the weights of the tasks queued on the
+runqueue.
+
+CFS maintains a time-ordered rbtree, where all runnable tasks are sorted by the
+p->se.vruntime key. CFS picks the "leftmost" task from this tree and sticks to it.
+As the system progresses forwards, the executed tasks are put into the tree
+more and more to the right --- slowly but surely giving a chance for every task
+to become the "leftmost task" and thus get on the CPU within a deterministic
+amount of time.
+
+Summing up, CFS works like this: it runs a task a bit, and when the task
+schedules (or a scheduler tick happens) the task's CPU usage is "accounted
+for": the (small) time it just spent using the physical CPU is added to
+p->se.vruntime.  Once p->se.vruntime gets high enough so that another task
+becomes the "leftmost task" of the time-ordered rbtree it maintains (plus a
+small amount of "granularity" distance relative to the leftmost task so that we
+do not over-schedule tasks and thrash the cache), then the new leftmost task is
+picked and the current task is preempted.
+
+
+
+4.  SOME FEATURES OF CFS
+========================
+
+CFS uses nanosecond granularity accounting and does not rely on any jiffies or
+other HZ detail.  Thus the CFS scheduler has no notion of "timeslices" in the
+way the previous scheduler had, and has no heuristics whatsoever.  There is
+only one central tunable (you have to switch on CONFIG_SCHED_DEBUG):
+
+   /proc/sys/kernel/sched_min_granularity_ns
+
+which can be used to tune the scheduler from "desktop" (i.e., low latencies) to
+"server" (i.e., good batching) workloads.  It defaults to a setting suitable
+for desktop workloads.  SCHED_BATCH is handled by the CFS scheduler module too.
+
+Due to its design, the CFS scheduler is not prone to any of the "attacks" that
+exist today against the heuristics of the stock scheduler: fiftyp.c, thud.c,
+chew.c, ring-test.c, massive_intr.c all work fine and do not impact
+interactivity and produce the expected behavior.
+
+The CFS scheduler has a much stronger handling of nice levels and SCHED_BATCH
+than the previous vanilla scheduler: both types of workloads are isolated much
+more aggressively.
+
+SMP load-balancing has been reworked/sanitized: the runqueue-walking
+assumptions are gone from the load-balancing code now, and iterators of the
+scheduling modules are used.  The balancing code got quite a bit simpler as a
+result.
+
+
+
+5.  SCHEDULING POLICIES
+=======================
+
+CFS implements three scheduling policies:
+
+  - SCHED_NORMAL (traditionally called SCHED_OTHER): The scheduling
+    policy that is used for regular tasks.
+
+  - SCHED_BATCH: Does not preempt nearly as often as regular tasks
+    would, thereby allowing tasks to run longer and make better use of
+    caches but at the cost of interactivity. This is well suited for
+    batch jobs.
+
+  - SCHED_IDLE: This is even weaker than nice 19, but it's not a true
+    idle timer scheduler, in order to avoid priority inversion
+    problems which would deadlock the machine.
+
+SCHED_FIFO/_RR are implemented in sched/rt.c and are as specified by
+POSIX.
+
+The command chrt from util-linux-ng 2.13.1.1 can set all of these except
+SCHED_IDLE.
+
+
+
+6.  SCHEDULING CLASSES
+======================
+
+The new CFS scheduler has been designed in such a way as to introduce "Scheduling
+Classes," an extensible hierarchy of scheduler modules.  These modules
+encapsulate scheduling policy details and are handled by the scheduler core
+without the core code assuming too much about them.
+
+sched/fair.c implements the CFS scheduler described above.
+
+sched/rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler way than
+the previous vanilla scheduler did.  It uses 100 runqueues (for all 100 RT
+priority levels, instead of 140 in the previous scheduler) and it needs no
+expired array.
+
+Scheduling classes are implemented through the sched_class structure, which
+contains hooks to functions that must be called whenever an interesting event
+occurs.
+
+This is the (partial) list of the hooks:
+
+ - enqueue_task(...)
+
+   Called when a task enters a runnable state.
+   It puts the scheduling entity (task) into the red-black tree and
+   increments the nr_running variable.
+
+ - dequeue_task(...)
+
+   When a task is no longer runnable, this function is called to keep the
+   corresponding scheduling entity out of the red-black tree.  It decrements
+   the nr_running variable.
+
+ - yield_task(...)
+
+   This function is basically just a dequeue followed by an enqueue, unless the
+   compat_yield sysctl is turned on; in that case, it places the scheduling
+   entity at the right-most end of the red-black tree.
+
+ - check_preempt_curr(...)
+
+   This function checks if a task that entered the runnable state should
+   preempt the currently running task.
+
+ - pick_next_task(...)
+
+   This function chooses the most appropriate task eligible to run next.
+
+ - set_curr_task(...)
+
+   This function is called when a task changes its scheduling class or changes
+   its task group.
+
+ - task_tick(...)
+
+   This function is mostly called from time tick functions; it might lead to
+   a process switch.  This drives the running preemption.
+
+
+
+
+7.  GROUP SCHEDULER EXTENSIONS TO CFS
+=====================================
+
+Normally, the scheduler operates on individual tasks and strives to provide
+fair CPU time to each task.  Sometimes, it may be desirable to group tasks and
+provide fair CPU time to each such task group.  For example, it may be
+desirable to first provide fair CPU time to each user on the system and then to
+each task belonging to a user.
+
+CONFIG_CGROUP_SCHED strives to achieve exactly that.  It lets tasks be
+grouped and divides CPU time fairly among such groups.
+
+CONFIG_RT_GROUP_SCHED permits grouping real-time (i.e., SCHED_FIFO and
+SCHED_RR) tasks.
+
+CONFIG_FAIR_GROUP_SCHED permits grouping CFS (i.e., SCHED_NORMAL and
+SCHED_BATCH) tasks.
+
+   These options need CONFIG_CGROUPS to be defined, and let the administrator
+   create arbitrary groups of tasks, using the "cgroup" pseudo filesystem.  See
+   Documentation/cgroup-v1/cgroups.txt for more information about this filesystem.
+
+When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each
+group created using the pseudo filesystem.  See example steps below to create
+task groups and modify their CPU share using the "cgroups" pseudo filesystem::
+
+	# mount -t tmpfs cgroup_root /sys/fs/cgroup
+	# mkdir /sys/fs/cgroup/cpu
+	# mount -t cgroup -ocpu none /sys/fs/cgroup/cpu
+	# cd /sys/fs/cgroup/cpu
+
+	# mkdir multimedia	# create "multimedia" group of tasks
+	# mkdir browser		# create "browser" group of tasks
+
+	# #Configure the multimedia group to receive twice the CPU bandwidth
+	# #that of browser group
+
+	# echo 2048 > multimedia/cpu.shares
+	# echo 1024 > browser/cpu.shares
+
+	# firefox &	# Launch firefox and move it to "browser" group
+	# echo <firefox_pid> > browser/tasks
+
+	# #Launch gmplayer (or your favourite movie player)
+	# echo <movie_player_pid> > multimedia/tasks
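Reviewer's sketch (not part of the patch): the pick-leftmost and vruntime
accounting loop described in sections 2 and 3 of the new sched-design-CFS.rst
can be modelled in a few lines of Python. This is a toy under stated
assumptions, not kernel code. A min-heap stands in for the time-ordered
rbtree; `Task`, `simulate` and the `NICE_0_WEIGHT` constant of 1024 are
hypothetical names; and scaling the vruntime increment by 1024/weight is an
assumed way of modelling weights such as the cpu.shares values in the example
above.

```python
import heapq
from dataclasses import dataclass, field

NICE_0_WEIGHT = 1024  # assumed reference weight for this toy model


@dataclass(order=True)
class Task:
    vruntime: float                                   # the only sort key
    name: str = field(compare=False)
    weight: int = field(compare=False, default=NICE_0_WEIGHT)
    cpu_time: int = field(compare=False, default=0)   # real time received, ns


def simulate(tasks, slice_ns, steps):
    """Run `steps` slices, always picking the task with the smallest vruntime."""
    queue = list(tasks)
    heapq.heapify(queue)  # min-heap stands in for the time-ordered rbtree
    for _ in range(steps):
        current = heapq.heappop(queue)  # the "leftmost" task
        current.cpu_time += slice_ns
        # Assumption: heavier tasks accumulate vruntime more slowly,
        # so they become "leftmost" again sooner.
        current.vruntime += slice_ns * NICE_0_WEIGHT / current.weight
        heapq.heappush(queue, current)  # re-insert further to the right
    return {t.name: t.cpu_time for t in queue}


if __name__ == "__main__":
    shares = simulate(
        [Task(0.0, "multimedia", weight=2048), Task(0.0, "browser", weight=1024)],
        slice_ns=1_000_000,
        steps=300,
    )
    print(shares)
```

With the weights from the cpu.shares example, the toy model hands the
"multimedia" task two slices for every one the "browser" task receives. The
real scheduler also applies a preemption granularity and handles sleepers,
both of which this sketch leaves out.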
diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt
deleted file mode 100644
index edd861c94c1b..000000000000
--- a/Documentation/scheduler/sched-design-CFS.txt
+++ /dev/null
@@ -1,242 +0,0 @@
-                      =============
-                      CFS Scheduler
-                      =============
-
-
-1.  OVERVIEW
-
-CFS stands for "Completely Fair Scheduler," and is the new "desktop" process
-scheduler implemented by Ingo Molnar and merged in Linux 2.6.23.  It is the
-replacement for the previous vanilla scheduler's SCHED_OTHER interactivity
-code.
-
-80% of CFS's design can be summed up in a single sentence: CFS basically models
-an "ideal, precise multi-tasking CPU" on real hardware.
-
-"Ideal multi-tasking CPU" is a (non-existent  :-)) CPU that has 100% physical
-power and which can run each task at precise equal speed, in parallel, each at
-1/nr_running speed.  For example: if there are 2 tasks running, then it runs
-each at 50% physical power --- i.e., actually in parallel.
-
-On real hardware, we can run only a single task at once, so we have to
-introduce the concept of "virtual runtime."  The virtual runtime of a task
-specifies when its next timeslice would start execution on the ideal
-multi-tasking CPU described above.  In practice, the virtual runtime of a task
-is its actual runtime normalized to the total number of running tasks.
-
-
-
-2.  FEW IMPLEMENTATION DETAILS
-
-In CFS the virtual runtime is expressed and tracked via the per-task
-p->se.vruntime (nanosec-unit) value.  This way, it's possible to accurately
-timestamp and measure the "expected CPU time" a task should have gotten.
-
-[ small detail: on "ideal" hardware, at any time all tasks would have the same
-  p->se.vruntime value --- i.e., tasks would execute simultaneously and no task
-  would ever get "out of balance" from the "ideal" share of CPU time.  ]
-
-CFS's task picking logic is based on this p->se.vruntime value and it is thus
-very simple: it always tries to run the task with the smallest p->se.vruntime
-value (i.e., the task which executed least so far).  CFS always tries to split
-up CPU time between runnable tasks as close to "ideal multitasking hardware" as
-possible.
-
-Most of the rest of CFS's design just falls out of this really simple concept,
-with a few add-on embellishments like nice levels, multiprocessing and various
-algorithm variants to recognize sleepers.
-
-
-
-3.  THE RBTREE
-
-CFS's design is quite radical: it does not use the old data structures for the
-runqueues, but it uses a time-ordered rbtree to build a "timeline" of future
-task execution, and thus has no "array switch" artifacts (by which both the
-previous vanilla scheduler and RSDL/SD are affected).
-
-CFS also maintains the rq->cfs.min_vruntime value, which is a monotonic
-increasing value tracking the smallest vruntime among all tasks in the
-runqueue.  The total amount of work done by the system is tracked using
-min_vruntime; that value is used to place newly activated entities on the left
-side of the tree as much as possible.
-
-The total number of running tasks in the runqueue is accounted through the
-rq->cfs.load value, which is the sum of the weights of the tasks queued on the
-runqueue.
-
-CFS maintains a time-ordered rbtree, where all runnable tasks are sorted by the
-p->se.vruntime key. CFS picks the "leftmost" task from this tree and sticks to it.
-As the system progresses forwards, the executed tasks are put into the tree
-more and more to the right --- slowly but surely giving a chance for every task
-to become the "leftmost task" and thus get on the CPU within a deterministic
-amount of time.
-
-Summing up, CFS works like this: it runs a task a bit, and when the task
-schedules (or a scheduler tick happens) the task's CPU usage is "accounted
-for": the (small) time it just spent using the physical CPU is added to
-p->se.vruntime.  Once p->se.vruntime gets high enough so that another task
-becomes the "leftmost task" of the time-ordered rbtree it maintains (plus a
-small amount of "granularity" distance relative to the leftmost task so that we
-do not over-schedule tasks and trash the cache), then the new leftmost task is
-picked and the current task is preempted.
-
-
-
-4.  SOME FEATURES OF CFS
-
-CFS uses nanosecond granularity accounting and does not rely on any jiffies or
-other HZ detail.  Thus the CFS scheduler has no notion of "timeslices" in the
-way the previous scheduler had, and has no heuristics whatsoever.  There is
-only one central tunable (you have to switch on CONFIG_SCHED_DEBUG):
-
-   /proc/sys/kernel/sched_min_granularity_ns
-
-which can be used to tune the scheduler from "desktop" (i.e., low latencies) to
-"server" (i.e., good batching) workloads.  It defaults to a setting suitable
-for desktop workloads.  SCHED_BATCH is handled by the CFS scheduler module too.
-
-Due to its design, the CFS scheduler is not prone to any of the "attacks" that
-exist today against the heuristics of the stock scheduler: fiftyp.c, thud.c,
-chew.c, ring-test.c, massive_intr.c all work fine and do not impact
-interactivity and produce the expected behavior.
-
-The CFS scheduler has a much stronger handling of nice levels and SCHED_BATCH
-than the previous vanilla scheduler: both types of workloads are isolated much
-more aggressively.
-
-SMP load-balancing has been reworked/sanitized: the runqueue-walking
-assumptions are gone from the load-balancing code now, and iterators of the
-scheduling modules are used.  The balancing code got quite a bit simpler as a
-result.
-
-
-
-5. Scheduling policies
-
-CFS implements three scheduling policies:
-
-  - SCHED_NORMAL (traditionally called SCHED_OTHER): The scheduling
-    policy that is used for regular tasks.
-
-  - SCHED_BATCH: Does not preempt nearly as often as regular tasks
-    would, thereby allowing tasks to run longer and make better use of
-    caches but at the cost of interactivity. This is well suited for
-    batch jobs.
-
-  - SCHED_IDLE: This is even weaker than nice 19, but its not a true
-    idle timer scheduler in order to avoid to get into priority
-    inversion problems which would deadlock the machine.
-
-SCHED_FIFO/_RR are implemented in sched/rt.c and are as specified by
-POSIX.
-
-The command chrt from util-linux-ng 2.13.1.1 can set all of these except
-SCHED_IDLE.
-
-
-
-6.  SCHEDULING CLASSES
-
-The new CFS scheduler has been designed in such a way to introduce "Scheduling
-Classes," an extensible hierarchy of scheduler modules.  These modules
-encapsulate scheduling policy details and are handled by the scheduler core
-without the core code assuming too much about them.
-
-sched/fair.c implements the CFS scheduler described above.
-
-sched/rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler way than
-the previous vanilla scheduler did.  It uses 100 runqueues (for all 100 RT
-priority levels, instead of 140 in the previous scheduler) and it needs no
-expired array.
-
-Scheduling classes are implemented through the sched_class structure, which
-contains hooks to functions that must be called whenever an interesting event
-occurs.
-
-This is the (partial) list of the hooks:
-
- - enqueue_task(...)
-
-   Called when a task enters a runnable state.
-   It puts the scheduling entity (task) into the red-black tree and
-   increments the nr_running variable.
-
- - dequeue_task(...)
-
-   When a task is no longer runnable, this function is called to keep the
-   corresponding scheduling entity out of the red-black tree.  It decrements
-   the nr_running variable.
-
- - yield_task(...)
-
-   This function is basically just a dequeue followed by an enqueue, unless the
-   compat_yield sysctl is turned on; in that case, it places the scheduling
-   entity at the right-most end of the red-black tree.
-
- - check_preempt_curr(...)
-
-   This function checks if a task that entered the runnable state should
-   preempt the currently running task.
-
- - pick_next_task(...)
-
-   This function chooses the most appropriate task eligible to run next.
-
- - set_curr_task(...)
-
-   This function is called when a task changes its scheduling class or changes
-   its task group.
-
- - task_tick(...)
-
-   This function is mostly called from time tick functions; it might lead to
-   process switch.  This drives the running preemption.
-
-
-
-
-7.  GROUP SCHEDULER EXTENSIONS TO CFS
-
-Normally, the scheduler operates on individual tasks and strives to provide
-fair CPU time to each task.  Sometimes, it may be desirable to group tasks and
-provide fair CPU time to each such task group.  For example, it may be
-desirable to first provide fair CPU time to each user on the system and then to
-each task belonging to a user.
-
-CONFIG_CGROUP_SCHED strives to achieve exactly that.  It lets tasks to be
-grouped and divides CPU time fairly among such groups.
-
-CONFIG_RT_GROUP_SCHED permits to group real-time (i.e., SCHED_FIFO and
-SCHED_RR) tasks.
-
-CONFIG_FAIR_GROUP_SCHED permits to group CFS (i.e., SCHED_NORMAL and
-SCHED_BATCH) tasks.
-
-   These options need CONFIG_CGROUPS to be defined, and let the administrator
-   create arbitrary groups of tasks, using the "cgroup" pseudo filesystem.  See
-   Documentation/cgroup-v1/cgroups.txt for more information about this filesystem.
-
-When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each
-group created using the pseudo filesystem.  See example steps below to create
-task groups and modify their CPU share using the "cgroups" pseudo filesystem.
-
-	# mount -t tmpfs cgroup_root /sys/fs/cgroup
-	# mkdir /sys/fs/cgroup/cpu
-	# mount -t cgroup -ocpu none /sys/fs/cgroup/cpu
-	# cd /sys/fs/cgroup/cpu
-
-	# mkdir multimedia	# create "multimedia" group of tasks
-	# mkdir browser		# create "browser" group of tasks
-
-	# #Configure the multimedia group to receive twice the CPU bandwidth
-	# #that of browser group
-
-	# echo 2048 > multimedia/cpu.shares
-	# echo 1024 > browser/cpu.shares
-
-	# firefox &	# Launch firefox and move it to "browser" group
-	# echo <firefox_pid> > browser/tasks
-
-	# #Launch gmplayer (or your favourite movie player)
-	# echo <movie_player_pid> > multimedia/tasks
diff --git a/Documentation/scheduler/sched-domains.rst b/Documentation/scheduler/sched-domains.rst
new file mode 100644
index 000000000000..f7504226f445
--- /dev/null
+++ b/Documentation/scheduler/sched-domains.rst
@@ -0,0 +1,83 @@
+=================
+Scheduler Domains
+=================
+
+Each CPU has a "base" scheduling domain (struct sched_domain). The domain
+hierarchy is built from these base domains via the ->parent pointer. ->parent
+MUST be NULL terminated, and domain structures should be per-CPU as they are
+locklessly updated.
+
+Each scheduling domain spans a number of CPUs (stored in the ->span field).
+A domain's span MUST be a superset of its child's span (this restriction could
+be relaxed if the need arises), and a base domain for CPU i MUST span at least
+i. The top domain for each CPU will generally span all CPUs in the system
+although strictly it doesn't have to, but this could lead to a case where some
+CPUs will never be given tasks to run unless the CPUs allowed mask is
+explicitly set. A sched domain's span means "balance process load among these
+CPUs".
+
+Each scheduling domain must have one or more CPU groups (struct sched_group)
+which are organised as a circular one way linked list from the ->groups
+pointer. The union of cpumasks of these groups MUST be the same as the
+domain's span. The intersection of cpumasks from any two of these groups
+MUST be the empty set. The group pointed to by the ->groups pointer MUST
+contain the CPU to which the domain belongs. Groups may be shared among
+CPUs as they contain read only data after they have been set up.
+
+Balancing within a sched domain occurs between groups. That is, each group
+is treated as one entity. The load of a group is defined as the sum of the
+load of each of its member CPUs, and only when the load of a group becomes
+out of balance are tasks moved between groups.
+
+In kernel/sched/core.c, trigger_load_balance() is run periodically on each CPU
+through scheduler_tick(). It raises a softirq after the next regularly scheduled
+rebalancing event for the current runqueue has arrived. The actual load
+balancing workhorse, run_rebalance_domains()->rebalance_domains(), is then run
+in softirq context (SCHED_SOFTIRQ).
+
+The latter function takes two arguments: the current CPU and whether it was idle
+at the time the scheduler_tick() happened. It iterates over all sched domains
+our CPU is on, starting from its base domain and going up the ->parent chain.
+While doing that, it checks to see if the current domain has exhausted its
+rebalance interval. If so, it runs load_balance() on that domain. It then checks
+the parent sched_domain (if it exists), and the parent of the parent and so
+forth.
+
+Initially, load_balance() finds the busiest group in the current sched domain.
+If it succeeds, it looks for the busiest runqueue of all the CPUs' runqueues in
+that group. If it manages to find such a runqueue, it locks both our initial
+CPU's runqueue and the newly found busiest one and starts moving tasks from it
+to our runqueue. The exact number of tasks amounts to an imbalance previously
+computed while iterating over this sched domain's groups.
+
+Implementing sched domains
+==========================
+
+The "base" domain will "span" the first level of the hierarchy. In the case
+of SMT, you'll span all siblings of the physical CPU, with each group being
+a single virtual CPU.
+
+In SMP, the parent of the base domain will span all physical CPUs in the
+node, with each group being a single physical CPU. Then with NUMA, the parent
+of the SMP domain will span the entire machine, with each group having the
+cpumask of a node. Or you could do multi-level NUMA; Opteron, for example,
+might have just one domain covering its one NUMA level.
+
+The implementor should read comments in include/linux/sched.h:
+struct sched_domain fields, SD_FLAG_*, SD_*_INIT to get an idea of
+the specifics and what to tune.
+
+Architectures may override the default SD_*_INIT flags
+while using the generic domain builder in kernel/sched/core.c if they wish to
+retain the traditional SMT->SMP->NUMA topology (or some subset of that). This
+can be done by #define'ing ARCH_HAS_SCHED_TUNE.
+
+Alternatively, the architecture may completely override the generic domain
+builder by #define'ing ARCH_HAS_SCHED_DOMAIN, and exporting its
+arch_init_sched_domains function. This function will attach domains to all
+CPUs using cpu_attach_domain.
+
+The sched-domains debugging infrastructure can be enabled by setting
+CONFIG_SCHED_DEBUG. This enables an error checking parse of the sched domains
+which should catch most possible errors (described above). It also prints out
+the domain structure in a visual format.
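Reviewer's sketch (not part of the patch): the MUST rules on spans and groups
stated in the new sched-domains.rst can be written down as a small checker.
This is an illustration only and does not reproduce the CONFIG_SCHED_DEBUG
code. `Domain` and `check_domain` are hypothetical names, Python sets stand in
for cpumasks, and a plain list stands in for the circular group list, with
`groups[0]` playing the role of the group that ->groups points to.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Domain:
    cpu: int                            # CPU that owns this per-CPU domain
    span: Set[int]                      # CPUs covered by the domain
    groups: List[Set[int]]              # groups[0] models the ->groups target
    parent: Optional["Domain"] = None   # None models the NULL terminator


def check_domain(base: Domain) -> List[str]:
    """Walk the ->parent chain and collect every violated MUST rule."""
    errors = []
    if base.cpu not in base.span:
        errors.append("base domain does not span its own CPU")
    child = None
    dom = base
    while dom is not None:  # assumes the chain really ends in None
        if not dom.groups:
            errors.append("domain has no groups")
        else:
            union = set().union(*dom.groups)
            if dom.cpu not in dom.groups[0]:
                errors.append("first group does not contain the domain's CPU")
            if union != dom.span:
                errors.append("union of groups differs from span")
            if sum(len(g) for g in dom.groups) != len(union):
                errors.append("groups overlap")
        if child is not None and not dom.span >= child.span:
            errors.append("span is not a superset of the child's span")
        child, dom = dom, dom.parent
    return errors


if __name__ == "__main__":
    # SMT base domain for CPU 0 with an SMP parent covering four CPUs.
    smp = Domain(0, {0, 1, 2, 3}, [{0, 1}, {2, 3}])
    smt = Domain(0, {0, 1}, [{0}, {1}], parent=smp)
    print(check_domain(smt))
```

The example models the SMT->SMP layout from "Implementing sched domains": the
base domain's groups are single virtual CPUs, and the parent's groups are the
sibling pairs. An empty list means none of the rules were violated.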
diff --git a/Documentation/scheduler/sched-domains.txt b/Documentation/scheduler/sched-domains.txt
deleted file mode 100644
index 4af80b1c05aa..000000000000
--- a/Documentation/scheduler/sched-domains.txt
+++ /dev/null
@@ -1,77 +0,0 @@
-Each CPU has a "base" scheduling domain (struct sched_domain). The domain
-hierarchy is built from these base domains via the ->parent pointer. ->parent
-MUST be NULL terminated, and domain structures should be per-CPU as they are
-locklessly updated.
-
-Each scheduling domain spans a number of CPUs (stored in the ->span field).
-A domain's span MUST be a superset of it child's span (this restriction could
-be relaxed if the need arises), and a base domain for CPU i MUST span at least
-i. The top domain for each CPU will generally span all CPUs in the system
-although strictly it doesn't have to, but this could lead to a case where some
-CPUs will never be given tasks to run unless the CPUs allowed mask is
-explicitly set. A sched domain's span means "balance process load among these
-CPUs".
-
-Each scheduling domain must have one or more CPU groups (struct sched_group)
-which are organised as a circular one way linked list from the ->groups
-pointer. The union of cpumasks of these groups MUST be the same as the
-domain's span. The intersection of cpumasks from any two of these groups
-MUST be the empty set. The group pointed to by the ->groups pointer MUST
-contain the CPU to which the domain belongs. Groups may be shared among
-CPUs as they contain read only data after they have been set up.
-
-Balancing within a sched domain occurs between groups. That is, each group
-is treated as one entity. The load of a group is defined as the sum of the
-load of each of its member CPUs, and only when the load of a group becomes
-out of balance are tasks moved between groups.
-
-In kernel/sched/core.c, trigger_load_balance() is run periodically on each CPU
-through scheduler_tick(). It raises a softirq after the next regularly scheduled
-rebalancing event for the current runqueue has arrived. The actual load
-balancing workhorse, run_rebalance_domains()->rebalance_domains(), is then run
-in softirq context (SCHED_SOFTIRQ).
-
-The latter function takes two arguments: the current CPU and whether it was idle
-at the time the scheduler_tick() happened and iterates over all sched domains
-our CPU is on, starting from its base domain and going up the ->parent chain.
-While doing that, it checks to see if the current domain has exhausted its
-rebalance interval. If so, it runs load_balance() on that domain. It then checks
-the parent sched_domain (if it exists), and the parent of the parent and so
-forth.
-
-Initially, load_balance() finds the busiest group in the current sched domain.
-If it succeeds, it looks for the busiest runqueue of all the CPUs' runqueues in
-that group. If it manages to find such a runqueue, it locks both our initial
-CPU's runqueue and the newly found busiest one and starts moving tasks from it
-to our runqueue. The exact number of tasks amounts to an imbalance previously
-computed while iterating over this sched domain's groups.
-
-*** Implementing sched domains ***
-The "base" domain will "span" the first level of the hierarchy. In the case
-of SMT, you'll span all siblings of the physical CPU, with each group being
-a single virtual CPU.
-
-In SMP, the parent of the base domain will span all physical CPUs in the
-node. Each group being a single physical CPU. Then with NUMA, the parent
-of the SMP domain will span the entire machine, with each group having the
-cpumask of a node. Or, you could do multi-level NUMA or Opteron, for example,
-might have just one domain covering its one NUMA level.
-
-The implementor should read comments in include/linux/sched.h:
-struct sched_domain fields, SD_FLAG_*, SD_*_INIT to get an idea of
-the specifics and what to tune.
-
-Architectures may retain the regular override the default SD_*_INIT flags
-while using the generic domain builder in kernel/sched/core.c if they wish to
-retain the traditional SMT->SMP->NUMA topology (or some subset of that). This
-can be done by #define'ing ARCH_HASH_SCHED_TUNE.
-
-Alternatively, the architecture may completely override the generic domain
-builder by #define'ing ARCH_HASH_SCHED_DOMAIN, and exporting your
-arch_init_sched_domains function. This function will attach domains to all
-CPUs using cpu_attach_domain.
-
-The sched-domains debugging infrastructure can be enabled by enabling
-CONFIG_SCHED_DEBUG. This enables an error checking parse of the sched domains
-which should catch most possible errors (described above). It also prints out
-the domain structure in a visual format.
diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst
new file mode 100644
index 000000000000..fce5858c9082
--- /dev/null
+++ b/Documentation/scheduler/sched-energy.rst
@@ -0,0 +1,430 @@
+=======================
+Energy Aware Scheduling
+=======================
+
+1. Introduction
+---------------
+
+Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
+the impact of its decisions on the energy consumed by CPUs. EAS relies on an
+Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
+with a minimal impact on throughput. This document aims to provide an
+introduction to how EAS works, what the main design decisions behind it are, and
+what is needed to get it to run.
+
+Before going any further, please note that at the time of writing::
+
+   /!\ EAS does not support platforms with symmetric CPU topologies /!\
+
+EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
+because this is where the potential for saving energy through scheduling is
+the highest.
+
+The actual EM used by EAS is _not_ maintained by the scheduler, but by a
+dedicated framework. For details about this framework and what it provides,
+please refer to its documentation (see Documentation/power/energy-model.txt).
+
+
+2. Background and Terminology
+-----------------------------
+
+To make it clear from the start:
+ - energy = [joule] (resource like a battery on powered devices)
+ - power = energy/time = [joule/second] = [watt]
+
+The goal of EAS is to minimize energy, while still getting the job done. That
+is, we want to maximize::
+
+	performance [inst/s]
+	--------------------
+	    power [W]
+
+which is equivalent to minimizing::
+
+	energy [J]
+	-----------
+	instruction
+
+while still getting 'good' performance. It is essentially an alternative
+optimization objective to the current performance-only objective for the
+scheduler. This alternative considers two objectives: energy-efficiency and
+performance.
+
+The idea behind introducing an EM is to allow the scheduler to evaluate the
+implications of its decisions rather than blindly applying energy-saving
+techniques that may have positive effects only on some platforms. At the same
+time, the EM must be as simple as possible to minimize the scheduler latency
+impact.
+
+In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
+for the scheduler to decide where a task should run (during wake-up), the EM
+is used to break the tie between several good CPU candidates and pick the one
+that is predicted to yield the best energy consumption without harming the
+system's throughput. The predictions made by EAS rely on specific elements of
+knowledge about the platform's topology, which include the 'capacity' of CPUs,
+and their respective energy costs.
+
+
+3. Topology information
+-----------------------
+
+EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
+differentiate CPUs with different computing throughput. The 'capacity' of a CPU
+represents the amount of work it can absorb when running at its highest
+frequency compared to the most capable CPU of the system. Capacity values are
+normalized in a 1024 range, and are comparable with the utilization signals of
+tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
+to capacity and utilization values, EAS is able to estimate how big/busy a
+task/CPU is, and to take this into consideration when evaluating performance vs
+energy trade-offs. The capacity of CPUs is provided via arch-specific code
+through the arch_scale_cpu_capacity() callback.
+
+The rest of platform knowledge used by EAS is directly read from the Energy
+Model (EM) framework. The EM of a platform is composed of a power cost table
+per 'performance domain' in the system (see Documentation/power/energy-model.txt
+for further details about performance domains).
+
+The scheduler manages references to the EM objects in the topology code when the
+scheduling domains are built, or re-built. For each root domain (rd), the
+scheduler maintains a singly linked list of all performance domains intersecting
+the current rd->span. Each node in the list contains a pointer to a struct
+em_perf_domain as provided by the EM framework.
+
+The lists are attached to the root domains in order to cope with exclusive
+cpuset configurations. Since the boundaries of exclusive cpusets do not
+necessarily match those of performance domains, the lists of different root
+domains can contain duplicate elements.
+
+Example 1.
+    Let us consider a platform with 12 CPUs, split in 3 performance domains
+    (pd0, pd4 and pd8), organized as follows::
+
+	          CPUs:   0 1 2 3 4 5 6 7 8 9 10 11
+	          PDs:   |--pd0--|--pd4--|---pd8---|
+	          RDs:   |----rd1----|-----rd2-----|
+
+    Now, consider that userspace decided to split the system with two
+    exclusive cpusets, hence creating two independent root domains, each
+    containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
+    above figure. Since pd4 intersects with both rd1 and rd2, it will be
+    present in the linked list '->pd' attached to each of them:
+
+       * rd1->pd: pd0 -> pd4
+       * rd2->pd: pd4 -> pd8
+
+    Please note that the scheduler will create two duplicate list nodes for
+    pd4 (one for each list). However, both just hold a pointer to the same
+    shared data structure of the EM framework.
+
+Since the access to these lists can happen concurrently with hotplug and other
+things, they are protected by RCU, like the rest of the topology structures
+manipulated by the scheduler.
+
+EAS also maintains a static key (sched_energy_present) which is enabled when at
+least one root domain meets all conditions for EAS to start. Those conditions
+are summarized in Section 6.
+
+
+4. Energy-Aware task placement
+------------------------------
+
+EAS overrides the CFS task wake-up balancing code. It uses the EM of the
+platform and the PELT signals to choose an energy-efficient target CPU during
+wake-up balance. When EAS is enabled, select_task_rq_fair() calls
+find_energy_efficient_cpu() to make the placement decision. This function looks
+for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
+each performance domain since it is the one which will allow us to keep the
+frequency the lowest. Then, the function checks if placing the task there could
+save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
+in its previous activation.
+
+find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
+energy consumed by the system if the waking task was migrated. compute_energy()
+looks at the current utilization landscape of the CPUs and adjusts it to
+'simulate' the task migration. The EM framework provides the em_pd_energy() API
+which computes the expected energy consumption of each performance domain for
+the given utilization landscape.
+
+An example of energy-optimized task placement decision is detailed below.
+
+Example 2.
+    Let us consider a (fake) platform with 2 independent performance domains
+    composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
+    are big.
+
+    The scheduler must decide where to place a task P whose util_avg = 200
+    and prev_cpu = 0.
+
+    The current utilization landscape of the CPUs is depicted on the graph
+    below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively.
+    Each performance domain has three Operating Performance Points (OPPs).
+    The CPU capacity and power cost associated with each OPP is listed in
+    the Energy Model table. The util_avg of P is shown on the figures
+    below as 'PP'::
+
+     CPU util.
+      1024                 - - - - - - -              Energy Model
+                                               +-----------+-------------+
+                                               |  Little   |     Big     |
+       768                 =============       +-----+-----+------+------+
+                                               | Cap | Pwr | Cap  | Pwr  |
+                                               +-----+-----+------+------+
+       512  ===========    - ##- - - - -       | 170 | 50  | 512  | 400  |
+                             ##     ##         | 341 | 150 | 768  | 800  |
+       341  -PP - - - -      ##     ##         | 512 | 300 | 1024 | 1700 |
+             PP              ##     ##         +-----+-----+------+------+
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+      Current OPP: =====       Other OPP: - - -     util_avg (100 each): ##
+
+
+    find_energy_efficient_cpu() will first look for the CPUs with the
+    maximum spare capacity in the two performance domains. In this example,
+    CPU1 and CPU3. Then it will estimate the energy of the system if P was
+    placed on either of them, and check if that would save some energy
+    compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
+    (which is coherent with the behaviour of the schedutil CPUFreq
+    governor, see Section 6 for more details on this topic).
+
+    **Case 1. P is migrated to CPU1**::
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 300 / 341 * 150 = 131
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1364
+       341  ===========      ##     ##
+                    PP       ##     ##
+       170  -## - - PP-      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    **Case 2. P is migrated to CPU3**::
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 200 / 341 * 150 = 88
+                                             * CPU1: 100 / 341 * 150 = 43
+                                    PP       * CPU2: 600 / 768 * 800 = 625
+       512  - - - - - -    - ##- - -PP -     * CPU3: 700 / 768 * 800 = 729
+                             ##     ##          => total_energy = 1485
+       341  ===========      ##     ##
+                             ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    **Case 3. P stays on prev_cpu / CPU 0**::
+
+      1024                 - - - - - - -
+
+                                            Energy calculation:
+       768                 =============     * CPU0: 400 / 512 * 300 = 234
+                                             * CPU1: 100 / 512 * 300 = 58
+                                             * CPU2: 600 / 768 * 800 = 625
+       512  ===========    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
+                             ##     ##          => total_energy = 1437
+       341  -PP - - - -      ##     ##
+             PP              ##     ##
+       170  -## - - - -      ##     ##
+             ##     ##       ##     ##
+           ------------    -------------
+            CPU0   CPU1     CPU2   CPU3
+
+
+    From these calculations, Case 1 has the lowest total energy. So CPU1 is
+    the best candidate from an energy-efficiency standpoint.
+
+Big CPUs are generally more power hungry than the little ones and are thus used
+mainly when a task doesn't fit the littles. However, little CPUs aren't always
+necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
+of the little CPUs can be less energy-efficient than the lowest OPPs of the
+bigs, for example. So, if the little CPUs happen to have enough utilization at
+a specific point in time, a small task waking up at that moment could be better
+off executing on the big side in order to save energy, even though it would fit
+on the little side.
+
+And even in the case where all OPPs of the big CPUs are less energy-efficient
+than those of the little, using the big CPUs for a small task might still, under
+specific conditions, save energy. Indeed, placing a task on a little CPU can
+result in raising the OPP of the entire performance domain, and that will
+increase the cost of the tasks already running there. If the waking task is
+placed on a big CPU, its own execution cost might be higher than if it was
+running on a little, but it won't impact the other tasks of the little CPUs
+which will keep running at a lower OPP. So, when considering the total energy
+consumed by CPUs, the extra cost of running that one task on a big core can be
+smaller than the cost of raising the OPP on the little CPUs for all the other
+tasks.
+
+The examples above would be nearly impossible to get right in a generic way, and
+for all platforms, without knowing the cost of running at different OPPs on all
+CPUs of the system. Thanks to its EM-based design, EAS should cope with them
+correctly without too much trouble. However, in order to ensure a minimal
+impact on throughput for high-utilization scenarios, EAS also implements another
+mechanism called 'over-utilization'.
+
+
+5. Over-utilization
+-------------------
+
+From a general standpoint, the use-cases where EAS can help the most are those
+involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
+being run, they will require all of the available CPU capacity, and there isn't
+much that can be done by the scheduler to save energy without severely harming
+throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
+'over-utilized' as soon as they are used at more than 80% of their compute
+capacity. As long as no CPUs are over-utilized in a root domain, load balancing
+is disabled and EAS overrides the wake-up balancing code. EAS is likely to load
+the most energy efficient CPUs of the system more than the others if that can be
+done without harming throughput. So, the load-balancer is disabled to prevent
+it from breaking the energy-efficient task placement found by EAS. It is safe to
+do so when the system isn't overutilized since being below the 80% tipping point
+implies that:
+
+    a. there is some idle time on all CPUs, so the utilization signals used by
+       EAS are likely to accurately represent the 'size' of the various tasks
+       in the system;
+    b. all tasks should already be provided with enough CPU capacity,
+       regardless of their nice values;
+    c. since there is spare capacity all tasks must be blocking/sleeping
+       regularly and balancing at wake-up is sufficient.
+
+As soon as one CPU goes above the 80% tipping point, at least one of the three
+assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
+is raised for the entire root domain, EAS is disabled, and the load-balancer is
+re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
+wake-up and load balance under CPU-bound conditions. This better respects the
+nice values of tasks.
+
+Since the notion of overutilization largely relies on detecting whether or not
+there is some idle time in the system, the CPU capacity 'stolen' by higher
+(than CFS) scheduling classes (as well as IRQ) must be taken into account. As
+such, the detection of overutilization accounts for the capacity used not only
+by CFS tasks, but also by the other scheduling classes and IRQ.
+
+
+6. Dependencies and requirements for EAS
+----------------------------------------
+
+Energy Aware Scheduling depends on the CPUs of the system having specific
+hardware properties and on other features of the kernel being enabled. This
+section lists these dependencies and provides hints as to how they can be met.
+
+
+6.1 - Asymmetric CPU topology
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+As mentioned in the introduction, EAS is only supported on platforms with
+asymmetric CPU topologies for now. This requirement is checked at run-time by
+looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
+domains are built.
+
+The flag is set/cleared automatically by the scheduler topology code whenever
+there are CPUs with different capacities in a root domain. The capacities of
+CPUs are provided by arch-specific code through the arch_scale_cpu_capacity()
+callback. As an example, arm and arm64 share an implementation of this callback
+which uses a combination of CPUFreq data and device-tree bindings to compute the
+capacity of CPUs (see drivers/base/arch_topology.c for more details).
+
+So, in order to use EAS on your platform your architecture must implement the
+arch_scale_cpu_capacity() callback, and some of the CPUs must have a lower
+capacity than others.
+
+Please note that EAS is not fundamentally incompatible with SMP, but no
+significant savings on SMP platforms have been observed yet. This restriction
+could be amended in the future if proven otherwise.
+
+
+6.2 - Energy Model presence
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+EAS uses the EM of a platform to estimate the impact of scheduling decisions on
+energy. So, your platform must provide power cost tables to the EM framework in
+order to make EAS start. To do so, please refer to documentation of the
+independent EM framework in Documentation/power/energy-model.txt.
+
+Please also note that the scheduling domains need to be re-built after the
+EM has been registered in order to start EAS.
+
+
+6.3 - Energy Model complexity
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The task wake-up path is very latency-sensitive. When the EM of a platform is
+too complex (too many CPUs, too many performance domains, too many performance
+states, ...), the cost of using it in the wake-up path can become prohibitive.
+The energy-aware wake-up algorithm has a complexity of:
+
+	C = Nd * (Nc + Ns)
+
+with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
+total number of OPPs (e.g. for two performance domains with 4 OPPs each, Ns = 8).
+
+A complexity check is performed at the root domain level, when scheduling
+domains are built. EAS will not start on a root domain if its C happens to be
+higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
+time of writing).
+
+If you really want to use EAS but the complexity of your platform's Energy
+Model is too high to be used with a single root domain, you're left with only
+two possible options:
+
+    1. split your system into separate, smaller, root domains using exclusive
+       cpusets and enable EAS locally on each of them. This option has the
+       benefit of working out of the box but the drawback of preventing load
+       balance between root domains, which can result in an unbalanced system
+       overall;
+    2. submit patches to reduce the complexity of the EAS wake-up algorithm,
+       hence enabling it to cope with larger EMs in reasonable time.
+
+
+6.4 - Schedutil governor
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+EAS tries to predict at which OPP the CPUs will be running in the near future
+in order to estimate their energy consumption. To do so, it is assumed that OPPs
+of CPUs follow their utilization.
+
+Although it is very difficult to provide hard guarantees regarding the accuracy
+of this assumption in practice (because the hardware might not do what it is
+told to do, for example), schedutil as opposed to other CPUFreq governors at
+least _requests_ frequencies calculated using the utilization signals.
+Consequently, the only sane governor to use together with EAS is schedutil,
+because it is the only one providing some degree of consistency between
+frequency requests and energy predictions.
+
+Using EAS with any other governor than schedutil is not supported.
+
+
+6.5 Scale-invariant utilization signals
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In order to make accurate predictions across CPUs and for all performance
+states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
+be obtained using the architecture-defined arch_scale_{cpu,freq}_capacity()
+callbacks.
+
+Using EAS on a platform that doesn't implement these two callbacks is not
+supported.
+
+
+6.6 Multithreading (SMT)
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+EAS in its current form is SMT unaware and is not able to leverage
+multithreaded hardware to save energy. EAS considers threads as independent
+CPUs, which can actually be counter-productive for both performance and energy.
+
+EAS on SMT is not supported.
diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
deleted file mode 100644
index 197d81f4b836..000000000000
--- a/Documentation/scheduler/sched-energy.txt
+++ /dev/null
@@ -1,425 +0,0 @@
-			   =======================
-			   Energy Aware Scheduling
-			   =======================
-
-1. Introduction
----------------
-
-Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict
-the impact of its decisions on the energy consumed by CPUs. EAS relies on an
-Energy Model (EM) of the CPUs to select an energy efficient CPU for each task,
-with a minimal impact on throughput. This document aims at providing an
-introduction on how EAS works, what are the main design decisions behind it, and
-details what is needed to get it to run.
-
-Before going any further, please note that at the time of writing:
-
-   /!\ EAS does not support platforms with symmetric CPU topologies /!\
-
-EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE)
-because this is where the potential for saving energy through scheduling is
-the highest.
-
-The actual EM used by EAS is _not_ maintained by the scheduler, but by a
-dedicated framework. For details about this framework and what it provides,
-please refer to its documentation (see Documentation/power/energy-model.txt).
-
-
-2. Background and Terminology
------------------------------
-
-To make it clear from the start:
- - energy = [joule] (resource like a battery on powered devices)
- - power = energy/time = [joule/second] = [watt]
-
-The goal of EAS is to minimize energy, while still getting the job done. That
-is, we want to maximize:
-
-	performance [inst/s]
-	--------------------
-	    power [W]
-
-which is equivalent to minimizing:
-
-	energy [J]
-	-----------
-	instruction
-
-while still getting 'good' performance. It is essentially an alternative
-optimization objective to the current performance-only objective for the
-scheduler. This alternative considers two objectives: energy-efficiency and
-performance.
-
-The idea behind introducing an EM is to allow the scheduler to evaluate the
-implications of its decisions rather than blindly applying energy-saving
-techniques that may have positive effects only on some platforms. At the same
-time, the EM must be as simple as possible to minimize the scheduler latency
-impact.
-
-In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time
-for the scheduler to decide where a task should run (during wake-up), the EM
-is used to break the tie between several good CPU candidates and pick the one
-that is predicted to yield the best energy consumption without harming the
-system's throughput. The predictions made by EAS rely on specific elements of
-knowledge about the platform's topology, which include the 'capacity' of CPUs,
-and their respective energy costs.
-
-
-3. Topology information
------------------------
-
-EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to
-differentiate CPUs with different computing throughput. The 'capacity' of a CPU
-represents the amount of work it can absorb when running at its highest
-frequency compared to the most capable CPU of the system. Capacity values are
-normalized in a 1024 range, and are comparable with the utilization signals of
-tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks
-to capacity and utilization values, EAS is able to estimate how big/busy a
-task/CPU is, and to take this into consideration when evaluating performance vs
-energy trade-offs. The capacity of CPUs is provided via arch-specific code
-through the arch_scale_cpu_capacity() callback.
-
-The rest of platform knowledge used by EAS is directly read from the Energy
-Model (EM) framework. The EM of a platform is composed of a power cost table
-per 'performance domain' in the system (see Documentation/power/energy-model.txt
-for futher details about performance domains).
-
-The scheduler manages references to the EM objects in the topology code when the
-scheduling domains are built, or re-built. For each root domain (rd), the
-scheduler maintains a singly linked list of all performance domains intersecting
-the current rd->span. Each node in the list contains a pointer to a struct
-em_perf_domain as provided by the EM framework.
-
-The lists are attached to the root domains in order to cope with exclusive
-cpuset configurations. Since the boundaries of exclusive cpusets do not
-necessarily match those of performance domains, the lists of different root
-domains can contain duplicate elements.
-
-Example 1.
-    Let us consider a platform with 12 CPUs, split in 3 performance domains
-    (pd0, pd4 and pd8), organized as follows:
-
-	          CPUs:   0 1 2 3 4 5 6 7 8 9 10 11
-	          PDs:   |--pd0--|--pd4--|---pd8---|
-	          RDs:   |----rd1----|-----rd2-----|
-
-    Now, consider that userspace decided to split the system with two
-    exclusive cpusets, hence creating two independent root domains, each
-    containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the
-    above figure. Since pd4 intersects with both rd1 and rd2, it will be
-    present in the linked list '->pd' attached to each of them:
-       * rd1->pd: pd0 -> pd4
-       * rd2->pd: pd4 -> pd8
-
-    Please note that the scheduler will create two duplicate list nodes for
-    pd4 (one for each list). However, both just hold a pointer to the same
-    shared data structure of the EM framework.
-
-Since the access to these lists can happen concurrently with hotplug and other
-things, they are protected by RCU, like the rest of topology structures
-manipulated by the scheduler.
-
-EAS also maintains a static key (sched_energy_present) which is enabled when at
-least one root domain meets all conditions for EAS to start. Those conditions
-are summarized in Section 6.
-
-
-4. Energy-Aware task placement
-------------------------------
-
-EAS overrides the CFS task wake-up balancing code. It uses the EM of the
-platform and the PELT signals to choose an energy-efficient target CPU during
-wake-up balance. When EAS is enabled, select_task_rq_fair() calls
-find_energy_efficient_cpu() to do the placement decision. This function looks
-for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in
-each performance domain since it is the one which will allow us to keep the
-frequency the lowest. Then, the function checks if placing the task there could
-save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran
-in its previous activation.
-
-find_energy_efficient_cpu() uses compute_energy() to estimate what will be the
-energy consumed by the system if the waking task was migrated. compute_energy()
-looks at the current utilization landscape of the CPUs and adjusts it to
-'simulate' the task migration. The EM framework provides the em_pd_energy() API
-which computes the expected energy consumption of each performance domain for
-the given utilization landscape.
-
-An example of energy-optimized task placement decision is detailed below.
-
-Example 2.
-    Let us consider a (fake) platform with 2 independent performance domains
-    composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3
-    are big.
-
-    The scheduler must decide where to place a task P whose util_avg = 200
-    and prev_cpu = 0.
-
-    The current utilization landscape of the CPUs is depicted on the graph
-    below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively
-    Each performance domain has three Operating Performance Points (OPPs).
-    The CPU capacity and power cost associated with each OPP is listed in
-    the Energy Model table. The util_avg of P is shown on the figures
-    below as 'PP'.
-
-    CPU util.
-      1024                 - - - - - - -              Energy Model
-                                               +-----------+-------------+
-                                               |  Little   |     Big     |
-       768                 =============       +-----+-----+------+------+
-                                               | Cap | Pwr | Cap  | Pwr  |
-                                               +-----+-----+------+------+
-       512  ===========    - ##- - - - -       | 170 | 50  | 512  | 400  |
-                             ##     ##         | 341 | 150 | 768  | 800  |
-       341  -PP - - - -      ##     ##         | 512 | 300 | 1024 | 1700 |
-             PP              ##     ##         +-----+-----+------+------+
-       170  -## - - - -      ##     ##
-             ##     ##       ##     ##
-           ------------    -------------
-            CPU0   CPU1     CPU2   CPU3
-
-      Current OPP: =====       Other OPP: - - -     util_avg (100 each): ##
-
-
-    find_energy_efficient_cpu() will first look for the CPUs with the
-    maximum spare capacity in the two performance domains. In this example,
-    CPU1 and CPU3. Then it will estimate the energy of the system if P was
-    placed on either of them, and check if that would save some energy
-    compared to leaving P on CPU0. EAS assumes that OPPs follow utilization
-    (which is coherent with the behaviour of the schedutil CPUFreq
-    governor, see Section 6. for more details on this topic).
-
-    Case 1. P is migrated to CPU1
-    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-      1024                 - - - - - - -
-
-                                            Energy calculation:
-       768                 =============     * CPU0: 200 / 341 * 150 = 88
-                                             * CPU1: 300 / 341 * 150 = 131
-                                             * CPU2: 600 / 768 * 800 = 625
-       512  - - - - - -    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
-                             ##     ##          => total_energy = 1364
-       341  ===========      ##     ##
-                    PP       ##     ##
-       170  -## - - PP-      ##     ##
-             ##     ##       ##     ##
-           ------------    -------------
-            CPU0   CPU1     CPU2   CPU3
-
-
-    Case 2. P is migrated to CPU3
-    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-      1024                 - - - - - - -
-
-                                            Energy calculation:
-       768                 =============     * CPU0: 200 / 341 * 150 = 88
-                                             * CPU1: 100 / 341 * 150 = 43
-                                    PP       * CPU2: 600 / 768 * 800 = 625
-       512  - - - - - -    - ##- - -PP -     * CPU3: 700 / 768 * 800 = 729
-                             ##     ##          => total_energy = 1485
-       341  ===========      ##     ##
-                             ##     ##
-       170  -## - - - -      ##     ##
-             ##     ##       ##     ##
-           ------------    -------------
-            CPU0   CPU1     CPU2   CPU3
-
-
-    Case 3. P stays on prev_cpu / CPU 0
-    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-      1024                 - - - - - - -
-
-                                            Energy calculation:
-       768                 =============     * CPU0: 400 / 512 * 300 = 234
-                                             * CPU1: 100 / 512 * 300 = 58
-                                             * CPU2: 600 / 768 * 800 = 625
-       512  ===========    - ##- - - - -     * CPU3: 500 / 768 * 800 = 520
-                             ##     ##          => total_energy = 1437
-       341  -PP - - - -      ##     ##
-             PP              ##     ##
-       170  -## - - - -      ##     ##
-             ##     ##       ##     ##
-           ------------    -------------
-            CPU0   CPU1     CPU2   CPU3
-
-
-    From these calculations, the Case 1 has the lowest total energy. So CPU 1
-    is be the best candidate from an energy-efficiency standpoint.
-
-Big CPUs are generally more power hungry than the little ones and are thus used
-mainly when a task doesn't fit the littles. However, little CPUs aren't always
-necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
-of the little CPUs can be less energy-efficient than the lowest OPPs of the
-bigs, for example. So, if the little CPUs happen to have enough utilization at
-a specific point in time, a small task waking up at that moment could be better
-of executing on the big side in order to save energy, even though it would fit
-on the little side.
-
-And even in the case where all OPPs of the big CPUs are less energy-efficient
-than those of the little, using the big CPUs for a small task might still, under
-specific conditions, save energy. Indeed, placing a task on a little CPU can
-result in raising the OPP of the entire performance domain, and that will
-increase the cost of the tasks already running there. If the waking task is
-placed on a big CPU, its own execution cost might be higher than if it was
-running on a little, but it won't impact the other tasks of the little CPUs
-which will keep running at a lower OPP. So, when considering the total energy
-consumed by CPUs, the extra cost of running that one task on a big core can be
-smaller than the cost of raising the OPP on the little CPUs for all the other
-tasks.
-
-The examples above would be nearly impossible to get right in a generic way, and
-for all platforms, without knowing the cost of running at different OPPs on all
-CPUs of the system. Thanks to its EM-based design, EAS should cope with them
-correctly without too many troubles. However, in order to ensure a minimal
-impact on throughput for high-utilization scenarios, EAS also implements another
-mechanism called 'over-utilization'.
-
-
-5. Over-utilization
--------------------
-
-From a general standpoint, the use-cases where EAS can help the most are those
-involving a light/medium CPU utilization. Whenever long CPU-bound tasks are
-being run, they will require all of the available CPU capacity, and there isn't
-much that can be done by the scheduler to save energy without severly harming
-throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
-'over-utilized' as soon as they are used at more than 80% of their compute
-capacity. As long as no CPUs are over-utilized in a root domain, load balancing
-is disabled and EAS overridess the wake-up balancing code. EAS is likely to load
-the most energy efficient CPUs of the system more than the others if that can be
-done without harming throughput. So, the load-balancer is disabled to prevent
-it from breaking the energy-efficient task placement found by EAS. It is safe to
-do so when the system isn't overutilized since being below the 80% tipping point
-implies that:
-
-    a. there is some idle time on all CPUs, so the utilization signals used by
-       EAS are likely to accurately represent the 'size' of the various tasks
-       in the system;
-    b. all tasks should already be provided with enough CPU capacity,
-       regardless of their nice values;
-    c. since there is spare capacity all tasks must be blocking/sleeping
-       regularly and balancing at wake-up is sufficient.
-
-As soon as one CPU goes above the 80% tipping point, at least one of the three
-assumptions above becomes incorrect. In this scenario, the 'overutilized' flag
-is raised for the entire root domain, EAS is disabled, and the load-balancer is
-re-enabled. By doing so, the scheduler falls back onto load-based algorithms for
-wake-up and load balance under CPU-bound conditions. This provides a better
-respect of the nice values of tasks.
-
-Since the notion of overutilization largely relies on detecting whether or not
-there is some idle time in the system, the CPU capacity 'stolen' by higher
-(than CFS) scheduling classes (as well as IRQ) must be taken into account. As
-such, the detection of overutilization accounts for the capacity used not only
-by CFS tasks, but also by the other scheduling classes and IRQ.
-
-
-6. Dependencies and requirements for EAS
-----------------------------------------
-
-Energy Aware Scheduling depends on the CPUs of the system having specific
-hardware properties and on other features of the kernel being enabled. This
-section lists these dependencies and provides hints as to how they can be met.
-
-
-  6.1 - Asymmetric CPU topology
-
-As mentioned in the introduction, EAS is only supported on platforms with
-asymmetric CPU topologies for now. This requirement is checked at run-time by
-looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
-domains are built.
-
-The flag is set/cleared automatically by the scheduler topology code whenever
-there are CPUs with different capacities in a root domain. The capacities of
-CPUs are provided by arch-specific code through the arch_scale_cpu_capacity()
-callback. As an example, arm and arm64 share an implementation of this callback
-which uses a combination of CPUFreq data and device-tree bindings to compute the
-capacity of CPUs (see drivers/base/arch_topology.c for more details).
-
-So, in order to use EAS on your platform your architecture must implement the
-arch_scale_cpu_capacity() callback, and some of the CPUs must have a lower
-capacity than others.
-
-Please note that EAS is not fundamentally incompatible with SMP, but no
-significant savings on SMP platforms have been observed yet. This restriction
-could be amended in the future if proven otherwise.
-
-
-  6.2 - Energy Model presence
-
-EAS uses the EM of a platform to estimate the impact of scheduling decisions on
-energy. So, your platform must provide power cost tables to the EM framework in
-order to make EAS start. To do so, please refer to documentation of the
-independent EM framework in Documentation/power/energy-model.txt.
-
-Please also note that the scheduling domains need to be re-built after the
-EM has been registered in order to start EAS.
-
-
-  6.3 - Energy Model complexity
-
-The task wake-up path is very latency-sensitive. When the EM of a platform is
-too complex (too many CPUs, too many performance domains, too many performance
-states, ...), the cost of using it in the wake-up path can become prohibitive.
-The energy-aware wake-up algorithm has a complexity of:
-
-	C = Nd * (Nc + Ns)
-
-with: Nd the number of performance domains; Nc the number of CPUs; and Ns the
-total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8).
-
-A complexity check is performed at the root domain level, when scheduling
-domains are built. EAS will not start on a root domain if its C happens to be
-higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the
-time of writing).
-
-If you really want to use EAS but the complexity of your platform's Energy
-Model is too high to be used with a single root domain, you're left with only
-two possible options:
-
-    1. split your system into separate, smaller, root domains using exclusive
-       cpusets and enable EAS locally on each of them. This option has the
-       benefit to work out of the box but the drawback of preventing load
-       balance between root domains, which can result in an unbalanced system
-       overall;
-    2. submit patches to reduce the complexity of the EAS wake-up algorithm,
-       hence enabling it to cope with larger EMs in reasonable time.
-
-
-  6.4 - Schedutil governor
-
-EAS tries to predict at which OPP will the CPUs be running in the close future
-in order to estimate their energy consumption. To do so, it is assumed that OPPs
-of CPUs follow their utilization.
-
-Although it is very difficult to provide hard guarantees regarding the accuracy
-of this assumption in practice (because the hardware might not do what it is
-told to do, for example), schedutil as opposed to other CPUFreq governors at
-least _requests_ frequencies calculated using the utilization signals.
-Consequently, the only sane governor to use together with EAS is schedutil,
-because it is the only one providing some degree of consistency between
-frequency requests and energy predictions.
-
-Using EAS with any other governor than schedutil is not supported.
-
-
-  6.5 Scale-invariant utilization signals
-
-In order to make accurate prediction across CPUs and for all performance
-states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
-be obtained using the architecture-defined arch_scale{cpu,freq}_capacity()
-callbacks.
-
-Using EAS on a platform that doesn't implement these two callbacks is not
-supported.
-
-
-  6.6 Multithreading (SMT)
-
-EAS in its current form is SMT unaware and is not able to leverage
-multithreaded hardware to save energy. EAS considers threads as independent
-CPUs, which can actually be counter-productive for both performance and energy.
-
-EAS on SMT is not supported.
diff --git a/Documentation/scheduler/sched-nice-design.rst b/Documentation/scheduler/sched-nice-design.rst
new file mode 100644
index 000000000000..0571f1b47e64
--- /dev/null
+++ b/Documentation/scheduler/sched-nice-design.rst
@@ -0,0 +1,112 @@
+=====================
+Scheduler Nice Design
+=====================
+
+This document explains the thinking about the revamped and streamlined
+nice-levels implementation in the new Linux scheduler.
+
+Nice levels were always pretty weak under Linux and people continuously
+pestered us to make nice +19 tasks use up much less CPU time.
+
+Unfortunately that was not that easy to implement under the old
+scheduler (otherwise we'd have done it long ago), because nice level
+support was historically coupled to timeslice length, and timeslice
+units were driven by the HZ tick, so the smallest timeslice was 1/HZ.
+
+In the O(1) scheduler (in 2003) we changed negative nice levels to be
+much stronger than they were before in 2.4 (and people were happy about
+that change), and we also intentionally calibrated the linear timeslice
+rule so that nice +19 level would be _exactly_ 1 jiffy. To better
+understand it, the timeslice graph went like this (cheesy ASCII art
+alert!)::
+
+
+                   A
+             \     | [timeslice length]
+              \    |
+               \   |
+                \  |
+                 \ |
+                  \|___100msecs
+                   |^ . _
+                   |      ^ . _
+                   |            ^ . _
+ -*----------------------------------*-----> [nice level]
+ -20               |                +19
+                   |
+                   |
+
+So that if someone wanted to really renice tasks, +19 would give a much
+bigger hit than the normal linear rule would do. (The solution of
+changing the ABI to extend priorities was discarded early on.)
+
+This approach worked to some degree for some time, but later on with
+HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage, which
+we felt to be a bit excessive. Excessive _not_ because the CPU
+utilization is too small, but because it causes too frequent (once per
+millisecond) rescheduling, which would thrash the cache, etc. (Remember,
+this was long ago when hardware was weaker and caches were smaller, and
+people were running number crunching apps at nice +19.)
+
+So for HZ=1000 we changed nice +19 to 5msecs, because that felt like the
+right minimal granularity - and this translates to 5% CPU utilization.
+But the fundamental HZ-sensitive property for nice+19 still remained,
+and we never got a single complaint about nice +19 being too _weak_ in
+terms of CPU utilization, we only got complaints about it (still) being
+too _strong_ :-)
+
+To sum it up: we always wanted to make nice levels more consistent, but
+within the constraints of HZ and jiffies and their nasty design level
+coupling to timeslices and granularity it was not really viable.
+
+The second (less frequent but still periodically occurring) complaint
+about Linux's nice level support was its asymmetry around the origin
+(which you can see demonstrated in the picture above), or more
+accurately: the fact that nice level behavior depended on the _absolute_
+nice level as well, while the nice API itself is fundamentally
+"relative"::
+
+   int nice(int inc);
+
+   asmlinkage long sys_nice(int increment)
+
+(the first one is the glibc API, the second one is the syscall API.)
+Note that the 'inc' is relative to the current nice level. Tools like
+bash's "nice" command mirror this relative API.
+
+With the old scheduler, if you for example started a niced task with +1
+and another task with +2, the CPU split between the two tasks would
+depend on the nice level of the parent shell - if it was at nice -10 the
+CPU split was different than if it was at +5 or +10.
+
+A third complaint against Linux's nice level support was that negative
+nice levels were not 'punchy enough', so lots of people had to resort to
+running audio (and other multimedia) apps under RT priorities such as
+SCHED_FIFO. But this caused other problems: SCHED_FIFO is not starvation
+proof, and a buggy SCHED_FIFO app can also lock up the system for good.
+
+The new scheduler in v2.6.23 addresses all three types of complaints:
+
+To address the first complaint (of nice levels not being "punchy"
+enough), the scheduler was decoupled from 'time slice' and HZ concepts
+(and granularity was made a separate concept from nice levels) and thus
+it was possible to implement better and more consistent nice +19
+support: with the new scheduler nice +19 tasks get a HZ-independent
+1.5%, instead of the variable 3%-5%-9% range they got in the old
+scheduler.
+
+To address the second complaint (of nice levels not being consistent),
+the new scheduler makes nice(1) have the same CPU utilization effect on
+tasks, regardless of their absolute nice levels. So on the new
+scheduler, running a nice +10 and a nice +11 task has the same CPU
+utilization "split" between them as running a nice -5 and a nice -4
+task (one will get 55% of the CPU, the other 45%). That is why nice
+levels were changed to be "multiplicative" (or exponential) - that way
+it does not matter which nice level you start out from, the 'relative
+result' will always be the same.
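+
+As a rough worked example of why the split does not depend on the starting
+level: assume (this figure is not stated above and is used here only for
+illustration) that each nice step divides a task's weight by a constant
+factor of about 1.25, and that CPU time is shared in proportion to weight.
+Two tasks at nice n and n+1 then split the CPU as:
+
+.. math::
+
+   w(n) = \frac{w_0}{1.25^{n}}, \qquad
+   \frac{w(n)}{w(n) + w(n+1)} = \frac{1.25}{1.25 + 1} \approx 0.556
+
+The level n cancels out, so the pair (+10, +11) gets the same roughly
+55%/45% split as the pair (-5, -4).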
+
+The third complaint (of negative nice levels not being "punchy" enough
+and forcing audio apps to run under the more dangerous SCHED_FIFO
+scheduling policy) is addressed by the new scheduler almost
+automatically: stronger negative nice levels are an automatic
+side-effect of the recalibrated dynamic range of nice levels.
diff --git a/Documentation/scheduler/sched-nice-design.txt b/Documentation/scheduler/sched-nice-design.txt
deleted file mode 100644
index 3ac1e46d5365..000000000000
--- a/Documentation/scheduler/sched-nice-design.txt
+++ /dev/null
@@ -1,108 +0,0 @@
-This document explains the thinking about the revamped and streamlined
-nice-levels implementation in the new Linux scheduler.
-
-Nice levels were always pretty weak under Linux and people continuously
-pestered us to make nice +19 tasks use up much less CPU time.
-
-Unfortunately that was not that easy to implement under the old
-scheduler, (otherwise we'd have done it long ago) because nice level
-support was historically coupled to timeslice length, and timeslice
-units were driven by the HZ tick, so the smallest timeslice was 1/HZ.
-
-In the O(1) scheduler (in 2003) we changed negative nice levels to be
-much stronger than they were before in 2.4 (and people were happy about
-that change), and we also intentionally calibrated the linear timeslice
-rule so that nice +19 level would be _exactly_ 1 jiffy. To better
-understand it, the timeslice graph went like this (cheesy ASCII art
-alert!):
-
-
-                   A
-             \     | [timeslice length]
-              \    |
-               \   |
-                \  |
-                 \ |
-                  \|___100msecs
-                   |^ . _
-                   |      ^ . _
-                   |            ^ . _
- -*----------------------------------*-----> [nice level]
- -20               |                +19
-                   |
-                   |
-
-So that if someone wanted to really renice tasks, +19 would give a much
-bigger hit than the normal linear rule would do. (The solution of
-changing the ABI to extend priorities was discarded early on.)
-
-This approach worked to some degree for some time, but later on with
-HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage which
-we felt to be a bit excessive. Excessive _not_ because it's too small of
-a CPU utilization, but because it causes too frequent (once per
-millisec) rescheduling. (and would thus trash the cache, etc. Remember,
-this was long ago when hardware was weaker and caches were smaller, and
-people were running number crunching apps at nice +19.)
-
-So for HZ=1000 we changed nice +19 to 5msecs, because that felt like the
-right minimal granularity - and this translates to 5% CPU utilization.
-But the fundamental HZ-sensitive property for nice+19 still remained,
-and we never got a single complaint about nice +19 being too _weak_ in
-terms of CPU utilization, we only got complaints about it (still) being
-too _strong_ :-)
-
-To sum it up: we always wanted to make nice levels more consistent, but
-within the constraints of HZ and jiffies and their nasty design level
-coupling to timeslices and granularity it was not really viable.
-
-The second (less frequent but still periodically occurring) complaint
-about Linux's nice level support was its assymetry around the origo
-(which you can see demonstrated in the picture above), or more
-accurately: the fact that nice level behavior depended on the _absolute_
-nice level as well, while the nice API itself is fundamentally
-"relative":
-
-   int nice(int inc);
-
-   asmlinkage long sys_nice(int increment)
-
-(the first one is the glibc API, the second one is the syscall API.)
-Note that the 'inc' is relative to the current nice level. Tools like
-bash's "nice" command mirror this relative API.
-
-With the old scheduler, if you for example started a niced task with +1
-and another task with +2, the CPU split between the two tasks would
-depend on the nice level of the parent shell - if it was at nice -10 the
-CPU split was different than if it was at +5 or +10.
-
-A third complaint against Linux's nice level support was that negative
-nice levels were not 'punchy enough', so lots of people had to resort to
-run audio (and other multimedia) apps under RT priorities such as
-SCHED_FIFO. But this caused other problems: SCHED_FIFO is not starvation
-proof, and a buggy SCHED_FIFO app can also lock up the system for good.
-
-The new scheduler in v2.6.23 addresses all three types of complaints:
-
-To address the first complaint (of nice levels being not "punchy"
-enough), the scheduler was decoupled from 'time slice' and HZ concepts
-(and granularity was made a separate concept from nice levels) and thus
-it was possible to implement better and more consistent nice +19
-support: with the new scheduler nice +19 tasks get a HZ-independent
-1.5%, instead of the variable 3%-5%-9% range they got in the old
-scheduler.
-
-To address the second complaint (of nice levels not being consistent),
-the new scheduler makes nice(1) have the same CPU utilization effect on
-tasks, regardless of their absolute nice levels. So on the new
-scheduler, running a nice +10 and a nice 11 task has the same CPU
-utilization "split" between them as running a nice -5 and a nice -4
-task. (one will get 55% of the CPU, the other 45%.) That is why nice
-levels were changed to be "multiplicative" (or exponential) - that way
-it does not matter which nice level you start out from, the 'relative
-result' will always be the same.
-
-The third complaint (of negative nice levels not being "punchy" enough
-and forcing audio apps to run under the more dangerous SCHED_FIFO
-scheduling policy) is addressed by the new scheduler almost
-automatically: stronger negative nice levels are an automatic
-side-effect of the recalibrated dynamic range of nice levels.
diff --git a/Documentation/scheduler/sched-rt-group.rst b/Documentation/scheduler/sched-rt-group.rst
new file mode 100644
index 000000000000..79b30a21c51a
--- /dev/null
+++ b/Documentation/scheduler/sched-rt-group.rst
@@ -0,0 +1,185 @@
+==========================
+Real-Time group scheduling
+==========================
+
+.. CONTENTS
+
+   0. WARNING
+   1. Overview
+     1.1 The problem
+     1.2 The solution
+   2. The interface
+     2.1 System-wide settings
+     2.2 Default behaviour
+     2.3 Basis for grouping tasks
+   3. Future plans
+
+
+0. WARNING
+==========
+
+ Fiddling with these settings can result in an unstable system. The knobs are
+ root only and assume that root knows what they are doing.
+
+Most notable:
+
+ * very small values in sched_rt_period_us can result in an unstable
+   system when the period is smaller than either the available hrtimer
+   resolution, or the time it takes to handle the budget refresh itself.
+
+ * very small values in sched_rt_runtime_us can result in an unstable
+   system when the runtime is so small the system has difficulty making
+   forward progress (NOTE: the migration thread and kstopmachine both
+   are real-time processes).
+
+1. Overview
+===========
+
+
+1.1 The problem
+---------------
+
+Realtime scheduling is all about determinism: a group has to be able to rely on
+the amount of bandwidth (e.g. CPU time) being constant. In order to schedule
+multiple groups of realtime tasks, each group must be assigned a fixed portion
+of the CPU time available.  Without a minimum guarantee a realtime group can
+obviously fall short. A fuzzy upper limit is of no use since it cannot be
+relied upon. That leaves us with just the single fixed portion.
+
+1.2 The solution
+----------------
+
+CPU time is divided by means of specifying how much time can be spent running
+in a given period. We allocate this "run time" for each realtime group which
+the other realtime groups will not be permitted to use.
+
+Any time not allocated to a realtime group will be used to run normal priority
+tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
+SCHED_OTHER.
+
+Let's consider an example: a fixed-frame-rate realtime renderer must deliver 25
+frames a second, which yields a period of 0.04s per frame. Now say it will also
+have to play some music and respond to input, leaving it with around 80% CPU
+time dedicated for the graphics. We can then give this group a run time of 0.8
+* 0.04s = 0.032s.
+
+This way the graphics group will have a 0.04s period with a 0.032s run time
+limit. Now if the audio thread needs to refill the DMA buffer every 0.005s, but
+needs only about 3% CPU time to do so, it can make do with 0.03 * 0.005s =
+0.00015s. So this group can be scheduled with a period of 0.005s and a run time
+of 0.00015s.
+
+The remaining CPU time will be used for user input and other tasks. Because
+realtime tasks have explicitly allocated the CPU time they need to perform
+their tasks, buffer underruns in the graphics or audio can be eliminated.
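+
+Expressed as utilization (run time divided by period), the two groups in this
+example add up as follows; the figures are taken from the text above and the
+symbol U is only notation introduced here:
+
+.. math::
+
+   U = \frac{0.032}{0.04} + \frac{0.00015}{0.005} = 0.80 + 0.03 = 0.83
+
+That leaves about 17% of the CPU time for user input and other tasks.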
+
+NOTE: the above example is not fully implemented yet. We still
+lack an EDF scheduler to make non-uniform periods usable.
+
+
+2. The Interface
+================
+
+
+2.1 System-wide settings
+------------------------
+
+The system-wide settings are configured under the /proc virtual file system:
+
+/proc/sys/kernel/sched_rt_period_us:
+  The scheduling period that is equivalent to 100% CPU bandwidth.
+
+/proc/sys/kernel/sched_rt_runtime_us:
+  A global limit on how much time realtime scheduling may use.  Even without
+  CONFIG_RT_GROUP_SCHED enabled, this will limit time reserved to realtime
+  processes. With CONFIG_RT_GROUP_SCHED it signifies the total bandwidth
+  available to all realtime groups.
+
+  * Time is specified in us because the interface is s32. This gives an
+    operating range from 1us to about 35 minutes.
+  * sched_rt_period_us takes values from 1 to INT_MAX.
+  * sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
+  * A run time of -1 specifies runtime == period, i.e. no limit.
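+
+  As a quick check of the upper bound, the largest s32 value interpreted as
+  microseconds is:
+
+  .. math::
+
+     2^{31} - 1 = 2147483647\ \mu\mathrm{s} \approx 2147.5\ \mathrm{s}
+     \approx 35.8\ \mathrm{min}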
+
+
+2.2 Default behaviour
+---------------------
+
+The default values are sched_rt_period_us = 1000000 (1s) and
+sched_rt_runtime_us = 950000 (0.95s).  This gives 0.05s to be used by
+SCHED_OTHER (non-RT tasks). These defaults were chosen so that a runaway
+realtime task will not lock up the machine but will leave a little time to
+recover it.  By setting runtime to -1 you'd get the old behaviour back.
+
+By default all bandwidth is assigned to the root group and new groups get the
+period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
+want to assign bandwidth to another group, reduce the root group's bandwidth
+and assign some or all of the difference to another group.
+
+Realtime group scheduling means you have to assign a portion of total CPU
+bandwidth to the group before it will accept realtime tasks. Therefore you will
+not be able to run realtime tasks as any user other than root until you have
+done that, even if the user has the rights to run processes with realtime
+priority!
+
+
+2.3 Basis for grouping tasks
+----------------------------
+
+Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
+CPU bandwidth to task groups.
+
+This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
+to control the CPU time reserved for each control group.
+
+For more information on working with control groups, you should read
+Documentation/cgroup-v1/cgroups.txt as well.
+
+Group settings are checked against the following limits in order to keep the
+configuration schedulable::
+
+   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
+
+For now, this can be simplified to just the following (but see Future plans)::
+
+   \Sum_{i} runtime_{i} <= global_runtime
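+
+As a worked illustration with made-up numbers (the group runtimes below are
+hypothetical, and the default global_runtime of 950000us is assumed): two
+groups with run times of 400000us and 300000us satisfy the simplified check,
+while adding a third group with 300000us would not:
+
+.. math::
+
+   400000 + 300000 = 700000 \le 950000
+
+   400000 + 300000 + 300000 = 1000000 > 950000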
+
+
+3. Future plans
+===============
+
+There is work in progress to make the scheduling period for each group
+("<cgroup>/cpu.rt_period_us") configurable as well.
+
+The constraint on the period is that a subgroup must have a smaller or
+equal period to its parent. But realistically it's not very useful _yet_
+as it's prone to starvation without deadline scheduling.
+
+Consider two sibling groups A and B; both have 50% bandwidth, but A's
+period is twice the length of B's.
+
+* group A: period=100000us, runtime=50000us
+
+	- this runs for 0.05s once every 0.1s
+
+* group B: period= 50000us, runtime=25000us
+
+	- this runs for 0.025s twice every 0.1s (or once every 0.05 sec).
+
+This means that currently a while (1) loop in A will run for the full period of
+B and can starve B's tasks (assuming they are of lower priority) for a whole
+period.
+
+The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
+full deadline scheduling to the Linux kernel. Deadline scheduling the above
+groups and treating end of the period as a deadline will ensure that they both
+get their allocated time.
+
+Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
+the biggest challenge as the current Linux PI infrastructure is geared towards
+the limited static priority levels 0-99. With deadline scheduling you need to
+do deadline inheritance (since priority is inversely proportional to the
+deadline delta (deadline - now)).
+
+This means the whole PI machinery will have to be reworked - and that is one of
+the most complex pieces of code we have.
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
deleted file mode 100644
index d8fce3e78457..000000000000
--- a/Documentation/scheduler/sched-rt-group.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-				Real-Time group scheduling
-				--------------------------
-
-CONTENTS
-========
-
-0. WARNING
-1. Overview
-  1.1 The problem
-  1.2 The solution
-2. The interface
-  2.1 System-wide settings
-  2.2 Default behaviour
-  2.3 Basis for grouping tasks
-3. Future plans
-
-
-0. WARNING
-==========
-
- Fiddling with these settings can result in an unstable system, the knobs are
- root only and assumes root knows what he is doing.
-
-Most notable:
-
- * very small values in sched_rt_period_us can result in an unstable
-   system when the period is smaller than either the available hrtimer
-   resolution, or the time it takes to handle the budget refresh itself.
-
- * very small values in sched_rt_runtime_us can result in an unstable
-   system when the runtime is so small the system has difficulty making
-   forward progress (NOTE: the migration thread and kstopmachine both
-   are real-time processes).
-
-1. Overview
-===========
-
-
-1.1 The problem
----------------
-
-Realtime scheduling is all about determinism, a group has to be able to rely on
-the amount of bandwidth (eg. CPU time) being constant. In order to schedule
-multiple groups of realtime tasks, each group must be assigned a fixed portion
-of the CPU time available.  Without a minimum guarantee a realtime group can
-obviously fall short. A fuzzy upper limit is of no use since it cannot be
-relied upon. Which leaves us with just the single fixed portion.
-
-1.2 The solution
-----------------
-
-CPU time is divided by means of specifying how much time can be spent running
-in a given period. We allocate this "run time" for each realtime group which
-the other realtime groups will not be permitted to use.
-
-Any time not allocated to a realtime group will be used to run normal priority
-tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
-SCHED_OTHER.
-
-Let's consider an example: a frame fixed realtime renderer must deliver 25
-frames a second, which yields a period of 0.04s per frame. Now say it will also
-have to play some music and respond to input, leaving it with around 80% CPU
-time dedicated for the graphics. We can then give this group a run time of 0.8
-* 0.04s = 0.032s.
-
-This way the graphics group will have a 0.04s period with a 0.032s run time
-limit. Now if the audio thread needs to refill the DMA buffer every 0.005s, but
-needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s =
-0.00015s. So this group can be scheduled with a period of 0.005s and a run time
-of 0.00015s.
-
-The remaining CPU time will be used for user input and other tasks. Because
-realtime tasks have explicitly allocated the CPU time they need to perform
-their tasks, buffer underruns in the graphics or audio can be eliminated.
-
-NOTE: the above example is not fully implemented yet. We still
-lack an EDF scheduler to make non-uniform periods usable.
-
-
-2. The Interface
-================
-
-
-2.1 System wide settings
-------------------------
-
-The system wide settings are configured under the /proc virtual file system:
-
-/proc/sys/kernel/sched_rt_period_us:
-  The scheduling period that is equivalent to 100% CPU bandwidth
-
-/proc/sys/kernel/sched_rt_runtime_us:
-  A global limit on how much time realtime scheduling may use.  Even without
-  CONFIG_RT_GROUP_SCHED enabled, this will limit time reserved to realtime
-  processes. With CONFIG_RT_GROUP_SCHED it signifies the total bandwidth
-  available to all realtime groups.
-
-  * Time is specified in us because the interface is s32. This gives an
-    operating range from 1us to about 35 minutes.
-  * sched_rt_period_us takes values from 1 to INT_MAX.
-  * sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
-  * A run time of -1 specifies runtime == period, ie. no limit.
-
-
-2.2 Default behaviour
----------------------
-
-The default values for sched_rt_period_us (1000000 or 1s) and
-sched_rt_runtime_us (950000 or 0.95s).  This gives 0.05s to be used by
-SCHED_OTHER (non-RT tasks). These defaults were chosen so that a run-away
-realtime tasks will not lock up the machine but leave a little time to recover
-it.  By setting runtime to -1 you'd get the old behaviour back.
-
-By default all bandwidth is assigned to the root group and new groups get the
-period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
-want to assign bandwidth to another group, reduce the root group's bandwidth
-and assign some or all of the difference to another group.
-
-Realtime group scheduling means you have to assign a portion of total CPU
-bandwidth to the group before it will accept realtime tasks. Therefore you will
-not be able to run realtime tasks as any user other than root until you have
-done that, even if the user has the rights to run processes with realtime
-priority!
-
-
-2.3 Basis for grouping tasks
-----------------------------
-
-Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
-CPU bandwidth to task groups.
-
-This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
-to control the CPU time reserved for each control group.
-
-For more information on working with control groups, you should read
-Documentation/cgroup-v1/cgroups.txt as well.
-
-Group settings are checked against the following limits in order to keep the
-configuration schedulable:
-
-   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
-
-For now, this can be simplified to just the following (but see Future plans):
-
-   \Sum_{i} runtime_{i} <= global_runtime
-
-
-3. Future plans
-===============
-
-There is work in progress to make the scheduling period for each group
-("<cgroup>/cpu.rt_period_us") configurable as well.
-
-The constraint on the period is that a subgroup must have a smaller or
-equal period to its parent. But realistically its not very useful _yet_
-as its prone to starvation without deadline scheduling.
-
-Consider two sibling groups A and B; both have 50% bandwidth, but A's
-period is twice the length of B's.
-
-* group A: period=100000us, runtime=50000us
-	- this runs for 0.05s once every 0.1s
-
-* group B: period= 50000us, runtime=25000us
-	- this runs for 0.025s twice every 0.1s (or once every 0.05 sec).
-
-This means that currently a while (1) loop in A will run for the full period of
-B and can starve B's tasks (assuming they are of lower priority) for a whole
-period.
-
-The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
-full deadline scheduling to the linux kernel. Deadline scheduling the above
-groups and treating end of the period as a deadline will ensure that they both
-get their allocated time.
-
-Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
-the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-99. With deadline scheduling you need to
-do deadline inheritance (since priority is inversely proportional to the
-deadline delta (deadline - now)).
-
-This means the whole PI machinery will have to be reworked - and that is one of
-the most complex pieces of code we have.
diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
new file mode 100644
index 000000000000..0cb0aa714545
--- /dev/null
+++ b/Documentation/scheduler/sched-stats.rst
@@ -0,0 +1,167 @@
+====================
+Scheduler Statistics
+====================
+
+Version 15 of schedstats dropped some sched_yield() counters:
+yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
+identical to version 14.
+
+Version 14 of schedstats includes support for sched_domains, which hit the
+mainline kernel in 2.6.20 although it is identical to the stats from version
+12 which was in the kernel from 2.6.13-2.6.19 (version 13 never saw a kernel
+release).  Some counters make more sense to be per-runqueue; others to be
+per-domain.  Note that domains (and their associated information) will only
+be pertinent and available on machines utilizing CONFIG_SMP.
+
+In version 14 of schedstat, there is at least one level of domain
+statistics for each cpu listed, and there may well be more than one
+domain.  Domains have no particular names in this implementation, but
+the highest numbered one typically arbitrates balancing across all the
+cpus on the machine, while domain0 is the most tightly focused domain,
+sometimes balancing only between pairs of cpus.  At this time, there
+are no architectures which need more than three domain levels. The first
+field in the domain stats is a bit map indicating which cpus are affected
+by that domain.
+
+These fields are counters, and only increment.  Programs which make use
+of these will need to start with a baseline observation and then calculate
+the change in the counters at each subsequent observation.  A perl script
+which does this for many of the fields is available at
+
+    http://eaglet.rain.com/rick/linux/schedstat/
+
+Note that any such script will necessarily be version-specific, as the main
+reason to change versions is changes in the output format.  For those wishing
+to write their own scripts, the fields are described here.
+
+CPU statistics
+--------------
+cpu<N> 1 2 3 4 5 6 7 8 9
+
+First field is a sched_yield() statistic:
+
+     1) # of times sched_yield() was called
+
+Next three are schedule() statistics:
+
+     2) This field is a legacy array expiration count field used in the O(1)
+	scheduler. We kept it for ABI compatibility, but it is always set to zero.
+     3) # of times schedule() was called
+     4) # of times schedule() left the processor idle
+
+Next two are try_to_wake_up() statistics:
+
+     5) # of times try_to_wake_up() was called
+     6) # of times try_to_wake_up() was called to wake up the local cpu
+
+Next three are statistics describing scheduling latency:
+
+     7) sum of all time spent running by tasks on this processor (in jiffies)
+     8) sum of all time spent waiting to run by tasks on this processor (in
+        jiffies)
+     9) # of timeslices run on this cpu
+
+
+Domain statistics
+-----------------
+One of these is produced per domain for each cpu described. (Note that if
+CONFIG_SMP is not defined, *no* domains are utilized and these lines
+will not appear in the output.)
+
+domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
+
+The first field is a bit mask indicating what cpus this domain operates over.
+
+The next 24 are a variety of load_balance() statistics grouped into types
+of idleness (idle, busy, and newly idle):
+
+    1)  # of times in this domain load_balance() was called when the
+        cpu was idle
+    2)  # of times in this domain load_balance() checked but found
+        the load did not require balancing when the cpu was idle
+    3)  # of times in this domain load_balance() tried to move one or
+        more tasks and failed, when the cpu was idle
+    4)  sum of imbalances discovered (if any) with each call to
+        load_balance() in this domain when the cpu was idle
+    5)  # of times in this domain pull_task() was called when the cpu
+        was idle
+    6)  # of times in this domain pull_task() was called even though
+        the target task was cache-hot when idle
+    7)  # of times in this domain load_balance() was called but did
+        not find a busier queue while the cpu was idle
+    8)  # of times in this domain a busier queue was found while the
+        cpu was idle but no busier group was found
+    9)  # of times in this domain load_balance() was called when the
+        cpu was busy
+    10) # of times in this domain load_balance() checked but found the
+        load did not require balancing when busy
+    11) # of times in this domain load_balance() tried to move one or
+        more tasks and failed, when the cpu was busy
+    12) sum of imbalances discovered (if any) with each call to
+        load_balance() in this domain when the cpu was busy
+    13) # of times in this domain pull_task() was called when busy
+    14) # of times in this domain pull_task() was called even though the
+        target task was cache-hot when busy
+    15) # of times in this domain load_balance() was called but did not
+        find a busier queue while the cpu was busy
+    16) # of times in this domain a busier queue was found while the cpu
+        was busy but no busier group was found
+
+    17) # of times in this domain load_balance() was called when the
+        cpu was just becoming idle
+    18) # of times in this domain load_balance() checked but found the
+        load did not require balancing when the cpu was just becoming idle
+    19) # of times in this domain load_balance() tried to move one or more
+        tasks and failed, when the cpu was just becoming idle
+    20) sum of imbalances discovered (if any) with each call to
+        load_balance() in this domain when the cpu was just becoming idle
+    21) # of times in this domain pull_task() was called when newly idle
+    22) # of times in this domain pull_task() was called even though the
+        target task was cache-hot when just becoming idle
+    23) # of times in this domain load_balance() was called but did not
+        find a busier queue while the cpu was just becoming idle
+    24) # of times in this domain a busier queue was found while the cpu
+        was just becoming idle but no busier group was found
+
+   Next three are active_load_balance() statistics:
+
+    25) # of times active_load_balance() was called
+    26) # of times active_load_balance() tried to move a task and failed
+    27) # of times active_load_balance() successfully moved a task
+
+   Next three are sched_balance_exec() statistics:
+
+    28) sbe_cnt is not used
+    29) sbe_balanced is not used
+    30) sbe_pushed is not used
+
+   Next three are sched_balance_fork() statistics:
+
+    31) sbf_cnt is not used
+    32) sbf_balanced is not used
+    33) sbf_pushed is not used
+
+   Next three are try_to_wake_up() statistics:
+
+    34) # of times in this domain try_to_wake_up() awoke a task that
+        last ran on a different cpu in this domain
+    35) # of times in this domain try_to_wake_up() moved a task to the
+        waking cpu because it was cache-cold on its own cpu anyway
+    36) # of times in this domain try_to_wake_up() started passive balancing
+
+/proc/<pid>/schedstat
+---------------------
+schedstats also adds a new /proc/<pid>/schedstat file to include some of
+the same information on a per-process level.  There are three fields in
+this file, which for that process correspond to:
+
+     1) time spent on the cpu
+     2) time spent waiting on a runqueue
+     3) # of timeslices run on this cpu
+
+A program could be easily written to make use of these extra fields to
+report on how well a particular process or set of processes is faring
+under the scheduler's policies.  A simple version of such a program is
+available at
+
+    http://eaglet.rain.com/rick/linux/schedstat/v12/latency.c
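
The text above says that the fields only increment, so a consumer has to
take a baseline and compute the change at each later observation. A minimal
sketch of that idea follows, assuming the "cpu<N> 1 2 ... 9" line layout
described above. The function names parse_cpu_lines() and counter_deltas()
are hypothetical, and the sample values used with them are made up.

```python
def parse_cpu_lines(text):
    """Return {"cpuN": [counters...]} for every cpu<N> line in the text.

    Lines that do not start with "cpu" (for example domain<N> lines) are
    skipped by this sketch.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            stats[fields[0]] = [int(value) for value in fields[1:]]
    return stats


def counter_deltas(baseline, current):
    """Return per-cpu differences between two parsed observations.

    A cpu is skipped when it is missing from the baseline or when the
    number of fields differs, which would suggest a different version.
    """
    deltas = {}
    for cpu, now in current.items():
        before = baseline.get(cpu)
        if before is not None and len(before) == len(now):
            deltas[cpu] = [n - b for b, n in zip(before, now)]
    return deltas
```

A caller would read /proc/schedstat twice, some interval apart, pass each
snapshot through parse_cpu_lines(), and hand both results to
counter_deltas(). As noted above, any such script is version-specific.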
diff --git a/Documentation/scheduler/sched-stats.txt b/Documentation/scheduler/sched-stats.txt
deleted file mode 100644
index 8259b34a66ae..000000000000
--- a/Documentation/scheduler/sched-stats.txt
+++ /dev/null
@@ -1,154 +0,0 @@
-Version 15 of schedstats dropped counters for some sched_yield:
-yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
-identical to version 14.
-
-Version 14 of schedstats includes support for sched_domains, which hit the
-mainline kernel in 2.6.20 although it is identical to the stats from version
-12 which was in the kernel from 2.6.13-2.6.19 (version 13 never saw a kernel
-release).  Some counters make more sense to be per-runqueue; other to be
-per-domain.  Note that domains (and their associated information) will only
-be pertinent and available on machines utilizing CONFIG_SMP.
-
-In version 14 of schedstat, there is at least one level of domain
-statistics for each cpu listed, and there may well be more than one
-domain.  Domains have no particular names in this implementation, but
-the highest numbered one typically arbitrates balancing across all the
-cpus on the machine, while domain0 is the most tightly focused domain,
-sometimes balancing only between pairs of cpus.  At this time, there
-are no architectures which need more than three domain levels. The first
-field in the domain stats is a bit map indicating which cpus are affected
-by that domain.
-
-These fields are counters, and only increment.  Programs which make use
-of these will need to start with a baseline observation and then calculate
-the change in the counters at each subsequent observation.  A perl script
-which does this for many of the fields is available at
-
-    http://eaglet.rain.com/rick/linux/schedstat/
-
-Note that any such script will necessarily be version-specific, as the main
-reason to change versions is changes in the output format.  For those wishing
-to write their own scripts, the fields are described here.
-
-CPU statistics
---------------
-cpu<N> 1 2 3 4 5 6 7 8 9
-
-First field is a sched_yield() statistic:
-     1) # of times sched_yield() was called
-
-Next three are schedule() statistics:
-     2) This field is a legacy array expiration count field used in the O(1)
-	scheduler. We kept it for ABI compatibility, but it is always set to zero.
-     3) # of times schedule() was called
-     4) # of times schedule() left the processor idle
-
-Next two are try_to_wake_up() statistics:
-     5) # of times try_to_wake_up() was called
-     6) # of times try_to_wake_up() was called to wake up the local cpu
-
-Next three are statistics describing scheduling latency:
-     7) sum of all time spent running by tasks on this processor (in jiffies)
-     8) sum of all time spent waiting to run by tasks on this processor (in
-        jiffies)
-     9) # of timeslices run on this cpu
-
-
-Domain statistics
------------------
-One of these is produced per domain for each cpu described. (Note that if
-CONFIG_SMP is not defined, *no* domains are utilized and these lines
-will not appear in the output.)
-
-domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
-
-The first field is a bit mask indicating what cpus this domain operates over.
-
-The next 24 are a variety of load_balance() statistics in grouped into types
-of idleness (idle, busy, and newly idle):
-
-     1) # of times in this domain load_balance() was called when the
-        cpu was idle
-     2) # of times in this domain load_balance() checked but found
-        the load did not require balancing when the cpu was idle
-     3) # of times in this domain load_balance() tried to move one or
-        more tasks and failed, when the cpu was idle
-     4) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was idle
-     5) # of times in this domain pull_task() was called when the cpu
-        was idle
-     6) # of times in this domain pull_task() was called even though
-        the target task was cache-hot when idle
-     7) # of times in this domain load_balance() was called but did
-        not find a busier queue while the cpu was idle
-     8) # of times in this domain a busier queue was found while the
-        cpu was idle but no busier group was found
-
-     9) # of times in this domain load_balance() was called when the
-        cpu was busy
-    10) # of times in this domain load_balance() checked but found the
-        load did not require balancing when busy
-    11) # of times in this domain load_balance() tried to move one or
-        more tasks and failed, when the cpu was busy
-    12) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was busy
-    13) # of times in this domain pull_task() was called when busy
-    14) # of times in this domain pull_task() was called even though the
-        target task was cache-hot when busy
-    15) # of times in this domain load_balance() was called but did not
-        find a busier queue while the cpu was busy
-    16) # of times in this domain a busier queue was found while the cpu
-        was busy but no busier group was found
-
-    17) # of times in this domain load_balance() was called when the
-        cpu was just becoming idle
-    18) # of times in this domain load_balance() checked but found the
-        load did not require balancing when the cpu was just becoming idle
-    19) # of times in this domain load_balance() tried to move one or more
-        tasks and failed, when the cpu was just becoming idle
-    20) sum of imbalances discovered (if any) with each call to
-        load_balance() in this domain when the cpu was just becoming idle
-    21) # of times in this domain pull_task() was called when newly idle
-    22) # of times in this domain pull_task() was called even though the
-        target task was cache-hot when just becoming idle
-    23) # of times in this domain load_balance() was called but did not
-        find a busier queue while the cpu was just becoming idle
-    24) # of times in this domain a busier queue was found while the cpu
-        was just becoming idle but no busier group was found
-
-   Next three are active_load_balance() statistics:
-    25) # of times active_load_balance() was called
-    26) # of times active_load_balance() tried to move a task and failed
-    27) # of times active_load_balance() successfully moved a task
-
-   Next three are sched_balance_exec() statistics:
-    28) sbe_cnt is not used
-    29) sbe_balanced is not used
-    30) sbe_pushed is not used
-
-   Next three are sched_balance_fork() statistics:
-    31) sbf_cnt is not used
-    32) sbf_balanced is not used
-    33) sbf_pushed is not used
-
-   Next three are try_to_wake_up() statistics:
-    34) # of times in this domain try_to_wake_up() awoke a task that
-        last ran on a different cpu in this domain
-    35) # of times in this domain try_to_wake_up() moved a task to the
-        waking cpu because it was cache-cold on its own cpu anyway
-    36) # of times in this domain try_to_wake_up() started passive balancing
-
-/proc/<pid>/schedstat
-----------------
-schedstats also adds a new /proc/<pid>/schedstat file to include some of
-the same information on a per-process level.  There are three fields in
-this file correlating for that process to:
-     1) time spent on the cpu
-     2) time spent waiting on a runqueue
-     3) # of timeslices run on this cpu
-
-A program could be easily written to make use of these extra fields to
-report on how well a particular process or set of processes is faring
-under the scheduler's policies.  A simple version of such a program is
-available at
-    http://eaglet.rain.com/rick/linux/schedstat/v12/latency.c
diff --git a/Documentation/scheduler/text_files.rst b/Documentation/scheduler/text_files.rst
new file mode 100644
index 000000000000..0bc50307b241
--- /dev/null
+++ b/Documentation/scheduler/text_files.rst
@@ -0,0 +1,5 @@
+Scheduler pelt c program
+------------------------
+
+.. literalinclude:: sched-pelt.c
+    :language: c
diff --git a/Documentation/vm/numa.rst b/Documentation/vm/numa.rst
index 5cae13e9a08b..461d5d57cd4f 100644
--- a/Documentation/vm/numa.rst
+++ b/Documentation/vm/numa.rst
@@ -99,7 +99,7 @@ Local allocation will tend to keep subsequent access to the allocated memory
 as long as the task on whose behalf the kernel allocated some memory does not
 later migrate away from that memory.  The Linux scheduler is aware of the
 NUMA topology of the platform--embodied in the "scheduling domains" data
-structures [see Documentation/scheduler/sched-domains.txt]--and the scheduler
+structures [see Documentation/scheduler/sched-domains.rst]--and the scheduler
 attempts to minimize task migration to distant scheduling domains.  However,
 the scheduler does not take a task's NUMA footprint into account directly.
 Thus, under sufficient imbalance, tasks can migrate between nodes, remote
diff --git a/init/Kconfig b/init/Kconfig
index 0e2344389501..640132853499 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -734,7 +734,7 @@ menuconfig CGROUPS
 	  use with process control subsystems such as Cpusets, CFS, memory
 	  controls or device isolation.
 	  See
-		- Documentation/scheduler/sched-design-CFS.txt	(CFS)
+		- Documentation/scheduler/sched-design-CFS.rst	(CFS)
 		- Documentation/cgroup-v1/ (features for grouping, isolation
 					  and resource control)
 
@@ -835,7 +835,7 @@ config CFS_BANDWIDTH
 	  tasks running within the fair group scheduler.  Groups with no limit
 	  set are considered to be unconstrained and will run with no
 	  restriction.
-	  See Documentation/scheduler/sched-bwc.txt for more information.
+	  See Documentation/scheduler/sched-bwc.rst for more information.
 
 config RT_GROUP_SCHED
 	bool "Group scheduling for SCHED_RR/FIFO"
@@ -846,7 +846,7 @@ config RT_GROUP_SCHED
 	  to task groups. If enabled, it will also make it impossible to
 	  schedule realtime tasks for non-root users until you allocate
 	  realtime bandwidth for them.
-	  See Documentation/scheduler/sched-rt-group.txt for more information.
+	  See Documentation/scheduler/sched-rt-group.rst for more information.
 
 endif #CGROUP_SCHED
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 43901fa3f269..049d795ee9d3 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -726,7 +726,7 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
  * refill the runtime and set the deadline a period in the future,
  * because keeping the current (absolute) deadline of the task would
  * result in breaking guarantees promised to other tasks (refer to
- * Documentation/scheduler/sched-deadline.txt for more information).
+ * Documentation/scheduler/sched-deadline.rst for more information).
  *
  * This function returns true if:
  *
-- 
cgit v1.2.3-59-g8ed1b


From a2f405a5269fc7d705926298971dcb5e76054e8a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 12 Jun 2019 14:53:04 -0300
Subject: docs: EDID/HOWTO.txt: convert it and rename to howto.rst

Sphinx needs to know where a paragraph ends, so adjust the file so
that it is parsed properly.

Add an :orphan: tag to the new howto.rst, since it is not yet linked from
the main index.rst, in order to avoid build warnings.

That said, I believe this file should be moved to the
GPU/DRM documentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/EDID/HOWTO.txt                    | 49 ---------------------
 Documentation/EDID/howto.rst                    | 58 +++++++++++++++++++++++++
 Documentation/admin-guide/kernel-parameters.txt |  2 +-
 drivers/gpu/drm/Kconfig                         |  2 +-
 4 files changed, 60 insertions(+), 51 deletions(-)
 delete mode 100644 Documentation/EDID/HOWTO.txt
 create mode 100644 Documentation/EDID/howto.rst
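
The howto converted below describes how the EDID sources express timings
differently from an X11 modeline. A small sketch of that mapping, taken
directly from the #define lines in the howto, may help when preparing a new
.S file. The function name x11_to_edid() is hypothetical, and returning a
dictionary is only an illustration.

```python
def x11_to_edid(hdisp, hsyncstart, hsyncend, htotal,
                vdisp, vsyncstart, vsyncend, vtotal):
    """Map X11 HTimings/VTimings to the EDID-style values in the howto."""
    return {
        "XPIX": hdisp,
        "XBLANK": htotal - hdisp,
        "XOFFSET": hsyncstart - hdisp,
        "XPULSE": hsyncend - hsyncstart,
        "YPIX": vdisp,
        "YBLANK": vtotal - vdisp,
        "YOFFSET": vsyncstart - vdisp,
        "YPULSE": vsyncend - vsyncstart,
    }
```

For example, an X11 mode with HTimings 1024 1048 1184 1344 and VTimings
768 771 777 806 gives XBLANK 320, XOFFSET 24, XPULSE 136, YBLANK 38,
YOFFSET 3 and YPULSE 6.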

diff --git a/Documentation/EDID/HOWTO.txt b/Documentation/EDID/HOWTO.txt
deleted file mode 100644
index 539871c3b785..000000000000
--- a/Documentation/EDID/HOWTO.txt
+++ /dev/null
@@ -1,49 +0,0 @@
-In the good old days when graphics parameters were configured explicitly
-in a file called xorg.conf, even broken hardware could be managed.
-
-Today, with the advent of Kernel Mode Setting, a graphics board is
-either correctly working because all components follow the standards -
-or the computer is unusable, because the screen remains dark after
-booting or it displays the wrong area. Cases when this happens are:
-- The graphics board does not recognize the monitor.
-- The graphics board is unable to detect any EDID data.
-- The graphics board incorrectly forwards EDID data to the driver.
-- The monitor sends no or bogus EDID data.
-- A KVM sends its own EDID data instead of querying the connected monitor.
-Adding the kernel parameter "nomodeset" helps in most cases, but causes
-restrictions later on.
-
-As a remedy for such situations, the kernel configuration item
-CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows to provide an
-individually prepared or corrected EDID data set in the /lib/firmware
-directory from where it is loaded via the firmware interface. The code
-(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for
-commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200,
-1680x1050, 1920x1080) as binary blobs, but the kernel source tree does
-not contain code to create these data. In order to elucidate the origin
-of the built-in binary EDID blobs and to facilitate the creation of
-individual data for a specific misbehaving monitor, commented sources
-and a Makefile environment are given here.
-
-To create binary EDID and C source code files from the existing data
-material, simply type "make".
-
-If you want to create your own EDID file, copy the file 1024x768.S,
-replace the settings with your own data and add a new target to the
-Makefile. Please note that the EDID data structure expects the timing
-values in a different way as compared to the standard X11 format.
-
-X11:
-HTimings:  hdisp hsyncstart hsyncend htotal
-VTimings:  vdisp vsyncstart vsyncend vtotal
-
-EDID:
-#define XPIX hdisp
-#define XBLANK htotal-hdisp
-#define XOFFSET hsyncstart-hdisp
-#define XPULSE hsyncend-hsyncstart
-
-#define YPIX vdisp
-#define YBLANK vtotal-vdisp
-#define YOFFSET vsyncstart-vdisp
-#define YPULSE vsyncend-vsyncstart
diff --git a/Documentation/EDID/howto.rst b/Documentation/EDID/howto.rst
new file mode 100644
index 000000000000..725fd49a88ca
--- /dev/null
+++ b/Documentation/EDID/howto.rst
@@ -0,0 +1,58 @@
+:orphan:
+
+====
+EDID
+====
+
+In the good old days when graphics parameters were configured explicitly
+in a file called xorg.conf, even broken hardware could be managed.
+
+Today, with the advent of Kernel Mode Setting, a graphics board is
+either correctly working because all components follow the standards -
+or the computer is unusable, because the screen remains dark after
+booting or it displays the wrong area. Cases when this happens are:
+- The graphics board does not recognize the monitor.
+- The graphics board is unable to detect any EDID data.
+- The graphics board incorrectly forwards EDID data to the driver.
+- The monitor sends no or bogus EDID data.
+- A KVM sends its own EDID data instead of querying the connected monitor.
+Adding the kernel parameter "nomodeset" helps in most cases, but causes
+restrictions later on.
+
+As a remedy for such situations, the kernel configuration item
+CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows providing an
+individually prepared or corrected EDID data set in the /lib/firmware
+directory from where it is loaded via the firmware interface. The code
+(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for
+commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200,
+1680x1050, 1920x1080) as binary blobs, but the kernel source tree does
+not contain code to create these data. In order to elucidate the origin
+of the built-in binary EDID blobs and to facilitate the creation of
+individual data for a specific misbehaving monitor, commented sources
+and a Makefile environment are given here.
+
+To create binary EDID and C source code files from the existing data
+material, simply type "make".
+
+If you want to create your own EDID file, copy the file 1024x768.S,
+replace the settings with your own data and add a new target to the
+Makefile. Please note that the EDID data structure expects the timing
+values in a different way as compared to the standard X11 format.
+
+X11:
+  HTimings:
+    hdisp hsyncstart hsyncend htotal
+  VTimings:
+    vdisp vsyncstart vsyncend vtotal
+
+EDID::
+
+  #define XPIX hdisp
+  #define XBLANK htotal-hdisp
+  #define XOFFSET hsyncstart-hdisp
+  #define XPULSE hsyncend-hsyncstart
+
+  #define YPIX vdisp
+  #define YBLANK vtotal-vdisp
+  #define YOFFSET vsyncstart-vdisp
+  #define YPULSE vsyncend-vsyncstart
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9ac37fcca3ee..4edf67801420 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -932,7 +932,7 @@
 			edid/1680x1050.bin, or edid/1920x1080.bin is given
 			and no file with the same name exists. Details and
 			instructions how to build your own EDID data are
-			available in Documentation/EDID/HOWTO.txt. An EDID
+			available in Documentation/EDID/howto.rst. An EDID
 			data set will only be used for a particular connector,
 			if its name and a colon are prepended to the EDID
 			name. Each connector may use a unique EDID data
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 36f900d63979..e20e2956f620 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -141,7 +141,7 @@ config DRM_LOAD_EDID_FIRMWARE
 	  monitor are unable to provide appropriate EDID data. Since this
 	  feature is provided as a workaround for broken hardware, the
 	  default case is N. Details and instructions how to build your own
-	  EDID data are given in Documentation/EDID/HOWTO.txt.
+	  EDID data are given in Documentation/EDID/howto.rst.
 
 config DRM_DP_CEC
 	bool "Enable DisplayPort CEC-Tunneling-over-AUX HDMI support"
-- 
cgit v1.2.3-59-g8ed1b


From 407b584d155be67a9311a62da8d7874f62b987ac Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Thu, 13 Jun 2019 07:29:17 -0300
Subject: scripts/documentation-file-ref-check: ignore output dir

When there's no Documentation/output directory, the script will
complain about those missing references:

	Documentation/doc-guide/sphinx.rst: Documentation/output
	Documentation/doc-guide/sphinx.rst: Documentation/output
	Documentation/process/howto.rst: Documentation/output
	Documentation/translations/it_IT/doc-guide/sphinx.rst: Documentation/output
	Documentation/translations/it_IT/doc-guide/sphinx.rst: Documentation/output
	Documentation/translations/it_IT/process/howto.rst: Documentation/output
	Documentation/translations/ja_JP/howto.rst: Documentation/output
	Documentation/translations/ko_KR/howto.rst: Documentation/output

Those are false positives, so add an ignore rule for them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/documentation-file-ref-check | 3 +++
 1 file changed, 3 insertions(+)
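
The rule added below is a plain regular-expression test on each input
line. As a rough illustration only, the same pattern can be tried out in
Python. The script itself is Perl, and the names OUTPUT_DIR_RE and
should_skip() are hypothetical.

```python
import re

# Same pattern as the Perl rule: m,\b(\S*)Documentation/output,
OUTPUT_DIR_RE = re.compile(r"\b(\S*)Documentation/output")


def should_skip(line):
    """Return True when the line refers to the documentation build dir."""
    return OUTPUT_DIR_RE.search(line) is not None
```

Because the pattern is not anchored at the end, it appears that any path
beginning with Documentation/output would also be skipped, not only the
directory itself.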

diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check
index a4139a576726..7784c54aa38b 100755
--- a/scripts/documentation-file-ref-check
+++ b/scripts/documentation-file-ref-check
@@ -90,6 +90,9 @@ while (<IN>) {
 	# Skip this script
 	next if ($f eq $scriptname);
 
+	# Ignore the dir where documentation will be built
+	next if ($ln =~ m,\b(\S*)Documentation/output,);
+
 	if ($ln =~ m,\b(\S*)(Documentation/[A-Za-z0-9\_\.\,\~/\*\[\]\?+-]*)(.*),) {
 		my $prefix = $1;
 		my $ref = $2;
-- 
cgit v1.2.3-59-g8ed1b


From 83e8b971f81cebe4f9a84cc76d328ac955b62a7a Mon Sep 17 00:00:00 2001
From: André Almeida <andrealmeid@collabora.com>
Date: Tue, 11 Jun 2019 17:03:16 -0300
Subject: sphinx.rst: Add note about code snippets embedded in the text
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

There's a paragraph that explains how to create a fixed-width text block,
but it doesn't explain how to create fixed-width text inline, although
this feature is widely used throughout the documentation. Fix that by
adding a quick note about it.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/doc-guide/sphinx.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst
index 4ba081f43e98..e60a56640c63 100644
--- a/Documentation/doc-guide/sphinx.rst
+++ b/Documentation/doc-guide/sphinx.rst
@@ -217,7 +217,7 @@ Here are some specific guidelines for the kernel documentation:
   examples, etc.), use ``::`` for anything that doesn't really benefit
   from syntax highlighting, especially short snippets. Use
   ``.. code-block:: <language>`` for longer code blocks that benefit
-  from highlighting.
+  from highlighting. For a short snippet of code embedded in the text, use \`\`.
 
 
 the C domain
-- 
cgit v1.2.3-59-g8ed1b


From cd84d63a2983ee2d386ff5a020c2c36562e4ef68 Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose@arm.com>
Date: Mon, 10 Jun 2019 19:02:42 +0100
Subject: Documentation: coresight: Update the generic device names

Update the documentation to reflect the new naming scheme with
latest changes.

Reported-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/trace/coresight.txt | 82 ++++++++++++++++++++++++++++++++-------
 1 file changed, 67 insertions(+), 15 deletions(-)
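
The two naming rules documented below can be modelled in a few lines. This
is only a sketch of the rules as stated in the text, not the kernel's
implementation. The function name assign_names() and the (prefix, cpu)
input format are assumptions made for the example.

```python
from collections import defaultdict


def assign_names(devices):
    """Name devices given as (prefix, cpu) tuples in probe order.

    cpu is the logical CPU number for CPU-bound devices, or None for all
    other devices, which get a per-prefix sequential number instead.
    """
    next_index = defaultdict(int)
    names = []
    for prefix, cpu in devices:
        if cpu is not None:
            # Rule 1: CPU-bound devices use the CPU logical number.
            names.append(f"{prefix}{cpu}")
        else:
            # Rule 2: other devices are numbered in order of probing.
            names.append(f"{prefix}{next_index[prefix]}")
            next_index[prefix] += 1
    return names
```

With the input [("etm", 0), ("funnel", None), ("tmc_etf", None),
("etm", 1), ("funnel", None)] this yields etm0, funnel0, tmc_etf0, etm1
and funnel1, matching the style of names in the listing below.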

diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index efbc832146e7..b027d61b27a6 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -188,6 +188,49 @@ specific to that component only.  "Implementation defined" customisations are
 expected to be accessed and controlled using those entries.
 
 
+Device Naming scheme
+------------------------
+The devices that appear on the "coresight" bus were named the same as their
+parent devices, i.e., the real devices that appear on the AMBA bus or the platform bus.
+Thus the names were based on the Linux Open Firmware layer naming convention,
+which uses the base physical address of the device followed by the device
+type, e.g.:
+
+root:~# ls /sys/bus/coresight/devices/
+ 20010000.etf  20040000.funnel      20100000.stm     22040000.etm
+ 22140000.etm  230c0000.funnel      23240000.etm     20030000.tpiu
+ 20070000.etr  20120000.replicator  220c0000.funnel
+ 23040000.etm  23140000.etm         23340000.etm
+
+However, with the introduction of ACPI support, the names of the real
+devices are a bit cryptic and non-obvious. Thus, a new naming scheme was
+introduced to use more generic names based on the type of the device. The
+following rules apply:
+
+  1) Devices that are bound to CPUs are named based on the CPU logical
+     number.
+
+     e.g, ETM bound to CPU0 is named "etm0"
+
+  2) All other devices follow a pattern, "<device_type_prefix>N", where :
+
+	<device_type_prefix> 	- A prefix specific to the type of the device
+	N			- a sequential number assigned based on the order
+				  of probing.
+
+	e.g, tmc_etf0, tmc_etr0, funnel0, funnel1
+
+Thus, with the new scheme the devices could appear as :
+
+root:~# ls /sys/bus/coresight/devices/
+ etm0     etm1     etm2         etm3  etm4      etm5      funnel0
+ funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0
+
+Some of the examples below refer to the old naming scheme and some
+to the newer one, to confirm that what you see on your
+system is not unexpected. Use the names as they appear on
+your system under the specified locations.
+
 How to use the tracer modules
 -----------------------------
 
@@ -326,16 +369,25 @@ amount of processor cores), the "cs_etm" PMU will be listed only once.
 A Coresight PMU works the same way as any other PMU, i.e the name of the PMU is
 listed along with configuration options within forward slashes '/'.  Since a
 Coresight system will typically have more than one sink, the name of the sink to
-work with needs to be specified as an event option.  Names for sink to choose
-from are listed in sysFS under ($SYSFS)/bus/coresight/devices:
+work with needs to be specified as an event option.
+On newer kernels the available sinks are listed in sysFS under:
+($SYSFS)/bus/event_source/devices/cs_etm/sinks/
+
+	root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
+	tmc_etf0  tmc_etr0  tpiu0
+
+On older kernels, this may need to be found from the list of coresight devices,
+available under ($SYSFS)/bus/coresight/devices/:
+
+	root:~# ls /sys/bus/coresight/devices/
+	 etm0     etm1     etm2         etm3  etm4      etm5      funnel0
+	 funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0
 
-	root@linaro-nano:~# ls /sys/bus/coresight/devices/
-		20010000.etf   20040000.funnel  20100000.stm  22040000.etm
-		22140000.etm  230c0000.funnel  23240000.etm 20030000.tpiu
-		20070000.etr     20120000.replicator  220c0000.funnel
-		23040000.etm  23140000.etm     23340000.etm
+	root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program
 
-	root@linaro-nano:~# perf record -e cs_etm/@20070000.etr/u --per-thread program
+As mentioned above in section "Device Naming scheme", the names of the devices could
+look different from what is used in the example above. One must use the device names
+as they appear under sysFS.
 
 The syntax within the forward slashes '/' is important.  The '@' character
 tells the parser that a sink is about to be specified and that this is the sink
@@ -352,7 +404,7 @@ perf can be used to record and analyze trace of programs.
 Execution can be recorded using 'perf record' with the cs_etm event,
 specifying the name of the sink to record to, e.g:
 
-    perf record -e cs_etm/@20070000.etr/u --per-thread
+    perf record -e cs_etm/@tmc_etr0/u --per-thread
 
 The 'perf report' and 'perf script' commands can be used to analyze execution,
 synthesizing instruction and branch events from the instruction trace.
@@ -381,7 +433,7 @@ sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tuto
 	Bubble sorting array of 30000 elements
 	5910 ms
 
-	$ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
+	$ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
 	Bubble sorting array of 30000 elements
 	12543 ms
 	[ perf record: Woken up 35 times to write data ]
@@ -405,7 +457,7 @@ than the program flow through the code.
 As with any other CoreSight component, specifics about the STM tracer can be
 found in sysfs with more information on each entry being found in [1]:
 
-root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
+root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0
 enable_source   hwevent_select  port_enable     subsystem       uevent
 hwevent_enable  mgmt            port_select     traceid
 root@genericarmv8:~#
@@ -413,14 +465,14 @@ root@genericarmv8:~#
 Like any other source a sink needs to be identified and the STM enabled before
 being used:
 
-root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
-root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source
+root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
+root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source
 
 From there user space applications can request and use channels using the devfs
 interface provided for that purpose by the generic STM API:
 
-root@genericarmv8:~# ls -l /dev/20100000.stm
-crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/20100000.stm
+root@genericarmv8:~# ls -l /dev/stm0
+crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/stm0
 root@genericarmv8:~#
 
 Details on how to use the generic STM API can be found here [2].
-- 
cgit v1.2.3-59-g8ed1b


From 31753202325dd4f9bba27fbbad05f189881c60bc Mon Sep 17 00:00:00 2001
From: Bhupesh Sharma <bhsharma@redhat.com>
Date: Mon, 10 Jun 2019 15:33:39 +0530
Subject: Documentation/stackprotector: powerpc supports stack protector

The powerpc architecture (both 64-bit and 32-bit) has supported the stack
protector mechanism for some time now [see commit 06ec27aea9fc
("powerpc/64: add stack protector support")].

Update the stackprotector arch-support documentation to reflect this.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/features/debug/stackprotector/arch-support.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/features/debug/stackprotector/arch-support.txt b/Documentation/features/debug/stackprotector/arch-support.txt
index 9999ea521f3e..32bbdfc64c32 100644
--- a/Documentation/features/debug/stackprotector/arch-support.txt
+++ b/Documentation/features/debug/stackprotector/arch-support.txt
@@ -22,7 +22,7 @@
     |       nios2: | TODO |
     |    openrisc: | TODO |
     |      parisc: | TODO |
-    |     powerpc: | TODO |
+    |     powerpc: |  ok  |
     |       riscv: | TODO |
     |        s390: | TODO |
     |          sh: |  ok  |
-- 
cgit v1.2.3-59-g8ed1b


From 9d9b889540c380d8f56f7a79edbaae2fff8684d1 Mon Sep 17 00:00:00 2001
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Sun, 9 Jun 2019 14:14:36 +0300
Subject: block: document iostat changes for disk busy time accounting

Since commit 5b18b5a73760 ("block: delete part_round_stats and switch to
less precise counting") io_ticks is approximated by adding one at each
start and end of requests if jiffies has changed.

This works perfectly for requests shorter than a jiffy. If a request runs
for more than 2 jiffies, some I/O time will not be accounted for unless
there are other requests.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/iostats.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/iostats.txt b/Documentation/iostats.txt
index 49df45f90e8a..5d63b18bd6d1 100644
--- a/Documentation/iostats.txt
+++ b/Documentation/iostats.txt
@@ -97,6 +97,10 @@ Field  9 -- # of I/Os currently in progress
 Field 10 -- # of milliseconds spent doing I/Os
     This field increases so long as field 9 is nonzero.
 
+    Since 5.0 this field counts jiffies when at least one request was
+    started or completed. If a request runs for more than 2 jiffies, some
+    I/O time will not be accounted for unless there are other requests.
+
 Field 11 -- weighted # of milliseconds spent doing I/Os
     This field is incremented at each I/O start, I/O completion, I/O
     merge, or read of these stats by the number of I/Os in progress
-- 
cgit v1.2.3-59-g8ed1b


From d95ea1a4e1fb7c7ec44969afaf0d983f8170ebef Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Tue, 28 May 2019 17:12:51 -0600
Subject: docs: Add a document on repository management

Every merge window seems to involve at least one episode where subsystem
maintainers don't manage their trees as Linus would like.  Document the
expectations so that at least he has something to point people to.

Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/maintainer/index.rst                |   1 +
 Documentation/maintainer/rebasing-and-merging.rst | 226 ++++++++++++++++++++++
 2 files changed, 227 insertions(+)
 create mode 100644 Documentation/maintainer/rebasing-and-merging.rst

diff --git a/Documentation/maintainer/index.rst b/Documentation/maintainer/index.rst
index 2a14916930cb..56e2c09dfa39 100644
--- a/Documentation/maintainer/index.rst
+++ b/Documentation/maintainer/index.rst
@@ -10,5 +10,6 @@ additions to this manual.
    :maxdepth: 2
 
    configure-git
+   rebasing-and-merging
    pull-requests
 
diff --git a/Documentation/maintainer/rebasing-and-merging.rst b/Documentation/maintainer/rebasing-and-merging.rst
new file mode 100644
index 000000000000..09f988e7fa71
--- /dev/null
+++ b/Documentation/maintainer/rebasing-and-merging.rst
@@ -0,0 +1,226 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Rebasing and merging
+====================
+
+Maintaining a subsystem, as a general rule, requires a familiarity with the
+Git source-code management system.  Git is a powerful tool with a lot of
+features; as is often the case with such tools, there are right and wrong
+ways to use those features.  This document looks in particular at the use
+of rebasing and merging.  Maintainers often get in trouble when they use
+those tools incorrectly, but avoiding problems is not actually all that
+hard.
+
+One thing to be aware of in general is that, unlike many other projects,
+the kernel community is not scared by seeing merge commits in its
+development history.  Indeed, given the scale of the project, avoiding
+merges would be nearly impossible.  Some problems encountered by
+maintainers result from a desire to avoid merges, while others come from
+merging a little too often.
+
+Rebasing
+========
+
+"Rebasing" is the process of changing the history of a series of commits
+within a repository.  There are two different types of operations that are
+referred to as rebasing since both are done with the ``git rebase``
+command, but there are significant differences between them:
+
+ - Changing the parent (starting) commit upon which a series of patches is
+   built.  For example, a rebase operation could take a patch set built on
+   the previous kernel release and base it, instead, on the current
+   release.  We'll call this operation "reparenting" in the discussion
+   below.
+
+ - Changing the history of a set of patches by fixing (or deleting) broken
+   commits, adding patches, adding tags to commit changelogs, or changing
+   the order in which commits are applied.  In the following text, this
+   type of operation will be referred to as "history modification".
+
+The term "rebasing" will be used to refer to both of the above operations.
+Used properly, rebasing can yield a cleaner and clearer development
+history; used improperly, it can obscure that history and introduce bugs.
+
+There are a few rules of thumb that can help developers to avoid the worst
+perils of rebasing:
+
+ - History that has been exposed to the world beyond your private system
+   should usually not be changed.  Others may have pulled a copy of your
+   tree and built on it; modifying your tree will create pain for them.  If
+   work is in need of rebasing, that is usually a sign that it is not yet
+   ready to be committed to a public repository.
+
+   That said, there are always exceptions.  Some trees (linux-next being
+   a significant example) are frequently rebased by their nature, and
+   developers know not to base work on them.  Developers will sometimes
+   expose an unstable branch for others to test with or for automated
+   testing services.  If you do expose a branch that may be unstable in
+   this way, be sure that prospective users know not to base work on it.
+
+ - Do not rebase a branch that contains history created by others.  If you
+   have pulled changes from another developer's repository, you are now a
+   custodian of their history.  You should not change it.  With few
+   exceptions, for example, a broken commit in a tree like this should be
+   explicitly reverted rather than disappeared via history modification.
+
+ - Do not reparent a tree without a good reason to do so.  Just being on a
+   newer base or avoiding a merge with an upstream repository is not
+   generally a good reason.
+
+ - If you must reparent a repository, do not pick some random kernel commit
+   as the new base.  The kernel is often in a relatively unstable state
+   between release points; basing development on one of those points
+   increases the chances of running into surprising bugs.  When a patch
+   series must move to a new base, pick a stable point (such as one of
+   the -rc releases) to move to.
+
+ - Realize that reparenting a patch series (or making significant history
+   modifications) changes the environment in which it was developed and,
+   likely, invalidates much of the testing that was done.  A reparented
+   patch series should, as a general rule, be treated like new code and
+   retested from the beginning.
+
+A frequent cause of merge-window trouble is when Linus is presented with a
+patch series that has clearly been reparented, often to a random commit,
+shortly before the pull request was sent.  The chances of such a series
+having been adequately tested are relatively low - as are the chances of
+the pull request being acted upon.
+
+If, instead, rebasing is limited to private trees, commits are based on a
+well-known starting point, and they are well tested, the potential for
+trouble is low.
+
+Merging
+=======
+
+Merging is a common operation in the kernel development process; the 5.1
+development cycle included 1,126 merge commits - nearly 9% of the total.
+Kernel work is accumulated in over 100 different subsystem trees, each of
+which may contain multiple topic branches; each branch is usually developed
+independently of the others.  So naturally, at least one merge will be
+required before any given branch finds its way into an upstream repository.
+
+Many projects require that branches in pull requests be based on the
+current trunk so that no merge commits appear in the history.  The kernel
+is not such a project; any rebasing of branches to avoid merges will, most
+likely, lead to trouble.
+
+Subsystem maintainers find themselves having to do two types of merges:
+from lower-level subsystem trees and from others, either sibling trees or
+the mainline.  The best practices to follow differ in those two situations.
+
+Merging from lower-level trees
+------------------------------
+
+Larger subsystems tend to have multiple levels of maintainers, with the
+lower-level maintainers sending pull requests to the higher levels.  Acting
+on such a pull request will almost certainly generate a merge commit; that
+is as it should be.  In fact, subsystem maintainers may want to use
+the ``--no-ff`` flag to force the addition of a merge commit in the rare cases
+where one would not normally be created so that the reasons for the merge
+can be recorded.  The changelog for the merge should, for any kind of
+merge, say *why* the merge is being done.  For a lower-level tree, "why" is
+usually a summary of the changes that will come with that pull.
+
+Maintainers at all levels should be using signed tags on their pull
+requests, and upstream maintainers should verify the tags when pulling
+branches.  Failure to do so threatens the security of the development
+process as a whole.
+
+As per the rules outlined above, once you have merged somebody else's
+history into your tree, you cannot rebase that branch, even if you
+otherwise would be able to.
+
+Merging from sibling or upstream trees
+--------------------------------------
+
+While merges from downstream are common and unremarkable, merges from other
+trees tend to be a red flag when it comes time to push a branch upstream.
+Such merges need to be carefully thought about and well justified, or
+there's a good chance that a subsequent pull request will be rejected.
+
+It is natural to want to merge the master branch into a repository; this
+type of merge is often called a "back merge".  Back merges can help to make
+sure that there are no conflicts with parallel development and generally
+give a warm, fuzzy feeling of being up-to-date.  But this temptation
+should be avoided almost all of the time.
+
+Why is that?  Back merges will muddy the development history of your own
+branch.  They will significantly increase your chances of encountering bugs
+from elsewhere in the community and make it hard to ensure that the work
+you are managing is stable and ready for upstream.  Frequent merges can
+also obscure problems with the development process in your tree; they can
+hide interactions with other trees that should not be happening (often) in
+a well-managed branch.
+
+That said, back merges are occasionally required; when that happens, be
+sure to document *why* it was required in the commit message.  As always,
+merge to a well-known stable point, rather than to some random commit.
+Even then, you should not back merge a tree above your immediate upstream
+tree; if a higher-level back merge is really required, the upstream tree
+should do it first.
+
+One of the most frequent causes of merge-related trouble is when a
+maintainer merges with the upstream in order to resolve merge conflicts
+before sending a pull request.  Again, this temptation is easy enough to
+understand, but it should absolutely be avoided.  This is especially true
+for the final pull request: Linus is adamant that he would much rather see
+merge conflicts than unnecessary back merges.  Seeing the conflicts lets
+him know where potential problem areas are.  He does a lot of merges (382
+in the 5.1 development cycle) and has gotten quite good at conflict
+resolution - often better than the developers involved.
+
+So what should a maintainer do when there is a conflict between their
+subsystem branch and the mainline?  The most important step is to warn
+Linus in the pull request that the conflict will happen; if nothing else,
+that demonstrates an awareness of how your branch fits into the whole.  For
+especially difficult conflicts, create and push a *separate* branch to show
+how you would resolve things.  Mention that branch in your pull request,
+but the pull request itself should be for the unmerged branch.
+
+Even in the absence of known conflicts, doing a test merge before sending a
+pull request is a good idea.  It may alert you to problems that you somehow
+didn't see from linux-next and helps to understand exactly what you are
+asking upstream to do.
+
+Another reason for doing merges of upstream or another subsystem tree is to
+resolve dependencies.  These dependency issues do happen at times, and
+sometimes a cross-merge with another tree is the best way to resolve them;
+as always, in such situations, the merge commit should explain why the
+merge has been done.  Take a moment to do it right; people will read those
+changelogs.
+
+Often, though, dependency issues indicate that a change of approach is
+needed.  Merging another subsystem tree to resolve a dependency risks
+bringing in other bugs and should almost never be done.  If that subsystem
+tree fails to be pulled upstream, whatever problems it had will block the
+merging of your tree as well.  Preferable alternatives include agreeing
+with the maintainer to carry both sets of changes in one of the trees or
+creating a topic branch dedicated to the prerequisite commits that can be
+merged into both trees.  If the dependency is related to major
+infrastructural changes, the right solution might be to hold the dependent
+commits for one development cycle so that those changes have time to
+stabilize in the mainline.
+
+Finally
+=======
+
+It is relatively common to merge with the mainline toward the beginning of
+the development cycle in order to pick up changes and fixes done elsewhere
+in the tree.  As always, such a merge should pick a well-known release
+point rather than some random spot.  If your upstream-bound branch has
+emptied entirely into the mainline during the merge window, you can pull it
+forward with a command like::
+
+  git merge v5.2-rc1^0
+
+The "^0" will cause Git to do a fast-forward merge (which should be
+possible in this situation), thus avoiding the addition of a spurious merge
+commit.
+
+The guidelines laid out above are just that: guidelines.  There will always
+be situations that call out for a different solution, and these guidelines
+should not prevent developers from doing the right thing when the need
+arises.  But one should always think about whether the need has truly
+arisen and be prepared to explain why something abnormal needs to be done.
-- 
cgit v1.2.3-59-g8ed1b


From 0ad6be30baa3a3bc69349327d62ec4c188db3364 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai@suse.de>
Date: Wed, 19 Jun 2019 07:39:43 +0200
Subject: docs: fb: Add TER16x32 to the available font names

The new font was added recently.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/fb/fbcon.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/fb/fbcon.rst b/Documentation/fb/fbcon.rst
index cfb9f7c38f18..1da65b9000de 100644
--- a/Documentation/fb/fbcon.rst
+++ b/Documentation/fb/fbcon.rst
@@ -82,7 +82,7 @@ C. Boot options
 
 	Select the initial font to use. The value 'name' can be any of the
 	compiled-in fonts: 10x18, 6x10, 7x14, Acorn8x8, MINI4x6,
-	PEARL8x8, ProFont6x11, SUN12x22, SUN8x16, VGA8x16, VGA8x8.
+	PEARL8x8, ProFont6x11, SUN12x22, SUN8x16, TER16x32, VGA8x16, VGA8x8.
 
 	Note, not all drivers can handle font with widths not divisible by 8,
 	such as vga16fb.
-- 
cgit v1.2.3-59-g8ed1b


From 7e6294cdc0b055225f45e66003b33175e06154aa Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Tue, 18 Jun 2019 15:51:18 -0300
Subject: docs: trace: add a missing blank line

Sphinx expects a blank line after a literal block markup.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/trace/kprobetrace.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index baa3c42ba2f4..7d2b0178d3f3 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -191,6 +191,7 @@ events, you need to enable it.
 
 Use the following command to start tracing in an interval.
 ::
+
     # echo 1 > tracing_on
     Open something...
     # echo 0 > tracing_on
-- 
cgit v1.2.3-59-g8ed1b


From 4ae5b8f2140d18788979153370c83cf925092b5c Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Tue, 18 Jun 2019 15:51:19 -0300
Subject: lib: list_sort.c: add a blank line to avoid kernel-doc warnings

In order for a list to be recognized as such, blank lines
are required.

This fixes the following Sphinx warnings:

./lib/list_sort.c:162: WARNING: Unexpected indentation.
./lib/list_sort.c:163: WARNING: Block quote ends without a blank line; unexpected unindent.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 lib/list_sort.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/list_sort.c b/lib/list_sort.c
index 712ed1f4eb64..52f0c258c895 100644
--- a/lib/list_sort.c
+++ b/lib/list_sort.c
@@ -157,9 +157,11 @@ static void merge_final(void *priv, cmp_func cmp, struct list_head *head,
  *
  * The number of pending lists of size 2^k is determined by the
  * state of bit k of "count" plus two extra pieces of information:
+ *
  * - The state of bit k-1 (when k == 0, consider bit -1 always set), and
  * - Whether the higher-order bits are zero or non-zero (i.e.
  *   is count >= 2^(k+1)).
+ *
  * There are six states we distinguish.  "x" represents some arbitrary
  * bits, and "y" represents some arbitrary non-zero bits:
  * 0:  00x: 0 pending of size 2^k;           x pending of sizes < 2^k
-- 
cgit v1.2.3-59-g8ed1b


From 220ee02a0b38726a90430e94714c87550dc3d476 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Thu, 13 Jun 2019 18:25:48 +0200
Subject: docs: stop suggesting strlcpy

Since strlcpy is deprecated, the documentation shouldn't suggest using
it. This patch fixes the examples to use strscpy instead. It also uses
sizeof instead of underlying constants as far as possible, to simplify
future changes to the corresponding data structures.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/hid/hid-transport.txt                         | 6 +++---
 Documentation/i2c/instantiating-devices                     | 2 +-
 Documentation/i2c/upgrading-clients                         | 4 ++--
 Documentation/kernel-hacking/locking.rst                    | 6 +++---
 Documentation/translations/it_IT/kernel-hacking/locking.rst | 6 +++---
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/Documentation/hid/hid-transport.txt b/Documentation/hid/hid-transport.txt
index 3dcba9fd4a3a..4f41d67f1b4b 100644
--- a/Documentation/hid/hid-transport.txt
+++ b/Documentation/hid/hid-transport.txt
@@ -194,9 +194,9 @@ with HID core:
 		goto err_<...>;
 	}
 
-	strlcpy(hid->name, <device-name-src>, 127);
-	strlcpy(hid->phys, <device-phys-src>, 63);
-	strlcpy(hid->uniq, <device-uniq-src>, 63);
+	strscpy(hid->name, <device-name-src>, sizeof(hid->name));
+	strscpy(hid->phys, <device-phys-src>, sizeof(hid->phys));
+	strscpy(hid->uniq, <device-uniq-src>, sizeof(hid->uniq));
 
 	hid->ll_driver = &custom_ll_driver;
 	hid->bus = <device-bus>;
diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices
index 5a3e2f331e8c..345e9ea8281a 100644
--- a/Documentation/i2c/instantiating-devices
+++ b/Documentation/i2c/instantiating-devices
@@ -137,7 +137,7 @@ static int usb_hcd_nxp_probe(struct platform_device *pdev)
 	(...)
 	i2c_adap = i2c_get_adapter(2);
 	memset(&i2c_info, 0, sizeof(struct i2c_board_info));
-	strlcpy(i2c_info.type, "isp1301_nxp", I2C_NAME_SIZE);
+	strscpy(i2c_info.type, "isp1301_nxp", sizeof(i2c_info.type));
 	isp1301_i2c_client = i2c_new_probed_device(i2c_adap, &i2c_info,
 						   normal_i2c, NULL);
 	i2c_put_adapter(i2c_adap);
diff --git a/Documentation/i2c/upgrading-clients b/Documentation/i2c/upgrading-clients
index ccba3ffd6e80..96392cc5b5c7 100644
--- a/Documentation/i2c/upgrading-clients
+++ b/Documentation/i2c/upgrading-clients
@@ -43,7 +43,7 @@ static int example_attach(struct i2c_adapter *adap, int addr, int kind)
 	example->client.adapter = adap;
 
 	i2c_set_clientdata(&state->i2c_client, state);
-	strlcpy(client->i2c_client.name, "example", I2C_NAME_SIZE);
+	strscpy(client->i2c_client.name, "example", sizeof(client->i2c_client.name));
 
 	ret = i2c_attach_client(&state->i2c_client);
 	if (ret < 0) {
@@ -138,7 +138,7 @@ can be removed:
 -	example->client.flags   = 0;
 -	example->client.adapter = adap;
 -
--	strlcpy(client->i2c_client.name, "example", I2C_NAME_SIZE);
+-	strscpy(client->i2c_client.name, "example", sizeof(client->i2c_client.name));
 
 The i2c_set_clientdata is now:
 
diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index 519673df0e82..dc698ea456e0 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -451,7 +451,7 @@ to protect the cache and all the objects within it. Here's the code::
             if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
                     return -ENOMEM;
 
-            strlcpy(obj->name, name, sizeof(obj->name));
+            strscpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
 
@@ -660,7 +660,7 @@ Here is the code::
      }
 
     @@ -63,6 +94,7 @@
-             strlcpy(obj->name, name, sizeof(obj->name));
+             strscpy(obj->name, name, sizeof(obj->name));
              obj->id = id;
              obj->popularity = 0;
     +        obj->refcnt = 1; /* The cache holds a reference */
@@ -774,7 +774,7 @@ the lock is no longer used to protect the reference count itself.
      }
 
     @@ -94,7 +76,7 @@
-             strlcpy(obj->name, name, sizeof(obj->name));
+             strscpy(obj->name, name, sizeof(obj->name));
              obj->id = id;
              obj->popularity = 0;
     -        obj->refcnt = 1; /* The cache holds a reference */
diff --git a/Documentation/translations/it_IT/kernel-hacking/locking.rst b/Documentation/translations/it_IT/kernel-hacking/locking.rst
index 0ef31666663b..5fd8a1abd2be 100644
--- a/Documentation/translations/it_IT/kernel-hacking/locking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/locking.rst
@@ -468,7 +468,7 @@ e tutti gli oggetti che contiene. Ecco il codice::
             if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
                     return -ENOMEM;
 
-            strlcpy(obj->name, name, sizeof(obj->name));
+            strscpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
 
@@ -678,7 +678,7 @@ Ecco il codice::
      }
 
     @@ -63,6 +94,7 @@
-             strlcpy(obj->name, name, sizeof(obj->name));
+             strscpy(obj->name, name, sizeof(obj->name));
              obj->id = id;
              obj->popularity = 0;
     +        obj->refcnt = 1; /* The cache holds a reference */
@@ -792,7 +792,7 @@ contatore stesso.
      }
 
     @@ -94,7 +76,7 @@
-             strlcpy(obj->name, name, sizeof(obj->name));
+             strscpy(obj->name, name, sizeof(obj->name));
              obj->id = id;
              obj->popularity = 0;
     -        obj->refcnt = 1; /* The cache holds a reference */
-- 
cgit v1.2.3-59-g8ed1b


From 22aac857394c6105803afcb9801e4e4771eb755d Mon Sep 17 00:00:00 2001
From: Valentin Schneider <valentin.schneider@arm.com>
Date: Tue, 18 Jun 2019 15:56:05 +0100
Subject: docs/vm: hwpoison.rst: Fix quote formatting

The asterisks prepended to the quoted text currently get translated to
bullet points, which gets increasingly confusing the smaller your
screen is (when viewing the sphinx output, that is).

Convert the whole quote to a literal block.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/vm/hwpoison.rst | 52 +++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index 09bd24a92784..a5c884293dac 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -13,32 +13,32 @@ kill the processes associated with it and avoid using it in the future.
 
 This patchkit implements the necessary infrastructure in the VM.
 
-To quote the overview comment:
-
- * High level machine check handler. Handles pages reported by the
- * hardware as being corrupted usually due to a 2bit ECC memory or cache
- * failure.
- *
- * This focusses on pages detected as corrupted in the background.
- * When the current CPU tries to consume corruption the currently
- * running process can just be killed directly instead. This implies
- * that if the error cannot be handled for some reason it's safe to
- * just ignore it because no corruption has been consumed yet. Instead
- * when that happens another machine check will happen.
- *
- * Handles page cache pages in various states. The tricky part
- * here is that we can access any page asynchronous to other VM
- * users, because memory failures could happen anytime and anywhere,
- * possibly violating some of their assumptions. This is why this code
- * has to be extremely careful. Generally it tries to use normal locking
- * rules, as in get the standard locks, even if that means the
- * error handling takes potentially a long time.
- *
- * Some of the operations here are somewhat inefficient and have non
- * linear algorithmic complexity, because the data structures have not
- * been optimized for this case. This is in particular the case
- * for the mapping from a vma to a process. Since this case is expected
- * to be rare we hope we can get away with this.
+To quote the overview comment::
+
+	High level machine check handler. Handles pages reported by the
+	hardware as being corrupted usually due to a 2bit ECC memory or cache
+	failure.
+
+	This focusses on pages detected as corrupted in the background.
+	When the current CPU tries to consume corruption the currently
+	running process can just be killed directly instead. This implies
+	that if the error cannot be handled for some reason it's safe to
+	just ignore it because no corruption has been consumed yet. Instead
+	when that happens another machine check will happen.
+
+	Handles page cache pages in various states. The tricky part
+	here is that we can access any page asynchronous to other VM
+	users, because memory failures could happen anytime and anywhere,
+	possibly violating some of their assumptions. This is why this code
+	has to be extremely careful. Generally it tries to use normal locking
+	rules, as in get the standard locks, even if that means the
+	error handling takes potentially a long time.
+
+	Some of the operations here are somewhat inefficient and have non
+	linear algorithmic complexity, because the data structures have not
+	been optimized for this case. This is in particular the case
+	for the mapping from a vma to a process. Since this case is expected
+	to be rare we hope we can get away with this.
 
 The code consists of a the high level handler in mm/memory-failure.c,
 a new page poison bit and various checks in the VM to handle poisoned
-- 
cgit v1.2.3-59-g8ed1b


From eb8ed28f024fe428a08ee6b7c4de6a31ff6610ad Mon Sep 17 00:00:00 2001
From: James Morse <james.morse@arm.com>
Date: Fri, 7 Jun 2019 16:14:06 +0100
Subject: Documentation: x86: Contiguous cbm isn't all X86

Since commit 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
resctrl has supported non-contiguous cache bit masks. The interface
for this is currently try-it-and-see.

Update the documentation to say Intel CPUs have this requirement,
instead of X86.

Cc: Babu Moger <Babu.Moger@amd.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/x86/resctrl_ui.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
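
Reviewer note (illustration only, not part of the change): the
"contiguous block" rule described in the hunk below can be checked with
a few bit operations. The following is a minimal sketch; the helper name
is_contiguous_cbm is hypothetical and it does not claim to mirror how
the kernel validates masks. Hardware that accepts non-contiguous masks
would simply not apply such a check.

```python
def is_contiguous_cbm(mask: int) -> bool:
    """Return True if all '1' bits in mask form one contiguous block."""
    if mask <= 0:
        return False
    # Shift out trailing zeros so the lowest set bit sits at position 0.
    shifted = mask >> ((mask & -mask).bit_length() - 1)
    # A contiguous run now looks like 0b0...01...1, so adding 1 gives a
    # power of two that shares no bits with the run.
    return (shifted & (shifted + 1)) == 0


legal = [m for m in (0x3, 0x6, 0xC) if is_contiguous_cbm(m)]
illegal = [m for m in (0x5, 0x9, 0xA) if not is_contiguous_cbm(m)]
```

With the 4-bit examples from the documentation, 0x3, 0x6 and 0xC pass
the check while 0x5, 0x9 and 0xA fail it.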

diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst
index 225cfd4daaee..066f94e53418 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl_ui.rst
@@ -342,7 +342,7 @@ For cache resources we describe the portion of the cache that is available
 for allocation using a bitmask. The maximum value of the mask is defined
 by each cpu model (and may be different for different cache levels). It
 is found using CPUID, but is also provided in the "info" directory of
-the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
+the resctrl file system in "info/{resource}/cbm_mask". Intel hardware
 requires that these masks have all the '1' bits in a contiguous block. So
 0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
 and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
-- 
cgit v1.2.3-59-g8ed1b


From 7c7a49958286fee9a2b18baffc9c626304a00843 Mon Sep 17 00:00:00 2001
From: James Morse <james.morse@arm.com>
Date: Fri, 7 Jun 2019 16:14:07 +0100
Subject: Documentation: x86: Remove cdpl2 unsupported statement and fix
 capitalisation

"L2 cache does not support code and data prioritization". This isn't
true, elsewhere the document says it can be enabled with the cdpl2
mount option.

While we're here, these sample strings have lower-case code/data,
which isn't how the kernel exports them.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/x86/resctrl_ui.rst | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
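
Reviewer note (illustration only, not part of the change): a schemata
line in the format shown in the hunk below can be split with a few
string operations. This is a minimal sketch; parse_schemata_line is a
hypothetical helper, and it assumes the <cbm> values are hexadecimal, as
in the "L3:0=3;1=c" example elsewhere in the document.

```python
def parse_schemata_line(line: str):
    """Split 'RESOURCE:id=cbm;id=cbm' into (resource, {cache_id: cbm})."""
    resource, _, rest = line.strip().partition(":")
    masks = {}
    for item in rest.split(";"):
        cache_id, _, cbm = item.partition("=")
        # Assumption: cache ids are decimal, bitmasks are hexadecimal.
        masks[int(cache_id)] = int(cbm, 16)
    return resource, masks


resource, masks = parse_schemata_line("L2DATA:0=ff;1=3c")
```

The resource name is returned as written, so upper-case names such as
L2DATA and L2CODE are compared exactly as the kernel exports them.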

diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst
index 066f94e53418..638cd987937d 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl_ui.rst
@@ -418,16 +418,22 @@ L3 schemata file details (CDP enabled via mount option to resctrl)
 When CDP is enabled L3 control is split into two separate resources
 so you can specify independent masks for code and data like this::
 
-	L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
-	L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+	L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+	L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
 
 L2 schemata file details
 ------------------------
-L2 cache does not support code and data prioritization, so the
-schemata format is always::
+CDP is supported at L2 using the 'cdpl2' mount option. The schemata
+format is either::
 
 	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
 
+or::
+
+	L2DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+	L2CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+
 Memory bandwidth Allocation (default mode)
 ------------------------------------------
 
-- 
cgit v1.2.3-59-g8ed1b


From b5453a8ec310f3ee2f7689958d63d9b8ffcbcd3e Mon Sep 17 00:00:00 2001
From: James Morse <james.morse@arm.com>
Date: Fri, 7 Jun 2019 16:14:08 +0100
Subject: Documentation: x86: Clarify MBA takes MB as referring to mba_sc

"If the MBA is specified in MB then user can enter the max b/w in MB"
is a tautology. How can the user know if the schemata takes a percentage
or a MB/s value?

This is referring to whether the software controller is interpreting
the schemata's value. Make this clear.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/x86/resctrl_ui.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst
index 638cd987937d..866b66aa289b 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl_ui.rst
@@ -677,8 +677,8 @@ allocations can overlap or not. The allocations specifies the maximum
 b/w that the group may be able to use and the system admin can configure
 the b/w accordingly.
 
-If the MBA is specified in MB(megabytes) then user can enter the max b/w in MB
-rather than the percentage values.
+If resctrl is using the software controller (mba_sc) then the user can enter
+the max b/w in MB rather than the percentage values.
 ::
 
   # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
-- 
cgit v1.2.3-59-g8ed1b


From 57794aab8884087debb22fc214d8ca81999ffb0e Mon Sep 17 00:00:00 2001
From: James Morse <james.morse@arm.com>
Date: Fri, 7 Jun 2019 16:14:09 +0100
Subject: Documentation: x86: fix some typos

These are all obvious typos.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/x86/resctrl_ui.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst
index 866b66aa289b..5368cedfb530 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl_ui.rst
@@ -40,7 +40,7 @@ mount options are:
 	Enable the MBA Software Controller(mba_sc) to specify MBA
 	bandwidth in MBps
 
-L2 and L3 CDP are controlled seperately.
+L2 and L3 CDP are controlled separately.
 
 RDT features are orthogonal. A particular system may support only
 monitoring, only control, or both monitoring and control.  Cache
@@ -118,7 +118,7 @@ related to allocation:
 			      Corresponding region is pseudo-locked. No
 			      sharing allowed.
 
-Memory bandwitdh(MB) subdirectory contains the following files
+Memory bandwidth(MB) subdirectory contains the following files
 with respect to allocation:
 
 "min_bandwidth":
@@ -209,7 +209,7 @@ All groups contain the following files:
 	CPUs to/from this group. As with the tasks file a hierarchy is
 	maintained where MON groups may only include CPUs owned by the
 	parent CTRL_MON group.
-	When the resouce group is in pseudo-locked mode this file will
+	When the resource group is in pseudo-locked mode this file will
 	only be readable, reflecting the CPUs associated with the
 	pseudo-locked region.
 
@@ -380,7 +380,7 @@ where L2 external  is 10GBps (hence aggregate L2 external bandwidth is
 240GBps) and L3 external bandwidth is 100GBps. Now a workload with '20
 threads, having 50% bandwidth, each consuming 5GBps' consumes the max L3
 bandwidth of 100GBps although the percentage value specified is only 50%
-<< 100%. Hence increasing the bandwidth percentage will not yeild any
+<< 100%. Hence increasing the bandwidth percentage will not yield any
 more bandwidth. This is because although the L2 external bandwidth still
 has capacity, the L3 external bandwidth is fully used. Also note that
 this would be dependent on number of cores the benchmark is run on.
@@ -398,7 +398,7 @@ In order to mitigate this and make the interface more user friendly,
 resctrl added support for specifying the bandwidth in MBps as well.  The
 kernel underneath would use a software feedback mechanism or a "Software
 Controller(mba_sc)" which reads the actual bandwidth using MBM counters
-and adjust the memowy bandwidth percentages to ensure::
+and adjust the memory bandwidth percentages to ensure::
 
 	"actual bandwidth < user specified bandwidth".
 
-- 
cgit v1.2.3-59-g8ed1b


From 0f48a2441613d39db50fb12cc739455ef7f1921f Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Mon, 17 Jun 2019 16:34:00 +0200
Subject: doc-rst: Add missing newline at end of file

"git diff" says:

    \ No newline at end of file

after modifying the file.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/docutils.conf | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/docutils.conf b/Documentation/docutils.conf
index 2830772264c8..f1a180b97dec 100644
--- a/Documentation/docutils.conf
+++ b/Documentation/docutils.conf
@@ -4,4 +4,4 @@
 # http://docutils.sourceforge.net/docs/user/config.html
 
 [general]
-halt_level: severe
\ No newline at end of file
+halt_level: severe
-- 
cgit v1.2.3-59-g8ed1b


From d74b0d31dddeac2b44c715588d53d9a1e5b1158e Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Thu, 25 Apr 2019 07:55:07 -0600
Subject: Docs: An initial automarkup extension for sphinx

Rather than fill our text files with :c:func:`function()` syntax, just do
the markup via a hook into the sphinx build process.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/conf.py              |  3 +-
 Documentation/sphinx/automarkup.py | 93 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/sphinx/automarkup.py
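
Reviewer note (illustration only, not part of the change): the sketch
below exercises the same pattern as RE_function outside of Sphinx, and
shows one pitfall worth keeping in mind when maintaining a skip list
such as Skipfuncs: Python concatenates adjacent string literals, so a
missing comma silently merges two entries. The names function_names,
broken and fixed are hypothetical.

```python
import re

# Same pattern as RE_function in automarkup.py.
RE_function = re.compile(r'([\w_][\w\d_]+\(\))')

# Adjacent string literals are joined, so the first list has three
# entries ('mmapselect' among them) while the second has four.
broken = ['fcntl', 'mmap'
          'select', 'poll']
fixed = ['fcntl', 'mmap',
         'select', 'poll']


def function_names(text):
    """Return the names (without the trailing '()') found in text."""
    return [m.group(1)[:-2] for m in RE_function.finditer(text)]


names = function_names("Call kmalloc() and then mmap().")
```

Here names holds 'kmalloc' and 'mmap'; only a list with the comma in
place would cause 'mmap' to be skipped.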

diff --git a/Documentation/conf.py b/Documentation/conf.py
index 7ace3f8852bd..a502baecbb85 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -34,7 +34,8 @@ needs_sphinx = '1.3'
 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
-extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
+extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
+              'kfigure', 'sphinx.ext.ifconfig', 'automarkup']
 
 # The name of the math extension changed on Sphinx 1.4
 if (major == 1 and minor > 3) or (major > 1):
diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
new file mode 100644
index 000000000000..b300cf129869
--- /dev/null
+++ b/Documentation/sphinx/automarkup.py
@@ -0,0 +1,93 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2019 Jonathan Corbet <corbet@lwn.net>
+#
+# Apply kernel-specific tweaks after the initial document processing
+# has been done.
+#
+from docutils import nodes
+from sphinx import addnodes
+import re
+
+#
+# Regex nastiness.  Of course.
+# Try to identify "function()" that's not already marked up some
+# other way.  Sphinx doesn't like a lot of stuff right after a
+# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
+# bit tries to restrict matches to things that won't create trouble.
+#
+RE_function = re.compile(r'([\w_][\w\d_]+\(\))')
+
+#
+# Many places in the docs refer to common system calls.  It is
+# pointless to try to cross-reference them and, as has been known
+# to happen, somebody defining a function by these names can lead
+# to the creation of incorrect and confusing cross references.  So
+# just don't even try with these names.
+#
+Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
+              'select', 'poll', 'fork', 'execve', 'clone', 'ioctl']
+
+#
+# Find all occurrences of function() and try to replace them with
+# appropriate cross references.
+#
+def markup_funcs(docname, app, node):
+    cdom = app.env.domains['c']
+    t = node.astext()
+    done = 0
+    repl = [ ]
+    for m in RE_function.finditer(t):
+        #
+        # Include any text prior to function() as a normal text node.
+        #
+        if m.start() > done:
+            repl.append(nodes.Text(t[done:m.start()]))
+        #
+        # Go through the dance of getting an xref out of the C domain
+        #
+        target = m.group(1)[:-2]
+        target_text = nodes.Text(target + '()')
+        xref = None
+        if target not in Skipfuncs:
+            lit_text = nodes.literal(classes=['xref', 'c', 'c-func'])
+            lit_text += target_text
+            pxref = addnodes.pending_xref('', refdomain = 'c',
+                                          reftype = 'function',
+                                          reftarget = target, modname = None,
+                                          classname = None)
+            xref = cdom.resolve_xref(app.env, docname, app.builder,
+                                     'function', target, pxref, lit_text)
+        #
+        # Toss the xref into the list if we got it; otherwise just put
+        # the function text.
+        #
+        if xref:
+            repl.append(xref)
+        else:
+            repl.append(target_text)
+        done = m.end()
+    if done < len(t):
+        repl.append(nodes.Text(t[done:]))
+    return repl
+
+def auto_markup(app, doctree, name):
+    #
+    # This loop could eventually be improved on.  Someday maybe we
+    # want a proper tree traversal with a lot of awareness of which
+    # kinds of nodes to prune.  But this works well for now.
+    #
+    # The nodes.literal test catches ``literal text``; its purpose is to
+    # avoid adding cross-references to functions that have been explicitly
+    # marked with :c:func:.
+    #
+    for para in doctree.traverse(nodes.paragraph):
+        for node in para.traverse(nodes.Text):
+            if not isinstance(node.parent, nodes.literal):
+                node.parent.replace(node, markup_funcs(name, app, node))
+
+def setup(app):
+    app.connect('doctree-resolved', auto_markup)
+    return {
+        'parallel_read_safe': True,
+        'parallel_write_safe': True,
+        }
-- 
cgit v1.2.3-59-g8ed1b


From 9c79df7f0312e883f17a7b1c2352d1511362354c Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Thu, 25 Apr 2019 13:48:13 -0600
Subject: docs: remove :c:func: annotations from xarray.rst

Now that the build system automatically marks up function references, we
don't have to clutter the source files, so take it out.

[Some paragraphs could now benefit from refilling, but that was left out to
avoid obscuring the real changes.]

Acked-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/xarray.rst | 270 +++++++++++++++++++-------------------
 1 file changed, 135 insertions(+), 135 deletions(-)
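
Reviewer note (illustration only, not part of the change): a conversion
of this kind can be approximated with a single regular expression
substitution. The sketch below is an assumption about how one might
script it, not a record of how this patch was produced; RE_cfunc and
strip_cfunc are hypothetical names.

```python
import re

# Matches :c:func:`name` as well as :c:func:`name()`.
RE_cfunc = re.compile(r':c:func:`([A-Za-z_]\w*)(?:\(\))?`')


def strip_cfunc(text: str) -> str:
    """Replace :c:func:`name` markup with plain name()."""
    return RE_cfunc.sub(r'\1()', text)


converted = strip_cfunc("returned from :c:func:`kmalloc` and\n"
                        ":c:func:`alloc_page`.")
```

The result still needs a manual pass, since lines shortened this way may
benefit from refilling, as the commit message points out.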

diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst
index ef6f9f98f595..fcedc5349ace 100644
--- a/Documentation/core-api/xarray.rst
+++ b/Documentation/core-api/xarray.rst
@@ -30,27 +30,27 @@ it called marks.  Each mark may be set or cleared independently of
 the others.  You can iterate over entries which are marked.
 
 Normal pointers may be stored in the XArray directly.  They must be 4-byte
-aligned, which is true for any pointer returned from :c:func:`kmalloc` and
-:c:func:`alloc_page`.  It isn't true for arbitrary user-space pointers,
+aligned, which is true for any pointer returned from kmalloc() and
+alloc_page().  It isn't true for arbitrary user-space pointers,
 nor for function pointers.  You can store pointers to statically allocated
 objects, as long as those objects have an alignment of at least 4.
 
 You can also store integers between 0 and ``LONG_MAX`` in the XArray.
-You must first convert it into an entry using :c:func:`xa_mk_value`.
+You must first convert it into an entry using xa_mk_value().
 When you retrieve an entry from the XArray, you can check whether it is
-a value entry by calling :c:func:`xa_is_value`, and convert it back to
-an integer by calling :c:func:`xa_to_value`.
+a value entry by calling xa_is_value(), and convert it back to
+an integer by calling xa_to_value().
 
 Some users want to store tagged pointers instead of using the marks
-described above.  They can call :c:func:`xa_tag_pointer` to create an
-entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry
-back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve
+described above.  They can call xa_tag_pointer() to create an
+entry with a tag, xa_untag_pointer() to turn a tagged entry
+back into an untagged pointer and xa_pointer_tag() to retrieve
 the tag of an entry.  Tagged pointers use the same bits that are used
 to distinguish value entries from normal pointers, so each user must
 decide whether they want to store value entries or tagged pointers in
 any particular XArray.
 
-The XArray does not support storing :c:func:`IS_ERR` pointers as some
+The XArray does not support storing IS_ERR() pointers as some
 conflict with value entries or internal entries.
 
 An unusual feature of the XArray is the ability to create entries which
@@ -64,89 +64,89 @@ entry will cause the XArray to forget about the range.
 Normal API
 ==========
 
-Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
-for statically allocated XArrays or :c:func:`xa_init` for dynamically
+Start by initialising an XArray, either with DEFINE_XARRAY()
+for statically allocated XArrays or xa_init() for dynamically
 allocated ones.  A freshly-initialised XArray contains a ``NULL``
 pointer at every index.
 
-You can then set entries using :c:func:`xa_store` and get entries
-using :c:func:`xa_load`.  xa_store will overwrite any entry with the
+You can then set entries using xa_store() and get entries
+using xa_load().  xa_store will overwrite any entry with the
 new entry and return the previous entry stored at that index.  You can
-use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a
+use xa_erase() instead of calling xa_store() with a
 ``NULL`` entry.  There is no difference between an entry that has never
 been stored to, one that has been erased and one that has most recently
 had ``NULL`` stored to it.
 
 You can conditionally replace an entry at an index by using
-:c:func:`xa_cmpxchg`.  Like :c:func:`cmpxchg`, it will only succeed if
+xa_cmpxchg().  Like cmpxchg(), it will only succeed if
 the entry at that index has the 'old' value.  It also returns the entry
 which was at that index; if it returns the same entry which was passed as
-'old', then :c:func:`xa_cmpxchg` succeeded.
+'old', then xa_cmpxchg() succeeded.
 
 If you want to only store a new entry to an index if the current entry
-at that index is ``NULL``, you can use :c:func:`xa_insert` which
+at that index is ``NULL``, you can use xa_insert() which
 returns ``-EBUSY`` if the entry is not empty.
 
 You can enquire whether a mark is set on an entry by using
-:c:func:`xa_get_mark`.  If the entry is not ``NULL``, you can set a mark
-on it by using :c:func:`xa_set_mark` and remove the mark from an entry by
-calling :c:func:`xa_clear_mark`.  You can ask whether any entry in the
-XArray has a particular mark set by calling :c:func:`xa_marked`.
+xa_get_mark().  If the entry is not ``NULL``, you can set a mark
+on it by using xa_set_mark() and remove the mark from an entry by
+calling xa_clear_mark().  You can ask whether any entry in the
+XArray has a particular mark set by calling xa_marked().
 
 You can copy entries out of the XArray into a plain array by calling
-:c:func:`xa_extract`.  Or you can iterate over the present entries in
-the XArray by calling :c:func:`xa_for_each`.  You may prefer to use
-:c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present
+xa_extract().  Or you can iterate over the present entries in
+the XArray by calling xa_for_each().  You may prefer to use
+xa_find() or xa_find_after() to move to the next present
 entry in the XArray.
 
-Calling :c:func:`xa_store_range` stores the same entry in a range
+Calling xa_store_range() stores the same entry in a range
 of indices.  If you do this, some of the other operations will behave
 in a slightly odd way.  For example, marking the entry at one index
 may result in the entry being marked at some, but not all of the other
 indices.  Storing into one index may result in the entry retrieved by
 some, but not all of the other indices changing.
 
-Sometimes you need to ensure that a subsequent call to :c:func:`xa_store`
-will not need to allocate memory.  The :c:func:`xa_reserve` function
+Sometimes you need to ensure that a subsequent call to xa_store()
+will not need to allocate memory.  The xa_reserve() function
 will store a reserved entry at the indicated index.  Users of the
 normal API will see this entry as containing ``NULL``.  If you do
-not need to use the reserved entry, you can call :c:func:`xa_release`
+not need to use the reserved entry, you can call xa_release()
 to remove the unused entry.  If another user has stored to the entry
-in the meantime, :c:func:`xa_release` will do nothing; if instead you
-want the entry to become ``NULL``, you should use :c:func:`xa_erase`.
-Using :c:func:`xa_insert` on a reserved entry will fail.
+in the meantime, xa_release() will do nothing; if instead you
+want the entry to become ``NULL``, you should use xa_erase().
+Using xa_insert() on a reserved entry will fail.
 
-If all entries in the array are ``NULL``, the :c:func:`xa_empty` function
+If all entries in the array are ``NULL``, the xa_empty() function
 will return ``true``.
 
 Finally, you can remove all entries from an XArray by calling
-:c:func:`xa_destroy`.  If the XArray entries are pointers, you may wish
+xa_destroy().  If the XArray entries are pointers, you may wish
 to free the entries first.  You can do this by iterating over all present
-entries in the XArray using the :c:func:`xa_for_each` iterator.
+entries in the XArray using the xa_for_each() iterator.
 
 Allocating XArrays
 ------------------
 
-If you use :c:func:`DEFINE_XARRAY_ALLOC` to define the XArray, or
-initialise it by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`,
+If you use DEFINE_XARRAY_ALLOC() to define the XArray, or
+initialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(),
 the XArray changes to track whether entries are in use or not.
 
-You can call :c:func:`xa_alloc` to store the entry at an unused index
+You can call xa_alloc() to store the entry at an unused index
 in the XArray.  If you need to modify the array from interrupt context,
-you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to disable
+you can use xa_alloc_bh() or xa_alloc_irq() to disable
 interrupts while allocating the ID.
 
-Using :c:func:`xa_store`, :c:func:`xa_cmpxchg` or :c:func:`xa_insert` will
+Using xa_store(), xa_cmpxchg() or xa_insert() will
 also mark the entry as being allocated.  Unlike a normal XArray, storing
-``NULL`` will mark the entry as being in use, like :c:func:`xa_reserve`.
-To free an entry, use :c:func:`xa_erase` (or :c:func:`xa_release` if
+``NULL`` will mark the entry as being in use, like xa_reserve().
+To free an entry, use xa_erase() (or xa_release() if
 you only want to free the entry if it's ``NULL``).
 
 By default, the lowest free entry is allocated starting from 0.  If you
 want to allocate entries starting at 1, it is more efficient to use
-:c:func:`DEFINE_XARRAY_ALLOC1` or ``XA_FLAGS_ALLOC1``.  If you want to
+DEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``.  If you want to
 allocate IDs up to a maximum, then wrap back around to the lowest free
-ID, you can use :c:func:`xa_alloc_cyclic`.
+ID, you can use xa_alloc_cyclic().
 
 You cannot use ``XA_MARK_0`` with an allocating XArray as this mark
 is used to track whether an entry is free or not.  The other marks are
@@ -155,17 +155,17 @@ available for your use.
 Memory allocation
 -----------------
 
-The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`,
-:c:func:`xa_reserve` and :c:func:`xa_insert` functions take a gfp_t
+The xa_store(), xa_cmpxchg(), xa_alloc(),
+xa_reserve() and xa_insert() functions take a gfp_t
 parameter in case the XArray needs to allocate memory to store this entry.
 If the entry is being deleted, no memory allocation needs to be performed,
 and the GFP flags specified will be ignored.
 
 It is possible for no memory to be allocatable, particularly if you pass
 a restrictive set of GFP flags.  In that case, the functions return a
-special value which can be turned into an errno using :c:func:`xa_err`.
+special value which can be turned into an errno using xa_err().
 If you don't need to know exactly which error occurred, using
-:c:func:`xa_is_err` is slightly more efficient.
+xa_is_err() is slightly more efficient.
 
 Locking
 -------
@@ -174,54 +174,54 @@ When using the Normal API, you do not have to worry about locking.
 The XArray uses RCU and an internal spinlock to synchronise access:
 
 No lock needed:
- * :c:func:`xa_empty`
- * :c:func:`xa_marked`
+ * xa_empty()
+ * xa_marked()
 
 Takes RCU read lock:
- * :c:func:`xa_load`
- * :c:func:`xa_for_each`
- * :c:func:`xa_find`
- * :c:func:`xa_find_after`
- * :c:func:`xa_extract`
- * :c:func:`xa_get_mark`
+ * xa_load()
+ * xa_for_each()
+ * xa_find()
+ * xa_find_after()
+ * xa_extract()
+ * xa_get_mark()
 
 Takes xa_lock internally:
- * :c:func:`xa_store`
- * :c:func:`xa_store_bh`
- * :c:func:`xa_store_irq`
- * :c:func:`xa_insert`
- * :c:func:`xa_insert_bh`
- * :c:func:`xa_insert_irq`
- * :c:func:`xa_erase`
- * :c:func:`xa_erase_bh`
- * :c:func:`xa_erase_irq`
- * :c:func:`xa_cmpxchg`
- * :c:func:`xa_cmpxchg_bh`
- * :c:func:`xa_cmpxchg_irq`
- * :c:func:`xa_store_range`
- * :c:func:`xa_alloc`
- * :c:func:`xa_alloc_bh`
- * :c:func:`xa_alloc_irq`
- * :c:func:`xa_reserve`
- * :c:func:`xa_reserve_bh`
- * :c:func:`xa_reserve_irq`
- * :c:func:`xa_destroy`
- * :c:func:`xa_set_mark`
- * :c:func:`xa_clear_mark`
+ * xa_store()
+ * xa_store_bh()
+ * xa_store_irq()
+ * xa_insert()
+ * xa_insert_bh()
+ * xa_insert_irq()
+ * xa_erase()
+ * xa_erase_bh()
+ * xa_erase_irq()
+ * xa_cmpxchg()
+ * xa_cmpxchg_bh()
+ * xa_cmpxchg_irq()
+ * xa_store_range()
+ * xa_alloc()
+ * xa_alloc_bh()
+ * xa_alloc_irq()
+ * xa_reserve()
+ * xa_reserve_bh()
+ * xa_reserve_irq()
+ * xa_destroy()
+ * xa_set_mark()
+ * xa_clear_mark()
 
 Assumes xa_lock held on entry:
- * :c:func:`__xa_store`
- * :c:func:`__xa_insert`
- * :c:func:`__xa_erase`
- * :c:func:`__xa_cmpxchg`
- * :c:func:`__xa_alloc`
- * :c:func:`__xa_set_mark`
- * :c:func:`__xa_clear_mark`
+ * __xa_store()
+ * __xa_insert()
+ * __xa_erase()
+ * __xa_cmpxchg()
+ * __xa_alloc()
+ * __xa_set_mark()
+ * __xa_clear_mark()
 
 If you want to take advantage of the lock to protect the data structures
-that you are storing in the XArray, you can call :c:func:`xa_lock`
-before calling :c:func:`xa_load`, then take a reference count on the
-object you have found before calling :c:func:`xa_unlock`.  This will
+that you are storing in the XArray, you can call xa_lock()
+before calling xa_load(), then take a reference count on the
+object you have found before calling xa_unlock().  This will
 prevent stores from removing the object from the array between looking
 up the object and incrementing the refcount.  You can also use RCU to
 avoid dereferencing freed memory, but an explanation of that is beyond
@@ -261,7 +261,7 @@ context and then erase them in softirq context, you can do that this way::
     }
 
 If you are going to modify the XArray from interrupt or softirq context,
-you need to initialise the array using :c:func:`xa_init_flags`, passing
+you need to initialise the array using xa_init_flags(), passing
 ``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.
 
 The above example also shows a common pattern of wanting to extend the
@@ -269,20 +269,20 @@ coverage of the xa_lock on the store side to protect some statistics
 associated with the array.
 
 Sharing the XArray with interrupt context is also possible, either
-using :c:func:`xa_lock_irqsave` in both the interrupt handler and process
-context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock`
+using xa_lock_irqsave() in both the interrupt handler and process
+context, or xa_lock_irq() in process context and xa_lock()
 in the interrupt handler.  Some of the more common patterns have helper
-functions such as :c:func:`xa_store_bh`, :c:func:`xa_store_irq`,
-:c:func:`xa_erase_bh`, :c:func:`xa_erase_irq`, :c:func:`xa_cmpxchg_bh`
-and :c:func:`xa_cmpxchg_irq`.
+functions such as xa_store_bh(), xa_store_irq(),
+xa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh()
+and xa_cmpxchg_irq().
 
 Sometimes you need to protect access to the XArray with a mutex because
 that lock sits above another mutex in the locking hierarchy.  That does
-not entitle you to use functions like :c:func:`__xa_erase` without taking
+not entitle you to use functions like __xa_erase() without taking
 the xa_lock; the xa_lock is used for lockdep validation and will be used
 for other purposes in the future.
 
-The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also
+The __xa_set_mark() and __xa_clear_mark() functions are also
 available for situations where you look up an entry and want to atomically
 set or clear a mark.  It may be more efficient to use the advanced API
 in this case, as it will save you from walking the tree twice.
@@ -300,27 +300,27 @@ indeed the normal API is implemented in terms of the advanced API.  The
 advanced API is only available to modules with a GPL-compatible license.
 
 The advanced API is based around the xa_state.  This is an opaque data
-structure which you declare on the stack using the :c:func:`XA_STATE`
+structure which you declare on the stack using the XA_STATE()
 macro.  This macro initialises the xa_state ready to start walking
 around the XArray.  It is used as a cursor to maintain the position
 in the XArray and let you compose various operations together without
 having to restart from the top every time.
 
 The xa_state is also used to store errors.  You can call
-:c:func:`xas_error` to retrieve the error.  All operations check whether
+xas_error() to retrieve the error.  All operations check whether
 the xa_state is in an error state before proceeding, so there's no need
 for you to check for an error after each call; you can make multiple
 calls in succession and only check at a convenient point.  The only
 errors currently generated by the XArray code itself are ``ENOMEM`` and
 ``EINVAL``, but it supports arbitrary errors in case you want to call
-:c:func:`xas_set_err` yourself.
+xas_set_err() yourself.
 
-If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem`
+If the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
 will attempt to allocate more memory using the specified gfp flags and
 cache it in the xa_state for the next attempt.  The idea is that you take
 the xa_lock, attempt the operation and drop the lock.  The operation
 attempts to allocate memory while holding the lock, but it is more
-likely to fail.  Once you have dropped the lock, :c:func:`xas_nomem`
+likely to fail.  Once you have dropped the lock, xas_nomem()
 can try harder to allocate more memory.  It will return ``true`` if it
 is worth retrying the operation (i.e. that there was a memory error *and*
 more memory was allocated).  If it has previously allocated memory, and
@@ -333,7 +333,7 @@ Internal Entries
 The XArray reserves some entries for its own purposes.  These are never
 exposed through the normal API, but when using the advanced API, it's
 possible to see them.  Usually the best way to handle them is to pass them
-to :c:func:`xas_retry`, and retry the operation if it returns ``true``.
+to xas_retry(), and retry the operation if it returns ``true``.
 
 .. flat-table::
    :widths: 1 1 6
@@ -343,89 +343,89 @@ to :c:func:`xas_retry`, and retry the operation if it returns ``true``.
      - Usage
 
    * - Node
-     - :c:func:`xa_is_node`
+     - xa_is_node()
      - An XArray node.  May be visible when using a multi-index xa_state.
 
    * - Sibling
-     - :c:func:`xa_is_sibling`
+     - xa_is_sibling()
      - A non-canonical entry for a multi-index entry.  The value indicates
        which slot in this node has the canonical entry.
 
    * - Retry
-     - :c:func:`xa_is_retry`
+     - xa_is_retry()
      - This entry is currently being modified by a thread which has the
        xa_lock.  The node containing this entry may be freed at the end
        of this RCU period.  You should restart the lookup from the head
        of the array.
 
    * - Zero
-     - :c:func:`xa_is_zero`
+     - xa_is_zero()
      - Zero entries appear as ``NULL`` through the Normal API, but occupy
        an entry in the XArray which can be used to reserve the index for
        future use.  This is used by allocating XArrays for allocated entries
        which are ``NULL``.
 
 Other internal entries may be added in the future.  As far as possible, they
-will be handled by :c:func:`xas_retry`.
+will be handled by xas_retry().
 
 Additional functionality
 ------------------------
 
-The :c:func:`xas_create_range` function allocates all the necessary memory
+The xas_create_range() function allocates all the necessary memory
 to store every entry in a range.  It will set ENOMEM in the xa_state if
 it cannot allocate memory.
 
-You can use :c:func:`xas_init_marks` to reset the marks on an entry
+You can use xas_init_marks() to reset the marks on an entry
 to their default state.  This is usually all marks clear, unless the
 XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
 and all other marks are clear.  Replacing one entry with another using
-:c:func:`xas_store` will not reset the marks on that entry; if you want
+xas_store() will not reset the marks on that entry; if you want
 the marks reset, you should do that explicitly.
 
-The :c:func:`xas_load` will walk the xa_state as close to the entry
+The xas_load() will walk the xa_state as close to the entry
 as it can.  If you know the xa_state has already been walked to the
 entry and need to check that the entry hasn't changed, you can use
-:c:func:`xas_reload` to save a function call.
+xas_reload() to save a function call.
 
 If you need to move to a different index in the XArray, call
-:c:func:`xas_set`.  This resets the cursor to the top of the tree, which
+xas_set().  This resets the cursor to the top of the tree, which
 will generally make the next operation walk the cursor to the desired
 spot in the tree.  If you want to move to the next or previous index,
-call :c:func:`xas_next` or :c:func:`xas_prev`.  Setting the index does
+call xas_next() or xas_prev().  Setting the index does
 not walk the cursor around the array so does not require a lock to be
 held, while moving to the next or previous index does.
 
-You can search for the next present entry using :c:func:`xas_find`.  This
-is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`;
+You can search for the next present entry using xas_find().  This
+is the equivalent of both xa_find() and xa_find_after();
 if the cursor has been walked to an entry, then it will find the next
 entry after the one currently referenced.  If not, it will return the
-entry at the index of the xa_state.  Using :c:func:`xas_next_entry` to
-move to the next present entry instead of :c:func:`xas_find` will save
+entry at the index of the xa_state.  Using xas_next_entry() to
+move to the next present entry instead of xas_find() will save
 a function call in the majority of cases at the expense of emitting more
 inline code.
 
-The :c:func:`xas_find_marked` function is similar.  If the xa_state has
+The xas_find_marked() function is similar.  If the xa_state has
 not been walked, it will return the entry at the index of the xa_state,
 if it is marked.  Otherwise, it will return the first marked entry after
-the entry referenced by the xa_state.  The :c:func:`xas_next_marked`
-function is the equivalent of :c:func:`xas_next_entry`.
+the entry referenced by the xa_state.  The xas_next_marked()
+function is the equivalent of xas_next_entry().
 
-When iterating over a range of the XArray using :c:func:`xas_for_each`
-or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop
-the iteration.  The :c:func:`xas_pause` function exists for this purpose.
+When iterating over a range of the XArray using xas_for_each()
+or xas_for_each_marked(), it may be necessary to temporarily stop
+the iteration.  The xas_pause() function exists for this purpose.
 After you have done the necessary work and wish to resume, the xa_state
 is in an appropriate state to continue the iteration after the entry
 you last processed.  If you have interrupts disabled while iterating,
 then it is good manners to pause the iteration and reenable interrupts
 every ``XA_CHECK_SCHED`` entries.
 
-The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and
-:c:func:`xas_clear_mark` functions require the xa_state cursor to have
+The xas_get_mark(), xas_set_mark() and
+xas_clear_mark() functions require the xa_state cursor to have
 been moved to the appropriate location in the xarray; they will do
-nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set`
+nothing if you have called xas_pause() or xas_set()
 immediately before.
 
-You can call :c:func:`xas_set_update` to have a callback function
+You can call xas_set_update() to have a callback function
 called each time the XArray updates a node.  This is used by the page
 cache workingset code to maintain its list of nodes which contain only
 shadow entries.
@@ -443,25 +443,25 @@ eg indices 64-127 may be tied together, but 2-6 may not be.  This may
 save substantial quantities of memory; for example tying 512 entries
 together will save over 4kB.
 
-You can create a multi-index entry by using :c:func:`XA_STATE_ORDER`
-or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`.
-Calling :c:func:`xas_load` with a multi-index xa_state will walk the
+You can create a multi-index entry by using XA_STATE_ORDER()
+or xas_set_order() followed by a call to xas_store().
+Calling xas_load() with a multi-index xa_state will walk the
 xa_state to the right location in the tree, but the return value is not
 meaningful, potentially being an internal entry or ``NULL`` even when there
-is an entry stored within the range.  Calling :c:func:`xas_find_conflict`
+is an entry stored within the range.  Calling xas_find_conflict()
 will return the first entry within the range or ``NULL`` if there are no
-entries in the range.  The :c:func:`xas_for_each_conflict` iterator will
+entries in the range.  The xas_for_each_conflict() iterator will
 iterate over every entry which overlaps the specified range.
 
-If :c:func:`xas_load` encounters a multi-index entry, the xa_index
+If xas_load() encounters a multi-index entry, the xa_index
 in the xa_state will not be changed.  When iterating over an XArray
-or calling :c:func:`xas_find`, if the initial index is in the middle
+or calling xas_find(), if the initial index is in the middle
 of a multi-index entry, it will not be altered.  Subsequent calls
 or iterations will move the index to the first index in the range.
 Each entry will only be returned once, no matter how many indices it
 occupies.
 
-Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state
+Using xas_next() or xas_prev() with a multi-index xa_state
 is not supported.  Using either of these functions on a multi-index entry
 will reveal sibling entries; these should be skipped over by the caller.
 
-- 
cgit v1.2.3-59-g8ed1b


From 344fdb28a0dfac2e42925f149029748b34d2effa Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Fri, 21 Jun 2019 17:34:30 -0600
Subject: kernel-doc: Don't try to mark up function names

We now have better automarkup in sphinx itself and, besides, this markup
was incorrect and left :c:func: gunk in the processed docs.  Sort of
discouraging that nobody ever noticed...:)

As a first step toward the removal of impenetrable regex magic from
kernel-doc it's a tiny one, but you have to start somewhere.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/kernel-doc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index c0cb41e65b9b..6b03012750da 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -249,7 +249,7 @@ my @highlights_rst = (
                        [$type_member_func, "\\:c\\:type\\:`\$1\$2\$3\\\\(\\\\) <\$1>`"],
                        [$type_member, "\\:c\\:type\\:`\$1\$2\$3 <\$1>`"],
 		       [$type_fp_param, "**\$1\\\\(\\\\)**"],
-                       [$type_func, "\\:c\\:func\\:`\$1()`"],
+                       [$type_func, "\$1()"],
                        [$type_enum, "\\:c\\:type\\:`\$1 <\$2>`"],
                        [$type_struct, "\\:c\\:type\\:`\$1 <\$2>`"],
                        [$type_typedef, "\\:c\\:type\\:`\$1 <\$2>`"],
-- 
cgit v1.2.3-59-g8ed1b


From d9d7c0c497b8e2ffd9fe26cc96a49ed2d69d8b75 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Wed, 26 Jun 2019 11:20:21 -0600
Subject: docs: Note that :c:func: should no longer be used

Now that we can mark up function() automatically, there is no reason to use
:c:func: and every reason to avoid it.  Adjust the documentation to reflect
that fact.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/doc-guide/sphinx.rst | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst
index e60a56640c63..f71ddd592aaa 100644
--- a/Documentation/doc-guide/sphinx.rst
+++ b/Documentation/doc-guide/sphinx.rst
@@ -241,11 +241,14 @@ The C domain of the kernel-doc has some additional features. E.g. you can
 
 The func-name (e.g. ioctl) remains in the output but the ref-name changed from
 ``ioctl`` to ``VIDIOC_LOG_STATUS``. The index entry for this function is also
-changed to ``VIDIOC_LOG_STATUS`` and the function can now referenced by:
-
-.. code-block:: rst
-
-     :c:func:`VIDIOC_LOG_STATUS`
+changed to ``VIDIOC_LOG_STATUS``.
+
+Please note that there is no need to use ``c:func:`` to generate cross
+references to function documentation.  Due to some Sphinx extension magic,
+the documentation build system will automatically turn a reference to
+``function()`` into a cross reference if an index entry for the given
+function name exists.  If you see ``c:func:`` use in a kernel document,
+please feel free to remove it.
 
 
 list tables
-- 
cgit v1.2.3-59-g8ed1b


From 163ede97a9a29604c3a8afbf22ae0599f5148621 Mon Sep 17 00:00:00 2001
From: Puranjay Mohan <puranjay12@gmail.com>
Date: Fri, 21 Jun 2019 00:08:27 +0530
Subject: Documentation: platform: Delete x86-laptop-drivers.txt

The list of laptops supported by drivers in PDx86 subsystem is quite
big and growing. x86-laptop-drivers.txt contains details of very few
laptop models. Remove it because it does not serve any purpose.

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/platform/x86-laptop-drivers.txt | 18 ------------------
 1 file changed, 18 deletions(-)
 delete mode 100644 Documentation/platform/x86-laptop-drivers.txt

diff --git a/Documentation/platform/x86-laptop-drivers.txt b/Documentation/platform/x86-laptop-drivers.txt
deleted file mode 100644
index 01facd2590bb..000000000000
--- a/Documentation/platform/x86-laptop-drivers.txt
+++ /dev/null
@@ -1,18 +0,0 @@
-compal-laptop
-=============
-List of supported hardware:
-
-by Compal:
-	Compal FL90/IFL90
-	Compal FL91/IFL91
-	Compal FL92/JFL92
-	Compal FT00/IFT00
-
-by Dell:
-	Dell Vostro 1200
-	Dell Mini 9 (Inspiron 910)
-	Dell Mini 10 (Inspiron 1010)
-	Dell Mini 10v (Inspiron 1011)
-	Dell Mini 1012 (Inspiron 1012)
-	Dell Inspiron 11z (Inspiron 1110)
-	Dell Mini 12 (Inspiron 1210)
-- 
cgit v1.2.3-59-g8ed1b


From 6e88559470f581741bcd0f2794f9054814ac9740 Mon Sep 17 00:00:00 2001
From: Tim Chen <tim.c.chen@linux.intel.com>
Date: Thu, 20 Jun 2019 16:10:50 -0700
Subject: Documentation: Add section about CPU vulnerabilities for Spectre

Add documentation for Spectre vulnerability and the mitigation mechanisms:

- Explain the problem and risks
- Document the mitigation mechanisms
- Document the command line controls
- Document the sysfs files

Co-developed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
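Reviewer note (not part of the commit): the "bounds clipping" that the new
spectre.rst attributes to the nospec accessor macros for variant 1 can be
sketched in userspace C as below.  This is only an illustration under
stated assumptions: index_mask_nospec() and table_read() are hypothetical
names, the mask arithmetic assumes indices and sizes no larger than
LONG_MAX and an arithmetic right shift of negative values (as gcc and
clang provide), and a compiler remains free to reintroduce a branch, which
is why real implementations may rely on architecture-specific code.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in a long; the mask computation below depends on it. */
#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Hypothetical helper: yields ~0UL when index < size and 0 otherwise,
 * using only arithmetic so that no conditional branch can be mispredicted.
 * Assumes index and size are both <= LONG_MAX.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	/*
	 * For index < size neither operand has its top bit set, so the
	 * complement is negative and the shift smears the sign bit into
	 * an all-ones mask.  For index >= size the subtraction wraps,
	 * the top bit is set, and the result is 0.
	 */
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/*
 * Hypothetical bounds-checked read.  The branch enforces the bound
 * architecturally; the mask clamps the index to 0 on a speculative path
 * that went past the check.
 */
static int table_read(const int *table, unsigned long size,
		      unsigned long index, int *out)
{
	if (index >= size)
		return -1;
	index &= index_mask_nospec(index, size);
	*out = table[index];
	return 0;
}
```

The bounds check still rejects bad indices; the mask only ensures that a
mispredicted path dereferences element 0 rather than an attacker-chosen
offset.
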
 Documentation/admin-guide/hw-vuln/index.rst   |   1 +
 Documentation/admin-guide/hw-vuln/spectre.rst | 697 ++++++++++++++++++++++++++
 Documentation/userspace-api/spec_ctrl.rst     |   2 +
 3 files changed, 700 insertions(+)
 create mode 100644 Documentation/admin-guide/hw-vuln/spectre.rst

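Reviewer note (not part of the commit): spectre.rst says a user process
can disable indirect branch speculation for itself through prctl().  A
minimal sketch of that call sequence follows.  The helper names
ibs_status() and ibs_disable() are hypothetical; the PR_* constants are the
ones from linux/prctl.h, with fallback definitions in case the installed
headers predate them.  Whether the request succeeds depends on the running
kernel, the CPU and the boot-time mitigation mode, so callers should treat
failure as a normal outcome.

```c
#include <sys/prctl.h>

/* Fallbacks for older userspace headers; values match linux/prctl.h. */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL	52
#define PR_SET_SPECULATION_CTRL	53
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH	1
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_PRCTL		(1UL << 0)
#define PR_SPEC_ENABLE		(1UL << 1)
#define PR_SPEC_DISABLE		(1UL << 2)
#endif

/*
 * Returns the PR_SPEC_* status bits for indirect branch speculation of
 * the calling task, or -1 (with errno set) if the control is unavailable.
 */
static int ibs_status(void)
{
	return prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
}

/*
 * Asks the kernel to disable indirect branch speculation for the calling
 * task.  Returns 0 on success and -1 if the control is unsupported, not
 * adjustable per task, or refused by the kernel.
 */
static int ibs_disable(void)
{
	int st = ibs_status();

	if (st < 0 || !(st & PR_SPEC_PRCTL))
		return -1;	/* no per-task control available */
	return prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
		     PR_SPEC_DISABLE, 0, 0);
}
```

A program would typically call ibs_disable() early in startup, before it
handles any secrets, and carry on with a warning if the call returns -1.
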
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index ffc064c1ec68..49311f3da6f2 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -9,5 +9,6 @@ are configurable at compile, boot or run time.
 .. toctree::
    :maxdepth: 1
 
+   spectre
    l1tf
    mds
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
new file mode 100644
index 000000000000..25f3b2532198
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -0,0 +1,697 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Spectre Side Channels
+=====================
+
+Spectre is a class of side channel attacks that exploit branch prediction
+and speculative execution on modern CPUs to read memory, possibly
+bypassing access controls. Speculative execution side channel exploits
+do not modify memory but attempt to infer privileged data in the memory.
+
+This document covers Spectre variant 1 and Spectre variant 2.
+
+Affected processors
+-------------------
+
+Speculative execution side channel methods affect a wide range of modern
+high performance processors, since most modern high speed processors
+use branch prediction and speculative execution.
+
+The following CPUs are vulnerable:
+
+    - Intel Core, Atom, Pentium, and Xeon processors
+
+    - AMD Phenom, EPYC, and Zen processors
+
+    - IBM POWER and zSeries processors
+
+    - Higher end ARM processors
+
+    - Apple CPUs
+
+    - Higher end MIPS CPUs
+
+    - Likely most other high performance CPUs. Contact your CPU vendor for details.
+
+Whether a processor is affected or not can be read out from the Spectre
+vulnerability files in sysfs. See :ref:`spectre_sys_info`.
+
+Related CVEs
+------------
+
+The following CVE entries describe Spectre variants:
+
+   =============   =======================  =================
+   CVE-2017-5753   Bounds check bypass      Spectre variant 1
+   CVE-2017-5715   Branch target injection  Spectre variant 2
+   =============   =======================  =================
+
+Problem
+-------
+
+CPUs use speculative operations to improve performance. That may leave
+traces of memory accesses or computations in the processor's caches,
+buffers, and branch predictors. Malicious software may be able to
+influence the speculative execution paths, and then use the side effects
+of the speculative execution in the CPUs' caches and buffers to infer
+privileged data touched during the speculative execution.
+
+Spectre variant 1 attacks take advantage of speculative execution of
+conditional branches, while Spectre variant 2 attacks use speculative
+execution of indirect branches to leak privileged memory.
+See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[7] <spec_ref7>`
+:ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`.
+
+Spectre variant 1 (Bounds Check Bypass)
+---------------------------------------
+
+The bounds check bypass attack :ref:`[2] <spec_ref2>` takes advantage
+of speculative execution that bypasses conditional branch instructions
+used for memory access bounds check (e.g. checking if the index of an
+array results in memory access within a valid range). This results in
+memory accesses to invalid memory (with out-of-bound index) that are
+done speculatively before validation checks resolve. Such speculative
+memory accesses can leave side effects, creating side channels which
+leak information to the attacker.
+
+There are some extensions of Spectre variant 1 attacks for reading data
+over the network, see :ref:`[12] <spec_ref12>`. However such attacks
+are difficult, low bandwidth, fragile, and are considered low risk.
+
+Spectre variant 2 (Branch Target Injection)
+-------------------------------------------
+
+The branch target injection attack takes advantage of speculative
+execution of indirect branches :ref:`[3] <spec_ref3>`.  The indirect
+branch predictors inside the processor used to guess the target of
+indirect branches can be influenced by an attacker, causing gadget code
+to be speculatively executed, thus exposing sensitive data touched by
+the victim. The side effects left in the CPU's caches during speculative
+execution can be measured to infer data values.
+
+.. _poison_btb:
+
+In Spectre variant 2 attacks, the attacker can steer speculative indirect
+branches in the victim to gadget code by poisoning the branch target
+buffer of a CPU used for predicting indirect branch addresses. Such
+poisoning could be done by indirect branching into existing code,
+with the address offset of the indirect branch under the attacker's
+control. Since the branch prediction on impacted hardware does not
+fully disambiguate branch address and uses the offset for prediction,
+this could cause privileged code's indirect branch to jump to a gadget
+code with the same offset.
+
+The most useful gadgets take an attacker-controlled input parameter (such
+as a register value) so that the memory read can be controlled. Gadgets
+without input parameters might be possible, but the attacker would have
+very little control over what memory can be read, reducing the risk of
+the attack revealing useful data.
+
+One other variant 2 attack vector is for the attacker to poison the
+return stack buffer (RSB) :ref:`[13] <spec_ref13>` to cause speculative
+subroutine return instruction execution to go to a gadget.  An attacker's
+imbalanced subroutine call instructions might "poison" entries in the
+return stack buffer which are later consumed by a victim's subroutine
+return instructions.  This attack can be mitigated by flushing the return
+stack buffer on context switch, or virtual machine (VM) exit.
+
+On systems with simultaneous multi-threading (SMT), attacks are possible
+from the sibling thread, as level 1 cache and branch target buffer
+(BTB) may be shared between hardware threads in a CPU core.  A malicious
+program running on the sibling thread may influence its peer's BTB to
+steer its indirect branch speculations to gadget code, and measure the
+speculative execution's side effects left in level 1 cache to infer the
+victim's data.
+
+Attack scenarios
+----------------
+
+The following attack scenarios have been anticipated, but the list may
+not cover all possible attack vectors.
+
+1. A user process attacking the kernel
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   The attacker passes a parameter to the kernel via a register or
+   via a known address in memory during a syscall. Such parameter may
+   be used later by the kernel as an index to an array or to derive
+   a pointer for a Spectre variant 1 attack.  The index or pointer
+   is invalid, but bound checks are bypassed in the code branch taken
+   for speculative execution. This could cause privileged memory to be
+   accessed and leaked.
+
+   For kernel code that has been identified where data pointers could
+   potentially be influenced for Spectre attacks, new "nospec" accessor
+   macros are used to prevent speculative loading of data.
+
+   A Spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
+   target buffer (BTB) before issuing a syscall to launch an attack.
+   After entering the kernel, the kernel could use the poisoned branch
+   target buffer on indirect jump and jump to gadget code in speculative
+   execution.
+
+   If an attacker tries to control the memory addresses leaked during
+   speculative execution, he would also need to pass a parameter to the
+   gadget, either through a register or a known address in memory. After
+   the gadget has executed, he can measure the side effect.
+
+   The kernel can protect itself against consuming poisoned branch
+   target buffer entries by using return trampolines (also known as
+   "retpoline") :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` for all
+   indirect branches. Return trampolines trap speculative execution paths
+   to prevent jumping to gadget code during speculative execution.
+   x86 CPUs with Enhanced Indirect Branch Restricted Speculation
+   (Enhanced IBRS) available in hardware should use the feature to
+   mitigate Spectre variant 2 instead of retpoline. Enhanced IBRS is
+   more efficient than retpoline.
+
+   There may be gadget code in firmware which could be exploited with
+   a Spectre variant 2 attack by a rogue user process. To mitigate such
+   attacks on x86, the Indirect Branch Restricted Speculation (IBRS)
+   feature is turned on before the kernel invokes any firmware code.
+
+2. A user process attacking another user process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   A malicious user process can try to attack another user process,
+   either via a context switch on the same hardware thread, or from the
+   sibling hyperthread sharing a physical processor core on a simultaneous
+   multi-threading (SMT) system.
+
+   Spectre variant 1 attacks generally require passing parameters
+   between the processes, which needs a data passing relationship, such
+   as remote procedure calls (RPC).  Those parameters are used in gadget
+   code to derive invalid data pointers accessing privileged memory in
+   the attacked process.
+
+   Spectre variant 2 attacks can be launched from a rogue process by
+   :ref:`poisoning <poison_btb>` the branch target buffer.  This can
+   influence the indirect branch targets for a victim process that either
+   runs later on the same hardware thread, or runs concurrently on
+   a sibling hardware thread sharing the same physical core.
+
+   A user process can protect itself against Spectre variant 2 attacks
+   by using the prctl() syscall to disable indirect branch speculation
+   for itself.  An administrator can also cordon off an unsafe process
+   from polluting the branch target buffer by disabling the process's
+   indirect branch speculation. This comes with a performance cost
+   from not using indirect branch speculation and clearing the branch
+   target buffer.  When SMT is enabled on x86, for a process that has
+   indirect branch speculation disabled, Single Threaded Indirect Branch
+   Predictors (STIBP) :ref:`[4] <spec_ref4>` are turned on to prevent the
+   sibling thread from controlling the branch target buffer.  In addition,
+   the Indirect Branch Prediction Barrier (IBPB) is issued to clear the
+   branch target buffer when context switching to and from such a process.
+
+   On x86, the return stack buffer is stuffed on context switch.
+   This prevents the branch target buffer from being used for branch
+   prediction when the return stack buffer underflows while switching to
+   a deeper call stack. Any poisoned entries in the return stack buffer
+   left by the previous process will also be cleared.
+
+   User programs should use address space randomization to make attacks
+   more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
+
+3. A virtualized guest attacking the host
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   The attack mechanism is similar to how user processes attack the
+   kernel.  The kernel is entered via hyper-calls or other virtualization
+   exit paths.
+
+   For Spectre variant 1 attacks, rogue guests can pass parameters
+   (e.g. in registers) via hyper-calls to derive invalid pointers to
+   speculate into privileged memory after entering the kernel.  For places
+   where such kernel code has been identified, nospec accessor macros
+   are used to stop speculative memory access.
+
+   For Spectre variant 2 attacks, rogue guests can :ref:`poison
+   <poison_btb>` the branch target buffer or return stack buffer, causing
+   the kernel to jump to gadget code in the speculative execution paths.
+
+   To mitigate variant 2, the host kernel can use return trampolines
+   for indirect branches to bypass the poisoned branch target buffer,
+   and flush the return stack buffer on VM exit.  This prevents rogue
+   guests from affecting indirect branching in the host kernel.
+
+   To protect host processes from rogue guests, host processes can have
+   indirect branch speculation disabled via prctl().  The branch target
+   buffer is cleared before context switching to such processes.
+
+4. A virtualized guest attacking another guest
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   A rogue guest may attack another guest to get data accessible by the
+   other guest.
+
+   Spectre variant 1 attacks are possible if parameters can be passed
+   between guests.  This may be done via mechanisms such as shared memory
+   or message passing.  Such parameters could be used to derive data
+   pointers to privileged data in the guest.  The privileged data could be
+   accessed by gadget code in the victim's speculation paths.
+
+   Spectre variant 2 attacks can be launched from a rogue guest by
+   :ref:`poisoning <poison_btb>` the branch target buffer or the return
+   stack buffer. Such poisoned entries could be used to influence
+   speculative execution paths in the victim guest.
+
+   The Linux kernel mitigates attacks on other guests running in the same
+   CPU hardware thread by flushing the return stack buffer on VM exit,
+   and clearing the branch target buffer before switching to a new guest.
+
+   If SMT is used, Spectre variant 2 attacks from an untrusted guest
+   in the sibling hyperthread can be mitigated by the administrator,
+   by turning off the unsafe guest's indirect branch speculation via
+   prctl().  A guest can also protect itself by turning on microcode
+   based mitigations (such as IBPB or STIBP on x86) within the guest.
+
+.. _spectre_sys_info:
+
+Spectre system information
+--------------------------
+
+The Linux kernel provides a sysfs interface to enumerate the current
+mitigation status of the system for Spectre: whether the system is
+vulnerable, and which mitigations are active.
+
+The sysfs file showing Spectre variant 1 mitigation status is:
+
+   /sys/devices/system/cpu/vulnerabilities/spectre_v1
+
+The possible values in this file are:
+
+  =======================================  =================================
+  'Mitigation: __user pointer sanitation'  Protection in kernel on a case by
+                                           case basis with explicit pointer
+                                           sanitation.
+  =======================================  =================================
+
+However, the protections are put in place on a case by case basis,
+and there is no guarantee that all possible attack vectors for Spectre
+variant 1 are covered.
+
+The spectre_v2 kernel file reports if the kernel has been compiled with
+retpoline mitigation or if the CPU has hardware mitigation, and if the
+CPU has support for additional process-specific mitigation.
+
+This file also reports CPU features enabled by microcode to mitigate
+attack between user processes:
+
+1. Indirect Branch Prediction Barrier (IBPB) to add additional
+   isolation between processes of different users.
+2. Single Thread Indirect Branch Predictors (STIBP) to add additional
+   isolation between CPU threads running on the same core.
+
+These CPU features may impact performance when used and can be enabled
+per process on a case-by-case basis.
+
+The sysfs file showing Spectre variant 2 mitigation status is:
+
+   /sys/devices/system/cpu/vulnerabilities/spectre_v2
+
+The possible values in this file are:
+
+  - Kernel status:
+
+  ====================================  =================================
+  'Not affected'                        The processor is not vulnerable
+  'Vulnerable'                          Vulnerable, no mitigation
+  'Mitigation: Full generic retpoline'  Software-focused mitigation
+  'Mitigation: Full AMD retpoline'      AMD-specific software mitigation
+  'Mitigation: Enhanced IBRS'           Hardware-focused mitigation
+  ====================================  =================================
+
+  - Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is
+    used to protect against Spectre variant 2 attacks when calling firmware (x86 only).
+
+  ========== =============================================================
+  'IBRS_FW'  Protection against user program attacks when calling firmware
+  ========== =============================================================
+
+  - Indirect branch prediction barrier (IBPB) status for protection between
+    processes of different users. This feature can be controlled through
+    prctl() per process, or through kernel command line options. This is
+    an x86 only feature. For more details see below.
+
+  ===================   ========================================================
+  'IBPB: disabled'      IBPB unused
+  'IBPB: always-on'     Use IBPB on all tasks
+  'IBPB: conditional'   Use IBPB on SECCOMP or indirect branch restricted tasks
+  ===================   ========================================================
+
+  - Single threaded indirect branch prediction (STIBP) status for protection
+    between different hyper threads. This feature can be controlled through
+    prctl() per process, or through kernel command line options. This is
+    an x86 only feature. For more details see below.
+
+  ====================  ========================================================
+  'STIBP: disabled'     STIBP unused
+  'STIBP: forced'       Use STIBP on all tasks
+  'STIBP: conditional'  Use STIBP on SECCOMP or indirect branch restricted tasks
+  ====================  ========================================================
+
+  - Return stack buffer (RSB) protection status:
+
+  =============   ===========================================
+  'RSB filling'   Protection of RSB on context switch enabled
+  =============   ===========================================
+
+Full mitigation might require a microcode update from the CPU
+vendor. When the necessary microcode is not available, the kernel will
+report vulnerability.
+
+Turning on mitigation for Spectre variant 1 and Spectre variant 2
+-----------------------------------------------------------------
+
+1. Kernel mitigation
+^^^^^^^^^^^^^^^^^^^^
+
+   For Spectre variant 1, vulnerable kernel code (as determined
+   by code audit or scanning tools) is annotated on a case by case
+   basis to use nospec accessor macros for bounds clipping :ref:`[2]
+   <spec_ref2>` to avoid any usable disclosure gadgets. However, it may
+   not cover all attack vectors for Spectre variant 1.
+
+   For Spectre variant 2 mitigation, the compiler turns indirect calls or
+   jumps in the kernel into equivalent return trampolines (retpolines)
+   :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
+   addresses.  Speculative execution paths under retpolines are trapped
+   in an infinite loop to prevent any speculative execution jumping to
+   a gadget.
+
+   To turn on retpoline mitigation on a vulnerable CPU, the kernel
+   needs to be compiled with a gcc compiler that supports the
+   -mindirect-branch=thunk-extern -mindirect-branch-register options.
+   If the kernel is compiled with a Clang compiler, the compiler needs
+   to support the -mretpoline-external-thunk option.  The kernel config
+   CONFIG_RETPOLINE needs to be turned on, and the CPU needs to run with
+   the latest updated microcode.
+
+   On Intel Skylake-era systems the mitigation covers most, but not all,
+   cases. See :ref:`[3] <spec_ref3>` for more details.
+
+   On CPUs with hardware mitigation for Spectre variant 2 (e.g. Enhanced
+   IBRS on x86), retpoline is automatically disabled at run time.
+
+   The retpoline mitigation is turned on by default on vulnerable
+   CPUs. It can be forced on or off by the administrator
+   via the kernel command line and sysfs control files. See
+   :ref:`spectre_mitigation_control_command_line`.
+
+   On x86, indirect branch restricted speculation is turned on by default
+   before invoking any firmware code to prevent Spectre variant 2 exploits
+   using the firmware.
+
+   Using kernel address space randomization (CONFIG_RANDOMIZE_BASE=y
+   and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes
+   attacks on the kernel generally more difficult.
+
+2. User program mitigation
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   User programs can mitigate Spectre variant 1 using LFENCE or "bounds
+   clipping". For more details see :ref:`[2] <spec_ref2>`.
+
+   For Spectre variant 2 mitigation, individual user programs
+   can be compiled with return trampolines for indirect branches.
+   This protects them from consuming poisoned entries in the branch
+   target buffer left by malicious software.  Alternatively, the
+   programs can disable their indirect branch speculation via prctl()
+   (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
+   On x86, this will turn on STIBP to guard against attacks from the
+   sibling thread when the user program is running, and use IBPB to
+   flush the branch target buffer when switching to/from the program.
+
+   Restricting indirect branch speculation on a user program will
+   also prevent the program from launching a variant 2 attack
+   on x86.  All sand-boxed SECCOMP programs have indirect branch
+   speculation restricted by default.  Administrators can change
+   that behavior via the kernel command line and sysfs control files.
+   See :ref:`spectre_mitigation_control_command_line`.
+
+   Programs that disable their indirect branch speculation will have
+   more overhead and run slower.
+
+   User programs should use address space randomization
+   (/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
+   difficult.
+
+3. VM mitigation
+^^^^^^^^^^^^^^^^
+
+   Within the kernel, Spectre variant 1 attacks from rogue guests are
+   mitigated on a case by case basis in VM exit paths. Vulnerable code
+   uses nospec accessor macros for "bounds clipping", to avoid any
+   usable disclosure gadgets.  However, this may not cover all variant
+   1 attack vectors.
+
+   For Spectre variant 2 attacks from rogue guests to the kernel, the
+   Linux kernel uses retpoline or Enhanced IBRS to prevent consumption of
+   poisoned entries in branch target buffer left by rogue guests.  It also
+   flushes the return stack buffer on every VM exit. This prevents a return
+   stack buffer underflow, which could cause poisoned branch target buffer
+   entries to be used, and discards poisoned entries left by attacker guests.
+
+   To mitigate guest-to-guest attacks in the same CPU hardware thread,
+   the branch target buffer is sanitized by flushing before switching
+   to a new guest on a CPU.
+
+   The above mitigations are turned on by default on vulnerable CPUs.
+
+   To mitigate guest-to-guest attacks from a sibling thread when SMT is
+   in use, an untrusted guest running in the sibling thread can have
+   its indirect branch speculation disabled by the administrator via prctl().
+
+   The kernel also allows guests to use any microcode based mitigation
+   they choose to use (such as IBPB or STIBP on x86) to protect themselves.
+
+.. _spectre_mitigation_control_command_line:
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+Spectre variant 2 mitigation can be disabled or force enabled at the
+kernel command line.
+
+	nospectre_v2
+
+		[X86] Disable all mitigations for the Spectre variant 2
+		(indirect branch prediction) vulnerability. System may
+		allow data leaks with this option, which is equivalent
+		to spectre_v2=off.
+
+
+        spectre_v2=
+
+		[X86] Control mitigation of Spectre variant 2
+		(indirect branch speculation) vulnerability.
+		The default operation protects the kernel from
+		user space attacks.
+
+		on
+			unconditionally enable, implies
+			spectre_v2_user=on
+		off
+			unconditionally disable, implies
+		        spectre_v2_user=off
+		auto
+			kernel detects whether your CPU model is
+		        vulnerable
+
+		Selecting 'on' will, and 'auto' may, choose a
+		mitigation method at run time according to the
+		CPU, the available microcode, the setting of the
+		CONFIG_RETPOLINE configuration option, and the
+		compiler with which the kernel was built.
+
+		Selecting 'on' will also enable the mitigation
+		against user space to user space task attacks.
+
+		Selecting 'off' will disable both the kernel and
+		the user space protections.
+
+		Specific mitigations can also be selected manually:
+
+		retpoline
+					replace indirect branches
+		retpoline,generic
+					Google's original retpoline
+		retpoline,amd
+					AMD-specific minimal thunk
+
+		Not specifying this option is equivalent to
+		spectre_v2=auto.
+
+For user space mitigation:
+
+        spectre_v2_user=
+
+		[X86] Control mitigation of Spectre variant 2
+		(indirect branch speculation) vulnerability between
+		user space tasks
+
+		on
+			Unconditionally enable mitigations. Is
+			enforced by spectre_v2=on
+
+		off
+			Unconditionally disable mitigations. Is
+			enforced by spectre_v2=off
+
+		prctl
+			Indirect branch speculation is enabled,
+			but mitigation can be enabled via prctl
+			per thread. The mitigation control state
+			is inherited on fork.
+
+		prctl,ibpb
+			Like "prctl" above, but only STIBP is
+			controlled per thread. IBPB is issued
+			always when switching between different user
+			space processes.
+
+		seccomp
+			Same as "prctl" above, but all seccomp
+			threads will enable the mitigation unless
+			they explicitly opt out.
+
+		seccomp,ibpb
+			Like "seccomp" above, but only STIBP is
+			controlled per thread. IBPB is issued
+			always when switching between different
+			user space processes.
+
+		auto
+			Kernel selects the mitigation depending on
+			the available CPU features and vulnerability.
+
+		Default mitigation:
+		If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
+
+		Not specifying this option is equivalent to
+		spectre_v2_user=auto.
+
+		In general the kernel by default selects
+		reasonable mitigations for the current CPU. To
+		disable Spectre variant 2 mitigations, boot with
+		spectre_v2=off. Spectre variant 1 mitigations
+		cannot be disabled.
+
+Mitigation selection guide
+--------------------------
+
+1. Trusted userspace
+^^^^^^^^^^^^^^^^^^^^
+
+   If all userspace applications are from trusted sources and do not
+   execute externally supplied untrusted code, then the mitigations can
+   be disabled.
+
+2. Protect sensitive programs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   For security-sensitive programs that have secrets (e.g. crypto
+   keys), protection against Spectre variant 2 can be put in place by
+   disabling indirect branch speculation when the program is running
+   (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
+
+3. Sandbox untrusted programs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+   Untrusted programs that could be a source of attacks can be cordoned
+   off by disabling their indirect branch speculation when they are run
+   (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
+   This prevents untrusted programs from polluting the branch target
+   buffer.  All programs running in SECCOMP sandboxes have indirect
+   branch speculation restricted by default. This behavior can be
+   changed via the kernel command line and sysfs control files. See
+   :ref:`spectre_mitigation_control_command_line`.
+
+4. High security mode
+^^^^^^^^^^^^^^^^^^^^^
+
+   All Spectre variant 2 mitigations can be forced on
+   at boot time for all programs (See the "on" option in
+   :ref:`spectre_mitigation_control_command_line`).  This will add
+   overhead as indirect branch speculations for all programs will be
+   restricted.
+
+   On x86, branch target buffer will be flushed with IBPB when switching
+   to a new program. STIBP is left on all the time to protect programs
+   against variant 2 attacks originating from programs running on
+   sibling threads.
+
+   Alternatively, STIBP can be used only when running programs
+   whose indirect branch speculation is explicitly disabled,
+   while IBPB is still used all the time when switching to a new
+   program to clear the branch target buffer (See "ibpb" option in
+   :ref:`spectre_mitigation_control_command_line`).  This "ibpb" option
+   has less performance cost than the "on" option, which leaves STIBP
+   on all the time.
+
+References on Spectre
+---------------------
+
+Intel white papers:
+
+.. _spec_ref1:
+
+[1] `Intel analysis of speculative execution side channels <https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf>`_.
+
+.. _spec_ref2:
+
+[2] `Bounds check bypass <https://software.intel.com/security-software-guidance/software-guidance/bounds-check-bypass>`_.
+
+.. _spec_ref3:
+
+[3] `Deep dive: Retpoline: A branch target injection mitigation <https://software.intel.com/security-software-guidance/insights/deep-dive-retpoline-branch-target-injection-mitigation>`_.
+
+.. _spec_ref4:
+
+[4] `Deep Dive: Single Thread Indirect Branch Predictors <https://software.intel.com/security-software-guidance/insights/deep-dive-single-thread-indirect-branch-predictors>`_.
+
+AMD white papers:
+
+.. _spec_ref5:
+
+[5] `AMD64 technology indirect branch control extension <https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf>`_.
+
+.. _spec_ref6:
+
+[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf>`_.
+
+ARM white papers:
+
+.. _spec_ref7:
+
+[7] `Cache speculation side-channels <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper>`_.
+
+.. _spec_ref8:
+
+[8] `Cache speculation issues update <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/latest-updates/cache-speculation-issues-update>`_.
+
+Google white paper:
+
+.. _spec_ref9:
+
+[9] `Retpoline: a software construct for preventing branch-target-injection <https://support.google.com/faqs/answer/7625886>`_.
+
+MIPS white paper:
+
+.. _spec_ref10:
+
+[10] `MIPS: response on speculative execution and side channel vulnerabilities <https://www.mips.com/blog/mips-response-on-speculative-execution-and-side-channel-vulnerabilities/>`_.
+
+Academic papers:
+
+.. _spec_ref11:
+
+[11] `Spectre Attacks: Exploiting Speculative Execution <https://spectreattack.com/spectre.pdf>`_.
+
+.. _spec_ref12:
+
+[12] `NetSpectre: Read Arbitrary Memory over Network <https://arxiv.org/abs/1807.10535>`_.
+
+.. _spec_ref13:
+
+[13] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://www.usenix.org/system/files/conference/woot18/woot18-paper-koruyeh.pdf>`_.
diff --git a/Documentation/userspace-api/spec_ctrl.rst b/Documentation/userspace-api/spec_ctrl.rst
index 1129c7550a48..7ddd8f667459 100644
--- a/Documentation/userspace-api/spec_ctrl.rst
+++ b/Documentation/userspace-api/spec_ctrl.rst
@@ -49,6 +49,8 @@ If PR_SPEC_PRCTL is set, then the per-task control of the mitigation is
 available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
 misfeature will fail.
 
+.. _set_spec_ctrl:
+
 PR_SET_SPECULATION_CTRL
 -----------------------
 
-- 
cgit v1.2.3-59-g8ed1b
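
Note on the prctl() path described in the "User program mitigation" and
"Protect sensitive programs" sections of the patch above. The following is
a minimal sketch, not part of the patch, of how a task might restrict its
own indirect branch speculation. It assumes a Linux kernel recent enough to
support PR_SPEC_INDIRECT_BRANCH. The helper names
spec_ctrl_is_controllable() and restrict_indirect_branch_speculation() are
hypothetical, and the fallback constants are copied from
include/uapi/linux/prctl.h for builds against older userspace headers.

```c
#include <sys/prctl.h>

/* Fallbacks for older userspace headers (values from the kernel UAPI). */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SET_SPECULATION_CTRL
#define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_PRCTL (1UL << 0)
#endif
#ifndef PR_SPEC_DISABLE
#define PR_SPEC_DISABLE (1UL << 2)
#endif

/*
 * Hypothetical helper: returns 1 if the value reported by
 * PR_GET_SPECULATION_CTRL says per-task control is available, else 0.
 * A negative value means the query itself failed.
 */
static int spec_ctrl_is_controllable(int state)
{
	return state >= 0 && (state & PR_SPEC_PRCTL) != 0;
}

/*
 * Hypothetical helper: ask the kernel to restrict indirect branch
 * speculation for the calling task.
 * Returns 0 on success, -1 if the control is unavailable or the call fails.
 */
static int restrict_indirect_branch_speculation(void)
{
	int state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
			  0, 0, 0);

	if (!spec_ctrl_is_controllable(state))
		return -1;

	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
		  PR_SPEC_DISABLE, 0, 0) != 0)
		return -1;

	return 0;
}
```

A program would typically call restrict_indirect_branch_speculation() early,
before it handles secrets, and treat a -1 return as "mitigation not applied"
rather than as a fatal error. As the patch text notes, restricting indirect
branch speculation adds overhead.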


From cca5e0b8a430c888c5de1b5d36b87c085354f2c8 Mon Sep 17 00:00:00 2001
From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Date: Wed, 26 Jun 2019 11:49:42 -0600
Subject: Documentation: PGP: update for newer HW devices

Newer devices like Yubikey 5 and Nitrokey Pro 2 have added support for
NISTP's implementation of ECC cryptography, so update the guide
accordingly and add a note on when to use nistp256 and when to use
ed25519 for generating S keys.

Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/process/maintainer-pgp-guide.rst | 31 ++++++++++++++------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/Documentation/process/maintainer-pgp-guide.rst b/Documentation/process/maintainer-pgp-guide.rst
index 4bab7464ff8c..17db11b7ed48 100644
--- a/Documentation/process/maintainer-pgp-guide.rst
+++ b/Documentation/process/maintainer-pgp-guide.rst
@@ -238,7 +238,10 @@ your new subkey::
     work.
 
     If for some reason you prefer to stay with RSA subkeys, just replace
-    "ed25519" with "rsa2048" in the above command.
+    "ed25519" with "rsa2048" in the above command. Additionally, if you
+    plan to use a hardware device that does not support ED25519 ECC
+    keys, like Nitrokey Pro or a Yubikey, then you should use
+    "nistp256" instead of "ed25519."
 
 
 Back up your master key for disaster recovery
@@ -432,23 +435,23 @@ Available smartcard devices
 
 Unless all your laptops and workstations have smartcard readers, the
 easiest is to get a specialized USB device that implements smartcard
-functionality.  There are several options available:
+functionality. There are several options available:
 
 - `Nitrokey Start`_: Open hardware and Free Software, based on FSI
-  Japan's `Gnuk`_. Offers support for ECC keys, but fewest security
-  features (such as resistance to tampering or some side-channel
-  attacks).
-- `Nitrokey Pro`_: Similar to the Nitrokey Start, but more
-  tamper-resistant and offers more security features, but no ECC
-  support.
-- `Yubikey 4`_: proprietary hardware and software, but cheaper than
+  Japan's `Gnuk`_. One of the few available commercial devices that
+  support ED25519 ECC keys, but offers the fewest security features (such as
+  resistance to tampering or some side-channel attacks).
+- `Nitrokey Pro 2`_: Similar to the Nitrokey Start, but more
+  tamper-resistant and offers more security features. Pro 2 supports ECC
+  cryptography (NISTP).
+- `Yubikey 5`_: proprietary hardware and software, but cheaper than
   Nitrokey Pro and comes available in the USB-C form that is more useful
   with newer laptops. Offers additional security features such as FIDO
-  U2F, but no ECC.
+  U2F, among others, and now finally supports ECC keys (NISTP).
 
 `LWN has a good review`_ of some of the above models, as well as several
-others. If you want to use ECC keys, your best bet among commercially
-available devices is the Nitrokey Start.
+others. Your choice will depend on cost, shipping availability in your
+geographical region, and open/proprietary hardware considerations.
 
 .. note::
 
@@ -457,8 +460,8 @@ available devices is the Nitrokey Start.
     Foundation.
 
 .. _`Nitrokey Start`: https://shop.nitrokey.com/shop/product/nitrokey-start-6
-.. _`Nitrokey Pro`: https://shop.nitrokey.com/shop/product/nitrokey-pro-3
-.. _`Yubikey 4`: https://www.yubico.com/product/yubikey-4-series/
+.. _`Nitrokey Pro 2`: https://shop.nitrokey.com/shop/product/nitrokey-pro-2-3
+.. _`Yubikey 5`: https://www.yubico.com/products/yubikey-5-overview/
 .. _Gnuk: http://www.fsij.org/doc-gnuk/
 .. _`LWN has a good review`: https://lwn.net/Articles/736231/
 .. _`qualify for a free Nitrokey Start`: https://www.kernel.org/nitrokey-digital-tokens-for-kernel-developers.html
-- 
cgit v1.2.3-59-g8ed1b


From b4f4174ae982dfc855c56e91776920e0166da1bf Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Sat, 22 Jun 2019 14:47:46 -0300
Subject: docs: zh_CN: submitting-drivers.rst: Remove a duplicated
 Documentation/

Somehow, this file ended with Documentation/ twice.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/translations/zh_CN/process/submitting-drivers.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/translations/zh_CN/process/submitting-drivers.rst b/Documentation/translations/zh_CN/process/submitting-drivers.rst
index 72c6cd935821..72f4f45c98de 100644
--- a/Documentation/translations/zh_CN/process/submitting-drivers.rst
+++ b/Documentation/translations/zh_CN/process/submitting-drivers.rst
@@ -22,7 +22,7 @@
 兴趣的是显卡驱动程序，你也许应该访问 XFree86 项目(http://www.xfree86.org/)
 和／或 X.org 项目 (http://x.org)。
 
-另请参阅 Documentation/Documentation/translations/zh_CN/process/submitting-patches.rst 文档。
+另请参阅 Documentation/translations/zh_CN/process/submitting-patches.rst 文档。
 
 
 分配设备号
-- 
cgit v1.2.3-59-g8ed1b


From 8c69b77a0175d6e14df9cdf386a8b69f6cfa2c6a Mon Sep 17 00:00:00 2001
From: Mike Rapoport <rppt@linux.ibm.com>
Date: Mon, 24 Jun 2019 08:25:07 +0300
Subject: scripts/sphinx-pre-install: fix out-of-tree build

Build of htmldocs fails for out-of-tree builds:

$ make V=1 O=~/build/kernel/ htmldocs
make -C /home/rppt/build/kernel -f /home/rppt/git/linux-docs/Makefile htmldocs
make[1]: Entering directory '/home/rppt/build/kernel'
make -f /home/rppt/git/linux-docs/scripts/Makefile.build obj=scripts/basic
rm -f .tmp_quiet_recordmcount
make -f /home/rppt/git/linux-docs/scripts/Makefile.build obj=Documentation htmldocs
Can't open Documentation/conf.py at /home/rppt/git/linux-docs/scripts/sphinx-pre-install line 230.
/home/rppt/git/linux-docs/Documentation/Makefile:80: recipe for target 'htmldocs' failed
make[2]: *** [htmldocs] Error 2

The scripts/sphinx-pre-install is trying to open files in the current
directory which is $KBUILD_OUTPUT rather than in $srctree.

Fix it.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/sphinx-pre-install | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install
index 0b44d51c4991..f230e65329a2 100755
--- a/scripts/sphinx-pre-install
+++ b/scripts/sphinx-pre-install
@@ -5,8 +5,11 @@ use strict;
 # Copyright (c) 2017-2019 Mauro Carvalho Chehab <mchehab@kernel.org>
 #
 
-my $conf = "Documentation/conf.py";
-my $requirement_file = "Documentation/sphinx/requirements.txt";
+my $prefix = "./";
+$prefix = "$ENV{'srctree'}/" if ($ENV{'srctree'});
+
+my $conf = $prefix . "Documentation/conf.py";
+my $requirement_file = $prefix . "Documentation/sphinx/requirements.txt";
 my $virtenv_prefix = "sphinx_";
 
 #
-- 
cgit v1.2.3-59-g8ed1b


From 7c116d22ad23809767c5ec06affa19b9bb163d97 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Wed, 26 Jun 2019 10:35:11 -0300
Subject: docs: filesystems: Remove uneeded .rst extension on toctables

There's no need to use the .rst extension in Sphinx toctree entries. As
most of the Documentation doesn't use it, remove the remaining occurrences.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/ext4/index.rst | 8 ++++----
 Documentation/filesystems/index.rst      | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/ext4/index.rst b/Documentation/filesystems/ext4/index.rst
index 3be3e54d480d..705d813d558f 100644
--- a/Documentation/filesystems/ext4/index.rst
+++ b/Documentation/filesystems/ext4/index.rst
@@ -8,7 +8,7 @@ ext4 Data Structures and Algorithms
    :maxdepth: 6
    :numbered:
 
-   about.rst
-   overview.rst
-   globals.rst
-   dynamic.rst
+   about
+   overview
+   globals
+   dynamic
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 35644840a690..1651173f1118 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -17,7 +17,7 @@ algorithms work.
    :maxdepth: 2
 
    vfs
-   path-lookup.rst
+   path-lookup
    api-summary
    splice
 
@@ -41,4 +41,4 @@ Documentation for individual filesystem types can be found here.
 .. toctree::
    :maxdepth: 2
 
-   binderfs.rst
+   binderfs
-- 
cgit v1.2.3-59-g8ed1b


From a9f0969cd7b3b3653a10ef4a5d62075aa4a5a27f Mon Sep 17 00:00:00 2001
From: Jiunn Chang <c0d1n61at3@gmail.com>
Date: Wed, 26 Jun 2019 15:07:01 -0500
Subject: Documentation: RCU: Convert RCU basic concepts to reST

RCU basic concepts reST markup.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Jiunn Chang <c0d1n61at3@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/RCU/rcu.txt | 119 ++++++++++++++++++++++++----------------------
 1 file changed, 61 insertions(+), 58 deletions(-)

diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
index c818cf65c5a9..8dfb437dacc3 100644
--- a/Documentation/RCU/rcu.txt
+++ b/Documentation/RCU/rcu.txt
@@ -1,5 +1,7 @@
-RCU Concepts
+.. _rcu_doc:
 
+RCU Concepts
+============
 
 The basic idea behind RCU (read-copy update) is to split destructive
 operations into two parts, one that prevents anyone from seeing the data
@@ -8,82 +10,83 @@ A "grace period" must elapse between the two parts, and this grace period
 must be long enough that any readers accessing the item being deleted have
 since dropped their references.  For example, an RCU-protected deletion
 from a linked list would first remove the item from the list, wait for
-a grace period to elapse, then free the element.  See the listRCU.txt
-file for more information on using RCU with linked lists.
-
+a grace period to elapse, then free the element.  See the
+Documentation/RCU/listRCU.rst file for more information on using RCU with
+linked lists.
 
 Frequently Asked Questions
+--------------------------
 
-o	Why would anyone want to use RCU?
+- Why would anyone want to use RCU?
 
-	The advantage of RCU's two-part approach is that RCU readers need
-	not acquire any locks, perform any atomic instructions, write to
-	shared memory, or (on CPUs other than Alpha) execute any memory
-	barriers.  The fact that these operations are quite expensive
-	on modern CPUs is what gives RCU its performance advantages
-	in read-mostly situations.  The fact that RCU readers need not
-	acquire locks can also greatly simplify deadlock-avoidance code.
+  The advantage of RCU's two-part approach is that RCU readers need
+  not acquire any locks, perform any atomic instructions, write to
+  shared memory, or (on CPUs other than Alpha) execute any memory
+  barriers.  The fact that these operations are quite expensive
+  on modern CPUs is what gives RCU its performance advantages
+  in read-mostly situations.  The fact that RCU readers need not
+  acquire locks can also greatly simplify deadlock-avoidance code.
 
-o	How can the updater tell when a grace period has completed
-	if the RCU readers give no indication when they are done?
+- How can the updater tell when a grace period has completed
+  if the RCU readers give no indication when they are done?
 
-	Just as with spinlocks, RCU readers are not permitted to
-	block, switch to user-mode execution, or enter the idle loop.
-	Therefore, as soon as a CPU is seen passing through any of these
-	three states, we know that that CPU has exited any previous RCU
-	read-side critical sections.  So, if we remove an item from a
-	linked list, and then wait until all CPUs have switched context,
-	executed in user mode, or executed in the idle loop, we can
-	safely free up that item.
+  Just as with spinlocks, RCU readers are not permitted to
+  block, switch to user-mode execution, or enter the idle loop.
+  Therefore, as soon as a CPU is seen passing through any of these
+  three states, we know that that CPU has exited any previous RCU
+  read-side critical sections.  So, if we remove an item from a
+  linked list, and then wait until all CPUs have switched context,
+  executed in user mode, or executed in the idle loop, we can
+  safely free up that item.
 
-	Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
-	same effect, but require that the readers manipulate CPU-local
-	counters.  These counters allow limited types of blocking within
-	RCU read-side critical sections.  SRCU also uses CPU-local
-	counters, and permits general blocking within RCU read-side
-	critical sections.  These variants of RCU detect grace periods
-	by sampling these counters.
+  Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
+  same effect, but require that the readers manipulate CPU-local
+  counters.  These counters allow limited types of blocking within
+  RCU read-side critical sections.  SRCU also uses CPU-local
+  counters, and permits general blocking within RCU read-side
+  critical sections.  These variants of RCU detect grace periods
+  by sampling these counters.
 
-o	If I am running on a uniprocessor kernel, which can only do one
-	thing at a time, why should I wait for a grace period?
+- If I am running on a uniprocessor kernel, which can only do one
+  thing at a time, why should I wait for a grace period?
 
-	See the UP.txt file in this directory.
+  See the Documentation/RCU/UP.rst file for more information.
 
-o	How can I see where RCU is currently used in the Linux kernel?
+- How can I see where RCU is currently used in the Linux kernel?
 
-	Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
-	"rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
-	"srcu_read_unlock", "synchronize_rcu", "synchronize_net",
-	"synchronize_srcu", and the other RCU primitives.  Or grab one
-	of the cscope databases from:
+  Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
+  "rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
+  "srcu_read_unlock", "synchronize_rcu", "synchronize_net",
+  "synchronize_srcu", and the other RCU primitives.  Or grab one
+  of the cscope databases from:
 
-	http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html
+  (http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html).
 
-o	What guidelines should I follow when writing code that uses RCU?
+- What guidelines should I follow when writing code that uses RCU?
 
-	See the checklist.txt file in this directory.
+  See the checklist.txt file in this directory.
 
-o	Why the name "RCU"?
+- Why the name "RCU"?
 
-	"RCU" stands for "read-copy update".  The file listRCU.txt has
-	more information on where this name came from, search for
-	"read-copy update" to find it.
+  "RCU" stands for "read-copy update".  The file Documentation/RCU/listRCU.rst
+  has more information on where this name came from, search for
+  "read-copy update" to find it.
 
-o	I hear that RCU is patented?  What is with that?
+- I hear that RCU is patented?  What is with that?
 
-	Yes, it is.  There are several known patents related to RCU,
-	search for the string "Patent" in RTFP.txt to find them.
-	Of these, one was allowed to lapse by the assignee, and the
-	others have been contributed to the Linux kernel under GPL.
-	There are now also LGPL implementations of user-level RCU
-	available (http://liburcu.org/).
+  Yes, it is.  There are several known patents related to RCU,
+  search for the string "Patent" in RTFP.txt to find them.
+  Of these, one was allowed to lapse by the assignee, and the
+  others have been contributed to the Linux kernel under GPL.
+  There are now also LGPL implementations of user-level RCU
+  available (http://liburcu.org/).
 
-o	I hear that RCU needs work in order to support realtime kernels?
+- I hear that RCU needs work in order to support realtime kernels?
 
-	Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
-	kernel configuration parameter.
+  Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
+  kernel configuration parameter.
 
-o	Where can I find more information on RCU?
+- Where can I find more information on RCU?
 
-	See the RTFP.txt file in this directory.
-	Or point your browser at http://www.rdrop.com/users/paulmck/RCU/.
+  See the RTFP.txt file in this directory.
+  Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
-- 
cgit v1.2.3-59-g8ed1b


From 9422dc24df62ada59ead9e4a499f036b4ce670fc Mon Sep 17 00:00:00 2001
From: Jiunn Chang <c0d1n61at3@gmail.com>
Date: Wed, 26 Jun 2019 15:07:02 -0500
Subject: Documentation: RCU: Convert RCU linked list to reST

RCU linked list reST markup.

Signed-off-by: Jiunn Chang <c0d1n61at3@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/RCU/listRCU.txt | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/Documentation/RCU/listRCU.txt b/Documentation/RCU/listRCU.txt
index adb5a3782846..7956ff33042b 100644
--- a/Documentation/RCU/listRCU.txt
+++ b/Documentation/RCU/listRCU.txt
@@ -1,5 +1,7 @@
-Using RCU to Protect Read-Mostly Linked Lists
+.. _list_rcu_doc:
 
+Using RCU to Protect Read-Mostly Linked Lists
+=============================================
 
 One of the best applications of RCU is to protect read-mostly linked lists
 ("struct list_head" in list.h).  One big advantage of this approach
@@ -7,8 +9,8 @@ is that all of the required memory barriers are included for you in
 the list macros.  This document describes several applications of RCU,
 with the best fits first.
 
-
 Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
+----------------------------------------------------------------------
 
 The best applications are cases where, if reader-writer locking were
 used, the read-side lock would be dropped before taking any action
@@ -24,7 +26,7 @@ added or deleted, rather than being modified in place.
 
 A straightforward example of this use of RCU may be found in the
 system-call auditing support.  For example, a reader-writer locked
-implementation of audit_filter_task() might be as follows:
+implementation of audit_filter_task() might be as follows::
 
 	static enum audit_state audit_filter_task(struct task_struct *tsk)
 	{
@@ -48,7 +50,7 @@ the corresponding value is returned.  By the time that this value is acted
 on, the list may well have been modified.  This makes sense, since if
 you are turning auditing off, it is OK to audit a few extra system calls.
 
-This means that RCU can be easily applied to the read side, as follows:
+This means that RCU can be easily applied to the read side, as follows::
 
 	static enum audit_state audit_filter_task(struct task_struct *tsk)
 	{
@@ -73,7 +75,7 @@ become list_for_each_entry_rcu().  The _rcu() list-traversal primitives
 insert the read-side memory barriers that are required on DEC Alpha CPUs.
 
 The changes to the update side are also straightforward.  A reader-writer
-lock might be used as follows for deletion and insertion:
+lock might be used as follows for deletion and insertion::
 
 	static inline int audit_del_rule(struct audit_rule *rule,
 					 struct list_head *list)
@@ -106,7 +108,7 @@ lock might be used as follows for deletion and insertion:
 		return 0;
 	}
 
-Following are the RCU equivalents for these two functions:
+Following are the RCU equivalents for these two functions::
 
 	static inline int audit_del_rule(struct audit_rule *rule,
 					 struct list_head *list)
@@ -154,13 +156,13 @@ otherwise cause concurrent readers to fail spectacularly.
 So, when readers can tolerate stale data and when entries are either added
 or deleted, without in-place modification, it is very easy to use RCU!
 
-
 Example 2: Handling In-Place Updates
+------------------------------------
 
 The system-call auditing code does not update auditing rules in place.
 However, if it did, reader-writer-locked code to do so might look as
 follows (presumably, the field_count is only permitted to decrease,
-otherwise, the added fields would need to be filled in):
+otherwise, the added fields would need to be filled in)::
 
 	static inline int audit_upd_rule(struct audit_rule *rule,
 					 struct list_head *list,
@@ -187,7 +189,7 @@ otherwise, the added fields would need to be filled in):
 The RCU version creates a copy, updates the copy, then replaces the old
 entry with the newly updated entry.  This sequence of actions, allowing
 concurrent reads while doing a copy to perform an update, is what gives
-RCU ("read-copy update") its name.  The RCU code is as follows:
+RCU ("read-copy update") its name.  The RCU code is as follows::
 
 	static inline int audit_upd_rule(struct audit_rule *rule,
 					 struct list_head *list,
@@ -216,8 +218,8 @@ RCU ("read-copy update") its name.  The RCU code is as follows:
 Again, this assumes that the caller holds audit_netlink_sem.  Normally,
 the reader-writer lock would become a spinlock in this sort of code.
 
-
 Example 3: Eliminating Stale Data
+---------------------------------
 
 The auditing examples above tolerate stale data, as do most algorithms
 that are tracking external state.  Because there is a delay from the
@@ -231,13 +233,16 @@ per-entry spinlock, and, if the "deleted" flag is set, pretends that the
 entry does not exist.  For this to be helpful, the search function must
 return holding the per-entry spinlock, as ipc_lock() does in fact do.
 
-Quick Quiz:  Why does the search function need to return holding the
-	per-entry lock for this deleted-flag technique to be helpful?
+Quick Quiz:
+	Why does the search function need to return holding the per-entry lock for
+	this deleted-flag technique to be helpful?
+
+:ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
 
 If the system-call audit module were to ever need to reject stale data,
 one way to accomplish this would be to add a "deleted" flag and a "lock"
 spinlock to the audit_entry structure, and modify audit_filter_task()
-as follows:
+as follows::
 
 	static enum audit_state audit_filter_task(struct task_struct *tsk)
 	{
@@ -268,7 +273,7 @@ audit_upd_rule() would need additional memory barriers to ensure
 that the list_add_rcu() was really executed before the list_del_rcu().
 
 The audit_del_rule() function would need to set the "deleted"
-flag under the spinlock as follows:
+flag under the spinlock as follows::
 
 	static inline int audit_del_rule(struct audit_rule *rule,
 					 struct list_head *list)
@@ -290,8 +295,8 @@ flag under the spinlock as follows:
 		return -EFAULT;		/* No matching rule */
 	}
 
-
 Summary
+-------
 
 Read-mostly list-based data structures that can tolerate stale data are
 the most amenable to use of RCU.  The simplest case is where entries are
@@ -302,8 +307,9 @@ If stale data cannot be tolerated, then a "deleted" flag may be used
 in conjunction with a per-entry spinlock in order to allow the search
 function to reject newly deleted data.
 
+.. _answer_quick_quiz_list:
 
-Answer to Quick Quiz
+Answer to Quick Quiz:
 	Why does the search function need to return holding the per-entry
 	lock for this deleted-flag technique to be helpful?
 
-- 
cgit v1.2.3-59-g8ed1b


From 2a5b0c841a9932e3562c9b0dfddeb54de255a595 Mon Sep 17 00:00:00 2001
From: Jiunn Chang <c0d1n61at3@gmail.com>
Date: Wed, 26 Jun 2019 15:07:03 -0500
Subject: Documentation: RCU: Convert RCU UP systems to reST

RCU UP systems reST markup.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Jiunn Chang <c0d1n61at3@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
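Illustration for "Example 1: softirq Suicide" in the document converted
below. This is a minimal userspace sketch in plain C, not kernel code.
Every name in it (struct elem, the "freed" flag, the toy_* functions,
scan_interrupted_at_b()) is a hypothetical stand-in introduced here. It
contrasts a call_rcu()-like function that invokes its callback
immediately with one that queues the callback until a simulated end of
the grace period.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct elem {
	int value;
	int freed;		/* set instead of really freeing, so it can be checked */
	struct elem *next;
};

typedef void (*toy_cb_t)(struct elem *);
typedef void (*toy_call_rcu_t)(struct elem *, toy_cb_t);

#define TOY_MAX_CBS 8
static struct {
	toy_cb_t func;
	struct elem *arg;
} toy_pending[TOY_MAX_CBS];
static int toy_npending;

/* Callback standing in for the function that frees an element. */
static void toy_mark_freed(struct elem *e)
{
	e->freed = 1;
}

/* Broken variant: invokes the callback right away. */
static void toy_call_rcu_immediate(struct elem *e, toy_cb_t func)
{
	func(e);
}

/* Deferred variant: only queues the callback. */
static void toy_call_rcu_deferred(struct elem *e, toy_cb_t func)
{
	assert(toy_npending < TOY_MAX_CBS);
	toy_pending[toy_npending].func = func;
	toy_pending[toy_npending].arg = e;
	toy_npending++;
}

/* Stand-in for the end of a grace period: run all queued callbacks. */
static void toy_grace_period_end(void)
{
	int i;

	for (i = 0; i < toy_npending; i++)
		toy_pending[i].func(toy_pending[i].arg);
	toy_npending = 0;
}

/*
 * The "reader" references element B, a simulated softirq unlinks B and
 * passes it to the given call_rcu variant, then the reader resumes.
 * Returns 1 if the reader found B already freed while still using it.
 */
static int scan_interrupted_at_b(toy_call_rcu_t call_rcu_fn)
{
	struct elem c = { 3, 0, NULL };
	struct elem b = { 2, 0, &c };
	struct elem a = { 1, 0, &b };
	struct elem *cur = a.next;	/* reader is referencing B */
	int saw_freed;

	a.next = cur->next;		/* "softirq" unlinks B ... */
	call_rcu_fn(cur, toy_mark_freed);	/* ... and requests freeing */

	saw_freed = cur->freed;		/* reader resumes, still holding B */
	toy_grace_period_end();		/* reader is done; callbacks may run */
	return saw_freed;
}
```

In this model, scan_interrupted_at_b(toy_call_rcu_immediate) returns 1,
meaning the reader is left holding an element that has already been
"freed", while scan_interrupted_at_b(toy_call_rcu_deferred) returns 0.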
 Documentation/RCU/UP.txt | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/Documentation/RCU/UP.txt b/Documentation/RCU/UP.txt
index 53bde717017b..67715a47ae89 100644
--- a/Documentation/RCU/UP.txt
+++ b/Documentation/RCU/UP.txt
@@ -1,17 +1,19 @@
-RCU on Uniprocessor Systems
+.. _up_doc:
 
+RCU on Uniprocessor Systems
+===========================
 
 A common misconception is that, on UP systems, the call_rcu() primitive
 may immediately invoke its function.  The basis of this misconception
 is that since there is only one CPU, it should not be necessary to
 wait for anything else to get done, since there are no other CPUs for
-anything else to be happening on.  Although this approach will -sort- -of-
+anything else to be happening on.  Although this approach will *sort of*
 work a surprising amount of the time, it is a very bad idea in general.
 This document presents three examples that demonstrate exactly how bad
 an idea this is.
 
-
 Example 1: softirq Suicide
+--------------------------
 
 Suppose that an RCU-based algorithm scans a linked list containing
 elements A, B, and C in process context, and can delete elements from
@@ -28,8 +30,8 @@ your kernel.
 This same problem can occur if call_rcu() is invoked from a hardware
 interrupt handler.
 
-
 Example 2: Function-Call Fatality
+---------------------------------
 
 Of course, one could avert the suicide described in the preceding example
 by having call_rcu() directly invoke its arguments only if it was called
@@ -46,11 +48,13 @@ its arguments would cause it to fail to make the fundamental guarantee
 underlying RCU, namely that call_rcu() defers invoking its arguments until
 all RCU read-side critical sections currently executing have completed.
 
-Quick Quiz #1: why is it -not- legal to invoke synchronize_rcu() in
-	this case?
+Quick Quiz #1:
+	Why is it *not* legal to invoke synchronize_rcu() in this case?
 
+:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
 
 Example 3: Death by Deadlock
+----------------------------
 
 Suppose that call_rcu() is invoked while holding a lock, and that the
 callback function must acquire this same lock.  In this case, if
@@ -76,25 +80,30 @@ there are cases where this can be quite ugly:
 If call_rcu() directly invokes the callback, painful locking restrictions
 or API changes would be required.
 
-Quick Quiz #2: What locking restriction must RCU callbacks respect?
+Quick Quiz #2:
+	What locking restriction must RCU callbacks respect?
 
+:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
 
 Summary
+-------
 
 Permitting call_rcu() to immediately invoke its arguments breaks RCU,
 even on a UP system.  So do not do it!  Even on a UP system, the RCU
-infrastructure -must- respect grace periods, and -must- invoke callbacks
+infrastructure *must* respect grace periods, and *must* invoke callbacks
 from a known environment in which no locks are held.
 
-Note that it -is- safe for synchronize_rcu() to return immediately on
-UP systems, including !PREEMPT SMP builds running on UP systems.
+Note that it *is* safe for synchronize_rcu() to return immediately on
+UP systems, including !PREEMPT SMP builds running on UP systems.
 
-Quick Quiz #3: Why can't synchronize_rcu() return immediately on
-	UP systems running preemptable RCU?
+Quick Quiz #3:
+	Why can't synchronize_rcu() return immediately on UP systems running
+	preemptable RCU?
 
+.. _answer_quick_quiz_up:
 
 Answer to Quick Quiz #1:
-	Why is it -not- legal to invoke synchronize_rcu() in this case?
+	Why is it *not* legal to invoke synchronize_rcu() in this case?
 
 	Because the calling function is scanning an RCU-protected linked
 	list, and is therefore within an RCU read-side critical section.
@@ -119,7 +128,7 @@ Answer to Quick Quiz #2:
 
 	This restriction might seem gratuitous, since very few RCU
 	callbacks acquire locks directly.  However, a great many RCU
-	callbacks do acquire locks -indirectly-, for example, via
+	callbacks do acquire locks *indirectly*, for example, via
 	the kfree() primitive.
 
 Answer to Quick Quiz #3:
-- 
cgit v1.2.3-59-g8ed1b


From f93a3e4e8705eb3ea17dcd68819b60875c834bad Mon Sep 17 00:00:00 2001
From: Jiunn Chang <c0d1n61at3@gmail.com>
Date: Wed, 26 Jun 2019 15:07:04 -0500
Subject: Documentation: RCU: Rename txt files to rst

Rename the following files to reST:
  - rcu.txt
  - listRCU.txt
  - UP.txt

Signed-off-by: Jiunn Chang <c0d1n61at3@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
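Illustration for "Example 2: Handling In-Place Updates" in listRCU.rst
below. This is a minimal userspace sketch of the copy, update, replace
sequence using C11 atomics, not kernel code. The names struct toy_rule,
toy_current, toy_read() and toy_update() are hypothetical. It assumes a
single updater at a time (much as the document assumes the caller holds
audit_netlink_sem) and leaves out grace-period tracking entirely: the
old version is handed back to the caller, which must not free it until
no reader can still hold it. In the kernel that deferral is the job of
call_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_rule {
	unsigned int action;
	unsigned int field_count;
};

/* Pointer to the current version; readers only ever load it. */
static _Atomic(struct toy_rule *) toy_current;

/* Reader: one acquire load yields a pointer to a consistent version. */
static struct toy_rule *toy_read(void)
{
	return atomic_load_explicit(&toy_current, memory_order_acquire);
}

/*
 * Updater: copy the current version, modify the copy, then publish it.
 * The previous version (possibly NULL) is stored in *oldp so that the
 * caller can free it once no reader can still be using it.
 * Returns 0 on success, -1 if allocation fails.
 */
static int toy_update(unsigned int action, unsigned int field_count,
		      struct toy_rule **oldp)
{
	struct toy_rule *old;
	struct toy_rule *fresh;

	old = atomic_load_explicit(&toy_current, memory_order_relaxed);
	fresh = malloc(sizeof(*fresh));
	if (fresh == NULL)
		return -1;
	if (old != NULL)
		*fresh = *old;			/* copy */
	fresh->action = action;			/* update the copy */
	fresh->field_count = field_count;
	/* Release store: the copy is fully initialized before it is visible. */
	atomic_store_explicit(&toy_current, fresh, memory_order_release);
	*oldp = old;
	return 0;
}
```

A reader that loaded the pointer before an update keeps seeing the old,
unmodified version, while readers that load it afterwards see the new
one. No reader ever observes a half-updated rule.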
 Documentation/RCU/UP.rst      | 142 +++++++++++++++++++
 Documentation/RCU/UP.txt      | 142 -------------------
 Documentation/RCU/listRCU.rst | 321 ++++++++++++++++++++++++++++++++++++++++++
 Documentation/RCU/listRCU.txt | 321 ------------------------------------------
 Documentation/RCU/rcu.rst     |  92 ++++++++++++
 Documentation/RCU/rcu.txt     |  92 ------------
 6 files changed, 555 insertions(+), 555 deletions(-)
 create mode 100644 Documentation/RCU/UP.rst
 delete mode 100644 Documentation/RCU/UP.txt
 create mode 100644 Documentation/RCU/listRCU.rst
 delete mode 100644 Documentation/RCU/listRCU.txt
 create mode 100644 Documentation/RCU/rcu.rst
 delete mode 100644 Documentation/RCU/rcu.txt

diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst
new file mode 100644
index 000000000000..67715a47ae89
--- /dev/null
+++ b/Documentation/RCU/UP.rst
@@ -0,0 +1,142 @@
+.. _up_doc:
+
+RCU on Uniprocessor Systems
+===========================
+
+A common misconception is that, on UP systems, the call_rcu() primitive
+may immediately invoke its function.  The basis of this misconception
+is that since there is only one CPU, it should not be necessary to
+wait for anything else to get done, since there are no other CPUs for
+anything else to be happening on.  Although this approach will *sort of*
+work a surprising amount of the time, it is a very bad idea in general.
+This document presents three examples that demonstrate exactly how bad
+an idea this is.
+
+Example 1: softirq Suicide
+--------------------------
+
+Suppose that an RCU-based algorithm scans a linked list containing
+elements A, B, and C in process context, and can delete elements from
+this same list in softirq context.  Suppose that the process-context scan
+is referencing element B when it is interrupted by softirq processing,
+which deletes element B, and then invokes call_rcu() to free element B
+after a grace period.
+
+Now, if call_rcu() were to directly invoke its arguments, then upon return
+from softirq, the list scan would find itself referencing a newly freed
+element B.  This situation can greatly decrease the life expectancy of
+your kernel.
+
+This same problem can occur if call_rcu() is invoked from a hardware
+interrupt handler.
+
+Example 2: Function-Call Fatality
+---------------------------------
+
+Of course, one could avert the suicide described in the preceding example
+by having call_rcu() directly invoke its arguments only if it was called
+from process context.  However, this can fail in a similar manner.
+
+Suppose that an RCU-based algorithm again scans a linked list containing
+elements A, B, and C in process context, but that it invokes a function
+on each element as it is scanned.  Suppose further that this function
+deletes element B from the list, then passes it to call_rcu() for deferred
+freeing.  This may be a bit unconventional, but it is perfectly legal
+RCU usage, since call_rcu() must wait for a grace period to elapse.
+Therefore, in this case, allowing call_rcu() to immediately invoke
+its arguments would cause it to fail to make the fundamental guarantee
+underlying RCU, namely that call_rcu() defers invoking its arguments until
+all RCU read-side critical sections currently executing have completed.
+
+Quick Quiz #1:
+	Why is it *not* legal to invoke synchronize_rcu() in this case?
+
+:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
+
+Example 3: Death by Deadlock
+----------------------------
+
+Suppose that call_rcu() is invoked while holding a lock, and that the
+callback function must acquire this same lock.  In this case, if
+call_rcu() were to directly invoke the callback, the result would
+be self-deadlock.
+
+In some cases, it would be possible to restructure the code so that
+the call_rcu() is delayed until after the lock is released.  However,
+there are cases where this can be quite ugly:
+
+1.	If a number of items need to be passed to call_rcu() within
+	the same critical section, then the code would need to create
+	a list of them, then traverse the list once the lock was
+	released.
+
+2.	In some cases, the lock will be held across some kernel API,
+	so that delaying the call_rcu() until the lock is released
+	requires that the data item be passed up via a common API.
+	It is far better to guarantee that callbacks are invoked
+	with no locks held than to have to modify such APIs to allow
+	arbitrary data items to be passed back up through them.
+
+If call_rcu() directly invokes the callback, painful locking restrictions
+or API changes would be required.
+
+Quick Quiz #2:
+	What locking restriction must RCU callbacks respect?
+
+:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
+
+Summary
+-------
+
+Permitting call_rcu() to immediately invoke its arguments breaks RCU,
+even on a UP system.  So do not do it!  Even on a UP system, the RCU
+infrastructure *must* respect grace periods, and *must* invoke callbacks
+from a known environment in which no locks are held.
+
+Note that it *is* safe for synchronize_rcu() to return immediately on
+UP systems, including !PREEMPT SMP builds running on UP systems.
+
+Quick Quiz #3:
+	Why can't synchronize_rcu() return immediately on UP systems running
+	preemptable RCU?
+
+.. _answer_quick_quiz_up:
+
+Answer to Quick Quiz #1:
+	Why is it *not* legal to invoke synchronize_rcu() in this case?
+
+	Because the calling function is scanning an RCU-protected linked
+	list, and is therefore within an RCU read-side critical section.
+	Therefore, the called function has been invoked within an RCU
+	read-side critical section, and is not permitted to block.
+
+Answer to Quick Quiz #2:
+	What locking restriction must RCU callbacks respect?
+
+	Any lock that is acquired within an RCU callback must be
+	acquired elsewhere using an _irq variant of the spinlock
+	primitive.  For example, if "mylock" is acquired by an
+	RCU callback, then a process-context acquisition of this
+	lock must use something like spin_lock_irqsave() to
+	acquire the lock.
+
+	If the process-context code were to simply use spin_lock(),
+	then, since RCU callbacks can be invoked from softirq context,
+	the callback might be called from a softirq that interrupted
+	the process-context critical section.  This would result in
+	self-deadlock.
+
+	This restriction might seem gratuitous, since very few RCU
+	callbacks acquire locks directly.  However, a great many RCU
+	callbacks do acquire locks *indirectly*, for example, via
+	the kfree() primitive.
+
+Answer to Quick Quiz #3:
+	Why can't synchronize_rcu() return immediately on UP systems
+	running preemptable RCU?
+
+	Because some other task might have been preempted in the middle
+	of an RCU read-side critical section.  If synchronize_rcu()
+	simply immediately returned, it would prematurely signal the
+	end of the grace period, which would come as a nasty shock to
+	that other thread when it started running again.
diff --git a/Documentation/RCU/UP.txt b/Documentation/RCU/UP.txt
deleted file mode 100644
index 67715a47ae89..000000000000
--- a/Documentation/RCU/UP.txt
+++ /dev/null
@@ -1,142 +0,0 @@
-.. _up_doc:
-
-RCU on Uniprocessor Systems
-===========================
-
-A common misconception is that, on UP systems, the call_rcu() primitive
-may immediately invoke its function.  The basis of this misconception
-is that since there is only one CPU, it should not be necessary to
-wait for anything else to get done, since there are no other CPUs for
-anything else to be happening on.  Although this approach will *sort of*
-work a surprising amount of the time, it is a very bad idea in general.
-This document presents three examples that demonstrate exactly how bad
-an idea this is.
-
-Example 1: softirq Suicide
---------------------------
-
-Suppose that an RCU-based algorithm scans a linked list containing
-elements A, B, and C in process context, and can delete elements from
-this same list in softirq context.  Suppose that the process-context scan
-is referencing element B when it is interrupted by softirq processing,
-which deletes element B, and then invokes call_rcu() to free element B
-after a grace period.
-
-Now, if call_rcu() were to directly invoke its arguments, then upon return
-from softirq, the list scan would find itself referencing a newly freed
-element B.  This situation can greatly decrease the life expectancy of
-your kernel.
-
-This same problem can occur if call_rcu() is invoked from a hardware
-interrupt handler.
-
-Example 2: Function-Call Fatality
----------------------------------
-
-Of course, one could avert the suicide described in the preceding example
-by having call_rcu() directly invoke its arguments only if it was called
-from process context.  However, this can fail in a similar manner.
-
-Suppose that an RCU-based algorithm again scans a linked list containing
-elements A, B, and C in process contexts, but that it invokes a function
-on each element as it is scanned.  Suppose further that this function
-deletes element B from the list, then passes it to call_rcu() for deferred
-freeing.  This may be a bit unconventional, but it is perfectly legal
-RCU usage, since call_rcu() must wait for a grace period to elapse.
-Therefore, in this case, allowing call_rcu() to immediately invoke
-its arguments would cause it to fail to make the fundamental guarantee
-underlying RCU, namely that call_rcu() defers invoking its arguments until
-all RCU read-side critical sections currently executing have completed.
-
-Quick Quiz #1:
-	Why is it *not* legal to invoke synchronize_rcu() in this case?
-
-:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
-
-Example 3: Death by Deadlock
-----------------------------
-
-Suppose that call_rcu() is invoked while holding a lock, and that the
-callback function must acquire this same lock.  In this case, if
-call_rcu() were to directly invoke the callback, the result would
-be self-deadlock.
-
-In some cases, it would possible to restructure to code so that
-the call_rcu() is delayed until after the lock is released.  However,
-there are cases where this can be quite ugly:
-
-1.	If a number of items need to be passed to call_rcu() within
-	the same critical section, then the code would need to create
-	a list of them, then traverse the list once the lock was
-	released.
-
-2.	In some cases, the lock will be held across some kernel API,
-	so that delaying the call_rcu() until the lock is released
-	requires that the data item be passed up via a common API.
-	It is far better to guarantee that callbacks are invoked
-	with no locks held than to have to modify such APIs to allow
-	arbitrary data items to be passed back up through them.
-
-If call_rcu() directly invokes the callback, painful locking restrictions
-or API changes would be required.
-
-Quick Quiz #2:
-	What locking restriction must RCU callbacks respect?
-
-:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
-
-Summary
--------
-
-Permitting call_rcu() to immediately invoke its arguments breaks RCU,
-even on a UP system.  So do not do it!  Even on a UP system, the RCU
-infrastructure *must* respect grace periods, and *must* invoke callbacks
-from a known environment in which no locks are held.
-
-Note that it *is* safe for synchronize_rcu() to return immediately on
-UP systems, including !PREEMPT SMP builds running on UP systems.
-
-Quick Quiz #3:
-	Why can't synchronize_rcu() return immediately on UP systems running
-	preemptable RCU?
-
-.. _answer_quick_quiz_up:
-
-Answer to Quick Quiz #1:
-	Why is it *not* legal to invoke synchronize_rcu() in this case?
-
-	Because the calling function is scanning an RCU-protected linked
-	list, and is therefore within an RCU read-side critical section.
-	Therefore, the called function has been invoked within an RCU
-	read-side critical section, and is not permitted to block.
-
-Answer to Quick Quiz #2:
-	What locking restriction must RCU callbacks respect?
-
-	Any lock that is acquired within an RCU callback must be
-	acquired elsewhere using an _irq variant of the spinlock
-	primitive.  For example, if "mylock" is acquired by an
-	RCU callback, then a process-context acquisition of this
-	lock must use something like spin_lock_irqsave() to
-	acquire the lock.
-
-	If the process-context code were to simply use spin_lock(),
-	then, since RCU callbacks can be invoked from softirq context,
-	the callback might be called from a softirq that interrupted
-	the process-context critical section.  This would result in
-	self-deadlock.
-
-	This restriction might seem gratuitous, since very few RCU
-	callbacks acquire locks directly.  However, a great many RCU
-	callbacks do acquire locks *indirectly*, for example, via
-	the kfree() primitive.
-
-Answer to Quick Quiz #3:
-	Why can't synchronize_rcu() return immediately on UP systems
-	running preemptable RCU?
-
-	Because some other task might have been preempted in the middle
-	of an RCU read-side critical section.  If synchronize_rcu()
-	simply immediately returned, it would prematurely signal the
-	end of the grace period, which would come as a nasty shock to
-	that other thread when it started running again.
diff --git a/Documentation/RCU/listRCU.rst b/Documentation/RCU/listRCU.rst
new file mode 100644
index 000000000000..7956ff33042b
--- /dev/null
+++ b/Documentation/RCU/listRCU.rst
@@ -0,0 +1,321 @@
+.. _list_rcu_doc:
+
+Using RCU to Protect Read-Mostly Linked Lists
+=============================================
+
+One of the best applications of RCU is to protect read-mostly linked lists
+("struct list_head" in list.h).  One big advantage of this approach
+is that all of the required memory barriers are included for you in
+the list macros.  This document describes several applications of RCU,
+with the best fits first.
+
+Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
+----------------------------------------------------------------------
+
+The best applications are cases where, if reader-writer locking were
+used, the read-side lock would be dropped before taking any action
+based on the results of the search.  The most celebrated example is
+the routing table.  Because the routing table is tracking the state of
+equipment outside of the computer, it will at times contain stale data.
+Therefore, once the route has been computed, there is no need to hold
+the routing table static during transmission of the packet.  After all,
+you can hold the routing table static all you want, but that won't keep
+the external Internet from changing, and it is the state of the external
+Internet that really matters.  In addition, routing entries are typically
+added or deleted, rather than being modified in place.
+
+A straightforward example of this use of RCU may be found in the
+system-call auditing support.  For example, a reader-writer locked
+implementation of audit_filter_task() might be as follows::
+
+	static enum audit_state audit_filter_task(struct task_struct *tsk)
+	{
+		struct audit_entry *e;
+		enum audit_state   state;
+
+		read_lock(&auditsc_lock);
+		/* Note: audit_netlink_sem held by caller. */
+		list_for_each_entry(e, &audit_tsklist, list) {
+			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
+				read_unlock(&auditsc_lock);
+				return state;
+			}
+		}
+		read_unlock(&auditsc_lock);
+		return AUDIT_BUILD_CONTEXT;
+	}
+
+Here the list is searched under the lock, but the lock is dropped before
+the corresponding value is returned.  By the time that this value is acted
+on, the list may well have been modified.  This makes sense, since if
+you are turning auditing off, it is OK to audit a few extra system calls.
+
+This means that RCU can be easily applied to the read side, as follows::
+
+	static enum audit_state audit_filter_task(struct task_struct *tsk)
+	{
+		struct audit_entry *e;
+		enum audit_state   state;
+
+		rcu_read_lock();
+		/* Note: audit_netlink_sem held by caller. */
+		list_for_each_entry_rcu(e, &audit_tsklist, list) {
+			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
+				rcu_read_unlock();
+				return state;
+			}
+		}
+		rcu_read_unlock();
+		return AUDIT_BUILD_CONTEXT;
+	}
+
+The read_lock() and read_unlock() calls have become rcu_read_lock()
+and rcu_read_unlock(), respectively, and the list_for_each_entry() has
+become list_for_each_entry_rcu().  The _rcu() list-traversal primitives
+insert the read-side memory barriers that are required on DEC Alpha CPUs.
+
+The changes to the update side are also straightforward.  A reader-writer
+lock might be used as follows for deletion and insertion::
+
+	static inline int audit_del_rule(struct audit_rule *rule,
+					 struct list_head *list)
+	{
+		struct audit_entry  *e;
+
+		write_lock(&auditsc_lock);
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				list_del(&e->list);
+				write_unlock(&auditsc_lock);
+				return 0;
+			}
+		}
+		write_unlock(&auditsc_lock);
+		return -EFAULT;		/* No matching rule */
+	}
+
+	static inline int audit_add_rule(struct audit_entry *entry,
+					 struct list_head *list)
+	{
+		write_lock(&auditsc_lock);
+		if (entry->rule.flags & AUDIT_PREPEND) {
+			entry->rule.flags &= ~AUDIT_PREPEND;
+			list_add(&entry->list, list);
+		} else {
+			list_add_tail(&entry->list, list);
+		}
+		write_unlock(&auditsc_lock);
+		return 0;
+	}
+
+Following are the RCU equivalents for these two functions::
+
+	static inline int audit_del_rule(struct audit_rule *rule,
+					 struct list_head *list)
+	{
+		struct audit_entry  *e;
+
+		/* Do not use the _rcu iterator here, since this is the only
+		 * deletion routine. */
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				list_del_rcu(&e->list);
+				call_rcu(&e->rcu, audit_free_rule);
+				return 0;
+			}
+		}
+		return -EFAULT;		/* No matching rule */
+	}
+
+	static inline int audit_add_rule(struct audit_entry *entry,
+					 struct list_head *list)
+	{
+		if (entry->rule.flags & AUDIT_PREPEND) {
+			entry->rule.flags &= ~AUDIT_PREPEND;
+			list_add_rcu(&entry->list, list);
+		} else {
+			list_add_tail_rcu(&entry->list, list);
+		}
+		return 0;
+	}
+
+Normally, the write_lock() and write_unlock() would be replaced by
+a spin_lock() and a spin_unlock(), but in this case, all callers hold
+audit_netlink_sem, so no additional locking is required.  The auditsc_lock
+can therefore be eliminated, since use of RCU eliminates the need for
+writers to exclude readers.  Normally, the write_lock() calls would
+be converted into spin_lock() calls.
+
+The list_del(), list_add(), and list_add_tail() primitives have been
+replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
+The _rcu() list-manipulation primitives add memory barriers that are
+needed on weakly ordered CPUs (most of them!).  The list_del_rcu()
+primitive omits the pointer poisoning debug-assist code that would
+otherwise cause concurrent readers to fail spectacularly.
+
+So, when readers can tolerate stale data and when entries are either added
+or deleted, without in-place modification, it is very easy to use RCU!
+
+Example 2: Handling In-Place Updates
+------------------------------------
+
+The system-call auditing code does not update auditing rules in place.
+However, if it did, reader-writer-locked code to do so might look as
+follows (presumably, the field_count is only permitted to decrease,
+otherwise, the added fields would need to be filled in)::
+
+	static inline int audit_upd_rule(struct audit_rule *rule,
+					 struct list_head *list,
+					 __u32 newaction,
+					 __u32 newfield_count)
+	{
+		struct audit_entry  *e;
+		struct audit_newentry *ne;
+
+		write_lock(&auditsc_lock);
+		/* Note: audit_netlink_sem held by caller. */
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				e->rule.action = newaction;
+				e->rule.field_count = newfield_count;
+				write_unlock(&auditsc_lock);
+				return 0;
+			}
+		}
+		write_unlock(&auditsc_lock);
+		return -EFAULT;		/* No matching rule */
+	}
+
+The RCU version creates a copy, updates the copy, then replaces the old
+entry with the newly updated entry.  This sequence of actions, allowing
+concurrent reads while doing a copy to perform an update, is what gives
+RCU ("read-copy update") its name.  The RCU code is as follows::
+
+	static inline int audit_upd_rule(struct audit_rule *rule,
+					 struct list_head *list,
+					 __u32 newaction,
+					 __u32 newfield_count)
+	{
+		struct audit_entry  *e;
+		struct audit_entry  *ne;
+
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				ne = kmalloc(sizeof(*ne), GFP_ATOMIC);
+				if (ne == NULL)
+					return -ENOMEM;
+				audit_copy_rule(&ne->rule, &e->rule);
+				ne->rule.action = newaction;
+				ne->rule.field_count = newfield_count;
+				list_replace_rcu(&e->list, &ne->list);
+				call_rcu(&e->rcu, audit_free_rule);
+				return 0;
+			}
+		}
+		return -EFAULT;		/* No matching rule */
+	}
+
+Again, this assumes that the caller holds audit_netlink_sem.  Normally,
+the reader-writer lock would become a spinlock in this sort of code.
+
+Example 3: Eliminating Stale Data
+---------------------------------
+
+The auditing examples above tolerate stale data, as do most algorithms
+that are tracking external state.  Because there is a delay between the
+time the external state changes and the time Linux becomes aware of it,
+additional RCU-induced staleness is normally not a problem.
+
+However, there are many examples where stale data cannot be tolerated.
+One example in the Linux kernel is the System V IPC (see the ipc_lock()
+function in ipc/util.c).  This code checks a "deleted" flag under a
+per-entry spinlock, and, if the "deleted" flag is set, pretends that the
+entry does not exist.  For this to be helpful, the search function must
+return holding the per-entry spinlock, as ipc_lock() does in fact do.
+
+Quick Quiz:
+	Why does the search function need to return holding the per-entry lock for
+	this deleted-flag technique to be helpful?
+
+:ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
+
+If the system-call audit module were to ever need to reject stale data,
+one way to accomplish this would be to add a "deleted" flag and a "lock"
+spinlock to the audit_entry structure, and modify audit_filter_task()
+as follows::
+
+	static enum audit_state audit_filter_task(struct task_struct *tsk)
+	{
+		struct audit_entry *e;
+		enum audit_state   state;
+
+		rcu_read_lock();
+		list_for_each_entry_rcu(e, &audit_tsklist, list) {
+			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
+				spin_lock(&e->lock);
+				if (e->deleted) {
+					spin_unlock(&e->lock);
+					rcu_read_unlock();
+					return AUDIT_BUILD_CONTEXT;
+				}
+				rcu_read_unlock();
+				return state;
+			}
+		}
+		rcu_read_unlock();
+		return AUDIT_BUILD_CONTEXT;
+	}
+
+Note that this example assumes that entries are only added and deleted.
+Additional mechanism is required to deal correctly with the
+update-in-place performed by audit_upd_rule().  For one thing,
+audit_upd_rule() would need additional memory barriers to ensure
+that the list_add_rcu() was really executed before the list_del_rcu().
+
+The audit_del_rule() function would need to set the "deleted"
+flag under the spinlock as follows::
+
+	static inline int audit_del_rule(struct audit_rule *rule,
+					 struct list_head *list)
+	{
+		struct audit_entry  *e;
+
+		/* Do not need to use the _rcu iterator here, since this
+		 * is the only deletion routine. */
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				spin_lock(&e->lock);
+				list_del_rcu(&e->list);
+				e->deleted = 1;
+				spin_unlock(&e->lock);
+				call_rcu(&e->rcu, audit_free_rule);
+				return 0;
+			}
+		}
+		return -EFAULT;		/* No matching rule */
+	}
+
+Summary
+-------
+
+Read-mostly list-based data structures that can tolerate stale data are
+the most amenable to use of RCU.  The simplest case is where entries are
+either added or deleted from the data structure (or atomically modified
+in place), but non-atomic in-place modifications can be handled by making
+a copy, updating the copy, then replacing the original with the copy.
+If stale data cannot be tolerated, then a "deleted" flag may be used
+in conjunction with a per-entry spinlock in order to allow the search
+function to reject newly deleted data.
+
+.. _answer_quick_quiz_list:
+
+Answer to Quick Quiz:
+	Why does the search function need to return holding the per-entry
+	lock for this deleted-flag technique to be helpful?
+
+	If the search function drops the per-entry lock before returning,
+	then the caller will be processing stale data in any case.  If it
+	is really OK to be processing stale data, then you don't need a
+	"deleted" flag.  If processing stale data really is a problem,
+	then you need to hold the per-entry lock across all of the code
+	that uses the value that was returned.
diff --git a/Documentation/RCU/listRCU.txt b/Documentation/RCU/listRCU.txt
deleted file mode 100644
index 7956ff33042b..000000000000
--- a/Documentation/RCU/listRCU.txt
+++ /dev/null
@@ -1,321 +0,0 @@
-.. _list_rcu_doc:
-
-Using RCU to Protect Read-Mostly Linked Lists
-=============================================
-
-One of the best applications of RCU is to protect read-mostly linked lists
-("struct list_head" in list.h).  One big advantage of this approach
-is that all of the required memory barriers are included for you in
-the list macros.  This document describes several applications of RCU,
-with the best fits first.
-
-Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
-----------------------------------------------------------------------
-
-The best applications are cases where, if reader-writer locking were
-used, the read-side lock would be dropped before taking any action
-based on the results of the search.  The most celebrated example is
-the routing table.  Because the routing table is tracking the state of
-equipment outside of the computer, it will at times contain stale data.
-Therefore, once the route has been computed, there is no need to hold
-the routing table static during transmission of the packet.  After all,
-you can hold the routing table static all you want, but that won't keep
-the external Internet from changing, and it is the state of the external
-Internet that really matters.  In addition, routing entries are typically
-added or deleted, rather than being modified in place.
-
-A straightforward example of this use of RCU may be found in the
-system-call auditing support.  For example, a reader-writer locked
-implementation of audit_filter_task() might be as follows::
-
-	static enum audit_state audit_filter_task(struct task_struct *tsk)
-	{
-		struct audit_entry *e;
-		enum audit_state   state;
-
-		read_lock(&auditsc_lock);
-		/* Note: audit_netlink_sem held by caller. */
-		list_for_each_entry(e, &audit_tsklist, list) {
-			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
-				read_unlock(&auditsc_lock);
-				return state;
-			}
-		}
-		read_unlock(&auditsc_lock);
-		return AUDIT_BUILD_CONTEXT;
-	}
-
-Here the list is searched under the lock, but the lock is dropped before
-the corresponding value is returned.  By the time that this value is acted
-on, the list may well have been modified.  This makes sense, since if
-you are turning auditing off, it is OK to audit a few extra system calls.
-
-This means that RCU can be easily applied to the read side, as follows::
-
-	static enum audit_state audit_filter_task(struct task_struct *tsk)
-	{
-		struct audit_entry *e;
-		enum audit_state   state;
-
-		rcu_read_lock();
-		/* Note: audit_netlink_sem held by caller. */
-		list_for_each_entry_rcu(e, &audit_tsklist, list) {
-			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
-				rcu_read_unlock();
-				return state;
-			}
-		}
-		rcu_read_unlock();
-		return AUDIT_BUILD_CONTEXT;
-	}
-
-The read_lock() and read_unlock() calls have become rcu_read_lock()
-and rcu_read_unlock(), respectively, and the list_for_each_entry() has
-become list_for_each_entry_rcu().  The _rcu() list-traversal primitives
-insert the read-side memory barriers that are required on DEC Alpha CPUs.
-
-The changes to the update side are also straightforward.  A reader-writer
-lock might be used as follows for deletion and insertion::
-
-	static inline int audit_del_rule(struct audit_rule *rule,
-					 struct list_head *list)
-	{
-		struct audit_entry  *e;
-
-		write_lock(&auditsc_lock);
-		list_for_each_entry(e, list, list) {
-			if (!audit_compare_rule(rule, &e->rule)) {
-				list_del(&e->list);
-				write_unlock(&auditsc_lock);
-				return 0;
-			}
-		}
-		write_unlock(&auditsc_lock);
-		return -EFAULT;		/* No matching rule */
-	}
-
-	static inline int audit_add_rule(struct audit_entry *entry,
-					 struct list_head *list)
-	{
-		write_lock(&auditsc_lock);
-		if (entry->rule.flags & AUDIT_PREPEND) {
-			entry->rule.flags &= ~AUDIT_PREPEND;
-			list_add(&entry->list, list);
-		} else {
-			list_add_tail(&entry->list, list);
-		}
-		write_unlock(&auditsc_lock);
-		return 0;
-	}
-
-Following are the RCU equivalents for these two functions::
-
-	static inline int audit_del_rule(struct audit_rule *rule,
-					 struct list_head *list)
-	{
-		struct audit_entry  *e;
-
-		/* Do not use the _rcu iterator here, since this is the only
-		 * deletion routine. */
-		list_for_each_entry(e, list, list) {
-			if (!audit_compare_rule(rule, &e->rule)) {
-				list_del_rcu(&e->list);
-				call_rcu(&e->rcu, audit_free_rule);
-				return 0;
-			}
-		}
-		return -EFAULT;		/* No matching rule */
-	}
-
-	static inline int audit_add_rule(struct audit_entry *entry,
-					 struct list_head *list)
-	{
-		if (entry->rule.flags & AUDIT_PREPEND) {
-			entry->rule.flags &= ~AUDIT_PREPEND;
-			list_add_rcu(&entry->list, list);
-		} else {
-			list_add_tail_rcu(&entry->list, list);
-		}
-		return 0;
-	}
-
-Normally, the write_lock() and write_unlock() would be replaced by
-a spin_lock() and a spin_unlock(), but in this case, all callers hold
-audit_netlink_sem, so no additional locking is required.  The auditsc_lock
-can therefore be eliminated, since use of RCU eliminates the need for
-writers to exclude readers.  Normally, the write_lock() calls would
-be converted into spin_lock() calls.
-
-The list_del(), list_add(), and list_add_tail() primitives have been
-replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
-The _rcu() list-manipulation primitives add memory barriers that are
-needed on weakly ordered CPUs (most of them!).  The list_del_rcu()
-primitive omits the pointer poisoning debug-assist code that would
-otherwise cause concurrent readers to fail spectacularly.
-
-So, when readers can tolerate stale data and when entries are either added
-or deleted, without in-place modification, it is very easy to use RCU!
-
-Example 2: Handling In-Place Updates
-------------------------------------
-
-The system-call auditing code does not update auditing rules in place.
-However, if it did, reader-writer-locked code to do so might look as
-follows (presumably, the field_count is only permitted to decrease,
-otherwise, the added fields would need to be filled in)::
-
-	static inline int audit_upd_rule(struct audit_rule *rule,
-					 struct list_head *list,
-					 __u32 newaction,
-					 __u32 newfield_count)
-	{
-		struct audit_entry  *e;
-		struct audit_newentry *ne;
-
-		write_lock(&auditsc_lock);
-		/* Note: audit_netlink_sem held by caller. */
-		list_for_each_entry(e, list, list) {
-			if (!audit_compare_rule(rule, &e->rule)) {
-				e->rule.action = newaction;
-				e->rule.file_count = newfield_count;
-				write_unlock(&auditsc_lock);
-				return 0;
-			}
-		}
-		write_unlock(&auditsc_lock);
-		return -EFAULT;		/* No matching rule */
-	}
-
-The RCU version creates a copy, updates the copy, then replaces the old
-entry with the newly updated entry.  This sequence of actions, allowing
-concurrent reads while doing a copy to perform an update, is what gives
-RCU ("read-copy update") its name.  The RCU code is as follows::
-
-	static inline int audit_upd_rule(struct audit_rule *rule,
-					 struct list_head *list,
-					 __u32 newaction,
-					 __u32 newfield_count)
-	{
-		struct audit_entry  *e;
-		struct audit_newentry *ne;
-
-		list_for_each_entry(e, list, list) {
-			if (!audit_compare_rule(rule, &e->rule)) {
-				ne = kmalloc(sizeof(*entry), GFP_ATOMIC);
-				if (ne == NULL)
-					return -ENOMEM;
-				audit_copy_rule(&ne->rule, &e->rule);
-				ne->rule.action = newaction;
-				ne->rule.file_count = newfield_count;
-				list_replace_rcu(&e->list, &ne->list);
-				call_rcu(&e->rcu, audit_free_rule);
-				return 0;
-			}
-		}
-		return -EFAULT;		/* No matching rule */
-	}
-
-Again, this assumes that the caller holds audit_netlink_sem.  Normally,
-the reader-writer lock would become a spinlock in this sort of code.
-
-Example 3: Eliminating Stale Data
----------------------------------
-
-The auditing examples above tolerate stale data, as do most algorithms
-that are tracking external state.  Because there is a delay from the
-time the external state changes before Linux becomes aware of the change,
-additional RCU-induced staleness is normally not a problem.
-
-However, there are many examples where stale data cannot be tolerated.
-One example in the Linux kernel is the System V IPC (see the ipc_lock()
-function in ipc/util.c).  This code checks a "deleted" flag under a
-per-entry spinlock, and, if the "deleted" flag is set, pretends that the
-entry does not exist.  For this to be helpful, the search function must
-return holding the per-entry spinlock, as ipc_lock() does in fact do.
-
-Quick Quiz:
-	Why does the search function need to return holding the per-entry lock for
-	this deleted-flag technique to be helpful?
-
-:ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
-
-If the system-call audit module were to ever need to reject stale data,
-one way to accomplish this would be to add a "deleted" flag and a "lock"
-spinlock to the audit_entry structure, and modify audit_filter_task()
-as follows::
-
-	static enum audit_state audit_filter_task(struct task_struct *tsk)
-	{
-		struct audit_entry *e;
-		enum audit_state   state;
-
-		rcu_read_lock();
-		list_for_each_entry_rcu(e, &audit_tsklist, list) {
-			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
-				spin_lock(&e->lock);
-				if (e->deleted) {
-					spin_unlock(&e->lock);
-					rcu_read_unlock();
-					return AUDIT_BUILD_CONTEXT;
-				}
-				rcu_read_unlock();
-				return state;
-			}
-		}
-		rcu_read_unlock();
-		return AUDIT_BUILD_CONTEXT;
-	}
-
-Note that this example assumes that entries are only added and deleted.
-Additional mechanism is required to deal correctly with the
-update-in-place performed by audit_upd_rule().  For one thing,
-audit_upd_rule() would need additional memory barriers to ensure
-that the list_add_rcu() was really executed before the list_del_rcu().
-
-The audit_del_rule() function would need to set the "deleted"
-flag under the spinlock as follows::
-
-	static inline int audit_del_rule(struct audit_rule *rule,
-					 struct list_head *list)
-	{
-		struct audit_entry  *e;
-
-		/* Do not need to use the _rcu iterator here, since this
-		 * is the only deletion routine. */
-		list_for_each_entry(e, list, list) {
-			if (!audit_compare_rule(rule, &e->rule)) {
-				spin_lock(&e->lock);
-				list_del_rcu(&e->list);
-				e->deleted = 1;
-				spin_unlock(&e->lock);
-				call_rcu(&e->rcu, audit_free_rule);
-				return 0;
-			}
-		}
-		return -EFAULT;		/* No matching rule */
-	}
-
-Summary
--------
-
-Read-mostly list-based data structures that can tolerate stale data are
-the most amenable to use of RCU.  The simplest case is where entries are
-either added or deleted from the data structure (or atomically modified
-in place), but non-atomic in-place modifications can be handled by making
-a copy, updating the copy, then replacing the original with the copy.
-If stale data cannot be tolerated, then a "deleted" flag may be used
-in conjunction with a per-entry spinlock in order to allow the search
-function to reject newly deleted data.
-
-.. _answer_quick_quiz_list:
-
-Answer to Quick Quiz:
-	Why does the search function need to return holding the per-entry
-	lock for this deleted-flag technique to be helpful?
-
-	If the search function drops the per-entry lock before returning,
-	then the caller will be processing stale data in any case.  If it
-	is really OK to be processing stale data, then you don't need a
-	"deleted" flag.  If processing stale data really is a problem,
-	then you need to hold the per-entry lock across all of the code
-	that uses the value that was returned.
diff --git a/Documentation/RCU/rcu.rst b/Documentation/RCU/rcu.rst
new file mode 100644
index 000000000000..8dfb437dacc3
--- /dev/null
+++ b/Documentation/RCU/rcu.rst
@@ -0,0 +1,92 @@
+.. _rcu_doc:
+
+RCU Concepts
+============
+
+The basic idea behind RCU (read-copy update) is to split destructive
+operations into two parts, one that prevents anyone from seeing the data
+item being destroyed, and one that actually carries out the destruction.
+A "grace period" must elapse between the two parts, and this grace period
+must be long enough that any readers accessing the item being deleted have
+since dropped their references.  For example, an RCU-protected deletion
+from a linked list would first remove the item from the list, wait for
+a grace period to elapse, then free the element.  See the
+Documentation/RCU/listRCU.rst file for more information on using RCU with
+linked lists.
+
+Frequently Asked Questions
+--------------------------
+
+- Why would anyone want to use RCU?
+
+  The advantage of RCU's two-part approach is that RCU readers need
+  not acquire any locks, perform any atomic instructions, write to
+  shared memory, or (on CPUs other than Alpha) execute any memory
+  barriers.  The fact that these operations are quite expensive
+  on modern CPUs is what gives RCU its performance advantages
+  in read-mostly situations.  The fact that RCU readers need not
+  acquire locks can also greatly simplify deadlock-avoidance code.
+
+- How can the updater tell when a grace period has completed
+  if the RCU readers give no indication when they are done?
+
+  Just as with spinlocks, RCU readers are not permitted to
+  block, switch to user-mode execution, or enter the idle loop.
+  Therefore, as soon as a CPU is seen passing through any of these
+  three states, we know that that CPU has exited any previous RCU
+  read-side critical sections.  So, if we remove an item from a
+  linked list, and then wait until all CPUs have switched context,
+  executed in user mode, or executed in the idle loop, we can
+  safely free up that item.
+
+  Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
+  same effect, but require that the readers manipulate CPU-local
+  counters.  These counters allow limited types of blocking within
+  RCU read-side critical sections.  SRCU also uses CPU-local
+  counters, and permits general blocking within RCU read-side
+  critical sections.  These variants of RCU detect grace periods
+  by sampling these counters.
+
+- If I am running on a uniprocessor kernel, which can only do one
+  thing at a time, why should I wait for a grace period?
+
+  See the Documentation/RCU/UP.rst file for more information.
+
+- How can I see where RCU is currently used in the Linux kernel?
+
+  Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
+  "rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
+  "srcu_read_unlock", "synchronize_rcu", "synchronize_net",
+  "synchronize_srcu", and the other RCU primitives.  Or grab one
+  of the cscope databases from:
+
+  (http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html).
+
+- What guidelines should I follow when writing code that uses RCU?
+
+  See the checklist.txt file in this directory.
+
+- Why the name "RCU"?
+
+  "RCU" stands for "read-copy update".  The file Documentation/RCU/listRCU.rst
+  has more information on where this name came from; search for
+  "read-copy update" to find it.
+
+- I hear that RCU is patented?  What is with that?
+
+  Yes, it is.  There are several known patents related to RCU;
+  search for the string "Patent" in RTFP.txt to find them.
+  Of these, one was allowed to lapse by the assignee, and the
+  others have been contributed to the Linux kernel under GPL.
+  There are now also LGPL implementations of user-level RCU
+  available (http://liburcu.org/).
+
+- I hear that RCU needs work in order to support realtime kernels?
+
+  Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
+  kernel configuration parameter.
+
+- Where can I find more information on RCU?
+
+  See the RTFP.txt file in this directory.
+  Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
deleted file mode 100644
index 8dfb437dacc3..000000000000
--- a/Documentation/RCU/rcu.txt
+++ /dev/null
@@ -1,92 +0,0 @@
-.. _rcu_doc:
-
-RCU Concepts
-============
-
-The basic idea behind RCU (read-copy update) is to split destructive
-operations into two parts, one that prevents anyone from seeing the data
-item being destroyed, and one that actually carries out the destruction.
-A "grace period" must elapse between the two parts, and this grace period
-must be long enough that any readers accessing the item being deleted have
-since dropped their references.  For example, an RCU-protected deletion
-from a linked list would first remove the item from the list, wait for
-a grace period to elapse, then free the element.  See the
-Documentation/RCU/listRCU.rst file for more information on using RCU with
-linked lists.
-
-Frequently Asked Questions
---------------------------
-
-- Why would anyone want to use RCU?
-
-  The advantage of RCU's two-part approach is that RCU readers need
-  not acquire any locks, perform any atomic instructions, write to
-  shared memory, or (on CPUs other than Alpha) execute any memory
-  barriers.  The fact that these operations are quite expensive
-  on modern CPUs is what gives RCU its performance advantages
-  in read-mostly situations.  The fact that RCU readers need not
-  acquire locks can also greatly simplify deadlock-avoidance code.
-
-- How can the updater tell when a grace period has completed
-  if the RCU readers give no indication when they are done?
-
-  Just as with spinlocks, RCU readers are not permitted to
-  block, switch to user-mode execution, or enter the idle loop.
-  Therefore, as soon as a CPU is seen passing through any of these
-  three states, we know that that CPU has exited any previous RCU
-  read-side critical sections.  So, if we remove an item from a
-  linked list, and then wait until all CPUs have switched context,
-  executed in user mode, or executed in the idle loop, we can
-  safely free up that item.
-
-  Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
-  same effect, but require that the readers manipulate CPU-local
-  counters.  These counters allow limited types of blocking within
-  RCU read-side critical sections.  SRCU also uses CPU-local
-  counters, and permits general blocking within RCU read-side
-  critical sections.  These variants of RCU detect grace periods
-  by sampling these counters.
-
-- If I am running on a uniprocessor kernel, which can only do one
-  thing at a time, why should I wait for a grace period?
-
-  See the Documentation/RCU/UP.rst file for more information.
-
-- How can I see where RCU is currently used in the Linux kernel?
-
-  Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
-  "rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
-  "srcu_read_unlock", "synchronize_rcu", "synchronize_net",
-  "synchronize_srcu", and the other RCU primitives.  Or grab one
-  of the cscope databases from:
-
-  (http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html).
-
-- What guidelines should I follow when writing code that uses RCU?
-
-  See the checklist.txt file in this directory.
-
-- Why the name "RCU"?
-
-  "RCU" stands for "read-copy update".  The file Documentation/RCU/listRCU.rst
-  has more information on where this name came from, search for
-  "read-copy update" to find it.
-
-- I hear that RCU is patented?  What is with that?
-
-  Yes, it is.  There are several known patents related to RCU,
-  search for the string "Patent" in RTFP.txt to find them.
-  Of these, one was allowed to lapse by the assignee, and the
-  others have been contributed to the Linux kernel under GPL.
-  There are now also LGPL implementations of user-level RCU
-  available (http://liburcu.org/).
-
-- I hear that RCU needs work in order to support realtime kernels?
-
-  Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
-  kernel configuration parameter.
-
-- Where can I find more information on RCU?
-
-  See the RTFP.txt file in this directory.
-  Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
-- 
cgit v1.2.3-59-g8ed1b


From c0e679b4a180f7b9f2cee41c2781bb6af29f7755 Mon Sep 17 00:00:00 2001
From: Jiunn Chang <c0d1n61at3@gmail.com>
Date: Wed, 26 Jun 2019 15:07:05 -0500
Subject: Documentation: RCU: Add TOC tree hooks

Add TOC tree hooks for:
  - rcu
  - listRCU
  - UP

Signed-off-by: Jiunn Chang <c0d1n61at3@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/RCU/index.rst | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 Documentation/RCU/index.rst

diff --git a/Documentation/RCU/index.rst b/Documentation/RCU/index.rst
new file mode 100644
index 000000000000..340a9725676c
--- /dev/null
+++ b/Documentation/RCU/index.rst
@@ -0,0 +1,19 @@
+.. _rcu_concepts:
+
+============
+RCU concepts
+============
+
+.. toctree::
+   :maxdepth: 1
+
+   rcu
+   listRCU
+   UP
+
+.. only:: subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
-- 
cgit v1.2.3-59-g8ed1b


From 772626ecd2cd5b930fa03b4787ddf51ccf819229 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Thu, 27 Jun 2019 08:35:26 -0600
Subject: Add the RCU docs to the core-api manual

We should really move the RCU directory there as well, but that can wait
for another day.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/core-api/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 2466a4c51031..322ac954b390 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -35,6 +35,7 @@ Core utilities
    boot-time-mm
    memory-hotplug
    protection-keys
+   ../RCU/index
 
 
 Interfaces for kernel debugging
-- 
cgit v1.2.3-59-g8ed1b


From 49872a1cfceae38a98ee690e6c860d1cf628364e Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Fri, 28 Jun 2019 09:12:31 -0300
Subject: platform: x86: get rid of a non-existent document

Changeset 163ede97a9a2 ("Documentation: platform: Delete x86-laptop-drivers.txt")
removed the x86-laptop-drivers.txt file, but forgot to update its
Kconfig.

Fixes: 163ede97a9a2 ("Documentation: platform: Delete x86-laptop-drivers.txt")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 drivers/platform/x86/Kconfig | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index 5d5cc6111081..b7e5cee2aa26 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -433,9 +433,6 @@ config COMPAL_LAPTOP
 	  It adds support for rfkill, Bluetooth, WLAN, LCD brightness, hwmon
 	  and battery charging level control.
 
-	  For a (possibly incomplete) list of supported laptops, please refer
-	  to: Documentation/platform/x86-laptop-drivers.txt
-
 config SONY_LAPTOP
 	tristate "Sony Laptop Extras"
 	depends on ACPI
-- 
cgit v1.2.3-59-g8ed1b


From 9159ba14285c5432063a0ad83e50afb95674d9b1 Mon Sep 17 00:00:00 2001
From: Sheriff Esseson <sheriffesseson@gmail.com>
Date: Fri, 28 Jun 2019 07:20:01 +0100
Subject: Doc : doc-guide : Fix a typo

fix the disjunction by replacing "of" with "or".

Signed-off-by: Sheriff Esseson <sheriffesseson@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/doc-guide/kernel-doc.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst
index f96059767c8c..192c36af39e2 100644
--- a/Documentation/doc-guide/kernel-doc.rst
+++ b/Documentation/doc-guide/kernel-doc.rst
@@ -359,7 +359,7 @@ Domain`_ references.
   ``monospaced font``.
 
   Useful if you need to use special characters that would otherwise have some
-  meaning either by kernel-doc script of by reStructuredText.
+  meaning either by kernel-doc script or by reStructuredText.
 
   This is particularly useful if you need to use things like ``%ph`` inside
   a function description.
-- 
cgit v1.2.3-59-g8ed1b


From 62ee81b5681daa781f5e800346ae8654b3e5a864 Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Thu, 27 Jun 2019 15:59:38 +0200
Subject: docs: format kernel-parameters -- as code

The current ReStructuredText formatting results in "--", used to
indicate the end of the kernel command-line parameters, appearing as
an en-dash instead of two hyphens; this patch formats them as code,
"``--``", as done elsewhere in the documentation.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/kernel-parameters.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 8d3273e32eb1..5d29ba5ad88c 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -9,11 +9,11 @@ and sorted into English Dictionary order (defined as ignoring all
 punctuation and sorting digits before letters in a case insensitive
 manner), and with descriptions where known.
 
-The kernel parses parameters from the kernel command line up to "--";
+The kernel parses parameters from the kernel command line up to "``--``";
 if it doesn't recognize a parameter and it doesn't contain a '.', the
 parameter gets passed to init: parameters with '=' go into init's
 environment, others are passed as command line arguments to init.
-Everything after "--" is passed as an argument to init.
+Everything after "``--``" is passed as an argument to init.
 
 Module parameters can be specified in two ways: via the kernel command
 line with a module name prefix, or via modprobe, e.g.::
-- 
cgit v1.2.3-59-g8ed1b


From acb6258acc4fbb76449eec6d0c7ca25254671e31 Mon Sep 17 00:00:00 2001
From: Jiunn Chang <c0d1n61at3@gmail.com>
Date: Thu, 27 Jun 2019 16:01:47 -0500
Subject: doc: RCU callback locks need only _bh, not necessarily _irq

The UP.rst file calls for locks acquired within RCU callback functions
to use _irq variants (spin_lock_irqsave() or similar), which does work,
but can be overkill.  This commit therefore instead calls for _bh variants
(spin_lock_bh() or similar), while noting that _irq does work.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Jiunn Chang <c0d1n61at3@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/RCU/UP.rst | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst
index 67715a47ae89..e26dda27430c 100644
--- a/Documentation/RCU/UP.rst
+++ b/Documentation/RCU/UP.rst
@@ -113,12 +113,13 @@ Answer to Quick Quiz #1:
 Answer to Quick Quiz #2:
 	What locking restriction must RCU callbacks respect?
 
-	Any lock that is acquired within an RCU callback must be
-	acquired elsewhere using an _irq variant of the spinlock
-	primitive.  For example, if "mylock" is acquired by an
-	RCU callback, then a process-context acquisition of this
-	lock must use something like spin_lock_irqsave() to
-	acquire the lock.
+	Any lock that is acquired within an RCU callback must be acquired
+	elsewhere using a _bh variant of the spinlock primitive.
+	For example, if "mylock" is acquired by an RCU callback, then
+	a process-context acquisition of this lock must use something
+	like spin_lock_bh() to acquire the lock.  Please note that
+	it is also OK to use _irq variants of spinlocks, for example,
+	spin_lock_irqsave().
 
 	If the process-context code were to simply use spin_lock(),
 	then, since RCU callbacks can be invoked from softirq context,
-- 
cgit v1.2.3-59-g8ed1b


From 7282a93f4df586cac84a81c37f38cccec2e1d8bb Mon Sep 17 00:00:00 2001
From: Stephen Kitt <steve@sk2.org>
Date: Fri, 28 Jun 2019 20:38:41 +0200
Subject: Disable Sphinx SmartyPants in HTML output

The handling of dashes in particular results in confusing
documentation in a number of instances, since "--" becomes an
en-dash. This disables SmartyPants wholesale, losing smart quotes
along with smart dashes.

With Sphinx 1.6 we could fine-tune the conversion, using the new
smartquotes and smartquotes_action settings.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/conf.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/conf.py b/Documentation/conf.py
index a502baecbb85..3b2397bcb565 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -201,7 +201,7 @@ html_context = {
 
 # If true, SmartyPants will be used to convert quotes and dashes to
 # typographically correct entities.
-#html_use_smartypants = True
+html_use_smartypants = False
 
 # Custom sidebar templates, maps document names to template names.
 #html_sidebars = {}
-- 
cgit v1.2.3-59-g8ed1b


From 66f2a122c68d8f13e5db978b6b7571aaf0e53a19 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Tue, 2 Jul 2019 13:54:38 -0400
Subject: docs: Move binderfs to admin-guide

The documentation is more appropriate for the administrator than for
the internal kernel API section it is currently in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/binderfs.rst | 68 ++++++++++++++++++++++++++++++++++
 Documentation/admin-guide/index.rst    |  1 +
 Documentation/filesystems/binderfs.rst | 68 ----------------------------------
 Documentation/filesystems/index.rst    | 10 -----
 4 files changed, 69 insertions(+), 78 deletions(-)
 create mode 100644 Documentation/admin-guide/binderfs.rst
 delete mode 100644 Documentation/filesystems/binderfs.rst

diff --git a/Documentation/admin-guide/binderfs.rst b/Documentation/admin-guide/binderfs.rst
new file mode 100644
index 000000000000..c009671f8434
--- /dev/null
+++ b/Documentation/admin-guide/binderfs.rst
@@ -0,0 +1,68 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The Android binderfs Filesystem
+===============================
+
+Android binderfs is a filesystem for the Android binder IPC mechanism.  It
+allows to dynamically add and remove binder devices at runtime.  Binder devices
+located in a new binderfs instance are independent of binder devices located in
+other binderfs instances.  Mounting a new binderfs instance makes it possible
+to get a set of private binder devices.
+
+Mounting binderfs
+-----------------
+
+Android binderfs can be mounted with::
+
+  mkdir /dev/binderfs
+  mount -t binder binder /dev/binderfs
+
+at which point a new instance of binderfs will show up at ``/dev/binderfs``.
+In a fresh instance of binderfs no binder devices will be present.  There will
+only be a ``binder-control`` device which serves as the request handler for
+binderfs. Mounting another binderfs instance at a different location will
+create a new and separate instance from all other binderfs mounts.  This is
+identical to the behavior of e.g. ``devpts`` and ``tmpfs``. The Android
+binderfs filesystem can be mounted in user namespaces.
+
+Options
+-------
+max
+  binderfs instances can be mounted with a limit on the number of binder
+  devices that can be allocated. The ``max=<count>`` mount option serves as
+  a per-instance limit. If ``max=<count>`` is set then only ``<count>`` number
+  of binder devices can be allocated in this binderfs instance.
+
+Allocating binder Devices
+-------------------------
+
+.. _ioctl: http://man7.org/linux/man-pages/man2/ioctl.2.html
+
+To allocate a new binder device in a binderfs instance a request needs to be
+sent through the ``binder-control`` device node.  A request is sent in the form
+of an `ioctl() <ioctl_>`_.
+
+What a program needs to do is to open the ``binder-control`` device node and
+send a ``BINDER_CTL_ADD`` request to the kernel.  Users of binderfs need to
+tell the kernel which name the new binder device should get.  By default a name
+can only contain up to ``BINDERFS_MAX_NAME`` chars including the terminating
+zero byte.
+
+Once the request is made via an `ioctl() <ioctl_>`_ passing a ``struct
+binder_device`` with the name to the kernel it will allocate a new binder
+device and return the major and minor number of the new device in the struct
+(This is necessary because binderfs allocates a major device number
+dynamically.).  After the `ioctl() <ioctl_>`_ returns there will be a new
+binder device located under /dev/binderfs with the chosen name.
+
+Deleting binder Devices
+-----------------------
+
+.. _unlink: http://man7.org/linux/man-pages/man2/unlink.2.html
+.. _rm: http://man7.org/linux/man-pages/man1/rm.1.html
+
+Binderfs binder devices can be deleted via `unlink() <unlink_>`_.  This means
+that the `rm() <rm_>`_ tool can be used to delete them. Note that the
+``binder-control`` device cannot be deleted since this would make the binderfs
+instance unuseable.  The ``binder-control`` device will be deleted when the
+binderfs instance is unmounted and all references to it have been dropped.
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 8001917ee012..24fbe0568eff 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -70,6 +70,7 @@ configure specific aspects of kernel behavior to your liking.
    ras
    bcache
    ext4
+   binderfs
    pm/index
    thunderbolt
    LSM/index
diff --git a/Documentation/filesystems/binderfs.rst b/Documentation/filesystems/binderfs.rst
deleted file mode 100644
index c009671f8434..000000000000
--- a/Documentation/filesystems/binderfs.rst
+++ /dev/null
@@ -1,68 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-The Android binderfs Filesystem
-===============================
-
-Android binderfs is a filesystem for the Android binder IPC mechanism.  It
-allows to dynamically add and remove binder devices at runtime.  Binder devices
-located in a new binderfs instance are independent of binder devices located in
-other binderfs instances.  Mounting a new binderfs instance makes it possible
-to get a set of private binder devices.
-
-Mounting binderfs
------------------
-
-Android binderfs can be mounted with::
-
-  mkdir /dev/binderfs
-  mount -t binder binder /dev/binderfs
-
-at which point a new instance of binderfs will show up at ``/dev/binderfs``.
-In a fresh instance of binderfs no binder devices will be present.  There will
-only be a ``binder-control`` device which serves as the request handler for
-binderfs. Mounting another binderfs instance at a different location will
-create a new and separate instance from all other binderfs mounts.  This is
-identical to the behavior of e.g. ``devpts`` and ``tmpfs``. The Android
-binderfs filesystem can be mounted in user namespaces.
-
-Options
--------
-max
-  binderfs instances can be mounted with a limit on the number of binder
-  devices that can be allocated. The ``max=<count>`` mount option serves as
-  a per-instance limit. If ``max=<count>`` is set then only ``<count>`` number
-  of binder devices can be allocated in this binderfs instance.
-
-Allocating binder Devices
--------------------------
-
-.. _ioctl: http://man7.org/linux/man-pages/man2/ioctl.2.html
-
-To allocate a new binder device in a binderfs instance a request needs to be
-sent through the ``binder-control`` device node.  A request is sent in the form
-of an `ioctl() <ioctl_>`_.
-
-What a program needs to do is to open the ``binder-control`` device node and
-send a ``BINDER_CTL_ADD`` request to the kernel.  Users of binderfs need to
-tell the kernel which name the new binder device should get.  By default a name
-can only contain up to ``BINDERFS_MAX_NAME`` chars including the terminating
-zero byte.
-
-Once the request is made via an `ioctl() <ioctl_>`_ passing a ``struct
-binder_device`` with the name to the kernel it will allocate a new binder
-device and return the major and minor number of the new device in the struct
-(This is necessary because binderfs allocates a major device number
-dynamically.).  After the `ioctl() <ioctl_>`_ returns there will be a new
-binder device located under /dev/binderfs with the chosen name.
-
-Deleting binder Devices
------------------------
-
-.. _unlink: http://man7.org/linux/man-pages/man2/unlink.2.html
-.. _rm: http://man7.org/linux/man-pages/man1/rm.1.html
-
-Binderfs binder devices can be deleted via `unlink() <unlink_>`_.  This means
-that the `rm() <rm_>`_ tool can be used to delete them. Note that the
-``binder-control`` device cannot be deleted since this would make the binderfs
-instance unuseable.  The ``binder-control`` device will be deleted when the
-binderfs instance is unmounted and all references to it have been dropped.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 1651173f1118..2de2fe2ab078 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -32,13 +32,3 @@ filesystem implementations.
 
    journalling
    fscrypt
-
-Filesystem-specific documentation
-=================================
-
-Documentation for individual filesystem types can be found here.
-
-.. toctree::
-   :maxdepth: 2
-
-   binderfs
-- 
cgit v1.2.3-59-g8ed1b


From 454f96f2b738374da4b0a703b1e2e7aed82c4486 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date: Sat, 6 Jul 2019 13:28:42 -0300
Subject: docs: automarkup.py: ignore exceptions when seeking for xrefs

When using the automarkup extension with:
	make pdfdocs

without passing a specific book, the code will raise an exception:

	  File "/devel/v4l/docs/Documentation/sphinx/automarkup.py", line 86, in auto_markup
	    node.parent.replace(node, markup_funcs(name, app, node))
	  File "/devel/v4l/docs/Documentation/sphinx/automarkup.py", line 59, in markup_funcs
	    'function', target, pxref, lit_text)
	  File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/domains/c.py", line 308, in resolve_xref
	    contnode, target)
	  File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/util/nodes.py", line 450, in make_refnode
	    '#' + targetid)
	  File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/builders/latex/__init__.py", line 159, in get_relative_uri
	    return self.get_target_uri(to, typ)
	  File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/builders/latex/__init__.py", line 152, in get_target_uri
	    raise NoUri
	sphinx.environment.NoUri

This happens because not all references will belong to a single
PDF/LaTeX document.

It is better to just ignore those than to break the Sphinx build.

Fixes: d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
[jc: Narrowed the "except" and tweaked the comment]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
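
The pattern applied by this patch can be sketched in isolation as follows.
This is a minimal illustration, not the extension's actual code: ``NoUri``
is redefined locally so the snippet runs without Sphinx installed, and
``resolve_or_none()``, ``markup()`` and the two resolver functions are
hypothetical names invented for the example.

```python
class NoUri(Exception):
    """Local stand-in for the Sphinx NoUri exception (assumption)."""


def resolve_or_none(resolve, target):
    """Return the resolved reference, or None when the builder has no URI."""
    try:
        return resolve(target)
    except NoUri:  # narrow on purpose: any other error still propagates
        return None


def html_like(target):
    # Hypothetical resolver that always finds a target.
    return "#c." + target


def latex_like(target):
    # Hypothetical resolver mimicking the LaTeX builder's failure mode.
    raise NoUri(target)


def markup(resolve, name):
    """Emit a link if the xref resolved; otherwise keep the plain text."""
    xref = resolve_or_none(resolve, name)
    return f"<{xref}>" if xref is not None else name
```

Catching only ``NoUri`` rather than a bare ``except`` keeps unrelated
failures visible, which is the point of the narrowing noted in the sign-off
above.
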
 Documentation/sphinx/automarkup.py | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index b300cf129869..77e89c1956d7 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -6,6 +6,7 @@
 #
 from docutils import nodes
 from sphinx import addnodes
+from sphinx.environment import NoUri
 import re
 
 #
@@ -55,8 +56,15 @@ def markup_funcs(docname, app, node):
                                           reftype = 'function',
                                           reftarget = target, modname = None,
                                           classname = None)
-            xref = cdom.resolve_xref(app.env, docname, app.builder,
-                                     'function', target, pxref, lit_text)
+            #
+            # XXX The Latex builder will throw NoUri exceptions here,
+            # work around that by ignoring them.
+            #
+            try:
+                xref = cdom.resolve_xref(app.env, docname, app.builder,
+                                         'function', target, pxref, lit_text)
+            except NoUri:
+                xref = None
         #
         # Toss the xref into the list if we got it; otherwise just put
         # the function text.
-- 
cgit v1.2.3-59-g8ed1b