32 files changed, 1610 insertions, 229 deletions
diff --git a/Documentation/Changes b/Documentation/Changes
index 27232be26e1a..783ddc3ce4e8 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -65,7 +65,7 @@ o  isdn4k-utils           3.1pre1                 # isdnctrl 2>&1|grep version
 o  nfs-utils              1.0.5                   # showmount --version
 o  procps                 3.2.0                   # ps --version
 o  oprofile               0.9                     # oprofiled --version
-o  udev                   058                     # udevinfo -V
+o  udev                   071                     # udevinfo -V
 
 Kernel compilation
 ==================
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index d650ce36485f..4d9b66d8b4db 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -286,7 +286,9 @@ X!Edrivers/pci/search.c
  -->
 !Edrivers/pci/msi.c
 !Edrivers/pci/bus.c
-!Edrivers/pci/hotplug.c
+<!-- FIXME: Removed for now since no structured comments in source
+X!Edrivers/pci/hotplug.c
+-->
 !Edrivers/pci/probe.c
 !Edrivers/pci/rom.c
      </sect1>
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl
index 375ae760dc1e..d260d92089ad 100644
--- a/Documentation/DocBook/libata.tmpl
+++ b/Documentation/DocBook/libata.tmpl
@@ -415,6 +415,362 @@ and other resources, etc.
      </sect1>
   </chapter>
 
+  <chapter id="libataEH">
+        <title>Error handling</title>
+
+	<para>
+	This chapter describes how errors are handled under libata.
+	Readers are advised to read SCSI EH
+	(Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first.
+	</para>
+
+	<sect1><title>Origins of commands</title>
+	<para>
+	In libata, a command is represented with struct ata_queued_cmd
+	or qc.  qc's are preallocated during port initialization and
+	repetitively used for command executions.  Currently only one
+	qc is allocated per port but yet-to-be-merged NCQ branch
+	allocates one for each tag and maps each qc to NCQ tag 1-to-1.
+	</para>
+	<para>
+	libata commands can originate from two sources - libata itself
+	and SCSI midlayer.  libata internal commands are used for
+	initialization and error handling.  All normal blk requests
+	and commands for SCSI emulation are passed as SCSI commands
+	through queuecommand callback of SCSI host template.
+	</para>
+	</sect1>
+
+	<sect1><title>How commands are issued</title>
+
+	<variablelist>
+
+	<varlistentry><term>Internal commands</term>
+	<listitem>
+	<para>
+	First, qc is allocated and initialized using
+	ata_qc_new_init().  Although ata_qc_new_init() doesn't
+	implement any wait or retry mechanism when qc is not
+	available, internal commands are currently issued only during
+	initialization and error recovery, so no other command is
+	active and allocation is guaranteed to succeed.
+	</para>
+	<para>
+	Once allocated qc's taskfile is initialized for the command to
+	be executed.  qc currently has two mechanisms to notify
+	completion.  One is via qc->complete_fn() callback and the
+	other is completion qc->waiting.  qc->complete_fn() callback
+	is the asynchronous path used by normal SCSI translated
+	commands and qc->waiting is the synchronous (issuer sleeps in
+	process context) path used by internal commands.
+	</para>
+	<para>
+	Once initialization is complete, host_set lock is acquired
+	and the qc is issued.
+	</para>
+	</listitem>
+	</varlistentry>
+
+	<varlistentry><term>SCSI commands</term>
+	<listitem>
+	<para>
+	All libata drivers use ata_scsi_queuecmd() as
+	hostt->queuecommand callback.  scmds can either be simulated
+	or translated.  No qc is involved in processing a simulated
+	scmd.  The result is computed right away and the scmd is
+	completed.
+	</para>
+	<para>
+	For a translated scmd, ata_qc_new_init() is invoked to
+	allocate a qc and the scmd is translated into the qc.  SCSI
+	midlayer's completion notification function pointer is stored
+	into qc->scsidone.
+	</para>
+	<para>
+	qc->complete_fn() callback is used for completion
+	notification.  ATA commands use ata_scsi_qc_complete() while
+	ATAPI commands use atapi_qc_complete().  Both functions end up
+	calling qc->scsidone to notify upper layer when the qc is
+	finished.  After translation is completed, the qc is issued
+	with ata_qc_issue().
+	</para>
+	<para>
+	Note that SCSI midlayer invokes hostt->queuecommand while
+	holding host_set lock, so all above occur while holding
+	host_set lock.
+	</para>
+	</listitem>
+	</varlistentry>
+
+	</variablelist>
+	</sect1>
+
+	<sect1><title>How commands are processed</title>
+	<para>
+	Depending on which protocol and which controller are used,
+	commands are processed differently.  For the purpose of
+	discussion, a controller which uses taskfile interface and all
+	standard callbacks is assumed.
+	</para>
+	<para>
+	Currently 6 ATA command protocols are used.  They can be
+	sorted into the following four categories according to how
+	they are processed.
+	</para>
+
+	<variablelist>
+	   <varlistentry><term>ATA NO DATA or DMA</term>
+	   <listitem>
+	   <para>
+	   ATA_PROT_NODATA and ATA_PROT_DMA fall into this category.
+	   These types of commands don't require any software
+	   intervention once issued.  Device will raise interrupt on
+	   completion.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry><term>ATA PIO</term>
+	   <listitem>
+	   <para>
+	   ATA_PROT_PIO is in this category.  libata currently
+	   implements PIO with polling.  ATA_NIEN bit is set to turn
+	   off interrupt and pio_task on ata_wq performs polling and
+	   IO.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry><term>ATAPI NODATA or DMA</term>
+	   <listitem>
+	   <para>
+	   ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this
+	   category.  packet_task is used to poll BSY bit after
+	   issuing PACKET command.  Once BSY is turned off by the
+	   device, packet_task transfers CDB and hands off processing
+	   to interrupt handler.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry><term>ATAPI PIO</term>
+	   <listitem>
+	   <para>
+	   ATA_PROT_ATAPI is in this category.  ATA_NIEN bit is set
+	   and, as in ATAPI NODATA or DMA, packet_task submits cdb.
+	   However, after submitting cdb, further processing (data
+	   transfer) is handed off to pio_task.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+	</variablelist>
+        </sect1>
+
+	<sect1><title>How commands are completed</title>
+	<para>
+	Once issued, all qc's are either completed with
+	ata_qc_complete() or time out.  For commands which are handled
+	by interrupts, ata_host_intr() invokes ata_qc_complete(), and,
+	for PIO tasks, pio_task invokes ata_qc_complete().  In error
+	cases, packet_task may also complete commands.
+	</para>
+	<para>
+	ata_qc_complete() does the following.
+	</para>
+
+	<orderedlist>
+
+	<listitem>
+	<para>
+	DMA memory is unmapped.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	ATA_QCFLAG_ACTIVE is clared from qc->flags.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	qc->complete_fn() callback is invoked.  If the return value of
+	the callback is not zero.  Completion is short circuited and
+	ata_qc_complete() returns.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	__ata_qc_complete() is called, which does
+	   <orderedlist>
+
+	   <listitem>
+	   <para>
+	   qc->flags is cleared to zero.
+	   </para>
+	   </listitem>
+
+	   <listitem>
+	   <para>
+	   ap->active_tag and qc->tag are poisoned.
+	   </para>
+	   </listitem>
+
+	   <listitem>
+	   <para>
+	   qc->waiting is claread &amp; completed (in that order).
+	   </para>
+	   </listitem>
+
+	   <listitem>
+	   <para>
+	   qc is deallocated by clearing appropriate bit in ap->qactive.
+	   </para>
+	   </listitem>
+
+	   </orderedlist>
+	</para>
+	</listitem>
+
+	</orderedlist>
+
+	<para>
+	So, it basically notifies upper layer and deallocates qc.  One
+	exception is short-circuit path in #3 which is used by
+	atapi_qc_complete().
+	</para>
+	<para>
+	For all non-ATAPI commands, whether it fails or not, almost
+	the same code path is taken and very little error handling
+	takes place.  A qc is completed with success status if it
+	succeeded, with failed status otherwise.
+	</para>
+	<para>
+	However, failed ATAPI commands require more handling as
+	REQUEST SENSE is needed to acquire sense data.  If an ATAPI
+	command fails, ata_qc_complete() is invoked with error status,
+	which in turn invokes atapi_qc_complete() via
+	qc->complete_fn() callback.
+	</para>
+	<para>
+	This makes atapi_qc_complete() set scmd->result to
+	SAM_STAT_CHECK_CONDITION, complete the scmd and return 1.  As
+	the sense data is empty but scmd->result is CHECK CONDITION,
+	SCSI midlayer will invoke EH for the scmd, and returning 1
+	makes ata_qc_complete() to return without deallocating the qc.
+	This leads us to ata_scsi_error() with partially completed qc.
+	</para>
+
+	</sect1>
+
+	<sect1><title>ata_scsi_error()</title>
+	<para>
+	ata_scsi_error() is the current hostt->eh_strategy_handler()
+	for libata.  As discussed above, this will be entered in two
+	cases - timeout and ATAPI error completion.  This function
+	calls low level libata driver's eng_timeout() callback, the
+	standard callback for which is ata_eng_timeout().  It checks
+	if a qc is active and calls ata_qc_timeout() on the qc if so.
+	Actual error handling occurs in ata_qc_timeout().
+	</para>
+	<para>
+	If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and
+	completes the qc.  Note that as we're currently in EH, we
+	cannot call scsi_done.  As described in SCSI EH doc, a
+	recovered scmd should be either retried with
+	scsi_queue_insert() or finished with scsi_finish_command().
+	Here, we override qc->scsidone with scsi_finish_command() and
+	calls ata_qc_complete().
+	</para>
+	<para>
+	If EH is invoked due to a failed ATAPI qc, the qc here is
+	completed but not deallocated.  The purpose of this
+	half-completion is to use the qc as place holder to make EH
+	code reach this place.  This is a bit hackish, but it works.
+	</para>
+	<para>
+	Once control reaches here, the qc is deallocated by invoking
+	__ata_qc_complete() explicitly.  Then, internal qc for REQUEST
+	SENSE is issued.  Once sense data is acquired, scmd is
+	finished by directly invoking scsi_finish_command() on the
+	scmd.  Note that as we already have completed and deallocated
+	the qc which was associated with the scmd, we don't need
+	to/cannot call ata_qc_complete() again.
+	</para>
+
+	</sect1>
+
+	<sect1><title>Problems with the current EH</title>
+
+	<itemizedlist>
+
+	<listitem>
+	<para>
+	Error representation is too crude.  Currently any and all
+	error conditions are represented with ATA STATUS and ERROR
+	registers.  Errors which aren't ATA device errors are treated
+	as ATA device errors by setting ATA_ERR bit.  Better error
+	descriptor which can properly represent ATA and other
+	errors/exceptions is needed.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	When handling timeouts, no action is taken to make device
+	forget about the timed out command and ready for new commands.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	EH handling via ata_scsi_error() is not properly protected
+	from usual command processing.  On EH entrance, the device is
+	not in quiescent state.  Timed out commands may succeed or
+	fail any time.  pio_task and atapi_task may still be running.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	Too weak error recovery.  Devices / controllers causing HSM
+	mismatch errors and other errors quite often require reset to
+	return to known state.  Also, advanced error handling is
+	necessary to support features like NCQ and hotplug.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	ATA errors are directly handled in the interrupt handler and
+	PIO errors in pio_task.  This is problematic for advanced
+	error handling for the following reasons.
+	</para>
+	<para>
+	First, advanced error handling often requires context and
+	internal qc execution.
+	</para>
+	<para>
+	Second, even a simple failure (say, CRC error) needs
+	information gathering and could trigger complex error handling
+	(say, resetting &amp; reconfiguring).  Having multiple code
+	paths to gather information, enter EH and trigger actions
+	makes life painful.
+	</para>
+	<para>
+	Third, scattered EH code makes implementing low level drivers
+	difficult.  Low level drivers override libata callbacks.  If
+	EH is scattered over several places, each affected callbacks
+	should perform its part of error handling.  This can be error
+	prone and painful.
+	</para>
+	</listitem>
+
+	</itemizedlist>
+	</sect1>
+  </chapter>
+
   <chapter id="libataExt">
      <title>libata Library</title>
 !Edrivers/scsi/libata-core.c
@@ -431,6 +787,722 @@ and other resources, etc.
 !Idrivers/scsi/libata-scsi.c
   </chapter>
 
+  <chapter id="ataExceptions">
+     <title>ATA errors &amp; exceptions</title>
+
+  <para>
+  This chapter tries to identify what error/exception conditions exist
+  for ATA/ATAPI devices and describe how they should be handled in
+  implementation-neutral way.
+  </para>
+
+  <para>
+  The term 'error' is used to describe conditions where either an
+  explicit error condition is reported from device or a command has
+  timed out.
+  </para>
+
+  <para>
+  The term 'exception' is either used to describe exceptional
+  conditions which are not errors (say, power or hotplug events), or
+  to describe both errors and non-error exceptional conditions.  Where
+  explicit distinction between error and exception is necessary, the
+  term 'non-error exception' is used.
+  </para>
+
+  <sect1 id="excat">
+     <title>Exception categories</title>
+     <para>
+     Exceptions are described primarily with respect to legacy
+     taskfile + bus master IDE interface.  If a controller provides
+     other better mechanism for error reporting, mapping those into
+     categories described below shouldn't be difficult.
+     </para>
+
+     <para>
+     In the following sections, two recovery actions - reset and
+     reconfiguring transport - are mentioned.  These are described
+     further in <xref linkend="exrec"/>.
+     </para>
+
+     <sect2 id="excatHSMviolation">
+        <title>HSM violation</title>
+        <para>
+        This error is indicated when STATUS value doesn't match HSM
+        requirement during issuing or excution any ATA/ATAPI command.
+        </para>
+
+	<itemizedlist>
+	<title>Examples</title>
+
+        <listitem>
+	<para>
+	ATA_STATUS doesn't contain !BSY &amp;&amp; DRDY &amp;&amp; !DRQ while trying
+	to issue a command.
+        </para>
+	</listitem>
+
+        <listitem>
+	<para>
+	!BSY &amp;&amp; !DRQ during PIO data transfer.
+        </para>
+	</listitem>
+
+        <listitem>
+	<para>
+	DRQ on command completion.
+        </para>
+	</listitem>
+
+        <listitem>
+	<para>
+	!BSY &amp;&amp; ERR after CDB tranfer starts but before the
+        last byte of CDB is transferred.  ATA/ATAPI standard states
+        that &quot;The device shall not terminate the PACKET command
+        with an error before the last byte of the command packet has
+        been written&quot; in the error outputs description of PACKET
+        command and the state diagram doesn't include such
+        transitions.
+	</para>
+	</listitem>
+
+	</itemizedlist>
+
+	<para>
+	In these cases, HSM is violated and not much information
+	regarding the error can be acquired from STATUS or ERROR
+	register.  IOW, this error can be anything - driver bug,
+	faulty device, controller and/or cable.
+	</para>
+
+	<para>
+	As HSM is violated, reset is necessary to restore known state.
+	Reconfiguring transport for lower speed might be helpful too
+	as transmission errors sometimes cause this kind of errors.
+	</para>
+     </sect2>
+     
+     <sect2 id="excatDevErr">
+        <title>ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</title>
+
+	<para>
+	These are errors detected and reported by ATA/ATAPI devices
+	indicating device problems.  For this type of errors, STATUS
+	and ERROR register values are valid and describe error
+	condition.  Note that some of ATA bus errors are detected by
+	ATA/ATAPI devices and reported using the same mechanism as
+	device errors.  Those cases are described later in this
+	section.
+	</para>
+
+	<para>
+	For ATA commands, this type of errors are indicated by !BSY
+	&amp;&amp; ERR during command execution and on completion.
+	</para>
+
+	<para>For ATAPI commands,</para>
+
+	<itemizedlist>
+
+	<listitem>
+	<para>
+	!BSY &amp;&amp; ERR &amp;&amp; ABRT right after issuing PACKET
+	indicates that PACKET command is not supported and falls in
+	this category.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	!BSY &amp;&amp; ERR(==CHK) &amp;&amp; !ABRT after the last
+	byte of CDB is transferred indicates CHECK CONDITION and
+	doesn't fall in this category.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	!BSY &amp;&amp; ERR(==CHK) &amp;&amp; ABRT after the last byte
+        of CDB is transferred *probably* indicates CHECK CONDITION and
+        doesn't fall in this category.
+	</para>
+	</listitem>
+
+	</itemizedlist>
+
+	<para>
+	Of errors detected as above, the followings are not ATA/ATAPI
+	device errors but ATA bus errors and should be handled
+	according to <xref linkend="excatATAbusErr"/>.
+	</para>
+
+	<variablelist>
+
+	   <varlistentry>
+	   <term>CRC error during data transfer</term>
+	   <listitem>
+	   <para>
+	   This is indicated by ICRC bit in the ERROR register and
+	   means that corruption occurred during data transfer.  Upto
+	   ATA/ATAPI-7, the standard specifies that this bit is only
+	   applicable to UDMA transfers but ATA/ATAPI-8 draft revision
+	   1f says that the bit may be applicable to multiword DMA and
+	   PIO.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry>
+	   <term>ABRT error during data transfer or on completion</term>
+	   <listitem>
+	   <para>
+	   Upto ATA/ATAPI-7, the standard specifies that ABRT could be
+	   set on ICRC errors and on cases where a device is not able
+	   to complete a command.  Combined with the fact that MWDMA
+	   and PIO transfer errors aren't allowed to use ICRC bit upto
+	   ATA/ATAPI-7, it seems to imply that ABRT bit alone could
+	   indicate tranfer errors.
+	   </para>
+	   <para>
+	   However, ATA/ATAPI-8 draft revision 1f removes the part
+	   that ICRC errors can turn on ABRT.  So, this is kind of
+	   gray area.  Some heuristics are needed here.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	</variablelist>
+
+	<para>
+	ATA/ATAPI device errors can be further categorized as follows.
+	</para>
+
+	<variablelist>
+
+	   <varlistentry>
+	   <term>Media errors</term>
+	   <listitem>
+	   <para>
+	   This is indicated by UNC bit in the ERROR register.  ATA
+	   devices reports UNC error only after certain number of
+	   retries cannot recover the data, so there's nothing much
+	   else to do other than notifying upper layer.
+	   </para>
+	   <para>
+	   READ and WRITE commands report CHS or LBA of the first
+	   failed sector but ATA/ATAPI standard specifies that the
+	   amount of transferred data on error completion is
+	   indeterminate, so we cannot assume that sectors preceding
+	   the failed sector have been transferred and thus cannot
+	   complete those sectors successfully as SCSI does.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry>
+	   <term>Media changed / media change requested error</term>
+	   <listitem>
+	   <para>
+	   &lt;&lt;TODO: fill here&gt;&gt;
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry><term>Address error</term>
+	   <listitem>
+	   <para>
+	   This is indicated by IDNF bit in the ERROR register.
+	   Report to upper layer.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry><term>Other errors</term>
+	   <listitem>
+	   <para>
+	   This can be invalid command or parameter indicated by ABRT
+	   ERROR bit or some other error condition.  Note that ABRT
+	   bit can indicate a lot of things including ICRC and Address
+	   errors.  Heuristics needed.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	</variablelist>
+
+	<para>
+	Depending on commands, not all STATUS/ERROR bits are
+	applicable.  These non-applicable bits are marked with
+	&quot;na&quot; in the output descriptions but upto ATA/ATAPI-7
+	no definition of &quot;na&quot; can be found.  However,
+	ATA/ATAPI-8 draft revision 1f describes &quot;N/A&quot; as
+	follows.
+	</para>
+
+	<blockquote>
+	<variablelist>
+	   <varlistentry><term>3.2.3.3a N/A</term>
+	   <listitem>
+	   <para>
+	   A keyword the indicates a field has no defined value in
+	   this standard and should not be checked by the host or
+	   device. N/A fields should be cleared to zero.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+	</variablelist>
+	</blockquote>
+
+	<para>
+	So, it seems reasonable to assume that &quot;na&quot; bits are
+	cleared to zero by devices and thus need no explicit masking.
+	</para>
+
+     </sect2>
+
+     <sect2 id="excatATAPIcc">
+        <title>ATAPI device CHECK CONDITION</title>
+
+	<para>
+	ATAPI device CHECK CONDITION error is indicated by set CHK bit
+	(ERR bit) in the STATUS register after the last byte of CDB is
+	transferred for a PACKET command.  For this kind of errors,
+	sense data should be acquired to gather information regarding
+	the errors.  REQUEST SENSE packet command should be used to
+	acquire sense data.
+	</para>
+
+	<para>
+	Once sense data is acquired, this type of errors can be
+	handled similary to other SCSI errors.  Note that sense data
+	may indicate ATA bus error (e.g. Sense Key 04h HARDWARE ERROR
+	&amp;&amp; ASC/ASCQ 47h/00h SCSI PARITY ERROR).  In such
+	cases, the error should be considered as an ATA bus error and
+	handled according to <xref linkend="excatATAbusErr"/>.
+	</para>
+
+     </sect2>
+
+     <sect2 id="excatNCQerr">
+        <title>ATA device error (NCQ)</title>
+
+	<para>
+	NCQ command error is indicated by cleared BSY and set ERR bit
+	during NCQ command phase (one or more NCQ commands
+	outstanding).  Although STATUS and ERROR registers will
+	contain valid values describing the error, READ LOG EXT is
+	required to clear the error condition, determine which command
+	has failed and acquire more information.
+	</para>
+
+	<para>
+	READ LOG EXT Log Page 10h reports which tag has failed and
+	taskfile register values describing the error.  With this
+	information the failed command can be handled as a normal ATA
+	command error as in <xref linkend="excatDevErr"/> and all
+	other in-flight commands must be retried.  Note that this
+	retry should not be counted - it's likely that commands
+	retried this way would have completed normally if it were not
+	for the failed command.
+	</para>
+
+	<para>
+	Note that ATA bus errors can be reported as ATA device NCQ
+	errors.  This should be handled as described in <xref
+	linkend="excatATAbusErr"/>.
+	</para>
+
+	<para>
+	If READ LOG EXT Log Page 10h fails or reports NQ, we're
+	thoroughly screwed.  This condition should be treated
+	according to <xref linkend="excatHSMviolation"/>.
+	</para>
+
+     </sect2>
+
+     <sect2 id="excatATAbusErr">
+        <title>ATA bus error</title>
+
+	<para>
+	ATA bus error means that data corruption occurred during
+	transmission over ATA bus (SATA or PATA).  This type of errors
+	can be indicated by
+	</para>
+
+	<itemizedlist>
+
+	<listitem>
+	<para>
+	ICRC or ABRT error as described in <xref linkend="excatDevErr"/>.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	Controller-specific error completion with error information
+	indicating transmission error.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	On some controllers, command timeout.  In this case, there may
+	be a mechanism to determine that the timeout is due to
+	transmission error.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	Unknown/random errors, timeouts and all sorts of weirdities.
+	</para>
+	</listitem>
+
+	</itemizedlist>
+
+	<para>
+	As described above, transmission errors can cause wide variety
+	of symptoms ranging from device ICRC error to random device
+	lockup, and, for many cases, there is no way to tell if an
+	error condition is due to transmission error or not;
+	therefore, it's necessary to employ some kind of heuristic
+	when dealing with errors and timeouts.  For example,
+	encountering repetitive ABRT errors for known supported
+	command is likely to indicate ATA bus error.
+	</para>
+
+	<para>
+	Once it's determined that ATA bus errors have possibly
+	occurred, lowering ATA bus transmission speed is one of
+	actions which may alleviate the problem.  See <xref
+	linkend="exrecReconf"/> for more information.
+	</para>
+
+     </sect2>
+
+     <sect2 id="excatPCIbusErr">
+        <title>PCI bus error</title>
+
+	<para>
+	Data corruption or other failures during transmission over PCI
+	(or other system bus).  For standard BMDMA, this is indicated
+	by Error bit in the BMDMA Status register.  This type of
+	errors must be logged as it indicates something is very wrong
+	with the system.  Resetting host controller is recommended.
+	</para>
+
+     </sect2>
+
+     <sect2 id="excatLateCompletion">
+        <title>Late completion</title>
+
+	<para>
+	This occurs when timeout occurs and the timeout handler finds
+	out that the timed out command has completed successfully or
+	with error.  This is usually caused by lost interrupts.  This
+	type of errors must be logged.  Resetting host controller is
+	recommended.
+	</para>
+
+     </sect2>
+
+     <sect2 id="excatUnknown">
+        <title>Unknown error (timeout)</title>
+
+	<para>
+	This is when timeout occurs and the command is still
+	processing or the host and device are in unknown state.  When
+	this occurs, HSM could be in any valid or invalid state.  To
+	bring the device to known state and make it forget about the
+	timed out command, resetting is necessary.  The timed out
+	command may be retried.
+	</para>
+
+	<para>
+	Timeouts can also be caused by transmission errors.  Refer to
+	<xref linkend="excatATAbusErr"/> for more details.
+	</para>
+
+     </sect2>
+
+     <sect2 id="excatHoplugPM">
+        <title>Hotplug and power management exceptions</title>
+
+	<para>
+	&lt;&lt;TODO: fill here&gt;&gt;
+	</para>
+
+     </sect2>
+
+  </sect1>
+
+  <sect1 id="exrec">
+     <title>EH recovery actions</title>
+
+     <para>
+     This section discusses several important recovery actions.
+     </para>
+
+     <sect2 id="exrecClr">
+        <title>Clearing error condition</title>
+
+	<para>
+	Many controllers require its error registers to be cleared by
+	error handler.  Different controllers may have different
+	requirements.
+	</para>
+
+	<para>
+	For SATA, it's strongly recommended to clear at least SError
+	register during error handling.
+	</para>
+     </sect2>
+
+     <sect2 id="exrecRst">
+        <title>Reset</title>
+
+	<para>
+	During EH, resetting is necessary in the following cases.
+	</para>
+
+	<itemizedlist>
+
+	<listitem>
+	<para>
+	HSM is in unknown or invalid state
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	HBA is in unknown or invalid state
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	EH needs to make HBA/device forget about in-flight commands
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	HBA/device behaves weirdly
+	</para>
+	</listitem>
+
+	</itemizedlist>
+
+	<para>
+	Resetting during EH might be a good idea regardless of error
+	condition to improve EH robustness.  Whether to reset both or
+	either one of HBA and device depends on situation but the
+	following scheme is recommended.
+	</para>
+
+	<itemizedlist>
+
+	<listitem>
+	<para>
+	When it's known that HBA is in ready state but ATA/ATAPI
+	device in in unknown state, reset only device.
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	If HBA is in unknown state, reset both HBA and device.
+	</para>
+	</listitem>
+
+	</itemizedlist>
+
+	<para>
+	HBA resetting is implementation specific.  For a controller
+	complying to taskfile/BMDMA PCI IDE, stopping active DMA
+	transaction may be sufficient iff BMDMA state is the only HBA
+	context.  But even mostly taskfile/BMDMA PCI IDE complying
+	controllers may have implementation specific requirements and
+	mechanism to reset themselves.  This must be addressed by
+	specific drivers.
+	</para>
+
+	<para>
+	OTOH, ATA/ATAPI standard describes in detail ways to reset
+	ATA/ATAPI devices.
+	</para>
+
+	<variablelist>
+
+	   <varlistentry><term>PATA hardware reset</term>
+	   <listitem>
+	   <para>
+	   This is hardware initiated device reset signalled with
+	   asserted PATA RESET- signal.  There is no standard way to
+	   initiate hardware reset from software although some
+	   hardware provides registers that allow driver to directly
+	   tweak the RESET- signal.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry><term>Software reset</term>
+	   <listitem>
+	   <para>
+	   This is achieved by turning CONTROL SRST bit on for at
+	   least 5us.  Both PATA and SATA support it but, in case of
+	   SATA, this may require controller-specific support as the
+	   second Register FIS to clear SRST should be transmitted
+	   while BSY bit is still set.  Note that on PATA, this resets
+	   both master and slave devices on a channel.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry><term>EXECUTE DEVICE DIAGNOSTIC command</term>
+	   <listitem>
+	   <para>
+	   Although ATA/ATAPI standard doesn't describe exactly, EDD
+	   implies some level of resetting, possibly similar level
+	   with software reset.  Host-side EDD protocol can be handled
+	   with normal command processing and most SATA controllers
+	   should be able to handle EDD's just like other commands.
+	   As in software reset, EDD affects both devices on a PATA
+	   bus.
+	   </para>
+	   <para>
+	   Although EDD does reset devices, this doesn't suit error
+	   handling as EDD cannot be issued while BSY is set and it's
+	   unclear how it will act when device is in unknown/weird
+	   state.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry><term>ATAPI DEVICE RESET command</term>
+	   <listitem>
+	   <para>
+	   This is very similar to software reset except that reset
+	   can be restricted to the selected device without affecting
+	   the other device sharing the cable.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	   <varlistentry><term>SATA phy reset</term>
+	   <listitem>
+	   <para>
+	   This is the preferred way of resetting a SATA device.  In
+	   effect, it's identical to PATA hardware reset.  Note that
+	   this can be done with the standard SCR Control register.
+	   As such, it's usually easier to implement than software
+	   reset.
+	   </para>
+	   </listitem>
+	   </varlistentry>
+
+	</variablelist>
+
+	<para>
+	One more thing to consider when resetting devices is that
+	resetting clears certain configuration parameters and they
+	need to be set to their previous or newly adjusted values
+	after reset.
+	</para>
+
+	<para>
+	Parameters affected are.
+	</para>
+
+	<itemizedlist>
+
+	<listitem>
+	<para>
+	CHS set up with INITIALIZE DEVICE PARAMETERS (seldomly used)
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	Parameters set with SET FEATURES including transfer mode setting
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	Block count set with SET MULTIPLE MODE
+	</para>
+	</listitem>
+
+	<listitem>
+	<para>
+	Other parameters (SET MAX, MEDIA LOCK...)
+	</para>
+	</listitem>
+
+	</itemizedlist>
+
+	<para>
+	ATA/ATAPI standard specifies that some parameters must be
+	maintained across hardware or software reset, but doesn't
+	strictly specify all of them.  Always reconfiguring needed
+	parameters after reset is required for robustness.  Note that
+	this also applies when resuming from deep sleep (power-off).
+	</para>
+
+	<para>
+	Also, ATA/ATAPI standard requires that IDENTIFY DEVICE /
+	IDENTIFY PACKET DEVICE is issued after any configuration
+	parameter is updated or a hardware reset and the result used
+	for further operation.  OS driver is required to implement
+	revalidation mechanism to support this.
+	</para>
+
+     </sect2>
+
+     <sect2 id="exrecReconf">
+        <title>Reconfigure transport</title>
+
+	<para>
+	For both PATA and SATA, a lot of corners are cut for cheap
+	connectors, cables or controllers and it's quite common to see
+	high transmission error rate.  This can be mitigated by
+	lowering transmission speed.
+	</para>
+
+	<para>
+	The following is a possible scheme Jeff Garzik suggested.
+	</para>
+
+	<blockquote>
+	<para>
+	If more than $N (3?) transmission errors happen in 15 minutes,
+	</para>	
+	<itemizedlist>
+	<listitem>
+	<para>
+	if SATA, decrease SATA PHY speed.  if speed cannot be decreased,
+	</para>
+	</listitem>
+	<listitem>
+	<para>
+	decrease UDMA xfer speed.  if at UDMA0, switch to PIO4,
+	</para>
+	</listitem>
+	<listitem>
+	<para>
+	decrease PIO xfer speed.  if at PIO3, complain, but continue
+	</para>
+	</listitem>
+	</itemizedlist>
+	</blockquote>
+
+     </sect2>
+
+  </sect1>
+
+  </chapter>
+
   <chapter id="PiixInt">
      <title>ata_piix Internals</title>
 !Idrivers/scsi/ata_piix.c
diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl
index 705c442c7bf4..15ce0f21e5e0 100644
--- a/Documentation/DocBook/usb.tmpl
+++ b/Documentation/DocBook/usb.tmpl
@@ -291,7 +291,7 @@
 
 !Edrivers/usb/core/hcd.c
 !Edrivers/usb/core/hcd-pci.c
-!Edrivers/usb/core/buffer.c
+!Idrivers/usb/core/buffer.c
     </chapter>
 
     <chapter>
diff --git a/Documentation/DocBook/writing_usb_driver.tmpl b/Documentation/DocBook/writing_usb_driver.tmpl
index 51f3bfb6fb6e..008a341234d0 100644
--- a/Documentation/DocBook/writing_usb_driver.tmpl
+++ b/Documentation/DocBook/writing_usb_driver.tmpl
@@ -345,8 +345,7 @@ if (!retval) {
   <programlisting>
 static inline void skel_delete (struct usb_skel *dev)
 {
-    if (dev->bulk_in_buffer != NULL)
-        kfree (dev->bulk_in_buffer);
+    kfree (dev->bulk_in_buffer);
     if (dev->bulk_out_buffer != NULL)
         usb_buffer_free (dev->udev, dev->bulk_out_size,
             dev->bulk_out_buffer,
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
new file mode 100644
index 000000000000..e4c38152f7f7
--- /dev/null
+++ b/Documentation/RCU/torture.txt
@@ -0,0 +1,122 @@
+RCU Torture Test Operation
+
+
+CONFIG_RCU_TORTURE_TEST
+
+The CONFIG_RCU_TORTURE_TEST config option is available for all RCU
+implementations.  It creates an rcutorture kernel module that can
+be loaded to run a torture test.  The test periodically outputs
+status messages via printk(), which can be examined via the dmesg
+command (perhaps grepping for "rcutorture").  The test is started
+when the module is loaded, and stops when the module is unloaded.
+
+However, actually setting this config option to "y" results in the system
+running the test immediately upon boot, and ending only when the system
+is taken down.  Normally, one will instead want to build the system
+with CONFIG_RCU_TORTURE_TEST=m and to use modprobe and rmmod to control
+the test, perhaps using a script similar to the one shown at the end of
+this document.  Note that you will need CONFIG_MODULE_UNLOAD in order
+to be able to end the test.
+
+
+MODULE PARAMETERS
+
+This module has the following parameters:
+
+nreaders	This is the number of RCU reading threads supported.
+		The default is twice the number of CPUs.  Why twice?
+		To properly exercise RCU implementations with preemptible
+		read-side critical sections.
+
+stat_interval	The number of seconds between output of torture
+		statistics (via printk()).  Regardless of the interval,
+		statistics are printed when the module is unloaded.
+		Setting the interval to zero causes the statistics to
+		be printed -only- when the module is unloaded, and this
+		is the default.
+
+verbose		Enable debug printk()s.  Default is disabled.
+
+
+OUTPUT
+
+The statistics output is as follows:
+
+	rcutorture: --- Start of test: nreaders=16 stat_interval=0 verbose=0
+	rcutorture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915
+	rcutorture: Reader Pipe:  1466408 9747 0 0 0 0 0 0 0 0 0
+	rcutorture: Reader Batch:  1464477 11678 0 0 0 0 0 0 0 0
+	rcutorture: Free-Block Circulation:  1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0
+	rcutorture: --- End of test
+
+The command "dmesg | grep rcutorture:" will extract this information on
+most systems.  On more esoteric configurations, it may be necessary to
+use other commands to access the output of the printk()s used by
+the RCU torture test.  The printk()s use KERN_ALERT, so they should
+be evident.  ;-)
+
+The entries are as follows:
+
+o	"ggp": The number of counter flips (or batches) since boot.
+
+o	"rtc": The hexadecimal address of the structure currently visible
+	to readers.
+
+o	"ver": The number of times since boot that the rcutw writer task
+	has changed the structure visible to readers.
+
+o	"tfle": If non-zero, indicates that the "torture freelist"
+	containing structure to be placed into the "rtc" area is empty.
+	This condition is important, since it can fool you into thinking
+	that RCU is working when it is not.  :-/
+
+o	"rta": Number of structures allocated from the torture freelist.
+
+o	"rtaf": Number of allocations from the torture freelist that have
+	failed due to the list being empty.
+
+o	"rtf": Number of frees into the torture freelist.
+
+o	"Reader Pipe": Histogram of "ages" of structures seen by readers.
+	If any entries past the first two are non-zero, RCU is broken.
+	And rcutorture prints the error flag string "!!!" to make sure
+	you notice.  The age of a newly allocated structure is zero,
+	it becomes one when removed from reader visibility, and is
+	incremented once per grace period subsequently -- and is freed
+	after passing through (RCU_TORTURE_PIPE_LEN-2) grace periods.
+
+	The output displayed above was taken from a correctly working
+	RCU.  If you want to see what it looks like when broken, break
+	it yourself.  ;-)
+
+o	"Reader Batch": Another histogram of "ages" of structures seen
+	by readers, but in terms of counter flips (or batches) rather
+	than in terms of grace periods.  The legal number of non-zero
+	entries is again two.  The reason for this separate view is
+	that it is easier to get the third entry to show up in the
+	"Reader Batch" list than in the "Reader Pipe" list.
+
+o	"Free-Block Circulation": Shows the number of torture structures
+	that have reached a given point in the pipeline.  The first element
+	should closely correspond to the number of structures allocated,
+	the second to the number that have been removed from reader view,
+	and all but the last remaining to the corresponding number of
+	passes through a grace period.  The last entry should be zero,
+	as it is only incremented if a torture structure's counter
+	somehow gets incremented farther than it should.
+
+
+USAGE
+
+The following script may be used to torture RCU:
+
+	#!/bin/sh
+
+	modprobe rcutorture
+	sleep 100
+	rmmod rcutorture
+	dmesg | grep rcutorture:
+
+The output can be manually inspected for the error flag of "!!!".
+One could of course create a more elaborate script that automatically
+checked for such errors.
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index 6dd274d7e1cf..2d65c2182161 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -906,9 +906,20 @@ Aside:
 
 
 4. The I/O scheduler
-I/O schedulers are now per queue. They should be runtime switchable and modular
-but aren't yet. Jens has most bits to do this, but the sysfs implementation is
-missing.
+I/O scheduler, a.k.a. elevator, is implemented in two layers.  Generic dispatch
+queue and specific I/O schedulers.  Unless stated otherwise, elevator is used
+to refer to both parts and I/O scheduler to specific I/O schedulers.
+
+Block layer implements generic dispatch queue in ll_rw_blk.c and elevator.c.
+The generic dispatch queue is responsible for properly ordering barrier
+requests, requeueing, handling non-fs requests and all other subtleties.
+
+Specific I/O schedulers are responsible for ordering normal filesystem
+requests.  They can also choose to delay certain requests to improve
+throughput or whatever purpose.  As the plural form indicates, there are
+multiple I/O schedulers.  They can be built as modules but at least one should
+be built inside the kernel.  Each queue can choose different one and can also
+change to another one dynamically.
 
 A block layer call to the i/o scheduler follows the convention elv_xxx(). This
 calls elevator_xxx_fn in the elevator switch (drivers/block/elevator.c). Oh,
@@ -921,44 +932,36 @@ keeping work.
 The functions an elevator may implement are: (* are mandatory)
 elevator_merge_fn		called to query requests for merge with a bio
 
-elevator_merge_req_fn		" " "  with another request
+elevator_merge_req_fn		called when two requests get merged. the one
+				which gets merged into the other one will be
+				never seen by I/O scheduler again. IOW, after
+				being merged, the request is gone.
 
 elevator_merged_fn		called when a request in the scheduler has been
 				involved in a merge. It is used in the deadline
 				scheduler for example, to reposition the request
 				if its sorting order has changed.
 
-*elevator_next_req_fn		returns the next scheduled request, or NULL
-				if there are none (or none are ready).
+elevator_dispatch_fn		fills the dispatch queue with ready requests.
+				I/O schedulers are free to postpone requests by
+				not filling the dispatch queue unless @force
+				is non-zero.  Once dispatched, I/O schedulers
+				are not allowed to manipulate the requests -
+				they belong to generic dispatch queue.
 
-*elevator_add_req_fn		called to add a new request into the scheduler
+elevator_add_req_fn		called to add a new request into the scheduler
 
 elevator_queue_empty_fn		returns true if the merge queue is empty.
 				Drivers shouldn't use this, but rather check
 				if elv_next_request is NULL (without losing the
 				request if one exists!)
 
-elevator_remove_req_fn		This is called when a driver claims ownership of
-				the target request - it now belongs to the
-				driver. It must not be modified or merged.
-				Drivers must not lose the request! A subsequent
-				call of elevator_next_req_fn must  return the
-				_next_ request.
-
-elevator_requeue_req_fn		called to add a request to the scheduler. This
-				is used when the request has alrnadebeen
-				returned by elv_next_request, but hasn't
-				completed. If this is not implemented then
-				elevator_add_req_fn is called instead.
-
 elevator_former_req_fn
 elevator_latter_req_fn		These return the request before or after the
 				one specified in disk sort order. Used by the
 				block layer to find merge possibilities.
 
-elevator_completed_req_fn	called when a request is completed. This might
-				come about due to being merged with another or
-				when the device completes the request.
+elevator_completed_req_fn	called when a request is completed.
 
 elevator_may_queue_fn		returns true if the scheduler wants to allow the
 				current context to queue a new request even if
@@ -967,13 +970,33 @@ elevator_may_queue_fn		returns true if the scheduler wants to allow the
 
 elevator_set_req_fn
 elevator_put_req_fn		Must be used to allocate and free any elevator
-				specific storate for a request.
+				specific storage for a request.
+
+elevator_activate_req_fn	Called when device driver first sees a request.
+				I/O schedulers can use this callback to
+				determine when actual execution of a request
+				starts.
+elevator_deactivate_req_fn	Called when device driver decides to delay
+				a request by requeueing it.
 
 elevator_init_fn
 elevator_exit_fn		Allocate and free any elevator specific storage
 				for a queue.
 
-4.2 I/O scheduler implementation
+4.2 Request flows seen by I/O schedulers
+All requests seens by I/O schedulers strictly follow one of the following three
+flows.
+
+ set_req_fn ->
+
+ i.   add_req_fn -> (merged_fn ->)* -> dispatch_fn -> activate_req_fn ->
+      (deactivate_req_fn -> activate_req_fn ->)* -> completed_req_fn
+ ii.  add_req_fn -> (merged_fn ->)* -> merge_req_fn
+ iii. [none]
+
+ -> put_req_fn
+
+4.3 I/O scheduler implementation
 The generic i/o scheduler algorithm attempts to sort/merge/batch requests for
 optimal disk scan and request servicing performance (based on generic
 principles and device capabilities), optimized for:
@@ -993,18 +1016,7 @@ request in sort order to prevent binary tree lookups.
 This arrangement is not a generic block layer characteristic however, so
 elevators may implement queues as they please.
 
-ii. Last merge hint
-The last merge hint is part of the generic queue layer. I/O schedulers must do
-some management on it. For the most part, the most important thing is to make
-sure q->last_merge is cleared (set to NULL) when the request on it is no longer
-a candidate for merging (for example if it has been sent to the driver).
-
-The last merge performed is cached as a hint for the subsequent request. If
-sequential data is being submitted, the hint is used to perform merges without
-any scanning. This is not sufficient when there are multiple processes doing
-I/O though, so a "merge hash" is used by some schedulers.
-
-iii. Merge hash
+ii. Merge hash
 AS and deadline use a hash table indexed by the last sector of a request. This
 enables merging code to quickly look up "back merge" candidates, even when
 multiple I/O streams are being performed at once on one disk.
@@ -1013,29 +1025,8 @@ multiple I/O streams are being performed at once on one disk.
 are far less common than "back merges" due to the nature of most I/O patterns.
 Front merges are handled by the binary trees in AS and deadline schedulers.
 
-iv. Handling barrier cases
-A request with flags REQ_HARDBARRIER or REQ_SOFTBARRIER must not be ordered
-around. That is, they must be processed after all older requests, and before
-any newer ones. This includes merges!
-
-In AS and deadline schedulers, barriers have the effect of flushing the reorder
-queue. The performance cost of this will vary from nothing to a lot depending
-on i/o patterns and device characteristics. Obviously they won't improve
-performance, so their use should be kept to a minimum.
-
-v. Handling insertion position directives
-A request may be inserted with a position directive. The directives are one of
-ELEVATOR_INSERT_BACK, ELEVATOR_INSERT_FRONT, ELEVATOR_INSERT_SORT.
-
-ELEVATOR_INSERT_SORT is a general directive for non-barrier requests.
-ELEVATOR_INSERT_BACK is used to insert a barrier to the back of the queue.
-ELEVATOR_INSERT_FRONT is used to insert a barrier to the front of the queue, and
-overrides the ordering requested by any previous barriers. In practice this is
-harmless and required, because it is used for SCSI requeueing. This does not
-require flushing the reorder queue, so does not impose a performance penalty.
-
-vi. Plugging the queue to batch requests in anticipation of opportunities for
-  merge/sort optimizations
+iii. Plugging the queue to batch requests in anticipation of opportunities for
+     merge/sort optimizations
 
 This is just the same as in 2.4 so far, though per-device unplugging
 support is anticipated for 2.5. Also with a priority-based i/o scheduler,
@@ -1069,7 +1060,7 @@ Aside:
   blk_kick_queue() to unplug a specific queue (right away ?)
   or optionally, all queues, is in the plan.
 
-4.3 I/O contexts
+4.4 I/O contexts
 I/O contexts provide a dynamically allocated per process data area. They may
 be used in I/O schedulers, and in the block layer (could be used for IO statis,
 priorities for example). See *io_context in drivers/block/ll_rw_blk.c, and
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index e132fb1163b0..7eb715e07eda 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -49,9 +49,6 @@ changes occur:
 	page table operations such as what happens during
 	fork, and exec.
 
-	Platform developers note that generic code will always
-	invoke this interface without mm->page_table_lock held.
-
 3) void flush_tlb_range(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end)
 
@@ -72,9 +69,6 @@ changes occur:
 	call flush_tlb_page (see below) for each entry which may be
 	modified.
 
-	Platform developers note that generic code will always
-	invoke this interface with mm->page_table_lock held.
-
 4) void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 
 	This time we need to remove the PAGE_SIZE sized translation
@@ -93,9 +87,6 @@ changes occur:
 
 	This is used primarily during fault processing.
 
-	Platform developers note that generic code will always
-	invoke this interface with mm->page_table_lock held.
-
 5) void flush_tlb_pgtables(struct mm_struct *mm,
 			   unsigned long start, unsigned long end)
 
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index d17b7d2dd771..a09a8eb80665 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -94,7 +94,7 @@ the available CPU and Memory resources amongst the requesting tasks.
 But larger systems, which benefit more from careful processor and
 memory placement to reduce memory access times and contention,
 and which typically represent a larger investment for the customer,
-can benefit from explictly placing jobs on properly sized subsets of
+can benefit from explicitly placing jobs on properly sized subsets of
 the system.
 
 This can be especially valuable on:
diff --git a/Documentation/driver-model/driver.txt b/Documentation/driver-model/driver.txt
index fabaca1ab1b0..59806c9761f7 100644
--- a/Documentation/driver-model/driver.txt
+++ b/Documentation/driver-model/driver.txt
@@ -14,8 +14,8 @@ struct device_driver {
         int     (*probe)        (struct device * dev);
         int     (*remove)       (struct device * dev);
 
-        int     (*suspend)      (struct device * dev, pm_message_t state, u32 level);
-        int     (*resume)       (struct device * dev, u32 level);
+        int     (*suspend)      (struct device * dev, pm_message_t state);
+        int     (*resume)       (struct device * dev);
 };
 
 
@@ -194,69 +194,13 @@ device; i.e. anything in the device's driver_data field.
 If the device is still present, it should quiesce the device and place
 it into a supported low-power state.
 
-	int	(*suspend)	(struct device * dev, pm_message_t state, u32 level);
+	int	(*suspend)	(struct device * dev, pm_message_t state);
 
-suspend is called to put the device in a low power state. There are
-several stages to successfully suspending a device, which is denoted in
-the @level parameter. Breaking the suspend transition into several
-stages affords the platform flexibility in performing device power
-management based on the requirements of the system and the
-user-defined policy.
+suspend is called to put the device in a low power state.
 
-SUSPEND_NOTIFY notifies the device that a suspend transition is about
-to happen. This happens on system power state transitions to verify
-that all devices can successfully suspend.
+	int	(*resume)	(struct device * dev);
 
-A driver may choose to fail on this call, which should cause the
-entire suspend transition to fail. A driver should fail only if it
-knows that the device will not be able to be resumed properly when the
-system wakes up again. It could also fail if it somehow determines it
-is in the middle of an operation too important to stop.
-
-SUSPEND_DISABLE tells the device to stop I/O transactions. When it
-stops transactions, or what it should do with unfinished transactions
-is a policy of the driver. After this call, the driver should not
-accept any other I/O requests.
-
-SUSPEND_SAVE_STATE tells the device to save the context of the
-hardware. This includes any bus-specific hardware state and
-device-specific hardware state. A pointer to this saved state can be
-stored in the device's saved_state field.
-
-SUSPEND_POWER_DOWN tells the driver to place the device in the low
-power state requested. 
-
-Whether suspend is called with a given level is a policy of the
-platform. Some levels may be omitted; drivers must not assume the
-reception of any level. However, all levels must be called in the
-order above; i.e. notification will always come before disabling;
-disabling the device will come before suspending the device.
-
-All calls are made with interrupts enabled, except for the
-SUSPEND_POWER_DOWN level.
-
-	int	(*resume)	(struct device * dev, u32 level);
-
-Resume is used to bring a device back from a low power state. Like the
-suspend transition, it happens in several stages. 
-
-RESUME_POWER_ON tells the driver to set the power state to the state
-before the suspend call (The device could have already been in a low
-power state before the suspend call to put in a lower power state). 
-
-RESUME_RESTORE_STATE tells the driver to restore the state saved by
-the SUSPEND_SAVE_STATE suspend call. 
-
-RESUME_ENABLE tells the driver to start accepting I/O transactions
-again. Depending on driver policy, the device may already have pending
-I/O requests. 
-
-RESUME_POWER_ON is called with interrupts disabled. The other resume
-levels are called with interrupts enabled. 
-
-As with the various suspend stages, the driver must not assume that
-any other resume calls have been or will be made. Each call should be
-self-contained and not dependent on any external state.
+Resume is used to bring a device back from a low power state.
 
 
 Attributes
diff --git a/Documentation/driver-model/porting.txt b/Documentation/driver-model/porting.txt
index ff2fef2107f0..98b233cb8b36 100644
--- a/Documentation/driver-model/porting.txt
+++ b/Documentation/driver-model/porting.txt
@@ -350,7 +350,7 @@ When a driver is registered, the bus's list of devices is iterated
 over. bus->match() is called for each device that is not already
 claimed by a driver. 
 
-When a device is successfully bound to a device, device->driver is
+When a device is successfully bound to a driver, device->driver is
 set, the device is added to a per-driver list of devices, and a
 symlink is created in the driver's sysfs directory that points to the
 device's physical directory:
diff --git a/Documentation/firmware_class/firmware_sample_driver.c b/Documentation/firmware_class/firmware_sample_driver.c
index 4bef8c25172c..d3ad2c24490a 100644
--- a/Documentation/firmware_class/firmware_sample_driver.c
+++ b/Documentation/firmware_class/firmware_sample_driver.c
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/device.h>
+#include <linux/string.h>
 
 #include "linux/firmware.h"
 
diff --git a/Documentation/firmware_class/firmware_sample_firmware_class.c b/Documentation/firmware_class/firmware_sample_firmware_class.c
index 09eab2f1b373..57b956aecbc5 100644
--- a/Documentation/firmware_class/firmware_sample_firmware_class.c
+++ b/Documentation/firmware_class/firmware_sample_firmware_class.c
@@ -14,6 +14,8 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/timer.h>
+#include <linux/slab.h>
+#include <linux/string.h>
 #include <linux/firmware.h>
 
 
diff --git a/Documentation/hwmon/it87 b/Documentation/hwmon/it87
index 0d0195040d88..7f42e441c645 100644
--- a/Documentation/hwmon/it87
+++ b/Documentation/hwmon/it87
@@ -4,18 +4,18 @@ Kernel driver it87
 Supported chips:
   * IT8705F
     Prefix: 'it87'
-    Addresses scanned: from Super I/O config space, or default ISA 0x290 (8 I/O ports)
+    Addresses scanned: from Super I/O config space (8 I/O ports)
     Datasheet: Publicly available at the ITE website
                http://www.ite.com.tw/
   * IT8712F
     Prefix: 'it8712'
     Addresses scanned: I2C 0x28 - 0x2f
-                       from Super I/O config space, or default ISA 0x290 (8 I/O ports)
+                       from Super I/O config space (8 I/O ports)
     Datasheet: Publicly available at the ITE website
                http://www.ite.com.tw/
   * SiS950   [clone of IT8705F]
-    Prefix: 'sis950'
-    Addresses scanned: from Super I/O config space, or default ISA 0x290 (8 I/O ports)
+    Prefix: 'it87'
+    Addresses scanned: from Super I/O config space (8 I/O ports)
     Datasheet: No longer be available
 
 Author: Christophe Gauthron <chrisg@0-in.com>
diff --git a/Documentation/hwmon/lm90 b/Documentation/hwmon/lm90
index 2c4cf39471f4..438cb24cee5b 100644
--- a/Documentation/hwmon/lm90
+++ b/Documentation/hwmon/lm90
@@ -24,14 +24,14 @@ Supported chips:
                http://www.national.com/pf/LM/LM86.html
   * Analog Devices ADM1032
     Prefix: 'adm1032'
-    Addresses scanned: I2C 0x4c
+    Addresses scanned: I2C 0x4c and 0x4d
     Datasheet: Publicly available at the Analog Devices website
-               http://products.analog.com/products/info.asp?product=ADM1032
+               http://www.analog.com/en/prod/0,2877,ADM1032,00.html
   * Analog Devices ADT7461
     Prefix: 'adt7461'
-    Addresses scanned: I2C 0x4c
+    Addresses scanned: I2C 0x4c and 0x4d
     Datasheet: Publicly available at the Analog Devices website
-               http://products.analog.com/products/info.asp?product=ADT7461
+               http://www.analog.com/en/prod/0,2877,ADT7461,00.html
     Note: Only if in ADM1032 compatibility mode
   * Maxim MAX6657
     Prefix: 'max6657'
@@ -71,8 +71,8 @@ increased resolution of the remote temperature measurement.
 
 The different chipsets of the family are not strictly identical, although
 very similar. This driver doesn't handle any specific feature for now,
-but could if there ever was a need for it. For reference, here comes a
-non-exhaustive list of specific features:
+with the exception of SMBus PEC. For reference, here comes a non-exhaustive
+list of specific features:
 
 LM90:
   * Filter and alert configuration register at 0xBF.
@@ -91,6 +91,7 @@ ADM1032:
   * Conversion averaging.
   * Up to 64 conversions/s.
   * ALERT is triggered by open remote sensor.
+  * SMBus PEC support for Write Byte and Receive Byte transactions.
 
 ADT7461
   * Extended temperature range (breaks compatibility)
@@ -119,3 +120,37 @@ The lm90 driver will not update its values more frequently than every
 other second; reading them more often will do no harm, but will return
 'old' values.
 
+PEC Support
+-----------
+
+The ADM1032 is the only chip of the family which supports PEC. It does
+not support PEC on all transactions though, so some care must be taken.
+
+When reading a register value, the PEC byte is computed and sent by the
+ADM1032 chip. However, in the case of a combined transaction (SMBus Read
+Byte), the ADM1032 computes the CRC value over only the second half of
+the message rather than its entirety, because it thinks the first half
+of the message belongs to a different transaction. As a result, the CRC
+value differs from what the SMBus master expects, and all reads fail.
+
+For this reason, the lm90 driver will enable PEC for the ADM1032 only if
+the bus supports the SMBus Send Byte and Receive Byte transaction types.
+These transactions will be used to read register values, instead of
+SMBus Read Byte, and PEC will work properly.
+
+Additionally, the ADM1032 doesn't support SMBus Send Byte with PEC.
+Instead, it will try to write the PEC value to the register (because the
+SMBus Send Byte transaction with PEC is similar to a Write Byte transaction
+without PEC), which is not what we want. Thus, PEC is explicitely disabled
+on SMBus Send Byte transactions in the lm90 driver.
+
+PEC on byte data transactions represents a significant increase in bandwidth
+usage (+33% for writes, +25% for reads) in normal conditions. With the need
+to use two SMBus transaction for reads, this overhead jumps to +50%. Worse,
+two transactions will typically mean twice as much delay waiting for
+transaction completion, effectively doubling the register cache refresh time.
+I guess reliability comes at a price, but it's quite expensive this time.
+
+So, as not everyone might enjoy the slowdown, PEC can be disabled through
+sysfs. Just write 0 to the "pec" file and PEC will be disabled. Write 1
+to that file to enable PEC again.
diff --git a/Documentation/hwmon/smsc47b397 b/Documentation/hwmon/smsc47b397
index da9d80c96432..20682f15ae41 100644
--- a/Documentation/hwmon/smsc47b397
+++ b/Documentation/hwmon/smsc47b397
@@ -3,6 +3,7 @@ Kernel driver smsc47b397
 
 Supported chips:
   * SMSC LPC47B397-NC
+  * SMSC SCH5307-NS
     Prefix: 'smsc47b397'
     Addresses scanned: none, address read from Super I/O config space
     Datasheet: In this file
@@ -12,11 +13,14 @@ Authors: Mark M. Hoffman <mhoffman@lightlink.com>
 
 November 23, 2004
 
-The following specification describes the SMSC LPC47B397-NC sensor chip
+The following specification describes the SMSC LPC47B397-NC[1] sensor chip
 (for which there is no public datasheet available). This document was
 provided by Craig Kelly (In-Store Broadcast Network) and edited/corrected
 by Mark M. Hoffman <mhoffman@lightlink.com>.
 
+[1] And SMSC SCH5307-NS, which has a different device ID but is otherwise
+compatible.
+
 * * * * *
 
 Methods for detecting the HP SIO and reading the thermal data on a dc7100.
@@ -127,7 +131,7 @@ OUT	DX,AL
 The registers of interest for identifying the SIO on the dc7100 are Device ID
 (0x20) and Device Rev  (0x21).
 
-The Device ID will read 0X6F
+The Device ID will read 0x6F (for SCH5307-NS, 0x81)
 The Device Rev currently reads 0x01
 
 Obtaining the HWM Base Address.
diff --git a/Documentation/hwmon/smsc47m1 b/Documentation/hwmon/smsc47m1
index 34e6478c1425..c15bbe68264e 100644
--- a/Documentation/hwmon/smsc47m1
+++ b/Documentation/hwmon/smsc47m1
@@ -12,6 +12,10 @@ Supported chips:
         http://www.smsc.com/main/datasheets/47m14x.pdf
         http://www.smsc.com/main/tools/discontinued/47m15x.pdf
         http://www.smsc.com/main/datasheets/47m192.pdf
+  * SMSC LPC47M997
+    Addresses scanned: none, address read from Super I/O config space
+    Prefix: 'smsc47m1'
+    Datasheet: none
 
 Authors:
         Mark D. Studebaker <mdsxyz123@yahoo.com>,
@@ -30,6 +34,9 @@ The 47M15x and 47M192 chips contain a full 'hardware monitoring block'
 in addition to the fan monitoring and control. The hardware monitoring
 block is not supported by the driver.
 
+No documentation is available for the 47M997, but it has the same device
+ID as the 47M15x and 47M192 chips and seems to be compatible.
+
 Fan rotation speeds are reported in RPM (rotations per minute). An alarm is
 triggered if the rotation speed has dropped below a programmable limit. Fan
 readings can be divided by a programmable divider (1, 2, 4 or 8) to give
diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface
index 346400519d0d..764cdc5480e7 100644
--- a/Documentation/hwmon/sysfs-interface
+++ b/Documentation/hwmon/sysfs-interface
@@ -272,3 +272,6 @@ beep_mask	Bitmask for beep.
 
 eeprom		Raw EEPROM data in binary form.
 		Read only.
+
+pec		Enable or disable PEC (SMBus only)
+		Read/Write
diff --git a/Documentation/hwmon/via686a b/Documentation/hwmon/via686a
index b82014cb7c53..a936fb3824b2 100644
--- a/Documentation/hwmon/via686a
+++ b/Documentation/hwmon/via686a
@@ -18,8 +18,9 @@ Authors:
 Module Parameters
 -----------------
 
-force_addr=0xaddr       Set the I/O base address. Useful for Asus A7V boards
-                        that don't set the address in the BIOS. Does not do a
+force_addr=0xaddr       Set the I/O base address. Useful for boards that
+                        don't set the address in the BIOS. Look for a BIOS
+                        upgrade before resorting to this. Does not do a
                         PCI force; the via686a must still be present in lspci.
                         Don't use this unless the driver complains that the
                         base address is not set.
@@ -63,3 +64,15 @@ miss once-only alarms.
 
 The driver only updates its values each 1.5 seconds; reading it more often
 will do no harm, but will return 'old' values.
+
+Known Issues
+------------
+
+This driver handles sensors integrated in some VIA south bridges. It is
+possible that a motherboard maker used a VT82C686A/B chip as part of a
+product design but was not interested in its hardware monitoring features,
+in which case the sensor inputs will not be wired. This is the case of
+the Asus K7V, A7V and A7V133 motherboards, to name only a few of them.
+So, if you need the force_addr parameter, and end up with values which
+don't seem to make any sense, don't look any further: your chip is simply
+not wired for hardware monitoring.
diff --git a/Documentation/i2c/busses/i2c-i810 b/Documentation/i2c/busses/i2c-i810
index 0544eb332887..83c3b9743c3c 100644
--- a/Documentation/i2c/busses/i2c-i810
+++ b/Documentation/i2c/busses/i2c-i810
@@ -2,6 +2,7 @@ Kernel driver i2c-i810
 
 Supported adapters:
   * Intel 82810, 82810-DC100, 82810E, and 82815 (GMCH)
+  * Intel 82845G (GMCH)
 
 Authors: 
 	Frodo Looijaard <frodol@dds.nl>, 
diff --git a/Documentation/i2c/busses/i2c-viapro b/Documentation/i2c/busses/i2c-viapro
index 702f5ac68c09..9363b8bd6109 100644
--- a/Documentation/i2c/busses/i2c-viapro
+++ b/Documentation/i2c/busses/i2c-viapro
@@ -4,17 +4,18 @@ Supported adapters:
   * VIA Technologies, Inc. VT82C596A/B
     Datasheet: Sometimes available at the VIA website
 
-  * VIA Technologies, Inc. VT82C686A/B 
+  * VIA Technologies, Inc. VT82C686A/B
     Datasheet: Sometimes available at the VIA website
 
   * VIA Technologies, Inc. VT8231, VT8233, VT8233A, VT8235, VT8237
     Datasheet: available on request from Via
 
 Authors:
-	Frodo Looijaard <frodol@dds.nl>,  
-	Philip Edelbrock <phil@netroedge.com>, 
-	Ky�sti M�lkki <kmalkki@cc.hut.fi>, 
-	Mark D. Studebaker <mdsxyz123@yahoo.com> 
+	Frodo Looijaard <frodol@dds.nl>,
+	Philip Edelbrock <phil@netroedge.com>,
+	Ky�sti M�lkki <kmalkki@cc.hut.fi>,
+	Mark D. Studebaker <mdsxyz123@yahoo.com>,
+	Jean Delvare <khali@linux-fr.org>
 
 Module Parameters
 -----------------
@@ -28,20 +29,22 @@ Description
 -----------
 
 i2c-viapro is a true SMBus host driver for motherboards with one of the
-supported VIA southbridges.
+supported VIA south bridges.
 
 Your lspci -n listing must show one of these :
 
- device 1106:3050   (VT82C596 function 3)
- device 1106:3051   (VT82C596 function 3)
+ device 1106:3050   (VT82C596A function 3)
+ device 1106:3051   (VT82C596B function 3)
  device 1106:3057   (VT82C686 function 4)
  device 1106:3074   (VT8233)
  device 1106:3147   (VT8233A)
- device 1106:8235   (VT8231)
- devide 1106:3177   (VT8235)
- devide 1106:3227   (VT8237)
+ device 1106:8235   (VT8231 function 4)
+ device 1106:3177   (VT8235)
+ device 1106:3227   (VT8237R)
 
 If none of these show up, you should look in the BIOS for settings like
 enable ACPI / SMBus or even USB.
 
-
+Except for the oldest chips (VT82C596A/B, VT82C686A and most probably
+VT8231), this driver supports I2C block transactions. Such transactions
+are mainly useful to read from and write to EEPROMs.
diff --git a/Documentation/i2c/chips/x1205 b/Documentation/i2c/chips/x1205
new file mode 100644
index 000000000000..09407c991fe5
--- /dev/null
+++ b/Documentation/i2c/chips/x1205
@@ -0,0 +1,38 @@
+Kernel driver x1205
+===================
+
+Supported chips:
+  * Xicor X1205 RTC
+    Prefix: 'x1205'
+    Addresses scanned: none
+    Datasheet: http://www.intersil.com/cda/deviceinfo/0,1477,X1205,00.html
+
+Authors:
+	Karen Spearel <kas11@tampabay.rr.com>,
+	Alessandro Zummo <a.zummo@towertech.it>
+
+Description
+-----------
+
+This module aims to provide complete access to the Xicor X1205 RTC.
+Recently Xicor has merged with Intersil, but the chip is
+still sold under the Xicor brand.
+
+This chip is located at address 0x6f and uses a 2-byte register addressing.
+Two bytes need to be written to read a single register, while most
+other chips just require one and take the second one as the data
+to be written. To prevent corrupting unknown chips, the user must
+explicitely set the probe parameter.
+
+example:
+
+modprobe x1205 probe=0,0x6f
+
+The module supports one more option, hctosys, which is used to set the
+software clock from the x1205. On systems where the x1205 is the
+only hardware rtc, this parameter could be used to achieve a correct
+date/time earlier in the system boot sequence.
+
+example:
+
+modprobe x1205 probe=0,0x6f hctosys=1
diff --git a/Documentation/i2c/functionality b/Documentation/i2c/functionality
index 41ffefbdc60c..60cca249e452 100644
--- a/Documentation/i2c/functionality
+++ b/Documentation/i2c/functionality
@@ -17,9 +17,10 @@ For the most up-to-date list of functionality constants, please check
   I2C_FUNC_I2C                    Plain i2c-level commands (Pure SMBus
                                   adapters typically can not do these)
   I2C_FUNC_10BIT_ADDR             Handles the 10-bit address extensions
-  I2C_FUNC_PROTOCOL_MANGLING      Knows about the I2C_M_REV_DIR_ADDR,
-                                  I2C_M_REV_DIR_ADDR and I2C_M_REV_DIR_NOSTART
-                                  flags (which modify the i2c protocol!)
+  I2C_FUNC_PROTOCOL_MANGLING      Knows about the I2C_M_IGNORE_NAK,
+                                  I2C_M_REV_DIR_ADDR, I2C_M_NOSTART and
+                                  I2C_M_NO_RD_ACK flags (which modify the
+                                  I2C protocol!)
   I2C_FUNC_SMBUS_QUICK            Handles the SMBus write_quick command
   I2C_FUNC_SMBUS_READ_BYTE        Handles the SMBus read_byte command
   I2C_FUNC_SMBUS_WRITE_BYTE       Handles the SMBus write_byte command
diff --git a/Documentation/i2c/porting-clients b/Documentation/i2c/porting-clients
index 4849dfd6961c..184fac2377aa 100644
--- a/Documentation/i2c/porting-clients
+++ b/Documentation/i2c/porting-clients
@@ -82,7 +82,7 @@ Technical changes:
   exit and exit_free. For i2c+isa drivers, labels should be named
   ERROR0, ERROR1 and ERROR2. Don't forget to properly set err before
   jumping to error labels. By the way, labels should be left-aligned.
-  Use memset to fill the client and data area with 0x00.
+  Use kzalloc instead of kmalloc.
   Use i2c_set_clientdata to set the client data (as opposed to
   a direct access to client->data).
   Use strlcpy instead of strcpy to copy the client name.
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index 077275722a7c..cff7b652588a 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -33,8 +33,8 @@ static struct i2c_driver foo_driver = {
 	.command	= &foo_command /* may be NULL */
 }
  
-The name can be chosen freely, and may be upto 40 characters long. Please
-use something descriptive here.
+The name field must match the driver name, including the case. It must not
+contain spaces, and may be up to 31 characters long.
 
 Don't worry about the flags field; just put I2C_DF_NOTIFY into it. This
 means that your driver will be notified when new adapters are found.
@@ -43,9 +43,6 @@ This is almost always what you want.
 All other fields are for call-back functions which will be explained 
 below.
 
-There use to be two additional fields in this structure, inc_use et dec_use,
-for module usage count, but these fields were obsoleted and removed.
-
 
 Extra client data
 =================
@@ -58,6 +55,7 @@ be very useful.
 An example structure is below.
 
   struct foo_data {
+    struct i2c_client client;
     struct semaphore lock; /* For ISA access in `sensors' drivers. */
     int sysctl_id;         /* To keep the /proc directory entry for 
                               `sensors' drivers. */
@@ -275,6 +273,7 @@ For now, you can ignore the `flags' parameter. It is there for future use.
     if (is_isa) {
 
       /* Discard immediately if this ISA range is already used */
+      /* FIXME: never use check_region(), only request_region() */
       if (check_region(address,FOO_EXTENT))
         goto ERROR0;
 
@@ -310,22 +309,15 @@ For now, you can ignore the `flags' parameter. It is there for future use.
        client structure, even though we cannot fill it completely yet.
        But it allows us to access several i2c functions safely */
     
-    /* Note that we reserve some space for foo_data too. If you don't
-       need it, remove it. We do it here to help to lessen memory
-       fragmentation. */
-    if (! (new_client = kmalloc(sizeof(struct i2c_client) + 
-                                sizeof(struct foo_data),
-                                GFP_KERNEL))) {
+    if (!(data = kzalloc(sizeof(struct foo_data), GFP_KERNEL))) {
       err = -ENOMEM;
       goto ERROR0;
     }
 
-    /* This is tricky, but it will set the data to the right value. */
-    client->data = new_client + 1;
-    data = (struct foo_data *) (client->data);
+    new_client = &data->client;
+    i2c_set_clientdata(new_client, data);
 
     new_client->addr = address;
-    new_client->data = data;
     new_client->adapter = adapter;
     new_client->driver = &foo_driver;
     new_client->flags = 0;
@@ -451,7 +443,7 @@ much simpler than the attachment code, fortunately!
       release_region(client->addr,LM78_EXTENT);
     /* HYBRID SENSORS CHIP ONLY END */
 
-    kfree(client); /* Frees client data too, if allocated at the same time */
+    kfree(data);
     return 0;
   }
 
@@ -576,12 +568,12 @@ SMBus communication
   extern s32 i2c_smbus_write_block_data(struct i2c_client * client,
                                         u8 command, u8 length,
                                         u8 *values);
+  extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client,
+                                           u8 command, u8 *values);
 
 These ones were removed in Linux 2.6.10 because they had no users, but could
 be added back later if needed:
 
-  extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client,
-                                           u8 command, u8 *values);
   extern s32 i2c_smbus_read_block_data(struct i2c_client * client,
                                        u8 command, u8 *values);
   extern s32 i2c_smbus_write_i2c_block_data(struct i2c_client * client,
diff --git a/Documentation/input/yealink.txt b/Documentation/input/yealink.txt
index 85f095a7ad04..0962c5c948be 100644
--- a/Documentation/input/yealink.txt
+++ b/Documentation/input/yealink.txt
@@ -2,7 +2,6 @@ Driver documentation for yealink usb-p1k phones
 
 0. Status
 ~~~~~~~~~
-
 The p1k is a relatively cheap usb 1.1 phone with:
   - keyboard		full support, yealink.ko / input event API
   - LCD			full support, yealink.ko / sysfs API
@@ -17,9 +16,8 @@ For vendor documentation see http://www.yealink.com
 
 1. Compilation (stand alone version)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
 Currently only kernel 2.6.x.y versions are supported.
-In order to build the yealink.ko module do:
+In order to build the yealink.ko module do
 
   make
 
@@ -28,6 +26,21 @@ the Makefile is pointing to the location where your kernel sources
 are located, default /usr/src/linux.
 
 
+1.1 Troubleshooting
+~~~~~~~~~~~~~~~~~~~
+Q: Module yealink compiled and installed without any problem but phone
+   is not initialized and does not react to any actions.
+A: If you see something like:
+   hiddev0: USB HID v1.00 Device [Yealink Network Technology Ltd. VOIP USB Phone
+   in dmesg, it means that the hid driver has grabbed the device first. Try to
+   load module yealink before any other usb hid driver. Please see the
+   instructions provided by your distribution on module configuration.
+
+Q: Phone is working now (displays version and accepts keypad input) but I can't
+   find the sysfs files.
+A: The sysfs files are located on the particular usb endpoint. On most
+   distributions you can do: "find /sys/ -name get_icons" for a hint.
+
 
 2. keyboard features
 ~~~~~~~~~~~~~~~~~~~~
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 971589a9752d..5dffcfefc3c7 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1460,8 +1460,6 @@ running once the system is up.
 	stifb=		[HW]
 			Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
 
-	stram_swap=	[HW,M68k]
-
 	swiotlb=	[IA-64] Number of I/O TLB slabs
 
 	switches=	[HW,M68k]
@@ -1517,8 +1515,6 @@ running once the system is up.
 	uart6850=	[HW,OSS]
 			Format: <io>,<irq>
 
-	usb-handoff	[HW] Enable early USB BIOS -> OS handoff
-
 	usbhid.mousepoll=
 			[USBHID] The interval which mice are to be polled at.
 
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 4afe03a58c5b..31154882000a 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -196,7 +196,7 @@ KEY ACCESS PERMISSIONS
 
 Keys have an owner user ID, a group access ID, and a permissions mask. The mask
 has up to eight bits each for possessor, user, group and other access. Only
-five of each set of eight bits are defined. These permissions granted are:
+six of each set of eight bits are defined. These permissions granted are:
 
  (*) View
 
@@ -224,6 +224,10 @@ five of each set of eight bits are defined. These permissions granted are:
      keyring to a key, a process must have Write permission on the keyring and
      Link permission on the key.
 
+ (*) Set Attribute
+
+     This permits a key's UID, GID and permissions mask to be changed.
+
 For changing the ownership, group ID or permissions mask, being the owner of
 the key or having the sysadmin capability is sufficient.
 
@@ -242,15 +246,15 @@ about the status of the key service:
      this way:
 
 	SERIAL   FLAGS  USAGE EXPY PERM     UID   GID   TYPE      DESCRIPTION: SUMMARY
-	00000001 I-----    39 perm 1f1f0000     0     0 keyring   _uid_ses.0: 1/4
-	00000002 I-----     2 perm 1f1f0000     0     0 keyring   _uid.0: empty
-	00000007 I-----     1 perm 1f1f0000     0     0 keyring   _pid.1: empty
-	0000018d I-----     1 perm 1f1f0000     0     0 keyring   _pid.412: empty
-	000004d2 I--Q--     1 perm 1f1f0000    32    -1 keyring   _uid.32: 1/4
-	000004d3 I--Q--     3 perm 1f1f0000    32    -1 keyring   _uid_ses.32: empty
+	00000001 I-----    39 perm 1f3f0000     0     0 keyring   _uid_ses.0: 1/4
+	00000002 I-----     2 perm 1f3f0000     0     0 keyring   _uid.0: empty
+	00000007 I-----     1 perm 1f3f0000     0     0 keyring   _pid.1: empty
+	0000018d I-----     1 perm 1f3f0000     0     0 keyring   _pid.412: empty
+	000004d2 I--Q--     1 perm 1f3f0000    32    -1 keyring   _uid.32: 1/4
+	000004d3 I--Q--     3 perm 1f3f0000    32    -1 keyring   _uid_ses.32: empty
 	00000892 I--QU-     1 perm 1f000000     0     0 user      metal:copper: 0
-	00000893 I--Q-N     1  35s 1f1f0000     0     0 user      metal:silver: 0
-	00000894 I--Q--     1  10h 001f0000     0     0 user      metal:gold: 0
+	00000893 I--Q-N     1  35s 1f3f0000     0     0 user      metal:silver: 0
+	00000894 I--Q--     1  10h 003f0000     0     0 user      metal:gold: 0
 
      The flags are:
 
diff --git a/Documentation/m68k/kernel-options.txt b/Documentation/m68k/kernel-options.txt
index e191baad8308..d5d3f064f552 100644
--- a/Documentation/m68k/kernel-options.txt
+++ b/Documentation/m68k/kernel-options.txt
@@ -626,7 +626,7 @@ ignored (others aren't affected).
     can be performed in optimal order. Not all SCSI devices support
     tagged queuing (:-().
 
-4.6 switches=
+4.5 switches=
 -------------
 
 Syntax: switches=<list of switches>
@@ -661,28 +661,6 @@ correctly.
 earlier initialization ("ov_"-less) takes precedence. But the
 switching-off on reset still happens in this case.
 
-4.5) stram_swap=
-----------------
-
-Syntax: stram_swap=<do_swap>[,<max_swap>]
-
-  This option is available only if the kernel has been compiled with
-CONFIG_STRAM_SWAP enabled. Normally, the kernel then determines
-dynamically whether to actually use ST-RAM as swap space. (Currently,
-the fraction of ST-RAM must be less or equal 1/3 of total memory to
-enable this swapping.) You can override the kernel's decision by
-specifying this option. 1 for <do_swap> means always enable the swap,
-even if you have less alternate RAM. 0 stands for never swap to
-ST-RAM, even if it's small enough compared to the rest of memory.
-
-  If ST-RAM swapping is enabled, the kernel usually uses all free
-ST-RAM as swap "device". If the kernel resides in ST-RAM, the region
-allocated by it is obviously never used for swapping :-) You can also
-limit this amount by specifying the second parameter, <max_swap>, if
-you want to use parts of ST-RAM as normal system memory. <max_swap> is
-in kBytes and the number should be a multiple of 4 (otherwise: rounded
-down).
-
 5) Options for Amiga Only:
 ==========================
 
diff --git a/Documentation/mips/AU1xxx_IDE.README b/Documentation/mips/AU1xxx_IDE.README
new file mode 100644
index 000000000000..a7e4c4ea3560
--- /dev/null
+++ b/Documentation/mips/AU1xxx_IDE.README
@@ -0,0 +1,168 @@
+README for MIPS AU1XXX IDE driver - Released 2005-07-15
+
+ABOUT
+-----
+This file describes the 'drivers/ide/mips/au1xxx-ide.c', related files and the
+services they provide.
+
+If you are short in patience and just want to know how to add your hard disc to
+the white or black list, go to the 'ADD NEW HARD DISC TO WHITE OR BLACK LIST'
+section.
+
+
+LICENSE
+-------
+
+Copyright (c) 2003-2005 AMD, Personal Connectivity Solutions
+
+This program is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation; either version 2 of the License, or (at your option) any later
+version.
+
+THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
+INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
+FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR
+BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+
+You should have received a copy of the GNU General Public License along with
+this program; if not, write to the Free Software Foundation, Inc.,
+675 Mass Ave, Cambridge, MA 02139, USA.
+
+Note: for more information, please refer "AMD Alchemy Au1200/Au1550 IDE
+      Interface and Linux Device Driver" Application Note.
+
+
+FILES, CONFIGS AND COMPATABILITY
+--------------------------------
+
+Two files are introduced:
+
+  a) 'include/asm-mips/mach-au1x00/au1xxx_ide.h'
+     containes : struct _auide_hwif
+                 struct drive_list_entry dma_white_list
+                 struct drive_list_entry dma_black_list
+                 timing parameters for PIO mode 0/1/2/3/4
+                 timing parameters for MWDMA 0/1/2
+
+  b) 'drivers/ide/mips/au1xxx-ide.c'
+     contains the functionality of the AU1XXX IDE driver
+
+Four configs variables are introduced:
+
+  CONFIG_BLK_DEV_IDE_AU1XXX_PIO_DBDMA    - enable the PIO+DBDMA mode
+  CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA  - enable the MWDMA mode
+  CONFIG_BLK_DEV_IDE_AU1XXX_BURSTABLE_ON - set Burstable FIFO in DBDMA
+                                           controler
+  CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ - maximum transfer size
+                                           per descriptor
+
+If MWDMA is enabled and the connected hard disc is not on the white list, the
+kernel switches to a "safe mwdma mode" at boot time. In this mode the IDE
+performance is substantial slower then in full speed mwdma. In this case
+please add your hard disc to the white list (follow instruction from 'ADD NEW
+HARD DISC TO WHITE OR BLACK LIST' section).
+
+
+SUPPORTED IDE MODES
+-------------------
+
+The AU1XXX IDE driver supported all PIO modes - PIO mode 0/1/2/3/4 - and all
+MWDMA modes - MWDMA 0/1/2 -. There is no support for SWDMA and UDMA mode.
+
+To change the PIO mode use the program hdparm with option -p, e.g.
+'hdparm -p0 [device]' for PIO mode 0. To enable the MWDMA mode use the option
+-X, e.g. 'hdparm -X32 [device]' for MWDMA mode 0.
+
+
+PERFORMANCE CONFIGURATIONS
+--------------------------
+
+If the used system doesn't need USB support enable the following kernel configs:
+
+CONFIG_IDE=y
+CONFIG_BLK_DEV_IDE=y
+CONFIG_IDE_GENERIC=y
+CONFIG_BLK_DEV_IDEPCI=y
+CONFIG_BLK_DEV_GENERIC=y
+CONFIG_BLK_DEV_IDEDMA_PCI=y
+CONFIG_IDEDMA_PCI_AUTO=y
+CONFIG_BLK_DEV_IDE_AU1XXX=y
+CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA=y
+CONFIG_BLK_DEV_IDE_AU1XXX_BURSTABLE_ON=y
+CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128
+CONFIG_BLK_DEV_IDEDMA=y
+CONFIG_IDEDMA_AUTO=y
+
+If the used system need the USB support enable the following kernel configs for
+high IDE to USB throughput.
+
+CONFIG_BLK_DEV_IDEDISK=y
+CONFIG_IDE_GENERIC=y
+CONFIG_BLK_DEV_IDEPCI=y
+CONFIG_BLK_DEV_GENERIC=y
+CONFIG_BLK_DEV_IDEDMA_PCI=y
+CONFIG_IDEDMA_PCI_AUTO=y
+CONFIG_BLK_DEV_IDE_AU1XXX=y
+CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA=y
+CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128
+CONFIG_BLK_DEV_IDEDMA=y
+CONFIG_IDEDMA_AUTO=y
+
+
+ADD NEW HARD DISC TO WHITE OR BLACK LIST
+----------------------------------------
+
+Step 1 : detect the model name of your hard disc
+
+  a) connect your hard disc to the AU1XXX
+
+  b) boot your kernel and get the hard disc model.
+
+     Example boot log:
+
+     --snipped--
+     Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
+     ide: Assuming 50MHz system bus speed for PIO modes; override with idebus=xx
+     Au1xxx IDE(builtin) configured for MWDMA2
+     Probing IDE interface ide0...
+     hda: Maxtor 6E040L0, ATA DISK drive
+     ide0 at 0xac800000-0xac800007,0xac8001c0 on irq 64
+     hda: max request size: 64KiB
+     hda: 80293248 sectors (41110 MB) w/2048KiB Cache, CHS=65535/16/63, (U)DMA
+     --snipped--
+
+     In this example 'Maxtor 6E040L0'.
+
+Step  2 : edit 'include/asm-mips/mach-au1x00/au1xxx_ide.h'
+
+  Add your hard disc to the dma_white_list or dma_black_list structur.
+
+Step 3 : Recompile the kernel
+
+  Enable MWDMA support in the kernel configuration. Recompile the kernel and
+  reboot.
+
+Step 4 : Tests
+
+  If you have add a hard disc to the white list, please run some stress tests
+  for verification.
+
+
+ACKNOWLEDGMENTS
+---------------
+
+These drivers wouldn't have been done without the base of kernel 2.4.x AU1XXX
+IDE driver from AMD.
+
+Additional input also from:
+Matthias Lenk <matthias.lenk@amd.com>
+
+Happy hacking!
+Enrico Walther <enrico.walther@amd.com>
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index a55f0f95b171..b0fe41da007b 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -777,7 +777,7 @@ doing so is the same as described in the "Configuring Multiple Bonds
 Manually" section, below.
 
 	NOTE: It has been observed that some Red Hat supplied kernels
-are apparently unable to rename modules at load time (the "-obonding1"
+are apparently unable to rename modules at load time (the "-o bond1"
 part).  Attempts to pass that option to modprobe will produce an
 "Operation not permitted" error.  This has been reported on some
 Fedora Core kernels, and has been seen on RHEL 4 as well.  On kernels
@@ -883,7 +883,8 @@ the above does not work, and the second bonding instance never sees
 its options.  In that case, the second options line can be substituted
 as follows:
 
-install bonding1 /sbin/modprobe bonding -obond1 mode=balance-alb miimon=50
+install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
+	mode=balance-alb miimon=50
 
 	This may be repeated any number of times, specifying a new and
 unique name in place of bond1 for each subsequent instance.
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index b433c8a27e2d..65895bb51414 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -309,7 +309,7 @@ tcp_tso_win_divisor - INTEGER
        can be consumed by a single TSO frame.
        The setting of this parameter is a choice between burstiness and
        building larger TSO frames.
-       Default: 8
+       Default: 3
 
 tcp_frto - BOOLEAN
 	Enables F-RTO, an enhanced recovery algorithm for TCP retransmission