583 files changed, 46435 insertions, 19617 deletions
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 45932ec21cee..b41f3e58aefa 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -18,11 +18,11 @@ For an architecture to support this feature, it must define some of
 these macros in include/asm-XXX/topology.h:
 #define topology_physical_package_id(cpu)
 #define topology_core_id(cpu)
-#define topology_thread_siblings(cpu)
-#define topology_core_siblings(cpu)
+#define topology_thread_cpumask(cpu)
+#define topology_core_cpumask(cpu)
 
 The type of **_id is int.
-The type of siblings is cpumask_t.
+The type of siblings is (const) struct cpumask *.
 
 To be consistent on all architectures, include/linux/topology.h
 provides default definitions for any of the above macros that are
diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index 2be08240ee80..62254d4510c6 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -3145,6 +3145,12 @@ Your cooperation is appreciated.
 		  1 = /dev/blockrom1	Second ROM card's translation layer interface
 		  ...
 
+260 char	OSD (Object-based-device) SCSI Device
+		  0 = /dev/osd0		First OSD Device
+		  1 = /dev/osd1		Second OSD Device
+		  ...
+		  255 = /dev/osd255	256th OSD Device
+
  ****	ADDITIONAL /dev DIRECTORY ENTRIES
 
 This section details additional entries that should or may exist in
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fa4e1239a8fa..6b979d1d09ab 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1325,8 +1325,13 @@ and is between 256 and 4096 characters. It is defined in the file
 
 	memtest=	[KNL,X86] Enable memtest
 			Format: <integer>
-			range: 0,4 : pattern number
 			default : 0 <disable>
+			Specifies the number of memtest passes to be
+			performed. Each pass selects another test
+			pattern from a given set of patterns. Memtest
+			fills the memory with this pattern, validates
+			memory contents and reserves bad memory
+			regions that are detected.
 
 	meye.*=		[HW] Set MotionEye Camera parameters
 			See Documentation/video4linux/meye.txt.
diff --git a/Documentation/scsi/osd.txt b/Documentation/scsi/osd.txt
new file mode 100644
index 000000000000..da162f7fd5f5
--- /dev/null
+++ b/Documentation/scsi/osd.txt
@@ -0,0 +1,198 @@
+The OSD Standard
+================
+OSD (Object-Based Storage Device) is a T10 SCSI command set that is designed
+to provide efficient operation of input/output logical units that manage the
+allocation, placement, and accessing of variable-size data-storage containers,
+called objects. Objects are intended to contain operating system and application
+constructs. Each object has associated attributes attached to it, which are
+integral part of the object and provide metadata about the object. The standard
+defines some common obligatory attributes, but user attributes can be added as
+needed.
+
+See: http://www.t10.org/ftp/t10/drafts/osd2/ for the latest draft for OSD 2
+or search the web for "OSD SCSI"
+
+OSD in the Linux Kernel
+=======================
+osd-initiator:
+  The main component of OSD in Kernel is the osd-initiator library. Its main
+user is intended to be the pNFS-over-objects layout driver, which uses objects
+as its back-end data storage. Other clients are the other osd parts listed below.
+
+osd-uld:
+  This is a SCSI ULD that registers for OSD type devices and provides a testing
+platform, both for the in-kernel initiator as well as connected targets. It
+currently has no useful user-mode API, though it could have if need be.
+
+exofs:
+  Is an OSD based Linux file system. It uses the osd-initiator and osd-uld,
+to export a usable file system for users.
+See Documentation/filesystems/exofs.txt for more details
+
+osd target:
+  There are no current plans for an OSD target implementation in kernel. For all
+needs, a user-mode target that is based on the scsi tgt target framework is
+available from Ohio Supercomputer Center (OSC) at:
+http://www.open-osd.org/bin/view/Main/OscOsdProject
+There are several other target implementations. See http://open-osd.org for more
+links.
+
+Files and Folders
+=================
+This is the complete list of files included in this work:
+include/scsi/
+	osd_initiator.h   Main API for the initiator library
+	osd_types.h	  Common OSD types
+	osd_sec.h	  Security Manager API
+	osd_protocol.h	  Wire definitions of the OSD standard protocol
+	osd_attributes.h  Wire definitions of OSD attributes
+
+drivers/scsi/osd/
+	osd_initiator.c   OSD-Initiator library implementation
+	osd_uld.c	  The OSD scsi ULD
+	osd_ktest.{h,c}	  In-kernel test suite (called by osd_uld)
+	osd_debug.h	  Some printk macros
+	Makefile	  For both in-tree and out-of-tree compilation
+	Kconfig		  Enables inclusion of the different pieces
+	osd_test.c	  User-mode application to call the kernel tests
+
+The OSD-Initiator Library
+=========================
+osd_initiator is a low level implementation of an osd initiator encoder.
+But even though, it should be intuitive and easy to use. Perhaps over time an
+higher lever will form that automates some of the more common recipes.
+
+init/fini:
+- osd_dev_init() associates a scsi_device with an osd_dev structure
+  and initializes some global pools. This should be done once per scsi_device
+  (OSD LUN). The osd_dev structure is needed for calling osd_start_request().
+
+- osd_dev_fini() cleans up before a osd_dev/scsi_device destruction.
+
+OSD commands encoding, execution, and decoding of results:
+
+struct osd_request's is used to iteratively encode an OSD command and carry
+its state throughout execution. Each request goes through these stages:
+
+a. osd_start_request() allocates the request.
+
+b. Any of the osd_req_* methods is used to encode a request of the specified
+   type.
+
+c. osd_req_add_{get,set}_attr_* may be called to add get/set attributes to the
+   CDB. "List" or "Page" mode can be used exclusively. The attribute-list API
+   can be called multiple times on the same request. However, only one
+   attribute-page can be read, as mandated by the OSD standard.
+
+d. osd_finalize_request() computes offsets into the data-in and data-out buffers
+   and signs the request using the provided capability key and integrity-
+   check parameters.
+
+e. osd_execute_request() may be called to execute the request via the block
+   layer and wait for its completion.  The request can be executed
+   asynchronously by calling the block layer API directly.
+
+f. After execution, osd_req_decode_sense() can be called to decode the request's
+   sense information.
+
+g. osd_req_decode_get_attr() may be called to retrieve osd_add_get_attr_list()
+   values.
+
+h. osd_end_request() must be called to deallocate the request and any resource
+   associated with it. Note that osd_end_request cleans up the request at any
+   stage and it must always be called after a successful osd_start_request().
+
+osd_request's structure:
+
+The OSD standard defines a complex structure of IO segments pointed to by
+members in the CDB. Up to 3 segments can be deployed in the IN-Buffer and up to
+4 in the OUT-Buffer. The ASCII illustration below depicts a secure-read with
+associated get+set of attributes-lists. Other combinations very on the same
+basic theme. From no-segments-used up to all-segments-used.
+
+|________OSD-CDB__________|
+|                         |
+|read_len (offset=0)     -|---------\
+|                         |         |
+|get_attrs_list_length    |         |
+|get_attrs_list_offset   -|----\    |
+|                         |    |    |
+|retrieved_attrs_alloc_len|    |    |
+|retrieved_attrs_offset  -|----|----|-\
+|                         |    |    | |
+|set_attrs_list_length    |    |    | |
+|set_attrs_list_offset   -|-\  |    | |
+|                         | |  |    | |
+|in_data_integ_offset    -|-|--|----|-|-\
+|out_data_integ_offset   -|-|--|--\ | | |
+\_________________________/ |  |  | | | |
+                            |  |  | | | |
+|_______OUT-BUFFER________| |  |  | | | |
+|      Set attr list      |</  |  | | | |
+|                         |    |  | | | |
+|-------------------------|    |  | | | |
+|   Get attr descriptors  |<---/  | | | |
+|                         |       | | | |
+|-------------------------|       | | | |
+|    Out-data integrity   |<------/ | | |
+|                         |         | | |
+\_________________________/         | | |
+                                    | | |
+|________IN-BUFFER________|         | | |
+|      In-Data read       |<--------/ | |
+|                         |           | |
+|-------------------------|           | |
+|      Get attr list      |<----------/ |
+|                         |             |
+|-------------------------|             |
+|    In-data integrity    |<------------/
+|                         |
+\_________________________/
+
+A block device request can carry bidirectional payload by means of associating
+a bidi_read request with a main write-request. Each in/out request is described
+by a chain of BIOs associated with each request.
+The CDB is of a SCSI VARLEN CDB format, as described by OSD standard.
+The OSD standard also mandates alignment restrictions at start of each segment.
+
+In the code, in struct osd_request, there are two _osd_io_info structures to
+describe the IN/OUT buffers above, two BIOs for the data payload and up to five
+_osd_req_data_segment structures to hold the different segments allocation and
+information.
+
+Important: We have chosen to disregard the assumption that a BIO-chain (and
+the resulting sg-list) describes a linear memory buffer. Meaning only first and
+last scatter chain can be incomplete and all the middle chains are of PAGE_SIZE.
+For us, a scatter-gather-list, as its name implies and as used by the Networking
+layer, is to describe a vector of buffers that will be transferred to/from the
+wire. It works very well with current iSCSI transport. iSCSI is currently the
+only deployed OSD transport. In the future we anticipate SAS and FC attached OSD
+devices as well.
+
+The OSD Testing ULD
+===================
+TODO: More user-mode control on tests.
+
+Authors, Mailing list
+=====================
+Please communicate with us on any deployment of osd, whether using this code
+or not.
+
+Any problems, questions, bug reports, lonely OSD nights, please email:
+   OSD Dev List <osd-dev@open-osd.org>
+
+More up-to-date information can be found on:
+http://open-osd.org
+
+Boaz Harrosh <bharrosh@panasas.com>
+Benny Halevy <bhalevy@panasas.com>
+
+References
+==========
+Weber, R., "SCSI Object-Based Storage Device Commands",
+T10/1355-D ANSI/INCITS 400-2004,
+http://www.t10.org/ftp/t10/drafts/osd/osd-r10.pdf
+
+Weber, R., "SCSI Object-Based Storage Device Commands -2 (OSD-2)"
+T10/1729-D, Working Draft, rev. 3
+http://www.t10.org/ftp/t10/drafts/osd2/osd2r03.pdf
diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 7b4596ac4120..e0203662f9e9 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -158,7 +158,7 @@ Offset	Proto	Name		Meaning
 0202/4	2.00+	header		Magic signature "HdrS"
 0206/2	2.00+	version		Boot protocol version supported
 0208/4	2.00+	realmode_swtch	Boot loader hook (see below)
-020C/2	2.00+	start_sys	The load-low segment (0x1000) (obsolete)
+020C/2	2.00+	start_sys_seg	The load-low segment (0x1000) (obsolete)
 020E/2	2.00+	kernel_version	Pointer to kernel version string
 0210/1	2.00+	type_of_loader	Boot loader identifier
 0211/1	2.00+	loadflags	Boot protocol option flags
@@ -170,10 +170,11 @@ Offset	Proto	Name		Meaning
 0224/2	2.01+	heap_end_ptr	Free memory after setup end
 0226/2	N/A	pad1		Unused
 0228/4	2.02+	cmd_line_ptr	32-bit pointer to the kernel command line
-022C/4	2.03+	initrd_addr_max	Highest legal initrd address
+022C/4	2.03+	ramdisk_max	Highest legal initrd address
 0230/4	2.05+	kernel_alignment Physical addr alignment required for kernel
 0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
-0235/3	N/A	pad2		Unused
+0235/1	N/A	pad2		Unused
+0236/2	N/A	pad3		Unused
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
 023C/4	2.07+	hardware_subarch Hardware subarchitecture
 0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
@@ -299,14 +300,14 @@ Protocol:	2.00+
   e.g. 0x0204 for version 2.04, and 0x0a11 for a hypothetical version
   10.17.
 
-Field name:	readmode_swtch
+Field name:	realmode_swtch
 Type:		modify (optional)
 Offset/size:	0x208/4
 Protocol:	2.00+
 
   Boot loader hook (see ADVANCED BOOT LOADER HOOKS below.)
 
-Field name:	start_sys
+Field name:	start_sys_seg
 Type:		read
 Offset/size:	0x20c/2
 Protocol:	2.00+
@@ -468,7 +469,7 @@ Protocol:	2.02+
   zero, the kernel will assume that your boot loader does not support
   the 2.02+ protocol.
 
-Field name:	initrd_addr_max
+Field name:	ramdisk_max
 Type:		read
 Offset/size:	0x22c/4
 Protocol:	2.03+
@@ -542,7 +543,10 @@ Protocol:	2.08+
 
   The payload may be compressed. The format of both the compressed and
   uncompressed data should be determined using the standard magic
-  numbers. Currently only gzip compressed ELF is used.
+  numbers.  The currently supported compression formats are gzip
+  (magic numbers 1F 8B or 1F 9E), bzip2 (magic number 42 5A) and LZMA
+  (magic number 5D 00).  The uncompressed payload is currently always ELF
+  (magic number 7F 45 4C 46).
   
 Field name:	payload_length
 Type:		read
diff --git a/MAINTAINERS b/MAINTAINERS
index 64c89c215b01..d8a4c8d0a554 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3310,6 +3310,16 @@ L:	orinoco-devel@lists.sourceforge.net
 W:	http://www.nongnu.org/orinoco/
 S:	Maintained
 
+OSD LIBRARY
+P:	Boaz Harrosh
+M:	bharrosh@panasas.com
+P:	Benny Halevy
+M:	bhalevy@panasas.com
+L:	osd-dev@open-osd.org
+W:	http://open-osd.org
+T:	git://git.open-osd.org/open-osd.git
+S:	Maintained
+
 P54 WIRELESS DRIVER
 P:	Michael Wu
 M:	flamingice@sourmilk.net
diff --git a/Makefile b/Makefile
index 1ab3ebfc9091..c6307b6d069f 100644
--- a/Makefile
+++ b/Makefile
@@ -533,8 +533,9 @@ KBUILD_CFLAGS += $(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN})
 endif
 
 # Force gcc to behave correct even for buggy distributions
-# Arch Makefiles may override this setting
+ifndef CONFIG_CC_STACKPROTECTOR
 KBUILD_CFLAGS += $(call cc-option, -fno-stack-protector)
+endif
 
 ifdef CONFIG_FRAME_POINTER
 KBUILD_CFLAGS	+= -fno-omit-frame-pointer -fno-optimize-sibling-calls
diff --git a/arch/alpha/kernel/irq.c b/arch/alpha/kernel/irq.c
index d3812eb84015..cc7834661427 100644
--- a/arch/alpha/kernel/irq.c
+++ b/arch/alpha/kernel/irq.c
@@ -55,7 +55,7 @@ int irq_select_affinity(unsigned int irq)
 		cpu = (cpu < (NR_CPUS-1) ? cpu + 1 : 0);
 	last_cpu = cpu;
 
-	irq_desc[irq].affinity = cpumask_of_cpu(cpu);
+	cpumask_copy(irq_desc[irq].affinity, cpumask_of(cpu));
 	irq_desc[irq].chip->set_affinity(irq, cpumask_of(cpu));
 	return 0;
 }
diff --git a/arch/alpha/mm/init.c b/arch/alpha/mm/init.c
index 5d7a16eab312..af71d38c8e41 100644
--- a/arch/alpha/mm/init.c
+++ b/arch/alpha/mm/init.c
@@ -189,9 +189,21 @@ callback_init(void * kernel_end)
 
 	if (alpha_using_srm) {
 		static struct vm_struct console_remap_vm;
-		unsigned long vaddr = VMALLOC_START;
+		unsigned long nr_pages = 0;
+		unsigned long vaddr;
 		unsigned long i, j;
 
+		/* calculate needed size */
+		for (i = 0; i < crb->map_entries; ++i)
+			nr_pages += crb->map[i].count;
+
+		/* register the vm area */
+		console_remap_vm.flags = VM_ALLOC;
+		console_remap_vm.size = nr_pages << PAGE_SHIFT;
+		vm_area_register_early(&console_remap_vm, PAGE_SIZE);
+
+		vaddr = (unsigned long)console_remap_vm.addr;
+
 		/* Set up the third level PTEs and update the virtual
 		   addresses of the CRB entries.  */
 		for (i = 0; i < crb->map_entries; ++i) {
@@ -213,12 +225,6 @@ callback_init(void * kernel_end)
 				vaddr += PAGE_SIZE;
 			}
 		}
-
-		/* Let vmalloc know that we've allocated some space.  */
-		console_remap_vm.flags = VM_ALLOC;
-		console_remap_vm.addr = (void *) VMALLOC_START;
-		console_remap_vm.size = vaddr - VMALLOC_START;
-		vmlist = &console_remap_vm;
 	}
 
 	callback_init_done = 1;
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 7296f0416286..6874c7dca75a 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -104,6 +104,11 @@ static struct irq_desc bad_irq_desc = {
 	.lock = __SPIN_LOCK_UNLOCKED(bad_irq_desc.lock),
 };
 
+#ifdef CONFIG_CPUMASK_OFFSTACK
+/* We are not allocating bad_irq_desc.affinity or .pending_mask */
+#error "ARM architecture does not support CONFIG_CPUMASK_OFFSTACK."
+#endif
+
 /*
  * do_IRQ handles all hardware IRQ's.  Decoded IRQs should not
  * come via this function.  Instead, they should provide their
@@ -161,7 +166,7 @@ void __init init_IRQ(void)
 		irq_desc[irq].status |= IRQ_NOREQUEST | IRQ_NOPROBE;
 
 #ifdef CONFIG_SMP
-	bad_irq_desc.affinity = CPU_MASK_ALL;
+	cpumask_setall(bad_irq_desc.affinity);
 	bad_irq_desc.cpu = smp_processor_id();
 #endif
 	init_arch_irq();
@@ -191,15 +196,16 @@ void migrate_irqs(void)
 		struct irq_desc *desc = irq_desc + i;
 
 		if (desc->cpu == cpu) {
-			unsigned int newcpu = any_online_cpu(desc->affinity);
-
-			if (newcpu == NR_CPUS) {
+			unsigned int newcpu = cpumask_any_and(desc->affinity,
+							      cpu_online_mask);
+			if (newcpu >= nr_cpu_ids) {
 				if (printk_ratelimit())
 					printk(KERN_INFO "IRQ%u no longer affine to CPU%u\n",
 					       i, cpu);
 
-				cpus_setall(desc->affinity);
-				newcpu = any_online_cpu(desc->affinity);
+				cpumask_setall(desc->affinity);
+				newcpu = cpumask_any_and(desc->affinity,
+							 cpu_online_mask);
 			}
 
 			route_irq(desc, i, newcpu);
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 00216071eaf7..1602373e539c 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -64,7 +64,9 @@ SECTIONS
 		__initramfs_end = .;
 #endif
 		. = ALIGN(4096);
+		__per_cpu_load = .;
 		__per_cpu_start = .;
+			*(.data.percpu.page_aligned)
 			*(.data.percpu)
 			*(.data.percpu.shared_aligned)
 		__per_cpu_end = .;
diff --git a/arch/arm/oprofile/op_model_mpcore.c b/arch/arm/oprofile/op_model_mpcore.c
index 6d6bd5899240..853d42bb8682 100644
--- a/arch/arm/oprofile/op_model_mpcore.c
+++ b/arch/arm/oprofile/op_model_mpcore.c
@@ -263,7 +263,7 @@ static void em_route_irq(int irq, unsigned int cpu)
 	const struct cpumask *mask = cpumask_of(cpu);
 
 	spin_lock_irq(&desc->lock);
-	desc->affinity = *mask;
+	cpumask_copy(desc->affinity, mask);
 	desc->chip->set_affinity(irq, mask);
 	spin_unlock_irq(&desc->lock);
 }
diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig
index b189680d18b0..05fe3053dcae 100644
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -181,7 +181,7 @@ source "kernel/Kconfig.preempt"
 config QUICKLIST
 	def_bool y
 
-config HAVE_ARCH_BOOTMEM_NODE
+config HAVE_ARCH_BOOTMEM
 	def_bool n
 
 config ARCH_HAVE_MEMORY_PRESENT
diff --git a/arch/blackfin/include/asm/percpu.h b/arch/blackfin/include/asm/percpu.h
index 797c0c165069..c94c7bc88c71 100644
--- a/arch/blackfin/include/asm/percpu.h
+++ b/arch/blackfin/include/asm/percpu.h
@@ -3,14 +3,4 @@
 
 #include <asm-generic/percpu.h>
 
-#ifdef CONFIG_MODULES
-#define PERCPU_MODULE_RESERVE 8192
-#else
-#define PERCPU_MODULE_RESERVE 0
-#endif
-
-#define PERCPU_ENOUGH_ROOM \
-	(ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) + \
-	 PERCPU_MODULE_RESERVE)
-
 #endif	/* __ARCH_BLACKFIN_PERCPU__ */
diff --git a/arch/blackfin/kernel/irqchip.c b/arch/blackfin/kernel/irqchip.c
index bd052a67032e..401bd32aa499 100644
--- a/arch/blackfin/kernel/irqchip.c
+++ b/arch/blackfin/kernel/irqchip.c
@@ -70,6 +70,11 @@ static struct irq_desc bad_irq_desc = {
 #endif
 };
 
+#ifdef CONFIG_CPUMASK_OFFSTACK
+/* We are not allocating a variable-sized bad_irq_desc.affinity */
+#error "Blackfin architecture does not support CONFIG_CPUMASK_OFFSTACK."
+#endif
+
 int show_interrupts(struct seq_file *p, void *v)
 {
 	int i = *(loff_t *) v, j;
diff --git a/arch/ia64/include/asm/percpu.h b/arch/ia64/include/asm/percpu.h
index 77f30b664b4e..30cf46534dd2 100644
--- a/arch/ia64/include/asm/percpu.h
+++ b/arch/ia64/include/asm/percpu.h
@@ -27,12 +27,12 @@ extern void *per_cpu_init(void);
 
 #else /* ! SMP */
 
-#define PER_CPU_ATTRIBUTES	__attribute__((__section__(".data.percpu")))
-
 #define per_cpu_init()				(__phys_per_cpu_start)
 
 #endif	/* SMP */
 
+#define PER_CPU_BASE_SECTION ".data.percpu"
+
 /*
  * Be extremely careful when taking the address of this variable!  Due to virtual
  * remapping, it is different from the canonical address returned by __get_cpu_var(var)!
diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
index 32f3af1641c5..3193f4417e16 100644
--- a/arch/ia64/include/asm/topology.h
+++ b/arch/ia64/include/asm/topology.h
@@ -84,7 +84,7 @@ void build_cpu_to_node_map(void);
 	.child			= NULL,			\
 	.groups			= NULL,			\
 	.min_interval		= 8,			\
-	.max_interval		= 8*(min(num_online_cpus(), 32)), \
+	.max_interval		= 8*(min(num_online_cpus(), 32U)), \
 	.busy_factor		= 64,			\
 	.imbalance_pct		= 125,			\
 	.cache_nice_tries	= 2,			\
diff --git a/arch/ia64/include/asm/uv/uv.h b/arch/ia64/include/asm/uv/uv.h
new file mode 100644
index 000000000000..61b5bdfd980e
--- /dev/null
+++ b/arch/ia64/include/asm/uv/uv.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_IA64_UV_UV_H
+#define _ASM_IA64_UV_UV_H
+
+#include <asm/system.h>
+#include <asm/sn/simulator.h>
+
+static inline int is_uv_system(void)
+{
+	/* temporary support for running on hardware simulator */
+	return IS_MEDUSA() || ia64_platform_is("uv");
+}
+
+#endif	/* _ASM_IA64_UV_UV_H */
diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index d541671caf4a..bdef2ce38c8b 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -199,6 +199,10 @@ char *__init __acpi_map_table(unsigned long phys_addr, unsigned long size)
 	return __va(phys_addr);
 }
 
+void __init __acpi_unmap_table(char *map, unsigned long size)
+{
+}
+
 /* --------------------------------------------------------------------------
                             Boot-time Table Parsing
    -------------------------------------------------------------------------- */
diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c
index e13125058bed..166e0d839fa0 100644
--- a/arch/ia64/kernel/iosapic.c
+++ b/arch/ia64/kernel/iosapic.c
@@ -880,7 +880,7 @@ iosapic_unregister_intr (unsigned int gsi)
 	if (iosapic_intr_info[irq].count == 0) {
 #ifdef CONFIG_SMP
 		/* Clear affinity */
-		cpus_setall(idesc->affinity);
+		cpumask_setall(idesc->affinity);
 #endif
 		/* Clear the interrupt information */
 		iosapic_intr_info[irq].dest = 0;
diff --git a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c
index 4f596613bffd..7429752ef5ad 100644
--- a/arch/ia64/kernel/irq.c
+++ b/arch/ia64/kernel/irq.c
@@ -103,7 +103,7 @@ static char irq_redir [NR_IRQS]; // = { [0 ... NR_IRQS-1] = 1 };
 void set_irq_affinity_info (unsigned int irq, int hwid, int redir)
 {
 	if (irq < NR_IRQS) {
-		cpumask_copy(&irq_desc[irq].affinity,
+		cpumask_copy(irq_desc[irq].affinity,
 			     cpumask_of(cpu_logical_id(hwid)));
 		irq_redir[irq] = (char) (redir & 0xff);
 	}
@@ -148,7 +148,7 @@ static void migrate_irqs(void)
 		if (desc->status == IRQ_PER_CPU)
 			continue;
 
-		if (cpumask_any_and(&irq_desc[irq].affinity, cpu_online_mask)
+		if (cpumask_any_and(irq_desc[irq].affinity, cpu_online_mask)
 		    >= nr_cpu_ids) {
 			/*
 			 * Save it for phase 2 processing
diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c
index 28d3d483db92..acc4d19ae62a 100644
--- a/arch/ia64/kernel/irq_ia64.c
+++ b/arch/ia64/kernel/irq_ia64.c
@@ -493,14 +493,15 @@ ia64_handle_irq (ia64_vector vector, struct pt_regs *regs)
 	saved_tpr = ia64_getreg(_IA64_REG_CR_TPR);
 	ia64_srlz_d();
 	while (vector != IA64_SPURIOUS_INT_VECTOR) {
+		int irq = local_vector_to_irq(vector);
+		struct irq_desc *desc = irq_to_desc(irq);
+
 		if (unlikely(IS_LOCAL_TLB_FLUSH(vector))) {
 			smp_local_flush_tlb();
-			kstat_this_cpu.irqs[vector]++;
-		} else if (unlikely(IS_RESCHEDULE(vector)))
-			kstat_this_cpu.irqs[vector]++;
-		else {
-			int irq = local_vector_to_irq(vector);
-
+			kstat_incr_irqs_this_cpu(irq, desc);
+		} else if (unlikely(IS_RESCHEDULE(vector))) {
+			kstat_incr_irqs_this_cpu(irq, desc);
+		} else {
 			ia64_setreg(_IA64_REG_CR_TPR, vector);
 			ia64_srlz_d();
 
@@ -543,22 +544,24 @@ void ia64_process_pending_intr(void)
 
 	vector = ia64_get_ivr();
 
-	 irq_enter();
-	 saved_tpr = ia64_getreg(_IA64_REG_CR_TPR);
-	 ia64_srlz_d();
+	irq_enter();
+	saved_tpr = ia64_getreg(_IA64_REG_CR_TPR);
+	ia64_srlz_d();
 
 	 /*
 	  * Perform normal interrupt style processing
 	  */
 	while (vector != IA64_SPURIOUS_INT_VECTOR) {
+		int irq = local_vector_to_irq(vector);
+		struct irq_desc *desc = irq_to_desc(irq);
+
 		if (unlikely(IS_LOCAL_TLB_FLUSH(vector))) {
 			smp_local_flush_tlb();
-			kstat_this_cpu.irqs[vector]++;
-		} else if (unlikely(IS_RESCHEDULE(vector)))
-			kstat_this_cpu.irqs[vector]++;
-		else {
+			kstat_incr_irqs_this_cpu(irq, desc);
+		} else if (unlikely(IS_RESCHEDULE(vector))) {
+			kstat_incr_irqs_this_cpu(irq, desc);
+		} else {
 			struct pt_regs *old_regs = set_irq_regs(NULL);
-			int irq = local_vector_to_irq(vector);
 
 			ia64_setreg(_IA64_REG_CR_TPR, vector);
 			ia64_srlz_d();
diff --git a/arch/ia64/kernel/msi_ia64.c b/arch/ia64/kernel/msi_ia64.c
index 368ee4e5266d..2b15e233f7fe 100644
--- a/arch/ia64/kernel/msi_ia64.c
+++ b/arch/ia64/kernel/msi_ia64.c
@@ -38,7 +38,7 @@ static void ia64_set_msi_irq_affinity(unsigned int irq,
 	msg.data = data;
 
 	write_msi_msg(irq, &msg);
-	irq_desc[irq].affinity = cpumask_of_cpu(cpu);
+	cpumask_copy(irq_desc[irq].affinity, cpumask_of(cpu));
 }
 #endif /* CONFIG_SMP */
 
@@ -150,7 +150,7 @@ static void dmar_msi_set_affinity(unsigned int irq, const struct cpumask *mask)
 	msg.address_lo |= MSI_ADDR_DEST_ID_CPU(cpu_physical_id(cpu));
 
 	dmar_msi_write(irq, &msg);
-	irq_desc[irq].affinity = *mask;
+	cpumask_copy(irq_desc[irq].affinity, mask);
 }
 #endif /* CONFIG_SMP */
 
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index 10a7d47e8510..3765efc5f963 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -213,16 +213,9 @@ SECTIONS
         { *(.data.cacheline_aligned) }
 
   /* Per-cpu data: */
-  percpu : { } :percpu
   . = ALIGN(PERCPU_PAGE_SIZE);
-  __phys_per_cpu_start = .;
-  .data.percpu PERCPU_ADDR : AT(__phys_per_cpu_start - LOAD_OFFSET)
-	{
-		__per_cpu_start = .;
-		*(.data.percpu)
-		*(.data.percpu.shared_aligned)
-		__per_cpu_end = .;
-	}
+  PERCPU_VADDR(PERCPU_ADDR, :percpu)
+  __phys_per_cpu_start = __per_cpu_load;
   . = __phys_per_cpu_start + PERCPU_PAGE_SIZE;	/* ensure percpu data fits
   						 * into percpu page size
 						 */
diff --git a/arch/ia64/sn/kernel/msi_sn.c b/arch/ia64/sn/kernel/msi_sn.c
index ca553b0429ce..81e428943d73 100644
--- a/arch/ia64/sn/kernel/msi_sn.c
+++ b/arch/ia64/sn/kernel/msi_sn.c
@@ -205,7 +205,7 @@ static void sn_set_msi_irq_affinity(unsigned int irq,
 	msg.address_lo = (u32)(bus_addr & 0x00000000ffffffff);
 
 	write_msi_msg(irq, &msg);
-	irq_desc[irq].affinity = *cpu_mask;
+	cpumask_copy(irq_desc[irq].affinity, cpu_mask);
 }
 #endif /* CONFIG_SMP */
 
diff --git a/arch/mips/include/asm/irq.h b/arch/mips/include/asm/irq.h
index abc62aa744ac..3214ade02d10 100644
--- a/arch/mips/include/asm/irq.h
+++ b/arch/mips/include/asm/irq.h
@@ -66,7 +66,7 @@ extern void smtc_forward_irq(unsigned int irq);
  */
 #define IRQ_AFFINITY_HOOK(irq)						\
 do {									\
-    if (!cpu_isset(smp_processor_id(), irq_desc[irq].affinity)) {	\
+    if (!cpumask_test_cpu(smp_processor_id(), irq_desc[irq].affinity)) {\
 	smtc_forward_irq(irq);						\
 	irq_exit();							\
 	return;								\
diff --git a/arch/mips/kernel/irq-gic.c b/arch/mips/kernel/irq-gic.c
index 494a49a317e9..87deb8f6c458 100644
--- a/arch/mips/kernel/irq-gic.c
+++ b/arch/mips/kernel/irq-gic.c
@@ -187,7 +187,7 @@ static void gic_set_affinity(unsigned int irq, const struct cpumask *cpumask)
 		set_bit(irq, pcpu_masks[first_cpu(tmp)].pcpu_mask);
 
 	}
-	irq_desc[irq].affinity = *cpumask;
+	cpumask_copy(irq_desc[irq].affinity, cpumask);
 	spin_unlock_irqrestore(&gic_lock, flags);
 
 }
diff --git a/arch/mips/kernel/smtc.c b/arch/mips/kernel/smtc.c
index b6cca01ff82b..5f5af7d4c890 100644
--- a/arch/mips/kernel/smtc.c
+++ b/arch/mips/kernel/smtc.c
@@ -686,7 +686,7 @@ void smtc_forward_irq(unsigned int irq)
 	 * and efficiency, we just pick the easiest one to find.
 	 */
 
-	target = first_cpu(irq_desc[irq].affinity);
+	target = cpumask_first(irq_desc[irq].affinity);
 
 	/*
 	 * We depend on the platform code to have correctly processed
@@ -921,11 +921,13 @@ void ipi_decode(struct smtc_ipi *pipi)
 	struct clock_event_device *cd;
 	void *arg_copy = pipi->arg;
 	int type_copy = pipi->type;
+	int irq = MIPS_CPU_IRQ_BASE + 1;
+
 	smtc_ipi_nq(&freeIPIq, pipi);
 	switch (type_copy) {
 	case SMTC_CLOCK_TICK:
 		irq_enter();
-		kstat_this_cpu.irqs[MIPS_CPU_IRQ_BASE + 1]++;
+		kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 		cd = &per_cpu(mips_clockevent_device, cpu);
 		cd->event_handler(cd);
 		irq_exit();
diff --git a/arch/mips/mti-malta/malta-smtc.c b/arch/mips/mti-malta/malta-smtc.c
index aabd7274507b..5ba31888fefb 100644
--- a/arch/mips/mti-malta/malta-smtc.c
+++ b/arch/mips/mti-malta/malta-smtc.c
@@ -116,7 +116,7 @@ struct plat_smp_ops msmtc_smp_ops = {
 
 void plat_set_irq_affinity(unsigned int irq, const struct cpumask *affinity)
 {
-	cpumask_t tmask = *affinity;
+	cpumask_t tmask;
 	int cpu = 0;
 	void smtc_set_irq_affinity(unsigned int irq, cpumask_t aff);
 
@@ -139,11 +139,12 @@ void plat_set_irq_affinity(unsigned int irq, const struct cpumask *affinity)
 	 * be made to forward to an offline "CPU".
 	 */
 
+	cpumask_copy(&tmask, affinity);
 	for_each_cpu(cpu, affinity) {
 		if ((cpu_data[cpu].vpe_id != 0) || !cpu_online(cpu))
 			cpu_clear(cpu, tmask);
 	}
-	irq_desc[irq].affinity = tmask;
+	cpumask_copy(irq_desc[irq].affinity, &tmask);
 
 	if (cpus_empty(tmask))
 		/*
diff --git a/arch/mips/sgi-ip22/ip22-int.c b/arch/mips/sgi-ip22/ip22-int.c
index f8b18af141a1..0ecd5fe9486e 100644
--- a/arch/mips/sgi-ip22/ip22-int.c
+++ b/arch/mips/sgi-ip22/ip22-int.c
@@ -155,7 +155,7 @@ static void indy_buserror_irq(void)
 	int irq = SGI_BUSERR_IRQ;
 
 	irq_enter();
-	kstat_this_cpu.irqs[irq]++;
+	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 	ip22_be_interrupt(irq);
 	irq_exit();
 }
diff --git a/arch/mips/sgi-ip22/ip22-time.c b/arch/mips/sgi-ip22/ip22-time.c
index 3dcb27ec0c53..c8f7d2328b24 100644
--- a/arch/mips/sgi-ip22/ip22-time.c
+++ b/arch/mips/sgi-ip22/ip22-time.c
@@ -122,7 +122,7 @@ void indy_8254timer_irq(void)
 	char c;
 
 	irq_enter();
-	kstat_this_cpu.irqs[irq]++;
+	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 	printk(KERN_ALERT "Oops, got 8254 interrupt.\n");
 	ArcRead(0, &c, 1, &cnt);
 	ArcEnterInteractiveMode();
diff --git a/arch/mips/sibyte/bcm1480/smp.c b/arch/mips/sibyte/bcm1480/smp.c
index dddfda8e8294..314691648c97 100644
--- a/arch/mips/sibyte/bcm1480/smp.c
+++ b/arch/mips/sibyte/bcm1480/smp.c
@@ -178,9 +178,10 @@ struct plat_smp_ops bcm1480_smp_ops = {
 void bcm1480_mailbox_interrupt(void)
 {
 	int cpu = smp_processor_id();
+	int irq = K_BCM1480_INT_MBOX_0_0;
 	unsigned int action;
 
-	kstat_this_cpu.irqs[K_BCM1480_INT_MBOX_0_0]++;
+	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 	/* Load the mailbox register to figure out what we're supposed to do */
 	action = (__raw_readq(mailbox_0_regs[cpu]) >> 48) & 0xffff;
 
diff --git a/arch/mips/sibyte/sb1250/smp.c b/arch/mips/sibyte/sb1250/smp.c
index 5950a288a7da..cad14003b84f 100644
--- a/arch/mips/sibyte/sb1250/smp.c
+++ b/arch/mips/sibyte/sb1250/smp.c
@@ -166,9 +166,10 @@ struct plat_smp_ops sb_smp_ops = {
 void sb1250_mailbox_interrupt(void)
 {
 	int cpu = smp_processor_id();
+	int irq = K_INT_MBOX_0;
 	unsigned int action;
 
-	kstat_this_cpu.irqs[K_INT_MBOX_0]++;
+	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 	/* Load the mailbox register to figure out what we're supposed to do */
 	action = (____raw_readq(mailbox_regs[cpu]) >> 48) & 0xffff;
 
diff --git a/arch/mn10300/kernel/mn10300-watchdog.c b/arch/mn10300/kernel/mn10300-watchdog.c
index 10811e981d20..2e370d88a87a 100644
--- a/arch/mn10300/kernel/mn10300-watchdog.c
+++ b/arch/mn10300/kernel/mn10300-watchdog.c
@@ -130,6 +130,7 @@ void watchdog_interrupt(struct pt_regs *regs, enum exception_code excep)
 	 * the stack NMI-atomically, it's safe to use smp_processor_id().
 	 */
 	int sum, cpu = smp_processor_id();
+	int irq = NMIIRQ;
 	u8 wdt, tmp;
 
 	wdt = WDCTR & ~WDCTR_WDCNE;
@@ -138,7 +139,7 @@ void watchdog_interrupt(struct pt_regs *regs, enum exception_code excep)
 	NMICR = NMICR_WDIF;
 
 	nmi_count(cpu)++;
-	kstat_this_cpu.irqs[NMIIRQ]++;
+	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 	sum = irq_stat[cpu].__irq_count;
 
 	if (last_irq_sums[cpu] == sum) {
diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c
index adfd617b4c18..1c740f5cbd63 100644
--- a/arch/parisc/kernel/irq.c
+++ b/arch/parisc/kernel/irq.c
@@ -138,7 +138,7 @@ static void cpu_set_affinity_irq(unsigned int irq, const struct cpumask *dest)
 	if (cpu_dest < 0)
 		return;
 
-	cpumask_copy(&irq_desc[irq].affinity, &cpumask_of_cpu(cpu_dest));
+	cpumask_copy(&irq_desc[irq].affinity, dest);
 }
 #endif
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 17efb7118db1..1b55ffdf0026 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -231,7 +231,7 @@ void fixup_irqs(cpumask_t map)
 		if (irq_desc[irq].status & IRQ_PER_CPU)
 			continue;
 
-		cpus_and(mask, irq_desc[irq].affinity, map);
+		cpumask_and(&mask, irq_desc[irq].affinity, &map);
 		if (any_online_cpu(mask) == NR_CPUS) {
 			printk("Breaking affinity for irq %i\n", irq);
 			mask = map;
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 161b9b9691f0..67f07f453385 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -181,13 +181,7 @@ SECTIONS
 		__initramfs_end = .;
 	}
 #endif
-	. = ALIGN(PAGE_SIZE);
-	.data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
-		__per_cpu_start = .;
-		*(.data.percpu)
-		*(.data.percpu.shared_aligned)
-		__per_cpu_end = .;
-	}
+	PERCPU(PAGE_SIZE)
 
 	. = ALIGN(8);
 	.machine.desc : AT(ADDR(.machine.desc) - LOAD_OFFSET) {
diff --git a/arch/powerpc/platforms/pseries/xics.c b/arch/powerpc/platforms/pseries/xics.c
index 84e058f1e1cc..80b513449f4c 100644
--- a/arch/powerpc/platforms/pseries/xics.c
+++ b/arch/powerpc/platforms/pseries/xics.c
@@ -153,9 +153,10 @@ static int get_irq_server(unsigned int virq, unsigned int strict_check)
 {
 	int server;
 	/* For the moment only implement delivery to all cpus or one cpu */
-	cpumask_t cpumask = irq_desc[virq].affinity;
+	cpumask_t cpumask;
 	cpumask_t tmp = CPU_MASK_NONE;
 
+	cpumask_copy(&cpumask, irq_desc[virq].affinity);
 	if (!distribute_irqs)
 		return default_server;
 
@@ -869,7 +870,7 @@ void xics_migrate_irqs_away(void)
 		       virq, cpu);
 
 		/* Reset affinity to all cpus */
-		irq_desc[virq].affinity = CPU_MASK_ALL;
+		cpumask_setall(irq_desc[virq].affinity);
 		desc->chip->set_affinity(virq, cpu_all_mask);
 unlock:
 		spin_unlock_irqrestore(&desc->lock, flags);
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index a35297dbac28..532e205303a2 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -566,9 +566,10 @@ static void __init mpic_scan_ht_pics(struct mpic *mpic)
 #ifdef CONFIG_SMP
 static int irq_choose_cpu(unsigned int virt_irq)
 {
-	cpumask_t mask = irq_desc[virt_irq].affinity;
+	cpumask_t mask;
 	int cpuid;
 
+	cpumask_copy(&mask, irq_desc[virt_irq].affinity);
 	if (cpus_equal(mask, CPU_MASK_ALL)) {
 		static int irq_rover;
 		static DEFINE_SPINLOCK(irq_rover_lock);
diff --git a/arch/sparc/kernel/irq_64.c b/arch/sparc/kernel/irq_64.c
index 8ba064f08a6f..d0d6a515499a 100644
--- a/arch/sparc/kernel/irq_64.c
+++ b/arch/sparc/kernel/irq_64.c
@@ -252,9 +252,10 @@ struct irq_handler_data {
 #ifdef CONFIG_SMP
 static int irq_choose_cpu(unsigned int virt_irq)
 {
-	cpumask_t mask = irq_desc[virt_irq].affinity;
+	cpumask_t mask;
 	int cpuid;
 
+	cpumask_copy(&mask, irq_desc[virt_irq].affinity);
 	if (cpus_equal(mask, CPU_MASK_ALL)) {
 		static int irq_rover;
 		static DEFINE_SPINLOCK(irq_rover_lock);
@@ -805,7 +806,7 @@ void fixup_irqs(void)
 		    !(irq_desc[irq].status & IRQ_PER_CPU)) {
 			if (irq_desc[irq].chip->set_affinity)
 				irq_desc[irq].chip->set_affinity(irq,
-					&irq_desc[irq].affinity);
+					irq_desc[irq].affinity);
 		}
 		spin_unlock_irqrestore(&irq_desc[irq].lock, flags);
 	}
diff --git a/arch/sparc/kernel/time_64.c b/arch/sparc/kernel/time_64.c
index 4ee2e48c4b39..db310aa00183 100644
--- a/arch/sparc/kernel/time_64.c
+++ b/arch/sparc/kernel/time_64.c
@@ -36,10 +36,10 @@
 #include <linux/clocksource.h>
 #include <linux/of_device.h>
 #include <linux/platform_device.h>
-#include <linux/irq.h>
 
 #include <asm/oplib.h>
 #include <asm/timer.h>
+#include <asm/irq.h>
 #include <asm/io.h>
 #include <asm/prom.h>
 #include <asm/starfire.h>
@@ -724,14 +724,12 @@ void timer_interrupt(int irq, struct pt_regs *regs)
 	unsigned long tick_mask = tick_ops->softint_mask;
 	int cpu = smp_processor_id();
 	struct clock_event_device *evt = &per_cpu(sparc64_events, cpu);
-	struct irq_desc *desc;
 
 	clear_softint(tick_mask);
 
 	irq_enter();
 
-	desc = irq_to_desc(0);
-	kstat_incr_irqs_this_cpu(0, desc);
+	kstat_incr_irqs_this_cpu(0, irq_to_desc(0));
 
 	if (unlikely(!evt->event_handler)) {
 		printk(KERN_WARNING
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3a330a437c6f..06c02c00d7d9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -5,7 +5,7 @@ mainmenu "Linux Kernel Configuration for x86"
 config 64BIT
 	bool "64-bit kernel" if ARCH = "x86"
 	default ARCH = "x86_64"
-	help
+	---help---
 	  Say yes to build a 64-bit kernel - formerly known as x86_64
 	  Say no to build a 32-bit kernel - formerly known as i386
 
@@ -34,12 +34,15 @@ config X86
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
-	select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
-	select HAVE_ARCH_KGDB if !X86_VOYAGER
+	select HAVE_KVM
+	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_GENERIC_DMA_COHERENT if X86_32
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select USER_STACKTRACE_SUPPORT
+	select HAVE_KERNEL_GZIP
+	select HAVE_KERNEL_BZIP2
+	select HAVE_KERNEL_LZMA
 
 config ARCH_DEFCONFIG
 	string
@@ -133,18 +136,19 @@ config ARCH_HAS_CACHE_LINE_SIZE
 	def_bool y
 
 config HAVE_SETUP_PER_CPU_AREA
-	def_bool X86_64_SMP || (X86_SMP && !X86_VOYAGER)
+	def_bool y
+
+config HAVE_DYNAMIC_PER_CPU_AREA
+	def_bool y
 
 config HAVE_CPUMASK_OF_CPU_MAP
 	def_bool X86_64_SMP
 
 config ARCH_HIBERNATION_POSSIBLE
 	def_bool y
-	depends on !SMP || !X86_VOYAGER
 
 config ARCH_SUSPEND_POSSIBLE
 	def_bool y
-	depends on !X86_VOYAGER
 
 config ZONE_DMA32
 	bool
@@ -177,11 +181,6 @@ config GENERIC_PENDING_IRQ
 	depends on GENERIC_HARDIRQS && SMP
 	default y
 
-config X86_SMP
-	bool
-	depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64)
-	default y
-
 config USE_GENERIC_SMP_HELPERS
 	def_bool y
 	depends on SMP
@@ -197,19 +196,17 @@ config X86_64_SMP
 config X86_HT
 	bool
 	depends on SMP
-	depends on (X86_32 && !X86_VOYAGER) || X86_64
-	default y
-
-config X86_BIOS_REBOOT
-	bool
-	depends on !X86_VOYAGER
 	default y
 
 config X86_TRAMPOLINE
 	bool
-	depends on X86_SMP || (X86_VOYAGER && SMP) || (64BIT && ACPI_SLEEP)
+	depends on SMP || (64BIT && ACPI_SLEEP)
 	default y
 
+config X86_32_LAZY_GS
+	def_bool y
+	depends on X86_32 && !CC_STACKPROTECTOR
+
 config KTIME_SCALAR
 	def_bool X86_32
 source "init/Kconfig"
@@ -247,14 +244,24 @@ config SMP
 
 	  If you don't know what to do here, say N.
 
-config X86_HAS_BOOT_CPU_ID
-	def_bool y
-	depends on X86_VOYAGER
+config X86_X2APIC
+	bool "Support x2apic"
+	depends on X86_LOCAL_APIC && X86_64
+	---help---
+	  This enables x2apic support on CPUs that have this feature.
+
+	  This allows 32-bit apic IDs (so it can support very large systems),
+	  and accesses the local apic via MSRs not via mmio.
+
+	  ( On certain CPU models you may need to enable INTR_REMAP too,
+	    to get functional x2apic mode. )
+
+	  If you don't know what to do here, say N.
 
 config SPARSE_IRQ
 	bool "Support sparse irq numbering"
 	depends on PCI_MSI || HT_IRQ
-	help
+	---help---
 	  This enables support for sparse irqs. This is useful for distro
 	  kernels that want to define a high CONFIG_NR_CPUS value but still
 	  want to have low kernel memory footprint on smaller machines.
@@ -268,114 +275,140 @@ config NUMA_MIGRATE_IRQ_DESC
 	bool "Move irq desc when changing irq smp_affinity"
 	depends on SPARSE_IRQ && NUMA
 	default n
-	help
+	---help---
 	  This enables moving irq_desc to cpu/node that irq will use handled.
 
 	  If you don't know what to do here, say N.
 
-config X86_FIND_SMP_CONFIG
-	def_bool y
-	depends on X86_MPPARSE || X86_VOYAGER
-
 config X86_MPPARSE
 	bool "Enable MPS table" if ACPI
 	default y
 	depends on X86_LOCAL_APIC
-	help
+	---help---
 	  For old smp systems that do not have proper acpi support. Newer systems
 	  (esp with 64bit cpus) with acpi support, MADT and DSDT will override it
 
-choice
-	prompt "Subarchitecture Type"
-	default X86_PC
+config X86_BIGSMP
+	bool "Support for big SMP systems with more than 8 CPUs"
+	depends on X86_32 && SMP
+	---help---
+	  This option is needed for the systems that have more than 8 CPUs
 
-config X86_PC
-	bool "PC-compatible"
-	help
-	  Choose this option if your computer is a standard PC or compatible.
+if X86_32
+config X86_EXTENDED_PLATFORM
+	bool "Support for extended (non-PC) x86 platforms"
+	default y
+	---help---
+	  If you disable this option then the kernel will only support
+	  standard PC platforms. (which covers the vast majority of
+	  systems out there.)
+
+	  If you enable this option then you'll be able to select support
+	  for the following (non-PC) 32 bit x86 platforms:
+		AMD Elan
+		NUMAQ (IBM/Sequent)
+		RDC R-321x SoC
+		SGI 320/540 (Visual Workstation)
+		Summit/EXA (IBM x440)
+		Unisys ES7000 IA32 series
+
+	  If you have one of these systems, or if you want to build a
+	  generic distribution kernel, say Y here - otherwise say N.
+endif
+
+if X86_64
+config X86_EXTENDED_PLATFORM
+	bool "Support for extended (non-PC) x86 platforms"
+	default y
+	---help---
+	  If you disable this option then the kernel will only support
+	  standard PC platforms. (which covers the vast majority of
+	  systems out there.)
+
+	  If you enable this option then you'll be able to select support
+	  for the following (non-PC) 64 bit x86 platforms:
+		ScaleMP vSMP
+		SGI Ultraviolet
+
+	  If you have one of these systems, or if you want to build a
+	  generic distribution kernel, say Y here - otherwise say N.
+endif
+# This is an alphabetically sorted list of 64 bit extended platforms
+# Please maintain the alphabetic order if and when there are additions
+
+config X86_VSMP
+	bool "ScaleMP vSMP"
+	select PARAVIRT
+	depends on X86_64 && PCI
+	depends on X86_EXTENDED_PLATFORM
+	---help---
+	  Support for ScaleMP vSMP systems.  Say 'Y' here if this kernel is
+	  supposed to run on these EM64T-based machines.  Only choose this option
+	  if you have one of these machines.
+
+config X86_UV
+	bool "SGI Ultraviolet"
+	depends on X86_64
+	depends on X86_EXTENDED_PLATFORM
+	select X86_X2APIC
+	---help---
+	  This option is needed in order to support SGI Ultraviolet systems.
+	  If you don't have one of these, you should say N here.
+
+# Following is an alphabetically sorted list of 32 bit extended platforms
+# Please maintain the alphabetic order if and when there are additions
 
 config X86_ELAN
 	bool "AMD Elan"
 	depends on X86_32
-	help
+	depends on X86_EXTENDED_PLATFORM
+	---help---
 	  Select this for an AMD Elan processor.
 
 	  Do not use this option for K6/Athlon/Opteron processors!
 
 	  If unsure, choose "PC-compatible" instead.
 
-config X86_VOYAGER
-	bool "Voyager (NCR)"
-	depends on X86_32 && (SMP || BROKEN) && !PCI
-	help
-	  Voyager is an MCA-based 32-way capable SMP architecture proprietary
-	  to NCR Corp.  Machine classes 345x/35xx/4100/51xx are Voyager-based.
-
-	  *** WARNING ***
-
-	  If you do not specifically know you have a Voyager based machine,
-	  say N here, otherwise the kernel you build will not be bootable.
-
-config X86_GENERICARCH
-       bool "Generic architecture"
+config X86_RDC321X
+	bool "RDC R-321x SoC"
 	depends on X86_32
-       help
-          This option compiles in the NUMAQ, Summit, bigsmp, ES7000, default
+	depends on X86_EXTENDED_PLATFORM
+	select M486
+	select X86_REBOOTFIXUPS
+	---help---
+	  This option is needed for RDC R-321x system-on-chip, also known
+	  as R-8610-(G).
+	  If you don't have one of these chips, you should say N here.
+
+config X86_32_NON_STANDARD
+	bool "Support non-standard 32-bit SMP architectures"
+	depends on X86_32 && SMP
+	depends on X86_EXTENDED_PLATFORM
+	---help---
+	  This option compiles in the NUMAQ, Summit, bigsmp, ES7000, default
 	  subarchitectures.  It is intended for a generic binary kernel.
 	  if you select them all, kernel will probe it one by one. and will
 	  fallback to default.
 
-if X86_GENERICARCH
+# Alphabetically sorted list of Non standard 32 bit platforms
 
 config X86_NUMAQ
 	bool "NUMAQ (IBM/Sequent)"
-	depends on SMP && X86_32 && PCI && X86_MPPARSE
+	depends on X86_32_NON_STANDARD
 	select NUMA
-	help
+	select X86_MPPARSE
+	---help---
 	  This option is used for getting Linux to run on a NUMAQ (IBM/Sequent)
 	  NUMA multiquad box. This changes the way that processors are
 	  bootstrapped, and uses Clustered Logical APIC addressing mode instead
 	  of Flat Logical.  You will need a new lynxer.elf file to flash your
 	  firmware with - send email to <Martin.Bligh@us.ibm.com>.
 
-config X86_SUMMIT
-	bool "Summit/EXA (IBM x440)"
-	depends on X86_32 && SMP
-	help
-	  This option is needed for IBM systems that use the Summit/EXA chipset.
-	  In particular, it is needed for the x440.
-
-config X86_ES7000
-	bool "Support for Unisys ES7000 IA32 series"
-	depends on X86_32 && SMP
-	help
-	  Support for Unisys ES7000 systems.  Say 'Y' here if this kernel is
-	  supposed to run on an IA32-based Unisys ES7000 system.
-
-config X86_BIGSMP
-	bool "Support for big SMP systems with more than 8 CPUs"
-	depends on X86_32 && SMP
-	help
-	  This option is needed for the systems that have more than 8 CPUs
-	  and if the system is not of any sub-arch type above.
-
-endif
-
-config X86_VSMP
-	bool "Support for ScaleMP vSMP"
-	select PARAVIRT
-	depends on X86_64 && PCI
-	help
-	  Support for ScaleMP vSMP systems.  Say 'Y' here if this kernel is
-	  supposed to run on these EM64T-based machines.  Only choose this option
-	  if you have one of these machines.
-
-endchoice
-
 config X86_VISWS
 	bool "SGI 320/540 (Visual Workstation)"
-	depends on X86_32 && PCI && !X86_VOYAGER && X86_MPPARSE && PCI_GODIRECT
-	help
+	depends on X86_32 && PCI && X86_MPPARSE && PCI_GODIRECT
+	depends on X86_32_NON_STANDARD
+	---help---
 	  The SGI Visual Workstation series is an IA32-based workstation
 	  based on SGI systems chips with some legacy PC hardware attached.
 
@@ -384,21 +417,25 @@ config X86_VISWS
 	  A kernel compiled for the Visual Workstation will run on general
 	  PCs as well. See <file:Documentation/sgi-visws.txt> for details.
 
-config X86_RDC321X
-	bool "RDC R-321x SoC"
-	depends on X86_32
-	select M486
-	select X86_REBOOTFIXUPS
-	help
-	  This option is needed for RDC R-321x system-on-chip, also known
-	  as R-8610-(G).
-	  If you don't have one of these chips, you should say N here.
+config X86_SUMMIT
+	bool "Summit/EXA (IBM x440)"
+	depends on X86_32_NON_STANDARD
+	---help---
+	  This option is needed for IBM systems that use the Summit/EXA chipset.
+	  In particular, it is needed for the x440.
+
+config X86_ES7000
+	bool "Unisys ES7000 IA32 series"
+	depends on X86_32_NON_STANDARD && X86_BIGSMP
+	---help---
+	  Support for Unisys ES7000 systems.  Say 'Y' here if this kernel is
+	  supposed to run on an IA32-based Unisys ES7000 system.
 
 config SCHED_OMIT_FRAME_POINTER
 	def_bool y
 	prompt "Single-depth WCHAN output"
 	depends on X86
-	help
+	---help---
 	  Calculate simpler /proc/<PID>/wchan values. If this option
 	  is disabled then wchan values will recurse back to the
 	  caller function. This provides more accurate wchan values,
@@ -408,7 +445,7 @@ config SCHED_OMIT_FRAME_POINTER
 
 menuconfig PARAVIRT_GUEST
 	bool "Paravirtualized guest support"
-	help
+	---help---
 	  Say Y here to get to see options related to running Linux under
 	  various hypervisors.  This option alone does not add any kernel code.
 
@@ -422,8 +459,7 @@ config VMI
 	bool "VMI Guest support"
 	select PARAVIRT
 	depends on X86_32
-	depends on !X86_VOYAGER
-	help
+	---help---
 	  VMI provides a paravirtualized interface to the VMware ESX server
 	  (it could be used by other hypervisors in theory too, but is not
 	  at the moment), by linking the kernel to a GPL-ed ROM module
@@ -433,8 +469,7 @@ config KVM_CLOCK
 	bool "KVM paravirtualized clock"
 	select PARAVIRT
 	select PARAVIRT_CLOCK
-	depends on !X86_VOYAGER
-	help
+	---help---
 	  Turning on this option will allow you to run a paravirtualized clock
 	  when running over the KVM hypervisor. Instead of relying on a PIT
 	  (or probably other) emulation by the underlying device model, the host
@@ -444,17 +479,15 @@ config KVM_CLOCK
 config KVM_GUEST
 	bool "KVM Guest support"
 	select PARAVIRT
-	depends on !X86_VOYAGER
-	help
-	 This option enables various optimizations for running under the KVM
-	 hypervisor.
+	---help---
+	  This option enables various optimizations for running under the KVM
+	  hypervisor.
 
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT
 	bool "Enable paravirtualization code"
-	depends on !X86_VOYAGER
-	help
+	---help---
 	  This changes the kernel so it can modify itself when it is run
 	  under a hypervisor, potentially improving performance significantly
 	  over full virtualization.  However, when run without a hypervisor
@@ -467,51 +500,51 @@ config PARAVIRT_CLOCK
 endif
 
 config PARAVIRT_DEBUG
-       bool "paravirt-ops debugging"
-       depends on PARAVIRT && DEBUG_KERNEL
-       help
-         Enable to debug paravirt_ops internals.  Specifically, BUG if
-	 a paravirt_op is missing when it is called.
+	bool "paravirt-ops debugging"
+	depends on PARAVIRT && DEBUG_KERNEL
+	---help---
+	  Enable to debug paravirt_ops internals.  Specifically, BUG if
+	  a paravirt_op is missing when it is called.
 
 config MEMTEST
 	bool "Memtest"
-	help
+	---help---
 	  This option adds a kernel parameter 'memtest', which allows memtest
 	  to be set.
-		memtest=0, mean disabled; -- default
-		memtest=1, mean do 1 test pattern;
-		...
-		memtest=4, mean do 4 test patterns.
+	        memtest=0, mean disabled; -- default
+	        memtest=1, mean do 1 test pattern;
+	        ...
+	        memtest=4, mean do 4 test patterns.
 	  If you are unsure how to answer this question, answer N.
 
 config X86_SUMMIT_NUMA
 	def_bool y
-	depends on X86_32 && NUMA && X86_GENERICARCH
+	depends on X86_32 && NUMA && X86_32_NON_STANDARD
 
 config X86_CYCLONE_TIMER
 	def_bool y
-	depends on X86_GENERICARCH
+	depends on X86_32_NON_STANDARD
 
 source "arch/x86/Kconfig.cpu"
 
 config HPET_TIMER
 	def_bool X86_64
 	prompt "HPET Timer Support" if X86_32
-	help
-         Use the IA-PC HPET (High Precision Event Timer) to manage
-         time in preference to the PIT and RTC, if a HPET is
-         present.
-         HPET is the next generation timer replacing legacy 8254s.
-         The HPET provides a stable time base on SMP
-         systems, unlike the TSC, but it is more expensive to access,
-         as it is off-chip.  You can find the HPET spec at
-         <http://www.intel.com/hardwaredesign/hpetspec_1.pdf>.
+	---help---
+	  Use the IA-PC HPET (High Precision Event Timer) to manage
+	  time in preference to the PIT and RTC, if a HPET is
+	  present.
+	  HPET is the next generation timer replacing legacy 8254s.
+	  The HPET provides a stable time base on SMP
+	  systems, unlike the TSC, but it is more expensive to access,
+	  as it is off-chip.  You can find the HPET spec at
+	  <http://www.intel.com/hardwaredesign/hpetspec_1.pdf>.
 
-         You can safely choose Y here.  However, HPET will only be
-         activated if the platform and the BIOS support this feature.
-         Otherwise the 8254 will be used for timing services.
+	  You can safely choose Y here.  However, HPET will only be
+	  activated if the platform and the BIOS support this feature.
+	  Otherwise the 8254 will be used for timing services.
 
-         Choose N to continue using the legacy 8254 timer.
+	  Choose N to continue using the legacy 8254 timer.
 
 config HPET_EMULATE_RTC
 	def_bool y
@@ -522,7 +555,7 @@ config HPET_EMULATE_RTC
 config DMI
 	default y
 	bool "Enable DMI scanning" if EMBEDDED
-	help
+	---help---
 	  Enabled scanning of DMI to identify machine quirks. Say Y
 	  here unless you have verified that your setup is not
 	  affected by entries in the DMI blacklist. Required by PNP
@@ -534,7 +567,7 @@ config GART_IOMMU
 	select SWIOTLB
 	select AGP
 	depends on X86_64 && PCI
-	help
+	---help---
 	  Support for full DMA access of devices with 32bit memory access only
 	  on systems with more than 3GB. This is usually needed for USB,
 	  sound, many IDE/SATA chipsets and some other devices.
@@ -549,7 +582,7 @@ config CALGARY_IOMMU
 	bool "IBM Calgary IOMMU support"
 	select SWIOTLB
 	depends on X86_64 && PCI && EXPERIMENTAL
-	help
+	---help---
 	  Support for hardware IOMMUs in IBM's xSeries x366 and x460
 	  systems. Needed to run systems with more than 3GB of memory
 	  properly with 32-bit PCI devices that do not support DAC
@@ -567,7 +600,7 @@ config CALGARY_IOMMU_ENABLED_BY_DEFAULT
 	def_bool y
 	prompt "Should Calgary be enabled by default?"
 	depends on CALGARY_IOMMU
-	help
+	---help---
 	  Should Calgary be enabled by default? if you choose 'y', Calgary
 	  will be used (if it exists). If you choose 'n', Calgary will not be
 	  used even if it exists. If you choose 'n' and would like to use
@@ -579,7 +612,7 @@ config AMD_IOMMU
 	select SWIOTLB
 	select PCI_MSI
 	depends on X86_64 && PCI && ACPI
-	help
+	---help---
 	  With this option you can enable support for AMD IOMMU hardware in
 	  your system. An IOMMU is a hardware component which provides
 	  remapping of DMA memory accesses from devices. With an AMD IOMMU you
@@ -594,7 +627,7 @@ config AMD_IOMMU_STATS
 	bool "Export AMD IOMMU statistics to debugfs"
 	depends on AMD_IOMMU
 	select DEBUG_FS
-	help
+	---help---
 	  This option enables code in the AMD IOMMU driver to collect various
 	  statistics about whats happening in the driver and exports that
 	  information to userspace via debugfs.
@@ -603,7 +636,7 @@ config AMD_IOMMU_STATS
 # need this always selected by IOMMU for the VIA workaround
 config SWIOTLB
 	def_bool y if X86_64
-	help
+	---help---
 	  Support for software bounce buffers used on x86-64 systems
 	  which don't have a hardware IOMMU (e.g. the current generation
 	  of Intel's x86-64 CPUs). Using this PCI devices which can only
@@ -621,7 +654,7 @@ config MAXSMP
 	depends on X86_64 && SMP && DEBUG_KERNEL && EXPERIMENTAL
 	select CPUMASK_OFFSTACK
 	default n
-	help
+	---help---
 	  Configure maximum number of CPUS and NUMA Nodes for this architecture.
 	  If unsure, say N.
 
@@ -632,7 +665,7 @@ config NR_CPUS
 	default "4096" if MAXSMP
 	default "32" if SMP && (X86_NUMAQ || X86_SUMMIT || X86_BIGSMP || X86_ES7000)
 	default "8" if SMP
-	help
+	---help---
 	  This allows you to specify the maximum number of CPUs which this
 	  kernel will support.  The maximum supported value is 512 and the
 	  minimum value which makes sense is 2.
@@ -643,7 +676,7 @@ config NR_CPUS
 config SCHED_SMT
 	bool "SMT (Hyperthreading) scheduler support"
 	depends on X86_HT
-	help
+	---help---
 	  SMT scheduler support improves the CPU scheduler's decision making
 	  when dealing with Intel Pentium 4 chips with HyperThreading at a
 	  cost of slightly increased overhead in some places. If unsure say
@@ -653,7 +686,7 @@ config SCHED_MC
 	def_bool y
 	prompt "Multi-core scheduler support"
 	depends on X86_HT
-	help
+	---help---
 	  Multi-core scheduler support improves the CPU scheduler's decision
 	  making when dealing with multi-core CPU chips at a cost of slightly
 	  increased overhead in some places. If unsure say N here.
@@ -662,8 +695,8 @@ source "kernel/Kconfig.preempt"
 
 config X86_UP_APIC
 	bool "Local APIC support on uniprocessors"
-	depends on X86_32 && !SMP && !(X86_VOYAGER || X86_GENERICARCH)
-	help
+	depends on X86_32 && !SMP && !X86_32_NON_STANDARD
+	---help---
 	  A local APIC (Advanced Programmable Interrupt Controller) is an
 	  integrated interrupt controller in the CPU. If you have a single-CPU
 	  system which has a processor with a local APIC, you can say Y here to
@@ -676,7 +709,7 @@ config X86_UP_APIC
 config X86_UP_IOAPIC
 	bool "IO-APIC support on uniprocessors"
 	depends on X86_UP_APIC
-	help
+	---help---
 	  An IO-APIC (I/O Advanced Programmable Interrupt Controller) is an
 	  SMP-capable replacement for PC-style interrupt controllers. Most
 	  SMP systems and many recent uniprocessor systems have one.
@@ -687,11 +720,11 @@ config X86_UP_IOAPIC
 
 config X86_LOCAL_APIC
 	def_bool y
-	depends on X86_64 || (X86_32 && (X86_UP_APIC || (SMP && !X86_VOYAGER) || X86_GENERICARCH))
+	depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC
 
 config X86_IO_APIC
 	def_bool y
-	depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP && !X86_VOYAGER) || X86_GENERICARCH))
+	depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC
 
 config X86_VISWS_APIC
 	def_bool y
@@ -701,7 +734,7 @@ config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
 	bool "Reroute for broken boot IRQs"
 	default n
 	depends on X86_IO_APIC
-	help
+	---help---
 	  This option enables a workaround that fixes a source of
 	  spurious interrupts. This is recommended when threaded
 	  interrupt handling is used on systems where the generation of
@@ -723,7 +756,6 @@ config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
 
 config X86_MCE
 	bool "Machine Check Exception"
-	depends on !X86_VOYAGER
 	---help---
 	  Machine Check Exception support allows the processor to notify the
 	  kernel if it detects a problem (e.g. overheating, component failure).
@@ -742,7 +774,7 @@ config X86_MCE_INTEL
 	def_bool y
 	prompt "Intel MCE features"
 	depends on X86_64 && X86_MCE && X86_LOCAL_APIC
-	help
+	---help---
 	   Additional support for intel specific MCE features such as
 	   the thermal monitor.
 
@@ -750,14 +782,14 @@ config X86_MCE_AMD
 	def_bool y
 	prompt "AMD MCE features"
 	depends on X86_64 && X86_MCE && X86_LOCAL_APIC
-	help
+	---help---
 	   Additional support for AMD specific MCE features such as
 	   the DRAM Error Threshold.
 
 config X86_MCE_NONFATAL
 	tristate "Check for non-fatal errors on AMD Athlon/Duron / Intel Pentium 4"
 	depends on X86_32 && X86_MCE
-	help
+	---help---
 	  Enabling this feature starts a timer that triggers every 5 seconds which
 	  will look at the machine check registers to see if anything happened.
 	  Non-fatal problems automatically get corrected (but still logged).
@@ -770,7 +802,7 @@ config X86_MCE_NONFATAL
 config X86_MCE_P4THERMAL
 	bool "check for P4 thermal throttling interrupt."
 	depends on X86_32 && X86_MCE && (X86_UP_APIC || SMP)
-	help
+	---help---
 	  Enabling this feature will cause a message to be printed when the P4
 	  enters thermal throttling.
 
@@ -778,11 +810,11 @@ config VM86
 	bool "Enable VM86 support" if EMBEDDED
 	default y
 	depends on X86_32
-	help
-          This option is required by programs like DOSEMU to run 16-bit legacy
+	---help---
+	  This option is required by programs like DOSEMU to run 16-bit legacy
 	  code on X86 processors. It also may be needed by software like
-          XFree86 to initialize some video cards via BIOS. Disabling this
-          option saves about 6k.
+	  XFree86 to initialize some video cards via BIOS. Disabling this
+	  option saves about 6k.
 
 config TOSHIBA
 	tristate "Toshiba Laptop support"
@@ -856,33 +888,33 @@ config MICROCODE
 	  module will be called microcode.
 
 config MICROCODE_INTEL
-       bool "Intel microcode patch loading support"
-       depends on MICROCODE
-       default MICROCODE
-       select FW_LOADER
-       --help---
-         This options enables microcode patch loading support for Intel
-         processors.
-
-         For latest news and information on obtaining all the required
-         Intel ingredients for this driver, check:
-         <http://www.urbanmyth.org/microcode/>.
+	bool "Intel microcode patch loading support"
+	depends on MICROCODE
+	default MICROCODE
+	select FW_LOADER
+	---help---
+	  This options enables microcode patch loading support for Intel
+	  processors.
+
+	  For latest news and information on obtaining all the required
+	  Intel ingredients for this driver, check:
+	  <http://www.urbanmyth.org/microcode/>.
 
 config MICROCODE_AMD
-       bool "AMD microcode patch loading support"
-       depends on MICROCODE
-       select FW_LOADER
-       --help---
-         If you select this option, microcode patch loading support for AMD
-	 processors will be enabled.
+	bool "AMD microcode patch loading support"
+	depends on MICROCODE
+	select FW_LOADER
+	---help---
+	  If you select this option, microcode patch loading support for AMD
+	  processors will be enabled.
 
-   config MICROCODE_OLD_INTERFACE
+config MICROCODE_OLD_INTERFACE
 	def_bool y
 	depends on MICROCODE
 
 config X86_MSR
 	tristate "/dev/cpu/*/msr - Model-specific register support"
-	help
+	---help---
 	  This device gives privileged processes access to the x86
 	  Model-Specific Registers (MSRs).  It is a character device with
 	  major 202 and minors 0 to 31 for /dev/cpu/0/msr to /dev/cpu/31/msr.
@@ -891,7 +923,7 @@ config X86_MSR
 
 config X86_CPUID
 	tristate "/dev/cpu/*/cpuid - CPU information support"
-	help
+	---help---
 	  This device gives processes access to the x86 CPUID instruction to
 	  be executed on a specific processor.  It is a character device
 	  with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
@@ -943,7 +975,7 @@ config NOHIGHMEM
 config HIGHMEM4G
 	bool "4GB"
 	depends on !X86_NUMAQ
-	help
+	---help---
 	  Select this if you have a 32-bit processor and between 1 and 4
 	  gigabytes of physical RAM.
 
@@ -951,7 +983,7 @@ config HIGHMEM64G
 	bool "64GB"
 	depends on !M386 && !M486
 	select X86_PAE
-	help
+	---help---
 	  Select this if you have a 32-bit processor and more than 4
 	  gigabytes of physical RAM.
 
@@ -962,7 +994,7 @@ choice
 	prompt "Memory split" if EMBEDDED
 	default VMSPLIT_3G
 	depends on X86_32
-	help
+	---help---
 	  Select the desired split between kernel and user memory.
 
 	  If the address range available to the kernel is less than the
@@ -1008,20 +1040,20 @@ config HIGHMEM
 config X86_PAE
 	bool "PAE (Physical Address Extension) Support"
 	depends on X86_32 && !HIGHMEM4G
-	help
+	---help---
 	  PAE is required for NX support, and furthermore enables
 	  larger swapspace support for non-overcommit purposes. It
 	  has the cost of more pagetable lookup overhead, and also
 	  consumes more pagetable space per process.
 
 config ARCH_PHYS_ADDR_T_64BIT
-       def_bool X86_64 || X86_PAE
+	def_bool X86_64 || X86_PAE
 
 config DIRECT_GBPAGES
 	bool "Enable 1GB pages for kernel pagetables" if EMBEDDED
 	default y
 	depends on X86_64
-	help
+	---help---
 	  Allow the kernel linear mapping to use 1GB pages on CPUs that
 	  support it. This can improve the kernel's performance a tiny bit by
 	  reducing TLB pressure. If in doubt, say "Y".
@@ -1031,9 +1063,8 @@ config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
 	depends on SMP
 	depends on X86_64 || (X86_32 && HIGHMEM64G && (X86_NUMAQ || X86_BIGSMP || X86_SUMMIT && ACPI) && EXPERIMENTAL)
-	default n if X86_PC
 	default y if (X86_NUMAQ || X86_SUMMIT || X86_BIGSMP)
-	help
+	---help---
 	  Enable NUMA (Non Uniform Memory Access) support.
 
 	  The kernel will try to allocate memory used by a CPU on the
@@ -1056,19 +1087,19 @@ config K8_NUMA
 	def_bool y
 	prompt "Old style AMD Opteron NUMA detection"
 	depends on X86_64 && NUMA && PCI
-	help
-	 Enable K8 NUMA node topology detection.  You should say Y here if
-	 you have a multi processor AMD K8 system. This uses an old
-	 method to read the NUMA configuration directly from the builtin
-	 Northbridge of Opteron. It is recommended to use X86_64_ACPI_NUMA
-	 instead, which also takes priority if both are compiled in.
+	---help---
+	  Enable K8 NUMA node topology detection.  You should say Y here if
+	  you have a multi processor AMD K8 system. This uses an old
+	  method to read the NUMA configuration directly from the builtin
+	  Northbridge of Opteron. It is recommended to use X86_64_ACPI_NUMA
+	  instead, which also takes priority if both are compiled in.
 
 config X86_64_ACPI_NUMA
 	def_bool y
 	prompt "ACPI NUMA detection"
 	depends on X86_64 && NUMA && ACPI && PCI
 	select ACPI_NUMA
-	help
+	---help---
 	  Enable ACPI SRAT based node topology detection.
 
 # Some NUMA nodes have memory ranges that span
@@ -1083,7 +1114,7 @@ config NODES_SPAN_OTHER_NODES
 config NUMA_EMU
 	bool "NUMA emulation"
 	depends on X86_64 && NUMA
-	help
+	---help---
 	  Enable NUMA emulation. A flat machine will be split
 	  into virtual nodes when booted with "numa=fake=N", where N is the
 	  number of nodes. This is only useful for debugging.
@@ -1096,11 +1127,11 @@ config NODES_SHIFT
 	default "4" if X86_NUMAQ
 	default "3"
 	depends on NEED_MULTIPLE_NODES
-	help
+	---help---
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system.  Increases memory reserved to accomodate various tables.
 
-config HAVE_ARCH_BOOTMEM_NODE
+config HAVE_ARCH_BOOTMEM
 	def_bool y
 	depends on X86_32 && NUMA
 
@@ -1134,7 +1165,7 @@ config ARCH_SPARSEMEM_DEFAULT
 
 config ARCH_SPARSEMEM_ENABLE
 	def_bool y
-	depends on X86_64 || NUMA || (EXPERIMENTAL && X86_PC) || X86_GENERICARCH
+	depends on X86_64 || NUMA || (EXPERIMENTAL && X86_32) || X86_32_NON_STANDARD
 	select SPARSEMEM_STATIC if X86_32
 	select SPARSEMEM_VMEMMAP_ENABLE if X86_64
 
@@ -1151,61 +1182,61 @@ source "mm/Kconfig"
 config HIGHPTE
 	bool "Allocate 3rd-level pagetables from highmem"
 	depends on X86_32 && (HIGHMEM4G || HIGHMEM64G)
-	help
+	---help---
 	  The VM uses one page table entry for each page of physical memory.
 	  For systems with a lot of RAM, this can be wasteful of precious
 	  low memory.  Setting this option will put user-space page table
 	  entries in high memory.
 
 config X86_CHECK_BIOS_CORRUPTION
-        bool "Check for low memory corruption"
-	help
-	 Periodically check for memory corruption in low memory, which
-	 is suspected to be caused by BIOS.  Even when enabled in the
-	 configuration, it is disabled at runtime.  Enable it by
-	 setting "memory_corruption_check=1" on the kernel command
-	 line.  By default it scans the low 64k of memory every 60
-	 seconds; see the memory_corruption_check_size and
-	 memory_corruption_check_period parameters in
-	 Documentation/kernel-parameters.txt to adjust this.
-
-	 When enabled with the default parameters, this option has
-	 almost no overhead, as it reserves a relatively small amount
-	 of memory and scans it infrequently.  It both detects corruption
-	 and prevents it from affecting the running system.
-
-	 It is, however, intended as a diagnostic tool; if repeatable
-	 BIOS-originated corruption always affects the same memory,
-	 you can use memmap= to prevent the kernel from using that
-	 memory.
+	bool "Check for low memory corruption"
+	---help---
+	  Periodically check for memory corruption in low memory, which
+	  is suspected to be caused by BIOS.  Even when enabled in the
+	  configuration, it is disabled at runtime.  Enable it by
+	  setting "memory_corruption_check=1" on the kernel command
+	  line.  By default it scans the low 64k of memory every 60
+	  seconds; see the memory_corruption_check_size and
+	  memory_corruption_check_period parameters in
+	  Documentation/kernel-parameters.txt to adjust this.
+
+	  When enabled with the default parameters, this option has
+	  almost no overhead, as it reserves a relatively small amount
+	  of memory and scans it infrequently.  It both detects corruption
+	  and prevents it from affecting the running system.
+
+	  It is, however, intended as a diagnostic tool; if repeatable
+	  BIOS-originated corruption always affects the same memory,
+	  you can use memmap= to prevent the kernel from using that
+	  memory.
 
 config X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK
-        bool "Set the default setting of memory_corruption_check"
+	bool "Set the default setting of memory_corruption_check"
 	depends on X86_CHECK_BIOS_CORRUPTION
 	default y
-	help
-	 Set whether the default state of memory_corruption_check is
-	 on or off.
+	---help---
+	  Set whether the default state of memory_corruption_check is
+	  on or off.
 
 config X86_RESERVE_LOW_64K
-        bool "Reserve low 64K of RAM on AMI/Phoenix BIOSen"
+	bool "Reserve low 64K of RAM on AMI/Phoenix BIOSen"
 	default y
-	help
-	 Reserve the first 64K of physical RAM on BIOSes that are known
-	 to potentially corrupt that memory range. A numbers of BIOSes are
-	 known to utilize this area during suspend/resume, so it must not
-	 be used by the kernel.
+	---help---
+	  Reserve the first 64K of physical RAM on BIOSes that are known
+	  to potentially corrupt that memory range. A numbers of BIOSes are
+	  known to utilize this area during suspend/resume, so it must not
+	  be used by the kernel.
 
-	 Set this to N if you are absolutely sure that you trust the BIOS
-	 to get all its memory reservations and usages right.
+	  Set this to N if you are absolutely sure that you trust the BIOS
+	  to get all its memory reservations and usages right.
 
-	 If you have doubts about the BIOS (e.g. suspend/resume does not
-	 work or there's kernel crashes after certain hardware hotplug
-	 events) and it's not AMI or Phoenix, then you might want to enable
-	 X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check typical
-	 corruption patterns.
+	  If you have doubts about the BIOS (e.g. suspend/resume does not
+	  work or there's kernel crashes after certain hardware hotplug
+	  events) and it's not AMI or Phoenix, then you might want to enable
+	  X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check typical
+	  corruption patterns.
 
-	 Say Y if unsure.
+	  Say Y if unsure.
 
 config MATH_EMULATION
 	bool
@@ -1271,7 +1302,7 @@ config MTRR_SANITIZER
 	def_bool y
 	prompt "MTRR cleanup support"
 	depends on MTRR
-	help
+	---help---
 	  Convert MTRR layout from continuous to discrete, so X drivers can
 	  add writeback entries.
 
@@ -1286,7 +1317,7 @@ config MTRR_SANITIZER_ENABLE_DEFAULT
 	range 0 1
 	default "0"
 	depends on MTRR_SANITIZER
-	help
+	---help---
 	  Enable mtrr cleanup default value
 
 config MTRR_SANITIZER_SPARE_REG_NR_DEFAULT
@@ -1294,7 +1325,7 @@ config MTRR_SANITIZER_SPARE_REG_NR_DEFAULT
 	range 0 7
 	default "1"
 	depends on MTRR_SANITIZER
-	help
+	---help---
 	  mtrr cleanup spare entries default, it can be changed via
 	  mtrr_spare_reg_nr=N on the kernel command line.
 
@@ -1302,7 +1333,7 @@ config X86_PAT
 	bool
 	prompt "x86 PAT support"
 	depends on MTRR
-	help
+	---help---
 	  Use PAT attributes to setup page level cache control.
 
 	  PATs are the modern equivalents of MTRRs and are much more
@@ -1317,20 +1348,20 @@ config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
 	---help---
-	This enables the kernel to use EFI runtime services that are
-	available (such as the EFI variable services).
+	  This enables the kernel to use EFI runtime services that are
+	  available (such as the EFI variable services).
 
-	This option is only useful on systems that have EFI firmware.
-  	In addition, you should use the latest ELILO loader available
-  	at <http://elilo.sourceforge.net> in order to take advantage
-  	of EFI runtime services. However, even with this option, the
-  	resultant kernel should continue to boot on existing non-EFI
-  	platforms.
+	  This option is only useful on systems that have EFI firmware.
+	  In addition, you should use the latest ELILO loader available
+	  at <http://elilo.sourceforge.net> in order to take advantage
+	  of EFI runtime services. However, even with this option, the
+	  resultant kernel should continue to boot on existing non-EFI
+	  platforms.
 
 config SECCOMP
 	def_bool y
 	prompt "Enable seccomp to safely compute untrusted bytecode"
-	help
+	---help---
 	  This kernel feature is useful for number crunching applications
 	  that may need to compute untrusted bytecode during their
 	  execution. By using pipes or other transports made available to
@@ -1343,13 +1374,16 @@ config SECCOMP
 
 	  If unsure, say Y. Only embedded should say N here.
 
+config CC_STACKPROTECTOR_ALL
+	bool
+
 config CC_STACKPROTECTOR
 	bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)"
-	depends on X86_64 && EXPERIMENTAL && BROKEN
-	help
-         This option turns on the -fstack-protector GCC feature. This
-	  feature puts, at the beginning of critical functions, a canary
-	  value on the stack just before the return address, and validates
+	select CC_STACKPROTECTOR_ALL
+	---help---
+	  This option turns on the -fstack-protector GCC feature. This
+	  feature puts, at the beginning of functions, a canary value on
+	  the stack just before the return address, and validates
 	  the value just before actually returning.  Stack based buffer
 	  overflows (that need to overwrite this return address) now also
 	  overwrite the canary, which gets detected and the attack is then
@@ -1357,22 +1391,14 @@ config CC_STACKPROTECTOR
 
 	  This feature requires gcc version 4.2 or above, or a distribution
 	  gcc with the feature backported. Older versions are automatically
-	  detected and for those versions, this configuration option is ignored.
-
-config CC_STACKPROTECTOR_ALL
-	bool "Use stack-protector for all functions"
-	depends on CC_STACKPROTECTOR
-	help
-	  Normally, GCC only inserts the canary value protection for
-	  functions that use large-ish on-stack buffers. By enabling
-	  this option, GCC will be asked to do this for ALL functions.
+	  detected and for those versions, this configuration option is
+	  ignored. (and a warning is printed during bootup)
 
 source kernel/Kconfig.hz
 
 config KEXEC
 	bool "kexec system call"
-	depends on X86_BIOS_REBOOT
-	help
+	---help---
 	  kexec is a system call that implements the ability to shutdown your
 	  current kernel, and to start another kernel.  It is like a reboot
 	  but it is independent of the system firmware.   And like a reboot
@@ -1389,7 +1415,7 @@ config KEXEC
 config CRASH_DUMP
 	bool "kernel crash dumps"
 	depends on X86_64 || (X86_32 && HIGHMEM)
-	help
+	---help---
 	  Generate crash dump after being started by kexec.
 	  This should be normally only set in special crash dump kernels
 	  which are loaded in the main kernel with kexec-tools into
@@ -1404,7 +1430,7 @@ config KEXEC_JUMP
 	bool "kexec jump (EXPERIMENTAL)"
 	depends on EXPERIMENTAL
 	depends on KEXEC && HIBERNATION && X86_32
-	help
+	---help---
 	  Jump between original kernel and kexeced kernel and invoke
 	  code in physical address mode via KEXEC
 
@@ -1413,7 +1439,7 @@ config PHYSICAL_START
 	default "0x1000000" if X86_NUMAQ
 	default "0x200000" if X86_64
 	default "0x100000"
-	help
+	---help---
 	  This gives the physical address where the kernel is loaded.
 
 	  If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then
@@ -1454,7 +1480,7 @@ config PHYSICAL_START
 config RELOCATABLE
 	bool "Build a relocatable kernel (EXPERIMENTAL)"
 	depends on EXPERIMENTAL
-	help
+	---help---
 	  This builds a kernel image that retains relocation information
 	  so it can be loaded someplace besides the default 1MB.
 	  The relocations tend to make the kernel binary about 10% larger,
@@ -1474,7 +1500,7 @@ config PHYSICAL_ALIGN
 	default "0x100000" if X86_32
 	default "0x200000" if X86_64
 	range 0x2000 0x400000
-	help
+	---help---
 	  This value puts the alignment restrictions on physical address
 	  where kernel is loaded and run from. Kernel is compiled for an
 	  address which meets above alignment restriction.
@@ -1495,7 +1521,7 @@ config PHYSICAL_ALIGN
 
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
-	depends on SMP && HOTPLUG && !X86_VOYAGER
+	depends on SMP && HOTPLUG
 	---help---
 	  Say Y here to allow turning CPUs off and on. CPUs can be
 	  controlled through /sys/devices/system/cpu.
@@ -1507,7 +1533,7 @@ config COMPAT_VDSO
 	def_bool y
 	prompt "Compat VDSO support"
 	depends on X86_32 || IA32_EMULATION
-	help
+	---help---
 	  Map the 32-bit VDSO to the predictable old-style address too.
 	---help---
 	  Say N here if you are running a sufficiently recent glibc
@@ -1519,7 +1545,7 @@ config COMPAT_VDSO
 config CMDLINE_BOOL
 	bool "Built-in kernel command line"
 	default n
-	help
+	---help---
 	  Allow for specifying boot arguments to the kernel at
 	  build time.  On some systems (e.g. embedded ones), it is
 	  necessary or convenient to provide some or all of the
@@ -1537,7 +1563,7 @@ config CMDLINE
 	string "Built-in kernel command string"
 	depends on CMDLINE_BOOL
 	default ""
-	help
+	---help---
 	  Enter arguments here that should be compiled into the kernel
 	  image and used at boot time.  If the boot loader provides a
 	  command line at boot time, it is appended to this string to
@@ -1554,7 +1580,7 @@ config CMDLINE_OVERRIDE
 	bool "Built-in command line overrides boot loader arguments"
 	default n
 	depends on CMDLINE_BOOL
-	help
+	---help---
 	  Set this option to 'Y' to have the kernel ignore the boot loader
 	  command line, and use ONLY the built-in command line.
 
@@ -1576,7 +1602,6 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
 	depends on NUMA
 
 menu "Power management and ACPI options"
-	depends on !X86_VOYAGER
 
 config ARCH_HIBERNATION_HEADER
 	def_bool y
@@ -1654,7 +1679,7 @@ if APM
 
 config APM_IGNORE_USER_SUSPEND
 	bool "Ignore USER SUSPEND"
-	help
+	---help---
 	  This option will ignore USER SUSPEND requests. On machines with a
 	  compliant APM BIOS, you want to say N. However, on the NEC Versa M
 	  series notebooks, it is necessary to say Y because of a BIOS bug.
@@ -1678,7 +1703,7 @@ config APM_DO_ENABLE
 
 config APM_CPU_IDLE
 	bool "Make CPU Idle calls when idle"
-	help
+	---help---
 	  Enable calls to APM CPU Idle/CPU Busy inside the kernel's idle loop.
 	  On some machines, this can activate improved power savings, such as
 	  a slowed CPU clock rate, when the machine is idle. These idle calls
@@ -1689,7 +1714,7 @@ config APM_CPU_IDLE
 
 config APM_DISPLAY_BLANK
 	bool "Enable console blanking using APM"
-	help
+	---help---
 	  Enable console blanking using the APM. Some laptops can use this to
 	  turn off the LCD backlight when the screen blanker of the Linux
 	  virtual console blanks the screen. Note that this is only used by
@@ -1702,7 +1727,7 @@ config APM_DISPLAY_BLANK
 
 config APM_ALLOW_INTS
 	bool "Allow interrupts during APM BIOS calls"
-	help
+	---help---
 	  Normally we disable external interrupts while we are making calls to
 	  the APM BIOS as a measure to lessen the effects of a badly behaving
 	  BIOS implementation.  The BIOS should reenable interrupts if it
@@ -1727,7 +1752,7 @@ config PCI
 	bool "PCI support"
 	default y
 	select ARCH_SUPPORTS_MSI if (X86_LOCAL_APIC && X86_IO_APIC)
-	help
+	---help---
 	  Find out whether you have a PCI motherboard. PCI is the name of a
 	  bus system, i.e. the way the CPU talks to the other stuff inside
 	  your box. Other bus systems are ISA, EISA, MicroChannel (MCA) or
@@ -1798,7 +1823,7 @@ config PCI_MMCONFIG
 config DMAR
 	bool "Support for DMA Remapping Devices (EXPERIMENTAL)"
 	depends on X86_64 && PCI_MSI && ACPI && EXPERIMENTAL
-	help
+	---help---
 	  DMA remapping (DMAR) devices support enables independent address
 	  translations for Direct Memory Access (DMA) from devices.
 	  These DMA remapping devices are reported via ACPI tables
@@ -1820,29 +1845,30 @@ config DMAR_GFX_WA
 	def_bool y
 	prompt "Support for Graphics workaround"
 	depends on DMAR
-	help
-	 Current Graphics drivers tend to use physical address
-	 for DMA and avoid using DMA APIs. Setting this config
-	 option permits the IOMMU driver to set a unity map for
-	 all the OS-visible memory. Hence the driver can continue
-	 to use physical addresses for DMA.
+	---help---
+	  Current Graphics drivers tend to use physical address
+	  for DMA and avoid using DMA APIs. Setting this config
+	  option permits the IOMMU driver to set a unity map for
+	  all the OS-visible memory. Hence the driver can continue
+	  to use physical addresses for DMA.
 
 config DMAR_FLOPPY_WA
 	def_bool y
 	depends on DMAR
-	help
-	 Floppy disk drivers are know to bypass DMA API calls
-	 thereby failing to work when IOMMU is enabled. This
-	 workaround will setup a 1:1 mapping for the first
-	 16M to make floppy (an ISA device) work.
+	---help---
+	  Floppy disk drivers are know to bypass DMA API calls
+	  thereby failing to work when IOMMU is enabled. This
+	  workaround will setup a 1:1 mapping for the first
+	  16M to make floppy (an ISA device) work.
 
 config INTR_REMAP
 	bool "Support for Interrupt Remapping (EXPERIMENTAL)"
 	depends on X86_64 && X86_IO_APIC && PCI_MSI && ACPI && EXPERIMENTAL
-	help
-	 Supports Interrupt remapping for IO-APIC and MSI devices.
-	 To use x2apic mode in the CPU's which support x2APIC enhancements or
-	 to support platforms with CPU's having > 8 bit APIC ID, say Y.
+	select X86_X2APIC
+	---help---
+	  Supports Interrupt remapping for IO-APIC and MSI devices.
+	  To use x2apic mode in the CPU's which support x2APIC enhancements or
+	  to support platforms with CPU's having > 8 bit APIC ID, say Y.
 
 source "drivers/pci/pcie/Kconfig"
 
@@ -1856,8 +1882,7 @@ if X86_32
 
 config ISA
 	bool "ISA support"
-	depends on !X86_VOYAGER
-	help
+	---help---
 	  Find out whether you have ISA slots on your motherboard.  ISA is the
 	  name of a bus system, i.e. the way the CPU talks to the other stuff
 	  inside your box.  Other bus systems are PCI, EISA, MicroChannel
@@ -1883,9 +1908,8 @@ config EISA
 source "drivers/eisa/Kconfig"
 
 config MCA
-	bool "MCA support" if !X86_VOYAGER
-	default y if X86_VOYAGER
-	help
+	bool "MCA support"
+	---help---
 	  MicroChannel Architecture is found in some IBM PS/2 machines and
 	  laptops.  It is a bus system similar to PCI or ISA. See
 	  <file:Documentation/mca.txt> (and especially the web page given
@@ -1895,8 +1919,7 @@ source "drivers/mca/Kconfig"
 
 config SCx200
 	tristate "NatSemi SCx200 support"
-	depends on !X86_VOYAGER
-	help
+	---help---
 	  This provides basic support for National Semiconductor's
 	  (now AMD's) Geode processors.  The driver probes for the
 	  PCI-IDs of several on-chip devices, so its a good dependency
@@ -1908,7 +1931,7 @@ config SCx200HR_TIMER
 	tristate "NatSemi SCx200 27MHz High-Resolution Timer Support"
 	depends on SCx200 && GENERIC_TIME
 	default y
-	help
+	---help---
 	  This driver provides a clocksource built upon the on-chip
 	  27MHz high-resolution timer.  Its also a workaround for
 	  NSC Geode SC-1100's buggy TSC, which loses time when the
@@ -1919,7 +1942,7 @@ config GEODE_MFGPT_TIMER
 	def_bool y
 	prompt "Geode Multi-Function General Purpose Timer (MFGPT) events"
 	depends on MGEODE_LX && GENERIC_TIME && GENERIC_CLOCKEVENTS
-	help
+	---help---
 	  This driver provides a clock event source based on the MFGPT
 	  timer(s) in the CS5535 and CS5536 companion chip for the geode.
 	  MFGPTs have a better resolution and max interval than the
@@ -1928,7 +1951,7 @@ config GEODE_MFGPT_TIMER
 config OLPC
 	bool "One Laptop Per Child support"
 	default n
-	help
+	---help---
 	  Add support for detecting the unique features of the OLPC
 	  XO hardware.
 
@@ -1953,16 +1976,16 @@ config IA32_EMULATION
 	bool "IA32 Emulation"
 	depends on X86_64
 	select COMPAT_BINFMT_ELF
-	help
+	---help---
 	  Include code to run 32-bit programs under a 64-bit kernel. You should
 	  likely turn this on, unless you're 100% sure that you don't have any
 	  32-bit programs left.
 
 config IA32_AOUT
-       tristate "IA32 a.out support"
-       depends on IA32_EMULATION
-       help
-         Support old a.out binaries in the 32bit emulation.
+	tristate "IA32 a.out support"
+	depends on IA32_EMULATION
+	---help---
+	  Support old a.out binaries in the 32bit emulation.
 
 config COMPAT
 	def_bool y
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index c98d52e82966..a95eaf0e582a 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -50,7 +50,7 @@ config M386
 config M486
 	bool "486"
 	depends on X86_32
-	help
+	---help---
 	  Select this for a 486 series processor, either Intel or one of the
 	  compatible processors from AMD, Cyrix, IBM, or Intel.  Includes DX,
 	  DX2, and DX4 variants; also SL/SLC/SLC2/SLC3/SX/SX2 and UMC U5D or
@@ -59,7 +59,7 @@ config M486
 config M586
 	bool "586/K5/5x86/6x86/6x86MX"
 	depends on X86_32
-	help
+	---help---
 	  Select this for an 586 or 686 series processor such as the AMD K5,
 	  the Cyrix 5x86, 6x86 and 6x86MX.  This choice does not
 	  assume the RDTSC (Read Time Stamp Counter) instruction.
@@ -67,21 +67,21 @@ config M586
 config M586TSC
 	bool "Pentium-Classic"
 	depends on X86_32
-	help
+	---help---
 	  Select this for a Pentium Classic processor with the RDTSC (Read
 	  Time Stamp Counter) instruction for benchmarking.
 
 config M586MMX
 	bool "Pentium-MMX"
 	depends on X86_32
-	help
+	---help---
 	  Select this for a Pentium with the MMX graphics/multimedia
 	  extended instructions.
 
 config M686
 	bool "Pentium-Pro"
 	depends on X86_32
-	help
+	---help---
 	  Select this for Intel Pentium Pro chips.  This enables the use of
 	  Pentium Pro extended instructions, and disables the init-time guard
 	  against the f00f bug found in earlier Pentiums.
@@ -89,7 +89,7 @@ config M686
 config MPENTIUMII
 	bool "Pentium-II/Celeron(pre-Coppermine)"
 	depends on X86_32
-	help
+	---help---
 	  Select this for Intel chips based on the Pentium-II and
 	  pre-Coppermine Celeron core.  This option enables an unaligned
 	  copy optimization, compiles the kernel with optimization flags
@@ -99,7 +99,7 @@ config MPENTIUMII
 config MPENTIUMIII
 	bool "Pentium-III/Celeron(Coppermine)/Pentium-III Xeon"
 	depends on X86_32
-	help
+	---help---
 	  Select this for Intel chips based on the Pentium-III and
 	  Celeron-Coppermine core.  This option enables use of some
 	  extended prefetch instructions in addition to the Pentium II
@@ -108,14 +108,14 @@ config MPENTIUMIII
 config MPENTIUMM
 	bool "Pentium M"
 	depends on X86_32
-	help
+	---help---
 	  Select this for Intel Pentium M (not Pentium-4 M)
 	  notebook chips.
 
 config MPENTIUM4
 	bool "Pentium-4/Celeron(P4-based)/Pentium-4 M/older Xeon"
 	depends on X86_32
-	help
+	---help---
 	  Select this for Intel Pentium 4 chips.  This includes the
 	  Pentium 4, Pentium D, P4-based Celeron and Xeon, and
 	  Pentium-4 M (not Pentium M) chips.  This option enables compile
@@ -151,7 +151,7 @@ config MPENTIUM4
 config MK6
 	bool "K6/K6-II/K6-III"
 	depends on X86_32
-	help
+	---help---
 	  Select this for an AMD K6-family processor.  Enables use of
 	  some extended instructions, and passes appropriate optimization
 	  flags to GCC.
@@ -159,14 +159,14 @@ config MK6
 config MK7
 	bool "Athlon/Duron/K7"
 	depends on X86_32
-	help
+	---help---
 	  Select this for an AMD Athlon K7-family processor.  Enables use of
 	  some extended instructions, and passes appropriate optimization
 	  flags to GCC.
 
 config MK8
 	bool "Opteron/Athlon64/Hammer/K8"
-	help
+	---help---
 	  Select this for an AMD Opteron or Athlon64 Hammer-family processor.
 	  Enables use of some extended instructions, and passes appropriate
 	  optimization flags to GCC.
@@ -174,7 +174,7 @@ config MK8
 config MCRUSOE
 	bool "Crusoe"
 	depends on X86_32
-	help
+	---help---
 	  Select this for a Transmeta Crusoe processor.  Treats the processor
 	  like a 586 with TSC, and sets some GCC optimization flags (like a
 	  Pentium Pro with no alignment requirements).
@@ -182,13 +182,13 @@ config MCRUSOE
 config MEFFICEON
 	bool "Efficeon"
 	depends on X86_32
-	help
+	---help---
 	  Select this for a Transmeta Efficeon processor.
 
 config MWINCHIPC6
 	bool "Winchip-C6"
 	depends on X86_32
-	help
+	---help---
 	  Select this for an IDT Winchip C6 chip.  Linux and GCC
 	  treat this chip as a 586TSC with some extended instructions
 	  and alignment requirements.
@@ -196,7 +196,7 @@ config MWINCHIPC6
 config MWINCHIP3D
 	bool "Winchip-2/Winchip-2A/Winchip-3"
 	depends on X86_32
-	help
+	---help---
 	  Select this for an IDT Winchip-2, 2A or 3.  Linux and GCC
 	  treat this chip as a 586TSC with some extended instructions
 	  and alignment requirements.  Also enable out of order memory
@@ -206,19 +206,19 @@ config MWINCHIP3D
 config MGEODEGX1
 	bool "GeodeGX1"
 	depends on X86_32
-	help
+	---help---
 	  Select this for a Geode GX1 (Cyrix MediaGX) chip.
 
 config MGEODE_LX
 	bool "Geode GX/LX"
 	depends on X86_32
-	help
+	---help---
 	  Select this for AMD Geode GX and LX processors.
 
 config MCYRIXIII
 	bool "CyrixIII/VIA-C3"
 	depends on X86_32
-	help
+	---help---
 	  Select this for a Cyrix III or C3 chip.  Presently Linux and GCC
 	  treat this chip as a generic 586. Whilst the CPU is 686 class,
 	  it lacks the cmov extension which gcc assumes is present when
@@ -230,7 +230,7 @@ config MCYRIXIII
 config MVIAC3_2
 	bool "VIA C3-2 (Nehemiah)"
 	depends on X86_32
-	help
+	---help---
 	  Select this for a VIA C3 "Nehemiah". Selecting this enables usage
 	  of SSE and tells gcc to treat the CPU as a 686.
 	  Note, this kernel will not boot on older (pre model 9) C3s.
@@ -238,14 +238,14 @@ config MVIAC3_2
 config MVIAC7
 	bool "VIA C7"
 	depends on X86_32
-	help
+	---help---
 	  Select this for a VIA C7.  Selecting this uses the correct cache
 	  shift and tells gcc to treat the CPU as a 686.
 
 config MPSC
 	bool "Intel P4 / older Netburst based Xeon"
 	depends on X86_64
-	help
+	---help---
 	  Optimize for Intel Pentium 4, Pentium D and older Nocona/Dempsey
 	  Xeon CPUs with Intel 64bit which is compatible with x86-64.
 	  Note that the latest Xeons (Xeon 51xx and 53xx) are not based on the
@@ -255,7 +255,7 @@ config MPSC
 
 config MCORE2
 	bool "Core 2/newer Xeon"
-	help
+	---help---
 
 	  Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
 	  53xx) CPUs. You can distinguish newer from older Xeons by the CPU
@@ -265,7 +265,7 @@ config MCORE2
 config GENERIC_CPU
 	bool "Generic-x86-64"
 	depends on X86_64
-	help
+	---help---
 	  Generic x86-64 CPU.
 	  Run equally well on all x86-64 CPUs.
 
@@ -274,7 +274,7 @@ endchoice
 config X86_GENERIC
 	bool "Generic x86 support"
 	depends on X86_32
-	help
+	---help---
 	  Instead of just including optimizations for the selected
 	  x86 variant (e.g. PII, Crusoe or Athlon), include some more
 	  generic optimizations as well. This will make the kernel
@@ -294,25 +294,23 @@ config X86_CPU
 # Define implied options from the CPU selection here
 config X86_L1_CACHE_BYTES
 	int
-	default "128" if GENERIC_CPU || MPSC
-	default "64" if MK8 || MCORE2
-	depends on X86_64
+	default "128" if MPSC
+	default "64" if GENERIC_CPU || MK8 || MCORE2 || X86_32
 
 config X86_INTERNODE_CACHE_BYTES
 	int
 	default "4096" if X86_VSMP
 	default X86_L1_CACHE_BYTES if !X86_VSMP
-	depends on X86_64
 
 config X86_CMPXCHG
 	def_bool X86_64 || (X86_32 && !M386)
 
 config X86_L1_CACHE_SHIFT
 	int
-	default "7" if MPENTIUM4 || X86_GENERIC || GENERIC_CPU || MPSC
+	default "7" if MPENTIUM4 || MPSC
 	default "4" if X86_ELAN || M486 || M386 || MGEODEGX1
 	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
-	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7
+	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MVIAC7 || X86_GENERIC || GENERIC_CPU
 
 config X86_XADD
 	def_bool y
@@ -321,7 +319,7 @@ config X86_XADD
 config X86_PPRO_FENCE
 	bool "PentiumPro memory ordering errata workaround"
 	depends on M686 || M586MMX || M586TSC || M586 || M486 || M386 || MGEODEGX1
-	help
+	---help---
 	  Old PentiumPro multiprocessor systems had errata that could cause
 	  memory operations to violate the x86 ordering standard in rare cases.
 	  Enabling this option will attempt to work around some (but not all)
@@ -414,14 +412,14 @@ config X86_DEBUGCTLMSR
 
 menuconfig PROCESSOR_SELECT
 	bool "Supported processor vendors" if EMBEDDED
-	help
+	---help---
 	  This lets you choose what x86 vendor support code your kernel
 	  will include.
 
 config CPU_SUP_INTEL
 	default y
 	bool "Support Intel processors" if PROCESSOR_SELECT
-	help
+	---help---
 	  This enables detection, tunings and quirks for Intel processors
 
 	  You need this enabled if you want your kernel to run on an
@@ -435,7 +433,7 @@ config CPU_SUP_CYRIX_32
 	default y
 	bool "Support Cyrix processors" if PROCESSOR_SELECT
 	depends on !64BIT
-	help
+	---help---
 	  This enables detection, tunings and quirks for Cyrix processors
 
 	  You need this enabled if you want your kernel to run on a
@@ -448,7 +446,7 @@ config CPU_SUP_CYRIX_32
 config CPU_SUP_AMD
 	default y
 	bool "Support AMD processors" if PROCESSOR_SELECT
-	help
+	---help---
 	  This enables detection, tunings and quirks for AMD processors
 
 	  You need this enabled if you want your kernel to run on an
@@ -462,7 +460,7 @@ config CPU_SUP_CENTAUR_32
 	default y
 	bool "Support Centaur processors" if PROCESSOR_SELECT
 	depends on !64BIT
-	help
+	---help---
 	  This enables detection, tunings and quirks for Centaur processors
 
 	  You need this enabled if you want your kernel to run on a
@@ -476,7 +474,7 @@ config CPU_SUP_CENTAUR_64
 	default y
 	bool "Support Centaur processors" if PROCESSOR_SELECT
 	depends on 64BIT
-	help
+	---help---
 	  This enables detection, tunings and quirks for Centaur processors
 
 	  You need this enabled if you want your kernel to run on a
@@ -490,7 +488,7 @@ config CPU_SUP_TRANSMETA_32
 	default y
 	bool "Support Transmeta processors" if PROCESSOR_SELECT
 	depends on !64BIT
-	help
+	---help---
 	  This enables detection, tunings and quirks for Transmeta processors
 
 	  You need this enabled if you want your kernel to run on a
@@ -504,7 +502,7 @@ config CPU_SUP_UMC_32
 	default y
 	bool "Support UMC processors" if PROCESSOR_SELECT
 	depends on !64BIT
-	help
+	---help---
 	  This enables detection, tunings and quirks for UMC processors
 
 	  You need this enabled if you want your kernel to run on a
@@ -523,7 +521,7 @@ config X86_PTRACE_BTS
 	bool "Branch Trace Store"
 	default y
 	depends on X86_DEBUGCTLMSR
-	help
+	---help---
 	  This adds a ptrace interface to the hardware's branch trace store.
 
 	  Debuggers may use it to collect an execution trace of the debugged
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index e1983fa025d2..fdb45df608b6 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -7,7 +7,7 @@ source "lib/Kconfig.debug"
 
 config STRICT_DEVMEM
 	bool "Filter access to /dev/mem"
-	help
+	---help---
 	  If this option is disabled, you allow userspace (root) access to all
 	  of memory, including kernel and userspace memory. Accidental
 	  access to this is obviously disastrous, but specific access can
@@ -25,7 +25,7 @@ config STRICT_DEVMEM
 config X86_VERBOSE_BOOTUP
 	bool "Enable verbose x86 bootup info messages"
 	default y
-	help
+	---help---
 	  Enables the informational output from the decompression stage
 	  (e.g. bzImage) of the boot. If you disable this you will still
 	  see errors. Disable this if you want silent bootup.
@@ -33,7 +33,7 @@ config X86_VERBOSE_BOOTUP
 config EARLY_PRINTK
 	bool "Early printk" if EMBEDDED
 	default y
-	help
+	---help---
 	  Write kernel log output directly into the VGA buffer or to a serial
 	  port.
 
@@ -47,7 +47,7 @@ config EARLY_PRINTK_DBGP
 	bool "Early printk via EHCI debug port"
 	default n
 	depends on EARLY_PRINTK && PCI
-	help
+	---help---
 	  Write kernel log output directly into the EHCI debug port.
 
 	  This is useful for kernel debugging when your machine crashes very
@@ -59,14 +59,14 @@ config EARLY_PRINTK_DBGP
 config DEBUG_STACKOVERFLOW
 	bool "Check for stack overflows"
 	depends on DEBUG_KERNEL
-	help
+	---help---
 	  This option will cause messages to be printed if free stack space
 	  drops below a certain limit.
 
 config DEBUG_STACK_USAGE
 	bool "Stack utilization instrumentation"
 	depends on DEBUG_KERNEL
-	help
+	---help---
 	  Enables the display of the minimum amount of free stack which each
 	  task has ever had available in the sysrq-T and sysrq-P debug output.
 
@@ -75,7 +75,7 @@ config DEBUG_STACK_USAGE
 config DEBUG_PAGEALLOC
 	bool "Debug page memory allocations"
 	depends on DEBUG_KERNEL
-	help
+	---help---
 	  Unmap pages from the kernel linear mapping after free_pages().
 	  This results in a large slowdown, but helps to find certain types
 	  of memory corruptions.
@@ -83,9 +83,9 @@ config DEBUG_PAGEALLOC
 config DEBUG_PER_CPU_MAPS
 	bool "Debug access to per_cpu maps"
 	depends on DEBUG_KERNEL
-	depends on X86_SMP
+	depends on SMP
 	default n
-	help
+	---help---
 	  Say Y to verify that the per_cpu map being accessed has
 	  been setup.  Adds a fair amount of code to kernel memory
 	  and decreases performance.
@@ -96,7 +96,7 @@ config X86_PTDUMP
 	bool "Export kernel pagetable layout to userspace via debugfs"
 	depends on DEBUG_KERNEL
 	select DEBUG_FS
-	help
+	---help---
 	  Say Y here if you want to show the kernel pagetable layout in a
 	  debugfs file. This information is only useful for kernel developers
 	  who are working in architecture specific areas of the kernel.
@@ -108,7 +108,7 @@ config DEBUG_RODATA
 	bool "Write protect kernel read-only data structures"
 	default y
 	depends on DEBUG_KERNEL
-	help
+	---help---
 	  Mark the kernel read-only data as write-protected in the pagetables,
 	  in order to catch accidental (and incorrect) writes to such const
 	  data. This is recommended so that we can catch kernel bugs sooner.
@@ -117,7 +117,8 @@ config DEBUG_RODATA
 config DEBUG_RODATA_TEST
 	bool "Testcase for the DEBUG_RODATA feature"
 	depends on DEBUG_RODATA
-	help
+	default y
+	---help---
 	  This option enables a testcase for the DEBUG_RODATA
 	  feature as well as for the change_page_attr() infrastructure.
 	  If in doubt, say "N"
@@ -125,7 +126,7 @@ config DEBUG_RODATA_TEST
 config DEBUG_NX_TEST
 	tristate "Testcase for the NX non-executable stack feature"
 	depends on DEBUG_KERNEL && m
-	help
+	---help---
 	  This option enables a testcase for the CPU NX capability
 	  and the software setup of this feature.
 	  If in doubt, say "N"
@@ -133,7 +134,7 @@ config DEBUG_NX_TEST
 config 4KSTACKS
 	bool "Use 4Kb for kernel stacks instead of 8Kb"
 	depends on X86_32
-	help
+	---help---
 	  If you say Y here the kernel will use a 4Kb stacksize for the
 	  kernel stack attached to each process/thread. This facilitates
 	  running more threads on a system and also reduces the pressure
@@ -144,7 +145,7 @@ config DOUBLEFAULT
 	default y
 	bool "Enable doublefault exception handler" if EMBEDDED
 	depends on X86_32
-	help
+	---help---
 	  This option allows trapping of rare doublefault exceptions that
 	  would otherwise cause a system to silently reboot. Disabling this
 	  option saves about 4k and might cause you much additional grey
@@ -154,7 +155,7 @@ config IOMMU_DEBUG
 	bool "Enable IOMMU debugging"
 	depends on GART_IOMMU && DEBUG_KERNEL
 	depends on X86_64
-	help
+	---help---
 	  Force the IOMMU to on even when you have less than 4GB of
 	  memory and add debugging code. On overflow always panic. And
 	  allow to enable IOMMU leak tracing. Can be disabled at boot
@@ -170,7 +171,7 @@ config IOMMU_LEAK
 	bool "IOMMU leak tracing"
 	depends on DEBUG_KERNEL
 	depends on IOMMU_DEBUG
-	help
+	---help---
 	  Add a simple leak tracer to the IOMMU code. This is useful when you
 	  are debugging a buggy device driver that leaks IOMMU mappings.
 
@@ -203,25 +204,25 @@ choice
 
 config IO_DELAY_0X80
 	bool "port 0x80 based port-IO delay [recommended]"
-	help
+	---help---
 	  This is the traditional Linux IO delay used for in/out_p.
 	  It is the most tested hence safest selection here.
 
 config IO_DELAY_0XED
 	bool "port 0xed based port-IO delay"
-	help
+	---help---
 	  Use port 0xed as the IO delay. This frees up port 0x80 which is
 	  often used as a hardware-debug port.
 
 config IO_DELAY_UDELAY
 	bool "udelay based port-IO delay"
-	help
+	---help---
 	  Use udelay(2) as the IO delay method. This provides the delay
 	  while not having any side-effect on the IO port space.
 
 config IO_DELAY_NONE
 	bool "no port-IO delay"
-	help
+	---help---
 	  No port-IO delay. Will break on old boxes that require port-IO
 	  delay for certain operations. Should work on most new machines.
 
@@ -255,18 +256,18 @@ config DEBUG_BOOT_PARAMS
 	bool "Debug boot parameters"
 	depends on DEBUG_KERNEL
 	depends on DEBUG_FS
-	help
+	---help---
 	  This option will cause struct boot_params to be exported via debugfs.
 
 config CPA_DEBUG
 	bool "CPA self-test code"
 	depends on DEBUG_KERNEL
-	help
+	---help---
 	  Do change_page_attr() self-tests every 30 seconds.
 
 config OPTIMIZE_INLINING
 	bool "Allow gcc to uninline functions marked 'inline'"
-	help
+	---help---
 	  This option determines if the kernel forces gcc to inline the functions
 	  developers have marked 'inline'. Doing so takes away freedom from gcc to
 	  do what it thinks is best, which is desirable for the gcc 3.x series of
@@ -279,4 +280,3 @@ config OPTIMIZE_INLINING
 	  If unsure, say N.
 
 endmenu
-
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index d1a47adb5aec..1836191839ee 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -70,14 +70,17 @@ else
         # this works around some issues with generating unwind tables in older gccs
         # newer gccs do it by default
         KBUILD_CFLAGS += -maccumulate-outgoing-args
+endif
 
-        stackp := $(CONFIG_SHELL) $(srctree)/scripts/gcc-x86_64-has-stack-protector.sh
-        stackp-$(CONFIG_CC_STACKPROTECTOR) := $(shell $(stackp) \
-                "$(CC)" -fstack-protector )
-        stackp-$(CONFIG_CC_STACKPROTECTOR_ALL) += $(shell $(stackp) \
-                "$(CC)" -fstack-protector-all )
-
-        KBUILD_CFLAGS += $(stackp-y)
+ifdef CONFIG_CC_STACKPROTECTOR
+	cc_has_sp := $(srctree)/scripts/gcc-x86_$(BITS)-has-stack-protector.sh
+        ifeq ($(shell $(CONFIG_SHELL) $(cc_has_sp) $(CC)),y)
+                stackp-y := -fstack-protector
+                stackp-$(CONFIG_CC_STACKPROTECTOR_ALL) += -fstack-protector-all
+                KBUILD_CFLAGS += $(stackp-y)
+        else
+                $(warning stack protector enabled but no compiler support)
+        endif
 endif
 
 # Stackpointer is addressed different for 32 bit and 64 bit x86
@@ -102,29 +105,6 @@ KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
 # prevent gcc from generating any FP code by mistake
 KBUILD_CFLAGS += $(call cc-option,-mno-sse -mno-mmx -mno-sse2 -mno-3dnow,)
 
-###
-# Sub architecture support
-# fcore-y is linked before mcore-y files.
-
-# Default subarch .c files
-mcore-y  := arch/x86/mach-default/
-
-# Voyager subarch support
-mflags-$(CONFIG_X86_VOYAGER)	:= -Iarch/x86/include/asm/mach-voyager
-mcore-$(CONFIG_X86_VOYAGER)	:= arch/x86/mach-voyager/
-
-# generic subarchitecture
-mflags-$(CONFIG_X86_GENERICARCH):= -Iarch/x86/include/asm/mach-generic
-fcore-$(CONFIG_X86_GENERICARCH)	+= arch/x86/mach-generic/
-mcore-$(CONFIG_X86_GENERICARCH)	:= arch/x86/mach-default/
-
-# default subarch .h files
-mflags-y += -Iarch/x86/include/asm/mach-default
-
-# 64 bit does not support subarch support - clear sub arch variables
-fcore-$(CONFIG_X86_64)  :=
-mcore-$(CONFIG_X86_64)  :=
-
 KBUILD_CFLAGS += $(mflags-y)
 KBUILD_AFLAGS += $(mflags-y)
 
@@ -150,9 +130,6 @@ core-$(CONFIG_LGUEST_GUEST) += arch/x86/lguest/
 core-y += arch/x86/kernel/
 core-y += arch/x86/mm/
 
-# Remaining sub architecture files
-core-y += $(mcore-y)
-
 core-y += arch/x86/crypto/
 core-y += arch/x86/vdso/
 core-$(CONFIG_IA32_EMULATION) += arch/x86/ia32/
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index cd48c7210016..c70eff69a1fb 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -32,7 +32,6 @@ setup-y		+= a20.o cmdline.o copy.o cpu.o cpucheck.o edd.o
 setup-y		+= header.o main.o mca.o memory.o pm.o pmjump.o
 setup-y		+= printf.o string.o tty.o video.o video-mode.o version.o
 setup-$(CONFIG_X86_APM_BOOT) += apm.o
-setup-$(CONFIG_X86_VOYAGER) += voyager.o
 
 # The link order of the video-*.o modules can matter.  In particular,
 # video-vga.o *must* be listed first, followed by video-vesa.o.
diff --git a/arch/x86/boot/a20.c b/arch/x86/boot/a20.c
index 4063d630deff..7c19ce8c2442 100644
--- a/arch/x86/boot/a20.c
+++ b/arch/x86/boot/a20.c
@@ -2,6 +2,7 @@
  *
  *   Copyright (C) 1991, 1992 Linus Torvalds
  *   Copyright 2007-2008 rPath, Inc. - All Rights Reserved
+ *   Copyright 2009 Intel Corporation
  *
  *   This file is part of the Linux kernel, and is made available under
  *   the terms of the GNU General Public License version 2.
@@ -15,16 +16,23 @@
 #include "boot.h"
 
 #define MAX_8042_LOOPS	100000
+#define MAX_8042_FF	32
 
 static int empty_8042(void)
 {
 	u8 status;
 	int loops = MAX_8042_LOOPS;
+	int ffs   = MAX_8042_FF;
 
 	while (loops--) {
 		io_delay();
 
 		status = inb(0x64);
+		if (status == 0xff) {
+			/* FF is a plausible, but very unlikely status */
+			if (!--ffs)
+				return -1; /* Assume no KBC present */
+		}
 		if (status & 1) {
 			/* Read and discard input data */
 			io_delay();
@@ -118,44 +126,37 @@ static void enable_a20_fast(void)
 
 int enable_a20(void)
 {
-#if defined(CONFIG_X86_ELAN)
-	/* Elan croaks if we try to touch the KBC */
-	enable_a20_fast();
-	while (!a20_test_long())
-		;
-	return 0;
-#elif defined(CONFIG_X86_VOYAGER)
-	/* On Voyager, a20_test() is unsafe? */
-	enable_a20_kbc();
-	return 0;
-#else
        int loops = A20_ENABLE_LOOPS;
-	while (loops--) {
-		/* First, check to see if A20 is already enabled
-		   (legacy free, etc.) */
-		if (a20_test_short())
-			return 0;
-
-		/* Next, try the BIOS (INT 0x15, AX=0x2401) */
-		enable_a20_bios();
-		if (a20_test_short())
-			return 0;
-
-		/* Try enabling A20 through the keyboard controller */
-		empty_8042();
-		if (a20_test_short())
-			return 0; /* BIOS worked, but with delayed reaction */
-
-		enable_a20_kbc();
-		if (a20_test_long())
-			return 0;
-
-		/* Finally, try enabling the "fast A20 gate" */
-		enable_a20_fast();
-		if (a20_test_long())
-			return 0;
-	}
-
-	return -1;
-#endif
+       int kbc_err;
+
+       while (loops--) {
+	       /* First, check to see if A20 is already enabled
+		  (legacy free, etc.) */
+	       if (a20_test_short())
+		       return 0;
+	       
+	       /* Next, try the BIOS (INT 0x15, AX=0x2401) */
+	       enable_a20_bios();
+	       if (a20_test_short())
+		       return 0;
+	       
+	       /* Try enabling A20 through the keyboard controller */
+	       kbc_err = empty_8042();
+
+	       if (a20_test_short())
+		       return 0; /* BIOS worked, but with delayed reaction */
+	
+	       if (!kbc_err) {
+		       enable_a20_kbc();
+		       if (a20_test_long())
+			       return 0;
+	       }
+	       
+	       /* Finally, try enabling the "fast A20 gate" */
+	       enable_a20_fast();
+	       if (a20_test_long())
+		       return 0;
+       }
+       
+       return -1;
 }
diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index cc0ef13fba7a..7b2692e897e5 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -302,9 +302,6 @@ void probe_cards(int unsafe);
 /* video-vesa.c */
 void vesa_store_edid(void);
 
-/* voyager.c */
-int query_voyager(void);
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* BOOT_BOOT_H */
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 1771c804e02f..3ca4c194b8e5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -4,7 +4,7 @@
 # create a compressed vmlinux image from the original vmlinux
 #
 
-targets := vmlinux vmlinux.bin vmlinux.bin.gz head_$(BITS).o misc.o piggy.o
+targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma head_$(BITS).o misc.o piggy.o
 
 KBUILD_CFLAGS := -m$(BITS) -D__KERNEL__ $(LINUX_INCLUDE) -O2
 KBUILD_CFLAGS += -fno-strict-aliasing -fPIC
@@ -47,18 +47,35 @@ ifeq ($(CONFIG_X86_32),y)
 ifdef CONFIG_RELOCATABLE
 $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin.all FORCE
 	$(call if_changed,gzip)
+$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin.all FORCE
+	$(call if_changed,bzip2)
+$(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin.all FORCE
+	$(call if_changed,lzma)
 else
 $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
 	$(call if_changed,gzip)
+$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin FORCE
+	$(call if_changed,bzip2)
+$(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin FORCE
+	$(call if_changed,lzma)
 endif
 LDFLAGS_piggy.o := -r --format binary --oformat elf32-i386 -T
 
 else
+
 $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
 	$(call if_changed,gzip)
+$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin FORCE
+	$(call if_changed,bzip2)
+$(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin FORCE
+	$(call if_changed,lzma)
 
 LDFLAGS_piggy.o := -r --format binary --oformat elf64-x86-64 -T
 endif
 
-$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.gz FORCE
+suffix_$(CONFIG_KERNEL_GZIP)  = gz
+suffix_$(CONFIG_KERNEL_BZIP2) = bz2
+suffix_$(CONFIG_KERNEL_LZMA)  = lzma
+
+$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.$(suffix_y) FORCE
 	$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 29c5fbf08392..3a8a866fb2e2 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -25,14 +25,12 @@
 
 #include <linux/linkage.h>
 #include <asm/segment.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/boot.h>
 #include <asm/asm-offsets.h>
 
 .section ".text.head","ax",@progbits
-	.globl startup_32
-
-startup_32:
+ENTRY(startup_32)
 	cld
 	/* test KEEP_SEGMENTS flag to see if the bootloader is asking
 	 * us to not reload segments */
@@ -113,6 +111,8 @@ startup_32:
  */
 	leal relocated(%ebx), %eax
 	jmp *%eax
+ENDPROC(startup_32)
+
 .section ".text"
 relocated:
 
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 1d5dff4123e1..ed4a82948002 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -26,8 +26,8 @@
 
 #include <linux/linkage.h>
 #include <asm/segment.h>
-#include <asm/pgtable.h>
-#include <asm/page.h>
+#include <asm/pgtable_types.h>
+#include <asm/page_types.h>
 #include <asm/boot.h>
 #include <asm/msr.h>
 #include <asm/processor-flags.h>
@@ -35,9 +35,7 @@
 
 .section ".text.head"
 	.code32
-	.globl startup_32
-
-startup_32:
+ENTRY(startup_32)
 	cld
 	/* test KEEP_SEGMENTS flag to see if the bootloader is asking
 	 * us to not reload segments */
@@ -176,6 +174,7 @@ startup_32:
 
 	/* Jump from 32bit compatibility mode into 64bit mode. */
 	lret
+ENDPROC(startup_32)
 
 no_longmode:
 	/* This isn't an x86-64 CPU so hang */
@@ -295,7 +294,6 @@ relocated:
 	call	decompress_kernel
 	popq	%rsi
 
-
 /*
  * Jump to the decompressed kernel.
  */
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index da062216948a..e45be73684ff 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -116,71 +116,13 @@
 /*
  * gzip declarations
  */
-
-#define OF(args)	args
 #define STATIC		static
 
 #undef memset
 #undef memcpy
 #define memzero(s, n)	memset((s), 0, (n))
 
-typedef unsigned char	uch;
-typedef unsigned short	ush;
-typedef unsigned long	ulg;
-
-/*
- * Window size must be at least 32k, and a power of two.
- * We don't actually have a window just a huge output buffer,
- * so we report a 2G window size, as that should always be
- * larger than our output buffer:
- */
-#define WSIZE		0x80000000
-
-/* Input buffer: */
-static unsigned char	*inbuf;
-
-/* Sliding window buffer (and final output buffer): */
-static unsigned char	*window;
-
-/* Valid bytes in inbuf: */
-static unsigned		insize;
-
-/* Index of next byte to be processed in inbuf: */
-static unsigned		inptr;
-
-/* Bytes in output buffer: */
-static unsigned		outcnt;
-
-/* gzip flag byte */
-#define ASCII_FLAG	0x01 /* bit 0 set: file probably ASCII text */
-#define CONTINUATION	0x02 /* bit 1 set: continuation of multi-part gz file */
-#define EXTRA_FIELD	0x04 /* bit 2 set: extra field present */
-#define ORIG_NAM	0x08 /* bit 3 set: original file name present */
-#define COMMENT		0x10 /* bit 4 set: file comment present */
-#define ENCRYPTED	0x20 /* bit 5 set: file is encrypted */
-#define RESERVED	0xC0 /* bit 6, 7:  reserved */
-
-#define get_byte()	(inptr < insize ? inbuf[inptr++] : fill_inbuf())
-
-/* Diagnostic functions */
-#ifdef DEBUG
-#  define Assert(cond, msg) do { if (!(cond)) error(msg); } while (0)
-#  define Trace(x)	do { fprintf x; } while (0)
-#  define Tracev(x)	do { if (verbose) fprintf x ; } while (0)
-#  define Tracevv(x)	do { if (verbose > 1) fprintf x ; } while (0)
-#  define Tracec(c, x)	do { if (verbose && (c)) fprintf x ; } while (0)
-#  define Tracecv(c, x)	do { if (verbose > 1 && (c)) fprintf x ; } while (0)
-#else
-#  define Assert(cond, msg)
-#  define Trace(x)
-#  define Tracev(x)
-#  define Tracevv(x)
-#  define Tracec(c, x)
-#  define Tracecv(c, x)
-#endif
 
-static int  fill_inbuf(void);
-static void flush_window(void);
 static void error(char *m);
 
 /*
@@ -189,13 +131,8 @@ static void error(char *m);
 static struct boot_params *real_mode;		/* Pointer to real-mode data */
 static int quiet;
 
-extern unsigned char input_data[];
-extern int input_len;
-
-static long bytes_out;
-
 static void *memset(void *s, int c, unsigned n);
-static void *memcpy(void *dest, const void *src, unsigned n);
+void *memcpy(void *dest, const void *src, unsigned n);
 
 static void __putstr(int, const char *);
 #define putstr(__x)  __putstr(0, __x)
@@ -213,7 +150,17 @@ static char *vidmem;
 static int vidport;
 static int lines, cols;
 
-#include "../../../../lib/inflate.c"
+#ifdef CONFIG_KERNEL_GZIP
+#include "../../../../lib/decompress_inflate.c"
+#endif
+
+#ifdef CONFIG_KERNEL_BZIP2
+#include "../../../../lib/decompress_bunzip2.c"
+#endif
+
+#ifdef CONFIG_KERNEL_LZMA
+#include "../../../../lib/decompress_unlzma.c"
+#endif
 
 static void scroll(void)
 {
@@ -282,7 +229,7 @@ static void *memset(void *s, int c, unsigned n)
 	return s;
 }
 
-static void *memcpy(void *dest, const void *src, unsigned n)
+void *memcpy(void *dest, const void *src, unsigned n)
 {
 	int i;
 	const char *s = src;
@@ -293,38 +240,6 @@ static void *memcpy(void *dest, const void *src, unsigned n)
 	return dest;
 }
 
-/* ===========================================================================
- * Fill the input buffer. This is called only when the buffer is empty
- * and at least one byte is really needed.
- */
-static int fill_inbuf(void)
-{
-	error("ran out of input data");
-	return 0;
-}
-
-/* ===========================================================================
- * Write the output window window[0..outcnt-1] and update crc and bytes_out.
- * (Used for the decompressed data only.)
- */
-static void flush_window(void)
-{
-	/* With my window equal to my output buffer
-	 * I only need to compute the crc here.
-	 */
-	unsigned long c = crc;         /* temporary variable */
-	unsigned n;
-	unsigned char *in, ch;
-
-	in = window;
-	for (n = 0; n < outcnt; n++) {
-		ch = *in++;
-		c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-	}
-	crc = c;
-	bytes_out += (unsigned long)outcnt;
-	outcnt = 0;
-}
 
 static void error(char *x)
 {
@@ -407,12 +322,8 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
 	lines = real_mode->screen_info.orig_video_lines;
 	cols = real_mode->screen_info.orig_video_cols;
 
-	window = output;		/* Output buffer (Normally at 1M) */
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
-	inbuf  = input_data;		/* Input buffer */
-	insize = input_len;
-	inptr  = 0;
 
 #ifdef CONFIG_X86_64
 	if ((unsigned long)output & (__KERNEL_ALIGN - 1))
@@ -430,10 +341,9 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
 #endif
 #endif
 
-	makecrc();
 	if (!quiet)
 		putstr("\nDecompressing Linux... ");
-	gunzip();
+	decompress(input_data, input_len, NULL, NULL, output, NULL, error);
 	parse_elf(output);
 	if (!quiet)
 		putstr("done.\nBooting the kernel.\n");
diff --git a/arch/x86/boot/copy.S b/arch/x86/boot/copy.S
index ef50c84e8b4b..11f272c6f5e9 100644
--- a/arch/x86/boot/copy.S
+++ b/arch/x86/boot/copy.S
@@ -8,6 +8,8 @@
  *
  * ----------------------------------------------------------------------- */
 
+#include <linux/linkage.h>
+
 /*
  * Memory copy routines
  */
@@ -15,9 +17,7 @@
 	.code16gcc
 	.text
 
-	.globl	memcpy
-	.type	memcpy, @function
-memcpy:
+GLOBAL(memcpy)
 	pushw	%si
 	pushw	%di
 	movw	%ax, %di
@@ -31,11 +31,9 @@ memcpy:
 	popw	%di
 	popw	%si
 	ret
-	.size	memcpy, .-memcpy
+ENDPROC(memcpy)
 
-	.globl	memset
-	.type	memset, @function
-memset:
+GLOBAL(memset)
 	pushw	%di
 	movw	%ax, %di
 	movzbl	%dl, %eax
@@ -48,52 +46,42 @@ memset:
 	rep; stosb
 	popw	%di
 	ret
-	.size	memset, .-memset
+ENDPROC(memset)
 
-	.globl	copy_from_fs
-	.type	copy_from_fs, @function
-copy_from_fs:
+GLOBAL(copy_from_fs)
 	pushw	%ds
 	pushw	%fs
 	popw	%ds
 	call	memcpy
 	popw	%ds
 	ret
-	.size	copy_from_fs, .-copy_from_fs
+ENDPROC(copy_from_fs)
 
-	.globl	copy_to_fs
-	.type	copy_to_fs, @function
-copy_to_fs:
+GLOBAL(copy_to_fs)
 	pushw	%es
 	pushw	%fs
 	popw	%es
 	call	memcpy
 	popw	%es
 	ret
-	.size	copy_to_fs, .-copy_to_fs
+ENDPROC(copy_to_fs)
 
 #if 0 /* Not currently used, but can be enabled as needed */
-
-	.globl	copy_from_gs
-	.type	copy_from_gs, @function
-copy_from_gs:
+GLOBAL(copy_from_gs)
 	pushw	%ds
 	pushw	%gs
 	popw	%ds
 	call	memcpy
 	popw	%ds
 	ret
-	.size	copy_from_gs, .-copy_from_gs
-	.globl	copy_to_gs
+ENDPROC(copy_from_gs)
 
-	.type	copy_to_gs, @function
-copy_to_gs:
+GLOBAL(copy_to_gs)
 	pushw	%es
 	pushw	%gs
 	popw	%es
 	call	memcpy
 	popw	%es
 	ret
-	.size	copy_to_gs, .-copy_to_gs
-
+ENDPROC(copy_to_gs)
 #endif
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index b993062e9a5f..7ccff4884a23 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -19,7 +19,7 @@
 #include <linux/utsrelease.h>
 #include <asm/boot.h>
 #include <asm/e820.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/setup.h>
 #include "boot.h"
 #include "offsets.h"
diff --git a/arch/x86/boot/main.c b/arch/x86/boot/main.c
index 197421db1af1..58f0415d3ae0 100644
--- a/arch/x86/boot/main.c
+++ b/arch/x86/boot/main.c
@@ -149,11 +149,6 @@ void main(void)
 	/* Query MCA information */
 	query_mca();
 
-	/* Voyager */
-#ifdef CONFIG_X86_VOYAGER
-	query_voyager();
-#endif
-
 	/* Query Intel SpeedStep (IST) information */
 	query_ist();
 
diff --git a/arch/x86/boot/pmjump.S b/arch/x86/boot/pmjump.S
index 141b6e20ed31..019c17a75851 100644
--- a/arch/x86/boot/pmjump.S
+++ b/arch/x86/boot/pmjump.S
@@ -15,18 +15,15 @@
 #include <asm/boot.h>
 #include <asm/processor-flags.h>
 #include <asm/segment.h>
+#include <linux/linkage.h>
 
 	.text
-
-	.globl	protected_mode_jump
-	.type	protected_mode_jump, @function
-
 	.code16
 
 /*
  * void protected_mode_jump(u32 entrypoint, u32 bootparams);
  */
-protected_mode_jump:
+GLOBAL(protected_mode_jump)
 	movl	%edx, %esi		# Pointer to boot_params table
 
 	xorl	%ebx, %ebx
@@ -47,12 +44,10 @@ protected_mode_jump:
 	.byte	0x66, 0xea		# ljmpl opcode
 2:	.long	in_pm32			# offset
 	.word	__BOOT_CS		# segment
-
-	.size	protected_mode_jump, .-protected_mode_jump
+ENDPROC(protected_mode_jump)
 
 	.code32
-	.type	in_pm32, @function
-in_pm32:
+GLOBAL(in_pm32)
 	# Set up data segments for flat 32-bit mode
 	movl	%ecx, %ds
 	movl	%ecx, %es
@@ -78,5 +73,4 @@ in_pm32:
 	lldt	%cx
 
 	jmpl	*%eax			# Jump to the 32-bit entrypoint
-
-	.size	in_pm32, .-in_pm32
+ENDPROC(in_pm32)
diff --git a/arch/x86/boot/voyager.c b/arch/x86/boot/voyager.c
deleted file mode 100644
index 433909d61e5c..000000000000
--- a/arch/x86/boot/voyager.c
+++ /dev/null
@@ -1,40 +0,0 @@
-/* -*- linux-c -*- ------------------------------------------------------- *
- *
- *   Copyright (C) 1991, 1992 Linus Torvalds
- *   Copyright 2007 rPath, Inc. - All Rights Reserved
- *
- *   This file is part of the Linux kernel, and is made available under
- *   the terms of the GNU General Public License version 2.
- *
- * ----------------------------------------------------------------------- */
-
-/*
- * Get the Voyager config information
- */
-
-#include "boot.h"
-
-int query_voyager(void)
-{
-	u8 err;
-	u16 es, di;
-	/* Abuse the apm_bios_info area for this */
-	u8 *data_ptr = (u8 *)&boot_params.apm_bios_info;
-
-	data_ptr[0] = 0xff;	/* Flag on config not found(?) */
-
-	asm("pushw %%es ; "
-	    "int $0x15 ; "
-	    "setc %0 ; "
-	    "movw %%es, %1 ; "
-	    "popw %%es"
-	    : "=q" (err), "=r" (es), "=D" (di)
-	    : "a" (0xffc0));
-
-	if (err)
-		return -1;	/* Not Voyager */
-
-	set_fs(es);
-	copy_from_fs(data_ptr, di, 7);	/* Table is 7 bytes apparently */
-	return 0;
-}
diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index edba00d98ac3..235b81d0f6f2 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -1,14 +1,13 @@
 #
 # Automatically generated make config: don't edit
-# Linux kernel version: 2.6.27-rc5
-# Wed Sep  3 17:23:09 2008
+# Linux kernel version: 2.6.29-rc4
+# Tue Feb 24 15:50:58 2009
 #
 # CONFIG_64BIT is not set
 CONFIG_X86_32=y
 # CONFIG_X86_64 is not set
 CONFIG_X86=y
 CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
-# CONFIG_GENERIC_LOCKBREAK is not set
 CONFIG_GENERIC_TIME=y
 CONFIG_GENERIC_CMOS_UPDATE=y
 CONFIG_CLOCKSOURCE_WATCHDOG=y
@@ -24,16 +23,14 @@ CONFIG_GENERIC_ISA_DMA=y
 CONFIG_GENERIC_IOMAP=y
 CONFIG_GENERIC_BUG=y
 CONFIG_GENERIC_HWEIGHT=y
-# CONFIG_GENERIC_GPIO is not set
 CONFIG_ARCH_MAY_HAVE_PC_FDC=y
 # CONFIG_RWSEM_GENERIC_SPINLOCK is not set
 CONFIG_RWSEM_XCHGADD_ALGORITHM=y
-# CONFIG_ARCH_HAS_ILOG2_U32 is not set
-# CONFIG_ARCH_HAS_ILOG2_U64 is not set
 CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
 CONFIG_GENERIC_CALIBRATE_DELAY=y
 # CONFIG_GENERIC_TIME_VSYSCALL is not set
 CONFIG_ARCH_HAS_CPU_RELAX=y
+CONFIG_ARCH_HAS_DEFAULT_IDLE=y
 CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
 CONFIG_HAVE_SETUP_PER_CPU_AREA=y
 # CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
@@ -42,12 +39,12 @@ CONFIG_ARCH_SUSPEND_POSSIBLE=y
 # CONFIG_ZONE_DMA32 is not set
 CONFIG_ARCH_POPULATES_NODE_MAP=y
 # CONFIG_AUDIT_ARCH is not set
-CONFIG_ARCH_SUPPORTS_AOUT=y
 CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
 CONFIG_GENERIC_HARDIRQS=y
 CONFIG_GENERIC_IRQ_PROBE=y
 CONFIG_GENERIC_PENDING_IRQ=y
 CONFIG_X86_SMP=y
+CONFIG_USE_GENERIC_SMP_HELPERS=y
 CONFIG_X86_32_SMP=y
 CONFIG_X86_HT=y
 CONFIG_X86_BIOS_REBOOT=y
@@ -76,30 +73,44 @@ CONFIG_TASK_IO_ACCOUNTING=y
 CONFIG_AUDIT=y
 CONFIG_AUDITSYSCALL=y
 CONFIG_AUDIT_TREE=y
+
+#
+# RCU Subsystem
+#
+# CONFIG_CLASSIC_RCU is not set
+CONFIG_TREE_RCU=y
+# CONFIG_PREEMPT_RCU is not set
+# CONFIG_RCU_TRACE is not set
+CONFIG_RCU_FANOUT=32
+# CONFIG_RCU_FANOUT_EXACT is not set
+# CONFIG_TREE_RCU_TRACE is not set
+# CONFIG_PREEMPT_RCU_TRACE is not set
 # CONFIG_IKCONFIG is not set
 CONFIG_LOG_BUF_SHIFT=18
-CONFIG_CGROUPS=y
-# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_NS=y
-# CONFIG_CGROUP_DEVICE is not set
-CONFIG_CPUSETS=y
 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
 CONFIG_GROUP_SCHED=y
 CONFIG_FAIR_GROUP_SCHED=y
 # CONFIG_RT_GROUP_SCHED is not set
 # CONFIG_USER_SCHED is not set
 CONFIG_CGROUP_SCHED=y
+CONFIG_CGROUPS=y
+# CONFIG_CGROUP_DEBUG is not set
+CONFIG_CGROUP_NS=y
+CONFIG_CGROUP_FREEZER=y
+# CONFIG_CGROUP_DEVICE is not set
+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y
 CONFIG_CGROUP_CPUACCT=y
 CONFIG_RESOURCE_COUNTERS=y
 # CONFIG_CGROUP_MEM_RES_CTLR is not set
 # CONFIG_SYSFS_DEPRECATED_V2 is not set
-CONFIG_PROC_PID_CPUSET=y
 CONFIG_RELAY=y
 CONFIG_NAMESPACES=y
 CONFIG_UTS_NS=y
 CONFIG_IPC_NS=y
 CONFIG_USER_NS=y
 CONFIG_PID_NS=y
+CONFIG_NET_NS=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_INITRAMFS_SOURCE=""
 CONFIG_CC_OPTIMIZE_FOR_SIZE=y
@@ -124,12 +135,15 @@ CONFIG_SIGNALFD=y
 CONFIG_TIMERFD=y
 CONFIG_EVENTFD=y
 CONFIG_SHMEM=y
+CONFIG_AIO=y
 CONFIG_VM_EVENT_COUNTERS=y
+CONFIG_PCI_QUIRKS=y
 CONFIG_SLUB_DEBUG=y
 # CONFIG_SLAB is not set
 CONFIG_SLUB=y
 # CONFIG_SLOB is not set
 CONFIG_PROFILING=y
+CONFIG_TRACEPOINTS=y
 CONFIG_MARKERS=y
 # CONFIG_OPROFILE is not set
 CONFIG_HAVE_OPROFILE=y
@@ -139,15 +153,10 @@ CONFIG_KRETPROBES=y
 CONFIG_HAVE_IOREMAP_PROT=y
 CONFIG_HAVE_KPROBES=y
 CONFIG_HAVE_KRETPROBES=y
-# CONFIG_HAVE_ARCH_TRACEHOOK is not set
-# CONFIG_HAVE_DMA_ATTRS is not set
-CONFIG_USE_GENERIC_SMP_HELPERS=y
-# CONFIG_HAVE_CLK is not set
-CONFIG_PROC_PAGE_MONITOR=y
+CONFIG_HAVE_ARCH_TRACEHOOK=y
 CONFIG_HAVE_GENERIC_DMA_COHERENT=y
 CONFIG_SLABINFO=y
 CONFIG_RT_MUTEXES=y
-# CONFIG_TINY_SHMEM is not set
 CONFIG_BASE_SMALL=0
 CONFIG_MODULES=y
 # CONFIG_MODULE_FORCE_LOAD is not set
@@ -155,12 +164,10 @@ CONFIG_MODULE_UNLOAD=y
 CONFIG_MODULE_FORCE_UNLOAD=y
 # CONFIG_MODVERSIONS is not set
 # CONFIG_MODULE_SRCVERSION_ALL is not set
-CONFIG_KMOD=y
 CONFIG_STOP_MACHINE=y
 CONFIG_BLOCK=y
 # CONFIG_LBD is not set
 CONFIG_BLK_DEV_IO_TRACE=y
-# CONFIG_LSF is not set
 CONFIG_BLK_DEV_BSG=y
 # CONFIG_BLK_DEV_INTEGRITY is not set
 
@@ -176,7 +183,7 @@ CONFIG_IOSCHED_CFQ=y
 CONFIG_DEFAULT_CFQ=y
 # CONFIG_DEFAULT_NOOP is not set
 CONFIG_DEFAULT_IOSCHED="cfq"
-CONFIG_CLASSIC_RCU=y
+CONFIG_FREEZER=y
 
 #
 # Processor type and features
@@ -186,15 +193,14 @@ CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
 CONFIG_SMP=y
+CONFIG_SPARSE_IRQ=y
 CONFIG_X86_FIND_SMP_CONFIG=y
 CONFIG_X86_MPPARSE=y
-CONFIG_X86_PC=y
 # CONFIG_X86_ELAN is not set
-# CONFIG_X86_VOYAGER is not set
 # CONFIG_X86_GENERICARCH is not set
 # CONFIG_X86_VSMP is not set
 # CONFIG_X86_RDC321X is not set
-CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
+CONFIG_SCHED_OMIT_FRAME_POINTER=y
 # CONFIG_PARAVIRT_GUEST is not set
 # CONFIG_MEMTEST is not set
 # CONFIG_M386 is not set
@@ -238,10 +244,19 @@ CONFIG_X86_TSC=y
 CONFIG_X86_CMOV=y
 CONFIG_X86_MINIMUM_CPU_FAMILY=4
 CONFIG_X86_DEBUGCTLMSR=y
+CONFIG_CPU_SUP_INTEL=y
+CONFIG_CPU_SUP_CYRIX_32=y
+CONFIG_CPU_SUP_AMD=y
+CONFIG_CPU_SUP_CENTAUR_32=y
+CONFIG_CPU_SUP_TRANSMETA_32=y
+CONFIG_CPU_SUP_UMC_32=y
+CONFIG_X86_DS=y
+CONFIG_X86_PTRACE_BTS=y
 CONFIG_HPET_TIMER=y
 CONFIG_HPET_EMULATE_RTC=y
 CONFIG_DMI=y
 # CONFIG_IOMMU_HELPER is not set
+# CONFIG_IOMMU_API is not set
 CONFIG_NR_CPUS=64
 CONFIG_SCHED_SMT=y
 CONFIG_SCHED_MC=y
@@ -250,12 +265,17 @@ CONFIG_PREEMPT_VOLUNTARY=y
 # CONFIG_PREEMPT is not set
 CONFIG_X86_LOCAL_APIC=y
 CONFIG_X86_IO_APIC=y
-# CONFIG_X86_MCE is not set
+CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
+CONFIG_X86_MCE=y
+CONFIG_X86_MCE_NONFATAL=y
+CONFIG_X86_MCE_P4THERMAL=y
 CONFIG_VM86=y
 # CONFIG_TOSHIBA is not set
 # CONFIG_I8K is not set
 CONFIG_X86_REBOOTFIXUPS=y
 CONFIG_MICROCODE=y
+CONFIG_MICROCODE_INTEL=y
+CONFIG_MICROCODE_AMD=y
 CONFIG_MICROCODE_OLD_INTERFACE=y
 CONFIG_X86_MSR=y
 CONFIG_X86_CPUID=y
@@ -264,6 +284,7 @@ CONFIG_HIGHMEM4G=y
 # CONFIG_HIGHMEM64G is not set
 CONFIG_PAGE_OFFSET=0xC0000000
 CONFIG_HIGHMEM=y
+# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
 CONFIG_ARCH_FLATMEM_ENABLE=y
 CONFIG_ARCH_SPARSEMEM_ENABLE=y
 CONFIG_ARCH_SELECT_MEMORY_MODEL=y
@@ -274,14 +295,17 @@ CONFIG_FLATMEM_MANUAL=y
 CONFIG_FLATMEM=y
 CONFIG_FLAT_NODE_MEM_MAP=y
 CONFIG_SPARSEMEM_STATIC=y
-# CONFIG_SPARSEMEM_VMEMMAP_ENABLE is not set
 CONFIG_PAGEFLAGS_EXTENDED=y
 CONFIG_SPLIT_PTLOCK_CPUS=4
-CONFIG_RESOURCES_64BIT=y
+# CONFIG_PHYS_ADDR_T_64BIT is not set
 CONFIG_ZONE_DMA_FLAG=1
 CONFIG_BOUNCE=y
 CONFIG_VIRT_TO_BUS=y
+CONFIG_UNEVICTABLE_LRU=y
 CONFIG_HIGHPTE=y
+CONFIG_X86_CHECK_BIOS_CORRUPTION=y
+CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
+CONFIG_X86_RESERVE_LOW_64K=y
 # CONFIG_MATH_EMULATION is not set
 CONFIG_MTRR=y
 # CONFIG_MTRR_SANITIZER is not set
@@ -302,10 +326,11 @@ CONFIG_PHYSICAL_START=0x1000000
 CONFIG_PHYSICAL_ALIGN=0x200000
 CONFIG_HOTPLUG_CPU=y
 # CONFIG_COMPAT_VDSO is not set
+# CONFIG_CMDLINE_BOOL is not set
 CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
 
 #
-# Power management options
+# Power management and ACPI options
 #
 CONFIG_PM=y
 CONFIG_PM_DEBUG=y
@@ -331,19 +356,13 @@ CONFIG_ACPI_BATTERY=y
 CONFIG_ACPI_BUTTON=y
 CONFIG_ACPI_FAN=y
 CONFIG_ACPI_DOCK=y
-# CONFIG_ACPI_BAY is not set
 CONFIG_ACPI_PROCESSOR=y
 CONFIG_ACPI_HOTPLUG_CPU=y
 CONFIG_ACPI_THERMAL=y
-# CONFIG_ACPI_WMI is not set
-# CONFIG_ACPI_ASUS is not set
-# CONFIG_ACPI_TOSHIBA is not set
 # CONFIG_ACPI_CUSTOM_DSDT is not set
 CONFIG_ACPI_BLACKLIST_YEAR=0
 # CONFIG_ACPI_DEBUG is not set
-CONFIG_ACPI_EC=y
 # CONFIG_ACPI_PCI_SLOT is not set
-CONFIG_ACPI_POWER=y
 CONFIG_ACPI_SYSTEM=y
 CONFIG_X86_PM_TIMER=y
 CONFIG_ACPI_CONTAINER=y
@@ -388,7 +407,6 @@ CONFIG_X86_ACPI_CPUFREQ=y
 #
 # shared options
 #
-# CONFIG_X86_ACPI_CPUFREQ_PROC_INTF is not set
 # CONFIG_X86_SPEEDSTEP_LIB is not set
 CONFIG_CPU_IDLE=y
 CONFIG_CPU_IDLE_GOV_LADDER=y
@@ -415,6 +433,7 @@ CONFIG_ARCH_SUPPORTS_MSI=y
 CONFIG_PCI_MSI=y
 # CONFIG_PCI_LEGACY is not set
 # CONFIG_PCI_DEBUG is not set
+# CONFIG_PCI_STUB is not set
 CONFIG_HT_IRQ=y
 CONFIG_ISA_DMA_API=y
 # CONFIG_ISA is not set
@@ -452,13 +471,17 @@ CONFIG_HOTPLUG_PCI=y
 # Executable file formats / Emulations
 #
 CONFIG_BINFMT_ELF=y
+CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
+CONFIG_HAVE_AOUT=y
 # CONFIG_BINFMT_AOUT is not set
 CONFIG_BINFMT_MISC=y
+CONFIG_HAVE_ATOMIC_IOMAP=y
 CONFIG_NET=y
 
 #
 # Networking options
 #
+CONFIG_COMPAT_NET_DEV_OPS=y
 CONFIG_PACKET=y
 CONFIG_PACKET_MMAP=y
 CONFIG_UNIX=y
@@ -519,7 +542,6 @@ CONFIG_DEFAULT_CUBIC=y
 # CONFIG_DEFAULT_RENO is not set
 CONFIG_DEFAULT_TCP_CONG="cubic"
 CONFIG_TCP_MD5SIG=y
-# CONFIG_IP_VS is not set
 CONFIG_IPV6=y
 # CONFIG_IPV6_PRIVACY is not set
 # CONFIG_IPV6_ROUTER_PREF is not set
@@ -557,19 +579,21 @@ CONFIG_NF_CONNTRACK_IRC=y
 CONFIG_NF_CONNTRACK_SIP=y
 CONFIG_NF_CT_NETLINK=y
 CONFIG_NETFILTER_XTABLES=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
 CONFIG_NETFILTER_XT_TARGET_MARK=y
 CONFIG_NETFILTER_XT_TARGET_NFLOG=y
 CONFIG_NETFILTER_XT_TARGET_SECMARK=y
-CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
 CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
 CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
 CONFIG_NETFILTER_XT_MATCH_MARK=y
 CONFIG_NETFILTER_XT_MATCH_POLICY=y
 CONFIG_NETFILTER_XT_MATCH_STATE=y
+# CONFIG_IP_VS is not set
 
 #
 # IP: Netfilter Configuration
 #
+CONFIG_NF_DEFRAG_IPV4=y
 CONFIG_NF_CONNTRACK_IPV4=y
 CONFIG_NF_CONNTRACK_PROC_COMPAT=y
 CONFIG_IP_NF_IPTABLES=y
@@ -595,8 +619,8 @@ CONFIG_IP_NF_MANGLE=y
 CONFIG_NF_CONNTRACK_IPV6=y
 CONFIG_IP6_NF_IPTABLES=y
 CONFIG_IP6_NF_MATCH_IPV6HEADER=y
-CONFIG_IP6_NF_FILTER=y
 CONFIG_IP6_NF_TARGET_LOG=y
+CONFIG_IP6_NF_FILTER=y
 CONFIG_IP6_NF_TARGET_REJECT=y
 CONFIG_IP6_NF_MANGLE=y
 # CONFIG_IP_DCCP is not set
@@ -604,6 +628,7 @@ CONFIG_IP6_NF_MANGLE=y
 # CONFIG_TIPC is not set
 # CONFIG_ATM is not set
 # CONFIG_BRIDGE is not set
+# CONFIG_NET_DSA is not set
 # CONFIG_VLAN_8021Q is not set
 # CONFIG_DECNET is not set
 CONFIG_LLC=y
@@ -623,6 +648,7 @@ CONFIG_NET_SCHED=y
 # CONFIG_NET_SCH_HTB is not set
 # CONFIG_NET_SCH_HFSC is not set
 # CONFIG_NET_SCH_PRIO is not set
+# CONFIG_NET_SCH_MULTIQ is not set
 # CONFIG_NET_SCH_RED is not set
 # CONFIG_NET_SCH_SFQ is not set
 # CONFIG_NET_SCH_TEQL is not set
@@ -630,6 +656,7 @@ CONFIG_NET_SCHED=y
 # CONFIG_NET_SCH_GRED is not set
 # CONFIG_NET_SCH_DSMARK is not set
 # CONFIG_NET_SCH_NETEM is not set
+# CONFIG_NET_SCH_DRR is not set
 # CONFIG_NET_SCH_INGRESS is not set
 
 #
@@ -644,6 +671,7 @@ CONFIG_NET_CLS=y
 # CONFIG_NET_CLS_RSVP is not set
 # CONFIG_NET_CLS_RSVP6 is not set
 # CONFIG_NET_CLS_FLOW is not set
+# CONFIG_NET_CLS_CGROUP is not set
 CONFIG_NET_EMATCH=y
 CONFIG_NET_EMATCH_STACK=32
 # CONFIG_NET_EMATCH_CMP is not set
@@ -659,7 +687,9 @@ CONFIG_NET_CLS_ACT=y
 # CONFIG_NET_ACT_NAT is not set
 # CONFIG_NET_ACT_PEDIT is not set
 # CONFIG_NET_ACT_SIMP is not set
+# CONFIG_NET_ACT_SKBEDIT is not set
 CONFIG_NET_SCH_FIFO=y
+# CONFIG_DCB is not set
 
 #
 # Network testing
@@ -676,29 +706,33 @@ CONFIG_HAMRADIO=y
 # CONFIG_IRDA is not set
 # CONFIG_BT is not set
 # CONFIG_AF_RXRPC is not set
+# CONFIG_PHONET is not set
 CONFIG_FIB_RULES=y
-
-#
-# Wireless
-#
+CONFIG_WIRELESS=y
 CONFIG_CFG80211=y
+# CONFIG_CFG80211_REG_DEBUG is not set
 CONFIG_NL80211=y
+CONFIG_WIRELESS_OLD_REGULATORY=y
 CONFIG_WIRELESS_EXT=y
 CONFIG_WIRELESS_EXT_SYSFS=y
+# CONFIG_LIB80211 is not set
 CONFIG_MAC80211=y
 
 #
 # Rate control algorithm selection
 #
-CONFIG_MAC80211_RC_PID=y
-CONFIG_MAC80211_RC_DEFAULT_PID=y
-CONFIG_MAC80211_RC_DEFAULT="pid"
+CONFIG_MAC80211_RC_MINSTREL=y
+# CONFIG_MAC80211_RC_DEFAULT_PID is not set
+CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
+CONFIG_MAC80211_RC_DEFAULT="minstrel"
 # CONFIG_MAC80211_MESH is not set
 CONFIG_MAC80211_LEDS=y
 # CONFIG_MAC80211_DEBUGFS is not set
 # CONFIG_MAC80211_DEBUG_MENU is not set
-# CONFIG_IEEE80211 is not set
-# CONFIG_RFKILL is not set
+# CONFIG_WIMAX is not set
+CONFIG_RFKILL=y
+# CONFIG_RFKILL_INPUT is not set
+CONFIG_RFKILL_LEDS=y
 # CONFIG_NET_9P is not set
 
 #
@@ -722,7 +756,7 @@ CONFIG_PROC_EVENTS=y
 # CONFIG_MTD is not set
 # CONFIG_PARPORT is not set
 CONFIG_PNP=y
-# CONFIG_PNP_DEBUG is not set
+CONFIG_PNP_DEBUG_MESSAGES=y
 
 #
 # Protocols
@@ -750,20 +784,19 @@ CONFIG_BLK_DEV_RAM_SIZE=16384
 CONFIG_MISC_DEVICES=y
 # CONFIG_IBM_ASM is not set
 # CONFIG_PHANTOM is not set
-# CONFIG_EEPROM_93CX6 is not set
 # CONFIG_SGI_IOC4 is not set
 # CONFIG_TIFM_CORE is not set
-# CONFIG_ACER_WMI is not set
-# CONFIG_ASUS_LAPTOP is not set
-# CONFIG_FUJITSU_LAPTOP is not set
-# CONFIG_TC1100_WMI is not set
-# CONFIG_MSI_LAPTOP is not set
-# CONFIG_COMPAL_LAPTOP is not set
-# CONFIG_SONY_LAPTOP is not set
-# CONFIG_THINKPAD_ACPI is not set
-# CONFIG_INTEL_MENLOW is not set
+# CONFIG_ICS932S401 is not set
 # CONFIG_ENCLOSURE_SERVICES is not set
 # CONFIG_HP_ILO is not set
+# CONFIG_C2PORT is not set
+
+#
+# EEPROM support
+#
+# CONFIG_EEPROM_AT24 is not set
+# CONFIG_EEPROM_LEGACY is not set
+# CONFIG_EEPROM_93CX6 is not set
 CONFIG_HAVE_IDE=y
 # CONFIG_IDE is not set
 
@@ -802,7 +835,7 @@ CONFIG_SCSI_WAIT_SCAN=m
 #
 CONFIG_SCSI_SPI_ATTRS=y
 # CONFIG_SCSI_FC_ATTRS is not set
-CONFIG_SCSI_ISCSI_ATTRS=y
+# CONFIG_SCSI_ISCSI_ATTRS is not set
 # CONFIG_SCSI_SAS_ATTRS is not set
 # CONFIG_SCSI_SAS_LIBSAS is not set
 # CONFIG_SCSI_SRP_ATTRS is not set
@@ -875,6 +908,7 @@ CONFIG_PATA_OLDPIIX=y
 CONFIG_PATA_SCH=y
 CONFIG_MD=y
 CONFIG_BLK_DEV_MD=y
+CONFIG_MD_AUTODETECT=y
 # CONFIG_MD_LINEAR is not set
 # CONFIG_MD_RAID0 is not set
 # CONFIG_MD_RAID1 is not set
@@ -930,6 +964,9 @@ CONFIG_PHYLIB=y
 # CONFIG_BROADCOM_PHY is not set
 # CONFIG_ICPLUS_PHY is not set
 # CONFIG_REALTEK_PHY is not set
+# CONFIG_NATIONAL_PHY is not set
+# CONFIG_STE10XP is not set
+# CONFIG_LSI_ET1011C_PHY is not set
 # CONFIG_FIXED_PHY is not set
 # CONFIG_MDIO_BITBANG is not set
 CONFIG_NET_ETHERNET=y
@@ -953,6 +990,9 @@ CONFIG_NET_TULIP=y
 # CONFIG_IBM_NEW_EMAC_RGMII is not set
 # CONFIG_IBM_NEW_EMAC_TAH is not set
 # CONFIG_IBM_NEW_EMAC_EMAC4 is not set
+# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
+# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
+# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
 CONFIG_NET_PCI=y
 # CONFIG_PCNET32 is not set
 # CONFIG_AMD8111_ETH is not set
@@ -960,7 +1000,6 @@ CONFIG_NET_PCI=y
 # CONFIG_B44 is not set
 CONFIG_FORCEDETH=y
 # CONFIG_FORCEDETH_NAPI is not set
-# CONFIG_EEPRO100 is not set
 CONFIG_E100=y
 # CONFIG_FEALNX is not set
 # CONFIG_NATSEMI is not set
@@ -974,15 +1013,16 @@ CONFIG_8139TOO=y
 # CONFIG_R6040 is not set
 # CONFIG_SIS900 is not set
 # CONFIG_EPIC100 is not set
+# CONFIG_SMSC9420 is not set
 # CONFIG_SUNDANCE is not set
 # CONFIG_TLAN is not set
 # CONFIG_VIA_RHINE is not set
 # CONFIG_SC92031 is not set
+# CONFIG_ATL2 is not set
 CONFIG_NETDEV_1000=y
 # CONFIG_ACENIC is not set
 # CONFIG_DL2K is not set
 CONFIG_E1000=y
-# CONFIG_E1000_DISABLE_PACKET_SPLIT is not set
 CONFIG_E1000E=y
 # CONFIG_IP1000 is not set
 # CONFIG_IGB is not set
@@ -1000,18 +1040,23 @@ CONFIG_BNX2=y
 # CONFIG_QLA3XXX is not set
 # CONFIG_ATL1 is not set
 # CONFIG_ATL1E is not set
+# CONFIG_JME is not set
 CONFIG_NETDEV_10000=y
 # CONFIG_CHELSIO_T1 is not set
+CONFIG_CHELSIO_T3_DEPENDS=y
 # CONFIG_CHELSIO_T3 is not set
+# CONFIG_ENIC is not set
 # CONFIG_IXGBE is not set
 # CONFIG_IXGB is not set
 # CONFIG_S2IO is not set
 # CONFIG_MYRI10GE is not set
 # CONFIG_NETXEN_NIC is not set
 # CONFIG_NIU is not set
+# CONFIG_MLX4_EN is not set
 # CONFIG_MLX4_CORE is not set
 # CONFIG_TEHUTI is not set
 # CONFIG_BNX2X is not set
+# CONFIG_QLGE is not set
 # CONFIG_SFC is not set
 CONFIG_TR=y
 # CONFIG_IBMOL is not set
@@ -1025,9 +1070,8 @@ CONFIG_TR=y
 # CONFIG_WLAN_PRE80211 is not set
 CONFIG_WLAN_80211=y
 # CONFIG_PCMCIA_RAYCS is not set
-# CONFIG_IPW2100 is not set
-# CONFIG_IPW2200 is not set
 # CONFIG_LIBERTAS is not set
+# CONFIG_LIBERTAS_THINFIRM is not set
 # CONFIG_AIRO is not set
 # CONFIG_HERMES is not set
 # CONFIG_ATMEL is not set
@@ -1044,6 +1088,8 @@ CONFIG_WLAN_80211=y
 CONFIG_ATH5K=y
 # CONFIG_ATH5K_DEBUG is not set
 # CONFIG_ATH9K is not set
+# CONFIG_IPW2100 is not set
+# CONFIG_IPW2200 is not set
 # CONFIG_IWLCORE is not set
 # CONFIG_IWLWIFI_LEDS is not set
 # CONFIG_IWLAGN is not set
@@ -1055,6 +1101,10 @@ CONFIG_ATH5K=y
 # CONFIG_RT2X00 is not set
 
 #
+# Enable WiMAX (Networking options) to see the WiMAX drivers
+#
+
+#
 # USB Network Adapters
 #
 # CONFIG_USB_CATC is not set
@@ -1062,6 +1112,7 @@ CONFIG_ATH5K=y
 # CONFIG_USB_PEGASUS is not set
 # CONFIG_USB_RTL8150 is not set
 # CONFIG_USB_USBNET is not set
+# CONFIG_USB_HSO is not set
 CONFIG_NET_PCMCIA=y
 # CONFIG_PCMCIA_3C589 is not set
 # CONFIG_PCMCIA_3C574 is not set
@@ -1123,6 +1174,7 @@ CONFIG_MOUSE_PS2_LOGIPS2PP=y
 CONFIG_MOUSE_PS2_SYNAPTICS=y
 CONFIG_MOUSE_PS2_LIFEBOOK=y
 CONFIG_MOUSE_PS2_TRACKPOINT=y
+# CONFIG_MOUSE_PS2_ELANTECH is not set
 # CONFIG_MOUSE_PS2_TOUCHKIT is not set
 # CONFIG_MOUSE_SERIAL is not set
 # CONFIG_MOUSE_APPLETOUCH is not set
@@ -1160,15 +1212,16 @@ CONFIG_INPUT_TOUCHSCREEN=y
 # CONFIG_TOUCHSCREEN_FUJITSU is not set
 # CONFIG_TOUCHSCREEN_GUNZE is not set
 # CONFIG_TOUCHSCREEN_ELO is not set
+# CONFIG_TOUCHSCREEN_WACOM_W8001 is not set
 # CONFIG_TOUCHSCREEN_MTOUCH is not set
 # CONFIG_TOUCHSCREEN_INEXIO is not set
 # CONFIG_TOUCHSCREEN_MK712 is not set
 # CONFIG_TOUCHSCREEN_PENMOUNT is not set
 # CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
 # CONFIG_TOUCHSCREEN_TOUCHWIN is not set
-# CONFIG_TOUCHSCREEN_UCB1400 is not set
 # CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
 # CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
+# CONFIG_TOUCHSCREEN_TSC2007 is not set
 CONFIG_INPUT_MISC=y
 # CONFIG_INPUT_PCSPKR is not set
 # CONFIG_INPUT_APANEL is not set
@@ -1179,6 +1232,7 @@ CONFIG_INPUT_MISC=y
 # CONFIG_INPUT_KEYSPAN_REMOTE is not set
 # CONFIG_INPUT_POWERMATE is not set
 # CONFIG_INPUT_YEALINK is not set
+# CONFIG_INPUT_CM109 is not set
 # CONFIG_INPUT_UINPUT is not set
 
 #
@@ -1245,6 +1299,7 @@ CONFIG_SERIAL_CORE=y
 CONFIG_SERIAL_CORE_CONSOLE=y
 # CONFIG_SERIAL_JSM is not set
 CONFIG_UNIX98_PTYS=y
+# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
 # CONFIG_LEGACY_PTYS is not set
 # CONFIG_IPMI_HANDLER is not set
 CONFIG_HW_RANDOM=y
@@ -1279,6 +1334,7 @@ CONFIG_I2C=y
 CONFIG_I2C_BOARDINFO=y
 # CONFIG_I2C_CHARDEV is not set
 CONFIG_I2C_HELPER_AUTO=y
+CONFIG_I2C_ALGOBIT=y
 
 #
 # I2C Hardware Bus support
@@ -1331,8 +1387,6 @@ CONFIG_I2C_I801=y
 # Miscellaneous I2C Chip support
 #
 # CONFIG_DS1682 is not set
-# CONFIG_EEPROM_AT24 is not set
-# CONFIG_EEPROM_LEGACY is not set
 # CONFIG_SENSORS_PCF8574 is not set
 # CONFIG_PCF8575 is not set
 # CONFIG_SENSORS_PCA9539 is not set
@@ -1351,8 +1405,78 @@ CONFIG_POWER_SUPPLY=y
 # CONFIG_POWER_SUPPLY_DEBUG is not set
 # CONFIG_PDA_POWER is not set
 # CONFIG_BATTERY_DS2760 is not set
-# CONFIG_HWMON is not set
+# CONFIG_BATTERY_BQ27x00 is not set
+CONFIG_HWMON=y
+# CONFIG_HWMON_VID is not set
+# CONFIG_SENSORS_ABITUGURU is not set
+# CONFIG_SENSORS_ABITUGURU3 is not set
+# CONFIG_SENSORS_AD7414 is not set
+# CONFIG_SENSORS_AD7418 is not set
+# CONFIG_SENSORS_ADM1021 is not set
+# CONFIG_SENSORS_ADM1025 is not set
+# CONFIG_SENSORS_ADM1026 is not set
+# CONFIG_SENSORS_ADM1029 is not set
+# CONFIG_SENSORS_ADM1031 is not set
+# CONFIG_SENSORS_ADM9240 is not set
+# CONFIG_SENSORS_ADT7462 is not set
+# CONFIG_SENSORS_ADT7470 is not set
+# CONFIG_SENSORS_ADT7473 is not set
+# CONFIG_SENSORS_ADT7475 is not set
+# CONFIG_SENSORS_K8TEMP is not set
+# CONFIG_SENSORS_ASB100 is not set
+# CONFIG_SENSORS_ATXP1 is not set
+# CONFIG_SENSORS_DS1621 is not set
+# CONFIG_SENSORS_I5K_AMB is not set
+# CONFIG_SENSORS_F71805F is not set
+# CONFIG_SENSORS_F71882FG is not set
+# CONFIG_SENSORS_F75375S is not set
+# CONFIG_SENSORS_FSCHER is not set
+# CONFIG_SENSORS_FSCPOS is not set
+# CONFIG_SENSORS_FSCHMD is not set
+# CONFIG_SENSORS_GL518SM is not set
+# CONFIG_SENSORS_GL520SM is not set
+# CONFIG_SENSORS_CORETEMP is not set
+# CONFIG_SENSORS_IT87 is not set
+# CONFIG_SENSORS_LM63 is not set
+# CONFIG_SENSORS_LM75 is not set
+# CONFIG_SENSORS_LM77 is not set
+# CONFIG_SENSORS_LM78 is not set
+# CONFIG_SENSORS_LM80 is not set
+# CONFIG_SENSORS_LM83 is not set
+# CONFIG_SENSORS_LM85 is not set
+# CONFIG_SENSORS_LM87 is not set
+# CONFIG_SENSORS_LM90 is not set
+# CONFIG_SENSORS_LM92 is not set
+# CONFIG_SENSORS_LM93 is not set
+# CONFIG_SENSORS_LTC4245 is not set
+# CONFIG_SENSORS_MAX1619 is not set
+# CONFIG_SENSORS_MAX6650 is not set
+# CONFIG_SENSORS_PC87360 is not set
+# CONFIG_SENSORS_PC87427 is not set
+# CONFIG_SENSORS_SIS5595 is not set
+# CONFIG_SENSORS_DME1737 is not set
+# CONFIG_SENSORS_SMSC47M1 is not set
+# CONFIG_SENSORS_SMSC47M192 is not set
+# CONFIG_SENSORS_SMSC47B397 is not set
+# CONFIG_SENSORS_ADS7828 is not set
+# CONFIG_SENSORS_THMC50 is not set
+# CONFIG_SENSORS_VIA686A is not set
+# CONFIG_SENSORS_VT1211 is not set
+# CONFIG_SENSORS_VT8231 is not set
+# CONFIG_SENSORS_W83781D is not set
+# CONFIG_SENSORS_W83791D is not set
+# CONFIG_SENSORS_W83792D is not set
+# CONFIG_SENSORS_W83793 is not set
+# CONFIG_SENSORS_W83L785TS is not set
+# CONFIG_SENSORS_W83L786NG is not set
+# CONFIG_SENSORS_W83627HF is not set
+# CONFIG_SENSORS_W83627EHF is not set
+# CONFIG_SENSORS_HDAPS is not set
+# CONFIG_SENSORS_LIS3LV02D is not set
+# CONFIG_SENSORS_APPLESMC is not set
+# CONFIG_HWMON_DEBUG_CHIP is not set
 CONFIG_THERMAL=y
+# CONFIG_THERMAL_HWMON is not set
 CONFIG_WATCHDOG=y
 # CONFIG_WATCHDOG_NOWAYOUT is not set
 
@@ -1372,6 +1496,7 @@ CONFIG_WATCHDOG=y
 # CONFIG_I6300ESB_WDT is not set
 # CONFIG_ITCO_WDT is not set
 # CONFIG_IT8712F_WDT is not set
+# CONFIG_IT87_WDT is not set
 # CONFIG_HP_WATCHDOG is not set
 # CONFIG_SC1200_WDT is not set
 # CONFIG_PC87413_WDT is not set
@@ -1379,9 +1504,11 @@ CONFIG_WATCHDOG=y
 # CONFIG_SBC8360_WDT is not set
 # CONFIG_SBC7240_WDT is not set
 # CONFIG_CPU5_WDT is not set
+# CONFIG_SMSC_SCH311X_WDT is not set
 # CONFIG_SMSC37B787_WDT is not set
 # CONFIG_W83627HF_WDT is not set
 # CONFIG_W83697HF_WDT is not set
+# CONFIG_W83697UG_WDT is not set
 # CONFIG_W83877F_WDT is not set
 # CONFIG_W83977F_WDT is not set
 # CONFIG_MACHZ_WDT is not set
@@ -1397,11 +1524,11 @@ CONFIG_WATCHDOG=y
 # USB-based Watchdog Cards
 #
 # CONFIG_USBPCWATCHDOG is not set
+CONFIG_SSB_POSSIBLE=y
 
 #
 # Sonics Silicon Backplane
 #
-CONFIG_SSB_POSSIBLE=y
 # CONFIG_SSB is not set
 
 #
@@ -1410,7 +1537,13 @@ CONFIG_SSB_POSSIBLE=y
 # CONFIG_MFD_CORE is not set
 # CONFIG_MFD_SM501 is not set
 # CONFIG_HTC_PASIC3 is not set
+# CONFIG_TWL4030_CORE is not set
 # CONFIG_MFD_TMIO is not set
+# CONFIG_PMIC_DA903X is not set
+# CONFIG_MFD_WM8400 is not set
+# CONFIG_MFD_WM8350_I2C is not set
+# CONFIG_MFD_PCF50633 is not set
+# CONFIG_REGULATOR is not set
 
 #
 # Multimedia devices
@@ -1450,6 +1583,7 @@ CONFIG_DRM=y
 # CONFIG_DRM_I810 is not set
 # CONFIG_DRM_I830 is not set
 CONFIG_DRM_I915=y
+# CONFIG_DRM_I915_KMS is not set
 # CONFIG_DRM_MGA is not set
 # CONFIG_DRM_SIS is not set
 # CONFIG_DRM_VIA is not set
@@ -1459,6 +1593,7 @@ CONFIG_DRM_I915=y
 CONFIG_FB=y
 # CONFIG_FIRMWARE_EDID is not set
 # CONFIG_FB_DDC is not set
+# CONFIG_FB_BOOT_VESA_SUPPORT is not set
 CONFIG_FB_CFB_FILLRECT=y
 CONFIG_FB_CFB_COPYAREA=y
 CONFIG_FB_CFB_IMAGEBLIT=y
@@ -1487,7 +1622,6 @@ CONFIG_FB_TILEBLITTING=y
 # CONFIG_FB_UVESA is not set
 # CONFIG_FB_VESA is not set
 CONFIG_FB_EFI=y
-# CONFIG_FB_IMAC is not set
 # CONFIG_FB_N411 is not set
 # CONFIG_FB_HGA is not set
 # CONFIG_FB_S1D13XXX is not set
@@ -1503,6 +1637,7 @@ CONFIG_FB_EFI=y
 # CONFIG_FB_S3 is not set
 # CONFIG_FB_SAVAGE is not set
 # CONFIG_FB_SIS is not set
+# CONFIG_FB_VIA is not set
 # CONFIG_FB_NEOMAGIC is not set
 # CONFIG_FB_KYRO is not set
 # CONFIG_FB_3DFX is not set
@@ -1515,12 +1650,15 @@ CONFIG_FB_EFI=y
 # CONFIG_FB_CARMINE is not set
 # CONFIG_FB_GEODE is not set
 # CONFIG_FB_VIRTUAL is not set
+# CONFIG_FB_METRONOME is not set
+# CONFIG_FB_MB862XX is not set
 CONFIG_BACKLIGHT_LCD_SUPPORT=y
 # CONFIG_LCD_CLASS_DEVICE is not set
 CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_CORGI is not set
+CONFIG_BACKLIGHT_GENERIC=y
 # CONFIG_BACKLIGHT_PROGEAR is not set
 # CONFIG_BACKLIGHT_MBP_NVIDIA is not set
+# CONFIG_BACKLIGHT_SAHARA is not set
 
 #
 # Display device support
@@ -1540,10 +1678,12 @@ CONFIG_LOGO=y
 # CONFIG_LOGO_LINUX_VGA16 is not set
 CONFIG_LOGO_LINUX_CLUT224=y
 CONFIG_SOUND=y
+CONFIG_SOUND_OSS_CORE=y
 CONFIG_SND=y
 CONFIG_SND_TIMER=y
 CONFIG_SND_PCM=y
 CONFIG_SND_HWDEP=y
+CONFIG_SND_JACK=y
 CONFIG_SND_SEQUENCER=y
 CONFIG_SND_SEQ_DUMMY=y
 CONFIG_SND_OSSEMUL=y
@@ -1551,6 +1691,8 @@ CONFIG_SND_MIXER_OSS=y
 CONFIG_SND_PCM_OSS=y
 CONFIG_SND_PCM_OSS_PLUGINS=y
 CONFIG_SND_SEQUENCER_OSS=y
+CONFIG_SND_HRTIMER=y
+CONFIG_SND_SEQ_HRTIMER_DEFAULT=y
 CONFIG_SND_DYNAMIC_MINORS=y
 CONFIG_SND_SUPPORT_OLD_API=y
 CONFIG_SND_VERBOSE_PROCFS=y
@@ -1605,11 +1747,16 @@ CONFIG_SND_PCI=y
 # CONFIG_SND_FM801 is not set
 CONFIG_SND_HDA_INTEL=y
 CONFIG_SND_HDA_HWDEP=y
+# CONFIG_SND_HDA_RECONFIG is not set
+# CONFIG_SND_HDA_INPUT_BEEP is not set
 CONFIG_SND_HDA_CODEC_REALTEK=y
 CONFIG_SND_HDA_CODEC_ANALOG=y
 CONFIG_SND_HDA_CODEC_SIGMATEL=y
 CONFIG_SND_HDA_CODEC_VIA=y
 CONFIG_SND_HDA_CODEC_ATIHDMI=y
+CONFIG_SND_HDA_CODEC_NVHDMI=y
+CONFIG_SND_HDA_CODEC_INTELHDMI=y
+CONFIG_SND_HDA_ELD=y
 CONFIG_SND_HDA_CODEC_CONEXANT=y
 CONFIG_SND_HDA_CODEC_CMEDIA=y
 CONFIG_SND_HDA_CODEC_SI3054=y
@@ -1643,6 +1790,7 @@ CONFIG_SND_USB=y
 # CONFIG_SND_USB_AUDIO is not set
 # CONFIG_SND_USB_USX2Y is not set
 # CONFIG_SND_USB_CAIAQ is not set
+# CONFIG_SND_USB_US122L is not set
 CONFIG_SND_PCMCIA=y
 # CONFIG_SND_VXPOCKET is not set
 # CONFIG_SND_PDAUDIOCF is not set
@@ -1657,15 +1805,37 @@ CONFIG_HIDRAW=y
 # USB Input Devices
 #
 CONFIG_USB_HID=y
-CONFIG_USB_HIDINPUT_POWERBOOK=y
-CONFIG_HID_FF=y
 CONFIG_HID_PID=y
+CONFIG_USB_HIDDEV=y
+
+#
+# Special HID drivers
+#
+CONFIG_HID_COMPAT=y
+CONFIG_HID_A4TECH=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_BELKIN=y
+CONFIG_HID_CHERRY=y
+CONFIG_HID_CHICONY=y
+CONFIG_HID_CYPRESS=y
+CONFIG_HID_EZKEY=y
+CONFIG_HID_GYRATION=y
+CONFIG_HID_LOGITECH=y
 CONFIG_LOGITECH_FF=y
 # CONFIG_LOGIRUMBLEPAD2_FF is not set
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MONTEREY=y
+CONFIG_HID_NTRIG=y
+CONFIG_HID_PANTHERLORD=y
 CONFIG_PANTHERLORD_FF=y
+CONFIG_HID_PETALYNX=y
+CONFIG_HID_SAMSUNG=y
+CONFIG_HID_SONY=y
+CONFIG_HID_SUNPLUS=y
+# CONFIG_GREENASIA_FF is not set
+CONFIG_HID_TOPSEED=y
 CONFIG_THRUSTMASTER_FF=y
 CONFIG_ZEROPLUS_FF=y
-CONFIG_USB_HIDDEV=y
 CONFIG_USB_SUPPORT=y
 CONFIG_USB_ARCH_HAS_HCD=y
 CONFIG_USB_ARCH_HAS_OHCI=y
@@ -1683,6 +1853,8 @@ CONFIG_USB_DEVICEFS=y
 CONFIG_USB_SUSPEND=y
 # CONFIG_USB_OTG is not set
 CONFIG_USB_MON=y
+# CONFIG_USB_WUSB is not set
+# CONFIG_USB_WUSB_CBAF is not set
 
 #
 # USB Host Controller Drivers
@@ -1691,6 +1863,7 @@ CONFIG_USB_MON=y
 CONFIG_USB_EHCI_HCD=y
 # CONFIG_USB_EHCI_ROOT_HUB_TT is not set
 # CONFIG_USB_EHCI_TT_NEWSCHED is not set
+# CONFIG_USB_OXU210HP_HCD is not set
 # CONFIG_USB_ISP116X_HCD is not set
 # CONFIG_USB_ISP1760_HCD is not set
 CONFIG_USB_OHCI_HCD=y
@@ -1700,6 +1873,8 @@ CONFIG_USB_OHCI_LITTLE_ENDIAN=y
 CONFIG_USB_UHCI_HCD=y
 # CONFIG_USB_SL811_HCD is not set
 # CONFIG_USB_R8A66597_HCD is not set
+# CONFIG_USB_WHCI_HCD is not set
+# CONFIG_USB_HWA_HCD is not set
 
 #
 # USB Device Class drivers
@@ -1707,20 +1882,20 @@ CONFIG_USB_UHCI_HCD=y
 # CONFIG_USB_ACM is not set
 CONFIG_USB_PRINTER=y
 # CONFIG_USB_WDM is not set
+# CONFIG_USB_TMC is not set
 
 #
-# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
+# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may also be needed;
 #
 
 #
-# may also be needed; see USB_STORAGE Help for more information
+# see USB_STORAGE Help for more information
 #
 CONFIG_USB_STORAGE=y
 # CONFIG_USB_STORAGE_DEBUG is not set
 # CONFIG_USB_STORAGE_DATAFAB is not set
 # CONFIG_USB_STORAGE_FREECOM is not set
 # CONFIG_USB_STORAGE_ISD200 is not set
-# CONFIG_USB_STORAGE_DPCM is not set
 # CONFIG_USB_STORAGE_USBAT is not set
 # CONFIG_USB_STORAGE_SDDR09 is not set
 # CONFIG_USB_STORAGE_SDDR55 is not set
@@ -1728,7 +1903,6 @@ CONFIG_USB_STORAGE=y
 # CONFIG_USB_STORAGE_ALAUDA is not set
 # CONFIG_USB_STORAGE_ONETOUCH is not set
 # CONFIG_USB_STORAGE_KARMA is not set
-# CONFIG_USB_STORAGE_SIERRA is not set
 # CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
 CONFIG_USB_LIBUSUAL=y
 
@@ -1749,6 +1923,7 @@ CONFIG_USB_LIBUSUAL=y
 # CONFIG_USB_EMI62 is not set
 # CONFIG_USB_EMI26 is not set
 # CONFIG_USB_ADUTUX is not set
+# CONFIG_USB_SEVSEG is not set
 # CONFIG_USB_RIO500 is not set
 # CONFIG_USB_LEGOTOWER is not set
 # CONFIG_USB_LCD is not set
@@ -1766,7 +1941,13 @@ CONFIG_USB_LIBUSUAL=y
 # CONFIG_USB_IOWARRIOR is not set
 # CONFIG_USB_TEST is not set
 # CONFIG_USB_ISIGHTFW is not set
+# CONFIG_USB_VST is not set
 # CONFIG_USB_GADGET is not set
+
+#
+# OTG and related infrastructure
+#
+# CONFIG_UWB is not set
 # CONFIG_MMC is not set
 # CONFIG_MEMSTICK is not set
 CONFIG_NEW_LEDS=y
@@ -1775,6 +1956,7 @@ CONFIG_LEDS_CLASS=y
 #
 # LED drivers
 #
+# CONFIG_LEDS_ALIX2 is not set
 # CONFIG_LEDS_PCA9532 is not set
 # CONFIG_LEDS_CLEVO_MAIL is not set
 # CONFIG_LEDS_PCA955X is not set
@@ -1785,6 +1967,7 @@ CONFIG_LEDS_CLASS=y
 CONFIG_LEDS_TRIGGERS=y
 # CONFIG_LEDS_TRIGGER_TIMER is not set
 # CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
+# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
 # CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set
 # CONFIG_ACCESSIBILITY is not set
 # CONFIG_INFINIBAND is not set
@@ -1824,6 +2007,7 @@ CONFIG_RTC_INTF_DEV=y
 # CONFIG_RTC_DRV_M41T80 is not set
 # CONFIG_RTC_DRV_S35390A is not set
 # CONFIG_RTC_DRV_FM3130 is not set
+# CONFIG_RTC_DRV_RX8581 is not set
 
 #
 # SPI RTC drivers
@@ -1833,12 +2017,15 @@ CONFIG_RTC_INTF_DEV=y
 # Platform RTC drivers
 #
 CONFIG_RTC_DRV_CMOS=y
+# CONFIG_RTC_DRV_DS1286 is not set
 # CONFIG_RTC_DRV_DS1511 is not set
 # CONFIG_RTC_DRV_DS1553 is not set
 # CONFIG_RTC_DRV_DS1742 is not set
 # CONFIG_RTC_DRV_STK17TA8 is not set
 # CONFIG_RTC_DRV_M48T86 is not set
+# CONFIG_RTC_DRV_M48T35 is not set
 # CONFIG_RTC_DRV_M48T59 is not set
+# CONFIG_RTC_DRV_BQ4802 is not set
 # CONFIG_RTC_DRV_V3020 is not set
 
 #
@@ -1851,6 +2038,22 @@ CONFIG_DMADEVICES=y
 #
 # CONFIG_INTEL_IOATDMA is not set
 # CONFIG_UIO is not set
+# CONFIG_STAGING is not set
+CONFIG_X86_PLATFORM_DEVICES=y
+# CONFIG_ACER_WMI is not set
+# CONFIG_ASUS_LAPTOP is not set
+# CONFIG_FUJITSU_LAPTOP is not set
+# CONFIG_TC1100_WMI is not set
+# CONFIG_MSI_LAPTOP is not set
+# CONFIG_PANASONIC_LAPTOP is not set
+# CONFIG_COMPAL_LAPTOP is not set
+# CONFIG_SONY_LAPTOP is not set
+# CONFIG_THINKPAD_ACPI is not set
+# CONFIG_INTEL_MENLOW is not set
+CONFIG_EEEPC_LAPTOP=y
+# CONFIG_ACPI_WMI is not set
+# CONFIG_ACPI_ASUS is not set
+# CONFIG_ACPI_TOSHIBA is not set
 
 #
 # Firmware Drivers
@@ -1861,8 +2064,7 @@ CONFIG_EFI_VARS=y
 # CONFIG_DELL_RBU is not set
 # CONFIG_DCDBAS is not set
 CONFIG_DMIID=y
-CONFIG_ISCSI_IBFT_FIND=y
-CONFIG_ISCSI_IBFT=y
+# CONFIG_ISCSI_IBFT_FIND is not set
 
 #
 # File systems
@@ -1872,21 +2074,24 @@ CONFIG_EXT3_FS=y
 CONFIG_EXT3_FS_XATTR=y
 CONFIG_EXT3_FS_POSIX_ACL=y
 CONFIG_EXT3_FS_SECURITY=y
-# CONFIG_EXT4DEV_FS is not set
+# CONFIG_EXT4_FS is not set
 CONFIG_JBD=y
 # CONFIG_JBD_DEBUG is not set
 CONFIG_FS_MBCACHE=y
 # CONFIG_REISERFS_FS is not set
 # CONFIG_JFS_FS is not set
 CONFIG_FS_POSIX_ACL=y
+CONFIG_FILE_LOCKING=y
 # CONFIG_XFS_FS is not set
 # CONFIG_OCFS2_FS is not set
+# CONFIG_BTRFS_FS is not set
 CONFIG_DNOTIFY=y
 CONFIG_INOTIFY=y
 CONFIG_INOTIFY_USER=y
 CONFIG_QUOTA=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 # CONFIG_PRINT_QUOTA_WARNING is not set
+CONFIG_QUOTA_TREE=y
 # CONFIG_QFMT_V1 is not set
 CONFIG_QFMT_V2=y
 CONFIG_QUOTACTL=y
@@ -1920,16 +2125,14 @@ CONFIG_PROC_FS=y
 CONFIG_PROC_KCORE=y
 CONFIG_PROC_VMCORE=y
 CONFIG_PROC_SYSCTL=y
+CONFIG_PROC_PAGE_MONITOR=y
 CONFIG_SYSFS=y
 CONFIG_TMPFS=y
 CONFIG_TMPFS_POSIX_ACL=y
 CONFIG_HUGETLBFS=y
 CONFIG_HUGETLB_PAGE=y
 # CONFIG_CONFIGFS_FS is not set
-
-#
-# Miscellaneous filesystems
-#
+CONFIG_MISC_FILESYSTEMS=y
 # CONFIG_ADFS_FS is not set
 # CONFIG_AFFS_FS is not set
 # CONFIG_ECRYPT_FS is not set
@@ -1939,6 +2142,7 @@ CONFIG_HUGETLB_PAGE=y
 # CONFIG_BFS_FS is not set
 # CONFIG_EFS_FS is not set
 # CONFIG_CRAMFS is not set
+# CONFIG_SQUASHFS is not set
 # CONFIG_VXFS_FS is not set
 # CONFIG_MINIX_FS is not set
 # CONFIG_OMFS_FS is not set
@@ -1960,6 +2164,7 @@ CONFIG_NFS_ACL_SUPPORT=y
 CONFIG_NFS_COMMON=y
 CONFIG_SUNRPC=y
 CONFIG_SUNRPC_GSS=y
+# CONFIG_SUNRPC_REGISTER_V4 is not set
 CONFIG_RPCSEC_GSS_KRB5=y
 # CONFIG_RPCSEC_GSS_SPKM3 is not set
 # CONFIG_SMB_FS is not set
@@ -2036,7 +2241,7 @@ CONFIG_NLS_UTF8=y
 #
 CONFIG_TRACE_IRQFLAGS_SUPPORT=y
 CONFIG_PRINTK_TIME=y
-CONFIG_ENABLE_WARN_DEPRECATED=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
 CONFIG_ENABLE_MUST_CHECK=y
 CONFIG_FRAME_WARN=2048
 CONFIG_MAGIC_SYSRQ=y
@@ -2066,33 +2271,54 @@ CONFIG_TIMER_STATS=y
 CONFIG_DEBUG_BUGVERBOSE=y
 # CONFIG_DEBUG_INFO is not set
 # CONFIG_DEBUG_VM is not set
+# CONFIG_DEBUG_VIRTUAL is not set
 # CONFIG_DEBUG_WRITECOUNT is not set
 CONFIG_DEBUG_MEMORY_INIT=y
 # CONFIG_DEBUG_LIST is not set
 # CONFIG_DEBUG_SG is not set
+# CONFIG_DEBUG_NOTIFIERS is not set
+CONFIG_ARCH_WANT_FRAME_POINTERS=y
 CONFIG_FRAME_POINTER=y
 # CONFIG_BOOT_PRINTK_DELAY is not set
 # CONFIG_RCU_TORTURE_TEST is not set
+# CONFIG_RCU_CPU_STALL_DETECTOR is not set
 # CONFIG_KPROBES_SANITY_TEST is not set
 # CONFIG_BACKTRACE_SELF_TEST is not set
+# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
 # CONFIG_LKDTM is not set
 # CONFIG_FAULT_INJECTION is not set
 # CONFIG_LATENCYTOP is not set
 CONFIG_SYSCTL_SYSCALL_CHECK=y
-CONFIG_HAVE_FTRACE=y
+CONFIG_USER_STACKTRACE_SUPPORT=y
+CONFIG_HAVE_FUNCTION_TRACER=y
+CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
+CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
 CONFIG_HAVE_DYNAMIC_FTRACE=y
-# CONFIG_FTRACE is not set
+CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
+CONFIG_HAVE_HW_BRANCH_TRACER=y
+
+#
+# Tracers
+#
+# CONFIG_FUNCTION_TRACER is not set
 # CONFIG_IRQSOFF_TRACER is not set
 # CONFIG_SYSPROF_TRACER is not set
 # CONFIG_SCHED_TRACER is not set
 # CONFIG_CONTEXT_SWITCH_TRACER is not set
+# CONFIG_BOOT_TRACER is not set
+# CONFIG_TRACE_BRANCH_PROFILING is not set
+# CONFIG_POWER_TRACER is not set
+# CONFIG_STACK_TRACER is not set
+# CONFIG_HW_BRANCH_TRACER is not set
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
+# CONFIG_DYNAMIC_PRINTK_DEBUG is not set
 # CONFIG_SAMPLES is not set
 CONFIG_HAVE_ARCH_KGDB=y
 # CONFIG_KGDB is not set
 # CONFIG_STRICT_DEVMEM is not set
 CONFIG_X86_VERBOSE_BOOTUP=y
 CONFIG_EARLY_PRINTK=y
+CONFIG_EARLY_PRINTK_DBGP=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_DEBUG_STACK_USAGE=y
 # CONFIG_DEBUG_PAGEALLOC is not set
@@ -2123,8 +2349,10 @@ CONFIG_OPTIMIZE_INLINING=y
 CONFIG_KEYS=y
 CONFIG_KEYS_DEBUG_PROC_KEYS=y
 CONFIG_SECURITY=y
+# CONFIG_SECURITYFS is not set
 CONFIG_SECURITY_NETWORK=y
 # CONFIG_SECURITY_NETWORK_XFRM is not set
+# CONFIG_SECURITY_PATH is not set
 CONFIG_SECURITY_FILE_CAPABILITIES=y
 # CONFIG_SECURITY_ROOTPLUG is not set
 CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=65536
@@ -2135,7 +2363,6 @@ CONFIG_SECURITY_SELINUX_DISABLE=y
 CONFIG_SECURITY_SELINUX_DEVELOP=y
 CONFIG_SECURITY_SELINUX_AVC_STATS=y
 CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
-# CONFIG_SECURITY_SELINUX_ENABLE_SECMARK_DEFAULT is not set
 # CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set
 # CONFIG_SECURITY_SMACK is not set
 CONFIG_CRYPTO=y
@@ -2143,11 +2370,18 @@ CONFIG_CRYPTO=y
 #
 # Crypto core or helper
 #
+# CONFIG_CRYPTO_FIPS is not set
 CONFIG_CRYPTO_ALGAPI=y
+CONFIG_CRYPTO_ALGAPI2=y
 CONFIG_CRYPTO_AEAD=y
+CONFIG_CRYPTO_AEAD2=y
 CONFIG_CRYPTO_BLKCIPHER=y
+CONFIG_CRYPTO_BLKCIPHER2=y
 CONFIG_CRYPTO_HASH=y
+CONFIG_CRYPTO_HASH2=y
+CONFIG_CRYPTO_RNG2=y
 CONFIG_CRYPTO_MANAGER=y
+CONFIG_CRYPTO_MANAGER2=y
 # CONFIG_CRYPTO_GF128MUL is not set
 # CONFIG_CRYPTO_NULL is not set
 # CONFIG_CRYPTO_CRYPTD is not set
@@ -2182,6 +2416,7 @@ CONFIG_CRYPTO_HMAC=y
 # Digest
 #
 # CONFIG_CRYPTO_CRC32C is not set
+# CONFIG_CRYPTO_CRC32C_INTEL is not set
 # CONFIG_CRYPTO_MD4 is not set
 CONFIG_CRYPTO_MD5=y
 # CONFIG_CRYPTO_MICHAEL_MIC is not set
@@ -2222,6 +2457,11 @@ CONFIG_CRYPTO_DES=y
 #
 # CONFIG_CRYPTO_DEFLATE is not set
 # CONFIG_CRYPTO_LZO is not set
+
+#
+# Random Number Generation
+#
+# CONFIG_CRYPTO_ANSI_CPRNG is not set
 CONFIG_CRYPTO_HW=y
 # CONFIG_CRYPTO_DEV_PADLOCK is not set
 # CONFIG_CRYPTO_DEV_GEODE is not set
@@ -2239,6 +2479,7 @@ CONFIG_VIRTUALIZATION=y
 CONFIG_BITREVERSE=y
 CONFIG_GENERIC_FIND_FIRST_BIT=y
 CONFIG_GENERIC_FIND_NEXT_BIT=y
+CONFIG_GENERIC_FIND_LAST_BIT=y
 # CONFIG_CRC_CCITT is not set
 # CONFIG_CRC16 is not set
 CONFIG_CRC_T10DIF=y
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index 322dd2748fc9..9fe5d212ab4c 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -1,14 +1,13 @@
 #
 # Automatically generated make config: don't edit
-# Linux kernel version: 2.6.27-rc5
-# Wed Sep  3 17:13:39 2008
+# Linux kernel version: 2.6.29-rc4
+# Tue Feb 24 15:44:16 2009
 #
 CONFIG_64BIT=y
 # CONFIG_X86_32 is not set
 CONFIG_X86_64=y
 CONFIG_X86=y
 CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
-# CONFIG_GENERIC_LOCKBREAK is not set
 CONFIG_GENERIC_TIME=y
 CONFIG_GENERIC_CMOS_UPDATE=y
 CONFIG_CLOCKSOURCE_WATCHDOG=y
@@ -23,17 +22,16 @@ CONFIG_ZONE_DMA=y
 CONFIG_GENERIC_ISA_DMA=y
 CONFIG_GENERIC_IOMAP=y
 CONFIG_GENERIC_BUG=y
+CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
 CONFIG_GENERIC_HWEIGHT=y
-# CONFIG_GENERIC_GPIO is not set
 CONFIG_ARCH_MAY_HAVE_PC_FDC=y
 CONFIG_RWSEM_GENERIC_SPINLOCK=y
 # CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
-# CONFIG_ARCH_HAS_ILOG2_U32 is not set
-# CONFIG_ARCH_HAS_ILOG2_U64 is not set
 CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
 CONFIG_GENERIC_CALIBRATE_DELAY=y
 CONFIG_GENERIC_TIME_VSYSCALL=y
 CONFIG_ARCH_HAS_CPU_RELAX=y
+CONFIG_ARCH_HAS_DEFAULT_IDLE=y
 CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
 CONFIG_HAVE_SETUP_PER_CPU_AREA=y
 CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
@@ -42,12 +40,12 @@ CONFIG_ARCH_SUSPEND_POSSIBLE=y
 CONFIG_ZONE_DMA32=y
 CONFIG_ARCH_POPULATES_NODE_MAP=y
 CONFIG_AUDIT_ARCH=y
-CONFIG_ARCH_SUPPORTS_AOUT=y
 CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
 CONFIG_GENERIC_HARDIRQS=y
 CONFIG_GENERIC_IRQ_PROBE=y
 CONFIG_GENERIC_PENDING_IRQ=y
 CONFIG_X86_SMP=y
+CONFIG_USE_GENERIC_SMP_HELPERS=y
 CONFIG_X86_64_SMP=y
 CONFIG_X86_HT=y
 CONFIG_X86_BIOS_REBOOT=y
@@ -76,30 +74,44 @@ CONFIG_TASK_IO_ACCOUNTING=y
 CONFIG_AUDIT=y
 CONFIG_AUDITSYSCALL=y
 CONFIG_AUDIT_TREE=y
+
+#
+# RCU Subsystem
+#
+# CONFIG_CLASSIC_RCU is not set
+CONFIG_TREE_RCU=y
+# CONFIG_PREEMPT_RCU is not set
+# CONFIG_RCU_TRACE is not set
+CONFIG_RCU_FANOUT=64
+# CONFIG_RCU_FANOUT_EXACT is not set
+# CONFIG_TREE_RCU_TRACE is not set
+# CONFIG_PREEMPT_RCU_TRACE is not set
 # CONFIG_IKCONFIG is not set
 CONFIG_LOG_BUF_SHIFT=18
-CONFIG_CGROUPS=y
-# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_NS=y
-# CONFIG_CGROUP_DEVICE is not set
-CONFIG_CPUSETS=y
 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
 CONFIG_GROUP_SCHED=y
 CONFIG_FAIR_GROUP_SCHED=y
 # CONFIG_RT_GROUP_SCHED is not set
 # CONFIG_USER_SCHED is not set
 CONFIG_CGROUP_SCHED=y
+CONFIG_CGROUPS=y
+# CONFIG_CGROUP_DEBUG is not set
+CONFIG_CGROUP_NS=y
+CONFIG_CGROUP_FREEZER=y
+# CONFIG_CGROUP_DEVICE is not set
+CONFIG_CPUSETS=y
+CONFIG_PROC_PID_CPUSET=y
 CONFIG_CGROUP_CPUACCT=y
 CONFIG_RESOURCE_COUNTERS=y
 # CONFIG_CGROUP_MEM_RES_CTLR is not set
 # CONFIG_SYSFS_DEPRECATED_V2 is not set
-CONFIG_PROC_PID_CPUSET=y
 CONFIG_RELAY=y
 CONFIG_NAMESPACES=y
 CONFIG_UTS_NS=y
 CONFIG_IPC_NS=y
 CONFIG_USER_NS=y
 CONFIG_PID_NS=y
+CONFIG_NET_NS=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_INITRAMFS_SOURCE=""
 CONFIG_CC_OPTIMIZE_FOR_SIZE=y
@@ -124,12 +136,15 @@ CONFIG_SIGNALFD=y
 CONFIG_TIMERFD=y
 CONFIG_EVENTFD=y
 CONFIG_SHMEM=y
+CONFIG_AIO=y
 CONFIG_VM_EVENT_COUNTERS=y
+CONFIG_PCI_QUIRKS=y
 CONFIG_SLUB_DEBUG=y
 # CONFIG_SLAB is not set
 CONFIG_SLUB=y
 # CONFIG_SLOB is not set
 CONFIG_PROFILING=y
+CONFIG_TRACEPOINTS=y
 CONFIG_MARKERS=y
 # CONFIG_OPROFILE is not set
 CONFIG_HAVE_OPROFILE=y
@@ -139,15 +154,10 @@ CONFIG_KRETPROBES=y
 CONFIG_HAVE_IOREMAP_PROT=y
 CONFIG_HAVE_KPROBES=y
 CONFIG_HAVE_KRETPROBES=y
-# CONFIG_HAVE_ARCH_TRACEHOOK is not set
-# CONFIG_HAVE_DMA_ATTRS is not set
-CONFIG_USE_GENERIC_SMP_HELPERS=y
-# CONFIG_HAVE_CLK is not set
-CONFIG_PROC_PAGE_MONITOR=y
+CONFIG_HAVE_ARCH_TRACEHOOK=y
 # CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
 CONFIG_SLABINFO=y
 CONFIG_RT_MUTEXES=y
-# CONFIG_TINY_SHMEM is not set
 CONFIG_BASE_SMALL=0
 CONFIG_MODULES=y
 # CONFIG_MODULE_FORCE_LOAD is not set
@@ -155,7 +165,6 @@ CONFIG_MODULE_UNLOAD=y
 CONFIG_MODULE_FORCE_UNLOAD=y
 # CONFIG_MODVERSIONS is not set
 # CONFIG_MODULE_SRCVERSION_ALL is not set
-CONFIG_KMOD=y
 CONFIG_STOP_MACHINE=y
 CONFIG_BLOCK=y
 CONFIG_BLK_DEV_IO_TRACE=y
@@ -175,7 +184,7 @@ CONFIG_IOSCHED_CFQ=y
 CONFIG_DEFAULT_CFQ=y
 # CONFIG_DEFAULT_NOOP is not set
 CONFIG_DEFAULT_IOSCHED="cfq"
-CONFIG_CLASSIC_RCU=y
+CONFIG_FREEZER=y
 
 #
 # Processor type and features
@@ -185,13 +194,14 @@ CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
 CONFIG_SMP=y
+CONFIG_SPARSE_IRQ=y
+# CONFIG_NUMA_MIGRATE_IRQ_DESC is not set
 CONFIG_X86_FIND_SMP_CONFIG=y
 CONFIG_X86_MPPARSE=y
-CONFIG_X86_PC=y
 # CONFIG_X86_ELAN is not set
-# CONFIG_X86_VOYAGER is not set
 # CONFIG_X86_GENERICARCH is not set
 # CONFIG_X86_VSMP is not set
+CONFIG_SCHED_OMIT_FRAME_POINTER=y
 # CONFIG_PARAVIRT_GUEST is not set
 # CONFIG_MEMTEST is not set
 # CONFIG_M386 is not set
@@ -230,6 +240,11 @@ CONFIG_X86_CMPXCHG64=y
 CONFIG_X86_CMOV=y
 CONFIG_X86_MINIMUM_CPU_FAMILY=64
 CONFIG_X86_DEBUGCTLMSR=y
+CONFIG_CPU_SUP_INTEL=y
+CONFIG_CPU_SUP_AMD=y
+CONFIG_CPU_SUP_CENTAUR_64=y
+CONFIG_X86_DS=y
+CONFIG_X86_PTRACE_BTS=y
 CONFIG_HPET_TIMER=y
 CONFIG_HPET_EMULATE_RTC=y
 CONFIG_DMI=y
@@ -237,8 +252,11 @@ CONFIG_GART_IOMMU=y
 CONFIG_CALGARY_IOMMU=y
 CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
 CONFIG_AMD_IOMMU=y
+CONFIG_AMD_IOMMU_STATS=y
 CONFIG_SWIOTLB=y
 CONFIG_IOMMU_HELPER=y
+CONFIG_IOMMU_API=y
+# CONFIG_MAXSMP is not set
 CONFIG_NR_CPUS=64
 CONFIG_SCHED_SMT=y
 CONFIG_SCHED_MC=y
@@ -247,12 +265,19 @@ CONFIG_PREEMPT_VOLUNTARY=y
 # CONFIG_PREEMPT is not set
 CONFIG_X86_LOCAL_APIC=y
 CONFIG_X86_IO_APIC=y
-# CONFIG_X86_MCE is not set
+CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
+CONFIG_X86_MCE=y
+CONFIG_X86_MCE_INTEL=y
+CONFIG_X86_MCE_AMD=y
 # CONFIG_I8K is not set
 CONFIG_MICROCODE=y
+CONFIG_MICROCODE_INTEL=y
+CONFIG_MICROCODE_AMD=y
 CONFIG_MICROCODE_OLD_INTERFACE=y
 CONFIG_X86_MSR=y
 CONFIG_X86_CPUID=y
+CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
+CONFIG_DIRECT_GBPAGES=y
 CONFIG_NUMA=y
 CONFIG_K8_NUMA=y
 CONFIG_X86_64_ACPI_NUMA=y
@@ -269,7 +294,6 @@ CONFIG_SPARSEMEM_MANUAL=y
 CONFIG_SPARSEMEM=y
 CONFIG_NEED_MULTIPLE_NODES=y
 CONFIG_HAVE_MEMORY_PRESENT=y
-# CONFIG_SPARSEMEM_STATIC is not set
 CONFIG_SPARSEMEM_EXTREME=y
 CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
 CONFIG_SPARSEMEM_VMEMMAP=y
@@ -280,10 +304,14 @@ CONFIG_SPARSEMEM_VMEMMAP=y
 CONFIG_PAGEFLAGS_EXTENDED=y
 CONFIG_SPLIT_PTLOCK_CPUS=4
 CONFIG_MIGRATION=y
-CONFIG_RESOURCES_64BIT=y
+CONFIG_PHYS_ADDR_T_64BIT=y
 CONFIG_ZONE_DMA_FLAG=1
 CONFIG_BOUNCE=y
 CONFIG_VIRT_TO_BUS=y
+CONFIG_UNEVICTABLE_LRU=y
+CONFIG_X86_CHECK_BIOS_CORRUPTION=y
+CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
+CONFIG_X86_RESERVE_LOW_64K=y
 CONFIG_MTRR=y
 # CONFIG_MTRR_SANITIZER is not set
 CONFIG_X86_PAT=y
@@ -302,11 +330,12 @@ CONFIG_PHYSICAL_START=0x1000000
 CONFIG_PHYSICAL_ALIGN=0x200000
 CONFIG_HOTPLUG_CPU=y
 # CONFIG_COMPAT_VDSO is not set
+# CONFIG_CMDLINE_BOOL is not set
 CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
 CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
 
 #
-# Power management options
+# Power management and ACPI options
 #
 CONFIG_ARCH_HIBERNATION_HEADER=y
 CONFIG_PM=y
@@ -333,20 +362,14 @@ CONFIG_ACPI_BATTERY=y
 CONFIG_ACPI_BUTTON=y
 CONFIG_ACPI_FAN=y
 CONFIG_ACPI_DOCK=y
-# CONFIG_ACPI_BAY is not set
 CONFIG_ACPI_PROCESSOR=y
 CONFIG_ACPI_HOTPLUG_CPU=y
 CONFIG_ACPI_THERMAL=y
 CONFIG_ACPI_NUMA=y
-# CONFIG_ACPI_WMI is not set
-# CONFIG_ACPI_ASUS is not set
-# CONFIG_ACPI_TOSHIBA is not set
 # CONFIG_ACPI_CUSTOM_DSDT is not set
 CONFIG_ACPI_BLACKLIST_YEAR=0
 # CONFIG_ACPI_DEBUG is not set
-CONFIG_ACPI_EC=y
 # CONFIG_ACPI_PCI_SLOT is not set
-CONFIG_ACPI_POWER=y
 CONFIG_ACPI_SYSTEM=y
 CONFIG_X86_PM_TIMER=y
 CONFIG_ACPI_CONTAINER=y
@@ -381,13 +404,17 @@ CONFIG_X86_ACPI_CPUFREQ=y
 #
 # shared options
 #
-# CONFIG_X86_ACPI_CPUFREQ_PROC_INTF is not set
 # CONFIG_X86_SPEEDSTEP_LIB is not set
 CONFIG_CPU_IDLE=y
 CONFIG_CPU_IDLE_GOV_LADDER=y
 CONFIG_CPU_IDLE_GOV_MENU=y
 
 #
+# Memory power savings
+#
+# CONFIG_I7300_IDLE is not set
+
+#
 # Bus options (PCI etc.)
 #
 CONFIG_PCI=y
@@ -395,8 +422,10 @@ CONFIG_PCI_DIRECT=y
 CONFIG_PCI_MMCONFIG=y
 CONFIG_PCI_DOMAINS=y
 CONFIG_DMAR=y
+# CONFIG_DMAR_DEFAULT_ON is not set
 CONFIG_DMAR_GFX_WA=y
 CONFIG_DMAR_FLOPPY_WA=y
+# CONFIG_INTR_REMAP is not set
 CONFIG_PCIEPORTBUS=y
 # CONFIG_HOTPLUG_PCI_PCIE is not set
 CONFIG_PCIEAER=y
@@ -405,6 +434,7 @@ CONFIG_ARCH_SUPPORTS_MSI=y
 CONFIG_PCI_MSI=y
 # CONFIG_PCI_LEGACY is not set
 # CONFIG_PCI_DEBUG is not set
+# CONFIG_PCI_STUB is not set
 CONFIG_HT_IRQ=y
 CONFIG_ISA_DMA_API=y
 CONFIG_K8_NB=y
@@ -438,6 +468,8 @@ CONFIG_HOTPLUG_PCI=y
 #
 CONFIG_BINFMT_ELF=y
 CONFIG_COMPAT_BINFMT_ELF=y
+CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
+# CONFIG_HAVE_AOUT is not set
 CONFIG_BINFMT_MISC=y
 CONFIG_IA32_EMULATION=y
 # CONFIG_IA32_AOUT is not set
@@ -449,6 +481,7 @@ CONFIG_NET=y
 #
 # Networking options
 #
+CONFIG_COMPAT_NET_DEV_OPS=y
 CONFIG_PACKET=y
 CONFIG_PACKET_MMAP=y
 CONFIG_UNIX=y
@@ -509,7 +542,6 @@ CONFIG_DEFAULT_CUBIC=y
 # CONFIG_DEFAULT_RENO is not set
 CONFIG_DEFAULT_TCP_CONG="cubic"
 CONFIG_TCP_MD5SIG=y
-# CONFIG_IP_VS is not set
 CONFIG_IPV6=y
 # CONFIG_IPV6_PRIVACY is not set
 # CONFIG_IPV6_ROUTER_PREF is not set
@@ -547,19 +579,21 @@ CONFIG_NF_CONNTRACK_IRC=y
 CONFIG_NF_CONNTRACK_SIP=y
 CONFIG_NF_CT_NETLINK=y
 CONFIG_NETFILTER_XTABLES=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
 CONFIG_NETFILTER_XT_TARGET_MARK=y
 CONFIG_NETFILTER_XT_TARGET_NFLOG=y
 CONFIG_NETFILTER_XT_TARGET_SECMARK=y
-CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
 CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
 CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
 CONFIG_NETFILTER_XT_MATCH_MARK=y
 CONFIG_NETFILTER_XT_MATCH_POLICY=y
 CONFIG_NETFILTER_XT_MATCH_STATE=y
+# CONFIG_IP_VS is not set
 
 #
 # IP: Netfilter Configuration
 #
+CONFIG_NF_DEFRAG_IPV4=y
 CONFIG_NF_CONNTRACK_IPV4=y
 CONFIG_NF_CONNTRACK_PROC_COMPAT=y
 CONFIG_IP_NF_IPTABLES=y
@@ -585,8 +619,8 @@ CONFIG_IP_NF_MANGLE=y
 CONFIG_NF_CONNTRACK_IPV6=y
 CONFIG_IP6_NF_IPTABLES=y
 CONFIG_IP6_NF_MATCH_IPV6HEADER=y
-CONFIG_IP6_NF_FILTER=y
 CONFIG_IP6_NF_TARGET_LOG=y
+CONFIG_IP6_NF_FILTER=y
 CONFIG_IP6_NF_TARGET_REJECT=y
 CONFIG_IP6_NF_MANGLE=y
 # CONFIG_IP_DCCP is not set
@@ -594,6 +628,7 @@ CONFIG_IP6_NF_MANGLE=y
 # CONFIG_TIPC is not set
 # CONFIG_ATM is not set
 # CONFIG_BRIDGE is not set
+# CONFIG_NET_DSA is not set
 # CONFIG_VLAN_8021Q is not set
 # CONFIG_DECNET is not set
 CONFIG_LLC=y
@@ -613,6 +648,7 @@ CONFIG_NET_SCHED=y
 # CONFIG_NET_SCH_HTB is not set
 # CONFIG_NET_SCH_HFSC is not set
 # CONFIG_NET_SCH_PRIO is not set
+# CONFIG_NET_SCH_MULTIQ is not set
 # CONFIG_NET_SCH_RED is not set
 # CONFIG_NET_SCH_SFQ is not set
 # CONFIG_NET_SCH_TEQL is not set
@@ -620,6 +656,7 @@ CONFIG_NET_SCHED=y
 # CONFIG_NET_SCH_GRED is not set
 # CONFIG_NET_SCH_DSMARK is not set
 # CONFIG_NET_SCH_NETEM is not set
+# CONFIG_NET_SCH_DRR is not set
 # CONFIG_NET_SCH_INGRESS is not set
 
 #
@@ -634,6 +671,7 @@ CONFIG_NET_CLS=y
 # CONFIG_NET_CLS_RSVP is not set
 # CONFIG_NET_CLS_RSVP6 is not set
 # CONFIG_NET_CLS_FLOW is not set
+# CONFIG_NET_CLS_CGROUP is not set
 CONFIG_NET_EMATCH=y
 CONFIG_NET_EMATCH_STACK=32
 # CONFIG_NET_EMATCH_CMP is not set
@@ -649,7 +687,9 @@ CONFIG_NET_CLS_ACT=y
 # CONFIG_NET_ACT_NAT is not set
 # CONFIG_NET_ACT_PEDIT is not set
 # CONFIG_NET_ACT_SIMP is not set
+# CONFIG_NET_ACT_SKBEDIT is not set
 CONFIG_NET_SCH_FIFO=y
+# CONFIG_DCB is not set
 
 #
 # Network testing
@@ -666,29 +706,33 @@ CONFIG_HAMRADIO=y
 # CONFIG_IRDA is not set
 # CONFIG_BT is not set
 # CONFIG_AF_RXRPC is not set
+# CONFIG_PHONET is not set
 CONFIG_FIB_RULES=y
-
-#
-# Wireless
-#
+CONFIG_WIRELESS=y
 CONFIG_CFG80211=y
+# CONFIG_CFG80211_REG_DEBUG is not set
 CONFIG_NL80211=y
+CONFIG_WIRELESS_OLD_REGULATORY=y
 CONFIG_WIRELESS_EXT=y
 CONFIG_WIRELESS_EXT_SYSFS=y
+# CONFIG_LIB80211 is not set
 CONFIG_MAC80211=y
 
 #
 # Rate control algorithm selection
 #
-CONFIG_MAC80211_RC_PID=y
-CONFIG_MAC80211_RC_DEFAULT_PID=y
-CONFIG_MAC80211_RC_DEFAULT="pid"
+CONFIG_MAC80211_RC_MINSTREL=y
+# CONFIG_MAC80211_RC_DEFAULT_PID is not set
+CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
+CONFIG_MAC80211_RC_DEFAULT="minstrel"
 # CONFIG_MAC80211_MESH is not set
 CONFIG_MAC80211_LEDS=y
 # CONFIG_MAC80211_DEBUGFS is not set
 # CONFIG_MAC80211_DEBUG_MENU is not set
-# CONFIG_IEEE80211 is not set
-# CONFIG_RFKILL is not set
+# CONFIG_WIMAX is not set
+CONFIG_RFKILL=y
+# CONFIG_RFKILL_INPUT is not set
+CONFIG_RFKILL_LEDS=y
 # CONFIG_NET_9P is not set
 
 #
@@ -712,7 +756,7 @@ CONFIG_PROC_EVENTS=y
 # CONFIG_MTD is not set
 # CONFIG_PARPORT is not set
 CONFIG_PNP=y
-# CONFIG_PNP_DEBUG is not set
+CONFIG_PNP_DEBUG_MESSAGES=y
 
 #
 # Protocols
@@ -740,21 +784,21 @@ CONFIG_BLK_DEV_RAM_SIZE=16384
 CONFIG_MISC_DEVICES=y
 # CONFIG_IBM_ASM is not set
 # CONFIG_PHANTOM is not set
-# CONFIG_EEPROM_93CX6 is not set
 # CONFIG_SGI_IOC4 is not set
 # CONFIG_TIFM_CORE is not set
-# CONFIG_ACER_WMI is not set
-# CONFIG_ASUS_LAPTOP is not set
-# CONFIG_FUJITSU_LAPTOP is not set
-# CONFIG_MSI_LAPTOP is not set
-# CONFIG_COMPAL_LAPTOP is not set
-# CONFIG_SONY_LAPTOP is not set
-# CONFIG_THINKPAD_ACPI is not set
-# CONFIG_INTEL_MENLOW is not set
+# CONFIG_ICS932S401 is not set
 # CONFIG_ENCLOSURE_SERVICES is not set
 # CONFIG_SGI_XP is not set
 # CONFIG_HP_ILO is not set
 # CONFIG_SGI_GRU is not set
+# CONFIG_C2PORT is not set
+
+#
+# EEPROM support
+#
+# CONFIG_EEPROM_AT24 is not set
+# CONFIG_EEPROM_LEGACY is not set
+# CONFIG_EEPROM_93CX6 is not set
 CONFIG_HAVE_IDE=y
 # CONFIG_IDE is not set
 
@@ -793,7 +837,7 @@ CONFIG_SCSI_WAIT_SCAN=m
 #
 CONFIG_SCSI_SPI_ATTRS=y
 # CONFIG_SCSI_FC_ATTRS is not set
-CONFIG_SCSI_ISCSI_ATTRS=y
+# CONFIG_SCSI_ISCSI_ATTRS is not set
 # CONFIG_SCSI_SAS_ATTRS is not set
 # CONFIG_SCSI_SAS_LIBSAS is not set
 # CONFIG_SCSI_SRP_ATTRS is not set
@@ -864,6 +908,7 @@ CONFIG_PATA_OLDPIIX=y
 CONFIG_PATA_SCH=y
 CONFIG_MD=y
 CONFIG_BLK_DEV_MD=y
+CONFIG_MD_AUTODETECT=y
 # CONFIG_MD_LINEAR is not set
 # CONFIG_MD_RAID0 is not set
 # CONFIG_MD_RAID1 is not set
@@ -919,6 +964,9 @@ CONFIG_PHYLIB=y
 # CONFIG_BROADCOM_PHY is not set
 # CONFIG_ICPLUS_PHY is not set
 # CONFIG_REALTEK_PHY is not set
+# CONFIG_NATIONAL_PHY is not set
+# CONFIG_STE10XP is not set
+# CONFIG_LSI_ET1011C_PHY is not set
 # CONFIG_FIXED_PHY is not set
 # CONFIG_MDIO_BITBANG is not set
 CONFIG_NET_ETHERNET=y
@@ -942,6 +990,9 @@ CONFIG_NET_TULIP=y
 # CONFIG_IBM_NEW_EMAC_RGMII is not set
 # CONFIG_IBM_NEW_EMAC_TAH is not set
 # CONFIG_IBM_NEW_EMAC_EMAC4 is not set
+# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
+# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
+# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
 CONFIG_NET_PCI=y
 # CONFIG_PCNET32 is not set
 # CONFIG_AMD8111_ETH is not set
@@ -949,7 +1000,6 @@ CONFIG_NET_PCI=y
 # CONFIG_B44 is not set
 CONFIG_FORCEDETH=y
 # CONFIG_FORCEDETH_NAPI is not set
-# CONFIG_EEPRO100 is not set
 CONFIG_E100=y
 # CONFIG_FEALNX is not set
 # CONFIG_NATSEMI is not set
@@ -963,15 +1013,16 @@ CONFIG_8139TOO_PIO=y
 # CONFIG_R6040 is not set
 # CONFIG_SIS900 is not set
 # CONFIG_EPIC100 is not set
+# CONFIG_SMSC9420 is not set
 # CONFIG_SUNDANCE is not set
 # CONFIG_TLAN is not set
 # CONFIG_VIA_RHINE is not set
 # CONFIG_SC92031 is not set
+# CONFIG_ATL2 is not set
 CONFIG_NETDEV_1000=y
 # CONFIG_ACENIC is not set
 # CONFIG_DL2K is not set
 CONFIG_E1000=y
-# CONFIG_E1000_DISABLE_PACKET_SPLIT is not set
 # CONFIG_E1000E is not set
 # CONFIG_IP1000 is not set
 # CONFIG_IGB is not set
@@ -989,18 +1040,23 @@ CONFIG_TIGON3=y
 # CONFIG_QLA3XXX is not set
 # CONFIG_ATL1 is not set
 # CONFIG_ATL1E is not set
+# CONFIG_JME is not set
 CONFIG_NETDEV_10000=y
 # CONFIG_CHELSIO_T1 is not set
+CONFIG_CHELSIO_T3_DEPENDS=y
 # CONFIG_CHELSIO_T3 is not set
+# CONFIG_ENIC is not set
 # CONFIG_IXGBE is not set
 # CONFIG_IXGB is not set
 # CONFIG_S2IO is not set
 # CONFIG_MYRI10GE is not set
 # CONFIG_NETXEN_NIC is not set
 # CONFIG_NIU is not set
+# CONFIG_MLX4_EN is not set
 # CONFIG_MLX4_CORE is not set
 # CONFIG_TEHUTI is not set
 # CONFIG_BNX2X is not set
+# CONFIG_QLGE is not set
 # CONFIG_SFC is not set
 CONFIG_TR=y
 # CONFIG_IBMOL is not set
@@ -1013,9 +1069,8 @@ CONFIG_TR=y
 # CONFIG_WLAN_PRE80211 is not set
 CONFIG_WLAN_80211=y
 # CONFIG_PCMCIA_RAYCS is not set
-# CONFIG_IPW2100 is not set
-# CONFIG_IPW2200 is not set
 # CONFIG_LIBERTAS is not set
+# CONFIG_LIBERTAS_THINFIRM is not set
 # CONFIG_AIRO is not set
 # CONFIG_HERMES is not set
 # CONFIG_ATMEL is not set
@@ -1032,6 +1087,8 @@ CONFIG_WLAN_80211=y
 CONFIG_ATH5K=y
 # CONFIG_ATH5K_DEBUG is not set
 # CONFIG_ATH9K is not set
+# CONFIG_IPW2100 is not set
+# CONFIG_IPW2200 is not set
 # CONFIG_IWLCORE is not set
 # CONFIG_IWLWIFI_LEDS is not set
 # CONFIG_IWLAGN is not set
@@ -1043,6 +1100,10 @@ CONFIG_ATH5K=y
 # CONFIG_RT2X00 is not set
 
 #
+# Enable WiMAX (Networking options) to see the WiMAX drivers
+#
+
+#
 # USB Network Adapters
 #
 # CONFIG_USB_CATC is not set
@@ -1050,6 +1111,7 @@ CONFIG_ATH5K=y
 # CONFIG_USB_PEGASUS is not set
 # CONFIG_USB_RTL8150 is not set
 # CONFIG_USB_USBNET is not set
+# CONFIG_USB_HSO is not set
 CONFIG_NET_PCMCIA=y
 # CONFIG_PCMCIA_3C589 is not set
 # CONFIG_PCMCIA_3C574 is not set
@@ -1059,6 +1121,7 @@ CONFIG_NET_PCMCIA=y
 # CONFIG_PCMCIA_SMC91C92 is not set
 # CONFIG_PCMCIA_XIRC2PS is not set
 # CONFIG_PCMCIA_AXNET is not set
+# CONFIG_PCMCIA_IBMTR is not set
 # CONFIG_WAN is not set
 CONFIG_FDDI=y
 # CONFIG_DEFXX is not set
@@ -1110,6 +1173,7 @@ CONFIG_MOUSE_PS2_LOGIPS2PP=y
 CONFIG_MOUSE_PS2_SYNAPTICS=y
 CONFIG_MOUSE_PS2_LIFEBOOK=y
 CONFIG_MOUSE_PS2_TRACKPOINT=y
+# CONFIG_MOUSE_PS2_ELANTECH is not set
 # CONFIG_MOUSE_PS2_TOUCHKIT is not set
 # CONFIG_MOUSE_SERIAL is not set
 # CONFIG_MOUSE_APPLETOUCH is not set
@@ -1147,15 +1211,16 @@ CONFIG_INPUT_TOUCHSCREEN=y
 # CONFIG_TOUCHSCREEN_FUJITSU is not set
 # CONFIG_TOUCHSCREEN_GUNZE is not set
 # CONFIG_TOUCHSCREEN_ELO is not set
+# CONFIG_TOUCHSCREEN_WACOM_W8001 is not set
 # CONFIG_TOUCHSCREEN_MTOUCH is not set
 # CONFIG_TOUCHSCREEN_INEXIO is not set
 # CONFIG_TOUCHSCREEN_MK712 is not set
 # CONFIG_TOUCHSCREEN_PENMOUNT is not set
 # CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
 # CONFIG_TOUCHSCREEN_TOUCHWIN is not set
-# CONFIG_TOUCHSCREEN_UCB1400 is not set
 # CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
 # CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
+# CONFIG_TOUCHSCREEN_TSC2007 is not set
 CONFIG_INPUT_MISC=y
 # CONFIG_INPUT_PCSPKR is not set
 # CONFIG_INPUT_APANEL is not set
@@ -1165,6 +1230,7 @@ CONFIG_INPUT_MISC=y
 # CONFIG_INPUT_KEYSPAN_REMOTE is not set
 # CONFIG_INPUT_POWERMATE is not set
 # CONFIG_INPUT_YEALINK is not set
+# CONFIG_INPUT_CM109 is not set
 # CONFIG_INPUT_UINPUT is not set
 
 #
@@ -1231,6 +1297,7 @@ CONFIG_SERIAL_CORE=y
 CONFIG_SERIAL_CORE_CONSOLE=y
 # CONFIG_SERIAL_JSM is not set
 CONFIG_UNIX98_PTYS=y
+# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
 # CONFIG_LEGACY_PTYS is not set
 # CONFIG_IPMI_HANDLER is not set
 CONFIG_HW_RANDOM=y
@@ -1260,6 +1327,7 @@ CONFIG_I2C=y
 CONFIG_I2C_BOARDINFO=y
 # CONFIG_I2C_CHARDEV is not set
 CONFIG_I2C_HELPER_AUTO=y
+CONFIG_I2C_ALGOBIT=y
 
 #
 # I2C Hardware Bus support
@@ -1311,8 +1379,6 @@ CONFIG_I2C_I801=y
 # Miscellaneous I2C Chip support
 #
 # CONFIG_DS1682 is not set
-# CONFIG_EEPROM_AT24 is not set
-# CONFIG_EEPROM_LEGACY is not set
 # CONFIG_SENSORS_PCF8574 is not set
 # CONFIG_PCF8575 is not set
 # CONFIG_SENSORS_PCA9539 is not set
@@ -1331,8 +1397,78 @@ CONFIG_POWER_SUPPLY=y
 # CONFIG_POWER_SUPPLY_DEBUG is not set
 # CONFIG_PDA_POWER is not set
 # CONFIG_BATTERY_DS2760 is not set
-# CONFIG_HWMON is not set
+# CONFIG_BATTERY_BQ27x00 is not set
+CONFIG_HWMON=y
+# CONFIG_HWMON_VID is not set
+# CONFIG_SENSORS_ABITUGURU is not set
+# CONFIG_SENSORS_ABITUGURU3 is not set
+# CONFIG_SENSORS_AD7414 is not set
+# CONFIG_SENSORS_AD7418 is not set
+# CONFIG_SENSORS_ADM1021 is not set
+# CONFIG_SENSORS_ADM1025 is not set
+# CONFIG_SENSORS_ADM1026 is not set
+# CONFIG_SENSORS_ADM1029 is not set
+# CONFIG_SENSORS_ADM1031 is not set
+# CONFIG_SENSORS_ADM9240 is not set
+# CONFIG_SENSORS_ADT7462 is not set
+# CONFIG_SENSORS_ADT7470 is not set
+# CONFIG_SENSORS_ADT7473 is not set
+# CONFIG_SENSORS_ADT7475 is not set
+# CONFIG_SENSORS_K8TEMP is not set
+# CONFIG_SENSORS_ASB100 is not set
+# CONFIG_SENSORS_ATXP1 is not set
+# CONFIG_SENSORS_DS1621 is not set
+# CONFIG_SENSORS_I5K_AMB is not set
+# CONFIG_SENSORS_F71805F is not set
+# CONFIG_SENSORS_F71882FG is not set
+# CONFIG_SENSORS_F75375S is not set
+# CONFIG_SENSORS_FSCHER is not set
+# CONFIG_SENSORS_FSCPOS is not set
+# CONFIG_SENSORS_FSCHMD is not set
+# CONFIG_SENSORS_GL518SM is not set
+# CONFIG_SENSORS_GL520SM is not set
+# CONFIG_SENSORS_CORETEMP is not set
+# CONFIG_SENSORS_IT87 is not set
+# CONFIG_SENSORS_LM63 is not set
+# CONFIG_SENSORS_LM75 is not set
+# CONFIG_SENSORS_LM77 is not set
+# CONFIG_SENSORS_LM78 is not set
+# CONFIG_SENSORS_LM80 is not set
+# CONFIG_SENSORS_LM83 is not set
+# CONFIG_SENSORS_LM85 is not set
+# CONFIG_SENSORS_LM87 is not set
+# CONFIG_SENSORS_LM90 is not set
+# CONFIG_SENSORS_LM92 is not set
+# CONFIG_SENSORS_LM93 is not set
+# CONFIG_SENSORS_LTC4245 is not set
+# CONFIG_SENSORS_MAX1619 is not set
+# CONFIG_SENSORS_MAX6650 is not set
+# CONFIG_SENSORS_PC87360 is not set
+# CONFIG_SENSORS_PC87427 is not set
+# CONFIG_SENSORS_SIS5595 is not set
+# CONFIG_SENSORS_DME1737 is not set
+# CONFIG_SENSORS_SMSC47M1 is not set
+# CONFIG_SENSORS_SMSC47M192 is not set
+# CONFIG_SENSORS_SMSC47B397 is not set
+# CONFIG_SENSORS_ADS7828 is not set
+# CONFIG_SENSORS_THMC50 is not set
+# CONFIG_SENSORS_VIA686A is not set
+# CONFIG_SENSORS_VT1211 is not set
+# CONFIG_SENSORS_VT8231 is not set
+# CONFIG_SENSORS_W83781D is not set
+# CONFIG_SENSORS_W83791D is not set
+# CONFIG_SENSORS_W83792D is not set
+# CONFIG_SENSORS_W83793 is not set
+# CONFIG_SENSORS_W83L785TS is not set
+# CONFIG_SENSORS_W83L786NG is not set
+# CONFIG_SENSORS_W83627HF is not set
+# CONFIG_SENSORS_W83627EHF is not set
+# CONFIG_SENSORS_HDAPS is not set
+# CONFIG_SENSORS_LIS3LV02D is not set
+# CONFIG_SENSORS_APPLESMC is not set
+# CONFIG_HWMON_DEBUG_CHIP is not set
 CONFIG_THERMAL=y
+# CONFIG_THERMAL_HWMON is not set
 CONFIG_WATCHDOG=y
 # CONFIG_WATCHDOG_NOWAYOUT is not set
 
@@ -1352,15 +1488,18 @@ CONFIG_WATCHDOG=y
 # CONFIG_I6300ESB_WDT is not set
 # CONFIG_ITCO_WDT is not set
 # CONFIG_IT8712F_WDT is not set
+# CONFIG_IT87_WDT is not set
 # CONFIG_HP_WATCHDOG is not set
 # CONFIG_SC1200_WDT is not set
 # CONFIG_PC87413_WDT is not set
 # CONFIG_60XX_WDT is not set
 # CONFIG_SBC8360_WDT is not set
 # CONFIG_CPU5_WDT is not set
+# CONFIG_SMSC_SCH311X_WDT is not set
 # CONFIG_SMSC37B787_WDT is not set
 # CONFIG_W83627HF_WDT is not set
 # CONFIG_W83697HF_WDT is not set
+# CONFIG_W83697UG_WDT is not set
 # CONFIG_W83877F_WDT is not set
 # CONFIG_W83977F_WDT is not set
 # CONFIG_MACHZ_WDT is not set
@@ -1376,11 +1515,11 @@ CONFIG_WATCHDOG=y
 # USB-based Watchdog Cards
 #
 # CONFIG_USBPCWATCHDOG is not set
+CONFIG_SSB_POSSIBLE=y
 
 #
 # Sonics Silicon Backplane
 #
-CONFIG_SSB_POSSIBLE=y
 # CONFIG_SSB is not set
 
 #
@@ -1389,7 +1528,13 @@ CONFIG_SSB_POSSIBLE=y
 # CONFIG_MFD_CORE is not set
 # CONFIG_MFD_SM501 is not set
 # CONFIG_HTC_PASIC3 is not set
+# CONFIG_TWL4030_CORE is not set
 # CONFIG_MFD_TMIO is not set
+# CONFIG_PMIC_DA903X is not set
+# CONFIG_MFD_WM8400 is not set
+# CONFIG_MFD_WM8350_I2C is not set
+# CONFIG_MFD_PCF50633 is not set
+# CONFIG_REGULATOR is not set
 
 #
 # Multimedia devices
@@ -1423,6 +1568,7 @@ CONFIG_DRM=y
 # CONFIG_DRM_I810 is not set
 # CONFIG_DRM_I830 is not set
 CONFIG_DRM_I915=y
+CONFIG_DRM_I915_KMS=y
 # CONFIG_DRM_MGA is not set
 # CONFIG_DRM_SIS is not set
 # CONFIG_DRM_VIA is not set
@@ -1432,6 +1578,7 @@ CONFIG_DRM_I915=y
 CONFIG_FB=y
 # CONFIG_FIRMWARE_EDID is not set
 # CONFIG_FB_DDC is not set
+# CONFIG_FB_BOOT_VESA_SUPPORT is not set
 CONFIG_FB_CFB_FILLRECT=y
 CONFIG_FB_CFB_COPYAREA=y
 CONFIG_FB_CFB_IMAGEBLIT=y
@@ -1460,7 +1607,6 @@ CONFIG_FB_TILEBLITTING=y
 # CONFIG_FB_UVESA is not set
 # CONFIG_FB_VESA is not set
 CONFIG_FB_EFI=y
-# CONFIG_FB_IMAC is not set
 # CONFIG_FB_N411 is not set
 # CONFIG_FB_HGA is not set
 # CONFIG_FB_S1D13XXX is not set
@@ -1475,6 +1621,7 @@ CONFIG_FB_EFI=y
 # CONFIG_FB_S3 is not set
 # CONFIG_FB_SAVAGE is not set
 # CONFIG_FB_SIS is not set
+# CONFIG_FB_VIA is not set
 # CONFIG_FB_NEOMAGIC is not set
 # CONFIG_FB_KYRO is not set
 # CONFIG_FB_3DFX is not set
@@ -1486,12 +1633,15 @@ CONFIG_FB_EFI=y
 # CONFIG_FB_CARMINE is not set
 # CONFIG_FB_GEODE is not set
 # CONFIG_FB_VIRTUAL is not set
+# CONFIG_FB_METRONOME is not set
+# CONFIG_FB_MB862XX is not set
 CONFIG_BACKLIGHT_LCD_SUPPORT=y
 # CONFIG_LCD_CLASS_DEVICE is not set
 CONFIG_BACKLIGHT_CLASS_DEVICE=y
-# CONFIG_BACKLIGHT_CORGI is not set
+CONFIG_BACKLIGHT_GENERIC=y
 # CONFIG_BACKLIGHT_PROGEAR is not set
 # CONFIG_BACKLIGHT_MBP_NVIDIA is not set
+# CONFIG_BACKLIGHT_SAHARA is not set
 
 #
 # Display device support
@@ -1511,10 +1661,12 @@ CONFIG_LOGO=y
 # CONFIG_LOGO_LINUX_VGA16 is not set
 CONFIG_LOGO_LINUX_CLUT224=y
 CONFIG_SOUND=y
+CONFIG_SOUND_OSS_CORE=y
 CONFIG_SND=y
 CONFIG_SND_TIMER=y
 CONFIG_SND_PCM=y
 CONFIG_SND_HWDEP=y
+CONFIG_SND_JACK=y
 CONFIG_SND_SEQUENCER=y
 CONFIG_SND_SEQ_DUMMY=y
 CONFIG_SND_OSSEMUL=y
@@ -1522,6 +1674,8 @@ CONFIG_SND_MIXER_OSS=y
 CONFIG_SND_PCM_OSS=y
 CONFIG_SND_PCM_OSS_PLUGINS=y
 CONFIG_SND_SEQUENCER_OSS=y
+CONFIG_SND_HRTIMER=y
+CONFIG_SND_SEQ_HRTIMER_DEFAULT=y
 CONFIG_SND_DYNAMIC_MINORS=y
 CONFIG_SND_SUPPORT_OLD_API=y
 CONFIG_SND_VERBOSE_PROCFS=y
@@ -1575,11 +1729,16 @@ CONFIG_SND_PCI=y
 # CONFIG_SND_FM801 is not set
 CONFIG_SND_HDA_INTEL=y
 CONFIG_SND_HDA_HWDEP=y
+# CONFIG_SND_HDA_RECONFIG is not set
+# CONFIG_SND_HDA_INPUT_BEEP is not set
 CONFIG_SND_HDA_CODEC_REALTEK=y
 CONFIG_SND_HDA_CODEC_ANALOG=y
 CONFIG_SND_HDA_CODEC_SIGMATEL=y
 CONFIG_SND_HDA_CODEC_VIA=y
 CONFIG_SND_HDA_CODEC_ATIHDMI=y
+CONFIG_SND_HDA_CODEC_NVHDMI=y
+CONFIG_SND_HDA_CODEC_INTELHDMI=y
+CONFIG_SND_HDA_ELD=y
 CONFIG_SND_HDA_CODEC_CONEXANT=y
 CONFIG_SND_HDA_CODEC_CMEDIA=y
 CONFIG_SND_HDA_CODEC_SI3054=y
@@ -1612,6 +1771,7 @@ CONFIG_SND_USB=y
 # CONFIG_SND_USB_AUDIO is not set
 # CONFIG_SND_USB_USX2Y is not set
 # CONFIG_SND_USB_CAIAQ is not set
+# CONFIG_SND_USB_US122L is not set
 CONFIG_SND_PCMCIA=y
 # CONFIG_SND_VXPOCKET is not set
 # CONFIG_SND_PDAUDIOCF is not set
@@ -1626,15 +1786,37 @@ CONFIG_HIDRAW=y
 # USB Input Devices
 #
 CONFIG_USB_HID=y
-CONFIG_USB_HIDINPUT_POWERBOOK=y
-CONFIG_HID_FF=y
 CONFIG_HID_PID=y
+CONFIG_USB_HIDDEV=y
+
+#
+# Special HID drivers
+#
+CONFIG_HID_COMPAT=y
+CONFIG_HID_A4TECH=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_BELKIN=y
+CONFIG_HID_CHERRY=y
+CONFIG_HID_CHICONY=y
+CONFIG_HID_CYPRESS=y
+CONFIG_HID_EZKEY=y
+CONFIG_HID_GYRATION=y
+CONFIG_HID_LOGITECH=y
 CONFIG_LOGITECH_FF=y
 # CONFIG_LOGIRUMBLEPAD2_FF is not set
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MONTEREY=y
+CONFIG_HID_NTRIG=y
+CONFIG_HID_PANTHERLORD=y
 CONFIG_PANTHERLORD_FF=y
+CONFIG_HID_PETALYNX=y
+CONFIG_HID_SAMSUNG=y
+CONFIG_HID_SONY=y
+CONFIG_HID_SUNPLUS=y
+# CONFIG_GREENASIA_FF is not set
+CONFIG_HID_TOPSEED=y
 CONFIG_THRUSTMASTER_FF=y
 CONFIG_ZEROPLUS_FF=y
-CONFIG_USB_HIDDEV=y
 CONFIG_USB_SUPPORT=y
 CONFIG_USB_ARCH_HAS_HCD=y
 CONFIG_USB_ARCH_HAS_OHCI=y
@@ -1652,6 +1834,8 @@ CONFIG_USB_DEVICEFS=y
 CONFIG_USB_SUSPEND=y
 # CONFIG_USB_OTG is not set
 CONFIG_USB_MON=y
+# CONFIG_USB_WUSB is not set
+# CONFIG_USB_WUSB_CBAF is not set
 
 #
 # USB Host Controller Drivers
@@ -1660,6 +1844,7 @@ CONFIG_USB_MON=y
 CONFIG_USB_EHCI_HCD=y
 # CONFIG_USB_EHCI_ROOT_HUB_TT is not set
 # CONFIG_USB_EHCI_TT_NEWSCHED is not set
+# CONFIG_USB_OXU210HP_HCD is not set
 # CONFIG_USB_ISP116X_HCD is not set
 # CONFIG_USB_ISP1760_HCD is not set
 CONFIG_USB_OHCI_HCD=y
@@ -1669,6 +1854,8 @@ CONFIG_USB_OHCI_LITTLE_ENDIAN=y
 CONFIG_USB_UHCI_HCD=y
 # CONFIG_USB_SL811_HCD is not set
 # CONFIG_USB_R8A66597_HCD is not set
+# CONFIG_USB_WHCI_HCD is not set
+# CONFIG_USB_HWA_HCD is not set
 
 #
 # USB Device Class drivers
@@ -1676,20 +1863,20 @@ CONFIG_USB_UHCI_HCD=y
 # CONFIG_USB_ACM is not set
 CONFIG_USB_PRINTER=y
 # CONFIG_USB_WDM is not set
+# CONFIG_USB_TMC is not set
 
 #
-# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
+# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may also be needed;
 #
 
 #
-# may also be needed; see USB_STORAGE Help for more information
+# see USB_STORAGE Help for more information
 #
 CONFIG_USB_STORAGE=y
 # CONFIG_USB_STORAGE_DEBUG is not set
 # CONFIG_USB_STORAGE_DATAFAB is not set
 # CONFIG_USB_STORAGE_FREECOM is not set
 # CONFIG_USB_STORAGE_ISD200 is not set
-# CONFIG_USB_STORAGE_DPCM is not set
 # CONFIG_USB_STORAGE_USBAT is not set
 # CONFIG_USB_STORAGE_SDDR09 is not set
 # CONFIG_USB_STORAGE_SDDR55 is not set
@@ -1697,7 +1884,6 @@ CONFIG_USB_STORAGE=y
 # CONFIG_USB_STORAGE_ALAUDA is not set
 # CONFIG_USB_STORAGE_ONETOUCH is not set
 # CONFIG_USB_STORAGE_KARMA is not set
-# CONFIG_USB_STORAGE_SIERRA is not set
 # CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
 CONFIG_USB_LIBUSUAL=y
 
@@ -1718,6 +1904,7 @@ CONFIG_USB_LIBUSUAL=y
 # CONFIG_USB_EMI62 is not set
 # CONFIG_USB_EMI26 is not set
 # CONFIG_USB_ADUTUX is not set
+# CONFIG_USB_SEVSEG is not set
 # CONFIG_USB_RIO500 is not set
 # CONFIG_USB_LEGOTOWER is not set
 # CONFIG_USB_LCD is not set
@@ -1735,7 +1922,13 @@ CONFIG_USB_LIBUSUAL=y
 # CONFIG_USB_IOWARRIOR is not set
 # CONFIG_USB_TEST is not set
 # CONFIG_USB_ISIGHTFW is not set
+# CONFIG_USB_VST is not set
 # CONFIG_USB_GADGET is not set
+
+#
+# OTG and related infrastructure
+#
+# CONFIG_UWB is not set
 # CONFIG_MMC is not set
 # CONFIG_MEMSTICK is not set
 CONFIG_NEW_LEDS=y
@@ -1744,6 +1937,7 @@ CONFIG_LEDS_CLASS=y
 #
 # LED drivers
 #
+# CONFIG_LEDS_ALIX2 is not set
 # CONFIG_LEDS_PCA9532 is not set
 # CONFIG_LEDS_CLEVO_MAIL is not set
 # CONFIG_LEDS_PCA955X is not set
@@ -1754,6 +1948,7 @@ CONFIG_LEDS_CLASS=y
 CONFIG_LEDS_TRIGGERS=y
 # CONFIG_LEDS_TRIGGER_TIMER is not set
 # CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
+# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
 # CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set
 # CONFIG_ACCESSIBILITY is not set
 # CONFIG_INFINIBAND is not set
@@ -1793,6 +1988,7 @@ CONFIG_RTC_INTF_DEV=y
 # CONFIG_RTC_DRV_M41T80 is not set
 # CONFIG_RTC_DRV_S35390A is not set
 # CONFIG_RTC_DRV_FM3130 is not set
+# CONFIG_RTC_DRV_RX8581 is not set
 
 #
 # SPI RTC drivers
@@ -1802,12 +1998,15 @@ CONFIG_RTC_INTF_DEV=y
 # Platform RTC drivers
 #
 CONFIG_RTC_DRV_CMOS=y
+# CONFIG_RTC_DRV_DS1286 is not set
 # CONFIG_RTC_DRV_DS1511 is not set
 # CONFIG_RTC_DRV_DS1553 is not set
 # CONFIG_RTC_DRV_DS1742 is not set
 # CONFIG_RTC_DRV_STK17TA8 is not set
 # CONFIG_RTC_DRV_M48T86 is not set
+# CONFIG_RTC_DRV_M48T35 is not set
 # CONFIG_RTC_DRV_M48T59 is not set
+# CONFIG_RTC_DRV_BQ4802 is not set
 # CONFIG_RTC_DRV_V3020 is not set
 
 #
@@ -1820,6 +2019,21 @@ CONFIG_DMADEVICES=y
 #
 # CONFIG_INTEL_IOATDMA is not set
 # CONFIG_UIO is not set
+# CONFIG_STAGING is not set
+CONFIG_X86_PLATFORM_DEVICES=y
+# CONFIG_ACER_WMI is not set
+# CONFIG_ASUS_LAPTOP is not set
+# CONFIG_FUJITSU_LAPTOP is not set
+# CONFIG_MSI_LAPTOP is not set
+# CONFIG_PANASONIC_LAPTOP is not set
+# CONFIG_COMPAL_LAPTOP is not set
+# CONFIG_SONY_LAPTOP is not set
+# CONFIG_THINKPAD_ACPI is not set
+# CONFIG_INTEL_MENLOW is not set
+CONFIG_EEEPC_LAPTOP=y
+# CONFIG_ACPI_WMI is not set
+# CONFIG_ACPI_ASUS is not set
+# CONFIG_ACPI_TOSHIBA is not set
 
 #
 # Firmware Drivers
@@ -1830,8 +2044,7 @@ CONFIG_EFI_VARS=y
 # CONFIG_DELL_RBU is not set
 # CONFIG_DCDBAS is not set
 CONFIG_DMIID=y
-CONFIG_ISCSI_IBFT_FIND=y
-CONFIG_ISCSI_IBFT=y
+# CONFIG_ISCSI_IBFT_FIND is not set
 
 #
 # File systems
@@ -1841,22 +2054,25 @@ CONFIG_EXT3_FS=y
 CONFIG_EXT3_FS_XATTR=y
 CONFIG_EXT3_FS_POSIX_ACL=y
 CONFIG_EXT3_FS_SECURITY=y
-# CONFIG_EXT4DEV_FS is not set
+# CONFIG_EXT4_FS is not set
 CONFIG_JBD=y
 # CONFIG_JBD_DEBUG is not set
 CONFIG_FS_MBCACHE=y
 # CONFIG_REISERFS_FS is not set
 # CONFIG_JFS_FS is not set
 CONFIG_FS_POSIX_ACL=y
+CONFIG_FILE_LOCKING=y
 # CONFIG_XFS_FS is not set
 # CONFIG_GFS2_FS is not set
 # CONFIG_OCFS2_FS is not set
+# CONFIG_BTRFS_FS is not set
 CONFIG_DNOTIFY=y
 CONFIG_INOTIFY=y
 CONFIG_INOTIFY_USER=y
 CONFIG_QUOTA=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 # CONFIG_PRINT_QUOTA_WARNING is not set
+CONFIG_QUOTA_TREE=y
 # CONFIG_QFMT_V1 is not set
 CONFIG_QFMT_V2=y
 CONFIG_QUOTACTL=y
@@ -1890,16 +2106,14 @@ CONFIG_PROC_FS=y
 CONFIG_PROC_KCORE=y
 CONFIG_PROC_VMCORE=y
 CONFIG_PROC_SYSCTL=y
+CONFIG_PROC_PAGE_MONITOR=y
 CONFIG_SYSFS=y
 CONFIG_TMPFS=y
 CONFIG_TMPFS_POSIX_ACL=y
 CONFIG_HUGETLBFS=y
 CONFIG_HUGETLB_PAGE=y
 # CONFIG_CONFIGFS_FS is not set
-
-#
-# Miscellaneous filesystems
-#
+CONFIG_MISC_FILESYSTEMS=y
 # CONFIG_ADFS_FS is not set
 # CONFIG_AFFS_FS is not set
 # CONFIG_ECRYPT_FS is not set
@@ -1909,6 +2123,7 @@ CONFIG_HUGETLB_PAGE=y
 # CONFIG_BFS_FS is not set
 # CONFIG_EFS_FS is not set
 # CONFIG_CRAMFS is not set
+# CONFIG_SQUASHFS is not set
 # CONFIG_VXFS_FS is not set
 # CONFIG_MINIX_FS is not set
 # CONFIG_OMFS_FS is not set
@@ -1930,6 +2145,7 @@ CONFIG_NFS_ACL_SUPPORT=y
 CONFIG_NFS_COMMON=y
 CONFIG_SUNRPC=y
 CONFIG_SUNRPC_GSS=y
+# CONFIG_SUNRPC_REGISTER_V4 is not set
 CONFIG_RPCSEC_GSS_KRB5=y
 # CONFIG_RPCSEC_GSS_SPKM3 is not set
 # CONFIG_SMB_FS is not set
@@ -2006,7 +2222,7 @@ CONFIG_NLS_UTF8=y
 #
 CONFIG_TRACE_IRQFLAGS_SUPPORT=y
 CONFIG_PRINTK_TIME=y
-CONFIG_ENABLE_WARN_DEPRECATED=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
 CONFIG_ENABLE_MUST_CHECK=y
 CONFIG_FRAME_WARN=2048
 CONFIG_MAGIC_SYSRQ=y
@@ -2035,40 +2251,60 @@ CONFIG_TIMER_STATS=y
 CONFIG_DEBUG_BUGVERBOSE=y
 # CONFIG_DEBUG_INFO is not set
 # CONFIG_DEBUG_VM is not set
+# CONFIG_DEBUG_VIRTUAL is not set
 # CONFIG_DEBUG_WRITECOUNT is not set
 CONFIG_DEBUG_MEMORY_INIT=y
 # CONFIG_DEBUG_LIST is not set
 # CONFIG_DEBUG_SG is not set
+# CONFIG_DEBUG_NOTIFIERS is not set
+CONFIG_ARCH_WANT_FRAME_POINTERS=y
 CONFIG_FRAME_POINTER=y
 # CONFIG_BOOT_PRINTK_DELAY is not set
 # CONFIG_RCU_TORTURE_TEST is not set
+# CONFIG_RCU_CPU_STALL_DETECTOR is not set
 # CONFIG_KPROBES_SANITY_TEST is not set
 # CONFIG_BACKTRACE_SELF_TEST is not set
+# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
 # CONFIG_LKDTM is not set
 # CONFIG_FAULT_INJECTION is not set
 # CONFIG_LATENCYTOP is not set
 CONFIG_SYSCTL_SYSCALL_CHECK=y
-CONFIG_HAVE_FTRACE=y
+CONFIG_USER_STACKTRACE_SUPPORT=y
+CONFIG_HAVE_FUNCTION_TRACER=y
+CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
+CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
 CONFIG_HAVE_DYNAMIC_FTRACE=y
-# CONFIG_FTRACE is not set
+CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
+CONFIG_HAVE_HW_BRANCH_TRACER=y
+
+#
+# Tracers
+#
+# CONFIG_FUNCTION_TRACER is not set
 # CONFIG_IRQSOFF_TRACER is not set
 # CONFIG_SYSPROF_TRACER is not set
 # CONFIG_SCHED_TRACER is not set
 # CONFIG_CONTEXT_SWITCH_TRACER is not set
+# CONFIG_BOOT_TRACER is not set
+# CONFIG_TRACE_BRANCH_PROFILING is not set
+# CONFIG_POWER_TRACER is not set
+# CONFIG_STACK_TRACER is not set
+# CONFIG_HW_BRANCH_TRACER is not set
 CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
+# CONFIG_DYNAMIC_PRINTK_DEBUG is not set
 # CONFIG_SAMPLES is not set
 CONFIG_HAVE_ARCH_KGDB=y
 # CONFIG_KGDB is not set
 # CONFIG_STRICT_DEVMEM is not set
 CONFIG_X86_VERBOSE_BOOTUP=y
 CONFIG_EARLY_PRINTK=y
+CONFIG_EARLY_PRINTK_DBGP=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 CONFIG_DEBUG_STACK_USAGE=y
 # CONFIG_DEBUG_PAGEALLOC is not set
 # CONFIG_DEBUG_PER_CPU_MAPS is not set
 # CONFIG_X86_PTDUMP is not set
 CONFIG_DEBUG_RODATA=y
-# CONFIG_DIRECT_GBPAGES is not set
 # CONFIG_DEBUG_RODATA_TEST is not set
 CONFIG_DEBUG_NX_TEST=m
 # CONFIG_IOMMU_DEBUG is not set
@@ -2092,8 +2328,10 @@ CONFIG_OPTIMIZE_INLINING=y
 CONFIG_KEYS=y
 CONFIG_KEYS_DEBUG_PROC_KEYS=y
 CONFIG_SECURITY=y
+# CONFIG_SECURITYFS is not set
 CONFIG_SECURITY_NETWORK=y
 # CONFIG_SECURITY_NETWORK_XFRM is not set
+# CONFIG_SECURITY_PATH is not set
 CONFIG_SECURITY_FILE_CAPABILITIES=y
 # CONFIG_SECURITY_ROOTPLUG is not set
 CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=65536
@@ -2104,7 +2342,6 @@ CONFIG_SECURITY_SELINUX_DISABLE=y
 CONFIG_SECURITY_SELINUX_DEVELOP=y
 CONFIG_SECURITY_SELINUX_AVC_STATS=y
 CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
-# CONFIG_SECURITY_SELINUX_ENABLE_SECMARK_DEFAULT is not set
 # CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set
 # CONFIG_SECURITY_SMACK is not set
 CONFIG_CRYPTO=y
@@ -2112,11 +2349,18 @@ CONFIG_CRYPTO=y
 #
 # Crypto core or helper
 #
+# CONFIG_CRYPTO_FIPS is not set
 CONFIG_CRYPTO_ALGAPI=y
+CONFIG_CRYPTO_ALGAPI2=y
 CONFIG_CRYPTO_AEAD=y
+CONFIG_CRYPTO_AEAD2=y
 CONFIG_CRYPTO_BLKCIPHER=y
+CONFIG_CRYPTO_BLKCIPHER2=y
 CONFIG_CRYPTO_HASH=y
+CONFIG_CRYPTO_HASH2=y
+CONFIG_CRYPTO_RNG2=y
 CONFIG_CRYPTO_MANAGER=y
+CONFIG_CRYPTO_MANAGER2=y
 # CONFIG_CRYPTO_GF128MUL is not set
 # CONFIG_CRYPTO_NULL is not set
 # CONFIG_CRYPTO_CRYPTD is not set
@@ -2151,6 +2395,7 @@ CONFIG_CRYPTO_HMAC=y
 # Digest
 #
 # CONFIG_CRYPTO_CRC32C is not set
+# CONFIG_CRYPTO_CRC32C_INTEL is not set
 # CONFIG_CRYPTO_MD4 is not set
 CONFIG_CRYPTO_MD5=y
 # CONFIG_CRYPTO_MICHAEL_MIC is not set
@@ -2191,6 +2436,11 @@ CONFIG_CRYPTO_DES=y
 #
 # CONFIG_CRYPTO_DEFLATE is not set
 # CONFIG_CRYPTO_LZO is not set
+
+#
+# Random Number Generation
+#
+# CONFIG_CRYPTO_ANSI_CPRNG is not set
 CONFIG_CRYPTO_HW=y
 # CONFIG_CRYPTO_DEV_HIFN_795X is not set
 CONFIG_HAVE_KVM=y
@@ -2205,6 +2455,7 @@ CONFIG_VIRTUALIZATION=y
 CONFIG_BITREVERSE=y
 CONFIG_GENERIC_FIND_FIRST_BIT=y
 CONFIG_GENERIC_FIND_NEXT_BIT=y
+CONFIG_GENERIC_FIND_LAST_BIT=y
 # CONFIG_CRC_CCITT is not set
 # CONFIG_CRC16 is not set
 CONFIG_CRC_T10DIF=y
diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 9dabd00e9805..588a7aa937e1 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -33,8 +33,6 @@
 #include <asm/sigframe.h>
 #include <asm/sys_ia32.h>
 
-#define DEBUG_SIG 0
-
 #define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP)))
 
 #define FIX_EFLAGS	(X86_EFLAGS_AC | X86_EFLAGS_OF | \
@@ -46,78 +44,83 @@ void signal_fault(struct pt_regs *regs, void __user *frame, char *where);
 
 int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 {
-	int err;
+	int err = 0;
 
 	if (!access_ok(VERIFY_WRITE, to, sizeof(compat_siginfo_t)))
 		return -EFAULT;
 
-	/* If you change siginfo_t structure, please make sure that
-	   this code is fixed accordingly.
-	   It should never copy any pad contained in the structure
-	   to avoid security leaks, but must copy the generic
-	   3 ints plus the relevant union member.  */
-	err = __put_user(from->si_signo, &to->si_signo);
-	err |= __put_user(from->si_errno, &to->si_errno);
-	err |= __put_user((short)from->si_code, &to->si_code);
-
-	if (from->si_code < 0) {
-		err |= __put_user(from->si_pid, &to->si_pid);
-		err |= __put_user(from->si_uid, &to->si_uid);
-		err |= __put_user(ptr_to_compat(from->si_ptr), &to->si_ptr);
-	} else {
-		/*
-		 * First 32bits of unions are always present:
-		 * si_pid === si_band === si_tid === si_addr(LS half)
-		 */
-		err |= __put_user(from->_sifields._pad[0],
-				  &to->_sifields._pad[0]);
-		switch (from->si_code >> 16) {
-		case __SI_FAULT >> 16:
-			break;
-		case __SI_CHLD >> 16:
-			err |= __put_user(from->si_utime, &to->si_utime);
-			err |= __put_user(from->si_stime, &to->si_stime);
-			err |= __put_user(from->si_status, &to->si_status);
-			/* FALL THROUGH */
-		default:
-		case __SI_KILL >> 16:
-			err |= __put_user(from->si_uid, &to->si_uid);
-			break;
-		case __SI_POLL >> 16:
-			err |= __put_user(from->si_fd, &to->si_fd);
-			break;
-		case __SI_TIMER >> 16:
-			err |= __put_user(from->si_overrun, &to->si_overrun);
-			err |= __put_user(ptr_to_compat(from->si_ptr),
-					  &to->si_ptr);
-			break;
-			 /* This is not generated by the kernel as of now.  */
-		case __SI_RT >> 16:
-		case __SI_MESGQ >> 16:
-			err |= __put_user(from->si_uid, &to->si_uid);
-			err |= __put_user(from->si_int, &to->si_int);
-			break;
+	put_user_try {
+		/* If you change siginfo_t structure, please make sure that
+		   this code is fixed accordingly.
+		   It should never copy any pad contained in the structure
+		   to avoid security leaks, but must copy the generic
+		   3 ints plus the relevant union member.  */
+		put_user_ex(from->si_signo, &to->si_signo);
+		put_user_ex(from->si_errno, &to->si_errno);
+		put_user_ex((short)from->si_code, &to->si_code);
+
+		if (from->si_code < 0) {
+			put_user_ex(from->si_pid, &to->si_pid);
+			put_user_ex(from->si_uid, &to->si_uid);
+			put_user_ex(ptr_to_compat(from->si_ptr), &to->si_ptr);
+		} else {
+			/*
+			 * First 32bits of unions are always present:
+			 * si_pid === si_band === si_tid === si_addr(LS half)
+			 */
+			put_user_ex(from->_sifields._pad[0],
+					  &to->_sifields._pad[0]);
+			switch (from->si_code >> 16) {
+			case __SI_FAULT >> 16:
+				break;
+			case __SI_CHLD >> 16:
+				put_user_ex(from->si_utime, &to->si_utime);
+				put_user_ex(from->si_stime, &to->si_stime);
+				put_user_ex(from->si_status, &to->si_status);
+				/* FALL THROUGH */
+			default:
+			case __SI_KILL >> 16:
+				put_user_ex(from->si_uid, &to->si_uid);
+				break;
+			case __SI_POLL >> 16:
+				put_user_ex(from->si_fd, &to->si_fd);
+				break;
+			case __SI_TIMER >> 16:
+				put_user_ex(from->si_overrun, &to->si_overrun);
+				put_user_ex(ptr_to_compat(from->si_ptr),
+					    &to->si_ptr);
+				break;
+				 /* This is not generated by the kernel as of now.  */
+			case __SI_RT >> 16:
+			case __SI_MESGQ >> 16:
+				put_user_ex(from->si_uid, &to->si_uid);
+				put_user_ex(from->si_int, &to->si_int);
+				break;
+			}
 		}
-	}
+	} put_user_catch(err);
+
 	return err;
 }
 
 int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
 {
-	int err;
+	int err = 0;
 	u32 ptr32;
 
 	if (!access_ok(VERIFY_READ, from, sizeof(compat_siginfo_t)))
 		return -EFAULT;
 
-	err = __get_user(to->si_signo, &from->si_signo);
-	err |= __get_user(to->si_errno, &from->si_errno);
-	err |= __get_user(to->si_code, &from->si_code);
+	get_user_try {
+		get_user_ex(to->si_signo, &from->si_signo);
+		get_user_ex(to->si_errno, &from->si_errno);
+		get_user_ex(to->si_code, &from->si_code);
 
-	err |= __get_user(to->si_pid, &from->si_pid);
-	err |= __get_user(to->si_uid, &from->si_uid);
-	err |= __get_user(ptr32, &from->si_ptr);
-	to->si_ptr = compat_ptr(ptr32);
+		get_user_ex(to->si_pid, &from->si_pid);
+		get_user_ex(to->si_uid, &from->si_uid);
+		get_user_ex(ptr32, &from->si_ptr);
+		to->si_ptr = compat_ptr(ptr32);
+	} get_user_catch(err);
 
 	return err;
 }
@@ -142,17 +145,23 @@ asmlinkage long sys32_sigaltstack(const stack_ia32_t __user *uss_ptr,
 				  struct pt_regs *regs)
 {
 	stack_t uss, uoss;
-	int ret;
+	int ret, err = 0;
 	mm_segment_t seg;
 
 	if (uss_ptr) {
 		u32 ptr;
 
 		memset(&uss, 0, sizeof(stack_t));
-		if (!access_ok(VERIFY_READ, uss_ptr, sizeof(stack_ia32_t)) ||
-			    __get_user(ptr, &uss_ptr->ss_sp) ||
-			    __get_user(uss.ss_flags, &uss_ptr->ss_flags) ||
-			    __get_user(uss.ss_size, &uss_ptr->ss_size))
+		if (!access_ok(VERIFY_READ, uss_ptr, sizeof(stack_ia32_t)))
+			return -EFAULT;
+
+		get_user_try {
+			get_user_ex(ptr, &uss_ptr->ss_sp);
+			get_user_ex(uss.ss_flags, &uss_ptr->ss_flags);
+			get_user_ex(uss.ss_size, &uss_ptr->ss_size);
+		} get_user_catch(err);
+
+		if (err)
 			return -EFAULT;
 		uss.ss_sp = compat_ptr(ptr);
 	}
@@ -161,10 +170,16 @@ asmlinkage long sys32_sigaltstack(const stack_ia32_t __user *uss_ptr,
 	ret = do_sigaltstack(uss_ptr ? &uss : NULL, &uoss, regs->sp);
 	set_fs(seg);
 	if (ret >= 0 && uoss_ptr)  {
-		if (!access_ok(VERIFY_WRITE, uoss_ptr, sizeof(stack_ia32_t)) ||
-		    __put_user(ptr_to_compat(uoss.ss_sp), &uoss_ptr->ss_sp) ||
-		    __put_user(uoss.ss_flags, &uoss_ptr->ss_flags) ||
-		    __put_user(uoss.ss_size, &uoss_ptr->ss_size))
+		if (!access_ok(VERIFY_WRITE, uoss_ptr, sizeof(stack_ia32_t)))
+			return -EFAULT;
+
+		put_user_try {
+			put_user_ex(ptr_to_compat(uoss.ss_sp), &uoss_ptr->ss_sp);
+			put_user_ex(uoss.ss_flags, &uoss_ptr->ss_flags);
+			put_user_ex(uoss.ss_size, &uoss_ptr->ss_size);
+		} put_user_catch(err);
+
+		if (err)
 			ret = -EFAULT;
 	}
 	return ret;
@@ -173,75 +188,78 @@ asmlinkage long sys32_sigaltstack(const stack_ia32_t __user *uss_ptr,
 /*
  * Do a signal return; undo the signal stack.
  */
+#define loadsegment_gs(v)	load_gs_index(v)
+#define loadsegment_fs(v)	loadsegment(fs, v)
+#define loadsegment_ds(v)	loadsegment(ds, v)
+#define loadsegment_es(v)	loadsegment(es, v)
+
+#define get_user_seg(seg)	({ unsigned int v; savesegment(seg, v); v; })
+#define set_user_seg(seg, v)	loadsegment_##seg(v)
+
 #define COPY(x)			{		\
-	err |= __get_user(regs->x, &sc->x);	\
+	get_user_ex(regs->x, &sc->x);		\
 }
 
-#define COPY_SEG_CPL3(seg)	{			\
-		unsigned short tmp;			\
-		err |= __get_user(tmp, &sc->seg);	\
-		regs->seg = tmp | 3;			\
-}
+#define GET_SEG(seg)		({			\
+	unsigned short tmp;				\
+	get_user_ex(tmp, &sc->seg);			\
+	tmp;						\
+})
+
+#define COPY_SEG_CPL3(seg)	do {			\
+	regs->seg = GET_SEG(seg) | 3;			\
+} while (0)
 
 #define RELOAD_SEG(seg)		{		\
-	unsigned int cur, pre;			\
-	err |= __get_user(pre, &sc->seg);	\
-	savesegment(seg, cur);			\
+	unsigned int pre = GET_SEG(seg);	\
+	unsigned int cur = get_user_seg(seg);	\
 	pre |= 3;				\
 	if (pre != cur)				\
-		loadsegment(seg, pre);		\
+		set_user_seg(seg, pre);		\
 }
 
 static int ia32_restore_sigcontext(struct pt_regs *regs,
 				   struct sigcontext_ia32 __user *sc,
 				   unsigned int *pax)
 {
-	unsigned int tmpflags, gs, oldgs, err = 0;
+	unsigned int tmpflags, err = 0;
 	void __user *buf;
 	u32 tmp;
 
 	/* Always make any pending restarted system calls return -EINTR */
 	current_thread_info()->restart_block.fn = do_no_restart_syscall;
 
-#if DEBUG_SIG
-	printk(KERN_DEBUG "SIG restore_sigcontext: "
-	       "sc=%p err(%x) eip(%x) cs(%x) flg(%x)\n",
-	       sc, sc->err, sc->ip, sc->cs, sc->flags);
-#endif
-
-	/*
-	 * Reload fs and gs if they have changed in the signal
-	 * handler.  This does not handle long fs/gs base changes in
-	 * the handler, but does not clobber them at least in the
-	 * normal case.
-	 */
-	err |= __get_user(gs, &sc->gs);
-	gs |= 3;
-	savesegment(gs, oldgs);
-	if (gs != oldgs)
-		load_gs_index(gs);
-
-	RELOAD_SEG(fs);
-	RELOAD_SEG(ds);
-	RELOAD_SEG(es);
-
-	COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
-	COPY(dx); COPY(cx); COPY(ip);
-	/* Don't touch extended registers */
-
-	COPY_SEG_CPL3(cs);
-	COPY_SEG_CPL3(ss);
-
-	err |= __get_user(tmpflags, &sc->flags);
-	regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
-	/* disable syscall checks */
-	regs->orig_ax = -1;
-
-	err |= __get_user(tmp, &sc->fpstate);
-	buf = compat_ptr(tmp);
-	err |= restore_i387_xstate_ia32(buf);
-
-	err |= __get_user(*pax, &sc->ax);
+	get_user_try {
+		/*
+		 * Reload fs and gs if they have changed in the signal
+		 * handler.  This does not handle long fs/gs base changes in
+		 * the handler, but does not clobber them at least in the
+		 * normal case.
+		 */
+		RELOAD_SEG(gs);
+		RELOAD_SEG(fs);
+		RELOAD_SEG(ds);
+		RELOAD_SEG(es);
+
+		COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
+		COPY(dx); COPY(cx); COPY(ip);
+		/* Don't touch extended registers */
+
+		COPY_SEG_CPL3(cs);
+		COPY_SEG_CPL3(ss);
+
+		get_user_ex(tmpflags, &sc->flags);
+		regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
+		/* disable syscall checks */
+		regs->orig_ax = -1;
+
+		get_user_ex(tmp, &sc->fpstate);
+		buf = compat_ptr(tmp);
+		err |= restore_i387_xstate_ia32(buf);
+
+		get_user_ex(*pax, &sc->ax);
+	} get_user_catch(err);
+
 	return err;
 }
 
@@ -317,38 +335,36 @@ static int ia32_setup_sigcontext(struct sigcontext_ia32 __user *sc,
 				 void __user *fpstate,
 				 struct pt_regs *regs, unsigned int mask)
 {
-	int tmp, err = 0;
-
-	savesegment(gs, tmp);
-	err |= __put_user(tmp, (unsigned int __user *)&sc->gs);
-	savesegment(fs, tmp);
-	err |= __put_user(tmp, (unsigned int __user *)&sc->fs);
-	savesegment(ds, tmp);
-	err |= __put_user(tmp, (unsigned int __user *)&sc->ds);
-	savesegment(es, tmp);
-	err |= __put_user(tmp, (unsigned int __user *)&sc->es);
-
-	err |= __put_user(regs->di, &sc->di);
-	err |= __put_user(regs->si, &sc->si);
-	err |= __put_user(regs->bp, &sc->bp);
-	err |= __put_user(regs->sp, &sc->sp);
-	err |= __put_user(regs->bx, &sc->bx);
-	err |= __put_user(regs->dx, &sc->dx);
-	err |= __put_user(regs->cx, &sc->cx);
-	err |= __put_user(regs->ax, &sc->ax);
-	err |= __put_user(current->thread.trap_no, &sc->trapno);
-	err |= __put_user(current->thread.error_code, &sc->err);
-	err |= __put_user(regs->ip, &sc->ip);
-	err |= __put_user(regs->cs, (unsigned int __user *)&sc->cs);
-	err |= __put_user(regs->flags, &sc->flags);
-	err |= __put_user(regs->sp, &sc->sp_at_signal);
-	err |= __put_user(regs->ss, (unsigned int __user *)&sc->ss);
-
-	err |= __put_user(ptr_to_compat(fpstate), &sc->fpstate);
-
-	/* non-iBCS2 extensions.. */
-	err |= __put_user(mask, &sc->oldmask);
-	err |= __put_user(current->thread.cr2, &sc->cr2);
+	int err = 0;
+
+	put_user_try {
+		put_user_ex(get_user_seg(gs), (unsigned int __user *)&sc->gs);
+		put_user_ex(get_user_seg(fs), (unsigned int __user *)&sc->fs);
+		put_user_ex(get_user_seg(ds), (unsigned int __user *)&sc->ds);
+		put_user_ex(get_user_seg(es), (unsigned int __user *)&sc->es);
+
+		put_user_ex(regs->di, &sc->di);
+		put_user_ex(regs->si, &sc->si);
+		put_user_ex(regs->bp, &sc->bp);
+		put_user_ex(regs->sp, &sc->sp);
+		put_user_ex(regs->bx, &sc->bx);
+		put_user_ex(regs->dx, &sc->dx);
+		put_user_ex(regs->cx, &sc->cx);
+		put_user_ex(regs->ax, &sc->ax);
+		put_user_ex(current->thread.trap_no, &sc->trapno);
+		put_user_ex(current->thread.error_code, &sc->err);
+		put_user_ex(regs->ip, &sc->ip);
+		put_user_ex(regs->cs, (unsigned int __user *)&sc->cs);
+		put_user_ex(regs->flags, &sc->flags);
+		put_user_ex(regs->sp, &sc->sp_at_signal);
+		put_user_ex(regs->ss, (unsigned int __user *)&sc->ss);
+
+		put_user_ex(ptr_to_compat(fpstate), &sc->fpstate);
+
+		/* non-iBCS2 extensions.. */
+		put_user_ex(mask, &sc->oldmask);
+		put_user_ex(current->thread.cr2, &sc->cr2);
+	} put_user_catch(err);
 
 	return err;
 }
@@ -437,13 +453,17 @@ int ia32_setup_frame(int sig, struct k_sigaction *ka,
 		else
 			restorer = &frame->retcode;
 	}
-	err |= __put_user(ptr_to_compat(restorer), &frame->pretcode);
 
-	/*
-	 * These are actually not used anymore, but left because some
-	 * gdb versions depend on them as a marker.
-	 */
-	err |= __put_user(*((u64 *)&code), (u64 *)frame->retcode);
+	put_user_try {
+		put_user_ex(ptr_to_compat(restorer), &frame->pretcode);
+
+		/*
+		 * These are actually not used anymore, but left because some
+		 * gdb versions depend on them as a marker.
+		 */
+		put_user_ex(*((u64 *)&code), (u64 *)frame->retcode);
+	} put_user_catch(err);
+
 	if (err)
 		return -EFAULT;
 
@@ -462,11 +482,6 @@ int ia32_setup_frame(int sig, struct k_sigaction *ka,
 	regs->cs = __USER32_CS;
 	regs->ss = __USER32_DS;
 
-#if DEBUG_SIG
-	printk(KERN_DEBUG "SIG deliver (%s:%d): sp=%p pc=%lx ra=%u\n",
-	       current->comm, current->pid, frame, regs->ip, frame->pretcode);
-#endif
-
 	return 0;
 }
 
@@ -496,41 +511,40 @@ int ia32_setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
 		return -EFAULT;
 
-	err |= __put_user(sig, &frame->sig);
-	err |= __put_user(ptr_to_compat(&frame->info), &frame->pinfo);
-	err |= __put_user(ptr_to_compat(&frame->uc), &frame->puc);
-	err |= copy_siginfo_to_user32(&frame->info, info);
-	if (err)
-		return -EFAULT;
+	put_user_try {
+		put_user_ex(sig, &frame->sig);
+		put_user_ex(ptr_to_compat(&frame->info), &frame->pinfo);
+		put_user_ex(ptr_to_compat(&frame->uc), &frame->puc);
+		err |= copy_siginfo_to_user32(&frame->info, info);
 
-	/* Create the ucontext.  */
-	if (cpu_has_xsave)
-		err |= __put_user(UC_FP_XSTATE, &frame->uc.uc_flags);
-	else
-		err |= __put_user(0, &frame->uc.uc_flags);
-	err |= __put_user(0, &frame->uc.uc_link);
-	err |= __put_user(current->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
-	err |= __put_user(sas_ss_flags(regs->sp),
-			  &frame->uc.uc_stack.ss_flags);
-	err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
-	err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
-				     regs, set->sig[0]);
-	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
-	if (err)
-		return -EFAULT;
+		/* Create the ucontext.  */
+		if (cpu_has_xsave)
+			put_user_ex(UC_FP_XSTATE, &frame->uc.uc_flags);
+		else
+			put_user_ex(0, &frame->uc.uc_flags);
+		put_user_ex(0, &frame->uc.uc_link);
+		put_user_ex(current->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
+		put_user_ex(sas_ss_flags(regs->sp),
+			    &frame->uc.uc_stack.ss_flags);
+		put_user_ex(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
+		err |= ia32_setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
+					     regs, set->sig[0]);
+		err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
+
+		if (ka->sa.sa_flags & SA_RESTORER)
+			restorer = ka->sa.sa_restorer;
+		else
+			restorer = VDSO32_SYMBOL(current->mm->context.vdso,
+						 rt_sigreturn);
+		put_user_ex(ptr_to_compat(restorer), &frame->pretcode);
+
+		/*
+		 * Not actually used anymore, but left because some gdb
+		 * versions need it.
+		 */
+		put_user_ex(*((u64 *)&code), (u64 *)frame->retcode);
+	} put_user_catch(err);
 
-	if (ka->sa.sa_flags & SA_RESTORER)
-		restorer = ka->sa.sa_restorer;
-	else
-		restorer = VDSO32_SYMBOL(current->mm->context.vdso,
-					 rt_sigreturn);
-	err |= __put_user(ptr_to_compat(restorer), &frame->pretcode);
-
-	/*
-	 * Not actually used anymore, but left because some gdb
-	 * versions need it.
-	 */
-	err |= __put_user(*((u64 *)&code), (u64 *)frame->retcode);
 	if (err)
 		return -EFAULT;
 
@@ -549,10 +563,5 @@ int ia32_setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	regs->cs = __USER32_CS;
 	regs->ss = __USER32_DS;
 
-#if DEBUG_SIG
-	printk(KERN_DEBUG "SIG deliver (%s:%d): sp=%p pc=%lx ra=%u\n",
-	       current->comm, current->pid, frame, regs->ip, frame->pretcode);
-#endif
-
 	return 0;
 }
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 8ef8876666b2..db0c803170ab 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -112,8 +112,8 @@ ENTRY(ia32_sysenter_target)
 	CFI_DEF_CFA	rsp,0
 	CFI_REGISTER	rsp,rbp
 	SWAPGS_UNSAFE_STACK
-	movq	%gs:pda_kernelstack, %rsp
-	addq	$(PDA_STACKOFFSET),%rsp	
+	movq	PER_CPU_VAR(kernel_stack), %rsp
+	addq	$(KERNEL_STACK_OFFSET),%rsp
 	/*
 	 * No need to follow this irqs on/off section: the syscall
 	 * disabled irqs, here we enable it straight after entry:
@@ -273,13 +273,13 @@ ENDPROC(ia32_sysenter_target)
 ENTRY(ia32_cstar_target)
 	CFI_STARTPROC32	simple
 	CFI_SIGNAL_FRAME
-	CFI_DEF_CFA	rsp,PDA_STACKOFFSET
+	CFI_DEF_CFA	rsp,KERNEL_STACK_OFFSET
 	CFI_REGISTER	rip,rcx
 	/*CFI_REGISTER	rflags,r11*/
 	SWAPGS_UNSAFE_STACK
 	movl	%esp,%r8d
 	CFI_REGISTER	rsp,r8
-	movq	%gs:pda_kernelstack,%rsp
+	movq	PER_CPU_VAR(kernel_stack),%rsp
 	/*
 	 * No need to follow this irqs on/off section: the syscall
 	 * disabled irqs and here we enable it straight after entry:
diff --git a/arch/x86/include/asm/a.out-core.h b/arch/x86/include/asm/a.out-core.h
index 3c601f8224be..bb70e397aa84 100644
--- a/arch/x86/include/asm/a.out-core.h
+++ b/arch/x86/include/asm/a.out-core.h
@@ -55,7 +55,7 @@ static inline void aout_dump_thread(struct pt_regs *regs, struct user *dump)
 	dump->regs.ds = (u16)regs->ds;
 	dump->regs.es = (u16)regs->es;
 	dump->regs.fs = (u16)regs->fs;
-	savesegment(gs, dump->regs.gs);
+	dump->regs.gs = get_user_gs(regs);
 	dump->regs.orig_ax = regs->orig_ax;
 	dump->regs.ip = regs->ip;
 	dump->regs.cs = (u16)regs->cs;
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 9830681446ad..4518dc500903 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -102,9 +102,6 @@ static inline void disable_acpi(void)
 	acpi_noirq = 1;
 }
 
-/* Fixmap pages to reserve for ACPI boot-time tables (see fixmap.h) */
-#define FIX_ACPI_PAGES 4
-
 extern int acpi_gsi_to_irq(u32 gsi, unsigned int *irq);
 
 static inline void acpi_noirq_set(void) { acpi_noirq = 1; }
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index ab1d51a8855e..4ef949c1972e 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -1,15 +1,18 @@
 #ifndef _ASM_X86_APIC_H
 #define _ASM_X86_APIC_H
 
-#include <linux/pm.h>
+#include <linux/cpumask.h>
 #include <linux/delay.h>
+#include <linux/pm.h>
 
 #include <asm/alternative.h>
-#include <asm/fixmap.h>
-#include <asm/apicdef.h>
+#include <asm/cpufeature.h>
 #include <asm/processor.h>
+#include <asm/apicdef.h>
+#include <asm/atomic.h>
+#include <asm/fixmap.h>
+#include <asm/mpspec.h>
 #include <asm/system.h>
-#include <asm/cpufeature.h>
 #include <asm/msr.h>
 
 #define ARCH_APICTIMER_STOPS_ON_C3	1
@@ -33,7 +36,13 @@
 	} while (0)
 
 
+#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32)
 extern void generic_apic_probe(void);
+#else
+static inline void generic_apic_probe(void)
+{
+}
+#endif
 
 #ifdef CONFIG_X86_LOCAL_APIC
 
@@ -41,6 +50,21 @@ extern unsigned int apic_verbosity;
 extern int local_apic_timer_c2_ok;
 
 extern int disable_apic;
+
+#ifdef CONFIG_SMP
+extern void __inquire_remote_apic(int apicid);
+#else /* CONFIG_SMP */
+static inline void __inquire_remote_apic(int apicid)
+{
+}
+#endif /* CONFIG_SMP */
+
+static inline void default_inquire_remote_apic(int apicid)
+{
+	if (apic_verbosity >= APIC_DEBUG)
+		__inquire_remote_apic(apicid);
+}
+
 /*
  * Basic functions accessing APICs.
  */
@@ -51,7 +75,14 @@ extern int disable_apic;
 #define setup_secondary_clock setup_secondary_APIC_clock
 #endif
 
+#ifdef CONFIG_X86_VSMP
 extern int is_vsmp_box(void);
+#else
+static inline int is_vsmp_box(void)
+{
+	return 0;
+}
+#endif
 extern void xapic_wait_icr_idle(void);
 extern u32 safe_xapic_wait_icr_idle(void);
 extern void xapic_icr_write(u32, u32);
@@ -71,6 +102,12 @@ static inline u32 native_apic_mem_read(u32 reg)
 	return *((volatile u32 *)(APIC_BASE + reg));
 }
 
+extern void native_apic_wait_icr_idle(void);
+extern u32 native_safe_apic_wait_icr_idle(void);
+extern void native_apic_icr_write(u32 low, u32 id);
+extern u64 native_apic_icr_read(void);
+
+#ifdef CONFIG_X86_X2APIC
 static inline void native_apic_msr_write(u32 reg, u32 v)
 {
 	if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR ||
@@ -91,8 +128,32 @@ static inline u32 native_apic_msr_read(u32 reg)
 	return low;
 }
 
-#ifndef CONFIG_X86_32
-extern int x2apic;
+static inline void native_x2apic_wait_icr_idle(void)
+{
+	/* no need to wait for icr idle in x2apic */
+	return;
+}
+
+static inline u32 native_safe_x2apic_wait_icr_idle(void)
+{
+	/* no need to wait for icr idle in x2apic */
+	return 0;
+}
+
+static inline void native_x2apic_icr_write(u32 low, u32 id)
+{
+	wrmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), ((__u64) id) << 32 | low);
+}
+
+static inline u64 native_x2apic_icr_read(void)
+{
+	unsigned long val;
+
+	rdmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), val);
+	return val;
+}
+
+extern int x2apic, x2apic_phys;
 extern void check_x2apic(void);
 extern void enable_x2apic(void);
 extern void enable_IR_x2apic(void);
@@ -110,30 +171,24 @@ static inline int x2apic_enabled(void)
 	return 0;
 }
 #else
-#define x2apic_enabled()	0
+static inline void check_x2apic(void)
+{
+}
+static inline void enable_x2apic(void)
+{
+}
+static inline void enable_IR_x2apic(void)
+{
+}
+static inline int x2apic_enabled(void)
+{
+	return 0;
+}
 #endif
 
-struct apic_ops {
-	u32 (*read)(u32 reg);
-	void (*write)(u32 reg, u32 v);
-	u64 (*icr_read)(void);
-	void (*icr_write)(u32 low, u32 high);
-	void (*wait_icr_idle)(void);
-	u32 (*safe_wait_icr_idle)(void);
-};
-
-extern struct apic_ops *apic_ops;
-
-#define apic_read (apic_ops->read)
-#define apic_write (apic_ops->write)
-#define apic_icr_read (apic_ops->icr_read)
-#define apic_icr_write (apic_ops->icr_write)
-#define apic_wait_icr_idle (apic_ops->wait_icr_idle)
-#define safe_apic_wait_icr_idle (apic_ops->safe_wait_icr_idle)
-
 extern int get_physical_broadcast(void);
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_X2APIC
 static inline void ack_x2APIC_irq(void)
 {
 	/* Docs say use 0 for future compatibility */
@@ -141,18 +196,6 @@ static inline void ack_x2APIC_irq(void)
 }
 #endif
 
-
-static inline void ack_APIC_irq(void)
-{
-	/*
-	 * ack_APIC_irq() actually gets compiled as a single instruction
-	 * ... yummie.
-	 */
-
-	/* Docs say use 0 for future compatibility */
-	apic_write(APIC_EOI, 0);
-}
-
 extern int lapic_get_maxlvt(void);
 extern void clear_local_APIC(void);
 extern void connect_bsp_APIC(void);
@@ -196,4 +239,327 @@ static inline void disable_local_APIC(void) { }
 
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
+#ifdef CONFIG_X86_64
+#define	SET_APIC_ID(x)		(apic->set_apic_id(x))
+#else
+
+#endif
+
+/*
+ * Copyright 2004 James Cleverdon, IBM.
+ * Subject to the GNU Public License, v.2
+ *
+ * Generic APIC sub-arch data struct.
+ *
+ * Hacked for x86-64 by James Cleverdon from i386 architecture code by
+ * Martin Bligh, Andi Kleen, James Bottomley, John Stultz, and
+ * James Cleverdon.
+ */
+struct apic {
+	char *name;
+
+	int (*probe)(void);
+	int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);
+	int (*apic_id_registered)(void);
+
+	u32 irq_delivery_mode;
+	u32 irq_dest_mode;
+
+	const struct cpumask *(*target_cpus)(void);
+
+	int disable_esr;
+
+	int dest_logical;
+	unsigned long (*check_apicid_used)(physid_mask_t bitmap, int apicid);
+	unsigned long (*check_apicid_present)(int apicid);
+
+	void (*vector_allocation_domain)(int cpu, struct cpumask *retmask);
+	void (*init_apic_ldr)(void);
+
+	physid_mask_t (*ioapic_phys_id_map)(physid_mask_t map);
+
+	void (*setup_apic_routing)(void);
+	int (*multi_timer_check)(int apic, int irq);
+	int (*apicid_to_node)(int logical_apicid);
+	int (*cpu_to_logical_apicid)(int cpu);
+	int (*cpu_present_to_apicid)(int mps_cpu);
+	physid_mask_t (*apicid_to_cpu_present)(int phys_apicid);
+	void (*setup_portio_remap)(void);
+	int (*check_phys_apicid_present)(int boot_cpu_physical_apicid);
+	void (*enable_apic_mode)(void);
+	int (*phys_pkg_id)(int cpuid_apic, int index_msb);
+
+	/*
+	 * When one of the next two hooks returns 1 the apic
+	 * is switched to this. Essentially they are additional
+	 * probe functions:
+	 */
+	int (*mps_oem_check)(struct mpc_table *mpc, char *oem, char *productid);
+
+	unsigned int (*get_apic_id)(unsigned long x);
+	unsigned long (*set_apic_id)(unsigned int id);
+	unsigned long apic_id_mask;
+
+	unsigned int (*cpu_mask_to_apicid)(const struct cpumask *cpumask);
+	unsigned int (*cpu_mask_to_apicid_and)(const struct cpumask *cpumask,
+					       const struct cpumask *andmask);
+
+	/* ipi */
+	void (*send_IPI_mask)(const struct cpumask *mask, int vector);
+	void (*send_IPI_mask_allbutself)(const struct cpumask *mask,
+					 int vector);
+	void (*send_IPI_allbutself)(int vector);
+	void (*send_IPI_all)(int vector);
+	void (*send_IPI_self)(int vector);
+
+	/* wakeup_secondary_cpu */
+	int (*wakeup_secondary_cpu)(int apicid, unsigned long start_eip);
+
+	int trampoline_phys_low;
+	int trampoline_phys_high;
+
+	void (*wait_for_init_deassert)(atomic_t *deassert);
+	void (*smp_callin_clear_local_apic)(void);
+	void (*inquire_remote_apic)(int apicid);
+
+	/* apic ops */
+	u32 (*read)(u32 reg);
+	void (*write)(u32 reg, u32 v);
+	u64 (*icr_read)(void);
+	void (*icr_write)(u32 low, u32 high);
+	void (*wait_icr_idle)(void);
+	u32 (*safe_wait_icr_idle)(void);
+};
+
+/*
+ * Pointer to the local APIC driver in use on this system (there's
+ * always just one such driver in use - the kernel decides via an
+ * early probing process which one it picks - and then sticks to it):
+ */
+extern struct apic *apic;
+
+/*
+ * APIC functionality to boot other CPUs - only used on SMP:
+ */
+#ifdef CONFIG_SMP
+extern atomic_t init_deasserted;
+extern int wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip);
+#endif
+
+static inline u32 apic_read(u32 reg)
+{
+	return apic->read(reg);
+}
+
+static inline void apic_write(u32 reg, u32 val)
+{
+	apic->write(reg, val);
+}
+
+static inline u64 apic_icr_read(void)
+{
+	return apic->icr_read();
+}
+
+static inline void apic_icr_write(u32 low, u32 high)
+{
+	apic->icr_write(low, high);
+}
+
+static inline void apic_wait_icr_idle(void)
+{
+	apic->wait_icr_idle();
+}
+
+static inline u32 safe_apic_wait_icr_idle(void)
+{
+	return apic->safe_wait_icr_idle();
+}
+
+
+static inline void ack_APIC_irq(void)
+{
+	/*
+	 * ack_APIC_irq() actually gets compiled as a single instruction
+	 * ... yummie.
+	 */
+
+	/* Docs say use 0 for future compatibility */
+	apic_write(APIC_EOI, 0);
+}
+
+static inline unsigned default_get_apic_id(unsigned long x)
+{
+	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
+
+	if (APIC_XAPIC(ver))
+		return (x >> 24) & 0xFF;
+	else
+		return (x >> 24) & 0x0F;
+}
+
+/*
+ * Warm reset vector default position:
+ */
+#define DEFAULT_TRAMPOLINE_PHYS_LOW		0x467
+#define DEFAULT_TRAMPOLINE_PHYS_HIGH		0x469
+
+#ifdef CONFIG_X86_64
+extern struct apic apic_flat;
+extern struct apic apic_physflat;
+extern struct apic apic_x2apic_cluster;
+extern struct apic apic_x2apic_phys;
+extern int default_acpi_madt_oem_check(char *, char *);
+
+extern void apic_send_IPI_self(int vector);
+
+extern struct apic apic_x2apic_uv_x;
+DECLARE_PER_CPU(int, x2apic_extra_bits);
+
+extern int default_cpu_present_to_apicid(int mps_cpu);
+extern int default_check_phys_apicid_present(int boot_cpu_physical_apicid);
+#endif
+
+static inline void default_wait_for_init_deassert(atomic_t *deassert)
+{
+	while (!atomic_read(deassert))
+		cpu_relax();
+	return;
+}
+
+extern void generic_bigsmp_probe(void);
+
+
+#ifdef CONFIG_X86_LOCAL_APIC
+
+#include <asm/smp.h>
+
+#define APIC_DFR_VALUE	(APIC_DFR_FLAT)
+
+static inline const struct cpumask *default_target_cpus(void)
+{
+#ifdef CONFIG_SMP
+	return cpu_online_mask;
+#else
+	return cpumask_of(0);
+#endif
+}
+
+DECLARE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid);
+
+
+static inline unsigned int read_apic_id(void)
+{
+	unsigned int reg;
+
+	reg = apic_read(APIC_ID);
+
+	return apic->get_apic_id(reg);
+}
+
+extern void default_setup_apic_routing(void);
+
+#ifdef CONFIG_X86_32
+/*
+ * Set up the logical destination ID.
+ *
+ * Intel recommends to set DFR, LDR and TPR before enabling
+ * an APIC.  See e.g. "AP-388 82489DX User's Manual" (Intel
+ * document number 292116).  So here it goes...
+ */
+extern void default_init_apic_ldr(void);
+
+static inline int default_apic_id_registered(void)
+{
+	return physid_isset(read_apic_id(), phys_cpu_present_map);
+}
+
+static inline unsigned int
+default_cpu_mask_to_apicid(const struct cpumask *cpumask)
+{
+	return cpumask_bits(cpumask)[0];
+}
+
+static inline unsigned int
+default_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+			       const struct cpumask *andmask)
+{
+	unsigned long mask1 = cpumask_bits(cpumask)[0];
+	unsigned long mask2 = cpumask_bits(andmask)[0];
+	unsigned long mask3 = cpumask_bits(cpu_online_mask)[0];
+
+	return (unsigned int)(mask1 & mask2 & mask3);
+}
+
+static inline int default_phys_pkg_id(int cpuid_apic, int index_msb)
+{
+	return cpuid_apic >> index_msb;
+}
+
+extern int default_apicid_to_node(int logical_apicid);
+
+#endif
+
+static inline unsigned long default_check_apicid_used(physid_mask_t bitmap, int apicid)
+{
+	return physid_isset(apicid, bitmap);
+}
+
+static inline unsigned long default_check_apicid_present(int bit)
+{
+	return physid_isset(bit, phys_cpu_present_map);
+}
+
+static inline physid_mask_t default_ioapic_phys_id_map(physid_mask_t phys_map)
+{
+	return phys_map;
+}
+
+/* Mapping from cpu number to logical apicid */
+static inline int default_cpu_to_logical_apicid(int cpu)
+{
+	return 1 << cpu;
+}
+
+static inline int __default_cpu_present_to_apicid(int mps_cpu)
+{
+	if (mps_cpu < nr_cpu_ids && cpu_present(mps_cpu))
+		return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu);
+	else
+		return BAD_APICID;
+}
+
+static inline int
+__default_check_phys_apicid_present(int boot_cpu_physical_apicid)
+{
+	return physid_isset(boot_cpu_physical_apicid, phys_cpu_present_map);
+}
+
+#ifdef CONFIG_X86_32
+static inline int default_cpu_present_to_apicid(int mps_cpu)
+{
+	return __default_cpu_present_to_apicid(mps_cpu);
+}
+
+static inline int
+default_check_phys_apicid_present(int boot_cpu_physical_apicid)
+{
+	return __default_check_phys_apicid_present(boot_cpu_physical_apicid);
+}
+#else
+extern int default_cpu_present_to_apicid(int mps_cpu);
+extern int default_check_phys_apicid_present(int boot_cpu_physical_apicid);
+#endif
+
+static inline physid_mask_t default_apicid_to_cpu_present(int phys_apicid)
+{
+	return physid_mask_of_physid(phys_apicid);
+}
+
+#endif /* CONFIG_X86_LOCAL_APIC */
+
+#ifdef CONFIG_X86_32
+extern u8 cpu_2_logical_apicid[NR_CPUS];
+#endif
+
 #endif /* _ASM_X86_APIC_H */
diff --git a/arch/x86/include/asm/apicnum.h b/arch/x86/include/asm/apicnum.h
new file mode 100644
index 000000000000..82f613c607ce
--- /dev/null
+++ b/arch/x86/include/asm/apicnum.h
@@ -0,0 +1,12 @@
+#ifndef _ASM_X86_APICNUM_H
+#define _ASM_X86_APICNUM_H
+
+/* define MAX_IO_APICS */
+#ifdef CONFIG_X86_32
+# define MAX_IO_APICS 64
+#else
+# define MAX_IO_APICS 128
+# define MAX_LOCAL_APIC 32768
+#endif
+
+#endif /* _ASM_X86_APICNUM_H */
diff --git a/arch/x86/include/asm/mach-default/apm.h b/arch/x86/include/asm/apm.h
index 20370c6db74b..20370c6db74b 100644
--- a/arch/x86/include/asm/mach-default/apm.h
+++ b/arch/x86/include/asm/apm.h
diff --git a/arch/x86/include/asm/arch_hooks.h b/arch/x86/include/asm/arch_hooks.h
deleted file mode 100644
index cbd4957838a6..000000000000
--- a/arch/x86/include/asm/arch_hooks.h
+++ /dev/null
@@ -1,26 +0,0 @@
-#ifndef _ASM_X86_ARCH_HOOKS_H
-#define _ASM_X86_ARCH_HOOKS_H
-
-#include <linux/interrupt.h>
-
-/*
- *	linux/include/asm/arch_hooks.h
- *
- *	define the architecture specific hooks
- */
-
-/* these aren't arch hooks, they are generic routines
- * that can be used by the hooks */
-extern void init_ISA_irqs(void);
-extern irqreturn_t timer_interrupt(int irq, void *dev_id);
-
-/* these are the defined hooks */
-extern void intr_init_hook(void);
-extern void pre_intr_init_hook(void);
-extern void pre_setup_arch_hook(void);
-extern void trap_init_hook(void);
-extern void pre_time_init_hook(void);
-extern void time_init_hook(void);
-extern void mca_nmi_hook(void);
-
-#endif /* _ASM_X86_ARCH_HOOKS_H */
diff --git a/arch/x86/include/asm/bigsmp/apic.h b/arch/x86/include/asm/bigsmp/apic.h
deleted file mode 100644
index d8dd9f537911..000000000000
--- a/arch/x86/include/asm/bigsmp/apic.h
+++ /dev/null
@@ -1,155 +0,0 @@
-#ifndef __ASM_MACH_APIC_H
-#define __ASM_MACH_APIC_H
-
-#define xapic_phys_to_log_apicid(cpu) (per_cpu(x86_bios_cpu_apicid, cpu))
-#define esr_disable (1)
-
-static inline int apic_id_registered(void)
-{
-	return (1);
-}
-
-static inline const cpumask_t *target_cpus(void)
-{
-#ifdef CONFIG_SMP
-	return &cpu_online_map;
-#else
-	return &cpumask_of_cpu(0);
-#endif
-}
-
-#undef APIC_DEST_LOGICAL
-#define APIC_DEST_LOGICAL	0
-#define APIC_DFR_VALUE		(APIC_DFR_FLAT)
-#define INT_DELIVERY_MODE	(dest_Fixed)
-#define INT_DEST_MODE		(0)    /* phys delivery to target proc */
-#define NO_BALANCE_IRQ		(0)
-
-static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid)
-{
-	return (0);
-}
-
-static inline unsigned long check_apicid_present(int bit)
-{
-	return (1);
-}
-
-static inline unsigned long calculate_ldr(int cpu)
-{
-	unsigned long val, id;
-	val = apic_read(APIC_LDR) & ~APIC_LDR_MASK;
-	id = xapic_phys_to_log_apicid(cpu);
-	val |= SET_APIC_LOGICAL_ID(id);
-	return val;
-}
-
-/*
- * Set up the logical destination ID.
- *
- * Intel recommends to set DFR, LDR and TPR before enabling
- * an APIC.  See e.g. "AP-388 82489DX User's Manual" (Intel
- * document number 292116).  So here it goes...
- */
-static inline void init_apic_ldr(void)
-{
-	unsigned long val;
-	int cpu = smp_processor_id();
-
-	apic_write(APIC_DFR, APIC_DFR_VALUE);
-	val = calculate_ldr(cpu);
-	apic_write(APIC_LDR, val);
-}
-
-static inline void setup_apic_routing(void)
-{
-	printk("Enabling APIC mode:  %s.  Using %d I/O APICs\n",
-		"Physflat", nr_ioapics);
-}
-
-static inline int multi_timer_check(int apic, int irq)
-{
-	return (0);
-}
-
-static inline int apicid_to_node(int logical_apicid)
-{
-	return apicid_2_node[hard_smp_processor_id()];
-}
-
-static inline int cpu_present_to_apicid(int mps_cpu)
-{
-	if (mps_cpu < nr_cpu_ids)
-		return (int) per_cpu(x86_bios_cpu_apicid, mps_cpu);
-
-	return BAD_APICID;
-}
-
-static inline physid_mask_t apicid_to_cpu_present(int phys_apicid)
-{
-	return physid_mask_of_physid(phys_apicid);
-}
-
-extern u8 cpu_2_logical_apicid[];
-/* Mapping from cpu number to logical apicid */
-static inline int cpu_to_logical_apicid(int cpu)
-{
-	if (cpu >= nr_cpu_ids)
-		return BAD_APICID;
-	return cpu_physical_id(cpu);
-}
-
-static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_map)
-{
-	/* For clustered we don't have a good way to do this yet - hack */
-	return physids_promote(0xFFL);
-}
-
-static inline void setup_portio_remap(void)
-{
-}
-
-static inline void enable_apic_mode(void)
-{
-}
-
-static inline int check_phys_apicid_present(int boot_cpu_physical_apicid)
-{
-	return (1);
-}
-
-/* As we are using single CPU as destination, pick only one CPU here */
-static inline unsigned int cpu_mask_to_apicid(const cpumask_t *cpumask)
-{
-	int cpu;
-	int apicid;	
-
-	cpu = first_cpu(*cpumask);
-	apicid = cpu_to_logical_apicid(cpu);
-	return apicid;
-}
-
-static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *cpumask,
-						  const struct cpumask *andmask)
-{
-	int cpu;
-
-	/*
-	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
-	 * May as well be the first.
-	 */
-	for_each_cpu_and(cpu, cpumask, andmask)
-		if (cpumask_test_cpu(cpu, cpu_online_mask))
-			break;
-	if (cpu < nr_cpu_ids)
-		return cpu_to_logical_apicid(cpu);
-
-	return BAD_APICID;
-}
-
-static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
-{
-	return cpuid_apic >> index_msb;
-}
-
-#endif /* __ASM_MACH_APIC_H */
diff --git a/arch/x86/include/asm/bigsmp/apicdef.h b/arch/x86/include/asm/bigsmp/apicdef.h
deleted file mode 100644
index 392c3f5ef2fe..000000000000
--- a/arch/x86/include/asm/bigsmp/apicdef.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#ifndef __ASM_MACH_APICDEF_H
-#define __ASM_MACH_APICDEF_H
-
-#define		APIC_ID_MASK		(0xFF<<24)
-
-static inline unsigned get_apic_id(unsigned long x)
-{
-	return (((x)>>24)&0xFF);
-}
-
-#define		GET_APIC_ID(x)	get_apic_id(x)
-
-#endif
diff --git a/arch/x86/include/asm/bigsmp/ipi.h b/arch/x86/include/asm/bigsmp/ipi.h
deleted file mode 100644
index 27fcd01b3ae6..000000000000
--- a/arch/x86/include/asm/bigsmp/ipi.h
+++ /dev/null
@@ -1,22 +0,0 @@
-#ifndef __ASM_MACH_IPI_H
-#define __ASM_MACH_IPI_H
-
-void send_IPI_mask_sequence(const struct cpumask *mask, int vector);
-void send_IPI_mask_allbutself(const struct cpumask *mask, int vector);
-
-static inline void send_IPI_mask(const struct cpumask *mask, int vector)
-{
-	send_IPI_mask_sequence(mask, vector);
-}
-
-static inline void send_IPI_allbutself(int vector)
-{
-	send_IPI_mask_allbutself(cpu_online_mask, vector);
-}
-
-static inline void send_IPI_all(int vector)
-{
-	send_IPI_mask(cpu_online_mask, vector);
-}
-
-#endif /* __ASM_MACH_IPI_H */
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index dd61616cb73d..6526cf08b0e4 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -10,17 +10,31 @@
 #define EXTENDED_VGA	0xfffe		/* 80x50 mode */
 #define ASK_VGA		0xfffd		/* ask for it at bootup */
 
+#ifdef __KERNEL__
+
 /* Physical address where kernel should be loaded. */
 #define LOAD_PHYSICAL_ADDR ((CONFIG_PHYSICAL_START \
 				+ (CONFIG_PHYSICAL_ALIGN - 1)) \
 				& ~(CONFIG_PHYSICAL_ALIGN - 1))
 
+#ifdef CONFIG_KERNEL_BZIP2
+#define BOOT_HEAP_SIZE             0x400000
+#else /* !CONFIG_KERNEL_BZIP2 */
+
 #ifdef CONFIG_X86_64
 #define BOOT_HEAP_SIZE	0x7000
-#define BOOT_STACK_SIZE	0x4000
 #else
 #define BOOT_HEAP_SIZE	0x4000
+#endif
+
+#endif /* !CONFIG_KERNEL_BZIP2 */
+
+#ifdef CONFIG_X86_64
+#define BOOT_STACK_SIZE	0x4000
+#else
 #define BOOT_STACK_SIZE	0x1000
 #endif
 
+#endif /* __KERNEL__ */
+
 #endif /* _ASM_X86_BOOT_H */
diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 2f8466540fb5..5b301b7ff5f4 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -5,24 +5,43 @@
 #include <linux/mm.h>
 
 /* Caches aren't brain-dead on the intel. */
-#define flush_cache_all()			do { } while (0)
-#define flush_cache_mm(mm)			do { } while (0)
-#define flush_cache_dup_mm(mm)			do { } while (0)
-#define flush_cache_range(vma, start, end)	do { } while (0)
-#define flush_cache_page(vma, vmaddr, pfn)	do { } while (0)
-#define flush_dcache_page(page)			do { } while (0)
-#define flush_dcache_mmap_lock(mapping)		do { } while (0)
-#define flush_dcache_mmap_unlock(mapping)	do { } while (0)
-#define flush_icache_range(start, end)		do { } while (0)
-#define flush_icache_page(vma, pg)		do { } while (0)
-#define flush_icache_user_range(vma, pg, adr, len)	do { } while (0)
-#define flush_cache_vmap(start, end)		do { } while (0)
-#define flush_cache_vunmap(start, end)		do { } while (0)
+static inline void flush_cache_all(void) { }
+static inline void flush_cache_mm(struct mm_struct *mm) { }
+static inline void flush_cache_dup_mm(struct mm_struct *mm) { }
+static inline void flush_cache_range(struct vm_area_struct *vma,
+				     unsigned long start, unsigned long end) { }
+static inline void flush_cache_page(struct vm_area_struct *vma,
+				    unsigned long vmaddr, unsigned long pfn) { }
+static inline void flush_dcache_page(struct page *page) { }
+static inline void flush_dcache_mmap_lock(struct address_space *mapping) { }
+static inline void flush_dcache_mmap_unlock(struct address_space *mapping) { }
+static inline void flush_icache_range(unsigned long start,
+				      unsigned long end) { }
+static inline void flush_icache_page(struct vm_area_struct *vma,
+				     struct page *page) { }
+static inline void flush_icache_user_range(struct vm_area_struct *vma,
+					   struct page *page,
+					   unsigned long addr,
+					   unsigned long len) { }
+static inline void flush_cache_vmap(unsigned long start, unsigned long end) { }
+static inline void flush_cache_vunmap(unsigned long start,
+				      unsigned long end) { }
 
-#define copy_to_user_page(vma, page, vaddr, dst, src, len)	\
-	memcpy((dst), (src), (len))
-#define copy_from_user_page(vma, page, vaddr, dst, src, len)	\
-	memcpy((dst), (src), (len))
+static inline void copy_to_user_page(struct vm_area_struct *vma,
+				     struct page *page, unsigned long vaddr,
+				     void *dst, const void *src,
+				     unsigned long len)
+{
+	memcpy(dst, src, len);
+}
+
+static inline void copy_from_user_page(struct vm_area_struct *vma,
+				       struct page *page, unsigned long vaddr,
+				       void *dst, const void *src,
+				       unsigned long len)
+{
+	memcpy(dst, src, len);
+}
 
 #define PG_non_WB				PG_arch_1
 PAGEFLAG(NonWB, non_WB)
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index 2bc162e0ec6e..0e63c9a2a8d0 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -1,5 +1,55 @@
 /*
- * Some macros to handle stack frames in assembly.
+
+ x86 function call convention, 64-bit:
+ -------------------------------------
+  arguments           |  callee-saved      | extra caller-saved | return
+ [callee-clobbered]   |                    | [callee-clobbered] |
+ ---------------------------------------------------------------------------
+ rdi rsi rdx rcx r8-9 | rbx rbp [*] r12-15 | r10-11             | rax, rdx [**]
+
+ ( rsp is obviously invariant across normal function calls. (gcc can 'merge'
+   functions when it sees tail-call optimization possibilities) rflags is
+   clobbered. Leftover arguments are passed over the stack frame.)
+
+ [*]  In the frame-pointers case rbp is fixed to the stack frame.
+
+ [**] for struct return values wider than 64 bits the return convention is a
+      bit more complex: up to 128 bits width we return small structures
+      straight in rax, rdx. For structures larger than that (3 words or
+      larger) the caller puts a pointer to an on-stack return struct
+      [allocated in the caller's stack frame] into the first argument - i.e.
+      into rdi. All other arguments shift up by one in this case.
+      Fortunately this case is rare in the kernel.
+
+For 32-bit we have the following conventions - kernel is built with
+-mregparm=3 and -freg-struct-return:
+
+ x86 function calling convention, 32-bit:
+ ----------------------------------------
+  arguments         | callee-saved        | extra caller-saved | return
+ [callee-clobbered] |                     | [callee-clobbered] |
+ -------------------------------------------------------------------------
+ eax edx ecx        | ebx edi esi ebp [*] | <none>             | eax, edx [**]
+
+ ( here too esp is obviously invariant across normal function calls. eflags
+   is clobbered. Leftover arguments are passed over the stack frame. )
+
+ [*]  In the frame-pointers case ebp is fixed to the stack frame.
+
+ [**] We build with -freg-struct-return, which on 32-bit means similar
+      semantics as on 64-bit: edx can be used for a second return value
+      (i.e. covering integer and structure sizes up to 64 bits) - after that
+      it gets more complex and more expensive: 3-word or larger struct returns
+      get done in the caller's frame and the pointer to the return struct goes
+      into regparm0, i.e. eax - the other arguments shift up and the
+      function's register parameters degenerate to regparm=2 in essence.
+
+*/
+
+
+/*
+ * 64-bit system call stack frame layout defines and helpers,
+ * for assembly code:
  */
 
 #define R15		  0
@@ -9,7 +59,7 @@
 #define RBP		 32
 #define RBX		 40
 
-/* arguments: interrupts/non tracing syscalls only save upto here*/
+/* arguments: interrupts/non tracing syscalls only save up to here: */
 #define R11		 48
 #define R10		 56
 #define R9		 64
@@ -22,7 +72,7 @@
 #define ORIG_RAX	120       /* + error_code */
 /* end of arguments */
 
-/* cpu exception frame or undefined in case of fast syscall. */
+/* cpu exception frame or undefined in case of fast syscall: */
 #define RIP		128
 #define CS		136
 #define EFLAGS		144
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index bae482df6039..b185091bf19c 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -7,6 +7,20 @@
 #include <linux/nodemask.h>
 #include <linux/percpu.h>
 
+#ifdef CONFIG_SMP
+
+extern void prefill_possible_map(void);
+
+#else /* CONFIG_SMP */
+
+static inline void prefill_possible_map(void) {}
+
+#define cpu_physical_id(cpu)			boot_cpu_physical_apicid
+#define safe_smp_processor_id()			0
+#define stack_smp_processor_id()		0
+
+#endif /* CONFIG_SMP */
+
 struct x86_cpu {
 	struct cpu cpu;
 };
@@ -17,4 +31,7 @@ extern void arch_unregister_cpu(int);
 #endif
 
 DECLARE_PER_CPU(int, cpu_state);
+
+extern unsigned int boot_cpu_id;
+
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpumask.h b/arch/x86/include/asm/cpumask.h
new file mode 100644
index 000000000000..a7f3c75f8ad7
--- /dev/null
+++ b/arch/x86/include/asm/cpumask.h
@@ -0,0 +1,32 @@
+#ifndef _ASM_X86_CPUMASK_H
+#define _ASM_X86_CPUMASK_H
+#ifndef __ASSEMBLY__
+#include <linux/cpumask.h>
+
+#ifdef CONFIG_X86_64
+
+extern cpumask_var_t cpu_callin_mask;
+extern cpumask_var_t cpu_callout_mask;
+extern cpumask_var_t cpu_initialized_mask;
+extern cpumask_var_t cpu_sibling_setup_mask;
+
+extern void setup_cpu_local_masks(void);
+
+#else /* CONFIG_X86_32 */
+
+extern cpumask_t cpu_callin_map;
+extern cpumask_t cpu_callout_map;
+extern cpumask_t cpu_initialized;
+extern cpumask_t cpu_sibling_setup_map;
+
+#define cpu_callin_mask		((struct cpumask *)&cpu_callin_map)
+#define cpu_callout_mask	((struct cpumask *)&cpu_callout_map)
+#define cpu_initialized_mask	((struct cpumask *)&cpu_initialized)
+#define cpu_sibling_setup_mask	((struct cpumask *)&cpu_sibling_setup_map)
+
+static inline void setup_cpu_local_masks(void) { }
+
+#endif /* CONFIG_X86_32 */
+
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_CPUMASK_H */
diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index 0930b4f8d672..c68c361697e1 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -1,39 +1,21 @@
 #ifndef _ASM_X86_CURRENT_H
 #define _ASM_X86_CURRENT_H
 
-#ifdef CONFIG_X86_32
 #include <linux/compiler.h>
 #include <asm/percpu.h>
 
+#ifndef __ASSEMBLY__
 struct task_struct;
 
 DECLARE_PER_CPU(struct task_struct *, current_task);
-static __always_inline struct task_struct *get_current(void)
-{
-	return x86_read_percpu(current_task);
-}
-
-#else /* X86_32 */
-
-#ifndef __ASSEMBLY__
-#include <asm/pda.h>
-
-struct task_struct;
 
 static __always_inline struct task_struct *get_current(void)
 {
-	return read_pda(pcurrent);
+	return percpu_read(current_task);
 }
 
-#else /* __ASSEMBLY__ */
-
-#include <asm/asm-offsets.h>
-#define GET_CURRENT(reg) movq %gs:(pda_pcurrent),reg
+#define current get_current()
 
 #endif /* __ASSEMBLY__ */
 
-#endif /* X86_32 */
-
-#define current get_current()
-
 #endif /* _ASM_X86_CURRENT_H */
diff --git a/arch/x86/include/asm/mach-default/do_timer.h b/arch/x86/include/asm/do_timer.h
index 23ecda0b28a0..23ecda0b28a0 100644
--- a/arch/x86/include/asm/mach-default/do_timer.h
+++ b/arch/x86/include/asm/do_timer.h
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index f51a3ddde01a..83c1bc8d2e8a 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -112,7 +112,7 @@ extern unsigned int vdso_enabled;
  * now struct_user_regs, they are different)
  */
 
-#define ELF_CORE_COPY_REGS(pr_reg, regs)	\
+#define ELF_CORE_COPY_REGS_COMMON(pr_reg, regs)	\
 do {						\
 	pr_reg[0] = regs->bx;			\
 	pr_reg[1] = regs->cx;			\
@@ -124,7 +124,6 @@ do {						\
 	pr_reg[7] = regs->ds & 0xffff;		\
 	pr_reg[8] = regs->es & 0xffff;		\
 	pr_reg[9] = regs->fs & 0xffff;		\
-	savesegment(gs, pr_reg[10]);		\
 	pr_reg[11] = regs->orig_ax;		\
 	pr_reg[12] = regs->ip;			\
 	pr_reg[13] = regs->cs & 0xffff;		\
@@ -133,6 +132,18 @@ do {						\
 	pr_reg[16] = regs->ss & 0xffff;		\
 } while (0);
 
+#define ELF_CORE_COPY_REGS(pr_reg, regs)	\
+do {						\
+	ELF_CORE_COPY_REGS_COMMON(pr_reg, regs);\
+	pr_reg[10] = get_user_gs(regs);		\
+} while (0);
+
+#define ELF_CORE_COPY_KERNEL_REGS(pr_reg, regs)	\
+do {						\
+	ELF_CORE_COPY_REGS_COMMON(pr_reg, regs);\
+	savesegment(gs, pr_reg[10]);		\
+} while (0);
+
 #define ELF_PLATFORM	(utsname()->machine)
 #define set_personality_64bit()	do { } while (0)
 
diff --git a/arch/x86/include/asm/mach-default/entry_arch.h b/arch/x86/include/asm/entry_arch.h
index 6b1add8e31dd..854d538ae857 100644
--- a/arch/x86/include/asm/mach-default/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -9,12 +9,28 @@
  * is no hardware IRQ pin equivalent for them, they are triggered
  * through the ICC by us (IPIs)
  */
-#ifdef CONFIG_X86_SMP
+#ifdef CONFIG_SMP
 BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
-BUILD_INTERRUPT(invalidate_interrupt,INVALIDATE_TLB_VECTOR)
 BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
 BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR)
 BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR)
+
+BUILD_INTERRUPT3(invalidate_interrupt0,INVALIDATE_TLB_VECTOR_START+0,
+		 smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt1,INVALIDATE_TLB_VECTOR_START+1,
+		 smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt2,INVALIDATE_TLB_VECTOR_START+2,
+		 smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt3,INVALIDATE_TLB_VECTOR_START+3,
+		 smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt4,INVALIDATE_TLB_VECTOR_START+4,
+		 smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt5,INVALIDATE_TLB_VECTOR_START+5,
+		 smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt6,INVALIDATE_TLB_VECTOR_START+6,
+		 smp_invalidate_interrupt)
+BUILD_INTERRUPT3(invalidate_interrupt7,INVALIDATE_TLB_VECTOR_START+7,
+		 smp_invalidate_interrupt)
 #endif
 
 /*
@@ -25,10 +41,15 @@ BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR)
  * a much simpler SMP time architecture:
  */
 #ifdef CONFIG_X86_LOCAL_APIC
+
 BUILD_INTERRUPT(apic_timer_interrupt,LOCAL_TIMER_VECTOR)
 BUILD_INTERRUPT(error_interrupt,ERROR_APIC_VECTOR)
 BUILD_INTERRUPT(spurious_interrupt,SPURIOUS_APIC_VECTOR)
 
+#ifdef CONFIG_PERF_COUNTERS
+BUILD_INTERRUPT(perf_counter_interrupt, LOCAL_PERF_VECTOR)
+#endif
+
 #ifdef CONFIG_X86_MCE_P4THERMAL
 BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR)
 #endif
diff --git a/arch/x86/include/asm/es7000/apic.h b/arch/x86/include/asm/es7000/apic.h
deleted file mode 100644
index c58b9cc74465..000000000000
--- a/arch/x86/include/asm/es7000/apic.h
+++ /dev/null
@@ -1,242 +0,0 @@
-#ifndef __ASM_ES7000_APIC_H
-#define __ASM_ES7000_APIC_H
-
-#include <linux/gfp.h>
-
-#define xapic_phys_to_log_apicid(cpu) per_cpu(x86_bios_cpu_apicid, cpu)
-#define esr_disable (1)
-
-static inline int apic_id_registered(void)
-{
-	        return (1);
-}
-
-static inline const cpumask_t *target_cpus_cluster(void)
-{
-	return &CPU_MASK_ALL;
-}
-
-static inline const cpumask_t *target_cpus(void)
-{
-	return &cpumask_of_cpu(smp_processor_id());
-}
-
-#define APIC_DFR_VALUE_CLUSTER		(APIC_DFR_CLUSTER)
-#define INT_DELIVERY_MODE_CLUSTER	(dest_LowestPrio)
-#define INT_DEST_MODE_CLUSTER		(1) /* logical delivery broadcast to all procs */
-#define NO_BALANCE_IRQ_CLUSTER		(1)
-
-#define APIC_DFR_VALUE		(APIC_DFR_FLAT)
-#define INT_DELIVERY_MODE	(dest_Fixed)
-#define INT_DEST_MODE		(0)    /* phys delivery to target procs */
-#define NO_BALANCE_IRQ		(0)
-#undef  APIC_DEST_LOGICAL
-#define APIC_DEST_LOGICAL	0x0
-
-static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid)
-{
-	return 0;
-}
-static inline unsigned long check_apicid_present(int bit)
-{
-	return physid_isset(bit, phys_cpu_present_map);
-}
-
-#define apicid_cluster(apicid) (apicid & 0xF0)
-
-static inline unsigned long calculate_ldr(int cpu)
-{
-	unsigned long id;
-	id = xapic_phys_to_log_apicid(cpu);
-	return (SET_APIC_LOGICAL_ID(id));
-}
-
-/*
- * Set up the logical destination ID.
- *
- * Intel recommends to set DFR, LdR and TPR before enabling
- * an APIC.  See e.g. "AP-388 82489DX User's Manual" (Intel
- * document number 292116).  So here it goes...
- */
-static inline void init_apic_ldr_cluster(void)
-{
-	unsigned long val;
-	int cpu = smp_processor_id();
-
-	apic_write(APIC_DFR, APIC_DFR_VALUE_CLUSTER);
-	val = calculate_ldr(cpu);
-	apic_write(APIC_LDR, val);
-}
-
-static inline void init_apic_ldr(void)
-{
-	unsigned long val;
-	int cpu = smp_processor_id();
-
-	apic_write(APIC_DFR, APIC_DFR_VALUE);
-	val = calculate_ldr(cpu);
-	apic_write(APIC_LDR, val);
-}
-
-extern int apic_version [MAX_APICS];
-static inline void setup_apic_routing(void)
-{
-	int apic = per_cpu(x86_bios_cpu_apicid, smp_processor_id());
-	printk("Enabling APIC mode:  %s. Using %d I/O APICs, target cpus %lx\n",
-		(apic_version[apic] == 0x14) ?
-			"Physical Cluster" : "Logical Cluster",
-			nr_ioapics, cpus_addr(*target_cpus())[0]);
-}
-
-static inline int multi_timer_check(int apic, int irq)
-{
-	return 0;
-}
-
-static inline int apicid_to_node(int logical_apicid)
-{
-	return 0;
-}
-
-
-static inline int cpu_present_to_apicid(int mps_cpu)
-{
-	if (!mps_cpu)
-		return boot_cpu_physical_apicid;
-	else if (mps_cpu < nr_cpu_ids)
-		return (int) per_cpu(x86_bios_cpu_apicid, mps_cpu);
-	else
-		return BAD_APICID;
-}
-
-static inline physid_mask_t apicid_to_cpu_present(int phys_apicid)
-{
-	static int id = 0;
-	physid_mask_t mask;
-	mask = physid_mask_of_physid(id);
-	++id;
-	return mask;
-}
-
-extern u8 cpu_2_logical_apicid[];
-/* Mapping from cpu number to logical apicid */
-static inline int cpu_to_logical_apicid(int cpu)
-{
-#ifdef CONFIG_SMP
-	if (cpu >= nr_cpu_ids)
-		return BAD_APICID;
-	return (int)cpu_2_logical_apicid[cpu];
-#else
-	return logical_smp_processor_id();
-#endif
-}
-
-static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_map)
-{
-	/* For clustered we don't have a good way to do this yet - hack */
-	return physids_promote(0xff);
-}
-
-
-static inline void setup_portio_remap(void)
-{
-}
-
-extern unsigned int boot_cpu_physical_apicid;
-static inline int check_phys_apicid_present(int cpu_physical_apicid)
-{
-	boot_cpu_physical_apicid = read_apic_id();
-	return (1);
-}
-
-static inline unsigned int
-cpu_mask_to_apicid_cluster(const struct cpumask *cpumask)
-{
-	int num_bits_set;
-	int cpus_found = 0;
-	int cpu;
-	int apicid;
-
-	num_bits_set = cpumask_weight(cpumask);
-	/* Return id to all */
-	if (num_bits_set == nr_cpu_ids)
-		return 0xFF;
-	/*
-	 * The cpus in the mask must all be on the apic cluster.  If are not
-	 * on the same apicid cluster return default value of TARGET_CPUS.
-	 */
-	cpu = cpumask_first(cpumask);
-	apicid = cpu_to_logical_apicid(cpu);
-	while (cpus_found < num_bits_set) {
-		if (cpumask_test_cpu(cpu, cpumask)) {
-			int new_apicid = cpu_to_logical_apicid(cpu);
-			if (apicid_cluster(apicid) !=
-					apicid_cluster(new_apicid)){
-				printk ("%s: Not a valid mask!\n", __func__);
-				return 0xFF;
-			}
-			apicid = new_apicid;
-			cpus_found++;
-		}
-		cpu++;
-	}
-	return apicid;
-}
-
-static inline unsigned int cpu_mask_to_apicid(const cpumask_t *cpumask)
-{
-	int num_bits_set;
-	int cpus_found = 0;
-	int cpu;
-	int apicid;
-
-	num_bits_set = cpus_weight(*cpumask);
-	/* Return id to all */
-	if (num_bits_set == nr_cpu_ids)
-		return cpu_to_logical_apicid(0);
-	/*
-	 * The cpus in the mask must all be on the apic cluster.  If are not
-	 * on the same apicid cluster return default value of TARGET_CPUS.
-	 */
-	cpu = first_cpu(*cpumask);
-	apicid = cpu_to_logical_apicid(cpu);
-	while (cpus_found < num_bits_set) {
-		if (cpu_isset(cpu, *cpumask)) {
-			int new_apicid = cpu_to_logical_apicid(cpu);
-			if (apicid_cluster(apicid) !=
-					apicid_cluster(new_apicid)){
-				printk ("%s: Not a valid mask!\n", __func__);
-				return cpu_to_logical_apicid(0);
-			}
-			apicid = new_apicid;
-			cpus_found++;
-		}
-		cpu++;
-	}
-	return apicid;
-}
-
-
-static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *inmask,
-						  const struct cpumask *andmask)
-{
-	int apicid = cpu_to_logical_apicid(0);
-	cpumask_var_t cpumask;
-
-	if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
-		return apicid;
-
-	cpumask_and(cpumask, inmask, andmask);
-	cpumask_and(cpumask, cpumask, cpu_online_mask);
-	apicid = cpu_mask_to_apicid(cpumask);
-
-	free_cpumask_var(cpumask);
-	return apicid;
-}
-
-static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
-{
-	return cpuid_apic >> index_msb;
-}
-
-#endif /* __ASM_ES7000_APIC_H */
diff --git a/arch/x86/include/asm/es7000/apicdef.h b/arch/x86/include/asm/es7000/apicdef.h
deleted file mode 100644
index 8b234a3cb851..000000000000
--- a/arch/x86/include/asm/es7000/apicdef.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#ifndef __ASM_ES7000_APICDEF_H
-#define __ASM_ES7000_APICDEF_H
-
-#define		APIC_ID_MASK		(0xFF<<24)
-
-static inline unsigned get_apic_id(unsigned long x)
-{
-	return (((x)>>24)&0xFF);
-}
-
-#define		GET_APIC_ID(x)	get_apic_id(x)
-
-#endif
diff --git a/arch/x86/include/asm/es7000/ipi.h b/arch/x86/include/asm/es7000/ipi.h
deleted file mode 100644
index 7e8ed24d4b8a..000000000000
--- a/arch/x86/include/asm/es7000/ipi.h
+++ /dev/null
@@ -1,22 +0,0 @@
-#ifndef __ASM_ES7000_IPI_H
-#define __ASM_ES7000_IPI_H
-
-void send_IPI_mask_sequence(const struct cpumask *mask, int vector);
-void send_IPI_mask_allbutself(const struct cpumask *mask, int vector);
-
-static inline void send_IPI_mask(const struct cpumask *mask, int vector)
-{
-	send_IPI_mask_sequence(mask, vector);
-}
-
-static inline void send_IPI_allbutself(int vector)
-{
-	send_IPI_mask_allbutself(cpu_online_mask, vector);
-}
-
-static inline void send_IPI_all(int vector)
-{
-	send_IPI_mask(cpu_online_mask, vector);
-}
-
-#endif /* __ASM_ES7000_IPI_H */
diff --git a/arch/x86/include/asm/es7000/mpparse.h b/arch/x86/include/asm/es7000/mpparse.h
deleted file mode 100644
index c1629b090ec2..000000000000
--- a/arch/x86/include/asm/es7000/mpparse.h
+++ /dev/null
@@ -1,29 +0,0 @@
-#ifndef __ASM_ES7000_MPPARSE_H
-#define __ASM_ES7000_MPPARSE_H
-
-#include <linux/acpi.h>
-
-extern int parse_unisys_oem (char *oemptr);
-extern int find_unisys_acpi_oem_table(unsigned long *oem_addr);
-extern void unmap_unisys_acpi_oem_table(unsigned long oem_addr);
-extern void setup_unisys(void);
-
-#ifndef CONFIG_X86_GENERICARCH
-extern int acpi_madt_oem_check(char *oem_id, char *oem_table_id);
-extern int mps_oem_check(struct mpc_table *mpc, char *oem, char *productid);
-#endif
-
-#ifdef CONFIG_ACPI
-
-static inline int es7000_check_dsdt(void)
-{
-	struct acpi_table_header header;
-
-	if (ACPI_SUCCESS(acpi_get_table_header(ACPI_SIG_DSDT, 0, &header)) &&
-	    !strncmp(header.oem_id, "UNISYS", 6))
-		return 1;
-	return 0;
-}
-#endif
-
-#endif /* __ASM_MACH_MPPARSE_H */
diff --git a/arch/x86/include/asm/es7000/wakecpu.h b/arch/x86/include/asm/es7000/wakecpu.h
deleted file mode 100644
index 78f0daaee436..000000000000
--- a/arch/x86/include/asm/es7000/wakecpu.h
+++ /dev/null
@@ -1,37 +0,0 @@
-#ifndef __ASM_ES7000_WAKECPU_H
-#define __ASM_ES7000_WAKECPU_H
-
-#define TRAMPOLINE_PHYS_LOW	0x467
-#define TRAMPOLINE_PHYS_HIGH	0x469
-
-static inline void wait_for_init_deassert(atomic_t *deassert)
-{
-#ifndef CONFIG_ES7000_CLUSTERED_APIC
-	while (!atomic_read(deassert))
-		cpu_relax();
-#endif
-	return;
-}
-
-/* Nothing to do for most platforms, since cleared by the INIT cycle */
-static inline void smp_callin_clear_local_apic(void)
-{
-}
-
-static inline void store_NMI_vector(unsigned short *high, unsigned short *low)
-{
-}
-
-static inline void restore_NMI_vector(unsigned short *high, unsigned short *low)
-{
-}
-
-extern void __inquire_remote_apic(int apicid);
-
-static inline void inquire_remote_apic(int apicid)
-{
-	if (apic_verbosity >= APIC_DEBUG)
-		__inquire_remote_apic(apicid);
-}
-
-#endif /* __ASM_MACH_WAKECPU_H */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 23696d44a0af..63a79c77d220 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -1,11 +1,145 @@
+/*
+ * fixmap.h: compile-time virtual memory allocation
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 1998 Ingo Molnar
+ *
+ * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999
+ * x86_32 and x86_64 integration by Gustavo F. Padovan, February 2009
+ */
+
 #ifndef _ASM_X86_FIXMAP_H
 #define _ASM_X86_FIXMAP_H
 
+#ifndef __ASSEMBLY__
+#include <linux/kernel.h>
+#include <asm/acpi.h>
+#include <asm/apicdef.h>
+#include <asm/page.h>
+#ifdef CONFIG_X86_32
+#include <linux/threads.h>
+#include <asm/kmap_types.h>
+#else
+#include <asm/vsyscall.h>
+#endif
+
+/*
+ * We can't declare FIXADDR_TOP as variable for x86_64 because vsyscall
+ * uses fixmaps that relies on FIXADDR_TOP for proper address calculation.
+ * Because of this, FIXADDR_TOP x86 integration was left as later work.
+ */
+#ifdef CONFIG_X86_32
+/* used by vmalloc.c, vsyscall.lds.S.
+ *
+ * Leave one empty page between vmalloc'ed areas and
+ * the start of the fixmap.
+ */
+extern unsigned long __FIXADDR_TOP;
+#define FIXADDR_TOP	((unsigned long)__FIXADDR_TOP)
+
+#define FIXADDR_USER_START     __fix_to_virt(FIX_VDSO)
+#define FIXADDR_USER_END       __fix_to_virt(FIX_VDSO - 1)
+#else
+#define FIXADDR_TOP	(VSYSCALL_END-PAGE_SIZE)
+
+/* Only covers 32bit vsyscalls currently. Need another set for 64bit. */
+#define FIXADDR_USER_START	((unsigned long)VSYSCALL32_VSYSCALL)
+#define FIXADDR_USER_END	(FIXADDR_USER_START + PAGE_SIZE)
+#endif
+
+
+/*
+ * Here we define all the compile-time 'special' virtual
+ * addresses. The point is to have a constant address at
+ * compile time, but to set the physical address only
+ * in the boot process.
+ * for x86_32: We allocate these special addresses
+ * from the end of virtual memory (0xfffff000) backwards.
+ * Also this lets us do fail-safe vmalloc(), we
+ * can guarantee that these special addresses and
+ * vmalloc()-ed addresses never overlap.
+ *
+ * These 'compile-time allocated' memory buffers are
+ * fixed-size 4k pages (or larger if used with an increment
+ * higher than 1). Use set_fixmap(idx,phys) to associate
+ * physical memory with fixmap indices.
+ *
+ * TLB entries of such buffers will not be flushed across
+ * task switches.
+ */
+enum fixed_addresses {
 #ifdef CONFIG_X86_32
-# include "fixmap_32.h"
+	FIX_HOLE,
+	FIX_VDSO,
 #else
-# include "fixmap_64.h"
+	VSYSCALL_LAST_PAGE,
+	VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE
+			    + ((VSYSCALL_END-VSYSCALL_START) >> PAGE_SHIFT) - 1,
+	VSYSCALL_HPET,
 #endif
+	FIX_DBGP_BASE,
+	FIX_EARLYCON_MEM_BASE,
+#ifdef CONFIG_X86_LOCAL_APIC
+	FIX_APIC_BASE,	/* local (CPU) APIC) -- required for SMP or not */
+#endif
+#ifdef CONFIG_X86_IO_APIC
+	FIX_IO_APIC_BASE_0,
+	FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS - 1,
+#endif
+#ifdef CONFIG_X86_VISWS_APIC
+	FIX_CO_CPU,	/* Cobalt timer */
+	FIX_CO_APIC,	/* Cobalt APIC Redirection Table */
+	FIX_LI_PCIA,	/* Lithium PCI Bridge A */
+	FIX_LI_PCIB,	/* Lithium PCI Bridge B */
+#endif
+#ifdef CONFIG_X86_F00F_BUG
+	FIX_F00F_IDT,	/* Virtual mapping for IDT */
+#endif
+#ifdef CONFIG_X86_CYCLONE_TIMER
+	FIX_CYCLONE_TIMER, /*cyclone timer register*/
+#endif
+#ifdef CONFIG_X86_32
+	FIX_KMAP_BEGIN,	/* reserved pte's for temporary kernel mappings */
+	FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
+#ifdef CONFIG_PCI_MMCONFIG
+	FIX_PCIE_MCFG,
+#endif
+#endif
+#ifdef CONFIG_PARAVIRT
+	FIX_PARAVIRT_BOOTMAP,
+#endif
+	__end_of_permanent_fixed_addresses,
+#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
+	FIX_OHCI1394_BASE,
+#endif
+	/*
+	 * 256 temporary boot-time mappings, used by early_ioremap(),
+	 * before ioremap() is functional.
+	 *
+	 * We round it up to the next 256 pages boundary so that we
+	 * can have a single pgd entry and a single pte table:
+	 */
+#define NR_FIX_BTMAPS		64
+#define FIX_BTMAPS_SLOTS	4
+	FIX_BTMAP_END = __end_of_permanent_fixed_addresses + 256 -
+			(__end_of_permanent_fixed_addresses & 255),
+	FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_SLOTS - 1,
+#ifdef CONFIG_X86_32
+	FIX_WP_TEST,
+#endif
+	__end_of_fixed_addresses
+};
+
+
+extern void reserve_top_address(unsigned long reserve);
+
+#define FIXADDR_SIZE	(__end_of_permanent_fixed_addresses << PAGE_SHIFT)
+#define FIXADDR_BOOT_SIZE	(__end_of_fixed_addresses << PAGE_SHIFT)
+#define FIXADDR_START		(FIXADDR_TOP - FIXADDR_SIZE)
+#define FIXADDR_BOOT_START	(FIXADDR_TOP - FIXADDR_BOOT_SIZE)
 
 extern int fixmaps_set;
 
@@ -69,4 +203,5 @@ static inline unsigned long virt_to_fix(const unsigned long vaddr)
 	BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
 	return __virt_to_fix(vaddr);
 }
+#endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_FIXMAP_H */
diff --git a/arch/x86/include/asm/fixmap_32.h b/arch/x86/include/asm/fixmap_32.h
deleted file mode 100644
index c7115c1d7217..000000000000
--- a/arch/x86/include/asm/fixmap_32.h
+++ /dev/null
@@ -1,119 +0,0 @@
-/*
- * fixmap.h: compile-time virtual memory allocation
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License.  See the file "COPYING" in the main directory of this archive
- * for more details.
- *
- * Copyright (C) 1998 Ingo Molnar
- *
- * Support of BIGMEM added by Gerhard Wichert, Siemens AG, July 1999
- */
-
-#ifndef _ASM_X86_FIXMAP_32_H
-#define _ASM_X86_FIXMAP_32_H
-
-
-/* used by vmalloc.c, vsyscall.lds.S.
- *
- * Leave one empty page between vmalloc'ed areas and
- * the start of the fixmap.
- */
-extern unsigned long __FIXADDR_TOP;
-#define FIXADDR_USER_START     __fix_to_virt(FIX_VDSO)
-#define FIXADDR_USER_END       __fix_to_virt(FIX_VDSO - 1)
-
-#ifndef __ASSEMBLY__
-#include <linux/kernel.h>
-#include <asm/acpi.h>
-#include <asm/apicdef.h>
-#include <asm/page.h>
-#include <linux/threads.h>
-#include <asm/kmap_types.h>
-
-/*
- * Here we define all the compile-time 'special' virtual
- * addresses. The point is to have a constant address at
- * compile time, but to set the physical address only
- * in the boot process. We allocate these special addresses
- * from the end of virtual memory (0xfffff000) backwards.
- * Also this lets us do fail-safe vmalloc(), we
- * can guarantee that these special addresses and
- * vmalloc()-ed addresses never overlap.
- *
- * these 'compile-time allocated' memory buffers are
- * fixed-size 4k pages. (or larger if used with an increment
- * highger than 1) use fixmap_set(idx,phys) to associate
- * physical memory with fixmap indices.
- *
- * TLB entries of such buffers will not be flushed across
- * task switches.
- */
-enum fixed_addresses {
-	FIX_HOLE,
-	FIX_VDSO,
-	FIX_DBGP_BASE,
-	FIX_EARLYCON_MEM_BASE,
-#ifdef CONFIG_X86_LOCAL_APIC
-	FIX_APIC_BASE,	/* local (CPU) APIC) -- required for SMP or not */
-#endif
-#ifdef CONFIG_X86_IO_APIC
-	FIX_IO_APIC_BASE_0,
-	FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS-1,
-#endif
-#ifdef CONFIG_X86_VISWS_APIC
-	FIX_CO_CPU,	/* Cobalt timer */
-	FIX_CO_APIC,	/* Cobalt APIC Redirection Table */
-	FIX_LI_PCIA,	/* Lithium PCI Bridge A */
-	FIX_LI_PCIB,	/* Lithium PCI Bridge B */
-#endif
-#ifdef CONFIG_X86_F00F_BUG
-	FIX_F00F_IDT,	/* Virtual mapping for IDT */
-#endif
-#ifdef CONFIG_X86_CYCLONE_TIMER
-	FIX_CYCLONE_TIMER, /*cyclone timer register*/
-#endif
-	FIX_KMAP_BEGIN,	/* reserved pte's for temporary kernel mappings */
-	FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
-#ifdef CONFIG_PCI_MMCONFIG
-	FIX_PCIE_MCFG,
-#endif
-#ifdef CONFIG_PARAVIRT
-	FIX_PARAVIRT_BOOTMAP,
-#endif
-	__end_of_permanent_fixed_addresses,
-	/*
-	 * 256 temporary boot-time mappings, used by early_ioremap(),
-	 * before ioremap() is functional.
-	 *
-	 * We round it up to the next 256 pages boundary so that we
-	 * can have a single pgd entry and a single pte table:
-	 */
-#define NR_FIX_BTMAPS		64
-#define FIX_BTMAPS_SLOTS	4
-	FIX_BTMAP_END = __end_of_permanent_fixed_addresses + 256 -
-			(__end_of_permanent_fixed_addresses & 255),
-	FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_SLOTS - 1,
-	FIX_WP_TEST,
-#ifdef CONFIG_ACPI
-	FIX_ACPI_BEGIN,
-	FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1,
-#endif
-#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
-	FIX_OHCI1394_BASE,
-#endif
-	__end_of_fixed_addresses
-};
-
-extern void reserve_top_address(unsigned long reserve);
-
-
-#define FIXADDR_TOP	((unsigned long)__FIXADDR_TOP)
-
-#define __FIXADDR_SIZE	(__end_of_permanent_fixed_addresses << PAGE_SHIFT)
-#define __FIXADDR_BOOT_SIZE	(__end_of_fixed_addresses << PAGE_SHIFT)
-#define FIXADDR_START		(FIXADDR_TOP - __FIXADDR_SIZE)
-#define FIXADDR_BOOT_START	(FIXADDR_TOP - __FIXADDR_BOOT_SIZE)
-
-#endif /* !__ASSEMBLY__ */
-#endif /* _ASM_X86_FIXMAP_32_H */
diff --git a/arch/x86/include/asm/fixmap_64.h b/arch/x86/include/asm/fixmap_64.h
deleted file mode 100644
index 8be740977db8..000000000000
--- a/arch/x86/include/asm/fixmap_64.h
+++ /dev/null
@@ -1,79 +0,0 @@
-/*
- * fixmap.h: compile-time virtual memory allocation
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License.  See the file "COPYING" in the main directory of this archive
- * for more details.
- *
- * Copyright (C) 1998 Ingo Molnar
- */
-
-#ifndef _ASM_X86_FIXMAP_64_H
-#define _ASM_X86_FIXMAP_64_H
-
-#include <linux/kernel.h>
-#include <asm/acpi.h>
-#include <asm/apicdef.h>
-#include <asm/page.h>
-#include <asm/vsyscall.h>
-
-/*
- * Here we define all the compile-time 'special' virtual
- * addresses. The point is to have a constant address at
- * compile time, but to set the physical address only
- * in the boot process.
- *
- * These 'compile-time allocated' memory buffers are
- * fixed-size 4k pages (or larger if used with an increment
- * higher than 1). Use set_fixmap(idx,phys) to associate
- * physical memory with fixmap indices.
- *
- * TLB entries of such buffers will not be flushed across
- * task switches.
- */
-
-enum fixed_addresses {
-	VSYSCALL_LAST_PAGE,
-	VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE
-			    + ((VSYSCALL_END-VSYSCALL_START) >> PAGE_SHIFT) - 1,
-	VSYSCALL_HPET,
-	FIX_DBGP_BASE,
-	FIX_EARLYCON_MEM_BASE,
-	FIX_APIC_BASE,	/* local (CPU) APIC) -- required for SMP or not */
-	FIX_IO_APIC_BASE_0,
-	FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS - 1,
-#ifdef CONFIG_PARAVIRT
-	FIX_PARAVIRT_BOOTMAP,
-#endif
-	__end_of_permanent_fixed_addresses,
-#ifdef CONFIG_ACPI
-	FIX_ACPI_BEGIN,
-	FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1,
-#endif
-#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
-	FIX_OHCI1394_BASE,
-#endif
-	/*
-	 * 256 temporary boot-time mappings, used by early_ioremap(),
-	 * before ioremap() is functional.
-	 *
-	 * We round it up to the next 256 pages boundary so that we
-	 * can have a single pgd entry and a single pte table:
-	 */
-#define NR_FIX_BTMAPS		64
-#define FIX_BTMAPS_SLOTS	4
-	FIX_BTMAP_END = __end_of_permanent_fixed_addresses + 256 -
-			(__end_of_permanent_fixed_addresses & 255),
-	FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS*FIX_BTMAPS_SLOTS - 1,
-	__end_of_fixed_addresses
-};
-
-#define FIXADDR_TOP	(VSYSCALL_END-PAGE_SIZE)
-#define FIXADDR_SIZE	(__end_of_fixed_addresses << PAGE_SHIFT)
-#define FIXADDR_START	(FIXADDR_TOP - FIXADDR_SIZE)
-
-/* Only covers 32bit vsyscalls currently. Need another set for 64bit. */
-#define FIXADDR_USER_START	((unsigned long)VSYSCALL32_VSYSCALL)
-#define FIXADDR_USER_END	(FIXADDR_USER_START + PAGE_SIZE)
-
-#endif /* _ASM_X86_FIXMAP_64_H */
diff --git a/arch/x86/include/asm/genapic.h b/arch/x86/include/asm/genapic.h
index d48bee663a6f..4b8b98fa7f25 100644
--- a/arch/x86/include/asm/genapic.h
+++ b/arch/x86/include/asm/genapic.h
@@ -1,5 +1 @@
-#ifdef CONFIG_X86_32
-# include "genapic_32.h"
-#else
-# include "genapic_64.h"
-#endif
+#include <asm/apic.h>
diff --git a/arch/x86/include/asm/genapic_32.h b/arch/x86/include/asm/genapic_32.h
deleted file mode 100644
index 2c05b737ee22..000000000000
--- a/arch/x86/include/asm/genapic_32.h
+++ /dev/null
@@ -1,148 +0,0 @@
-#ifndef _ASM_X86_GENAPIC_32_H
-#define _ASM_X86_GENAPIC_32_H
-
-#include <asm/mpspec.h>
-#include <asm/atomic.h>
-
-/*
- * Generic APIC driver interface.
- *
- * An straight forward mapping of the APIC related parts of the
- * x86 subarchitecture interface to a dynamic object.
- *
- * This is used by the "generic" x86 subarchitecture.
- *
- * Copyright 2003 Andi Kleen, SuSE Labs.
- */
-
-struct mpc_bus;
-struct mpc_table;
-struct mpc_cpu;
-
-struct genapic {
-	char *name;
-	int (*probe)(void);
-
-	int (*apic_id_registered)(void);
-	const struct cpumask *(*target_cpus)(void);
-	int int_delivery_mode;
-	int int_dest_mode;
-	int ESR_DISABLE;
-	int apic_destination_logical;
-	unsigned long (*check_apicid_used)(physid_mask_t bitmap, int apicid);
-	unsigned long (*check_apicid_present)(int apicid);
-	int no_balance_irq;
-	int no_ioapic_check;
-	void (*init_apic_ldr)(void);
-	physid_mask_t (*ioapic_phys_id_map)(physid_mask_t map);
-
-	void (*setup_apic_routing)(void);
-	int (*multi_timer_check)(int apic, int irq);
-	int (*apicid_to_node)(int logical_apicid);
-	int (*cpu_to_logical_apicid)(int cpu);
-	int (*cpu_present_to_apicid)(int mps_cpu);
-	physid_mask_t (*apicid_to_cpu_present)(int phys_apicid);
-	void (*setup_portio_remap)(void);
-	int (*check_phys_apicid_present)(int boot_cpu_physical_apicid);
-	void (*enable_apic_mode)(void);
-	u32 (*phys_pkg_id)(u32 cpuid_apic, int index_msb);
-
-	/* mpparse */
-	/* When one of the next two hooks returns 1 the genapic
-	   is switched to this. Essentially they are additional probe
-	   functions. */
-	int (*mps_oem_check)(struct mpc_table *mpc, char *oem,
-			     char *productid);
-	int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);
-
-	unsigned (*get_apic_id)(unsigned long x);
-	unsigned long apic_id_mask;
-	unsigned int (*cpu_mask_to_apicid)(const struct cpumask *cpumask);
-	unsigned int (*cpu_mask_to_apicid_and)(const struct cpumask *cpumask,
-					       const struct cpumask *andmask);
-	void (*vector_allocation_domain)(int cpu, struct cpumask *retmask);
-
-#ifdef CONFIG_SMP
-	/* ipi */
-	void (*send_IPI_mask)(const struct cpumask *mask, int vector);
-	void (*send_IPI_mask_allbutself)(const struct cpumask *mask,
-					 int vector);
-	void (*send_IPI_allbutself)(int vector);
-	void (*send_IPI_all)(int vector);
-#endif
-	int (*wakeup_cpu)(int apicid, unsigned long start_eip);
-	int trampoline_phys_low;
-	int trampoline_phys_high;
-	void (*wait_for_init_deassert)(atomic_t *deassert);
-	void (*smp_callin_clear_local_apic)(void);
-	void (*store_NMI_vector)(unsigned short *high, unsigned short *low);
-	void (*restore_NMI_vector)(unsigned short *high, unsigned short *low);
-	void (*inquire_remote_apic)(int apicid);
-};
-
-#define APICFUNC(x) .x = x,
-
-/* More functions could be probably marked IPIFUNC and save some space
-   in UP GENERICARCH kernels, but I don't have the nerve right now
-   to untangle this mess. -AK  */
-#ifdef CONFIG_SMP
-#define IPIFUNC(x) APICFUNC(x)
-#else
-#define IPIFUNC(x)
-#endif
-
-#define APIC_INIT(aname, aprobe)			\
-{							\
-	.name = aname,					\
-	.probe = aprobe,				\
-	.int_delivery_mode = INT_DELIVERY_MODE,		\
-	.int_dest_mode = INT_DEST_MODE,			\
-	.no_balance_irq = NO_BALANCE_IRQ,		\
-	.ESR_DISABLE = esr_disable,			\
-	.apic_destination_logical = APIC_DEST_LOGICAL,	\
-	APICFUNC(apic_id_registered)			\
-	APICFUNC(target_cpus)				\
-	APICFUNC(check_apicid_used)			\
-	APICFUNC(check_apicid_present)			\
-	APICFUNC(init_apic_ldr)				\
-	APICFUNC(ioapic_phys_id_map)			\
-	APICFUNC(setup_apic_routing)			\
-	APICFUNC(multi_timer_check)			\
-	APICFUNC(apicid_to_node)			\
-	APICFUNC(cpu_to_logical_apicid)			\
-	APICFUNC(cpu_present_to_apicid)			\
-	APICFUNC(apicid_to_cpu_present)			\
-	APICFUNC(setup_portio_remap)			\
-	APICFUNC(check_phys_apicid_present)		\
-	APICFUNC(mps_oem_check)				\
-	APICFUNC(get_apic_id)				\
-	.apic_id_mask = APIC_ID_MASK,			\
-	APICFUNC(cpu_mask_to_apicid)			\
-	APICFUNC(cpu_mask_to_apicid_and)		\
-	APICFUNC(vector_allocation_domain)		\
-	APICFUNC(acpi_madt_oem_check)			\
-	IPIFUNC(send_IPI_mask)				\
-	IPIFUNC(send_IPI_allbutself)			\
-	IPIFUNC(send_IPI_all)				\
-	APICFUNC(enable_apic_mode)			\
-	APICFUNC(phys_pkg_id)				\
-	.trampoline_phys_low = TRAMPOLINE_PHYS_LOW,		\
-	.trampoline_phys_high = TRAMPOLINE_PHYS_HIGH,		\
-	APICFUNC(wait_for_init_deassert)		\
-	APICFUNC(smp_callin_clear_local_apic)		\
-	APICFUNC(store_NMI_vector)			\
-	APICFUNC(restore_NMI_vector)			\
-	APICFUNC(inquire_remote_apic)			\
-}
-
-extern struct genapic *genapic;
-extern void es7000_update_genapic_to_cluster(void);
-
-enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC};
-#define get_uv_system_type()		UV_NONE
-#define is_uv_system()			0
-#define uv_wakeup_secondary(a, b)	1
-#define uv_system_init()		do {} while (0)
-
-
-#endif /* _ASM_X86_GENAPIC_32_H */
diff --git a/arch/x86/include/asm/genapic_64.h b/arch/x86/include/asm/genapic_64.h
deleted file mode 100644
index adf32fb56aa6..000000000000
--- a/arch/x86/include/asm/genapic_64.h
+++ /dev/null
@@ -1,66 +0,0 @@
-#ifndef _ASM_X86_GENAPIC_64_H
-#define _ASM_X86_GENAPIC_64_H
-
-#include <linux/cpumask.h>
-
-/*
- * Copyright 2004 James Cleverdon, IBM.
- * Subject to the GNU Public License, v.2
- *
- * Generic APIC sub-arch data struct.
- *
- * Hacked for x86-64 by James Cleverdon from i386 architecture code by
- * Martin Bligh, Andi Kleen, James Bottomley, John Stultz, and
- * James Cleverdon.
- */
-
-struct genapic {
-	char *name;
-	int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);
-	u32 int_delivery_mode;
-	u32 int_dest_mode;
-	int (*apic_id_registered)(void);
-	const struct cpumask *(*target_cpus)(void);
-	void (*vector_allocation_domain)(int cpu, struct cpumask *retmask);
-	void (*init_apic_ldr)(void);
-	/* ipi */
-	void (*send_IPI_mask)(const struct cpumask *mask, int vector);
-	void (*send_IPI_mask_allbutself)(const struct cpumask *mask,
-					 int vector);
-	void (*send_IPI_allbutself)(int vector);
-	void (*send_IPI_all)(int vector);
-	void (*send_IPI_self)(int vector);
-	/* */
-	unsigned int (*cpu_mask_to_apicid)(const struct cpumask *cpumask);
-	unsigned int (*cpu_mask_to_apicid_and)(const struct cpumask *cpumask,
-					       const struct cpumask *andmask);
-	unsigned int (*phys_pkg_id)(int index_msb);
-	unsigned int (*get_apic_id)(unsigned long x);
-	unsigned long (*set_apic_id)(unsigned int id);
-	unsigned long apic_id_mask;
-	/* wakeup_secondary_cpu */
-	int (*wakeup_cpu)(int apicid, unsigned long start_eip);
-};
-
-extern struct genapic *genapic;
-
-extern struct genapic apic_flat;
-extern struct genapic apic_physflat;
-extern struct genapic apic_x2apic_cluster;
-extern struct genapic apic_x2apic_phys;
-extern int acpi_madt_oem_check(char *, char *);
-
-extern void apic_send_IPI_self(int vector);
-enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC};
-extern enum uv_system_type get_uv_system_type(void);
-extern int is_uv_system(void);
-
-extern struct genapic apic_x2apic_uv_x;
-DECLARE_PER_CPU(int, x2apic_extra_bits);
-extern void uv_cpu_init(void);
-extern void uv_system_init(void);
-extern int uv_wakeup_secondary(int phys_apicid, unsigned int start_rip);
-
-extern void setup_apic_routing(void);
-
-#endif /* _ASM_X86_GENAPIC_64_H */
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 000787df66e6..176f058e7159 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -1,11 +1,52 @@
-#ifdef CONFIG_X86_32
-# include "hardirq_32.h"
-#else
-# include "hardirq_64.h"
+#ifndef _ASM_X86_HARDIRQ_H
+#define _ASM_X86_HARDIRQ_H
+
+#include <linux/threads.h>
+#include <linux/irq.h>
+
+typedef struct {
+	unsigned int __softirq_pending;
+	unsigned int __nmi_count;	/* arch dependent */
+	unsigned int irq0_irqs;
+#ifdef CONFIG_X86_LOCAL_APIC
+	unsigned int apic_timer_irqs;	/* arch dependent */
+	unsigned int irq_spurious_count;
+#endif
+#ifdef CONFIG_SMP
+	unsigned int irq_resched_count;
+	unsigned int irq_call_count;
+	unsigned int irq_tlb_count;
+#endif
+#ifdef CONFIG_X86_MCE
+	unsigned int irq_thermal_count;
+# ifdef CONFIG_X86_64
+	unsigned int irq_threshold_count;
+# endif
 #endif
+} ____cacheline_aligned irq_cpustat_t;
+
+DECLARE_PER_CPU(irq_cpustat_t, irq_stat);
+
+/* We can have at most NR_VECTORS irqs routed to a cpu at a time */
+#define MAX_HARDIRQS_PER_CPU NR_VECTORS
+
+#define __ARCH_IRQ_STAT
+
+#define inc_irq_stat(member)	percpu_add(irq_stat.member, 1)
+
+#define local_softirq_pending()	percpu_read(irq_stat.__softirq_pending)
+
+#define __ARCH_SET_SOFTIRQ_PENDING
+
+#define set_softirq_pending(x)	percpu_write(irq_stat.__softirq_pending, (x))
+#define or_softirq_pending(x)	percpu_or(irq_stat.__softirq_pending, (x))
+
+extern void ack_bad_irq(unsigned int irq);
 
 extern u64 arch_irq_stat_cpu(unsigned int cpu);
 #define arch_irq_stat_cpu	arch_irq_stat_cpu
 
 extern u64 arch_irq_stat(void);
 #define arch_irq_stat		arch_irq_stat
+
+#endif /* _ASM_X86_HARDIRQ_H */
diff --git a/arch/x86/include/asm/hardirq_32.h b/arch/x86/include/asm/hardirq_32.h
deleted file mode 100644
index cf7954d1405f..000000000000
--- a/arch/x86/include/asm/hardirq_32.h
+++ /dev/null
@@ -1,30 +0,0 @@
-#ifndef _ASM_X86_HARDIRQ_32_H
-#define _ASM_X86_HARDIRQ_32_H
-
-#include <linux/threads.h>
-#include <linux/irq.h>
-
-typedef struct {
-	unsigned int __softirq_pending;
-	unsigned long idle_timestamp;
-	unsigned int __nmi_count;	/* arch dependent */
-	unsigned int apic_timer_irqs;	/* arch dependent */
-	unsigned int irq0_irqs;
-	unsigned int irq_resched_count;
-	unsigned int irq_call_count;
-	unsigned int irq_tlb_count;
-	unsigned int irq_thermal_count;
-	unsigned int irq_spurious_count;
-} ____cacheline_aligned irq_cpustat_t;
-
-DECLARE_PER_CPU(irq_cpustat_t, irq_stat);
-
-#define __ARCH_IRQ_STAT
-#define __IRQ_STAT(cpu, member) (per_cpu(irq_stat, cpu).member)
-
-#define inc_irq_stat(member)	(__get_cpu_var(irq_stat).member++)
-
-void ack_bad_irq(unsigned int irq);
-#include <linux/irq_cpustat.h>
-
-#endif /* _ASM_X86_HARDIRQ_32_H */
diff --git a/arch/x86/include/asm/hardirq_64.h b/arch/x86/include/asm/hardirq_64.h
deleted file mode 100644
index b5a6b5d56704..000000000000
--- a/arch/x86/include/asm/hardirq_64.h
+++ /dev/null
@@ -1,25 +0,0 @@
-#ifndef _ASM_X86_HARDIRQ_64_H
-#define _ASM_X86_HARDIRQ_64_H
-
-#include <linux/threads.h>
-#include <linux/irq.h>
-#include <asm/pda.h>
-#include <asm/apic.h>
-
-/* We can have at most NR_VECTORS irqs routed to a cpu at a time */
-#define MAX_HARDIRQS_PER_CPU NR_VECTORS
-
-#define __ARCH_IRQ_STAT 1
-
-#define inc_irq_stat(member)	add_pda(member, 1)
-
-#define local_softirq_pending() read_pda(__softirq_pending)
-
-#define __ARCH_SET_SOFTIRQ_PENDING 1
-
-#define set_softirq_pending(x) write_pda(__softirq_pending, (x))
-#define or_softirq_pending(x)  or_pda(__softirq_pending, (x))
-
-extern void ack_bad_irq(unsigned int irq);
-
-#endif /* _ASM_X86_HARDIRQ_64_H */
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 8de644b6b959..370e1c83bb49 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -25,8 +25,6 @@
 #include <asm/irq.h>
 #include <asm/sections.h>
 
-#define platform_legacy_irq(irq)	((irq) < 16)
-
 /* Interrupt handlers registered during init_IRQ */
 extern void apic_timer_interrupt(void);
 extern void error_interrupt(void);
@@ -58,7 +56,7 @@ extern void make_8259A_irq(unsigned int irq);
 extern void init_8259A(int aeoi);
 
 /* IOAPIC */
-#define IO_APIC_IRQ(x) (((x) >= 16) || ((1<<(x)) & io_apic_irqs))
+#define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs))
 extern unsigned long io_apic_irqs;
 
 extern void init_VISWS_APIC_irqs(void);
@@ -67,15 +65,7 @@ extern void disable_IO_APIC(void);
 extern int IO_APIC_get_PCI_irq_vector(int bus, int slot, int fn);
 extern void setup_ioapic_dest(void);
 
-#ifdef CONFIG_X86_64
 extern void enable_IO_APIC(void);
-#endif
-
-/* IPI functions */
-#ifdef CONFIG_X86_32
-extern void send_IPI_self(int vector);
-#endif
-extern void send_IPI(int dest, int vector);
 
 /* Statistics */
 extern atomic_t irq_err_count;
@@ -84,21 +74,11 @@ extern atomic_t irq_mis_count;
 /* EISA */
 extern void eisa_set_level_irq(unsigned int irq);
 
-/* Voyager functions */
-extern asmlinkage void vic_cpi_interrupt(void);
-extern asmlinkage void vic_sys_interrupt(void);
-extern asmlinkage void vic_cmn_interrupt(void);
-extern asmlinkage void qic_timer_interrupt(void);
-extern asmlinkage void qic_invalidate_interrupt(void);
-extern asmlinkage void qic_reschedule_interrupt(void);
-extern asmlinkage void qic_enable_irq_interrupt(void);
-extern asmlinkage void qic_call_function_interrupt(void);
-
 /* SMP */
 extern void smp_apic_timer_interrupt(struct pt_regs *);
 extern void smp_spurious_interrupt(struct pt_regs *);
 extern void smp_error_interrupt(struct pt_regs *);
-#ifdef CONFIG_X86_SMP
+#ifdef CONFIG_SMP
 extern void smp_reschedule_interrupt(struct pt_regs *);
 extern void smp_call_function_interrupt(struct pt_regs *);
 extern void smp_call_function_single_interrupt(struct pt_regs *);
diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
index 58d7091eeb1f..1a99e6c092af 100644
--- a/arch/x86/include/asm/i8259.h
+++ b/arch/x86/include/asm/i8259.h
@@ -60,4 +60,8 @@ extern struct irq_chip i8259A_chip;
 extern void mask_8259A(void);
 extern void unmask_8259A(void);
 
+#ifdef CONFIG_X86_32
+extern void init_ISA_irqs(void);
+#endif
+
 #endif /* _ASM_X86_I8259_H */
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 1dbbdf4be9b4..e5383e3d2f8c 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -5,6 +5,7 @@
 
 #include <linux/compiler.h>
 #include <asm-generic/int-ll64.h>
+#include <asm/page.h>
 
 #define build_mmio_read(name, size, type, reg, barrier) \
 static inline type name(const volatile void __iomem *addr) \
@@ -80,6 +81,98 @@ static inline void writeq(__u64 val, volatile void __iomem *addr)
 #define readq			readq
 #define writeq			writeq
 
+/**
+ *	virt_to_phys	-	map virtual addresses to physical
+ *	@address: address to remap
+ *
+ *	The returned physical address is the physical (CPU) mapping for
+ *	the memory address given. It is only valid to use this function on
+ *	addresses directly mapped or allocated via kmalloc.
+ *
+ *	This function does not give bus mappings for DMA transfers. In
+ *	almost all conceivable cases a device driver should not be using
+ *	this function
+ */
+
+static inline phys_addr_t virt_to_phys(volatile void *address)
+{
+	return __pa(address);
+}
+
+/**
+ *	phys_to_virt	-	map physical address to virtual
+ *	@address: address to remap
+ *
+ *	The returned virtual address is a current CPU mapping for
+ *	the memory address given. It is only valid to use this function on
+ *	addresses that have a kernel mapping
+ *
+ *	This function does not handle bus mappings for DMA transfers. In
+ *	almost all conceivable cases a device driver should not be using
+ *	this function
+ */
+
+static inline void *phys_to_virt(phys_addr_t address)
+{
+	return __va(address);
+}
+
+/*
+ * Change "struct page" to physical address.
+ */
+#define page_to_phys(page)    ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT)
+
+/*
+ * ISA I/O bus memory addresses are 1:1 with the physical address.
+ * However, we truncate the address to unsigned int to avoid undesirable
+ * promitions in legacy drivers.
+ */
+static inline unsigned int isa_virt_to_bus(volatile void *address)
+{
+	return (unsigned int)virt_to_phys(address);
+}
+#define isa_page_to_bus(page)	((unsigned int)page_to_phys(page))
+#define isa_bus_to_virt		phys_to_virt
+
+/*
+ * However PCI ones are not necessarily 1:1 and therefore these interfaces
+ * are forbidden in portable PCI drivers.
+ *
+ * Allow them on x86 for legacy drivers, though.
+ */
+#define virt_to_bus virt_to_phys
+#define bus_to_virt phys_to_virt
+
+/**
+ * ioremap     -   map bus memory into CPU space
+ * @offset:    bus address of the memory
+ * @size:      size of the resource to map
+ *
+ * ioremap performs a platform specific sequence of operations to
+ * make bus memory CPU accessible via the readb/readw/readl/writeb/
+ * writew/writel functions and the other mmio helpers. The returned
+ * address is not guaranteed to be usable directly as a virtual
+ * address.
+ *
+ * If the area you are trying to map is a PCI BAR you should have a
+ * look at pci_iomap().
+ */
+extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size);
+extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
+extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size,
+				unsigned long prot_val);
+
+/*
+ * The default ioremap() behavior is non-cached:
+ */
+static inline void __iomem *ioremap(resource_size_t offset, unsigned long size)
+{
+	return ioremap_nocache(offset, size);
+}
+
+extern void iounmap(volatile void __iomem *addr);
+
+
 #ifdef CONFIG_X86_32
 # include "io_32.h"
 #else
@@ -91,7 +184,7 @@ extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);
 
 extern int ioremap_change_attr(unsigned long vaddr, unsigned long size,
 				unsigned long prot_val);
-extern void __iomem *ioremap_wc(unsigned long offset, unsigned long size);
+extern void __iomem *ioremap_wc(resource_size_t offset, unsigned long size);
 
 /*
  * early_ioremap() and early_iounmap() are for temporary early boot-time
@@ -103,7 +196,7 @@ extern void early_ioremap_reset(void);
 extern void __iomem *early_ioremap(unsigned long offset, unsigned long size);
 extern void __iomem *early_memremap(unsigned long offset, unsigned long size);
 extern void early_iounmap(void __iomem *addr, unsigned long size);
-extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys);
 
+#define IO_SPACE_LIMIT 0xffff
 
 #endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/include/asm/io_32.h b/arch/x86/include/asm/io_32.h
index d8e242e1b396..a299900f5920 100644
--- a/arch/x86/include/asm/io_32.h
+++ b/arch/x86/include/asm/io_32.h
@@ -37,8 +37,6 @@
   *  - Arnaldo Carvalho de Melo <acme@conectiva.com.br>
   */
 
-#define IO_SPACE_LIMIT 0xffff
-
 #define XQUAD_PORTIO_BASE 0xfe400000
 #define XQUAD_PORTIO_QUAD 0x40000  /* 256k per quad. */
 
@@ -53,92 +51,6 @@
  */
 #define xlate_dev_kmem_ptr(p)	p
 
-/**
- *	virt_to_phys	-	map virtual addresses to physical
- *	@address: address to remap
- *
- *	The returned physical address is the physical (CPU) mapping for
- *	the memory address given. It is only valid to use this function on
- *	addresses directly mapped or allocated via kmalloc.
- *
- *	This function does not give bus mappings for DMA transfers. In
- *	almost all conceivable cases a device driver should not be using
- *	this function
- */
-
-static inline unsigned long virt_to_phys(volatile void *address)
-{
-	return __pa(address);
-}
-
-/**
- *	phys_to_virt	-	map physical address to virtual
- *	@address: address to remap
- *
- *	The returned virtual address is a current CPU mapping for
- *	the memory address given. It is only valid to use this function on
- *	addresses that have a kernel mapping
- *
- *	This function does not handle bus mappings for DMA transfers. In
- *	almost all conceivable cases a device driver should not be using
- *	this function
- */
-
-static inline void *phys_to_virt(unsigned long address)
-{
-	return __va(address);
-}
-
-/*
- * Change "struct page" to physical address.
- */
-#define page_to_phys(page)    ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT)
-
-/**
- * ioremap     -   map bus memory into CPU space
- * @offset:    bus address of the memory
- * @size:      size of the resource to map
- *
- * ioremap performs a platform specific sequence of operations to
- * make bus memory CPU accessible via the readb/readw/readl/writeb/
- * writew/writel functions and the other mmio helpers. The returned
- * address is not guaranteed to be usable directly as a virtual
- * address.
- *
- * If the area you are trying to map is a PCI BAR you should have a
- * look at pci_iomap().
- */
-extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size);
-extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
-extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size,
-				unsigned long prot_val);
-
-/*
- * The default ioremap() behavior is non-cached:
- */
-static inline void __iomem *ioremap(resource_size_t offset, unsigned long size)
-{
-	return ioremap_nocache(offset, size);
-}
-
-extern void iounmap(volatile void __iomem *addr);
-
-/*
- * ISA I/O bus memory addresses are 1:1 with the physical address.
- */
-#define isa_virt_to_bus virt_to_phys
-#define isa_page_to_bus page_to_phys
-#define isa_bus_to_virt phys_to_virt
-
-/*
- * However PCI ones are not necessarily 1:1 and therefore these interfaces
- * are forbidden in portable PCI drivers.
- *
- * Allow them on x86 for legacy drivers, though.
- */
-#define virt_to_bus virt_to_phys
-#define bus_to_virt phys_to_virt
-
 static inline void
 memset_io(volatile void __iomem *addr, unsigned char val, int count)
 {
diff --git a/arch/x86/include/asm/io_64.h b/arch/x86/include/asm/io_64.h
index 563c16270ba6..244067893af4 100644
--- a/arch/x86/include/asm/io_64.h
+++ b/arch/x86/include/asm/io_64.h
@@ -136,73 +136,12 @@ __OUTS(b)
 __OUTS(w)
 __OUTS(l)
 
-#define IO_SPACE_LIMIT 0xffff
-
 #if defined(__KERNEL__) && defined(__x86_64__)
 
 #include <linux/vmalloc.h>
 
-#ifndef __i386__
-/*
- * Change virtual addresses to physical addresses and vv.
- * These are pretty trivial
- */
-static inline unsigned long virt_to_phys(volatile void *address)
-{
-	return __pa(address);
-}
-
-static inline void *phys_to_virt(unsigned long address)
-{
-	return __va(address);
-}
-#endif
-
-/*
- * Change "struct page" to physical address.
- */
-#define page_to_phys(page)    ((dma_addr_t)page_to_pfn(page) << PAGE_SHIFT)
-
 #include <asm-generic/iomap.h>
 
-/*
- * This one maps high address device memory and turns off caching for that area.
- * it's useful if some control registers are in such an area and write combining
- * or read caching is not desirable:
- */
-extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size);
-extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
-extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size,
-				unsigned long prot_val);
-
-/*
- * The default ioremap() behavior is non-cached:
- */
-static inline void __iomem *ioremap(resource_size_t offset, unsigned long size)
-{
-	return ioremap_nocache(offset, size);
-}
-
-extern void iounmap(volatile void __iomem *addr);
-
-extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys);
-
-/*
- * ISA I/O bus memory addresses are 1:1 with the physical address.
- */
-#define isa_virt_to_bus virt_to_phys
-#define isa_page_to_bus page_to_phys
-#define isa_bus_to_virt phys_to_virt
-
-/*
- * However PCI ones are not necessarily 1:1 and therefore these interfaces
- * are forbidden in portable PCI drivers.
- *
- * Allow them on x86 for legacy drivers, though.
- */
-#define virt_to_bus virt_to_phys
-#define bus_to_virt phys_to_virt
-
 void __memcpy_fromio(void *, unsigned long, unsigned);
 void __memcpy_toio(unsigned long, const void *, unsigned);
 
diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 7a1f44ac1f17..59cb4a1317b7 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -114,38 +114,16 @@ struct IR_IO_APIC_route_entry {
 extern int nr_ioapics;
 extern int nr_ioapic_registers[MAX_IO_APICS];
 
-/*
- * MP-BIOS irq configuration table structures:
- */
-
 #define MP_MAX_IOAPIC_PIN 127
 
-struct mp_config_ioapic {
-	unsigned long mp_apicaddr;
-	unsigned int mp_apicid;
-	unsigned char mp_type;
-	unsigned char mp_apicver;
-	unsigned char mp_flags;
-};
-
-struct mp_config_intsrc {
-	unsigned int mp_dstapic;
-	unsigned char mp_type;
-	unsigned char mp_irqtype;
-	unsigned short mp_irqflag;
-	unsigned char mp_srcbus;
-	unsigned char mp_srcbusirq;
-	unsigned char mp_dstirq;
-};
-
 /* I/O APIC entries */
-extern struct mp_config_ioapic mp_ioapics[MAX_IO_APICS];
+extern struct mpc_ioapic mp_ioapics[MAX_IO_APICS];
 
 /* # of MP IRQ source entries */
 extern int mp_irq_entries;
 
 /* MP IRQ source entries */
-extern struct mp_config_intsrc mp_irqs[MAX_IRQ_SOURCES];
+extern struct mpc_intsrc mp_irqs[MAX_IRQ_SOURCES];
 
 /* non-0 if default (table-less) MP configuration */
 extern int mpc_default_type;
@@ -165,15 +143,6 @@ extern int noioapicreroute;
 /* 1 if the timer IRQ uses the '8259A Virtual Wire' mode */
 extern int timer_through_8259;
 
-static inline void disable_ioapic_setup(void)
-{
-#ifdef CONFIG_PCI
-	noioapicquirk = 1;
-	noioapicreroute = -1;
-#endif
-	skip_ioapic_setup = 1;
-}
-
 /*
  * If we use the IO-APIC for IRQ routing, disable automatic
  * assignment of PCI IRQ's.
@@ -200,6 +169,12 @@ extern void reinit_intr_remapped_IO_APIC(int);
 
 extern void probe_nr_irqs_gsi(void);
 
+extern int setup_ioapic_entry(int apic, int irq,
+			      struct IO_APIC_route_entry *entry,
+			      unsigned int destination, int trigger,
+			      int polarity, int vector);
+extern void ioapic_write_entry(int apic, int pin,
+			       struct IO_APIC_route_entry e);
 #else  /* !CONFIG_X86_IO_APIC */
 #define io_apic_assign_pci_irqs 0
 static const int timer_through_8259 = 0;
diff --git a/arch/x86/include/asm/ipi.h b/arch/x86/include/asm/ipi.h
index c745a306f7d3..0b7228268a63 100644
--- a/arch/x86/include/asm/ipi.h
+++ b/arch/x86/include/asm/ipi.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_IPI_H
 #define _ASM_X86_IPI_H
 
+#ifdef CONFIG_X86_LOCAL_APIC
+
 /*
  * Copyright 2004 James Cleverdon, IBM.
  * Subject to the GNU Public License, v.2
@@ -55,8 +57,8 @@ static inline void __xapic_wait_icr_idle(void)
 		cpu_relax();
 }
 
-static inline void __send_IPI_shortcut(unsigned int shortcut, int vector,
-				       unsigned int dest)
+static inline void
+__default_send_IPI_shortcut(unsigned int shortcut, int vector, unsigned int dest)
 {
 	/*
 	 * Subtle. In the case of the 'never do double writes' workaround
@@ -87,8 +89,8 @@ static inline void __send_IPI_shortcut(unsigned int shortcut, int vector,
  * This is used to send an IPI with no shorthand notation (the destination is
  * specified in bits 56 to 63 of the ICR).
  */
-static inline void __send_IPI_dest_field(unsigned int mask, int vector,
-					 unsigned int dest)
+static inline void
+ __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest)
 {
 	unsigned long cfg;
 
@@ -117,41 +119,44 @@ static inline void __send_IPI_dest_field(unsigned int mask, int vector,
 	native_apic_mem_write(APIC_ICR, cfg);
 }
 
-static inline void send_IPI_mask_sequence(const struct cpumask *mask,
-					  int vector)
-{
-	unsigned long flags;
-	unsigned long query_cpu;
+extern void default_send_IPI_mask_sequence_phys(const struct cpumask *mask,
+						 int vector);
+extern void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
+							 int vector);
+extern void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
+							 int vector);
+extern void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
+							 int vector);
 
-	/*
-	 * Hack. The clustered APIC addressing mode doesn't allow us to send
-	 * to an arbitrary mask, so I do a unicast to each CPU instead.
-	 * - mbligh
-	 */
-	local_irq_save(flags);
-	for_each_cpu(query_cpu, mask) {
-		__send_IPI_dest_field(per_cpu(x86_cpu_to_apicid, query_cpu),
-				      vector, APIC_DEST_PHYSICAL);
-	}
-	local_irq_restore(flags);
+/* Avoid include hell */
+#define NMI_VECTOR 0x02
+
+extern int no_broadcast;
+
+static inline void __default_local_send_IPI_allbutself(int vector)
+{
+	if (no_broadcast || vector == NMI_VECTOR)
+		apic->send_IPI_mask_allbutself(cpu_online_mask, vector);
+	else
+		__default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector, apic->dest_logical);
 }
 
-static inline void send_IPI_mask_allbutself(const struct cpumask *mask,
-					    int vector)
+static inline void __default_local_send_IPI_all(int vector)
 {
-	unsigned long flags;
-	unsigned int query_cpu;
-	unsigned int this_cpu = smp_processor_id();
-
-	/* See Hack comment above */
-
-	local_irq_save(flags);
-	for_each_cpu(query_cpu, mask)
-		if (query_cpu != this_cpu)
-			__send_IPI_dest_field(
-				per_cpu(x86_cpu_to_apicid, query_cpu),
-				vector, APIC_DEST_PHYSICAL);
-	local_irq_restore(flags);
+	if (no_broadcast || vector == NMI_VECTOR)
+		apic->send_IPI_mask(cpu_online_mask, vector);
+	else
+		__default_send_IPI_shortcut(APIC_DEST_ALLINC, vector, apic->dest_logical);
 }
 
+#ifdef CONFIG_X86_32
+extern void default_send_IPI_mask_logical(const struct cpumask *mask,
+						 int vector);
+extern void default_send_IPI_allbutself(int vector);
+extern void default_send_IPI_all(int vector);
+extern void default_send_IPI_self(int vector);
+#endif
+
+#endif
+
 #endif /* _ASM_X86_IPI_H */
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index 592688ed04d3..107eb2196691 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -36,9 +36,11 @@ static inline int irq_canonicalize(int irq)
 extern void fixup_irqs(void);
 #endif
 
-extern unsigned int do_IRQ(struct pt_regs *regs);
 extern void init_IRQ(void);
 extern void native_init_IRQ(void);
+extern bool handle_irq(unsigned irq, struct pt_regs *regs);
+
+extern unsigned int do_IRQ(struct pt_regs *regs);
 
 /* Interrupt vector management */
 extern DECLARE_BITMAP(used_vectors, NR_VECTORS);
diff --git a/arch/x86/include/asm/irq_regs.h b/arch/x86/include/asm/irq_regs.h
index 89c898ab298b..77843225b7ea 100644
--- a/arch/x86/include/asm/irq_regs.h
+++ b/arch/x86/include/asm/irq_regs.h
@@ -1,5 +1,31 @@
-#ifdef CONFIG_X86_32
-# include "irq_regs_32.h"
-#else
-# include "irq_regs_64.h"
-#endif
+/*
+ * Per-cpu current frame pointer - the location of the last exception frame on
+ * the stack, stored in the per-cpu area.
+ *
+ * Jeremy Fitzhardinge <jeremy@goop.org>
+ */
+#ifndef _ASM_X86_IRQ_REGS_H
+#define _ASM_X86_IRQ_REGS_H
+
+#include <asm/percpu.h>
+
+#define ARCH_HAS_OWN_IRQ_REGS
+
+DECLARE_PER_CPU(struct pt_regs *, irq_regs);
+
+static inline struct pt_regs *get_irq_regs(void)
+{
+	return percpu_read(irq_regs);
+}
+
+static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
+{
+	struct pt_regs *old_regs;
+
+	old_regs = get_irq_regs();
+	percpu_write(irq_regs, new_regs);
+
+	return old_regs;
+}
+
+#endif /* _ASM_X86_IRQ_REGS_32_H */
diff --git a/arch/x86/include/asm/irq_regs_32.h b/arch/x86/include/asm/irq_regs_32.h
deleted file mode 100644
index 86afd7473457..000000000000
--- a/arch/x86/include/asm/irq_regs_32.h
+++ /dev/null
@@ -1,31 +0,0 @@
-/*
- * Per-cpu current frame pointer - the location of the last exception frame on
- * the stack, stored in the per-cpu area.
- *
- * Jeremy Fitzhardinge <jeremy@goop.org>
- */
-#ifndef _ASM_X86_IRQ_REGS_32_H
-#define _ASM_X86_IRQ_REGS_32_H
-
-#include <asm/percpu.h>
-
-#define ARCH_HAS_OWN_IRQ_REGS
-
-DECLARE_PER_CPU(struct pt_regs *, irq_regs);
-
-static inline struct pt_regs *get_irq_regs(void)
-{
-	return x86_read_percpu(irq_regs);
-}
-
-static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
-{
-	struct pt_regs *old_regs;
-
-	old_regs = get_irq_regs();
-	x86_write_percpu(irq_regs, new_regs);
-
-	return old_regs;
-}
-
-#endif /* _ASM_X86_IRQ_REGS_32_H */
diff --git a/arch/x86/include/asm/irq_regs_64.h b/arch/x86/include/asm/irq_regs_64.h
deleted file mode 100644
index 3dd9c0b70270..000000000000
--- a/arch/x86/include/asm/irq_regs_64.h
+++ /dev/null
@@ -1 +0,0 @@
-#include <asm-generic/irq_regs.h>
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index f7ff65032b9d..8a285f356f8a 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -1,47 +1,69 @@
 #ifndef _ASM_X86_IRQ_VECTORS_H
 #define _ASM_X86_IRQ_VECTORS_H
 
-#include <linux/threads.h>
+/*
+ * Linux IRQ vector layout.
+ *
+ * There are 256 IDT entries (per CPU - each entry is 8 bytes) which can
+ * be defined by Linux. They are used as a jump table by the CPU when a
+ * given vector is triggered - by a CPU-external, CPU-internal or
+ * software-triggered event.
+ *
+ * Linux sets the kernel code address each entry jumps to early during
+ * bootup, and never changes them. This is the general layout of the
+ * IDT entries:
+ *
+ *  Vectors   0 ...  31 : system traps and exceptions - hardcoded events
+ *  Vectors  32 ... 127 : device interrupts
+ *  Vector  128         : legacy int80 syscall interface
+ *  Vectors 129 ... 237 : device interrupts
+ *  Vectors 238 ... 255 : special interrupts
+ *
+ * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
+ *
+ * This file enumerates the exact layout of them:
+ */
 
-#define NMI_VECTOR		0x02
+#define NMI_VECTOR			0x02
 
 /*
  * IDT vectors usable for external interrupt sources start
  * at 0x20:
  */
-#define FIRST_EXTERNAL_VECTOR	0x20
+#define FIRST_EXTERNAL_VECTOR		0x20
 
 #ifdef CONFIG_X86_32
-# define SYSCALL_VECTOR		0x80
+# define SYSCALL_VECTOR			0x80
 #else
-# define IA32_SYSCALL_VECTOR	0x80
+# define IA32_SYSCALL_VECTOR		0x80
 #endif
 
 /*
  * Reserve the lowest usable priority level 0x20 - 0x2f for triggering
  * cleanup after irq migration.
  */
-#define IRQ_MOVE_CLEANUP_VECTOR	FIRST_EXTERNAL_VECTOR
+#define IRQ_MOVE_CLEANUP_VECTOR		FIRST_EXTERNAL_VECTOR
 
 /*
  * Vectors 0x30-0x3f are used for ISA interrupts.
  */
-#define IRQ0_VECTOR		(FIRST_EXTERNAL_VECTOR + 0x10)
-#define IRQ1_VECTOR		(IRQ0_VECTOR + 1)
-#define IRQ2_VECTOR		(IRQ0_VECTOR + 2)
-#define IRQ3_VECTOR		(IRQ0_VECTOR + 3)
-#define IRQ4_VECTOR		(IRQ0_VECTOR + 4)
-#define IRQ5_VECTOR		(IRQ0_VECTOR + 5)
-#define IRQ6_VECTOR		(IRQ0_VECTOR + 6)
-#define IRQ7_VECTOR		(IRQ0_VECTOR + 7)
-#define IRQ8_VECTOR		(IRQ0_VECTOR + 8)
-#define IRQ9_VECTOR		(IRQ0_VECTOR + 9)
-#define IRQ10_VECTOR		(IRQ0_VECTOR + 10)
-#define IRQ11_VECTOR		(IRQ0_VECTOR + 11)
-#define IRQ12_VECTOR		(IRQ0_VECTOR + 12)
-#define IRQ13_VECTOR		(IRQ0_VECTOR + 13)
-#define IRQ14_VECTOR		(IRQ0_VECTOR + 14)
-#define IRQ15_VECTOR		(IRQ0_VECTOR + 15)
+#define IRQ0_VECTOR			(FIRST_EXTERNAL_VECTOR + 0x10)
+
+#define IRQ1_VECTOR			(IRQ0_VECTOR +  1)
+#define IRQ2_VECTOR			(IRQ0_VECTOR +  2)
+#define IRQ3_VECTOR			(IRQ0_VECTOR +  3)
+#define IRQ4_VECTOR			(IRQ0_VECTOR +  4)
+#define IRQ5_VECTOR			(IRQ0_VECTOR +  5)
+#define IRQ6_VECTOR			(IRQ0_VECTOR +  6)
+#define IRQ7_VECTOR			(IRQ0_VECTOR +  7)
+#define IRQ8_VECTOR			(IRQ0_VECTOR +  8)
+#define IRQ9_VECTOR			(IRQ0_VECTOR +  9)
+#define IRQ10_VECTOR			(IRQ0_VECTOR + 10)
+#define IRQ11_VECTOR			(IRQ0_VECTOR + 11)
+#define IRQ12_VECTOR			(IRQ0_VECTOR + 12)
+#define IRQ13_VECTOR			(IRQ0_VECTOR + 13)
+#define IRQ14_VECTOR			(IRQ0_VECTOR + 14)
+#define IRQ15_VECTOR			(IRQ0_VECTOR + 15)
 
 /*
  * Special IRQ vectors used by the SMP architecture, 0xf0-0xff
@@ -49,119 +71,98 @@
  *  some of the following vectors are 'rare', they are merged
  *  into a single vector (CALL_FUNCTION_VECTOR) to save vector space.
  *  TLB, reschedule and local APIC vectors are performance-critical.
- *
- *  Vectors 0xf0-0xfa are free (reserved for future Linux use).
  */
-#ifdef CONFIG_X86_32
-
-# define SPURIOUS_APIC_VECTOR		0xff
-# define ERROR_APIC_VECTOR		0xfe
-# define INVALIDATE_TLB_VECTOR		0xfd
-# define RESCHEDULE_VECTOR		0xfc
-# define CALL_FUNCTION_VECTOR		0xfb
-# define CALL_FUNCTION_SINGLE_VECTOR	0xfa
-# define THERMAL_APIC_VECTOR		0xf0
-
-#else
 
 #define SPURIOUS_APIC_VECTOR		0xff
+/*
+ * Sanity check
+ */
+#if ((SPURIOUS_APIC_VECTOR & 0x0F) != 0x0F)
+# error SPURIOUS_APIC_VECTOR definition error
+#endif
+
 #define ERROR_APIC_VECTOR		0xfe
 #define RESCHEDULE_VECTOR		0xfd
 #define CALL_FUNCTION_VECTOR		0xfc
 #define CALL_FUNCTION_SINGLE_VECTOR	0xfb
 #define THERMAL_APIC_VECTOR		0xfa
-#define THRESHOLD_APIC_VECTOR		0xf9
-#define UV_BAU_MESSAGE			0xf8
-#define INVALIDATE_TLB_VECTOR_END	0xf7
-#define INVALIDATE_TLB_VECTOR_START	0xf0	/* f0-f7 used for TLB flush */
-
-#define NUM_INVALIDATE_TLB_VECTORS	8
 
+#ifdef CONFIG_X86_32
+/* 0xf8 - 0xf9 : free */
+#else
+# define THRESHOLD_APIC_VECTOR		0xf9
+# define UV_BAU_MESSAGE			0xf8
 #endif
 
+/* f0-f7 used for spreading out TLB flushes: */
+#define INVALIDATE_TLB_VECTOR_END	0xf7
+#define INVALIDATE_TLB_VECTOR_START	0xf0
+#define NUM_INVALIDATE_TLB_VECTORS	   8
+
 /*
  * Local APIC timer IRQ vector is on a different priority level,
  * to work around the 'lost local interrupt if more than 2 IRQ
  * sources per level' errata.
  */
-#define LOCAL_TIMER_VECTOR	0xef
+#define LOCAL_TIMER_VECTOR		0xef
+
+/*
+ * Performance monitoring interrupt vector:
+ */
+#define LOCAL_PERF_VECTOR		0xee
 
 /*
  * First APIC vector available to drivers: (vectors 0x30-0xee) we
  * start at 0x31(0x41) to spread out vectors evenly between priority
  * levels. (0x80 is the syscall vector)
  */
-#define FIRST_DEVICE_VECTOR	(IRQ15_VECTOR + 2)
-
-#define NR_VECTORS		256
+#define FIRST_DEVICE_VECTOR		(IRQ15_VECTOR + 2)
 
-#define FPU_IRQ			13
+#define NR_VECTORS			 256
 
-#define	FIRST_VM86_IRQ		3
-#define LAST_VM86_IRQ		15
-#define invalid_vm86_irq(irq)	((irq) < 3 || (irq) > 15)
+#define FPU_IRQ				  13
 
-#define NR_IRQS_LEGACY		16
+#define	FIRST_VM86_IRQ			   3
+#define LAST_VM86_IRQ			  15
 
-#if defined(CONFIG_X86_IO_APIC) && !defined(CONFIG_X86_VOYAGER)
-
-#ifndef CONFIG_SPARSE_IRQ
-# if NR_CPUS < MAX_IO_APICS
-#  define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
-# else
-#  define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
-# endif
-#else
-# if (8 * NR_CPUS) > (32 * MAX_IO_APICS)
-#  define NR_IRQS (NR_VECTORS + (8 * NR_CPUS))
-# else
-#  define NR_IRQS (NR_VECTORS + (32 * MAX_IO_APICS))
-# endif
+#ifndef __ASSEMBLY__
+static inline int invalid_vm86_irq(int irq)
+{
+	return irq < FIRST_VM86_IRQ || irq > LAST_VM86_IRQ;
+}
 #endif
 
-#elif defined(CONFIG_X86_VOYAGER)
-
-# define NR_IRQS		224
+/*
+ * Size the maximum number of interrupts.
+ *
+ * If the irq_desc[] array has a sparse layout, we can size things
+ * generously - it scales up linearly with the maximum number of CPUs,
+ * and the maximum number of IO-APICs, whichever is higher.
+ *
+ * In other cases we size more conservatively, to not create too large
+ * static arrays.
+ */
 
-#else /* IO_APIC || VOYAGER */
+#define NR_IRQS_LEGACY			  16
 
-# define NR_IRQS		16
+#define CPU_VECTOR_LIMIT		(  8 * NR_CPUS      )
+#define IO_APIC_VECTOR_LIMIT		( 32 * MAX_IO_APICS )
 
+#ifdef CONFIG_X86_IO_APIC
+# ifdef CONFIG_SPARSE_IRQ
+#  define NR_IRQS					\
+	(CPU_VECTOR_LIMIT > IO_APIC_VECTOR_LIMIT ?	\
+		(NR_VECTORS + CPU_VECTOR_LIMIT)  :	\
+		(NR_VECTORS + IO_APIC_VECTOR_LIMIT))
+# else
+#  if NR_CPUS < MAX_IO_APICS
+#   define NR_IRQS 			(NR_VECTORS + 4*CPU_VECTOR_LIMIT)
+#  else
+#   define NR_IRQS			(NR_VECTORS + IO_APIC_VECTOR_LIMIT)
+#  endif
+# endif
+#else /* !CONFIG_X86_IO_APIC: */
+# define NR_IRQS			NR_IRQS_LEGACY
 #endif
 
-/* Voyager specific defines */
-/* These define the CPIs we use in linux */
-#define VIC_CPI_LEVEL0			0
-#define VIC_CPI_LEVEL1			1
-/* now the fake CPIs */
-#define VIC_TIMER_CPI			2
-#define VIC_INVALIDATE_CPI		3
-#define VIC_RESCHEDULE_CPI		4
-#define VIC_ENABLE_IRQ_CPI		5
-#define VIC_CALL_FUNCTION_CPI		6
-#define VIC_CALL_FUNCTION_SINGLE_CPI	7
-
-/* Now the QIC CPIs:  Since we don't need the two initial levels,
- * these are 2 less than the VIC CPIs */
-#define QIC_CPI_OFFSET			1
-#define QIC_TIMER_CPI			(VIC_TIMER_CPI - QIC_CPI_OFFSET)
-#define QIC_INVALIDATE_CPI		(VIC_INVALIDATE_CPI - QIC_CPI_OFFSET)
-#define QIC_RESCHEDULE_CPI		(VIC_RESCHEDULE_CPI - QIC_CPI_OFFSET)
-#define QIC_ENABLE_IRQ_CPI		(VIC_ENABLE_IRQ_CPI - QIC_CPI_OFFSET)
-#define QIC_CALL_FUNCTION_CPI		(VIC_CALL_FUNCTION_CPI - QIC_CPI_OFFSET)
-#define QIC_CALL_FUNCTION_SINGLE_CPI	(VIC_CALL_FUNCTION_SINGLE_CPI - QIC_CPI_OFFSET)
-
-#define VIC_START_FAKE_CPI		VIC_TIMER_CPI
-#define VIC_END_FAKE_CPI		VIC_CALL_FUNCTION_SINGLE_CPI
-
-/* this is the SYS_INT CPI. */
-#define VIC_SYS_INT			8
-#define VIC_CMN_INT			15
-
-/* This is the boot CPI for alternate processors.  It gets overwritten
- * by the above once the system has activated all available processors */
-#define VIC_CPU_BOOT_CPI		VIC_CPI_LEVEL0
-#define VIC_CPU_BOOT_ERRATA_CPI		(VIC_CPI_LEVEL0 + 8)
-
-
 #endif /* _ASM_X86_IRQ_VECTORS_H */
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index c61d8b2ab8b9..0ceb6d19ed30 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -9,23 +9,8 @@
 # define PAGES_NR		4
 #else
 # define PA_CONTROL_PAGE	0
-# define VA_CONTROL_PAGE	1
-# define PA_PGD			2
-# define VA_PGD			3
-# define PA_PUD_0		4
-# define VA_PUD_0		5
-# define PA_PMD_0		6
-# define VA_PMD_0		7
-# define PA_PTE_0		8
-# define VA_PTE_0		9
-# define PA_PUD_1		10
-# define VA_PUD_1		11
-# define PA_PMD_1		12
-# define VA_PMD_1		13
-# define PA_PTE_1		14
-# define VA_PTE_1		15
-# define PA_TABLE_PAGE		16
-# define PAGES_NR		17
+# define PA_TABLE_PAGE		1
+# define PAGES_NR		2
 #endif
 
 #ifdef CONFIG_X86_32
@@ -157,9 +142,9 @@ relocate_kernel(unsigned long indirection_page,
 		unsigned long start_address) ATTRIB_NORET;
 #endif
 
-#ifdef CONFIG_X86_32
 #define ARCH_HAS_KIMAGE_ARCH
 
+#ifdef CONFIG_X86_32
 struct kimage_arch {
 	pgd_t *pgd;
 #ifdef CONFIG_X86_PAE
@@ -169,6 +154,12 @@ struct kimage_arch {
 	pte_t *pte0;
 	pte_t *pte1;
 };
+#else
+struct kimage_arch {
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+};
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h
index 5d98d0b68ffc..9320e2a8a26a 100644
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -52,70 +52,14 @@
 
 #endif
 
+#define GLOBAL(name)	\
+	.globl name;	\
+	name:
+
 #ifdef CONFIG_X86_ALIGNMENT_16
 #define __ALIGN .align 16,0x90
 #define __ALIGN_STR ".align 16,0x90"
 #endif
 
-/*
- * to check ENTRY_X86/END_X86 and
- * KPROBE_ENTRY_X86/KPROBE_END_X86
- * unbalanced-missed-mixed appearance
- */
-#define __set_entry_x86		.set ENTRY_X86_IN, 0
-#define __unset_entry_x86	.set ENTRY_X86_IN, 1
-#define __set_kprobe_x86	.set KPROBE_X86_IN, 0
-#define __unset_kprobe_x86	.set KPROBE_X86_IN, 1
-
-#define __macro_err_x86 .error "ENTRY_X86/KPROBE_X86 unbalanced,missed,mixed"
-
-#define __check_entry_x86	\
-	.ifdef ENTRY_X86_IN;	\
-	.ifeq ENTRY_X86_IN;	\
-	__macro_err_x86;	\
-	.abort;			\
-	.endif;			\
-	.endif
-
-#define __check_kprobe_x86	\
-	.ifdef KPROBE_X86_IN;	\
-	.ifeq KPROBE_X86_IN;	\
-	__macro_err_x86;	\
-	.abort;			\
-	.endif;			\
-	.endif
-
-#define __check_entry_kprobe_x86	\
-	__check_entry_x86;		\
-	__check_kprobe_x86
-
-#define ENTRY_KPROBE_FINAL_X86 __check_entry_kprobe_x86
-
-#define ENTRY_X86(name)			\
-	__check_entry_kprobe_x86;	\
-	__set_entry_x86;		\
-	.globl name;			\
-	__ALIGN;			\
-	name:
-
-#define END_X86(name)			\
-	__unset_entry_x86;		\
-	__check_entry_kprobe_x86;	\
-	.size name, .-name
-
-#define KPROBE_ENTRY_X86(name)		\
-	__check_entry_kprobe_x86;	\
-	__set_kprobe_x86;		\
-	.pushsection .kprobes.text, "ax"; \
-	.globl name;			\
-	__ALIGN;			\
-	name:
-
-#define KPROBE_END_X86(name)		\
-	__unset_kprobe_x86;		\
-	__check_entry_kprobe_x86;	\
-	.size name, .-name;		\
-	.popsection
-
 #endif /* _ASM_X86_LINKAGE_H */
 
diff --git a/arch/x86/include/asm/mach-default/mach_apic.h b/arch/x86/include/asm/mach-default/mach_apic.h
deleted file mode 100644
index cc09cbbee27e..000000000000
--- a/arch/x86/include/asm/mach-default/mach_apic.h
+++ /dev/null
@@ -1,168 +0,0 @@
-#ifndef _ASM_X86_MACH_DEFAULT_MACH_APIC_H
-#define _ASM_X86_MACH_DEFAULT_MACH_APIC_H
-
-#ifdef CONFIG_X86_LOCAL_APIC
-
-#include <mach_apicdef.h>
-#include <asm/smp.h>
-
-#define APIC_DFR_VALUE	(APIC_DFR_FLAT)
-
-static inline const struct cpumask *target_cpus(void)
-{ 
-#ifdef CONFIG_SMP
-	return cpu_online_mask;
-#else
-	return cpumask_of(0);
-#endif
-} 
-
-#define NO_BALANCE_IRQ (0)
-#define esr_disable (0)
-
-#ifdef CONFIG_X86_64
-#include <asm/genapic.h>
-#define INT_DELIVERY_MODE (genapic->int_delivery_mode)
-#define INT_DEST_MODE (genapic->int_dest_mode)
-#define TARGET_CPUS	  (genapic->target_cpus())
-#define apic_id_registered (genapic->apic_id_registered)
-#define init_apic_ldr (genapic->init_apic_ldr)
-#define cpu_mask_to_apicid (genapic->cpu_mask_to_apicid)
-#define cpu_mask_to_apicid_and (genapic->cpu_mask_to_apicid_and)
-#define phys_pkg_id	(genapic->phys_pkg_id)
-#define vector_allocation_domain    (genapic->vector_allocation_domain)
-#define read_apic_id()  (GET_APIC_ID(apic_read(APIC_ID)))
-#define send_IPI_self (genapic->send_IPI_self)
-#define wakeup_secondary_cpu (genapic->wakeup_cpu)
-extern void setup_apic_routing(void);
-#else
-#define INT_DELIVERY_MODE dest_LowestPrio
-#define INT_DEST_MODE 1     /* logical delivery broadcast to all procs */
-#define TARGET_CPUS (target_cpus())
-#define wakeup_secondary_cpu wakeup_secondary_cpu_via_init
-/*
- * Set up the logical destination ID.
- *
- * Intel recommends to set DFR, LDR and TPR before enabling
- * an APIC.  See e.g. "AP-388 82489DX User's Manual" (Intel
- * document number 292116).  So here it goes...
- */
-static inline void init_apic_ldr(void)
-{
-	unsigned long val;
-
-	apic_write(APIC_DFR, APIC_DFR_VALUE);
-	val = apic_read(APIC_LDR) & ~APIC_LDR_MASK;
-	val |= SET_APIC_LOGICAL_ID(1UL << smp_processor_id());
-	apic_write(APIC_LDR, val);
-}
-
-static inline int apic_id_registered(void)
-{
-	return physid_isset(read_apic_id(), phys_cpu_present_map);
-}
-
-static inline unsigned int cpu_mask_to_apicid(const struct cpumask *cpumask)
-{
-	return cpumask_bits(cpumask)[0];
-}
-
-static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *cpumask,
-						  const struct cpumask *andmask)
-{
-	unsigned long mask1 = cpumask_bits(cpumask)[0];
-	unsigned long mask2 = cpumask_bits(andmask)[0];
-	unsigned long mask3 = cpumask_bits(cpu_online_mask)[0];
-
-	return (unsigned int)(mask1 & mask2 & mask3);
-}
-
-static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
-{
-	return cpuid_apic >> index_msb;
-}
-
-static inline void setup_apic_routing(void)
-{
-#ifdef CONFIG_X86_IO_APIC
-	printk("Enabling APIC mode:  %s.  Using %d I/O APICs\n",
-					"Flat", nr_ioapics);
-#endif
-}
-
-static inline int apicid_to_node(int logical_apicid)
-{
-#ifdef CONFIG_SMP
-	return apicid_2_node[hard_smp_processor_id()];
-#else
-	return 0;
-#endif
-}
-
-static inline void vector_allocation_domain(int cpu, struct cpumask *retmask)
-{
-        /* Careful. Some cpus do not strictly honor the set of cpus
-         * specified in the interrupt destination when using lowest
-         * priority interrupt delivery mode.
-         *
-         * In particular there was a hyperthreading cpu observed to
-         * deliver interrupts to the wrong hyperthread when only one
-         * hyperthread was specified in the interrupt desitination.
-         */
-	*retmask = (cpumask_t) { { [0] = APIC_ALL_CPUS } };
-}
-#endif
-
-static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid)
-{
-	return physid_isset(apicid, bitmap);
-}
-
-static inline unsigned long check_apicid_present(int bit)
-{
-	return physid_isset(bit, phys_cpu_present_map);
-}
-
-static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_map)
-{
-	return phys_map;
-}
-
-static inline int multi_timer_check(int apic, int irq)
-{
-	return 0;
-}
-
-/* Mapping from cpu number to logical apicid */
-static inline int cpu_to_logical_apicid(int cpu)
-{
-	return 1 << cpu;
-}
-
-static inline int cpu_present_to_apicid(int mps_cpu)
-{
-	if (mps_cpu < nr_cpu_ids && cpu_present(mps_cpu))
-		return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu);
-	else
-		return BAD_APICID;
-}
-
-static inline physid_mask_t apicid_to_cpu_present(int phys_apicid)
-{
-	return physid_mask_of_physid(phys_apicid);
-}
-
-static inline void setup_portio_remap(void)
-{
-}
-
-static inline int check_phys_apicid_present(int boot_cpu_physical_apicid)
-{
-	return physid_isset(boot_cpu_physical_apicid, phys_cpu_present_map);
-}
-
-static inline void enable_apic_mode(void)
-{
-}
-#endif /* CONFIG_X86_LOCAL_APIC */
-#endif /* _ASM_X86_MACH_DEFAULT_MACH_APIC_H */
diff --git a/arch/x86/include/asm/mach-default/mach_apicdef.h b/arch/x86/include/asm/mach-default/mach_apicdef.h
deleted file mode 100644
index 53179936d6c6..000000000000
--- a/arch/x86/include/asm/mach-default/mach_apicdef.h
+++ /dev/null
@@ -1,24 +0,0 @@
-#ifndef _ASM_X86_MACH_DEFAULT_MACH_APICDEF_H
-#define _ASM_X86_MACH_DEFAULT_MACH_APICDEF_H
-
-#include <asm/apic.h>
-
-#ifdef CONFIG_X86_64
-#define	APIC_ID_MASK		(genapic->apic_id_mask)
-#define GET_APIC_ID(x)		(genapic->get_apic_id(x))
-#define	SET_APIC_ID(x)		(genapic->set_apic_id(x))
-#else
-#define		APIC_ID_MASK		(0xF<<24)
-static inline unsigned get_apic_id(unsigned long x) 
-{
-	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
-	if (APIC_XAPIC(ver))
-		return (((x)>>24)&0xFF);
-	else
-		return (((x)>>24)&0xF);
-} 
-
-#define		GET_APIC_ID(x)	get_apic_id(x)
-#endif
-
-#endif /* _ASM_X86_MACH_DEFAULT_MACH_APICDEF_H */
diff --git a/arch/x86/include/asm/mach-default/mach_ipi.h b/arch/x86/include/asm/mach-default/mach_ipi.h
deleted file mode 100644
index 191312d155da..000000000000
--- a/arch/x86/include/asm/mach-default/mach_ipi.h
+++ /dev/null
@@ -1,64 +0,0 @@
-#ifndef _ASM_X86_MACH_DEFAULT_MACH_IPI_H
-#define _ASM_X86_MACH_DEFAULT_MACH_IPI_H
-
-/* Avoid include hell */
-#define NMI_VECTOR 0x02
-
-void send_IPI_mask_bitmask(const struct cpumask *mask, int vector);
-void send_IPI_mask_allbutself(const struct cpumask *mask, int vector);
-void __send_IPI_shortcut(unsigned int shortcut, int vector);
-
-extern int no_broadcast;
-
-#ifdef CONFIG_X86_64
-#include <asm/genapic.h>
-#define send_IPI_mask (genapic->send_IPI_mask)
-#define send_IPI_mask_allbutself (genapic->send_IPI_mask_allbutself)
-#else
-static inline void send_IPI_mask(const struct cpumask *mask, int vector)
-{
-	send_IPI_mask_bitmask(mask, vector);
-}
-void send_IPI_mask_allbutself(const struct cpumask *mask, int vector);
-#endif
-
-static inline void __local_send_IPI_allbutself(int vector)
-{
-	if (no_broadcast || vector == NMI_VECTOR)
-		send_IPI_mask_allbutself(cpu_online_mask, vector);
-	else
-		__send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
-}
-
-static inline void __local_send_IPI_all(int vector)
-{
-	if (no_broadcast || vector == NMI_VECTOR)
-		send_IPI_mask(cpu_online_mask, vector);
-	else
-		__send_IPI_shortcut(APIC_DEST_ALLINC, vector);
-}
-
-#ifdef CONFIG_X86_64
-#define send_IPI_allbutself (genapic->send_IPI_allbutself)
-#define send_IPI_all (genapic->send_IPI_all)
-#else
-static inline void send_IPI_allbutself(int vector)
-{
-	/*
-	 * if there are no other CPUs in the system then we get an APIC send 
-	 * error if we try to broadcast, thus avoid sending IPIs in this case.
-	 */
-	if (!(num_online_cpus() > 1))
-		return;
-
-	__local_send_IPI_allbutself(vector);
-	return;
-}
-
-static inline void send_IPI_all(int vector)
-{
-	__local_send_IPI_all(vector);
-}
-#endif
-
-#endif /* _ASM_X86_MACH_DEFAULT_MACH_IPI_H */
diff --git a/arch/x86/include/asm/mach-default/mach_mpparse.h b/arch/x86/include/asm/mach-default/mach_mpparse.h
deleted file mode 100644
index c70a263d68cd..000000000000
--- a/arch/x86/include/asm/mach-default/mach_mpparse.h
+++ /dev/null
@@ -1,17 +0,0 @@
-#ifndef _ASM_X86_MACH_DEFAULT_MACH_MPPARSE_H
-#define _ASM_X86_MACH_DEFAULT_MACH_MPPARSE_H
-
-static inline int
-mps_oem_check(struct mpc_table *mpc, char *oem, char *productid)
-{
-	return 0;
-}
-
-/* Hook from generic ACPI tables.c */
-static inline int acpi_madt_oem_check(char *oem_id, char *oem_table_id)
-{
-	return 0;
-}
-
-
-#endif /* _ASM_X86_MACH_DEFAULT_MACH_MPPARSE_H */
diff --git a/arch/x86/include/asm/mach-default/mach_mpspec.h b/arch/x86/include/asm/mach-default/mach_mpspec.h
deleted file mode 100644
index e85ede686be8..000000000000
--- a/arch/x86/include/asm/mach-default/mach_mpspec.h
+++ /dev/null
@@ -1,12 +0,0 @@
-#ifndef _ASM_X86_MACH_DEFAULT_MACH_MPSPEC_H
-#define _ASM_X86_MACH_DEFAULT_MACH_MPSPEC_H
-
-#define MAX_IRQ_SOURCES 256
-
-#if CONFIG_BASE_SMALL == 0
-#define MAX_MP_BUSSES 256
-#else
-#define MAX_MP_BUSSES 32
-#endif
-
-#endif /* _ASM_X86_MACH_DEFAULT_MACH_MPSPEC_H */
diff --git a/arch/x86/include/asm/mach-default/mach_wakecpu.h b/arch/x86/include/asm/mach-default/mach_wakecpu.h
deleted file mode 100644
index 89897a6a65b9..000000000000
--- a/arch/x86/include/asm/mach-default/mach_wakecpu.h
+++ /dev/null
@@ -1,41 +0,0 @@
-#ifndef _ASM_X86_MACH_DEFAULT_MACH_WAKECPU_H
-#define _ASM_X86_MACH_DEFAULT_MACH_WAKECPU_H
-
-#define TRAMPOLINE_PHYS_LOW (0x467)
-#define TRAMPOLINE_PHYS_HIGH (0x469)
-
-static inline void wait_for_init_deassert(atomic_t *deassert)
-{
-	while (!atomic_read(deassert))
-		cpu_relax();
-	return;
-}
-
-/* Nothing to do for most platforms, since cleared by the INIT cycle */
-static inline void smp_callin_clear_local_apic(void)
-{
-}
-
-static inline void store_NMI_vector(unsigned short *high, unsigned short *low)
-{
-}
-
-static inline void restore_NMI_vector(unsigned short *high, unsigned short *low)
-{
-}
-
-#ifdef CONFIG_SMP
-extern void __inquire_remote_apic(int apicid);
-#else /* CONFIG_SMP */
-static inline void __inquire_remote_apic(int apicid)
-{
-}
-#endif /* CONFIG_SMP */
-
-static inline void inquire_remote_apic(int apicid)
-{
-	if (apic_verbosity >= APIC_DEBUG)
-		__inquire_remote_apic(apicid);
-}
-
-#endif /* _ASM_X86_MACH_DEFAULT_MACH_WAKECPU_H */
diff --git a/arch/x86/include/asm/mach-generic/gpio.h b/arch/x86/include/asm/mach-generic/gpio.h
deleted file mode 100644
index 995c45efdb33..000000000000
--- a/arch/x86/include/asm/mach-generic/gpio.h
+++ /dev/null
@@ -1,15 +0,0 @@
-#ifndef _ASM_X86_MACH_GENERIC_GPIO_H
-#define _ASM_X86_MACH_GENERIC_GPIO_H
-
-int gpio_request(unsigned gpio, const char *label);
-void gpio_free(unsigned gpio);
-int gpio_direction_input(unsigned gpio);
-int gpio_direction_output(unsigned gpio, int value);
-int gpio_get_value(unsigned gpio);
-void gpio_set_value(unsigned gpio, int value);
-int gpio_to_irq(unsigned gpio);
-int irq_to_gpio(unsigned irq);
-
-#include <asm-generic/gpio.h>           /* cansleep wrappers */
-
-#endif /* _ASM_X86_MACH_GENERIC_GPIO_H */
diff --git a/arch/x86/include/asm/mach-generic/mach_apic.h b/arch/x86/include/asm/mach-generic/mach_apic.h
deleted file mode 100644
index 48553e958ad5..000000000000
--- a/arch/x86/include/asm/mach-generic/mach_apic.h
+++ /dev/null
@@ -1,35 +0,0 @@
-#ifndef _ASM_X86_MACH_GENERIC_MACH_APIC_H
-#define _ASM_X86_MACH_GENERIC_MACH_APIC_H
-
-#include <asm/genapic.h>
-
-#define esr_disable (genapic->ESR_DISABLE)
-#define NO_BALANCE_IRQ (genapic->no_balance_irq)
-#define INT_DELIVERY_MODE (genapic->int_delivery_mode)
-#define INT_DEST_MODE (genapic->int_dest_mode)
-#undef APIC_DEST_LOGICAL
-#define APIC_DEST_LOGICAL (genapic->apic_destination_logical)
-#define TARGET_CPUS	  (genapic->target_cpus())
-#define apic_id_registered (genapic->apic_id_registered)
-#define init_apic_ldr (genapic->init_apic_ldr)
-#define ioapic_phys_id_map (genapic->ioapic_phys_id_map)
-#define setup_apic_routing (genapic->setup_apic_routing)
-#define multi_timer_check (genapic->multi_timer_check)
-#define apicid_to_node (genapic->apicid_to_node)
-#define cpu_to_logical_apicid (genapic->cpu_to_logical_apicid) 
-#define cpu_present_to_apicid (genapic->cpu_present_to_apicid)
-#define apicid_to_cpu_present (genapic->apicid_to_cpu_present)
-#define setup_portio_remap (genapic->setup_portio_remap)
-#define check_apicid_present (genapic->check_apicid_present)
-#define check_phys_apicid_present (genapic->check_phys_apicid_present)
-#define check_apicid_used (genapic->check_apicid_used)
-#define cpu_mask_to_apicid (genapic->cpu_mask_to_apicid)
-#define cpu_mask_to_apicid_and (genapic->cpu_mask_to_apicid_and)
-#define vector_allocation_domain (genapic->vector_allocation_domain)
-#define enable_apic_mode (genapic->enable_apic_mode)
-#define phys_pkg_id (genapic->phys_pkg_id)
-#define wakeup_secondary_cpu (genapic->wakeup_cpu)
-
-extern void generic_bigsmp_probe(void);
-
-#endif /* _ASM_X86_MACH_GENERIC_MACH_APIC_H */
diff --git a/arch/x86/include/asm/mach-generic/mach_apicdef.h b/arch/x86/include/asm/mach-generic/mach_apicdef.h
deleted file mode 100644
index 68041f3802f4..000000000000
--- a/arch/x86/include/asm/mach-generic/mach_apicdef.h
+++ /dev/null
@@ -1,11 +0,0 @@
-#ifndef _ASM_X86_MACH_GENERIC_MACH_APICDEF_H
-#define _ASM_X86_MACH_GENERIC_MACH_APICDEF_H
-
-#ifndef APIC_DEFINITION
-#include <asm/genapic.h>
-
-#define GET_APIC_ID (genapic->get_apic_id)
-#define APIC_ID_MASK (genapic->apic_id_mask)
-#endif
-
-#endif /* _ASM_X86_MACH_GENERIC_MACH_APICDEF_H */
diff --git a/arch/x86/include/asm/mach-generic/mach_ipi.h b/arch/x86/include/asm/mach-generic/mach_ipi.h
deleted file mode 100644
index ffd637e3c3d9..000000000000
--- a/arch/x86/include/asm/mach-generic/mach_ipi.h
+++ /dev/null
@@ -1,10 +0,0 @@
-#ifndef _ASM_X86_MACH_GENERIC_MACH_IPI_H
-#define _ASM_X86_MACH_GENERIC_MACH_IPI_H
-
-#include <asm/genapic.h>
-
-#define send_IPI_mask (genapic->send_IPI_mask)
-#define send_IPI_allbutself (genapic->send_IPI_allbutself)
-#define send_IPI_all (genapic->send_IPI_all)
-
-#endif /* _ASM_X86_MACH_GENERIC_MACH_IPI_H */
diff --git a/arch/x86/include/asm/mach-generic/mach_mpparse.h b/arch/x86/include/asm/mach-generic/mach_mpparse.h
deleted file mode 100644
index 9444ab8dca94..000000000000
--- a/arch/x86/include/asm/mach-generic/mach_mpparse.h
+++ /dev/null
@@ -1,9 +0,0 @@
-#ifndef _ASM_X86_MACH_GENERIC_MACH_MPPARSE_H
-#define _ASM_X86_MACH_GENERIC_MACH_MPPARSE_H
-
-
-extern int mps_oem_check(struct mpc_table *, char *, char *);
-
-extern int acpi_madt_oem_check(char *, char *);
-
-#endif /* _ASM_X86_MACH_GENERIC_MACH_MPPARSE_H */
diff --git a/arch/x86/include/asm/mach-generic/mach_mpspec.h b/arch/x86/include/asm/mach-generic/mach_mpspec.h
deleted file mode 100644
index 3bc407226578..000000000000
--- a/arch/x86/include/asm/mach-generic/mach_mpspec.h
+++ /dev/null
@@ -1,12 +0,0 @@
-#ifndef _ASM_X86_MACH_GENERIC_MACH_MPSPEC_H
-#define _ASM_X86_MACH_GENERIC_MACH_MPSPEC_H
-
-#define MAX_IRQ_SOURCES 256
-
-/* Summit or generic (i.e. installer) kernels need lots of bus entries. */
-/* Maximum 256 PCI busses, plus 1 ISA bus in each of 4 cabinets. */
-#define MAX_MP_BUSSES 260
-
-extern void numaq_mps_oem_check(struct mpc_table *, char *, char *);
-
-#endif /* _ASM_X86_MACH_GENERIC_MACH_MPSPEC_H */
diff --git a/arch/x86/include/asm/mach-generic/mach_wakecpu.h b/arch/x86/include/asm/mach-generic/mach_wakecpu.h
deleted file mode 100644
index 1ab16b168c8a..000000000000
--- a/arch/x86/include/asm/mach-generic/mach_wakecpu.h
+++ /dev/null
@@ -1,12 +0,0 @@
-#ifndef _ASM_X86_MACH_GENERIC_MACH_WAKECPU_H
-#define _ASM_X86_MACH_GENERIC_MACH_WAKECPU_H
-
-#define TRAMPOLINE_PHYS_LOW (genapic->trampoline_phys_low)
-#define TRAMPOLINE_PHYS_HIGH (genapic->trampoline_phys_high)
-#define wait_for_init_deassert (genapic->wait_for_init_deassert)
-#define smp_callin_clear_local_apic (genapic->smp_callin_clear_local_apic)
-#define store_NMI_vector (genapic->store_NMI_vector)
-#define restore_NMI_vector (genapic->restore_NMI_vector)
-#define inquire_remote_apic (genapic->inquire_remote_apic)
-
-#endif /* _ASM_X86_MACH_GENERIC_MACH_APIC_H */
diff --git a/arch/x86/include/asm/mach-rdc321x/gpio.h b/arch/x86/include/asm/mach-rdc321x/gpio.h
deleted file mode 100644
index c210ab5788b0..000000000000
--- a/arch/x86/include/asm/mach-rdc321x/gpio.h
+++ /dev/null
@@ -1,60 +0,0 @@
-#ifndef _ASM_X86_MACH_RDC321X_GPIO_H
-#define _ASM_X86_MACH_RDC321X_GPIO_H
-
-#include <linux/kernel.h>
-
-extern int rdc_gpio_get_value(unsigned gpio);
-extern void rdc_gpio_set_value(unsigned gpio, int value);
-extern int rdc_gpio_direction_input(unsigned gpio);
-extern int rdc_gpio_direction_output(unsigned gpio, int value);
-extern int rdc_gpio_request(unsigned gpio, const char *label);
-extern void rdc_gpio_free(unsigned gpio);
-extern void __init rdc321x_gpio_setup(void);
-
-/* Wrappers for the arch-neutral GPIO API */
-
-static inline int gpio_request(unsigned gpio, const char *label)
-{
-	return rdc_gpio_request(gpio, label);
-}
-
-static inline void gpio_free(unsigned gpio)
-{
-	might_sleep();
-	rdc_gpio_free(gpio);
-}
-
-static inline int gpio_direction_input(unsigned gpio)
-{
-	return rdc_gpio_direction_input(gpio);
-}
-
-static inline int gpio_direction_output(unsigned gpio, int value)
-{
-	return rdc_gpio_direction_output(gpio, value);
-}
-
-static inline int gpio_get_value(unsigned gpio)
-{
-	return rdc_gpio_get_value(gpio);
-}
-
-static inline void gpio_set_value(unsigned gpio, int value)
-{
-	rdc_gpio_set_value(gpio, value);
-}
-
-static inline int gpio_to_irq(unsigned gpio)
-{
-	return gpio;
-}
-
-static inline int irq_to_gpio(unsigned irq)
-{
-	return irq;
-}
-
-/* For cansleep */
-#include <asm-generic/gpio.h>
-
-#endif /* _ASM_X86_MACH_RDC321X_GPIO_H */
diff --git a/arch/x86/include/asm/mach-voyager/do_timer.h b/arch/x86/include/asm/mach-voyager/do_timer.h
deleted file mode 100644
index 9e5a459fd15b..000000000000
--- a/arch/x86/include/asm/mach-voyager/do_timer.h
+++ /dev/null
@@ -1,17 +0,0 @@
-/* defines for inline arch setup functions */
-#include <linux/clockchips.h>
-
-#include <asm/voyager.h>
-#include <asm/i8253.h>
-
-/**
- * do_timer_interrupt_hook - hook into timer tick
- *
- * Call the pit clock event handler. see asm/i8253.h
- **/
-static inline void do_timer_interrupt_hook(void)
-{
-	global_clock_event->event_handler(global_clock_event);
-	voyager_timer_interrupt();
-}
-
diff --git a/arch/x86/include/asm/mach-voyager/entry_arch.h b/arch/x86/include/asm/mach-voyager/entry_arch.h
deleted file mode 100644
index ae52624b5937..000000000000
--- a/arch/x86/include/asm/mach-voyager/entry_arch.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/* -*- mode: c; c-basic-offset: 8 -*- */
-
-/* Copyright (C) 2002
- *
- * Author: James.Bottomley@HansenPartnership.com
- *
- * linux/arch/i386/voyager/entry_arch.h
- *
- * This file builds the VIC and QIC CPI gates
- */
-
-/* initialise the voyager interrupt gates 
- *
- * This uses the macros in irq.h to set up assembly jump gates.  The
- * calls are then redirected to the same routine with smp_ prefixed */
-BUILD_INTERRUPT(vic_sys_interrupt, VIC_SYS_INT)
-BUILD_INTERRUPT(vic_cmn_interrupt, VIC_CMN_INT)
-BUILD_INTERRUPT(vic_cpi_interrupt, VIC_CPI_LEVEL0);
-
-/* do all the QIC interrupts */
-BUILD_INTERRUPT(qic_timer_interrupt, QIC_TIMER_CPI);
-BUILD_INTERRUPT(qic_invalidate_interrupt, QIC_INVALIDATE_CPI);
-BUILD_INTERRUPT(qic_reschedule_interrupt, QIC_RESCHEDULE_CPI);
-BUILD_INTERRUPT(qic_enable_irq_interrupt, QIC_ENABLE_IRQ_CPI);
-BUILD_INTERRUPT(qic_call_function_interrupt, QIC_CALL_FUNCTION_CPI);
-BUILD_INTERRUPT(qic_call_function_single_interrupt, QIC_CALL_FUNCTION_SINGLE_CPI);
diff --git a/arch/x86/include/asm/mach-voyager/setup_arch.h b/arch/x86/include/asm/mach-voyager/setup_arch.h
deleted file mode 100644
index 71729ca05cd7..000000000000
--- a/arch/x86/include/asm/mach-voyager/setup_arch.h
+++ /dev/null
@@ -1,12 +0,0 @@
-#include <asm/voyager.h>
-#include <asm/setup.h>
-#define VOYAGER_BIOS_INFO ((struct voyager_bios_info *) \
-			(&boot_params.apm_bios_info))
-
-/* Hook to call BIOS initialisation function */
-
-/* for voyager, pass the voyager BIOS/SUS info area to the detection
- * routines */
-
-#define ARCH_SETUP	voyager_detect(VOYAGER_BIOS_INFO);
-
diff --git a/arch/x86/include/asm/mach-default/mach_timer.h b/arch/x86/include/asm/mach_timer.h
index 853728519ae9..853728519ae9 100644
--- a/arch/x86/include/asm/mach-default/mach_timer.h
+++ b/arch/x86/include/asm/mach_timer.h
diff --git a/arch/x86/include/asm/mach-default/mach_traps.h b/arch/x86/include/asm/mach_traps.h
index f7920601e472..f7920601e472 100644
--- a/arch/x86/include/asm/mach-default/mach_traps.h
+++ b/arch/x86/include/asm/mach_traps.h
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 8aeeb3fd73db..f923203dc39a 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -21,11 +21,54 @@ static inline void paravirt_activate_mm(struct mm_struct *prev,
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
 void destroy_context(struct mm_struct *mm);
 
-#ifdef CONFIG_X86_32
-# include "mmu_context_32.h"
-#else
-# include "mmu_context_64.h"
+
+static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
+{
+#ifdef CONFIG_SMP
+	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
+		percpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
+#endif
+}
+
+static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
+			     struct task_struct *tsk)
+{
+	unsigned cpu = smp_processor_id();
+
+	if (likely(prev != next)) {
+		/* stop flush ipis for the previous mm */
+		cpu_clear(cpu, prev->cpu_vm_mask);
+#ifdef CONFIG_SMP
+		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		percpu_write(cpu_tlbstate.active_mm, next);
 #endif
+		cpu_set(cpu, next->cpu_vm_mask);
+
+		/* Re-load page tables */
+		load_cr3(next->pgd);
+
+		/*
+		 * load the LDT, if the LDT is different:
+		 */
+		if (unlikely(prev->context.ldt != next->context.ldt))
+			load_LDT_nolock(&next->context);
+	}
+#ifdef CONFIG_SMP
+	else {
+		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
+		BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);
+
+		if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
+			/* We were in lazy tlb mode and leave_mm disabled
+			 * tlb flush IPI delivery. We must reload CR3
+			 * to make sure to use no freed page tables.
+			 */
+			load_cr3(next->pgd);
+			load_LDT_nolock(&next->context);
+		}
+	}
+#endif
+}
 
 #define activate_mm(prev, next)			\
 do {						\
@@ -33,5 +76,17 @@ do {						\
 	switch_mm((prev), (next), NULL);	\
 } while (0);
 
+#ifdef CONFIG_X86_32
+#define deactivate_mm(tsk, mm)			\
+do {						\
+	lazy_load_gs(0);			\
+} while (0)
+#else
+#define deactivate_mm(tsk, mm)			\
+do {						\
+	load_gs_index(0);			\
+	loadsegment(fs, 0);			\
+} while (0)
+#endif
 
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mmu_context_32.h b/arch/x86/include/asm/mmu_context_32.h
deleted file mode 100644
index 7e98ce1d2c0e..000000000000
--- a/arch/x86/include/asm/mmu_context_32.h
+++ /dev/null
@@ -1,55 +0,0 @@
-#ifndef _ASM_X86_MMU_CONTEXT_32_H
-#define _ASM_X86_MMU_CONTEXT_32_H
-
-static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
-{
-#ifdef CONFIG_SMP
-	if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK)
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_LAZY);
-#endif
-}
-
-static inline void switch_mm(struct mm_struct *prev,
-			     struct mm_struct *next,
-			     struct task_struct *tsk)
-{
-	int cpu = smp_processor_id();
-
-	if (likely(prev != next)) {
-		/* stop flush ipis for the previous mm */
-		cpu_clear(cpu, prev->cpu_vm_mask);
-#ifdef CONFIG_SMP
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK);
-		x86_write_percpu(cpu_tlbstate.active_mm, next);
-#endif
-		cpu_set(cpu, next->cpu_vm_mask);
-
-		/* Re-load page tables */
-		load_cr3(next->pgd);
-
-		/*
-		 * load the LDT, if the LDT is different:
-		 */
-		if (unlikely(prev->context.ldt != next->context.ldt))
-			load_LDT_nolock(&next->context);
-	}
-#ifdef CONFIG_SMP
-	else {
-		x86_write_percpu(cpu_tlbstate.state, TLBSTATE_OK);
-		BUG_ON(x86_read_percpu(cpu_tlbstate.active_mm) != next);
-
-		if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
-			/* We were in lazy tlb mode and leave_mm disabled
-			 * tlb flush IPI delivery. We must reload %cr3.
-			 */
-			load_cr3(next->pgd);
-			load_LDT_nolock(&next->context);
-		}
-	}
-#endif
-}
-
-#define deactivate_mm(tsk, mm)			\
-	asm("movl %0,%%gs": :"r" (0));
-
-#endif /* _ASM_X86_MMU_CONTEXT_32_H */
diff --git a/arch/x86/include/asm/mmu_context_64.h b/arch/x86/include/asm/mmu_context_64.h
deleted file mode 100644
index 677d36e9540a..000000000000
--- a/arch/x86/include/asm/mmu_context_64.h
+++ /dev/null
@@ -1,54 +0,0 @@
-#ifndef _ASM_X86_MMU_CONTEXT_64_H
-#define _ASM_X86_MMU_CONTEXT_64_H
-
-#include <asm/pda.h>
-
-static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
-{
-#ifdef CONFIG_SMP
-	if (read_pda(mmu_state) == TLBSTATE_OK)
-		write_pda(mmu_state, TLBSTATE_LAZY);
-#endif
-}
-
-static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
-			     struct task_struct *tsk)
-{
-	unsigned cpu = smp_processor_id();
-	if (likely(prev != next)) {
-		/* stop flush ipis for the previous mm */
-		cpu_clear(cpu, prev->cpu_vm_mask);
-#ifdef CONFIG_SMP
-		write_pda(mmu_state, TLBSTATE_OK);
-		write_pda(active_mm, next);
-#endif
-		cpu_set(cpu, next->cpu_vm_mask);
-		load_cr3(next->pgd);
-
-		if (unlikely(next->context.ldt != prev->context.ldt))
-			load_LDT_nolock(&next->context);
-	}
-#ifdef CONFIG_SMP
-	else {
-		write_pda(mmu_state, TLBSTATE_OK);
-		if (read_pda(active_mm) != next)
-			BUG();
-		if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
-			/* We were in lazy tlb mode and leave_mm disabled
-			 * tlb flush IPI delivery. We must reload CR3
-			 * to make sure to use no freed page tables.
-			 */
-			load_cr3(next->pgd);
-			load_LDT_nolock(&next->context);
-		}
-	}
-#endif
-}
-
-#define deactivate_mm(tsk, mm)			\
-do {						\
-	load_gs_index(0);			\
-	asm volatile("movl %0,%%fs"::"r"(0));	\
-} while (0)
-
-#endif /* _ASM_X86_MMU_CONTEXT_64_H */
diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 105fb90a0635..ede6998bd92c 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -91,46 +91,9 @@ static inline int pfn_valid(int pfn)
 #endif /* CONFIG_DISCONTIGMEM */
 
 #ifdef CONFIG_NEED_MULTIPLE_NODES
-
-/*
- * Following are macros that are specific to this numa platform.
- */
-#define reserve_bootmem(addr, size, flags) \
-	reserve_bootmem_node(NODE_DATA(0), (addr), (size), (flags))
-#define alloc_bootmem(x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_nopanic(x) \
-	__alloc_bootmem_node_nopanic(NODE_DATA(0), (x), SMP_CACHE_BYTES, \
-				__pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_low(x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES, 0)
-#define alloc_bootmem_pages(x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_pages_nopanic(x) \
-	__alloc_bootmem_node_nopanic(NODE_DATA(0), (x), PAGE_SIZE, \
-				__pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_low_pages(x) \
-	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0)
-#define alloc_bootmem_node(pgdat, x)					\
-({									\
-	struct pglist_data  __maybe_unused			\
-				*__alloc_bootmem_node__pgdat = (pgdat);	\
-	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES,	\
-						__pa(MAX_DMA_ADDRESS));	\
-})
-#define alloc_bootmem_pages_node(pgdat, x)				\
-({									\
-	struct pglist_data  __maybe_unused			\
-				*__alloc_bootmem_node__pgdat = (pgdat);	\
-	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE,		\
-						__pa(MAX_DMA_ADDRESS));	\
-})
-#define alloc_bootmem_low_pages_node(pgdat, x)				\
-({									\
-	struct pglist_data  __maybe_unused			\
-				*__alloc_bootmem_node__pgdat = (pgdat);	\
-	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0);		\
-})
+/* always use node 0 for bootmem on this numa platform */
+#define bootmem_arch_preferred_node(__bdata, size, align, goal, limit)	\
+	(NODE_DATA(0)->bdata)
 #endif /* CONFIG_NEED_MULTIPLE_NODES */
 
 #endif /* _ASM_X86_MMZONE_32_H */
diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index bd22f2a3713f..642fc7fc8cdc 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -9,7 +9,18 @@ extern int apic_version[MAX_APICS];
 extern int pic_mode;
 
 #ifdef CONFIG_X86_32
-#include <mach_mpspec.h>
+
+/*
+ * Summit or generic (i.e. installer) kernels need lots of bus entries.
+ * Maximum 256 PCI busses, plus 1 ISA bus in each of 4 cabinets.
+ */
+#if CONFIG_BASE_SMALL == 0
+# define MAX_MP_BUSSES		260
+#else
+# define MAX_MP_BUSSES		32
+#endif
+
+#define MAX_IRQ_SOURCES		256
 
 extern unsigned int def_to_bigsmp;
 extern u8 apicid_2_node[];
@@ -20,15 +31,15 @@ extern int mp_bus_id_to_local[MAX_MP_BUSSES];
 extern int quad_local_to_mp_bus_id [NR_CPUS/4][4];
 #endif
 
-#define MAX_APICID 256
+#define MAX_APICID		256
 
-#else
+#else /* CONFIG_X86_64: */
 
-#define MAX_MP_BUSSES 256
+#define MAX_MP_BUSSES		256
 /* Each PCI slot may be a combo card with its own bus.  4 IRQ pins per slot. */
-#define MAX_IRQ_SOURCES (MAX_MP_BUSSES * 4)
+#define MAX_IRQ_SOURCES		(MAX_MP_BUSSES * 4)
 
-#endif
+#endif /* CONFIG_X86_64 */
 
 extern void early_find_smp_config(void);
 extern void early_get_smp_config(void);
@@ -45,11 +56,13 @@ extern int smp_found_config;
 extern int mpc_default_type;
 extern unsigned long mp_lapic_addr;
 
-extern void find_smp_config(void);
 extern void get_smp_config(void);
+
 #ifdef CONFIG_X86_MPPARSE
+extern void find_smp_config(void);
 extern void early_reserve_e820_mpc_new(void);
 #else
+static inline void find_smp_config(void) { }
 static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
@@ -64,6 +77,8 @@ extern int acpi_probe_gsi(void);
 #ifdef CONFIG_X86_IO_APIC
 extern int mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin,
 				u32 gsi, int triggering, int polarity);
+extern int mp_find_ioapic(int gsi);
+extern int mp_find_ioapic_pin(int ioapic, int gsi);
 #else
 static inline int
 mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin,
@@ -148,4 +163,8 @@ static inline void physid_set_mask_of_physid(int physid, physid_mask_t *map)
 
 extern physid_mask_t phys_cpu_present_map;
 
+extern int generic_mps_oem_check(struct mpc_table *, char *, char *);
+
+extern int default_acpi_madt_oem_check(char *, char *);
+
 #endif /* _ASM_X86_MPSPEC_H */
diff --git a/arch/x86/include/asm/mpspec_def.h b/arch/x86/include/asm/mpspec_def.h
index 59568bc4767f..4a7f96d7c188 100644
--- a/arch/x86/include/asm/mpspec_def.h
+++ b/arch/x86/include/asm/mpspec_def.h
@@ -24,17 +24,18 @@
 # endif
 #endif
 
-struct intel_mp_floating {
-	char mpf_signature[4];		/* "_MP_"			*/
-	unsigned int mpf_physptr;	/* Configuration table address	*/
-	unsigned char mpf_length;	/* Our length (paragraphs)	*/
-	unsigned char mpf_specification;/* Specification version	*/
-	unsigned char mpf_checksum;	/* Checksum (makes sum 0)	*/
-	unsigned char mpf_feature1;	/* Standard or configuration ?	*/
-	unsigned char mpf_feature2;	/* Bit7 set for IMCR|PIC	*/
-	unsigned char mpf_feature3;	/* Unused (0)			*/
-	unsigned char mpf_feature4;	/* Unused (0)			*/
-	unsigned char mpf_feature5;	/* Unused (0)			*/
+/* Intel MP Floating Pointer Structure */
+struct mpf_intel {
+	char signature[4];		/* "_MP_"			*/
+	unsigned int physptr;		/* Configuration table address	*/
+	unsigned char length;		/* Our length (paragraphs)	*/
+	unsigned char specification;	/* Specification version	*/
+	unsigned char checksum;		/* Checksum (makes sum 0)	*/
+	unsigned char feature1;		/* Standard or configuration ?	*/
+	unsigned char feature2;		/* Bit7 set for IMCR|PIC	*/
+	unsigned char feature3;		/* Unused (0)			*/
+	unsigned char feature4;		/* Unused (0)			*/
+	unsigned char feature5;		/* Unused (0)			*/
 };
 
 #define MPC_SIGNATURE "PCMP"
diff --git a/arch/x86/include/asm/numa_32.h b/arch/x86/include/asm/numa_32.h
index e9f5db796244..a37229011b56 100644
--- a/arch/x86/include/asm/numa_32.h
+++ b/arch/x86/include/asm/numa_32.h
@@ -4,8 +4,12 @@
 extern int pxm_to_nid(int pxm);
 extern void numa_remove_cpu(int cpu);
 
-#ifdef CONFIG_NUMA
+#ifdef CONFIG_HIGHMEM
 extern void set_highmem_pages_init(void);
+#else
+static inline void set_highmem_pages_init(void)
+{
+}
 #endif
 
 #endif /* _ASM_X86_NUMA_32_H */
diff --git a/arch/x86/include/asm/numaq.h b/arch/x86/include/asm/numaq.h
index 1e8bd30b4c16..9f0a5f5d29ec 100644
--- a/arch/x86/include/asm/numaq.h
+++ b/arch/x86/include/asm/numaq.h
@@ -31,6 +31,8 @@
 extern int found_numaq;
 extern int get_memcfg_numaq(void);
 
+extern void *xquad_portio;
+
 /*
  * SYS_CFG_DATA_PRIV_ADDR, struct eachquadmem, and struct sys_cfg_data are the
  */
diff --git a/arch/x86/include/asm/numaq/apic.h b/arch/x86/include/asm/numaq/apic.h
deleted file mode 100644
index bf37bc49bd8e..000000000000
--- a/arch/x86/include/asm/numaq/apic.h
+++ /dev/null
@@ -1,142 +0,0 @@
-#ifndef __ASM_NUMAQ_APIC_H
-#define __ASM_NUMAQ_APIC_H
-
-#include <asm/io.h>
-#include <linux/mmzone.h>
-#include <linux/nodemask.h>
-
-#define APIC_DFR_VALUE	(APIC_DFR_CLUSTER)
-
-static inline const cpumask_t *target_cpus(void)
-{
-	return &CPU_MASK_ALL;
-}
-
-#define NO_BALANCE_IRQ (1)
-#define esr_disable (1)
-
-#define INT_DELIVERY_MODE dest_LowestPrio
-#define INT_DEST_MODE 0     /* physical delivery on LOCAL quad */
- 
-static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid)
-{
-	return physid_isset(apicid, bitmap);
-}
-static inline unsigned long check_apicid_present(int bit)
-{
-	return physid_isset(bit, phys_cpu_present_map);
-}
-#define apicid_cluster(apicid) (apicid & 0xF0)
-
-static inline int apic_id_registered(void)
-{
-	return 1;
-}
-
-static inline void init_apic_ldr(void)
-{
-	/* Already done in NUMA-Q firmware */
-}
-
-static inline void setup_apic_routing(void)
-{
-	printk("Enabling APIC mode:  %s.  Using %d I/O APICs\n",
-		"NUMA-Q", nr_ioapics);
-}
-
-/*
- * Skip adding the timer int on secondary nodes, which causes
- * a small but painful rift in the time-space continuum.
- */
-static inline int multi_timer_check(int apic, int irq)
-{
-	return apic != 0 && irq == 0;
-}
-
-static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_map)
-{
-	/* We don't have a good way to do this yet - hack */
-	return physids_promote(0xFUL);
-}
-
-/* Mapping from cpu number to logical apicid */
-extern u8 cpu_2_logical_apicid[];
-static inline int cpu_to_logical_apicid(int cpu)
-{
-	if (cpu >= nr_cpu_ids)
-		return BAD_APICID;
-	return (int)cpu_2_logical_apicid[cpu];
-}
-
-/*
- * Supporting over 60 cpus on NUMA-Q requires a locality-dependent
- * cpu to APIC ID relation to properly interact with the intelligent
- * mode of the cluster controller.
- */
-static inline int cpu_present_to_apicid(int mps_cpu)
-{
-	if (mps_cpu < 60)
-		return ((mps_cpu >> 2) << 4) | (1 << (mps_cpu & 0x3));
-	else
-		return BAD_APICID;
-}
-
-static inline int apicid_to_node(int logical_apicid) 
-{
-	return logical_apicid >> 4;
-}
-
-static inline physid_mask_t apicid_to_cpu_present(int logical_apicid)
-{
-	int node = apicid_to_node(logical_apicid);
-	int cpu = __ffs(logical_apicid & 0xf);
-
-	return physid_mask_of_physid(cpu + 4*node);
-}
-
-extern void *xquad_portio;
-
-static inline void setup_portio_remap(void)
-{
-	int num_quads = num_online_nodes();
-
-	if (num_quads <= 1)
-       		return;
-
-	printk("Remapping cross-quad port I/O for %d quads\n", num_quads);
-	xquad_portio = ioremap(XQUAD_PORTIO_BASE, num_quads*XQUAD_PORTIO_QUAD);
-	printk("xquad_portio vaddr 0x%08lx, len %08lx\n",
-		(u_long) xquad_portio, (u_long) num_quads*XQUAD_PORTIO_QUAD);
-}
-
-static inline int check_phys_apicid_present(int boot_cpu_physical_apicid)
-{
-	return (1);
-}
-
-static inline void enable_apic_mode(void)
-{
-}
-
-/*
- * We use physical apicids here, not logical, so just return the default
- * physical broadcast to stop people from breaking us
- */
-static inline unsigned int cpu_mask_to_apicid(const cpumask_t *cpumask)
-{
-	return (int) 0xF;
-}
-
-static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *cpumask,
-						  const struct cpumask *andmask)
-{
-	return (int) 0xF;
-}
-
-/* No NUMA-Q box has a HT CPU, but it can't hurt to use the default code. */
-static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
-{
-	return cpuid_apic >> index_msb;
-}
-
-#endif /* __ASM_NUMAQ_APIC_H */
diff --git a/arch/x86/include/asm/numaq/apicdef.h b/arch/x86/include/asm/numaq/apicdef.h
deleted file mode 100644
index e012a46cc22a..000000000000
--- a/arch/x86/include/asm/numaq/apicdef.h
+++ /dev/null
@@ -1,14 +0,0 @@
-#ifndef __ASM_NUMAQ_APICDEF_H
-#define __ASM_NUMAQ_APICDEF_H
-
-
-#define APIC_ID_MASK (0xF<<24)
-
-static inline unsigned get_apic_id(unsigned long x)
-{
-	        return (((x)>>24)&0x0F);
-}
-
-#define         GET_APIC_ID(x)  get_apic_id(x)
-
-#endif
diff --git a/arch/x86/include/asm/numaq/ipi.h b/arch/x86/include/asm/numaq/ipi.h
deleted file mode 100644
index a8374c652778..000000000000
--- a/arch/x86/include/asm/numaq/ipi.h
+++ /dev/null
@@ -1,22 +0,0 @@
-#ifndef __ASM_NUMAQ_IPI_H
-#define __ASM_NUMAQ_IPI_H
-
-void send_IPI_mask_sequence(const struct cpumask *mask, int vector);
-void send_IPI_mask_allbutself(const struct cpumask *mask, int vector);
-
-static inline void send_IPI_mask(const struct cpumask *mask, int vector)
-{
-	send_IPI_mask_sequence(mask, vector);
-}
-
-static inline void send_IPI_allbutself(int vector)
-{
-	send_IPI_mask_allbutself(cpu_online_mask, vector);
-}
-
-static inline void send_IPI_all(int vector)
-{
-	send_IPI_mask(cpu_online_mask, vector);
-}
-
-#endif /* __ASM_NUMAQ_IPI_H */
diff --git a/arch/x86/include/asm/numaq/mpparse.h b/arch/x86/include/asm/numaq/mpparse.h
deleted file mode 100644
index a2eeefcd1cc7..000000000000
--- a/arch/x86/include/asm/numaq/mpparse.h
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef __ASM_NUMAQ_MPPARSE_H
-#define __ASM_NUMAQ_MPPARSE_H
-
-extern void numaq_mps_oem_check(struct mpc_table *, char *, char *);
-
-#endif /* __ASM_NUMAQ_MPPARSE_H */
diff --git a/arch/x86/include/asm/numaq/wakecpu.h b/arch/x86/include/asm/numaq/wakecpu.h
deleted file mode 100644
index 6f499df8eddb..000000000000
--- a/arch/x86/include/asm/numaq/wakecpu.h
+++ /dev/null
@@ -1,45 +0,0 @@
-#ifndef __ASM_NUMAQ_WAKECPU_H
-#define __ASM_NUMAQ_WAKECPU_H
-
-/* This file copes with machines that wakeup secondary CPUs by NMIs */
-
-#define TRAMPOLINE_PHYS_LOW (0x8)
-#define TRAMPOLINE_PHYS_HIGH (0xa)
-
-/* We don't do anything here because we use NMI's to boot instead */
-static inline void wait_for_init_deassert(atomic_t *deassert)
-{
-}
-
-/*
- * Because we use NMIs rather than the INIT-STARTUP sequence to
- * bootstrap the CPUs, the APIC may be in a weird state. Kick it.
- */
-static inline void smp_callin_clear_local_apic(void)
-{
-	clear_local_APIC();
-}
-
-static inline void store_NMI_vector(unsigned short *high, unsigned short *low)
-{
-	printk("Storing NMI vector\n");
-	*high =
-	  *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH));
-	*low =
-	  *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW));
-}
-
-static inline void restore_NMI_vector(unsigned short *high, unsigned short *low)
-{
-	printk("Restoring NMI vector\n");
-	*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) =
-								 *high;
-	*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) =
-								 *low;
-}
-
-static inline void inquire_remote_apic(int apicid)
-{
-}
-
-#endif /* __ASM_NUMAQ_WAKECPU_H */
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 776579119a00..89ed9d70b0aa 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -1,42 +1,11 @@
 #ifndef _ASM_X86_PAGE_H
 #define _ASM_X86_PAGE_H
 
-#include <linux/const.h>
-
-/* PAGE_SHIFT determines the page size */
-#define PAGE_SHIFT	12
-#define PAGE_SIZE	(_AC(1,UL) << PAGE_SHIFT)
-#define PAGE_MASK	(~(PAGE_SIZE-1))
+#include <linux/types.h>
 
 #ifdef __KERNEL__
 
-#define __PHYSICAL_MASK		((phys_addr_t)(1ULL << __PHYSICAL_MASK_SHIFT) - 1)
-#define __VIRTUAL_MASK		((1UL << __VIRTUAL_MASK_SHIFT) - 1)
-
-/* Cast PAGE_MASK to a signed type so that it is sign-extended if
-   virtual addresses are 32-bits but physical addresses are larger
-   (ie, 32-bit PAE). */
-#define PHYSICAL_PAGE_MASK	(((signed long)PAGE_MASK) & __PHYSICAL_MASK)
-
-/* PTE_PFN_MASK extracts the PFN from a (pte|pmd|pud|pgd)val_t */
-#define PTE_PFN_MASK		((pteval_t)PHYSICAL_PAGE_MASK)
-
-/* PTE_FLAGS_MASK extracts the flags from a (pte|pmd|pud|pgd)val_t */
-#define PTE_FLAGS_MASK		(~PTE_PFN_MASK)
-
-#define PMD_PAGE_SIZE		(_AC(1, UL) << PMD_SHIFT)
-#define PMD_PAGE_MASK		(~(PMD_PAGE_SIZE-1))
-
-#define HPAGE_SHIFT		PMD_SHIFT
-#define HPAGE_SIZE		(_AC(1,UL) << HPAGE_SHIFT)
-#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
-#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
-
-#define HUGE_MAX_HSTATE 2
-
-#ifndef __ASSEMBLY__
-#include <linux/types.h>
-#endif
+#include <asm/page_types.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/page_64.h>
@@ -44,38 +13,18 @@
 #include <asm/page_32.h>
 #endif	/* CONFIG_X86_64 */
 
-#define PAGE_OFFSET		((unsigned long)__PAGE_OFFSET)
-
-#define VM_DATA_DEFAULT_FLAGS \
-	(((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0 ) | \
-	 VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
-
-
 #ifndef __ASSEMBLY__
 
-typedef struct { pgdval_t pgd; } pgd_t;
-typedef struct { pgprotval_t pgprot; } pgprot_t;
-
-extern int page_is_ram(unsigned long pagenr);
-extern int devmem_is_allowed(unsigned long pagenr);
-extern void map_devmem(unsigned long pfn, unsigned long size,
-		       pgprot_t vma_prot);
-extern void unmap_devmem(unsigned long pfn, unsigned long size,
-			 pgprot_t vma_prot);
-
-extern unsigned long max_low_pfn_mapped;
-extern unsigned long max_pfn_mapped;
-
 struct page;
 
 static inline void clear_user_page(void *page, unsigned long vaddr,
-				struct page *pg)
+				   struct page *pg)
 {
 	clear_page(page);
 }
 
 static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
-				struct page *topage)
+				  struct page *topage)
 {
 	copy_page(to, from);
 }
@@ -84,99 +33,6 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
-static inline pgd_t native_make_pgd(pgdval_t val)
-{
-	return (pgd_t) { val };
-}
-
-static inline pgdval_t native_pgd_val(pgd_t pgd)
-{
-	return pgd.pgd;
-}
-
-#if PAGETABLE_LEVELS >= 3
-#if PAGETABLE_LEVELS == 4
-typedef struct { pudval_t pud; } pud_t;
-
-static inline pud_t native_make_pud(pmdval_t val)
-{
-	return (pud_t) { val };
-}
-
-static inline pudval_t native_pud_val(pud_t pud)
-{
-	return pud.pud;
-}
-#else	/* PAGETABLE_LEVELS == 3 */
-#include <asm-generic/pgtable-nopud.h>
-
-static inline pudval_t native_pud_val(pud_t pud)
-{
-	return native_pgd_val(pud.pgd);
-}
-#endif	/* PAGETABLE_LEVELS == 4 */
-
-typedef struct { pmdval_t pmd; } pmd_t;
-
-static inline pmd_t native_make_pmd(pmdval_t val)
-{
-	return (pmd_t) { val };
-}
-
-static inline pmdval_t native_pmd_val(pmd_t pmd)
-{
-	return pmd.pmd;
-}
-#else  /* PAGETABLE_LEVELS == 2 */
-#include <asm-generic/pgtable-nopmd.h>
-
-static inline pmdval_t native_pmd_val(pmd_t pmd)
-{
-	return native_pgd_val(pmd.pud.pgd);
-}
-#endif	/* PAGETABLE_LEVELS >= 3 */
-
-static inline pte_t native_make_pte(pteval_t val)
-{
-	return (pte_t) { .pte = val };
-}
-
-static inline pteval_t native_pte_val(pte_t pte)
-{
-	return pte.pte;
-}
-
-static inline pteval_t native_pte_flags(pte_t pte)
-{
-	return native_pte_val(pte) & PTE_FLAGS_MASK;
-}
-
-#define pgprot_val(x)	((x).pgprot)
-#define __pgprot(x)	((pgprot_t) { (x) } )
-
-#ifdef CONFIG_PARAVIRT
-#include <asm/paravirt.h>
-#else  /* !CONFIG_PARAVIRT */
-
-#define pgd_val(x)	native_pgd_val(x)
-#define __pgd(x)	native_make_pgd(x)
-
-#ifndef __PAGETABLE_PUD_FOLDED
-#define pud_val(x)	native_pud_val(x)
-#define __pud(x)	native_make_pud(x)
-#endif
-
-#ifndef __PAGETABLE_PMD_FOLDED
-#define pmd_val(x)	native_pmd_val(x)
-#define __pmd(x)	native_make_pmd(x)
-#endif
-
-#define pte_val(x)	native_pte_val(x)
-#define pte_flags(x)	native_pte_flags(x)
-#define __pte(x)	native_make_pte(x)
-
-#endif	/* CONFIG_PARAVIRT */
-
 #define __pa(x)		__phys_addr((unsigned long)(x))
 #define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
 /* __pa_symbol should be used for C visible symbols.
diff --git a/arch/x86/include/asm/page_32.h b/arch/x86/include/asm/page_32.h
index bcde0d7b4325..da4e762406f7 100644
--- a/arch/x86/include/asm/page_32.h
+++ b/arch/x86/include/asm/page_32.h
@@ -1,82 +1,14 @@
 #ifndef _ASM_X86_PAGE_32_H
 #define _ASM_X86_PAGE_32_H
 
-/*
- * This handles the memory map.
- *
- * A __PAGE_OFFSET of 0xC0000000 means that the kernel has
- * a virtual address space of one gigabyte, which limits the
- * amount of physical memory you can use to about 950MB.
- *
- * If you want more physical memory than this then see the CONFIG_HIGHMEM4G
- * and CONFIG_HIGHMEM64G options in the kernel configuration.
- */
-#define __PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
-
-#ifdef CONFIG_4KSTACKS
-#define THREAD_ORDER	0
-#else
-#define THREAD_ORDER	1
-#endif
-#define THREAD_SIZE 	(PAGE_SIZE << THREAD_ORDER)
-
-#define STACKFAULT_STACK 0
-#define DOUBLEFAULT_STACK 1
-#define NMI_STACK 0
-#define DEBUG_STACK 0
-#define MCE_STACK 0
-#define N_EXCEPTION_STACKS 1
-
-#ifdef CONFIG_X86_PAE
-/* 44=32+12, the limit we can fit into an unsigned long pfn */
-#define __PHYSICAL_MASK_SHIFT	44
-#define __VIRTUAL_MASK_SHIFT	32
-#define PAGETABLE_LEVELS	3
-
-#ifndef __ASSEMBLY__
-typedef u64	pteval_t;
-typedef u64	pmdval_t;
-typedef u64	pudval_t;
-typedef u64	pgdval_t;
-typedef u64	pgprotval_t;
-
-typedef union {
-	struct {
-		unsigned long pte_low, pte_high;
-	};
-	pteval_t pte;
-} pte_t;
-#endif	/* __ASSEMBLY__
- */
-#else  /* !CONFIG_X86_PAE */
-#define __PHYSICAL_MASK_SHIFT	32
-#define __VIRTUAL_MASK_SHIFT	32
-#define PAGETABLE_LEVELS	2
-
-#ifndef __ASSEMBLY__
-typedef unsigned long	pteval_t;
-typedef unsigned long	pmdval_t;
-typedef unsigned long	pudval_t;
-typedef unsigned long	pgdval_t;
-typedef unsigned long	pgprotval_t;
-
-typedef union {
-	pteval_t pte;
-	pteval_t pte_low;
-} pte_t;
-
-#endif	/* __ASSEMBLY__ */
-#endif	/* CONFIG_X86_PAE */
+#include <asm/page_32_types.h>
 
 #ifndef __ASSEMBLY__
-typedef struct page *pgtable_t;
-#endif
 
 #ifdef CONFIG_HUGETLB_PAGE
 #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
 #endif
 
-#ifndef __ASSEMBLY__
 #define __phys_addr_nodebug(x)	((x) - PAGE_OFFSET)
 #ifdef CONFIG_DEBUG_VIRTUAL
 extern unsigned long __phys_addr(unsigned long);
@@ -89,23 +21,6 @@ extern unsigned long __phys_addr(unsigned long);
 #define pfn_valid(pfn)		((pfn) < max_mapnr)
 #endif /* CONFIG_FLATMEM */
 
-extern int nx_enabled;
-
-/*
- * This much address space is reserved for vmalloc() and iomap()
- * as well as fixmap mappings.
- */
-extern unsigned int __VMALLOC_RESERVE;
-extern int sysctl_legacy_va_layout;
-
-extern void find_low_pfn_range(void);
-extern unsigned long init_memory_mapping(unsigned long start,
-					 unsigned long end);
-extern void initmem_init(unsigned long, unsigned long);
-extern void free_initmem(void);
-extern void setup_bootmem_allocator(void);
-
-
 #ifdef CONFIG_X86_USE_3DNOW
 #include <asm/mmx.h>
 
diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
new file mode 100644
index 000000000000..f1e4a79a6e41
--- /dev/null
+++ b/arch/x86/include/asm/page_32_types.h
@@ -0,0 +1,60 @@
+#ifndef _ASM_X86_PAGE_32_DEFS_H
+#define _ASM_X86_PAGE_32_DEFS_H
+
+#include <linux/const.h>
+
+/*
+ * This handles the memory map.
+ *
+ * A __PAGE_OFFSET of 0xC0000000 means that the kernel has
+ * a virtual address space of one gigabyte, which limits the
+ * amount of physical memory you can use to about 950MB.
+ *
+ * If you want more physical memory than this then see the CONFIG_HIGHMEM4G
+ * and CONFIG_HIGHMEM64G options in the kernel configuration.
+ */
+#define __PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
+
+#ifdef CONFIG_4KSTACKS
+#define THREAD_ORDER	0
+#else
+#define THREAD_ORDER	1
+#endif
+#define THREAD_SIZE 	(PAGE_SIZE << THREAD_ORDER)
+
+#define STACKFAULT_STACK 0
+#define DOUBLEFAULT_STACK 1
+#define NMI_STACK 0
+#define DEBUG_STACK 0
+#define MCE_STACK 0
+#define N_EXCEPTION_STACKS 1
+
+#ifdef CONFIG_X86_PAE
+/* 44=32+12, the limit we can fit into an unsigned long pfn */
+#define __PHYSICAL_MASK_SHIFT	44
+#define __VIRTUAL_MASK_SHIFT	32
+
+#else  /* !CONFIG_X86_PAE */
+#define __PHYSICAL_MASK_SHIFT	32
+#define __VIRTUAL_MASK_SHIFT	32
+#endif	/* CONFIG_X86_PAE */
+
+#ifndef __ASSEMBLY__
+
+/*
+ * This much address space is reserved for vmalloc() and iomap()
+ * as well as fixmap mappings.
+ */
+extern unsigned int __VMALLOC_RESERVE;
+extern int sysctl_legacy_va_layout;
+
+extern void find_low_pfn_range(void);
+extern unsigned long init_memory_mapping(unsigned long start,
+					 unsigned long end);
+extern void initmem_init(unsigned long, unsigned long);
+extern void free_initmem(void);
+extern void setup_bootmem_allocator(void);
+
+#endif	/* !__ASSEMBLY__ */
+
+#endif /* _ASM_X86_PAGE_32_DEFS_H */
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 5ebca29f44f0..072694ed81a5 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -1,105 +1,6 @@
 #ifndef _ASM_X86_PAGE_64_H
 #define _ASM_X86_PAGE_64_H
 
-#define PAGETABLE_LEVELS	4
-
-#define THREAD_ORDER	1
-#define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)
-#define CURRENT_MASK (~(THREAD_SIZE - 1))
-
-#define EXCEPTION_STACK_ORDER 0
-#define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
-
-#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1)
-#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER)
-
-#define IRQSTACK_ORDER 2
-#define IRQSTACKSIZE (PAGE_SIZE << IRQSTACK_ORDER)
-
-#define STACKFAULT_STACK 1
-#define DOUBLEFAULT_STACK 2
-#define NMI_STACK 3
-#define DEBUG_STACK 4
-#define MCE_STACK 5
-#define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
-
-#define PUD_PAGE_SIZE		(_AC(1, UL) << PUD_SHIFT)
-#define PUD_PAGE_MASK		(~(PUD_PAGE_SIZE-1))
-
-/*
- * Set __PAGE_OFFSET to the most negative possible address +
- * PGDIR_SIZE*16 (pgd slot 272).  The gap is to allow a space for a
- * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
- * what Xen requires.
- */
-#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
-
-#define __PHYSICAL_START	CONFIG_PHYSICAL_START
-#define __KERNEL_ALIGN		0x200000
-
-/*
- * Make sure kernel is aligned to 2MB address. Catching it at compile
- * time is better. Change your config file and compile the kernel
- * for a 2MB aligned address (CONFIG_PHYSICAL_START)
- */
-#if (CONFIG_PHYSICAL_START % __KERNEL_ALIGN) != 0
-#error "CONFIG_PHYSICAL_START must be a multiple of 2MB"
-#endif
-
-#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
-#define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
-
-/* See Documentation/x86_64/mm.txt for a description of the memory map. */
-#define __PHYSICAL_MASK_SHIFT	46
-#define __VIRTUAL_MASK_SHIFT	48
-
-/*
- * Kernel image size is limited to 512 MB (see level2_kernel_pgt in
- * arch/x86/kernel/head_64.S), and it is mapped here:
- */
-#define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
-#define KERNEL_IMAGE_START	_AC(0xffffffff80000000, UL)
-
-#ifndef __ASSEMBLY__
-void clear_page(void *page);
-void copy_page(void *to, void *from);
-
-/* duplicated to the one in bootmem.h */
-extern unsigned long max_pfn;
-extern unsigned long phys_base;
-
-extern unsigned long __phys_addr(unsigned long);
-#define __phys_reloc_hide(x)	(x)
-
-/*
- * These are used to make use of C type-checking..
- */
-typedef unsigned long	pteval_t;
-typedef unsigned long	pmdval_t;
-typedef unsigned long	pudval_t;
-typedef unsigned long	pgdval_t;
-typedef unsigned long	pgprotval_t;
-
-typedef struct page *pgtable_t;
-
-typedef struct { pteval_t pte; } pte_t;
-
-#define vmemmap ((struct page *)VMEMMAP_START)
-
-extern unsigned long init_memory_mapping(unsigned long start,
-					 unsigned long end);
-
-extern void initmem_init(unsigned long start_pfn, unsigned long end_pfn);
-extern void free_initmem(void);
-
-extern void init_extra_mapping_uc(unsigned long phys, unsigned long size);
-extern void init_extra_mapping_wb(unsigned long phys, unsigned long size);
-
-#endif	/* !__ASSEMBLY__ */
-
-#ifdef CONFIG_FLATMEM
-#define pfn_valid(pfn)          ((pfn) < max_pfn)
-#endif
-
+#include <asm/page_64_types.h>
 
 #endif /* _ASM_X86_PAGE_64_H */
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
new file mode 100644
index 000000000000..d38c91b70248
--- /dev/null
+++ b/arch/x86/include/asm/page_64_types.h
@@ -0,0 +1,89 @@
+#ifndef _ASM_X86_PAGE_64_DEFS_H
+#define _ASM_X86_PAGE_64_DEFS_H
+
+#define THREAD_ORDER	1
+#define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)
+#define CURRENT_MASK (~(THREAD_SIZE - 1))
+
+#define EXCEPTION_STACK_ORDER 0
+#define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
+
+#define DEBUG_STACK_ORDER (EXCEPTION_STACK_ORDER + 1)
+#define DEBUG_STKSZ (PAGE_SIZE << DEBUG_STACK_ORDER)
+
+#define IRQ_STACK_ORDER 2
+#define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
+
+#define STACKFAULT_STACK 1
+#define DOUBLEFAULT_STACK 2
+#define NMI_STACK 3
+#define DEBUG_STACK 4
+#define MCE_STACK 5
+#define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
+
+#define PUD_PAGE_SIZE		(_AC(1, UL) << PUD_SHIFT)
+#define PUD_PAGE_MASK		(~(PUD_PAGE_SIZE-1))
+
+/*
+ * Set __PAGE_OFFSET to the most negative possible address +
+ * PGDIR_SIZE*16 (pgd slot 272).  The gap is to allow a space for a
+ * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
+ * what Xen requires.
+ */
+#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
+
+#define __PHYSICAL_START	CONFIG_PHYSICAL_START
+#define __KERNEL_ALIGN		0x200000
+
+/*
+ * Make sure kernel is aligned to 2MB address. Catching it at compile
+ * time is better. Change your config file and compile the kernel
+ * for a 2MB aligned address (CONFIG_PHYSICAL_START)
+ */
+#if (CONFIG_PHYSICAL_START % __KERNEL_ALIGN) != 0
+#error "CONFIG_PHYSICAL_START must be a multiple of 2MB"
+#endif
+
+#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
+#define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
+
+/* See Documentation/x86_64/mm.txt for a description of the memory map. */
+#define __PHYSICAL_MASK_SHIFT	46
+#define __VIRTUAL_MASK_SHIFT	48
+
+/*
+ * Kernel image size is limited to 512 MB (see level2_kernel_pgt in
+ * arch/x86/kernel/head_64.S), and it is mapped here:
+ */
+#define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
+#define KERNEL_IMAGE_START	_AC(0xffffffff80000000, UL)
+
+#ifndef __ASSEMBLY__
+void clear_page(void *page);
+void copy_page(void *to, void *from);
+
+/* duplicated to the one in bootmem.h */
+extern unsigned long max_pfn;
+extern unsigned long phys_base;
+
+extern unsigned long __phys_addr(unsigned long);
+#define __phys_reloc_hide(x)	(x)
+
+#define vmemmap ((struct page *)VMEMMAP_START)
+
+extern unsigned long init_memory_mapping(unsigned long start,
+					 unsigned long end);
+
+extern void initmem_init(unsigned long start_pfn, unsigned long end_pfn);
+extern void free_initmem(void);
+
+extern void init_extra_mapping_uc(unsigned long phys, unsigned long size);
+extern void init_extra_mapping_wb(unsigned long phys, unsigned long size);
+
+#endif	/* !__ASSEMBLY__ */
+
+#ifdef CONFIG_FLATMEM
+#define pfn_valid(pfn)          ((pfn) < max_pfn)
+#endif
+
+#endif /* _ASM_X86_PAGE_64_DEFS_H */
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
new file mode 100644
index 000000000000..2d625da6603c
--- /dev/null
+++ b/arch/x86/include/asm/page_types.h
@@ -0,0 +1,57 @@
+#ifndef _ASM_X86_PAGE_DEFS_H
+#define _ASM_X86_PAGE_DEFS_H
+
+#include <linux/const.h>
+
+/* PAGE_SHIFT determines the page size */
+#define PAGE_SHIFT	12
+#define PAGE_SIZE	(_AC(1,UL) << PAGE_SHIFT)
+#define PAGE_MASK	(~(PAGE_SIZE-1))
+
+#define __PHYSICAL_MASK		((phys_addr_t)(1ULL << __PHYSICAL_MASK_SHIFT) - 1)
+#define __VIRTUAL_MASK		((1UL << __VIRTUAL_MASK_SHIFT) - 1)
+
+/* Cast PAGE_MASK to a signed type so that it is sign-extended if
+   virtual addresses are 32-bits but physical addresses are larger
+   (ie, 32-bit PAE). */
+#define PHYSICAL_PAGE_MASK	(((signed long)PAGE_MASK) & __PHYSICAL_MASK)
+
+#define PMD_PAGE_SIZE		(_AC(1, UL) << PMD_SHIFT)
+#define PMD_PAGE_MASK		(~(PMD_PAGE_SIZE-1))
+
+#define HPAGE_SHIFT		PMD_SHIFT
+#define HPAGE_SIZE		(_AC(1,UL) << HPAGE_SHIFT)
+#define HPAGE_MASK		(~(HPAGE_SIZE - 1))
+#define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
+
+#define HUGE_MAX_HSTATE 2
+
+#define PAGE_OFFSET		((unsigned long)__PAGE_OFFSET)
+
+#define VM_DATA_DEFAULT_FLAGS \
+	(((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0 ) | \
+	 VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
+
+#ifdef CONFIG_X86_64
+#include <asm/page_64_types.h>
+#else
+#include <asm/page_32_types.h>
+#endif	/* CONFIG_X86_64 */
+
+#ifndef __ASSEMBLY__
+
+struct pgprot;
+
+extern int page_is_ram(unsigned long pagenr);
+extern int devmem_is_allowed(unsigned long pagenr);
+extern void map_devmem(unsigned long pfn, unsigned long size,
+		       struct pgprot vma_prot);
+extern void unmap_devmem(unsigned long pfn, unsigned long size,
+			 struct pgprot vma_prot);
+
+extern unsigned long max_low_pfn_mapped;
+extern unsigned long max_pfn_mapped;
+
+#endif	/* !__ASSEMBLY__ */
+
+#endif	/* _ASM_X86_PAGE_DEFS_H */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index e299287e8e33..0617d5cc9712 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -4,7 +4,7 @@
  * para-virtualization: those hooks are defined here. */
 
 #ifdef CONFIG_PARAVIRT
-#include <asm/page.h>
+#include <asm/pgtable_types.h>
 #include <asm/asm.h>
 
 /* Bitmask of what can be clobbered: usually at least eax. */
@@ -12,21 +12,38 @@
 #define CLBR_EAX  (1 << 0)
 #define CLBR_ECX  (1 << 1)
 #define CLBR_EDX  (1 << 2)
+#define CLBR_EDI  (1 << 3)
 
-#ifdef CONFIG_X86_64
-#define CLBR_RSI  (1 << 3)
-#define CLBR_RDI  (1 << 4)
+#ifdef CONFIG_X86_32
+/* CLBR_ANY should match all regs platform has. For i386, that's just it */
+#define CLBR_ANY  ((1 << 4) - 1)
+
+#define CLBR_ARG_REGS	(CLBR_EAX | CLBR_EDX | CLBR_ECX)
+#define CLBR_RET_REG	(CLBR_EAX | CLBR_EDX)
+#define CLBR_SCRATCH	(0)
+#else
+#define CLBR_RAX  CLBR_EAX
+#define CLBR_RCX  CLBR_ECX
+#define CLBR_RDX  CLBR_EDX
+#define CLBR_RDI  CLBR_EDI
+#define CLBR_RSI  (1 << 4)
 #define CLBR_R8   (1 << 5)
 #define CLBR_R9   (1 << 6)
 #define CLBR_R10  (1 << 7)
 #define CLBR_R11  (1 << 8)
+
 #define CLBR_ANY  ((1 << 9) - 1)
+
+#define CLBR_ARG_REGS	(CLBR_RDI | CLBR_RSI | CLBR_RDX | \
+			 CLBR_RCX | CLBR_R8 | CLBR_R9)
+#define CLBR_RET_REG	(CLBR_RAX)
+#define CLBR_SCRATCH	(CLBR_R10 | CLBR_R11)
+
 #include <asm/desc_defs.h>
-#else
-/* CLBR_ANY should match all regs platform has. For i386, that's just it */
-#define CLBR_ANY  ((1 << 3) - 1)
 #endif /* X86_64 */
 
+#define CLBR_CALLEE_SAVE ((CLBR_ARG_REGS | CLBR_SCRATCH) & ~CLBR_RET_REG)
+
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -40,6 +57,14 @@ struct tss_struct;
 struct mm_struct;
 struct desc_struct;
 
+/*
+ * Wrapper type for pointers to code which uses the non-standard
+ * calling convention.  See PV_CALL_SAVE_REGS_THUNK below.
+ */
+struct paravirt_callee_save {
+	void *func;
+};
+
 /* general info */
 struct pv_info {
 	unsigned int kernel_rpl;
@@ -189,11 +214,15 @@ struct pv_irq_ops {
 	 * expected to use X86_EFLAGS_IF; all other bits
 	 * returned from save_fl are undefined, and may be ignored by
 	 * restore_fl.
+	 *
+	 * NOTE: These functions callers expect the callee to preserve
+	 * more registers than the standard C calling convention.
 	 */
-	unsigned long (*save_fl)(void);
-	void (*restore_fl)(unsigned long);
-	void (*irq_disable)(void);
-	void (*irq_enable)(void);
+	struct paravirt_callee_save save_fl;
+	struct paravirt_callee_save restore_fl;
+	struct paravirt_callee_save irq_disable;
+	struct paravirt_callee_save irq_enable;
+
 	void (*safe_halt)(void);
 	void (*halt)(void);
 
@@ -244,7 +273,8 @@ struct pv_mmu_ops {
 	void (*flush_tlb_user)(void);
 	void (*flush_tlb_kernel)(void);
 	void (*flush_tlb_single)(unsigned long addr);
-	void (*flush_tlb_others)(const cpumask_t *cpus, struct mm_struct *mm,
+	void (*flush_tlb_others)(const struct cpumask *cpus,
+				 struct mm_struct *mm,
 				 unsigned long va);
 
 	/* Hooks for allocating and freeing a pagetable top-level */
@@ -278,12 +308,11 @@ struct pv_mmu_ops {
 	void (*ptep_modify_prot_commit)(struct mm_struct *mm, unsigned long addr,
 					pte_t *ptep, pte_t pte);
 
-	pteval_t (*pte_val)(pte_t);
-	pteval_t (*pte_flags)(pte_t);
-	pte_t (*make_pte)(pteval_t pte);
+	struct paravirt_callee_save pte_val;
+	struct paravirt_callee_save make_pte;
 
-	pgdval_t (*pgd_val)(pgd_t);
-	pgd_t (*make_pgd)(pgdval_t pgd);
+	struct paravirt_callee_save pgd_val;
+	struct paravirt_callee_save make_pgd;
 
 #if PAGETABLE_LEVELS >= 3
 #ifdef CONFIG_X86_PAE
@@ -298,12 +327,12 @@ struct pv_mmu_ops {
 
 	void (*set_pud)(pud_t *pudp, pud_t pudval);
 
-	pmdval_t (*pmd_val)(pmd_t);
-	pmd_t (*make_pmd)(pmdval_t pmd);
+	struct paravirt_callee_save pmd_val;
+	struct paravirt_callee_save make_pmd;
 
 #if PAGETABLE_LEVELS == 4
-	pudval_t (*pud_val)(pud_t);
-	pud_t (*make_pud)(pudval_t pud);
+	struct paravirt_callee_save pud_val;
+	struct paravirt_callee_save make_pud;
 
 	void (*set_pgd)(pgd_t *pudp, pgd_t pgdval);
 #endif	/* PAGETABLE_LEVELS == 4 */
@@ -388,6 +417,8 @@ extern struct pv_lock_ops pv_lock_ops;
 	asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
 
 unsigned paravirt_patch_nop(void);
+unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
+unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len);
 unsigned paravirt_patch_ignore(unsigned len);
 unsigned paravirt_patch_call(void *insnbuf,
 			     const void *target, u16 tgt_clobbers,
@@ -479,25 +510,45 @@ int paravirt_disable_iospace(void);
  * makes sure the incoming and outgoing types are always correct.
  */
 #ifdef CONFIG_X86_32
-#define PVOP_VCALL_ARGS			unsigned long __eax, __edx, __ecx
+#define PVOP_VCALL_ARGS				\
+	unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx
 #define PVOP_CALL_ARGS			PVOP_VCALL_ARGS
+
+#define PVOP_CALL_ARG1(x)		"a" ((unsigned long)(x))
+#define PVOP_CALL_ARG2(x)		"d" ((unsigned long)(x))
+#define PVOP_CALL_ARG3(x)		"c" ((unsigned long)(x))
+
 #define PVOP_VCALL_CLOBBERS		"=a" (__eax), "=d" (__edx),	\
 					"=c" (__ecx)
 #define PVOP_CALL_CLOBBERS		PVOP_VCALL_CLOBBERS
+
+#define PVOP_VCALLEE_CLOBBERS		"=a" (__eax), "=d" (__edx)
+#define PVOP_CALLEE_CLOBBERS		PVOP_VCALLEE_CLOBBERS
+
 #define EXTRA_CLOBBERS
 #define VEXTRA_CLOBBERS
-#else
-#define PVOP_VCALL_ARGS		unsigned long __edi, __esi, __edx, __ecx
+#else  /* CONFIG_X86_64 */
+#define PVOP_VCALL_ARGS					\
+	unsigned long __edi = __edi, __esi = __esi,	\
+		__edx = __edx, __ecx = __ecx
 #define PVOP_CALL_ARGS		PVOP_VCALL_ARGS, __eax
+
+#define PVOP_CALL_ARG1(x)		"D" ((unsigned long)(x))
+#define PVOP_CALL_ARG2(x)		"S" ((unsigned long)(x))
+#define PVOP_CALL_ARG3(x)		"d" ((unsigned long)(x))
+#define PVOP_CALL_ARG4(x)		"c" ((unsigned long)(x))
+
 #define PVOP_VCALL_CLOBBERS	"=D" (__edi),				\
 				"=S" (__esi), "=d" (__edx),		\
 				"=c" (__ecx)
-
 #define PVOP_CALL_CLOBBERS	PVOP_VCALL_CLOBBERS, "=a" (__eax)
 
+#define PVOP_VCALLEE_CLOBBERS	"=a" (__eax)
+#define PVOP_CALLEE_CLOBBERS	PVOP_VCALLEE_CLOBBERS
+
 #define EXTRA_CLOBBERS	 , "r8", "r9", "r10", "r11"
 #define VEXTRA_CLOBBERS	 , "rax", "r8", "r9", "r10", "r11"
-#endif
+#endif	/* CONFIG_X86_32 */
 
 #ifdef CONFIG_PARAVIRT_DEBUG
 #define PVOP_TEST_NULL(op)	BUG_ON(op == NULL)
@@ -505,10 +556,11 @@ int paravirt_disable_iospace(void);
 #define PVOP_TEST_NULL(op)	((void)op)
 #endif
 
-#define __PVOP_CALL(rettype, op, pre, post, ...)			\
+#define ____PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr,		\
+		      pre, post, ...)					\
 	({								\
 		rettype __ret;						\
-		PVOP_CALL_ARGS;					\
+		PVOP_CALL_ARGS;						\
 		PVOP_TEST_NULL(op);					\
 		/* This is 32-bit specific, but is okay in 64-bit */	\
 		/* since this condition will never hold */		\
@@ -516,70 +568,113 @@ int paravirt_disable_iospace(void);
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : PVOP_CALL_CLOBBERS		\
+				     : call_clbr			\
 				     : paravirt_type(op),		\
-				       paravirt_clobber(CLBR_ANY),	\
+				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
-				     : "memory", "cc" EXTRA_CLOBBERS);	\
+				     : "memory", "cc" extra_clbr);	\
 			__ret = (rettype)((((u64)__edx) << 32) | __eax); \
 		} else {						\
 			asm volatile(pre				\
 				     paravirt_alt(PARAVIRT_CALL)	\
 				     post				\
-				     : PVOP_CALL_CLOBBERS		\
+				     : call_clbr			\
 				     : paravirt_type(op),		\
-				       paravirt_clobber(CLBR_ANY),	\
+				       paravirt_clobber(clbr),		\
 				       ##__VA_ARGS__			\
-				     : "memory", "cc" EXTRA_CLOBBERS);	\
+				     : "memory", "cc" extra_clbr);	\
 			__ret = (rettype)__eax;				\
 		}							\
 		__ret;							\
 	})
-#define __PVOP_VCALL(op, pre, post, ...)				\
+
+#define __PVOP_CALL(rettype, op, pre, post, ...)			\
+	____PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS,	\
+		      EXTRA_CLOBBERS, pre, post, ##__VA_ARGS__)
+
+#define __PVOP_CALLEESAVE(rettype, op, pre, post, ...)			\
+	____PVOP_CALL(rettype, op.func, CLBR_RET_REG,			\
+		      PVOP_CALLEE_CLOBBERS, ,				\
+		      pre, post, ##__VA_ARGS__)
+
+
+#define ____PVOP_VCALL(op, clbr, call_clbr, extra_clbr, pre, post, ...)	\
 	({								\
 		PVOP_VCALL_ARGS;					\
 		PVOP_TEST_NULL(op);					\
 		asm volatile(pre					\
 			     paravirt_alt(PARAVIRT_CALL)		\
 			     post					\
-			     : PVOP_VCALL_CLOBBERS			\
+			     : call_clbr				\
 			     : paravirt_type(op),			\
-			       paravirt_clobber(CLBR_ANY),		\
+			       paravirt_clobber(clbr),			\
 			       ##__VA_ARGS__				\
-			     : "memory", "cc" VEXTRA_CLOBBERS);		\
+			     : "memory", "cc" extra_clbr);		\
 	})
 
+#define __PVOP_VCALL(op, pre, post, ...)				\
+	____PVOP_VCALL(op, CLBR_ANY, PVOP_VCALL_CLOBBERS,		\
+		       VEXTRA_CLOBBERS,					\
+		       pre, post, ##__VA_ARGS__)
+
+#define __PVOP_VCALLEESAVE(rettype, op, pre, post, ...)			\
+	____PVOP_CALL(rettype, op.func, CLBR_RET_REG,			\
+		      PVOP_VCALLEE_CLOBBERS, ,				\
+		      pre, post, ##__VA_ARGS__)
+
+
+
 #define PVOP_CALL0(rettype, op)						\
 	__PVOP_CALL(rettype, op, "", "")
 #define PVOP_VCALL0(op)							\
 	__PVOP_VCALL(op, "", "")
 
+#define PVOP_CALLEE0(rettype, op)					\
+	__PVOP_CALLEESAVE(rettype, op, "", "")
+#define PVOP_VCALLEE0(op)						\
+	__PVOP_VCALLEESAVE(op, "", "")
+
+
 #define PVOP_CALL1(rettype, op, arg1)					\
-	__PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)))
+	__PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1))
 #define PVOP_VCALL1(op, arg1)						\
-	__PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)))
+	__PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1))
+
+#define PVOP_CALLEE1(rettype, op, arg1)					\
+	__PVOP_CALLEESAVE(rettype, op, "", "", PVOP_CALL_ARG1(arg1))
+#define PVOP_VCALLEE1(op, arg1)						\
+	__PVOP_VCALLEESAVE(op, "", "", PVOP_CALL_ARG1(arg1))
+
 
 #define PVOP_CALL2(rettype, op, arg1, arg2)				\
-	__PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)), 	\
-	"1" ((unsigned long)(arg2)))
+	__PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1),		\
+		    PVOP_CALL_ARG2(arg2))
 #define PVOP_VCALL2(op, arg1, arg2)					\
-	__PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)), 		\
-	"1" ((unsigned long)(arg2)))
+	__PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1),			\
+		     PVOP_CALL_ARG2(arg2))
+
+#define PVOP_CALLEE2(rettype, op, arg1, arg2)				\
+	__PVOP_CALLEESAVE(rettype, op, "", "", PVOP_CALL_ARG1(arg1),	\
+			  PVOP_CALL_ARG2(arg2))
+#define PVOP_VCALLEE2(op, arg1, arg2)					\
+	__PVOP_VCALLEESAVE(op, "", "", PVOP_CALL_ARG1(arg1),		\
+			   PVOP_CALL_ARG2(arg2))
+
 
 #define PVOP_CALL3(rettype, op, arg1, arg2, arg3)			\
-	__PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)),	\
-	"1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)))
+	__PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1),		\
+		    PVOP_CALL_ARG2(arg2), PVOP_CALL_ARG3(arg3))
 #define PVOP_VCALL3(op, arg1, arg2, arg3)				\
-	__PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)),		\
-	"1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)))
+	__PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1),			\
+		     PVOP_CALL_ARG2(arg2), PVOP_CALL_ARG3(arg3))
 
 /* This is the only difference in x86_64. We can make it much simpler */
 #ifdef CONFIG_X86_32
 #define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4)			\
 	__PVOP_CALL(rettype, op,					\
 		    "push %[_arg4];", "lea 4(%%esp),%%esp;",		\
-		    "0" ((u32)(arg1)), "1" ((u32)(arg2)),		\
-		    "2" ((u32)(arg3)), [_arg4] "mr" ((u32)(arg4)))
+		    PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2),		\
+		    PVOP_CALL_ARG3(arg3), [_arg4] "mr" ((u32)(arg4)))
 #define PVOP_VCALL4(op, arg1, arg2, arg3, arg4)				\
 	__PVOP_VCALL(op,						\
 		    "push %[_arg4];", "lea 4(%%esp),%%esp;",		\
@@ -587,13 +682,13 @@ int paravirt_disable_iospace(void);
 		    "2" ((u32)(arg3)), [_arg4] "mr" ((u32)(arg4)))
 #else
 #define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4)			\
-	__PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)),	\
-	"1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)),		\
-	"3"((unsigned long)(arg4)))
+	__PVOP_CALL(rettype, op, "", "",				\
+		    PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2),		\
+		    PVOP_CALL_ARG3(arg3), PVOP_CALL_ARG4(arg4))
 #define PVOP_VCALL4(op, arg1, arg2, arg3, arg4)				\
-	__PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)),		\
-	"1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)),		\
-	"3"((unsigned long)(arg4)))
+	__PVOP_VCALL(op, "", "",					\
+		     PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2),	\
+		     PVOP_CALL_ARG3(arg3), PVOP_CALL_ARG4(arg4))
 #endif
 
 static inline int paravirt_enabled(void)
@@ -984,10 +1079,11 @@ static inline void __flush_tlb_single(unsigned long addr)
 	PVOP_VCALL1(pv_mmu_ops.flush_tlb_single, addr);
 }
 
-static inline void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
+static inline void flush_tlb_others(const struct cpumask *cpumask,
+				    struct mm_struct *mm,
 				    unsigned long va)
 {
-	PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, &cpumask, mm, va);
+	PVOP_VCALL3(pv_mmu_ops.flush_tlb_others, cpumask, mm, va);
 }
 
 static inline int paravirt_pgd_alloc(struct mm_struct *mm)
@@ -1059,13 +1155,13 @@ static inline pte_t __pte(pteval_t val)
 	pteval_t ret;
 
 	if (sizeof(pteval_t) > sizeof(long))
-		ret = PVOP_CALL2(pteval_t,
-				 pv_mmu_ops.make_pte,
-				 val, (u64)val >> 32);
+		ret = PVOP_CALLEE2(pteval_t,
+				   pv_mmu_ops.make_pte,
+				   val, (u64)val >> 32);
 	else
-		ret = PVOP_CALL1(pteval_t,
-				 pv_mmu_ops.make_pte,
-				 val);
+		ret = PVOP_CALLEE1(pteval_t,
+				   pv_mmu_ops.make_pte,
+				   val);
 
 	return (pte_t) { .pte = ret };
 }
@@ -1075,29 +1171,12 @@ static inline pteval_t pte_val(pte_t pte)
 	pteval_t ret;
 
 	if (sizeof(pteval_t) > sizeof(long))
-		ret = PVOP_CALL2(pteval_t, pv_mmu_ops.pte_val,
-				 pte.pte, (u64)pte.pte >> 32);
-	else
-		ret = PVOP_CALL1(pteval_t, pv_mmu_ops.pte_val,
-				 pte.pte);
-
-	return ret;
-}
-
-static inline pteval_t pte_flags(pte_t pte)
-{
-	pteval_t ret;
-
-	if (sizeof(pteval_t) > sizeof(long))
-		ret = PVOP_CALL2(pteval_t, pv_mmu_ops.pte_flags,
-				 pte.pte, (u64)pte.pte >> 32);
+		ret = PVOP_CALLEE2(pteval_t, pv_mmu_ops.pte_val,
+				   pte.pte, (u64)pte.pte >> 32);
 	else
-		ret = PVOP_CALL1(pteval_t, pv_mmu_ops.pte_flags,
-				 pte.pte);
+		ret = PVOP_CALLEE1(pteval_t, pv_mmu_ops.pte_val,
+				   pte.pte);
 
-#ifdef CONFIG_PARAVIRT_DEBUG
-	BUG_ON(ret & PTE_PFN_MASK);
-#endif
 	return ret;
 }
 
@@ -1106,11 +1185,11 @@ static inline pgd_t __pgd(pgdval_t val)
 	pgdval_t ret;
 
 	if (sizeof(pgdval_t) > sizeof(long))
-		ret = PVOP_CALL2(pgdval_t, pv_mmu_ops.make_pgd,
-				 val, (u64)val >> 32);
+		ret = PVOP_CALLEE2(pgdval_t, pv_mmu_ops.make_pgd,
+				   val, (u64)val >> 32);
 	else
-		ret = PVOP_CALL1(pgdval_t, pv_mmu_ops.make_pgd,
-				 val);
+		ret = PVOP_CALLEE1(pgdval_t, pv_mmu_ops.make_pgd,
+				   val);
 
 	return (pgd_t) { ret };
 }
@@ -1120,11 +1199,11 @@ static inline pgdval_t pgd_val(pgd_t pgd)
 	pgdval_t ret;
 
 	if (sizeof(pgdval_t) > sizeof(long))
-		ret =  PVOP_CALL2(pgdval_t, pv_mmu_ops.pgd_val,
-				  pgd.pgd, (u64)pgd.pgd >> 32);
+		ret =  PVOP_CALLEE2(pgdval_t, pv_mmu_ops.pgd_val,
+				    pgd.pgd, (u64)pgd.pgd >> 32);
 	else
-		ret =  PVOP_CALL1(pgdval_t, pv_mmu_ops.pgd_val,
-				  pgd.pgd);
+		ret =  PVOP_CALLEE1(pgdval_t, pv_mmu_ops.pgd_val,
+				    pgd.pgd);
 
 	return ret;
 }
@@ -1188,11 +1267,11 @@ static inline pmd_t __pmd(pmdval_t val)
 	pmdval_t ret;
 
 	if (sizeof(pmdval_t) > sizeof(long))
-		ret = PVOP_CALL2(pmdval_t, pv_mmu_ops.make_pmd,
-				 val, (u64)val >> 32);
+		ret = PVOP_CALLEE2(pmdval_t, pv_mmu_ops.make_pmd,
+				   val, (u64)val >> 32);
 	else
-		ret = PVOP_CALL1(pmdval_t, pv_mmu_ops.make_pmd,
-				 val);
+		ret = PVOP_CALLEE1(pmdval_t, pv_mmu_ops.make_pmd,
+				   val);
 
 	return (pmd_t) { ret };
 }
@@ -1202,11 +1281,11 @@ static inline pmdval_t pmd_val(pmd_t pmd)
 	pmdval_t ret;
 
 	if (sizeof(pmdval_t) > sizeof(long))
-		ret =  PVOP_CALL2(pmdval_t, pv_mmu_ops.pmd_val,
-				  pmd.pmd, (u64)pmd.pmd >> 32);
+		ret =  PVOP_CALLEE2(pmdval_t, pv_mmu_ops.pmd_val,
+				    pmd.pmd, (u64)pmd.pmd >> 32);
 	else
-		ret =  PVOP_CALL1(pmdval_t, pv_mmu_ops.pmd_val,
-				  pmd.pmd);
+		ret =  PVOP_CALLEE1(pmdval_t, pv_mmu_ops.pmd_val,
+				    pmd.pmd);
 
 	return ret;
 }
@@ -1228,11 +1307,11 @@ static inline pud_t __pud(pudval_t val)
 	pudval_t ret;
 
 	if (sizeof(pudval_t) > sizeof(long))
-		ret = PVOP_CALL2(pudval_t, pv_mmu_ops.make_pud,
-				 val, (u64)val >> 32);
+		ret = PVOP_CALLEE2(pudval_t, pv_mmu_ops.make_pud,
+				   val, (u64)val >> 32);
 	else
-		ret = PVOP_CALL1(pudval_t, pv_mmu_ops.make_pud,
-				 val);
+		ret = PVOP_CALLEE1(pudval_t, pv_mmu_ops.make_pud,
+				   val);
 
 	return (pud_t) { ret };
 }
@@ -1242,11 +1321,11 @@ static inline pudval_t pud_val(pud_t pud)
 	pudval_t ret;
 
 	if (sizeof(pudval_t) > sizeof(long))
-		ret =  PVOP_CALL2(pudval_t, pv_mmu_ops.pud_val,
-				  pud.pud, (u64)pud.pud >> 32);
+		ret =  PVOP_CALLEE2(pudval_t, pv_mmu_ops.pud_val,
+				    pud.pud, (u64)pud.pud >> 32);
 	else
-		ret =  PVOP_CALL1(pudval_t, pv_mmu_ops.pud_val,
-				  pud.pud);
+		ret =  PVOP_CALLEE1(pudval_t, pv_mmu_ops.pud_val,
+				    pud.pud);
 
 	return ret;
 }
@@ -1374,9 +1453,10 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 }
 
 void _paravirt_nop(void);
-#define paravirt_nop	((void *)_paravirt_nop)
+u32 _paravirt_ident_32(u32);
+u64 _paravirt_ident_64(u64);
 
-void paravirt_use_bytelocks(void);
+#define paravirt_nop	((void *)_paravirt_nop)
 
 #ifdef CONFIG_SMP
 
@@ -1426,12 +1506,37 @@ extern struct paravirt_patch_site __parainstructions[],
 	__parainstructions_end[];
 
 #ifdef CONFIG_X86_32
-#define PV_SAVE_REGS "pushl %%ecx; pushl %%edx;"
-#define PV_RESTORE_REGS "popl %%edx; popl %%ecx"
+#define PV_SAVE_REGS "pushl %ecx; pushl %edx;"
+#define PV_RESTORE_REGS "popl %edx; popl %ecx;"
+
+/* save and restore all caller-save registers, except return value */
+#define PV_SAVE_ALL_CALLER_REGS		"pushl %ecx;"
+#define PV_RESTORE_ALL_CALLER_REGS	"popl  %ecx;"
+
 #define PV_FLAGS_ARG "0"
 #define PV_EXTRA_CLOBBERS
 #define PV_VEXTRA_CLOBBERS
 #else
+/* save and restore all caller-save registers, except return value */
+#define PV_SAVE_ALL_CALLER_REGS						\
+	"push %rcx;"							\
+	"push %rdx;"							\
+	"push %rsi;"							\
+	"push %rdi;"							\
+	"push %r8;"							\
+	"push %r9;"							\
+	"push %r10;"							\
+	"push %r11;"
+#define PV_RESTORE_ALL_CALLER_REGS					\
+	"pop %r11;"							\
+	"pop %r10;"							\
+	"pop %r9;"							\
+	"pop %r8;"							\
+	"pop %rdi;"							\
+	"pop %rsi;"							\
+	"pop %rdx;"							\
+	"pop %rcx;"
+
 /* We save some registers, but all of them, that's too much. We clobber all
  * caller saved registers but the argument parameter */
 #define PV_SAVE_REGS "pushq %%rdi;"
@@ -1441,52 +1546,76 @@ extern struct paravirt_patch_site __parainstructions[],
 #define PV_FLAGS_ARG "D"
 #endif
 
+/*
+ * Generate a thunk around a function which saves all caller-save
+ * registers except for the return value.  This allows C functions to
+ * be called from assembler code where fewer than normal registers are
+ * available.  It may also help code generation around calls from C
+ * code if the common case doesn't use many registers.
+ *
+ * When a callee is wrapped in a thunk, the caller can assume that all
+ * arg regs and all scratch registers are preserved across the
+ * call. The return value in rax/eax will not be saved, even for void
+ * functions.
+ */
+#define PV_CALLEE_SAVE_REGS_THUNK(func)					\
+	extern typeof(func) __raw_callee_save_##func;			\
+	static void *__##func##__ __used = func;			\
+									\
+	asm(".pushsection .text;"					\
+	    "__raw_callee_save_" #func ": "				\
+	    PV_SAVE_ALL_CALLER_REGS					\
+	    "call " #func ";"						\
+	    PV_RESTORE_ALL_CALLER_REGS					\
+	    "ret;"							\
+	    ".popsection")
+
+/* Get a reference to a callee-save function */
+#define PV_CALLEE_SAVE(func)						\
+	((struct paravirt_callee_save) { __raw_callee_save_##func })
+
+/* Promise that "func" already uses the right calling convention */
+#define __PV_IS_CALLEE_SAVE(func)			\
+	((struct paravirt_callee_save) { func })
+
 static inline unsigned long __raw_local_save_flags(void)
 {
 	unsigned long f;
 
-	asm volatile(paravirt_alt(PV_SAVE_REGS
-				  PARAVIRT_CALL
-				  PV_RESTORE_REGS)
+	asm volatile(paravirt_alt(PARAVIRT_CALL)
 		     : "=a"(f)
 		     : paravirt_type(pv_irq_ops.save_fl),
 		       paravirt_clobber(CLBR_EAX)
-		     : "memory", "cc" PV_VEXTRA_CLOBBERS);
+		     : "memory", "cc");
 	return f;
 }
 
 static inline void raw_local_irq_restore(unsigned long f)
 {
-	asm volatile(paravirt_alt(PV_SAVE_REGS
-				  PARAVIRT_CALL
-				  PV_RESTORE_REGS)
+	asm volatile(paravirt_alt(PARAVIRT_CALL)
 		     : "=a"(f)
 		     : PV_FLAGS_ARG(f),
 		       paravirt_type(pv_irq_ops.restore_fl),
 		       paravirt_clobber(CLBR_EAX)
-		     : "memory", "cc" PV_EXTRA_CLOBBERS);
+		     : "memory", "cc");
 }
 
 static inline void raw_local_irq_disable(void)
 {
-	asm volatile(paravirt_alt(PV_SAVE_REGS
-				  PARAVIRT_CALL
-				  PV_RESTORE_REGS)
+	asm volatile(paravirt_alt(PARAVIRT_CALL)
 		     :
 		     : paravirt_type(pv_irq_ops.irq_disable),
 		       paravirt_clobber(CLBR_EAX)
-		     : "memory", "eax", "cc" PV_EXTRA_CLOBBERS);
+		     : "memory", "eax", "cc");
 }
 
 static inline void raw_local_irq_enable(void)
 {
-	asm volatile(paravirt_alt(PV_SAVE_REGS
-				  PARAVIRT_CALL
-				  PV_RESTORE_REGS)
+	asm volatile(paravirt_alt(PARAVIRT_CALL)
 		     :
 		     : paravirt_type(pv_irq_ops.irq_enable),
 		       paravirt_clobber(CLBR_EAX)
-		     : "memory", "eax", "cc" PV_EXTRA_CLOBBERS);
+		     : "memory", "eax", "cc");
 }
 
 static inline unsigned long __raw_local_irq_save(void)
@@ -1529,33 +1658,49 @@ static inline unsigned long __raw_local_irq_save(void)
 	.popsection
 
 
+#define COND_PUSH(set, mask, reg)			\
+	.if ((~(set)) & mask); push %reg; .endif
+#define COND_POP(set, mask, reg)			\
+	.if ((~(set)) & mask); pop %reg; .endif
+
 #ifdef CONFIG_X86_64
-#define PV_SAVE_REGS				\
-	push %rax;				\
-	push %rcx;				\
-	push %rdx;				\
-	push %rsi;				\
-	push %rdi;				\
-	push %r8;				\
-	push %r9;				\
-	push %r10;				\
-	push %r11
-#define PV_RESTORE_REGS				\
-	pop %r11;				\
-	pop %r10;				\
-	pop %r9;				\
-	pop %r8;				\
-	pop %rdi;				\
-	pop %rsi;				\
-	pop %rdx;				\
-	pop %rcx;				\
-	pop %rax
+
+#define PV_SAVE_REGS(set)			\
+	COND_PUSH(set, CLBR_RAX, rax);		\
+	COND_PUSH(set, CLBR_RCX, rcx);		\
+	COND_PUSH(set, CLBR_RDX, rdx);		\
+	COND_PUSH(set, CLBR_RSI, rsi);		\
+	COND_PUSH(set, CLBR_RDI, rdi);		\
+	COND_PUSH(set, CLBR_R8, r8);		\
+	COND_PUSH(set, CLBR_R9, r9);		\
+	COND_PUSH(set, CLBR_R10, r10);		\
+	COND_PUSH(set, CLBR_R11, r11)
+#define PV_RESTORE_REGS(set)			\
+	COND_POP(set, CLBR_R11, r11);		\
+	COND_POP(set, CLBR_R10, r10);		\
+	COND_POP(set, CLBR_R9, r9);		\
+	COND_POP(set, CLBR_R8, r8);		\
+	COND_POP(set, CLBR_RDI, rdi);		\
+	COND_POP(set, CLBR_RSI, rsi);		\
+	COND_POP(set, CLBR_RDX, rdx);		\
+	COND_POP(set, CLBR_RCX, rcx);		\
+	COND_POP(set, CLBR_RAX, rax)
+
 #define PARA_PATCH(struct, off)        ((PARAVIRT_PATCH_##struct + (off)) / 8)
 #define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .quad, 8)
 #define PARA_INDIRECT(addr)	*addr(%rip)
 #else
-#define PV_SAVE_REGS   pushl %eax; pushl %edi; pushl %ecx; pushl %edx
-#define PV_RESTORE_REGS popl %edx; popl %ecx; popl %edi; popl %eax
+#define PV_SAVE_REGS(set)			\
+	COND_PUSH(set, CLBR_EAX, eax);		\
+	COND_PUSH(set, CLBR_EDI, edi);		\
+	COND_PUSH(set, CLBR_ECX, ecx);		\
+	COND_PUSH(set, CLBR_EDX, edx)
+#define PV_RESTORE_REGS(set)			\
+	COND_POP(set, CLBR_EDX, edx);		\
+	COND_POP(set, CLBR_ECX, ecx);		\
+	COND_POP(set, CLBR_EDI, edi);		\
+	COND_POP(set, CLBR_EAX, eax)
+
 #define PARA_PATCH(struct, off)        ((PARAVIRT_PATCH_##struct + (off)) / 4)
 #define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .long, 4)
 #define PARA_INDIRECT(addr)	*%cs:addr
@@ -1567,15 +1712,15 @@ static inline unsigned long __raw_local_irq_save(void)
 
 #define DISABLE_INTERRUPTS(clobbers)					\
 	PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \
-		  PV_SAVE_REGS;						\
+		  PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);		\
 		  call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_disable);	\
-		  PV_RESTORE_REGS;)			\
+		  PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
 
 #define ENABLE_INTERRUPTS(clobbers)					\
 	PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_enable), clobbers,	\
-		  PV_SAVE_REGS;						\
+		  PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);		\
 		  call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_enable);	\
-		  PV_RESTORE_REGS;)
+		  PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
 
 #define USERGS_SYSRET32							\
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret32),	\
@@ -1605,11 +1750,15 @@ static inline unsigned long __raw_local_irq_save(void)
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), CLBR_NONE,	\
 		  swapgs)
 
+/*
+ * Note: swapgs is very special, and in practise is either going to be
+ * implemented with a single "swapgs" instruction or something very
+ * special.  Either way, we don't need to save any registers for
+ * it.
+ */
 #define SWAPGS								\
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), CLBR_NONE,	\
-		  PV_SAVE_REGS;						\
-		  call PARA_INDIRECT(pv_cpu_ops+PV_CPU_swapgs);		\
-		  PV_RESTORE_REGS					\
+		  call PARA_INDIRECT(pv_cpu_ops+PV_CPU_swapgs)		\
 		 )
 
 #define GET_CR2_INTO_RCX				\
diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index b8493b3b9890..b0e70056838e 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -5,10 +5,8 @@
 
 #ifdef CONFIG_X86_PAT
 extern int pat_enabled;
-extern void validate_pat_support(struct cpuinfo_x86 *c);
 #else
 static const int pat_enabled;
-static inline void validate_pat_support(struct cpuinfo_x86 *c) { }
 #endif
 
 extern void pat_init(void);
@@ -17,6 +15,7 @@ extern int reserve_memtype(u64 start, u64 end,
 		unsigned long req_type, unsigned long *ret_type);
 extern int free_memtype(u64 start, u64 end);
 
-extern void pat_disable(char *reason);
+extern int kernel_map_sync_memtype(u64 base, unsigned long size,
+		unsigned long flag);
 
 #endif /* _ASM_X86_PAT_H */
diff --git a/arch/x86/include/asm/mach-default/pci-functions.h b/arch/x86/include/asm/pci-functions.h
index ed0bab427354..ed0bab427354 100644
--- a/arch/x86/include/asm/mach-default/pci-functions.h
+++ b/arch/x86/include/asm/pci-functions.h
diff --git a/arch/x86/include/asm/pda.h b/arch/x86/include/asm/pda.h
deleted file mode 100644
index 2fbfff88df37..000000000000
--- a/arch/x86/include/asm/pda.h
+++ /dev/null
@@ -1,137 +0,0 @@
-#ifndef _ASM_X86_PDA_H
-#define _ASM_X86_PDA_H
-
-#ifndef __ASSEMBLY__
-#include <linux/stddef.h>
-#include <linux/types.h>
-#include <linux/cache.h>
-#include <asm/page.h>
-
-/* Per processor datastructure. %gs points to it while the kernel runs */
-struct x8664_pda {
-	struct task_struct *pcurrent;	/* 0  Current process */
-	unsigned long data_offset;	/* 8 Per cpu data offset from linker
-					   address */
-	unsigned long kernelstack;	/* 16 top of kernel stack for current */
-	unsigned long oldrsp;		/* 24 user rsp for system call */
-	int irqcount;			/* 32 Irq nesting counter. Starts -1 */
-	unsigned int cpunumber;		/* 36 Logical CPU number */
-#ifdef CONFIG_CC_STACKPROTECTOR
-	unsigned long stack_canary;	/* 40 stack canary value */
-					/* gcc-ABI: this canary MUST be at
-					   offset 40!!! */
-#endif
-	char *irqstackptr;
-	short nodenumber;		/* number of current node (32k max) */
-	short in_bootmem;		/* pda lives in bootmem */
-	unsigned int __softirq_pending;
-	unsigned int __nmi_count;	/* number of NMI on this CPUs */
-	short mmu_state;
-	short isidle;
-	struct mm_struct *active_mm;
-	unsigned apic_timer_irqs;
-	unsigned irq0_irqs;
-	unsigned irq_resched_count;
-	unsigned irq_call_count;
-	unsigned irq_tlb_count;
-	unsigned irq_thermal_count;
-	unsigned irq_threshold_count;
-	unsigned irq_spurious_count;
-} ____cacheline_aligned_in_smp;
-
-extern struct x8664_pda **_cpu_pda;
-extern void pda_init(int);
-
-#define cpu_pda(i) (_cpu_pda[i])
-
-/*
- * There is no fast way to get the base address of the PDA, all the accesses
- * have to mention %fs/%gs.  So it needs to be done this Torvaldian way.
- */
-extern void __bad_pda_field(void) __attribute__((noreturn));
-
-/*
- * proxy_pda doesn't actually exist, but tell gcc it is accessed for
- * all PDA accesses so it gets read/write dependencies right.
- */
-extern struct x8664_pda _proxy_pda;
-
-#define pda_offset(field) offsetof(struct x8664_pda, field)
-
-#define pda_to_op(op, field, val)					\
-do {									\
-	typedef typeof(_proxy_pda.field) T__;				\
-	if (0) { T__ tmp__; tmp__ = (val); }	/* type checking */	\
-	switch (sizeof(_proxy_pda.field)) {				\
-	case 2:								\
-		asm(op "w %1,%%gs:%c2" :				\
-		    "+m" (_proxy_pda.field) :				\
-		    "ri" ((T__)val),					\
-		    "i"(pda_offset(field)));				\
-		break;							\
-	case 4:								\
-		asm(op "l %1,%%gs:%c2" :				\
-		    "+m" (_proxy_pda.field) :				\
-		    "ri" ((T__)val),					\
-		    "i" (pda_offset(field)));				\
-		break;							\
-	case 8:								\
-		asm(op "q %1,%%gs:%c2":					\
-		    "+m" (_proxy_pda.field) :				\
-		    "ri" ((T__)val),					\
-		    "i"(pda_offset(field)));				\
-		break;							\
-	default:							\
-		__bad_pda_field();					\
-	}								\
-} while (0)
-
-#define pda_from_op(op, field)			\
-({						\
-	typeof(_proxy_pda.field) ret__;		\
-	switch (sizeof(_proxy_pda.field)) {	\
-	case 2:					\
-		asm(op "w %%gs:%c1,%0" :	\
-		    "=r" (ret__) :		\
-		    "i" (pda_offset(field)),	\
-		    "m" (_proxy_pda.field));	\
-		break;				\
-	case 4:					\
-		asm(op "l %%gs:%c1,%0":		\
-		    "=r" (ret__):		\
-		    "i" (pda_offset(field)),	\
-		    "m" (_proxy_pda.field));	\
-		break;				\
-	case 8:					\
-		asm(op "q %%gs:%c1,%0":		\
-		    "=r" (ret__) :		\
-		    "i" (pda_offset(field)),	\
-		    "m" (_proxy_pda.field));	\
-		break;				\
-	default:				\
-		__bad_pda_field();		\
-	}					\
-	ret__;					\
-})
-
-#define read_pda(field)		pda_from_op("mov", field)
-#define write_pda(field, val)	pda_to_op("mov", field, val)
-#define add_pda(field, val)	pda_to_op("add", field, val)
-#define sub_pda(field, val)	pda_to_op("sub", field, val)
-#define or_pda(field, val)	pda_to_op("or", field, val)
-
-/* This is not atomic against other CPUs -- CPU preemption needs to be off */
-#define test_and_clear_bit_pda(bit, field)				\
-({									\
-	int old__;							\
-	asm volatile("btr %2,%%gs:%c3\n\tsbbl %0,%0"			\
-		     : "=r" (old__), "+m" (_proxy_pda.field)		\
-		     : "dIr" (bit), "i" (pda_offset(field)) : "memory");\
-	old__;								\
-})
-
-#endif
-
-#define PDA_STACKOFFSET (5*8)
-
-#endif /* _ASM_X86_PDA_H */
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index ece72053ba63..aee103b26d01 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -2,53 +2,12 @@
 #define _ASM_X86_PERCPU_H
 
 #ifdef CONFIG_X86_64
-#include <linux/compiler.h>
-
-/* Same as asm-generic/percpu.h, except that we store the per cpu offset
-   in the PDA. Longer term the PDA and every per cpu variable
-   should be just put into a single section and referenced directly
-   from %gs */
-
-#ifdef CONFIG_SMP
-#include <asm/pda.h>
-
-#define __per_cpu_offset(cpu) (cpu_pda(cpu)->data_offset)
-#define __my_cpu_offset read_pda(data_offset)
-
-#define per_cpu_offset(x) (__per_cpu_offset(x))
-
+#define __percpu_seg		gs
+#define __percpu_mov_op		movq
+#else
+#define __percpu_seg		fs
+#define __percpu_mov_op		movl
 #endif
-#include <asm-generic/percpu.h>
-
-DECLARE_PER_CPU(struct x8664_pda, pda);
-
-/*
- * These are supposed to be implemented as a single instruction which
- * operates on the per-cpu data base segment.  x86-64 doesn't have
- * that yet, so this is a fairly inefficient workaround for the
- * meantime.  The single instruction is atomic with respect to
- * preemption and interrupts, so we need to explicitly disable
- * interrupts here to achieve the same effect.  However, because it
- * can be used from within interrupt-disable/enable, we can't actually
- * disable interrupts; disabling preemption is enough.
- */
-#define x86_read_percpu(var)						\
-	({								\
-		typeof(per_cpu_var(var)) __tmp;				\
-		preempt_disable();					\
-		__tmp = __get_cpu_var(var);				\
-		preempt_enable();					\
-		__tmp;							\
-	})
-
-#define x86_write_percpu(var, val)					\
-	do {								\
-		preempt_disable();					\
-		__get_cpu_var(var) = (val);				\
-		preempt_enable();					\
-	} while(0)
-
-#else /* CONFIG_X86_64 */
 
 #ifdef __ASSEMBLY__
 
@@ -65,47 +24,48 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
  *    PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-#define PER_CPU(var, reg)				\
-	movl %fs:per_cpu__##this_cpu_off, reg;		\
+#define PER_CPU(var, reg)						\
+	__percpu_mov_op %__percpu_seg:per_cpu__this_cpu_off, reg;	\
 	lea per_cpu__##var(reg), reg
-#define PER_CPU_VAR(var)	%fs:per_cpu__##var
+#define PER_CPU_VAR(var)	%__percpu_seg:per_cpu__##var
 #else /* ! SMP */
-#define PER_CPU(var, reg)			\
-	movl $per_cpu__##var, reg
+#define PER_CPU(var, reg)						\
+	__percpu_mov_op $per_cpu__##var, reg
 #define PER_CPU_VAR(var)	per_cpu__##var
 #endif	/* SMP */
 
+#ifdef CONFIG_X86_64_SMP
+#define INIT_PER_CPU_VAR(var)  init_per_cpu__##var
+#else
+#define INIT_PER_CPU_VAR(var)  per_cpu__##var
+#endif
+
 #else /* ...!ASSEMBLY */
 
+#include <linux/stringify.h>
+
+#ifdef CONFIG_SMP
+#define __percpu_arg(x)		"%%"__stringify(__percpu_seg)":%P" #x
+#define __my_cpu_offset		percpu_read(this_cpu_off)
+#else
+#define __percpu_arg(x)		"%" #x
+#endif
+
 /*
- * PER_CPU finds an address of a per-cpu variable.
+ * Initialized pointers to per-cpu variables needed for the boot
+ * processor need to use these macros to get the proper address
+ * offset from __per_cpu_load on SMP.
  *
- * Args:
- *    var - variable name
- *    cpu - 32bit register containing the current CPU number
- *
- * The resulting address is stored in the "cpu" argument.
- *
- * Example:
- *    PER_CPU(cpu_gdt_descr, %ebx)
+ * There also must be an entry in vmlinux_64.lds.S
  */
-#ifdef CONFIG_SMP
-
-#define __my_cpu_offset x86_read_percpu(this_cpu_off)
-
-/* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
-#define __percpu_seg "%%fs:"
-
-#else  /* !SMP */
-
-#define __percpu_seg ""
-
-#endif	/* SMP */
-
-#include <asm-generic/percpu.h>
+#define DECLARE_INIT_PER_CPU(var) \
+       extern typeof(per_cpu_var(var)) init_per_cpu_var(var)
 
-/* We can use this directly for local CPU (faster). */
-DECLARE_PER_CPU(unsigned long, this_cpu_off);
+#ifdef CONFIG_X86_64_SMP
+#define init_per_cpu_var(var)  init_per_cpu__##var
+#else
+#define init_per_cpu_var(var)  per_cpu_var(var)
+#endif
 
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
@@ -120,20 +80,25 @@ do {							\
 	}						\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b %1,"__percpu_seg"%0"		\
+		asm(op "b %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
 		    : "ri" ((T__)val));			\
 		break;					\
 	case 2:						\
-		asm(op "w %1,"__percpu_seg"%0"		\
+		asm(op "w %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
 		    : "ri" ((T__)val));			\
 		break;					\
 	case 4:						\
-		asm(op "l %1,"__percpu_seg"%0"		\
+		asm(op "l %1,"__percpu_arg(0)		\
 		    : "+m" (var)			\
 		    : "ri" ((T__)val));			\
 		break;					\
+	case 8:						\
+		asm(op "q %1,"__percpu_arg(0)		\
+		    : "+m" (var)			\
+		    : "re" ((T__)val));			\
+		break;					\
 	default: __bad_percpu_size();			\
 	}						\
 } while (0)
@@ -143,17 +108,22 @@ do {							\
 	typeof(var) ret__;				\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b "__percpu_seg"%1,%0"		\
+		asm(op "b "__percpu_arg(1)",%0"		\
 		    : "=r" (ret__)			\
 		    : "m" (var));			\
 		break;					\
 	case 2:						\
-		asm(op "w "__percpu_seg"%1,%0"		\
+		asm(op "w "__percpu_arg(1)",%0"		\
 		    : "=r" (ret__)			\
 		    : "m" (var));			\
 		break;					\
 	case 4:						\
-		asm(op "l "__percpu_seg"%1,%0"		\
+		asm(op "l "__percpu_arg(1)",%0"		\
+		    : "=r" (ret__)			\
+		    : "m" (var));			\
+		break;					\
+	case 8:						\
+		asm(op "q "__percpu_arg(1)",%0"		\
 		    : "=r" (ret__)			\
 		    : "m" (var));			\
 		break;					\
@@ -162,13 +132,30 @@ do {							\
 	ret__;						\
 })
 
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var, val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var, val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var, val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var, val) percpu_to_op("or", per_cpu__##var, val)
+#define percpu_read(var)	percpu_from_op("mov", per_cpu__##var)
+#define percpu_write(var, val)	percpu_to_op("mov", per_cpu__##var, val)
+#define percpu_add(var, val)	percpu_to_op("add", per_cpu__##var, val)
+#define percpu_sub(var, val)	percpu_to_op("sub", per_cpu__##var, val)
+#define percpu_and(var, val)	percpu_to_op("and", per_cpu__##var, val)
+#define percpu_or(var, val)	percpu_to_op("or", per_cpu__##var, val)
+#define percpu_xor(var, val)	percpu_to_op("xor", per_cpu__##var, val)
+
+/* This is not atomic against other CPUs -- CPU preemption needs to be off */
+#define x86_test_and_clear_bit_percpu(bit, var)				\
+({									\
+	int old__;							\
+	asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0"		\
+		     : "=r" (old__), "+m" (per_cpu__##var)		\
+		     : "dIr" (bit));					\
+	old__;								\
+})
+
+#include <asm-generic/percpu.h>
+
+/* We can use this directly for local CPU (faster). */
+DECLARE_PER_CPU(unsigned long, this_cpu_off);
+
 #endif /* !__ASSEMBLY__ */
-#endif /* !CONFIG_X86_64 */
 
 #ifdef CONFIG_SMP
 
@@ -195,9 +182,9 @@ do {							\
 #define	early_per_cpu_ptr(_name) (_name##_early_ptr)
 #define	early_per_cpu_map(_name, _idx) (_name##_early_map[_idx])
 #define	early_per_cpu(_name, _cpu) 				\
-	(early_per_cpu_ptr(_name) ?				\
-		early_per_cpu_ptr(_name)[_cpu] :		\
-		per_cpu(_name, _cpu))
+	*(early_per_cpu_ptr(_name) ?				\
+		&early_per_cpu_ptr(_name)[_cpu] :		\
+		&per_cpu(_name, _cpu))
 
 #else	/* !CONFIG_SMP */
 #define	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)		\
diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index e0d199fe1d83..c1774ac9da7a 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -53,8 +53,6 @@ static inline pte_t native_ptep_get_and_clear(pte_t *xp)
 #define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp)
 #endif
 
-#define pte_none(x)		(!(x).pte_low)
-
 /*
  * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE and _PAGE_BIT_PROTNONE are taken,
  * split up the 29 bits of offset into this range:
diff --git a/arch/x86/include/asm/pgtable-2level-defs.h b/arch/x86/include/asm/pgtable-2level_types.h
index d77db8990eaa..daacc23e3fb9 100644
--- a/arch/x86/include/asm/pgtable-2level-defs.h
+++ b/arch/x86/include/asm/pgtable-2level_types.h
@@ -1,7 +1,23 @@
 #ifndef _ASM_X86_PGTABLE_2LEVEL_DEFS_H
 #define _ASM_X86_PGTABLE_2LEVEL_DEFS_H
 
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+typedef unsigned long	pteval_t;
+typedef unsigned long	pmdval_t;
+typedef unsigned long	pudval_t;
+typedef unsigned long	pgdval_t;
+typedef unsigned long	pgprotval_t;
+
+typedef union {
+	pteval_t pte;
+	pteval_t pte_low;
+} pte_t;
+#endif	/* !__ASSEMBLY__ */
+
 #define SHARED_KERNEL_PMD	0
+#define PAGETABLE_LEVELS	2
 
 /*
  * traditional i386 two-level paging structure:
@@ -10,6 +26,7 @@
 #define PGDIR_SHIFT	22
 #define PTRS_PER_PGD	1024
 
+
 /*
  * the i386 is two-level, so we don't really have any
  * PMD directory physically.
diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index 447da43cddb3..3f13cdf61156 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -18,21 +18,6 @@
 	printk("%s:%d: bad pgd %p(%016Lx).\n",				\
 	       __FILE__, __LINE__, &(e), pgd_val(e))
 
-static inline int pud_none(pud_t pud)
-{
-	return pud_val(pud) == 0;
-}
-
-static inline int pud_bad(pud_t pud)
-{
-	return (pud_val(pud) & ~(PTE_PFN_MASK | _KERNPG_TABLE | _PAGE_USER)) != 0;
-}
-
-static inline int pud_present(pud_t pud)
-{
-	return pud_val(pud) & _PAGE_PRESENT;
-}
-
 /* Rules for using set_pte: the pte being assigned *must* be
  * either not present or in a state where the hardware will
  * not attempt to update the pte.  In places where this is
@@ -120,15 +105,6 @@ static inline void pud_clear(pud_t *pudp)
 		write_cr3(pgd);
 }
 
-#define pud_page(pud) pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
-
-#define pud_page_vaddr(pud) ((unsigned long) __va(pud_val(pud) & PTE_PFN_MASK))
-
-
-/* Find an entry in the second-level page table.. */
-#define pmd_offset(pud, address) ((pmd_t *)pud_page_vaddr(*(pud)) +	\
-				  pmd_index(address))
-
 #ifdef CONFIG_SMP
 static inline pte_t native_ptep_get_and_clear(pte_t *ptep)
 {
@@ -145,17 +121,6 @@ static inline pte_t native_ptep_get_and_clear(pte_t *ptep)
 #define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp)
 #endif
 
-#define __HAVE_ARCH_PTE_SAME
-static inline int pte_same(pte_t a, pte_t b)
-{
-	return a.pte_low == b.pte_low && a.pte_high == b.pte_high;
-}
-
-static inline int pte_none(pte_t pte)
-{
-	return !pte.pte_low && !pte.pte_high;
-}
-
 /*
  * Bits 0, 6 and 7 are taken in the low part of the pte,
  * put the 32 bits of offset into the high part.
diff --git a/arch/x86/include/asm/pgtable-3level-defs.h b/arch/x86/include/asm/pgtable-3level_types.h
index 62561367653c..1bd5876c8649 100644
--- a/arch/x86/include/asm/pgtable-3level-defs.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -1,12 +1,31 @@
 #ifndef _ASM_X86_PGTABLE_3LEVEL_DEFS_H
 #define _ASM_X86_PGTABLE_3LEVEL_DEFS_H
 
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+typedef u64	pteval_t;
+typedef u64	pmdval_t;
+typedef u64	pudval_t;
+typedef u64	pgdval_t;
+typedef u64	pgprotval_t;
+
+typedef union {
+	struct {
+		unsigned long pte_low, pte_high;
+	};
+	pteval_t pte;
+} pte_t;
+#endif	/* !__ASSEMBLY__ */
+
 #ifdef CONFIG_PARAVIRT
 #define SHARED_KERNEL_PMD	(pv_info.shared_kernel_pmd)
 #else
 #define SHARED_KERNEL_PMD	1
 #endif
 
+#define PAGETABLE_LEVELS	3
+
 /*
  * PGDIR_SHIFT determines what a top-level page table entry can map
  */
@@ -25,4 +44,5 @@
  */
 #define PTRS_PER_PTE	512
 
+
 #endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 4f5af8447d54..d0812e155f1d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1,164 +1,9 @@
 #ifndef _ASM_X86_PGTABLE_H
 #define _ASM_X86_PGTABLE_H
 
-#define FIRST_USER_ADDRESS	0
-
-#define _PAGE_BIT_PRESENT	0	/* is present */
-#define _PAGE_BIT_RW		1	/* writeable */
-#define _PAGE_BIT_USER		2	/* userspace addressable */
-#define _PAGE_BIT_PWT		3	/* page write through */
-#define _PAGE_BIT_PCD		4	/* page cache disabled */
-#define _PAGE_BIT_ACCESSED	5	/* was accessed (raised by CPU) */
-#define _PAGE_BIT_DIRTY		6	/* was written to (raised by CPU) */
-#define _PAGE_BIT_PSE		7	/* 4 MB (or 2MB) page */
-#define _PAGE_BIT_PAT		7	/* on 4KB pages */
-#define _PAGE_BIT_GLOBAL	8	/* Global TLB entry PPro+ */
-#define _PAGE_BIT_UNUSED1	9	/* available for programmer */
-#define _PAGE_BIT_IOMAP		10	/* flag used to indicate IO mapping */
-#define _PAGE_BIT_UNUSED3	11
-#define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
-#define _PAGE_BIT_SPECIAL	_PAGE_BIT_UNUSED1
-#define _PAGE_BIT_CPA_TEST	_PAGE_BIT_UNUSED1
-#define _PAGE_BIT_NX           63       /* No execute: only valid after cpuid check */
-
-/* If _PAGE_BIT_PRESENT is clear, we use these: */
-/* - if the user mapped it with PROT_NONE; pte_present gives true */
-#define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
-/* - set: nonlinear file mapping, saved PTE; unset:swap */
-#define _PAGE_BIT_FILE		_PAGE_BIT_DIRTY
-
-#define _PAGE_PRESENT	(_AT(pteval_t, 1) << _PAGE_BIT_PRESENT)
-#define _PAGE_RW	(_AT(pteval_t, 1) << _PAGE_BIT_RW)
-#define _PAGE_USER	(_AT(pteval_t, 1) << _PAGE_BIT_USER)
-#define _PAGE_PWT	(_AT(pteval_t, 1) << _PAGE_BIT_PWT)
-#define _PAGE_PCD	(_AT(pteval_t, 1) << _PAGE_BIT_PCD)
-#define _PAGE_ACCESSED	(_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED)
-#define _PAGE_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_DIRTY)
-#define _PAGE_PSE	(_AT(pteval_t, 1) << _PAGE_BIT_PSE)
-#define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
-#define _PAGE_UNUSED1	(_AT(pteval_t, 1) << _PAGE_BIT_UNUSED1)
-#define _PAGE_IOMAP	(_AT(pteval_t, 1) << _PAGE_BIT_IOMAP)
-#define _PAGE_UNUSED3	(_AT(pteval_t, 1) << _PAGE_BIT_UNUSED3)
-#define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
-#define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
-#define _PAGE_SPECIAL	(_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
-#define _PAGE_CPA_TEST	(_AT(pteval_t, 1) << _PAGE_BIT_CPA_TEST)
-#define __HAVE_ARCH_PTE_SPECIAL
-
-#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
-#define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
-#else
-#define _PAGE_NX	(_AT(pteval_t, 0))
-#endif
+#include <asm/page.h>
 
-#define _PAGE_FILE	(_AT(pteval_t, 1) << _PAGE_BIT_FILE)
-#define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
-
-#define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
-			 _PAGE_ACCESSED | _PAGE_DIRTY)
-#define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
-			 _PAGE_DIRTY)
-
-/* Set of bits not changed in pte_modify */
-#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
-			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
-
-#define _PAGE_CACHE_MASK	(_PAGE_PCD | _PAGE_PWT)
-#define _PAGE_CACHE_WB		(0)
-#define _PAGE_CACHE_WC		(_PAGE_PWT)
-#define _PAGE_CACHE_UC_MINUS	(_PAGE_PCD)
-#define _PAGE_CACHE_UC		(_PAGE_PCD | _PAGE_PWT)
-
-#define PAGE_NONE	__pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
-#define PAGE_SHARED	__pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
-				 _PAGE_ACCESSED | _PAGE_NX)
-
-#define PAGE_SHARED_EXEC	__pgprot(_PAGE_PRESENT | _PAGE_RW |	\
-					 _PAGE_USER | _PAGE_ACCESSED)
-#define PAGE_COPY_NOEXEC	__pgprot(_PAGE_PRESENT | _PAGE_USER |	\
-					 _PAGE_ACCESSED | _PAGE_NX)
-#define PAGE_COPY_EXEC		__pgprot(_PAGE_PRESENT | _PAGE_USER |	\
-					 _PAGE_ACCESSED)
-#define PAGE_COPY		PAGE_COPY_NOEXEC
-#define PAGE_READONLY		__pgprot(_PAGE_PRESENT | _PAGE_USER |	\
-					 _PAGE_ACCESSED | _PAGE_NX)
-#define PAGE_READONLY_EXEC	__pgprot(_PAGE_PRESENT | _PAGE_USER |	\
-					 _PAGE_ACCESSED)
-
-#define __PAGE_KERNEL_EXEC						\
-	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL)
-#define __PAGE_KERNEL		(__PAGE_KERNEL_EXEC | _PAGE_NX)
-
-#define __PAGE_KERNEL_RO		(__PAGE_KERNEL & ~_PAGE_RW)
-#define __PAGE_KERNEL_RX		(__PAGE_KERNEL_EXEC & ~_PAGE_RW)
-#define __PAGE_KERNEL_EXEC_NOCACHE	(__PAGE_KERNEL_EXEC | _PAGE_PCD | _PAGE_PWT)
-#define __PAGE_KERNEL_WC		(__PAGE_KERNEL | _PAGE_CACHE_WC)
-#define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
-#define __PAGE_KERNEL_UC_MINUS		(__PAGE_KERNEL | _PAGE_PCD)
-#define __PAGE_KERNEL_VSYSCALL		(__PAGE_KERNEL_RX | _PAGE_USER)
-#define __PAGE_KERNEL_VSYSCALL_NOCACHE	(__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
-#define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
-#define __PAGE_KERNEL_LARGE_NOCACHE	(__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
-#define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
-
-#define __PAGE_KERNEL_IO		(__PAGE_KERNEL | _PAGE_IOMAP)
-#define __PAGE_KERNEL_IO_NOCACHE	(__PAGE_KERNEL_NOCACHE | _PAGE_IOMAP)
-#define __PAGE_KERNEL_IO_UC_MINUS	(__PAGE_KERNEL_UC_MINUS | _PAGE_IOMAP)
-#define __PAGE_KERNEL_IO_WC		(__PAGE_KERNEL_WC | _PAGE_IOMAP)
-
-#define PAGE_KERNEL			__pgprot(__PAGE_KERNEL)
-#define PAGE_KERNEL_RO			__pgprot(__PAGE_KERNEL_RO)
-#define PAGE_KERNEL_EXEC		__pgprot(__PAGE_KERNEL_EXEC)
-#define PAGE_KERNEL_RX			__pgprot(__PAGE_KERNEL_RX)
-#define PAGE_KERNEL_WC			__pgprot(__PAGE_KERNEL_WC)
-#define PAGE_KERNEL_NOCACHE		__pgprot(__PAGE_KERNEL_NOCACHE)
-#define PAGE_KERNEL_UC_MINUS		__pgprot(__PAGE_KERNEL_UC_MINUS)
-#define PAGE_KERNEL_EXEC_NOCACHE	__pgprot(__PAGE_KERNEL_EXEC_NOCACHE)
-#define PAGE_KERNEL_LARGE		__pgprot(__PAGE_KERNEL_LARGE)
-#define PAGE_KERNEL_LARGE_NOCACHE	__pgprot(__PAGE_KERNEL_LARGE_NOCACHE)
-#define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
-#define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
-#define PAGE_KERNEL_VSYSCALL_NOCACHE	__pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE)
-
-#define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
-#define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
-#define PAGE_KERNEL_IO_UC_MINUS		__pgprot(__PAGE_KERNEL_IO_UC_MINUS)
-#define PAGE_KERNEL_IO_WC		__pgprot(__PAGE_KERNEL_IO_WC)
-
-/*         xwr */
-#define __P000	PAGE_NONE
-#define __P001	PAGE_READONLY
-#define __P010	PAGE_COPY
-#define __P011	PAGE_COPY
-#define __P100	PAGE_READONLY_EXEC
-#define __P101	PAGE_READONLY_EXEC
-#define __P110	PAGE_COPY_EXEC
-#define __P111	PAGE_COPY_EXEC
-
-#define __S000	PAGE_NONE
-#define __S001	PAGE_READONLY
-#define __S010	PAGE_SHARED
-#define __S011	PAGE_SHARED
-#define __S100	PAGE_READONLY_EXEC
-#define __S101	PAGE_READONLY_EXEC
-#define __S110	PAGE_SHARED_EXEC
-#define __S111	PAGE_SHARED_EXEC
-
-/*
- * early identity mapping  pte attrib macros.
- */
-#ifdef CONFIG_X86_64
-#define __PAGE_KERNEL_IDENT_LARGE_EXEC	__PAGE_KERNEL_LARGE_EXEC
-#else
-/*
- * For PDE_IDENT_ATTR include USER bit. As the PDE and PTE protection
- * bits are combined, this will alow user to access the high address mapped
- * VDSO in the presence of CONFIG_COMPAT_VDSO
- */
-#define PTE_IDENT_ATTR	 0x003		/* PRESENT+RW */
-#define PDE_IDENT_ATTR	 0x067		/* PRESENT+RW+USER+DIRTY+ACCESSED */
-#define PGD_IDENT_ATTR	 0x001		/* PRESENT (no other attributes) */
-#endif
+#include <asm/pgtable_types.h>
 
 /*
  * Macro to mark a page protection value as UC-
@@ -170,9 +15,6 @@
 
 #ifndef __ASSEMBLY__
 
-#define pgprot_writecombine	pgprot_writecombine
-extern pgprot_t pgprot_writecombine(pgprot_t prot);
-
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
@@ -183,6 +25,66 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 extern spinlock_t pgd_lock;
 extern struct list_head pgd_list;
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else  /* !CONFIG_PARAVIRT */
+#define set_pte(ptep, pte)		native_set_pte(ptep, pte)
+#define set_pte_at(mm, addr, ptep, pte)	native_set_pte_at(mm, addr, ptep, pte)
+
+#define set_pte_present(mm, addr, ptep, pte)				\
+	native_set_pte_present(mm, addr, ptep, pte)
+#define set_pte_atomic(ptep, pte)					\
+	native_set_pte_atomic(ptep, pte)
+
+#define set_pmd(pmdp, pmd)		native_set_pmd(pmdp, pmd)
+
+#ifndef __PAGETABLE_PUD_FOLDED
+#define set_pgd(pgdp, pgd)		native_set_pgd(pgdp, pgd)
+#define pgd_clear(pgd)			native_pgd_clear(pgd)
+#endif
+
+#ifndef set_pud
+# define set_pud(pudp, pud)		native_set_pud(pudp, pud)
+#endif
+
+#ifndef __PAGETABLE_PMD_FOLDED
+#define pud_clear(pud)			native_pud_clear(pud)
+#endif
+
+#define pte_clear(mm, addr, ptep)	native_pte_clear(mm, addr, ptep)
+#define pmd_clear(pmd)			native_pmd_clear(pmd)
+
+#define pte_update(mm, addr, ptep)              do { } while (0)
+#define pte_update_defer(mm, addr, ptep)        do { } while (0)
+
+static inline void __init paravirt_pagetable_setup_start(pgd_t *base)
+{
+	native_pagetable_setup_start(base);
+}
+
+static inline void __init paravirt_pagetable_setup_done(pgd_t *base)
+{
+	native_pagetable_setup_done(base);
+}
+
+#define pgd_val(x)	native_pgd_val(x)
+#define __pgd(x)	native_make_pgd(x)
+
+#ifndef __PAGETABLE_PUD_FOLDED
+#define pud_val(x)	native_pud_val(x)
+#define __pud(x)	native_make_pud(x)
+#endif
+
+#ifndef __PAGETABLE_PMD_FOLDED
+#define pmd_val(x)	native_pmd_val(x)
+#define __pmd(x)	native_make_pmd(x)
+#endif
+
+#define pte_val(x)	native_pte_val(x)
+#define __pte(x)	native_make_pte(x)
+
+#endif	/* CONFIG_PARAVIRT */
+
 /*
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
@@ -236,72 +138,84 @@ static inline unsigned long pte_pfn(pte_t pte)
 
 static inline int pmd_large(pmd_t pte)
 {
-	return (pmd_val(pte) & (_PAGE_PSE | _PAGE_PRESENT)) ==
+	return (pmd_flags(pte) & (_PAGE_PSE | _PAGE_PRESENT)) ==
 		(_PAGE_PSE | _PAGE_PRESENT);
 }
 
+static inline pte_t pte_set_flags(pte_t pte, pteval_t set)
+{
+	pteval_t v = native_pte_val(pte);
+
+	return native_make_pte(v | set);
+}
+
+static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
+{
+	pteval_t v = native_pte_val(pte);
+
+	return native_make_pte(v & ~clear);
+}
+
 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~_PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_DIRTY);
 }
 
 static inline pte_t pte_mkold(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
+	return pte_clear_flags(pte, _PAGE_ACCESSED);
 }
 
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~_PAGE_RW);
+	return pte_clear_flags(pte, _PAGE_RW);
 }
 
 static inline pte_t pte_mkexec(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~_PAGE_NX);
+	return pte_clear_flags(pte, _PAGE_NX);
 }
 
 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	return __pte(pte_val(pte) | _PAGE_DIRTY);
+	return pte_set_flags(pte, _PAGE_DIRTY);
 }
 
 static inline pte_t pte_mkyoung(pte_t pte)
 {
-	return __pte(pte_val(pte) | _PAGE_ACCESSED);
+	return pte_set_flags(pte, _PAGE_ACCESSED);
 }
 
 static inline pte_t pte_mkwrite(pte_t pte)
 {
-	return __pte(pte_val(pte) | _PAGE_RW);
+	return pte_set_flags(pte, _PAGE_RW);
 }
 
 static inline pte_t pte_mkhuge(pte_t pte)
 {
-	return __pte(pte_val(pte) | _PAGE_PSE);
+	return pte_set_flags(pte, _PAGE_PSE);
 }
 
 static inline pte_t pte_clrhuge(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~_PAGE_PSE);
+	return pte_clear_flags(pte, _PAGE_PSE);
 }
 
 static inline pte_t pte_mkglobal(pte_t pte)
 {
-	return __pte(pte_val(pte) | _PAGE_GLOBAL);
+	return pte_set_flags(pte, _PAGE_GLOBAL);
 }
 
 static inline pte_t pte_clrglobal(pte_t pte)
 {
-	return __pte(pte_val(pte) & ~_PAGE_GLOBAL);
+	return pte_clear_flags(pte, _PAGE_GLOBAL);
 }
 
 static inline pte_t pte_mkspecial(pte_t pte)
 {
-	return __pte(pte_val(pte) | _PAGE_SPECIAL);
+	return pte_set_flags(pte, _PAGE_SPECIAL);
 }
 
-extern pteval_t __supported_pte_mask;
-
 /*
  * Mask out unsupported bits in a present pgprot.  Non-present pgprots
  * can use those bits for other purposes, so leave them be.
@@ -374,82 +288,197 @@ static inline int is_new_memtype_allowed(unsigned long flags,
 	return 1;
 }
 
-#ifndef __ASSEMBLY__
-/* Indicate that x86 has its own track and untrack pfn vma functions */
-#define __HAVE_PFNMAP_TRACKING
-
-#define __HAVE_PHYS_MEM_ACCESS_PROT
-struct file;
-pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
-                              unsigned long size, pgprot_t vma_prot);
-int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
-                              unsigned long size, pgprot_t *vma_prot);
-#endif
-
-/* Install a pte for a particular vaddr in kernel space. */
-void set_pte_vaddr(unsigned long vaddr, pte_t pte);
+pmd_t *populate_extra_pmd(unsigned long vaddr);
+pte_t *populate_extra_pte(unsigned long vaddr);
+#endif	/* __ASSEMBLY__ */
 
 #ifdef CONFIG_X86_32
-extern void native_pagetable_setup_start(pgd_t *base);
-extern void native_pagetable_setup_done(pgd_t *base);
+# include "pgtable_32.h"
 #else
-static inline void native_pagetable_setup_start(pgd_t *base) {}
-static inline void native_pagetable_setup_done(pgd_t *base) {}
+# include "pgtable_64.h"
 #endif
 
-struct seq_file;
-extern void arch_report_meminfo(struct seq_file *m);
+#ifndef __ASSEMBLY__
+#include <linux/mm_types.h>
 
-#ifdef CONFIG_PARAVIRT
-#include <asm/paravirt.h>
-#else  /* !CONFIG_PARAVIRT */
-#define set_pte(ptep, pte)		native_set_pte(ptep, pte)
-#define set_pte_at(mm, addr, ptep, pte)	native_set_pte_at(mm, addr, ptep, pte)
+static inline int pte_none(pte_t pte)
+{
+	return !pte.pte;
+}
 
-#define set_pte_present(mm, addr, ptep, pte)				\
-	native_set_pte_present(mm, addr, ptep, pte)
-#define set_pte_atomic(ptep, pte)					\
-	native_set_pte_atomic(ptep, pte)
+#define __HAVE_ARCH_PTE_SAME
+static inline int pte_same(pte_t a, pte_t b)
+{
+	return a.pte == b.pte;
+}
 
-#define set_pmd(pmdp, pmd)		native_set_pmd(pmdp, pmd)
+static inline int pte_present(pte_t a)
+{
+	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
+}
 
-#ifndef __PAGETABLE_PUD_FOLDED
-#define set_pgd(pgdp, pgd)		native_set_pgd(pgdp, pgd)
-#define pgd_clear(pgd)			native_pgd_clear(pgd)
-#endif
+static inline int pmd_present(pmd_t pmd)
+{
+	return pmd_flags(pmd) & _PAGE_PRESENT;
+}
 
-#ifndef set_pud
-# define set_pud(pudp, pud)		native_set_pud(pudp, pud)
-#endif
+static inline int pmd_none(pmd_t pmd)
+{
+	/* Only check low word on 32-bit platforms, since it might be
+	   out of sync with upper half. */
+	return (unsigned long)native_pmd_val(pmd) == 0;
+}
 
-#ifndef __PAGETABLE_PMD_FOLDED
-#define pud_clear(pud)			native_pud_clear(pud)
-#endif
+static inline unsigned long pmd_page_vaddr(pmd_t pmd)
+{
+	return (unsigned long)__va(pmd_val(pmd) & PTE_PFN_MASK);
+}
 
-#define pte_clear(mm, addr, ptep)	native_pte_clear(mm, addr, ptep)
-#define pmd_clear(pmd)			native_pmd_clear(pmd)
+/*
+ * Currently stuck as a macro due to indirect forward reference to
+ * linux/mmzone.h's __section_mem_map_addr() definition:
+ */
+#define pmd_page(pmd)	pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
 
-#define pte_update(mm, addr, ptep)              do { } while (0)
-#define pte_update_defer(mm, addr, ptep)        do { } while (0)
+/*
+ * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
+ *
+ * this macro returns the index of the entry in the pmd page which would
+ * control the given virtual address
+ */
+static inline unsigned pmd_index(unsigned long address)
+{
+	return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
+}
 
-static inline void __init paravirt_pagetable_setup_start(pgd_t *base)
+/*
+ * Conversion functions: convert a page and protection to a page entry,
+ * and a page entry and page directory to the page they refer to.
+ *
+ * (Currently stuck as a macro because of indirect forward reference
+ * to linux/mm.h:page_to_nid())
+ */
+#define mk_pte(page, pgprot)   pfn_pte(page_to_pfn(page), (pgprot))
+
+/*
+ * the pte page can be thought of an array like this: pte_t[PTRS_PER_PTE]
+ *
+ * this function returns the index of the entry in the pte page which would
+ * control the given virtual address
+ */
+static inline unsigned pte_index(unsigned long address)
 {
-	native_pagetable_setup_start(base);
+	return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
 }
 
-static inline void __init paravirt_pagetable_setup_done(pgd_t *base)
+static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 {
-	native_pagetable_setup_done(base);
+	return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
 }
-#endif	/* CONFIG_PARAVIRT */
 
-#endif	/* __ASSEMBLY__ */
+static inline int pmd_bad(pmd_t pmd)
+{
+	return (pmd_flags(pmd) & ~_PAGE_USER) != _KERNPG_TABLE;
+}
 
-#ifdef CONFIG_X86_32
-# include "pgtable_32.h"
+static inline unsigned long pages_to_mb(unsigned long npg)
+{
+	return npg >> (20 - PAGE_SHIFT);
+}
+
+#define io_remap_pfn_range(vma, vaddr, pfn, size, prot)	\
+	remap_pfn_range(vma, vaddr, pfn, size, prot)
+
+#if PAGETABLE_LEVELS > 2
+static inline int pud_none(pud_t pud)
+{
+	return native_pud_val(pud) == 0;
+}
+
+static inline int pud_present(pud_t pud)
+{
+	return pud_flags(pud) & _PAGE_PRESENT;
+}
+
+static inline unsigned long pud_page_vaddr(pud_t pud)
+{
+	return (unsigned long)__va((unsigned long)pud_val(pud) & PTE_PFN_MASK);
+}
+
+/*
+ * Currently stuck as a macro due to indirect forward reference to
+ * linux/mmzone.h's __section_mem_map_addr() definition:
+ */
+#define pud_page(pud)		pfn_to_page(pud_val(pud) >> PAGE_SHIFT)
+
+/* Find an entry in the second-level page table.. */
+static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
+{
+	return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
+}
+
+static inline unsigned long pmd_pfn(pmd_t pmd)
+{
+	return (pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
+static inline int pud_large(pud_t pud)
+{
+	return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) ==
+		(_PAGE_PSE | _PAGE_PRESENT);
+}
+
+static inline int pud_bad(pud_t pud)
+{
+	return (pud_flags(pud) & ~(_KERNPG_TABLE | _PAGE_USER)) != 0;
+}
 #else
-# include "pgtable_64.h"
-#endif
+static inline int pud_large(pud_t pud)
+{
+	return 0;
+}
+#endif	/* PAGETABLE_LEVELS > 2 */
+
+#if PAGETABLE_LEVELS > 3
+static inline int pgd_present(pgd_t pgd)
+{
+	return pgd_flags(pgd) & _PAGE_PRESENT;
+}
+
+static inline unsigned long pgd_page_vaddr(pgd_t pgd)
+{
+	return (unsigned long)__va((unsigned long)pgd_val(pgd) & PTE_PFN_MASK);
+}
+
+/*
+ * Currently stuck as a macro due to indirect forward reference to
+ * linux/mmzone.h's __section_mem_map_addr() definition:
+ */
+#define pgd_page(pgd)		pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+
+/* to find an entry in a page-table-directory. */
+static inline unsigned pud_index(unsigned long address)
+{
+	return (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
+}
+
+static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address)
+{
+	return (pud_t *)pgd_page_vaddr(*pgd) + pud_index(address);
+}
+
+static inline int pgd_bad(pgd_t pgd)
+{
+	return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE;
+}
+
+static inline int pgd_none(pgd_t pgd)
+{
+	return !native_pgd_val(pgd);
+}
+#endif	/* PAGETABLE_LEVELS > 3 */
+
+#endif	/* __ASSEMBLY__ */
 
 /*
  * the pgd page can be thought of an array like this: pgd_t[PTRS_PER_PGD]
@@ -476,28 +505,6 @@ static inline void __init paravirt_pagetable_setup_done(pgd_t *base)
 
 #ifndef __ASSEMBLY__
 
-enum {
-	PG_LEVEL_NONE,
-	PG_LEVEL_4K,
-	PG_LEVEL_2M,
-	PG_LEVEL_1G,
-	PG_LEVEL_NUM
-};
-
-#ifdef CONFIG_PROC_FS
-extern void update_page_count(int level, unsigned long pages);
-#else
-static inline void update_page_count(int level, unsigned long pages) { }
-#endif
-
-/*
- * Helper function that returns the kernel pagetable entry controlling
- * the virtual address 'address'. NULL means no pagetable entry present.
- * NOTE: the return type is pte_t but if the pmd is PSE then we return it
- * as a pte too.
- */
-extern pte_t *lookup_address(unsigned long address, unsigned int *level);
-
 /* local pte updates need not use xchg for locking */
 static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
 {
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 72b020deb46b..97612fc7632f 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_PGTABLE_32_H
 #define _ASM_X86_PGTABLE_32_H
 
+#include <asm/pgtable_32_types.h>
 
 /*
  * The Linux memory management assumes a three-level page table setup. On
@@ -33,47 +34,6 @@ void paging_init(void);
 
 extern void set_pmd_pfn(unsigned long, unsigned long, pgprot_t);
 
-/*
- * The Linux x86 paging architecture is 'compile-time dual-mode', it
- * implements both the traditional 2-level x86 page tables and the
- * newer 3-level PAE-mode page tables.
- */
-#ifdef CONFIG_X86_PAE
-# include <asm/pgtable-3level-defs.h>
-# define PMD_SIZE	(1UL << PMD_SHIFT)
-# define PMD_MASK	(~(PMD_SIZE - 1))
-#else
-# include <asm/pgtable-2level-defs.h>
-#endif
-
-#define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
-#define PGDIR_MASK	(~(PGDIR_SIZE - 1))
-
-/* Just any arbitrary offset to the start of the vmalloc VM area: the
- * current 8MB value just means that there will be a 8MB "hole" after the
- * physical memory until the kernel virtual memory starts.  That means that
- * any out-of-bounds memory accesses will hopefully be caught.
- * The vmalloc() routines leaves a hole of 4kB between each vmalloced
- * area for the same reason. ;)
- */
-#define VMALLOC_OFFSET	(8 * 1024 * 1024)
-#define VMALLOC_START	((unsigned long)high_memory + VMALLOC_OFFSET)
-#ifdef CONFIG_X86_PAE
-#define LAST_PKMAP 512
-#else
-#define LAST_PKMAP 1024
-#endif
-
-#define PKMAP_BASE ((FIXADDR_BOOT_START - PAGE_SIZE * (LAST_PKMAP + 1))	\
-		    & PMD_MASK)
-
-#ifdef CONFIG_HIGHMEM
-# define VMALLOC_END	(PKMAP_BASE - 2 * PAGE_SIZE)
-#else
-# define VMALLOC_END	(FIXADDR_START - 2 * PAGE_SIZE)
-#endif
-
-#define MAXMEM	(VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE)
 
 /*
  * Define this if things work differently on an i386 and an i486:
@@ -85,55 +45,12 @@ extern void set_pmd_pfn(unsigned long, unsigned long, pgprot_t);
 /* The boot page tables (all created as a single array) */
 extern unsigned long pg0[];
 
-#define pte_present(x)	((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))
-
-/* To avoid harmful races, pmd_none(x) should check only the lower when PAE */
-#define pmd_none(x)	(!(unsigned long)pmd_val((x)))
-#define pmd_present(x)	(pmd_val((x)) & _PAGE_PRESENT)
-#define pmd_bad(x) ((pmd_val(x) & (PTE_FLAGS_MASK & ~_PAGE_USER)) != _KERNPG_TABLE)
-
-#define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT))
-
 #ifdef CONFIG_X86_PAE
 # include <asm/pgtable-3level.h>
 #else
 # include <asm/pgtable-2level.h>
 #endif
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- */
-#define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))
-
-
-static inline int pud_large(pud_t pud) { return 0; }
-
-/*
- * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
- *
- * this macro returns the index of the entry in the pmd page which would
- * control the given virtual address
- */
-#define pmd_index(address)				\
-	(((address) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
-
-/*
- * the pte page can be thought of an array like this: pte_t[PTRS_PER_PTE]
- *
- * this macro returns the index of the entry in the pte page which would
- * control the given virtual address
- */
-#define pte_index(address)					\
-	(((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
-#define pte_offset_kernel(dir, address)				\
-	((pte_t *)pmd_page_vaddr(*(dir)) +  pte_index((address)))
-
-#define pmd_page(pmd) (pfn_to_page(pmd_val((pmd)) >> PAGE_SHIFT))
-
-#define pmd_page_vaddr(pmd)					\
-	((unsigned long)__va(pmd_val((pmd)) & PTE_PFN_MASK))
-
 #if defined(CONFIG_HIGHPTE)
 #define pte_offset_map(dir, address)					\
 	((pte_t *)kmap_atomic_pte(pmd_page(*(dir)), KM_PTE0) +		\
@@ -176,7 +93,4 @@ do {						\
 #define kern_addr_valid(kaddr)	(0)
 #endif
 
-#define io_remap_pfn_range(vma, vaddr, pfn, size, prot)	\
-	remap_pfn_range(vma, vaddr, pfn, size, prot)
-
 #endif /* _ASM_X86_PGTABLE_32_H */
diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h
new file mode 100644
index 000000000000..bd8df3b2fe04
--- /dev/null
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -0,0 +1,46 @@
+#ifndef _ASM_X86_PGTABLE_32_DEFS_H
+#define _ASM_X86_PGTABLE_32_DEFS_H
+
+/*
+ * The Linux x86 paging architecture is 'compile-time dual-mode', it
+ * implements both the traditional 2-level x86 page tables and the
+ * newer 3-level PAE-mode page tables.
+ */
+#ifdef CONFIG_X86_PAE
+# include <asm/pgtable-3level_types.h>
+# define PMD_SIZE	(1UL << PMD_SHIFT)
+# define PMD_MASK	(~(PMD_SIZE - 1))
+#else
+# include <asm/pgtable-2level_types.h>
+#endif
+
+#define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
+#define PGDIR_MASK	(~(PGDIR_SIZE - 1))
+
+/* Just any arbitrary offset to the start of the vmalloc VM area: the
+ * current 8MB value just means that there will be a 8MB "hole" after the
+ * physical memory until the kernel virtual memory starts.  That means that
+ * any out-of-bounds memory accesses will hopefully be caught.
+ * The vmalloc() routines leaves a hole of 4kB between each vmalloced
+ * area for the same reason. ;)
+ */
+#define VMALLOC_OFFSET	(8 * 1024 * 1024)
+#define VMALLOC_START	((unsigned long)high_memory + VMALLOC_OFFSET)
+#ifdef CONFIG_X86_PAE
+#define LAST_PKMAP 512
+#else
+#define LAST_PKMAP 1024
+#endif
+
+#define PKMAP_BASE ((FIXADDR_BOOT_START - PAGE_SIZE * (LAST_PKMAP + 1))	\
+		    & PMD_MASK)
+
+#ifdef CONFIG_HIGHMEM
+# define VMALLOC_END	(PKMAP_BASE - 2 * PAGE_SIZE)
+#else
+# define VMALLOC_END	(FIXADDR_START - 2 * PAGE_SIZE)
+#endif
+
+#define MAXMEM	(VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE)
+
+#endif /* _ASM_X86_PGTABLE_32_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index ba09289accaa..6b87bc6d5018 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -2,6 +2,8 @@
 #define _ASM_X86_PGTABLE_64_H
 
 #include <linux/const.h>
+#include <asm/pgtable_64_types.h>
+
 #ifndef __ASSEMBLY__
 
 /*
@@ -11,7 +13,6 @@
 #include <asm/processor.h>
 #include <linux/bitops.h>
 #include <linux/threads.h>
-#include <asm/pda.h>
 
 extern pud_t level3_kernel_pgt[512];
 extern pud_t level3_ident_pgt[512];
@@ -26,32 +27,6 @@ extern void paging_init(void);
 
 #endif /* !__ASSEMBLY__ */
 
-#define SHARED_KERNEL_PMD	0
-
-/*
- * PGDIR_SHIFT determines what a top-level page table entry can map
- */
-#define PGDIR_SHIFT	39
-#define PTRS_PER_PGD	512
-
-/*
- * 3rd level page
- */
-#define PUD_SHIFT	30
-#define PTRS_PER_PUD	512
-
-/*
- * PMD_SHIFT determines the size of the area a middle-level
- * page table can map
- */
-#define PMD_SHIFT	21
-#define PTRS_PER_PMD	512
-
-/*
- * entries per page directory level
- */
-#define PTRS_PER_PTE	512
-
 #ifndef __ASSEMBLY__
 
 #define pte_ERROR(e)					\
@@ -67,9 +42,6 @@ extern void paging_init(void);
 	printk("%s:%d: bad pgd %p(%016lx).\n",		\
 	       __FILE__, __LINE__, &(e), pgd_val(e))
 
-#define pgd_none(x)	(!pgd_val(x))
-#define pud_none(x)	(!pud_val(x))
-
 struct mm_struct;
 
 void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
@@ -134,48 +106,6 @@ static inline void native_pgd_clear(pgd_t *pgd)
 	native_set_pgd(pgd, native_make_pgd(0));
 }
 
-#define pte_same(a, b)		((a).pte == (b).pte)
-
-#endif /* !__ASSEMBLY__ */
-
-#define PMD_SIZE	(_AC(1, UL) << PMD_SHIFT)
-#define PMD_MASK	(~(PMD_SIZE - 1))
-#define PUD_SIZE	(_AC(1, UL) << PUD_SHIFT)
-#define PUD_MASK	(~(PUD_SIZE - 1))
-#define PGDIR_SIZE	(_AC(1, UL) << PGDIR_SHIFT)
-#define PGDIR_MASK	(~(PGDIR_SIZE - 1))
-
-
-#define MAXMEM		 _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
-#define VMALLOC_START    _AC(0xffffc20000000000, UL)
-#define VMALLOC_END      _AC(0xffffe1ffffffffff, UL)
-#define VMEMMAP_START	 _AC(0xffffe20000000000, UL)
-#define MODULES_VADDR    _AC(0xffffffffa0000000, UL)
-#define MODULES_END      _AC(0xffffffffff000000, UL)
-#define MODULES_LEN   (MODULES_END - MODULES_VADDR)
-
-#ifndef __ASSEMBLY__
-
-static inline int pgd_bad(pgd_t pgd)
-{
-	return (pgd_val(pgd) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE;
-}
-
-static inline int pud_bad(pud_t pud)
-{
-	return (pud_val(pud) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE;
-}
-
-static inline int pmd_bad(pmd_t pmd)
-{
-	return (pmd_val(pmd) & ~(PTE_PFN_MASK | _PAGE_USER)) != _KERNPG_TABLE;
-}
-
-#define pte_none(x)	(!pte_val((x)))
-#define pte_present(x)	(pte_val((x)) & (_PAGE_PRESENT | _PAGE_PROTNONE))
-
-#define pages_to_mb(x)	((x) >> (20 - PAGE_SHIFT))   /* FIXME: is this right? */
-
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -184,41 +114,12 @@ static inline int pmd_bad(pmd_t pmd)
 /*
  * Level 4 access.
  */
-#define pgd_page_vaddr(pgd)						\
-	((unsigned long)__va((unsigned long)pgd_val((pgd)) & PTE_PFN_MASK))
-#define pgd_page(pgd)		(pfn_to_page(pgd_val((pgd)) >> PAGE_SHIFT))
-#define pgd_present(pgd) (pgd_val(pgd) & _PAGE_PRESENT)
 static inline int pgd_large(pgd_t pgd) { return 0; }
 #define mk_kernel_pgd(address) __pgd((address) | _KERNPG_TABLE)
 
 /* PUD - Level3 access */
-/* to find an entry in a page-table-directory. */
-#define pud_page_vaddr(pud)						\
-	((unsigned long)__va(pud_val((pud)) & PHYSICAL_PAGE_MASK))
-#define pud_page(pud)	(pfn_to_page(pud_val((pud)) >> PAGE_SHIFT))
-#define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
-#define pud_offset(pgd, address)					\
-	((pud_t *)pgd_page_vaddr(*(pgd)) + pud_index((address)))
-#define pud_present(pud) (pud_val((pud)) & _PAGE_PRESENT)
-
-static inline int pud_large(pud_t pte)
-{
-	return (pud_val(pte) & (_PAGE_PSE | _PAGE_PRESENT)) ==
-		(_PAGE_PSE | _PAGE_PRESENT);
-}
 
 /* PMD  - Level 2 access */
-#define pmd_page_vaddr(pmd) ((unsigned long) __va(pmd_val((pmd)) & PTE_PFN_MASK))
-#define pmd_page(pmd)		(pfn_to_page(pmd_val((pmd)) >> PAGE_SHIFT))
-
-#define pmd_index(address) (((address) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
-#define pmd_offset(dir, address) ((pmd_t *)pud_page_vaddr(*(dir)) + \
-				  pmd_index(address))
-#define pmd_none(x)	(!pmd_val((x)))
-#define pmd_present(x)	(pmd_val((x)) & _PAGE_PRESENT)
-#define pfn_pmd(nr, prot) (__pmd(((nr) << PAGE_SHIFT) | pgprot_val((prot))))
-#define pmd_pfn(x)  ((pmd_val((x)) & __PHYSICAL_MASK) >> PAGE_SHIFT)
-
 #define pte_to_pgoff(pte) ((pte_val((pte)) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT)
 #define pgoff_to_pte(off) ((pte_t) { .pte = ((off) << PAGE_SHIFT) |	\
 					    _PAGE_FILE })
@@ -226,13 +127,6 @@ static inline int pud_large(pud_t pte)
 
 /* PTE - Level 1 access. */
 
-/* page, protection -> pte */
-#define mk_pte(page, pgprot)	pfn_pte(page_to_pfn((page)), (pgprot))
-
-#define pte_index(address) (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
-#define pte_offset_kernel(dir, address) ((pte_t *) pmd_page_vaddr(*(dir)) + \
-					 pte_index((address)))
-
 /* x86-64 always has all page tables mapped. */
 #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
 #define pte_offset_map_nested(dir, address) pte_offset_kernel((dir), (address))
@@ -266,9 +160,6 @@ extern int direct_gbpages;
 extern int kern_addr_valid(unsigned long addr);
 extern void cleanup_highmap(void);
 
-#define io_remap_pfn_range(vma, vaddr, pfn, size, prot)	\
-	remap_pfn_range(vma, vaddr, pfn, size, prot)
-
 #define HAVE_ARCH_UNMAPPED_AREA
 #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
new file mode 100644
index 000000000000..fbf42b8e0383
--- /dev/null
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -0,0 +1,63 @@
+#ifndef _ASM_X86_PGTABLE_64_DEFS_H
+#define _ASM_X86_PGTABLE_64_DEFS_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+/*
+ * These are used to make use of C type-checking..
+ */
+typedef unsigned long	pteval_t;
+typedef unsigned long	pmdval_t;
+typedef unsigned long	pudval_t;
+typedef unsigned long	pgdval_t;
+typedef unsigned long	pgprotval_t;
+
+typedef struct { pteval_t pte; } pte_t;
+
+#endif	/* !__ASSEMBLY__ */
+
+#define SHARED_KERNEL_PMD	0
+#define PAGETABLE_LEVELS	4
+
+/*
+ * PGDIR_SHIFT determines what a top-level page table entry can map
+ */
+#define PGDIR_SHIFT	39
+#define PTRS_PER_PGD	512
+
+/*
+ * 3rd level page
+ */
+#define PUD_SHIFT	30
+#define PTRS_PER_PUD	512
+
+/*
+ * PMD_SHIFT determines the size of the area a middle-level
+ * page table can map
+ */
+#define PMD_SHIFT	21
+#define PTRS_PER_PMD	512
+
+/*
+ * entries per page directory level
+ */
+#define PTRS_PER_PTE	512
+
+#define PMD_SIZE	(_AC(1, UL) << PMD_SHIFT)
+#define PMD_MASK	(~(PMD_SIZE - 1))
+#define PUD_SIZE	(_AC(1, UL) << PUD_SHIFT)
+#define PUD_MASK	(~(PUD_SIZE - 1))
+#define PGDIR_SIZE	(_AC(1, UL) << PGDIR_SHIFT)
+#define PGDIR_MASK	(~(PGDIR_SIZE - 1))
+
+
+#define MAXMEM		 _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
+#define VMALLOC_START    _AC(0xffffc20000000000, UL)
+#define VMALLOC_END      _AC(0xffffe1ffffffffff, UL)
+#define VMEMMAP_START	 _AC(0xffffe20000000000, UL)
+#define MODULES_VADDR    _AC(0xffffffffa0000000, UL)
+#define MODULES_END      _AC(0xffffffffff000000, UL)
+#define MODULES_LEN   (MODULES_END - MODULES_VADDR)
+
+#endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
new file mode 100644
index 000000000000..4d258ad76a0f
--- /dev/null
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -0,0 +1,328 @@
+#ifndef _ASM_X86_PGTABLE_DEFS_H
+#define _ASM_X86_PGTABLE_DEFS_H
+
+#include <linux/const.h>
+#include <asm/page_types.h>
+
+#define FIRST_USER_ADDRESS	0
+
+#define _PAGE_BIT_PRESENT	0	/* is present */
+#define _PAGE_BIT_RW		1	/* writeable */
+#define _PAGE_BIT_USER		2	/* userspace addressable */
+#define _PAGE_BIT_PWT		3	/* page write through */
+#define _PAGE_BIT_PCD		4	/* page cache disabled */
+#define _PAGE_BIT_ACCESSED	5	/* was accessed (raised by CPU) */
+#define _PAGE_BIT_DIRTY		6	/* was written to (raised by CPU) */
+#define _PAGE_BIT_PSE		7	/* 4 MB (or 2MB) page */
+#define _PAGE_BIT_PAT		7	/* on 4KB pages */
+#define _PAGE_BIT_GLOBAL	8	/* Global TLB entry PPro+ */
+#define _PAGE_BIT_UNUSED1	9	/* available for programmer */
+#define _PAGE_BIT_IOMAP		10	/* flag used to indicate IO mapping */
+#define _PAGE_BIT_UNUSED3	11
+#define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
+#define _PAGE_BIT_SPECIAL	_PAGE_BIT_UNUSED1
+#define _PAGE_BIT_CPA_TEST	_PAGE_BIT_UNUSED1
+#define _PAGE_BIT_NX           63       /* No execute: only valid after cpuid check */
+
+/* If _PAGE_BIT_PRESENT is clear, we use these: */
+/* - if the user mapped it with PROT_NONE; pte_present gives true */
+#define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
+/* - set: nonlinear file mapping, saved PTE; unset:swap */
+#define _PAGE_BIT_FILE		_PAGE_BIT_DIRTY
+
+#define _PAGE_PRESENT	(_AT(pteval_t, 1) << _PAGE_BIT_PRESENT)
+#define _PAGE_RW	(_AT(pteval_t, 1) << _PAGE_BIT_RW)
+#define _PAGE_USER	(_AT(pteval_t, 1) << _PAGE_BIT_USER)
+#define _PAGE_PWT	(_AT(pteval_t, 1) << _PAGE_BIT_PWT)
+#define _PAGE_PCD	(_AT(pteval_t, 1) << _PAGE_BIT_PCD)
+#define _PAGE_ACCESSED	(_AT(pteval_t, 1) << _PAGE_BIT_ACCESSED)
+#define _PAGE_DIRTY	(_AT(pteval_t, 1) << _PAGE_BIT_DIRTY)
+#define _PAGE_PSE	(_AT(pteval_t, 1) << _PAGE_BIT_PSE)
+#define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
+#define _PAGE_UNUSED1	(_AT(pteval_t, 1) << _PAGE_BIT_UNUSED1)
+#define _PAGE_IOMAP	(_AT(pteval_t, 1) << _PAGE_BIT_IOMAP)
+#define _PAGE_UNUSED3	(_AT(pteval_t, 1) << _PAGE_BIT_UNUSED3)
+#define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
+#define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
+#define _PAGE_SPECIAL	(_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
+#define _PAGE_CPA_TEST	(_AT(pteval_t, 1) << _PAGE_BIT_CPA_TEST)
+#define __HAVE_ARCH_PTE_SPECIAL
+
+#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
+#define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
+#else
+#define _PAGE_NX	(_AT(pteval_t, 0))
+#endif
+
+#define _PAGE_FILE	(_AT(pteval_t, 1) << _PAGE_BIT_FILE)
+#define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+
+#define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
+			 _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
+			 _PAGE_DIRTY)
+
+/* Set of bits not changed in pte_modify */
+#define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
+			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
+
+#define _PAGE_CACHE_MASK	(_PAGE_PCD | _PAGE_PWT)
+#define _PAGE_CACHE_WB		(0)
+#define _PAGE_CACHE_WC		(_PAGE_PWT)
+#define _PAGE_CACHE_UC_MINUS	(_PAGE_PCD)
+#define _PAGE_CACHE_UC		(_PAGE_PCD | _PAGE_PWT)
+
+#define PAGE_NONE	__pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
+#define PAGE_SHARED	__pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
+				 _PAGE_ACCESSED | _PAGE_NX)
+
+#define PAGE_SHARED_EXEC	__pgprot(_PAGE_PRESENT | _PAGE_RW |	\
+					 _PAGE_USER | _PAGE_ACCESSED)
+#define PAGE_COPY_NOEXEC	__pgprot(_PAGE_PRESENT | _PAGE_USER |	\
+					 _PAGE_ACCESSED | _PAGE_NX)
+#define PAGE_COPY_EXEC		__pgprot(_PAGE_PRESENT | _PAGE_USER |	\
+					 _PAGE_ACCESSED)
+#define PAGE_COPY		PAGE_COPY_NOEXEC
+#define PAGE_READONLY		__pgprot(_PAGE_PRESENT | _PAGE_USER |	\
+					 _PAGE_ACCESSED | _PAGE_NX)
+#define PAGE_READONLY_EXEC	__pgprot(_PAGE_PRESENT | _PAGE_USER |	\
+					 _PAGE_ACCESSED)
+
+#define __PAGE_KERNEL_EXEC						\
+	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL)
+#define __PAGE_KERNEL		(__PAGE_KERNEL_EXEC | _PAGE_NX)
+
+#define __PAGE_KERNEL_RO		(__PAGE_KERNEL & ~_PAGE_RW)
+#define __PAGE_KERNEL_RX		(__PAGE_KERNEL_EXEC & ~_PAGE_RW)
+#define __PAGE_KERNEL_EXEC_NOCACHE	(__PAGE_KERNEL_EXEC | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_WC		(__PAGE_KERNEL | _PAGE_CACHE_WC)
+#define __PAGE_KERNEL_NOCACHE		(__PAGE_KERNEL | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_UC_MINUS		(__PAGE_KERNEL | _PAGE_PCD)
+#define __PAGE_KERNEL_VSYSCALL		(__PAGE_KERNEL_RX | _PAGE_USER)
+#define __PAGE_KERNEL_VSYSCALL_NOCACHE	(__PAGE_KERNEL_VSYSCALL | _PAGE_PCD | _PAGE_PWT)
+#define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
+#define __PAGE_KERNEL_LARGE_NOCACHE	(__PAGE_KERNEL | _PAGE_CACHE_UC | _PAGE_PSE)
+#define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
+
+#define __PAGE_KERNEL_IO		(__PAGE_KERNEL | _PAGE_IOMAP)
+#define __PAGE_KERNEL_IO_NOCACHE	(__PAGE_KERNEL_NOCACHE | _PAGE_IOMAP)
+#define __PAGE_KERNEL_IO_UC_MINUS	(__PAGE_KERNEL_UC_MINUS | _PAGE_IOMAP)
+#define __PAGE_KERNEL_IO_WC		(__PAGE_KERNEL_WC | _PAGE_IOMAP)
+
+#define PAGE_KERNEL			__pgprot(__PAGE_KERNEL)
+#define PAGE_KERNEL_RO			__pgprot(__PAGE_KERNEL_RO)
+#define PAGE_KERNEL_EXEC		__pgprot(__PAGE_KERNEL_EXEC)
+#define PAGE_KERNEL_RX			__pgprot(__PAGE_KERNEL_RX)
+#define PAGE_KERNEL_WC			__pgprot(__PAGE_KERNEL_WC)
+#define PAGE_KERNEL_NOCACHE		__pgprot(__PAGE_KERNEL_NOCACHE)
+#define PAGE_KERNEL_UC_MINUS		__pgprot(__PAGE_KERNEL_UC_MINUS)
+#define PAGE_KERNEL_EXEC_NOCACHE	__pgprot(__PAGE_KERNEL_EXEC_NOCACHE)
+#define PAGE_KERNEL_LARGE		__pgprot(__PAGE_KERNEL_LARGE)
+#define PAGE_KERNEL_LARGE_NOCACHE	__pgprot(__PAGE_KERNEL_LARGE_NOCACHE)
+#define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
+#define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
+#define PAGE_KERNEL_VSYSCALL_NOCACHE	__pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE)
+
+#define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
+#define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
+#define PAGE_KERNEL_IO_UC_MINUS		__pgprot(__PAGE_KERNEL_IO_UC_MINUS)
+#define PAGE_KERNEL_IO_WC		__pgprot(__PAGE_KERNEL_IO_WC)
+
+/*         xwr */
+#define __P000	PAGE_NONE
+#define __P001	PAGE_READONLY
+#define __P010	PAGE_COPY
+#define __P011	PAGE_COPY
+#define __P100	PAGE_READONLY_EXEC
+#define __P101	PAGE_READONLY_EXEC
+#define __P110	PAGE_COPY_EXEC
+#define __P111	PAGE_COPY_EXEC
+
+#define __S000	PAGE_NONE
+#define __S001	PAGE_READONLY
+#define __S010	PAGE_SHARED
+#define __S011	PAGE_SHARED
+#define __S100	PAGE_READONLY_EXEC
+#define __S101	PAGE_READONLY_EXEC
+#define __S110	PAGE_SHARED_EXEC
+#define __S111	PAGE_SHARED_EXEC
+
+/*
+ * early identity mapping  pte attrib macros.
+ */
+#ifdef CONFIG_X86_64
+#define __PAGE_KERNEL_IDENT_LARGE_EXEC	__PAGE_KERNEL_LARGE_EXEC
+#else
+/*
+ * For PDE_IDENT_ATTR include USER bit. As the PDE and PTE protection
+ * bits are combined, this will alow user to access the high address mapped
+ * VDSO in the presence of CONFIG_COMPAT_VDSO
+ */
+#define PTE_IDENT_ATTR	 0x003		/* PRESENT+RW */
+#define PDE_IDENT_ATTR	 0x067		/* PRESENT+RW+USER+DIRTY+ACCESSED */
+#define PGD_IDENT_ATTR	 0x001		/* PRESENT (no other attributes) */
+#endif
+
+#ifdef CONFIG_X86_32
+# include "pgtable_32_types.h"
+#else
+# include "pgtable_64_types.h"
+#endif
+
+#ifndef __ASSEMBLY__
+
+#include <linux/types.h>
+
+/* PTE_PFN_MASK extracts the PFN from a (pte|pmd|pud|pgd)val_t */
+#define PTE_PFN_MASK		((pteval_t)PHYSICAL_PAGE_MASK)
+
+/* PTE_FLAGS_MASK extracts the flags from a (pte|pmd|pud|pgd)val_t */
+#define PTE_FLAGS_MASK		(~PTE_PFN_MASK)
+
+typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
+
+typedef struct { pgdval_t pgd; } pgd_t;
+
+static inline pgd_t native_make_pgd(pgdval_t val)
+{
+	return (pgd_t) { val };
+}
+
+static inline pgdval_t native_pgd_val(pgd_t pgd)
+{
+	return pgd.pgd;
+}
+
+static inline pgdval_t pgd_flags(pgd_t pgd)
+{
+	return native_pgd_val(pgd) & PTE_FLAGS_MASK;
+}
+
+#if PAGETABLE_LEVELS > 3
+typedef struct { pudval_t pud; } pud_t;
+
+static inline pud_t native_make_pud(pmdval_t val)
+{
+	return (pud_t) { val };
+}
+
+static inline pudval_t native_pud_val(pud_t pud)
+{
+	return pud.pud;
+}
+#else
+#include <asm-generic/pgtable-nopud.h>
+
+static inline pudval_t native_pud_val(pud_t pud)
+{
+	return native_pgd_val(pud.pgd);
+}
+#endif
+
+#if PAGETABLE_LEVELS > 2
+typedef struct { pmdval_t pmd; } pmd_t;
+
+static inline pmd_t native_make_pmd(pmdval_t val)
+{
+	return (pmd_t) { val };
+}
+
+static inline pmdval_t native_pmd_val(pmd_t pmd)
+{
+	return pmd.pmd;
+}
+#else
+#include <asm-generic/pgtable-nopmd.h>
+
+static inline pmdval_t native_pmd_val(pmd_t pmd)
+{
+	return native_pgd_val(pmd.pud.pgd);
+}
+#endif
+
+static inline pudval_t pud_flags(pud_t pud)
+{
+	return native_pud_val(pud) & PTE_FLAGS_MASK;
+}
+
+static inline pmdval_t pmd_flags(pmd_t pmd)
+{
+	return native_pmd_val(pmd) & PTE_FLAGS_MASK;
+}
+
+static inline pte_t native_make_pte(pteval_t val)
+{
+	return (pte_t) { .pte = val };
+}
+
+static inline pteval_t native_pte_val(pte_t pte)
+{
+	return pte.pte;
+}
+
+static inline pteval_t pte_flags(pte_t pte)
+{
+	return native_pte_val(pte) & PTE_FLAGS_MASK;
+}
+
+#define pgprot_val(x)	((x).pgprot)
+#define __pgprot(x)	((pgprot_t) { (x) } )
+
+
+typedef struct page *pgtable_t;
+
+extern pteval_t __supported_pte_mask;
+extern int nx_enabled;
+
+#define pgprot_writecombine	pgprot_writecombine
+extern pgprot_t pgprot_writecombine(pgprot_t prot);
+
+/* Indicate that x86 has its own track and untrack pfn vma functions */
+#define __HAVE_PFNMAP_TRACKING
+
+#define __HAVE_PHYS_MEM_ACCESS_PROT
+struct file;
+pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+                              unsigned long size, pgprot_t vma_prot);
+int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
+                              unsigned long size, pgprot_t *vma_prot);
+
+/* Install a pte for a particular vaddr in kernel space. */
+void set_pte_vaddr(unsigned long vaddr, pte_t pte);
+
+#ifdef CONFIG_X86_32
+extern void native_pagetable_setup_start(pgd_t *base);
+extern void native_pagetable_setup_done(pgd_t *base);
+#else
+static inline void native_pagetable_setup_start(pgd_t *base) {}
+static inline void native_pagetable_setup_done(pgd_t *base) {}
+#endif
+
+struct seq_file;
+extern void arch_report_meminfo(struct seq_file *m);
+
+enum {
+	PG_LEVEL_NONE,
+	PG_LEVEL_4K,
+	PG_LEVEL_2M,
+	PG_LEVEL_1G,
+	PG_LEVEL_NUM
+};
+
+#ifdef CONFIG_PROC_FS
+extern void update_page_count(int level, unsigned long pages);
+#else
+static inline void update_page_count(int level, unsigned long pages) { }
+#endif
+
+/*
+ * Helper function that returns the kernel pagetable entry controlling
+ * the virtual address 'address'. NULL means no pagetable entry present.
+ * NOTE: the return type is pte_t but if the pmd is PSE then we return it
+ * as a pte too.
+ */
+extern pte_t *lookup_address(unsigned long address, unsigned int *level);
+
+#endif	/* !__ASSEMBLY__ */
+
+#endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3bfd5235a9eb..76139506c3e4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -16,6 +16,7 @@ struct mm_struct;
 #include <asm/cpufeature.h>
 #include <asm/system.h>
 #include <asm/page.h>
+#include <asm/pgtable_types.h>
 #include <asm/percpu.h>
 #include <asm/msr.h>
 #include <asm/desc_defs.h>
@@ -73,7 +74,7 @@ struct cpuinfo_x86 {
 	char			pad0;
 #else
 	/* Number of 4K pages in DTLB/ITLB combined(in pages): */
-	int			 x86_tlbsize;
+	int			x86_tlbsize;
 	__u8			x86_virt_bits;
 	__u8			x86_phys_bits;
 #endif
@@ -247,7 +248,6 @@ struct x86_hw_tss {
 #define IO_BITMAP_LONGS			(IO_BITMAP_BYTES/sizeof(long))
 #define IO_BITMAP_OFFSET		offsetof(struct tss_struct, io_bitmap)
 #define INVALID_IO_BITMAP_OFFSET	0x8000
-#define INVALID_IO_BITMAP_OFFSET_LAZY	0x9000
 
 struct tss_struct {
 	/*
@@ -262,11 +262,6 @@ struct tss_struct {
 	 * be within the limit.
 	 */
 	unsigned long		io_bitmap[IO_BITMAP_LONGS + 1];
-	/*
-	 * Cache the current maximum and the last task that used the bitmap:
-	 */
-	unsigned long		io_bitmap_max;
-	struct thread_struct	*io_bitmap_owner;
 
 	/*
 	 * .. and then another 0x100 bytes for the emergency kernel stack:
@@ -378,9 +373,30 @@ union thread_xstate {
 
 #ifdef CONFIG_X86_64
 DECLARE_PER_CPU(struct orig_ist, orig_ist);
+
+union irq_stack_union {
+	char irq_stack[IRQ_STACK_SIZE];
+	/*
+	 * GCC hardcodes the stack canary as %gs:40.  Since the
+	 * irq_stack is the object at %gs:0, we reserve the bottom
+	 * 48 bytes of the irq stack for the canary.
+	 */
+	struct {
+		char gs_base[40];
+		unsigned long stack_canary;
+	};
+};
+
+DECLARE_PER_CPU(union irq_stack_union, irq_stack_union);
+DECLARE_INIT_PER_CPU(irq_stack_union);
+
+DECLARE_PER_CPU(char *, irq_stack_ptr);
+#else	/* X86_64 */
+#ifdef CONFIG_CC_STACKPROTECTOR
+DECLARE_PER_CPU(unsigned long, stack_canary);
 #endif
+#endif	/* X86_64 */
 
-extern void print_cpu_info(struct cpuinfo_x86 *);
 extern unsigned int xstate_size;
 extern void free_thread_xstate(struct task_struct *);
 extern struct kmem_cache *task_xstate_cachep;
@@ -752,9 +768,9 @@ extern int sysenter_setup(void);
 extern struct desc_ptr		early_gdt_descr;
 
 extern void cpu_set_gdt(int);
-extern void switch_to_new_gdt(void);
+extern void switch_to_new_gdt(int);
+extern void load_percpu_segment(int);
 extern void cpu_init(void);
-extern void init_gdt(int cpu);
 
 static inline unsigned long get_debugctlmsr(void)
 {
@@ -839,6 +855,7 @@ static inline void spin_lock_prefetch(const void *x)
  * User space process size: 3GB (default).
  */
 #define TASK_SIZE		PAGE_OFFSET
+#define TASK_SIZE_MAX		TASK_SIZE
 #define STACK_TOP		TASK_SIZE
 #define STACK_TOP_MAX		STACK_TOP
 
@@ -898,7 +915,7 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
 /*
  * User space process size. 47bits minus one guard page.
  */
-#define TASK_SIZE64	((1UL << 47) - PAGE_SIZE)
+#define TASK_SIZE_MAX	((1UL << 47) - PAGE_SIZE)
 
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
@@ -907,12 +924,12 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
 					0xc0000000 : 0xFFFFe000)
 
 #define TASK_SIZE		(test_thread_flag(TIF_IA32) ? \
-					IA32_PAGE_OFFSET : TASK_SIZE64)
+					IA32_PAGE_OFFSET : TASK_SIZE_MAX)
 #define TASK_SIZE_OF(child)	((test_tsk_thread_flag(child, TIF_IA32)) ? \
-					IA32_PAGE_OFFSET : TASK_SIZE64)
+					IA32_PAGE_OFFSET : TASK_SIZE_MAX)
 
 #define STACK_TOP		TASK_SIZE
-#define STACK_TOP_MAX		TASK_SIZE64
+#define STACK_TOP_MAX		TASK_SIZE_MAX
 
 #define INIT_THREAD  { \
 	.sp0 = (unsigned long)&init_stack + sizeof(init_stack) \
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index d6a22f92ba77..49fb3ecf3bb3 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -18,11 +18,7 @@ extern void syscall32_cpu_init(void);
 
 extern void check_efer(void);
 
-#ifdef CONFIG_X86_BIOS_REBOOT
 extern int reboot_force;
-#else
-static const int reboot_force = 0;
-#endif
 
 long do_arch_prctl(struct task_struct *task, int code, unsigned long addr);
 
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 6d34d954c228..e304b66abeea 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -28,7 +28,7 @@ struct pt_regs {
 	int  xds;
 	int  xes;
 	int  xfs;
-	/* int  gs; */
+	int  xgs;
 	long orig_eax;
 	long eip;
 	int  xcs;
@@ -50,7 +50,7 @@ struct pt_regs {
 	unsigned long ds;
 	unsigned long es;
 	unsigned long fs;
-	/* int  gs; */
+	unsigned long gs;
 	unsigned long orig_ax;
 	unsigned long ip;
 	unsigned long cs;
diff --git a/arch/x86/include/asm/mach-rdc321x/rdc321x_defs.h b/arch/x86/include/asm/rdc321x_defs.h
index c8e9c8bed3d0..c8e9c8bed3d0 100644
--- a/arch/x86/include/asm/mach-rdc321x/rdc321x_defs.h
+++ b/arch/x86/include/asm/rdc321x_defs.h
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 1dc1b51ac623..14e0ed86a6f9 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -61,7 +61,7 @@
  *
  *  26 - ESPFIX small SS
  *  27 - per-cpu			[ offset to per-cpu data area ]
- *  28 - unused
+ *  28 - stack_canary-20		[ for stack protector ]
  *  29 - unused
  *  30 - unused
  *  31 - TSS for double fault handler
@@ -95,6 +95,13 @@
 #define __KERNEL_PERCPU 0
 #endif
 
+#define GDT_ENTRY_STACK_CANARY		(GDT_ENTRY_KERNEL_BASE + 16)
+#ifdef CONFIG_CC_STACKPROTECTOR
+#define __KERNEL_STACK_CANARY		(GDT_ENTRY_STACK_CANARY * 8)
+#else
+#define __KERNEL_STACK_CANARY		0
+#endif
+
 #define GDT_ENTRY_DOUBLEFAULT_TSS	31
 
 /*
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index c2308f5250fd..05c6f6b11fd5 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -13,6 +13,7 @@
 struct mpc_cpu;
 struct mpc_bus;
 struct mpc_oemtable;
+
 struct x86_quirks {
 	int (*arch_pre_time_init)(void);
 	int (*arch_time_init)(void);
@@ -28,11 +29,18 @@ struct x86_quirks {
 	void (*mpc_oem_bus_info)(struct mpc_bus *m, char *name);
 	void (*mpc_oem_pci_bus)(struct mpc_bus *m);
 	void (*smp_read_mpc_oem)(struct mpc_oemtable *oemtable,
-                                    unsigned short oemsize);
+				unsigned short oemsize);
 	int (*setup_ioapic_ids)(void);
-	int (*update_genapic)(void);
 };
 
+extern void x86_quirk_pre_intr_init(void);
+extern void x86_quirk_intr_init(void);
+
+extern void x86_quirk_trap_init(void);
+
+extern void x86_quirk_pre_time_init(void);
+extern void x86_quirk_time_init(void);
+
 #endif /* __ASSEMBLY__ */
 
 #ifdef __i386__
@@ -56,7 +64,11 @@ struct x86_quirks {
 #include <asm/bootparam.h>
 
 /* Interrupt control for vSMPowered x86_64 systems */
+#ifdef CONFIG_X86_VSMP
 void vsmp_init(void);
+#else
+static inline void vsmp_init(void) { }
+#endif
 
 void setup_bios_corruption_check(void);
 
@@ -68,8 +80,6 @@ static inline void visws_early_detect(void) { }
 static inline int is_visws_box(void) { return 0; }
 #endif
 
-extern int wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip);
-extern int wakeup_secondary_cpu_via_init(int apicid, unsigned long start_eip);
 extern struct x86_quirks *x86_quirks;
 extern unsigned long saved_video_mode;
 
@@ -99,7 +109,6 @@ extern unsigned long init_pg_tables_start;
 extern unsigned long init_pg_tables_end;
 
 #else
-void __init x86_64_init_pda(void);
 void __init x86_64_start_kernel(char *real_mode);
 void __init x86_64_start_reservations(char *real_mode_data);
 
diff --git a/arch/x86/include/asm/mach-default/setup_arch.h b/arch/x86/include/asm/setup_arch.h
index 38846208b548..38846208b548 100644
--- a/arch/x86/include/asm/mach-default/setup_arch.h
+++ b/arch/x86/include/asm/setup_arch.h
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 19953df61c52..47d0e21f2b9e 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -15,34 +15,8 @@
 #  include <asm/io_apic.h>
 # endif
 #endif
-#include <asm/pda.h>
 #include <asm/thread_info.h>
-
-#ifdef CONFIG_X86_64
-
-extern cpumask_var_t cpu_callin_mask;
-extern cpumask_var_t cpu_callout_mask;
-extern cpumask_var_t cpu_initialized_mask;
-extern cpumask_var_t cpu_sibling_setup_mask;
-
-#else /* CONFIG_X86_32 */
-
-extern cpumask_t cpu_callin_map;
-extern cpumask_t cpu_callout_map;
-extern cpumask_t cpu_initialized;
-extern cpumask_t cpu_sibling_setup_map;
-
-#define cpu_callin_mask		((struct cpumask *)&cpu_callin_map)
-#define cpu_callout_mask	((struct cpumask *)&cpu_callout_map)
-#define cpu_initialized_mask	((struct cpumask *)&cpu_initialized)
-#define cpu_sibling_setup_mask	((struct cpumask *)&cpu_sibling_setup_map)
-
-#endif /* CONFIG_X86_32 */
-
-extern void (*mtrr_hook)(void);
-extern void zap_low_mappings(void);
-
-extern int __cpuinit get_local_pda(int cpu);
+#include <asm/cpumask.h>
 
 extern int smp_num_siblings;
 extern unsigned int num_processors;
@@ -50,9 +24,7 @@ extern unsigned int num_processors;
 DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
 DECLARE_PER_CPU(cpumask_t, cpu_core_map);
 DECLARE_PER_CPU(u16, cpu_llc_id);
-#ifdef CONFIG_X86_32
 DECLARE_PER_CPU(int, cpu_number);
-#endif
 
 static inline struct cpumask *cpu_sibling_mask(int cpu)
 {
@@ -167,8 +139,6 @@ void play_dead_common(void);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
 
-extern void prefill_possible_map(void);
-
 void smp_store_cpu_info(int id);
 #define cpu_physical_id(cpu)	per_cpu(x86_cpu_to_apicid, cpu)
 
@@ -177,10 +147,6 @@ static inline int num_booting_cpus(void)
 {
 	return cpumask_weight(cpu_callout_mask);
 }
-#else
-static inline void prefill_possible_map(void)
-{
-}
 #endif /* CONFIG_SMP */
 
 extern unsigned disabled_cpus __cpuinitdata;
@@ -191,11 +157,11 @@ extern unsigned disabled_cpus __cpuinitdata;
  * from the initial startup. We map APIC_BASE very early in page_setup(),
  * so this is correct in the x86 case.
  */
-#define raw_smp_processor_id() (x86_read_percpu(cpu_number))
+#define raw_smp_processor_id() (percpu_read(cpu_number))
 extern int safe_smp_processor_id(void);
 
 #elif defined(CONFIG_X86_64_SMP)
-#define raw_smp_processor_id()	read_pda(cpunumber)
+#define raw_smp_processor_id() (percpu_read(cpu_number))
 
 #define stack_smp_processor_id()					\
 ({								\
@@ -205,10 +171,6 @@ extern int safe_smp_processor_id(void);
 })
 #define safe_smp_processor_id()		smp_processor_id()
 
-#else /* !CONFIG_X86_32_SMP && !CONFIG_X86_64_SMP */
-#define cpu_physical_id(cpu)		boot_cpu_physical_apicid
-#define safe_smp_processor_id()		0
-#define stack_smp_processor_id() 	0
 #endif
 
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -220,28 +182,9 @@ static inline int logical_smp_processor_id(void)
 	return GET_APIC_LOGICAL_ID(*(u32 *)(APIC_BASE + APIC_LDR));
 }
 
-#include <mach_apicdef.h>
-static inline unsigned int read_apic_id(void)
-{
-	unsigned int reg;
-
-	reg = *(u32 *)(APIC_BASE + APIC_ID);
-
-	return GET_APIC_ID(reg);
-}
 #endif
 
-
-# if defined(APIC_DEFINITION) || defined(CONFIG_X86_64)
 extern int hard_smp_processor_id(void);
-# else
-#include <mach_apicdef.h>
-static inline int hard_smp_processor_id(void)
-{
-	/* we don't want to mark this access volatile - bad code generation */
-	return read_apic_id();
-}
-# endif /* APIC_DEFINITION */
 
 #else /* CONFIG_X86_LOCAL_APIC */
 
@@ -251,11 +194,5 @@ static inline int hard_smp_processor_id(void)
 
 #endif /* CONFIG_X86_LOCAL_APIC */
 
-#ifdef CONFIG_X86_HAS_BOOT_CPU_ID
-extern unsigned char boot_cpu_id;
-#else
-#define boot_cpu_id	0
-#endif
-
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_SMP_H */
diff --git a/arch/x86/include/asm/mach-default/smpboot_hooks.h b/arch/x86/include/asm/smpboot_hooks.h
index 23bf52103b89..1def60114906 100644
--- a/arch/x86/include/asm/mach-default/smpboot_hooks.h
+++ b/arch/x86/include/asm/smpboot_hooks.h
@@ -13,10 +13,10 @@ static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
 	CMOS_WRITE(0xa, 0xf);
 	local_flush_tlb();
 	pr_debug("1.\n");
-	*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) =
+	*((volatile unsigned short *)phys_to_virt(apic->trampoline_phys_high)) =
 								 start_eip >> 4;
 	pr_debug("2.\n");
-	*((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) =
+	*((volatile unsigned short *)phys_to_virt(apic->trampoline_phys_low)) =
 							 start_eip & 0xf;
 	pr_debug("3.\n");
 }
@@ -34,7 +34,7 @@ static inline void smpboot_restore_warm_reset_vector(void)
 	 */
 	CMOS_WRITE(0, 0xf);
 
-	*((volatile long *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0;
+	*((volatile long *)phys_to_virt(apic->trampoline_phys_low)) = 0;
 }
 
 static inline void __init smpboot_setup_io_apic(void)
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 8247e94ac6b1..3a5696656680 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -172,70 +172,8 @@ static inline int __ticket_spin_is_contended(raw_spinlock_t *lock)
 	return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
 }
 
-#ifdef CONFIG_PARAVIRT
-/*
- * Define virtualization-friendly old-style lock byte lock, for use in
- * pv_lock_ops if desired.
- *
- * This differs from the pre-2.6.24 spinlock by always using xchgb
- * rather than decb to take the lock; this allows it to use a
- * zero-initialized lock structure.  It also maintains a 1-byte
- * contention counter, so that we can implement
- * __byte_spin_is_contended.
- */
-struct __byte_spinlock {
-	s8 lock;
-	s8 spinners;
-};
-
-static inline int __byte_spin_is_locked(raw_spinlock_t *lock)
-{
-	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
-	return bl->lock != 0;
-}
-
-static inline int __byte_spin_is_contended(raw_spinlock_t *lock)
-{
-	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
-	return bl->spinners != 0;
-}
-
-static inline void __byte_spin_lock(raw_spinlock_t *lock)
-{
-	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
-	s8 val = 1;
-
-	asm("1: xchgb %1, %0\n"
-	    "   test %1,%1\n"
-	    "   jz 3f\n"
-	    "   " LOCK_PREFIX "incb %2\n"
-	    "2: rep;nop\n"
-	    "   cmpb $1, %0\n"
-	    "   je 2b\n"
-	    "   " LOCK_PREFIX "decb %2\n"
-	    "   jmp 1b\n"
-	    "3:"
-	    : "+m" (bl->lock), "+q" (val), "+m" (bl->spinners): : "memory");
-}
-
-static inline int __byte_spin_trylock(raw_spinlock_t *lock)
-{
-	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
-	u8 old = 1;
-
-	asm("xchgb %1,%0"
-	    : "+m" (bl->lock), "+q" (old) : : "memory");
+#ifndef CONFIG_PARAVIRT
 
-	return old == 0;
-}
-
-static inline void __byte_spin_unlock(raw_spinlock_t *lock)
-{
-	struct __byte_spinlock *bl = (struct __byte_spinlock *)lock;
-	smp_wmb();
-	bl->lock = 0;
-}
-#else  /* !CONFIG_PARAVIRT */
 static inline int __raw_spin_is_locked(raw_spinlock_t *lock)
 {
 	return __ticket_spin_is_locked(lock);
@@ -268,7 +206,7 @@ static __always_inline void __raw_spin_lock_flags(raw_spinlock_t *lock,
 	__raw_spin_lock(lock);
 }
 
-#endif	/* CONFIG_PARAVIRT */
+#endif
 
 static inline void __raw_spin_unlock_wait(raw_spinlock_t *lock)
 {
@@ -330,8 +268,7 @@ static inline int __raw_read_trylock(raw_rwlock_t *lock)
 {
 	atomic_t *count = (atomic_t *)lock;
 
-	atomic_dec(count);
-	if (atomic_read(count) >= 0)
+	if (atomic_dec_return(count) >= 0)
 		return 1;
 	atomic_inc(count);
 	return 0;
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
new file mode 100644
index 000000000000..c2d742c6e15f
--- /dev/null
+++ b/arch/x86/include/asm/stackprotector.h
@@ -0,0 +1,124 @@
+/*
+ * GCC stack protector support.
+ *
+ * Stack protector works by putting predefined pattern at the start of
+ * the stack frame and verifying that it hasn't been overwritten when
+ * returning from the function.  The pattern is called stack canary
+ * and unfortunately gcc requires it to be at a fixed offset from %gs.
+ * On x86_64, the offset is 40 bytes and on x86_32 20 bytes.  x86_64
+ * and x86_32 use segment registers differently and thus handles this
+ * requirement differently.
+ *
+ * On x86_64, %gs is shared by percpu area and stack canary.  All
+ * percpu symbols are zero based and %gs points to the base of percpu
+ * area.  The first occupant of the percpu area is always
+ * irq_stack_union which contains stack_canary at offset 40.  Userland
+ * %gs is always saved and restored on kernel entry and exit using
+ * swapgs, so stack protector doesn't add any complexity there.
+ *
+ * On x86_32, it's slightly more complicated.  As in x86_64, %gs is
+ * used for userland TLS.  Unfortunately, some processors are much
+ * slower at loading segment registers with different value when
+ * entering and leaving the kernel, so the kernel uses %fs for percpu
+ * area and manages %gs lazily so that %gs is switched only when
+ * necessary, usually during task switch.
+ *
+ * As gcc requires the stack canary at %gs:20, %gs can't be managed
+ * lazily if stack protector is enabled, so the kernel saves and
+ * restores userland %gs on kernel entry and exit.  This behavior is
+ * controlled by CONFIG_X86_32_LAZY_GS and accessors are defined in
+ * system.h to hide the details.
+ */
+
+#ifndef _ASM_STACKPROTECTOR_H
+#define _ASM_STACKPROTECTOR_H 1
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+
+#include <asm/tsc.h>
+#include <asm/processor.h>
+#include <asm/percpu.h>
+#include <asm/system.h>
+#include <asm/desc.h>
+#include <linux/random.h>
+
+/*
+ * 24 byte read-only segment initializer for stack canary.  Linker
+ * can't handle the address bit shifting.  Address will be set in
+ * head_32 for boot CPU and setup_per_cpu_areas() for others.
+ */
+#define GDT_STACK_CANARY_INIT						\
+	[GDT_ENTRY_STACK_CANARY] = { { { 0x00000018, 0x00409000 } } },
+
+/*
+ * Initialize the stackprotector canary value.
+ *
+ * NOTE: this must only be called from functions that never return,
+ * and it must always be inlined.
+ */
+static __always_inline void boot_init_stack_canary(void)
+{
+	u64 canary;
+	u64 tsc;
+
+#ifdef CONFIG_X86_64
+	BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40);
+#endif
+	/*
+	 * We both use the random pool and the current TSC as a source
+	 * of randomness. The TSC only matters for very early init,
+	 * there it already has some randomness on most systems. Later
+	 * on during the bootup the random pool has true entropy too.
+	 */
+	get_random_bytes(&canary, sizeof(canary));
+	tsc = __native_read_tsc();
+	canary += tsc + (tsc << 32UL);
+
+	current->stack_canary = canary;
+#ifdef CONFIG_X86_64
+	percpu_write(irq_stack_union.stack_canary, canary);
+#else
+	percpu_write(stack_canary, canary);
+#endif
+}
+
+static inline void setup_stack_canary_segment(int cpu)
+{
+#ifdef CONFIG_X86_32
+	unsigned long canary = (unsigned long)&per_cpu(stack_canary, cpu) - 20;
+	struct desc_struct *gdt_table = get_cpu_gdt_table(cpu);
+	struct desc_struct desc;
+
+	desc = gdt_table[GDT_ENTRY_STACK_CANARY];
+	desc.base0 = canary & 0xffff;
+	desc.base1 = (canary >> 16) & 0xff;
+	desc.base2 = (canary >> 24) & 0xff;
+	write_gdt_entry(gdt_table, GDT_ENTRY_STACK_CANARY, &desc, DESCTYPE_S);
+#endif
+}
+
+static inline void load_stack_canary_segment(void)
+{
+#ifdef CONFIG_X86_32
+	asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
+#endif
+}
+
+#else	/* CC_STACKPROTECTOR */
+
+#define GDT_STACK_CANARY_INIT
+
+/* dummy boot_init_stack_canary() is defined in linux/stackprotector.h */
+
+static inline void setup_stack_canary_segment(int cpu)
+{ }
+
+static inline void load_stack_canary_segment(void)
+{
+#ifdef CONFIG_X86_32
+	asm volatile ("mov %0, %%gs" : : "r" (0));
+#endif
+}
+
+#endif	/* CC_STACKPROTECTOR */
+#endif	/* _ASM_STACKPROTECTOR_H */
diff --git a/arch/x86/include/asm/summit/apic.h b/arch/x86/include/asm/summit/apic.h
deleted file mode 100644
index 93d2c8667cfe..000000000000
--- a/arch/x86/include/asm/summit/apic.h
+++ /dev/null
@@ -1,202 +0,0 @@
-#ifndef __ASM_SUMMIT_APIC_H
-#define __ASM_SUMMIT_APIC_H
-
-#include <asm/smp.h>
-#include <linux/gfp.h>
-
-#define esr_disable (1)
-#define NO_BALANCE_IRQ (0)
-
-/* In clustered mode, the high nibble of APIC ID is a cluster number.
- * The low nibble is a 4-bit bitmap. */
-#define XAPIC_DEST_CPUS_SHIFT	4
-#define XAPIC_DEST_CPUS_MASK	((1u << XAPIC_DEST_CPUS_SHIFT) - 1)
-#define XAPIC_DEST_CLUSTER_MASK	(XAPIC_DEST_CPUS_MASK << XAPIC_DEST_CPUS_SHIFT)
-
-#define APIC_DFR_VALUE	(APIC_DFR_CLUSTER)
-
-static inline const cpumask_t *target_cpus(void)
-{
-	/* CPU_MASK_ALL (0xff) has undefined behaviour with
-	 * dest_LowestPrio mode logical clustered apic interrupt routing
-	 * Just start on cpu 0.  IRQ balancing will spread load
-	 */
-	return &cpumask_of_cpu(0);
-}
-
-#define INT_DELIVERY_MODE (dest_LowestPrio)
-#define INT_DEST_MODE 1     /* logical delivery broadcast to all procs */
-
-static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid)
-{
-	return 0;
-}
-
-/* we don't use the phys_cpu_present_map to indicate apicid presence */
-static inline unsigned long check_apicid_present(int bit)
-{
-	return 1;
-}
-
-#define apicid_cluster(apicid) ((apicid) & XAPIC_DEST_CLUSTER_MASK)
-
-extern u8 cpu_2_logical_apicid[];
-
-static inline void init_apic_ldr(void)
-{
-	unsigned long val, id;
-	int count = 0;
-	u8 my_id = (u8)hard_smp_processor_id();
-	u8 my_cluster = (u8)apicid_cluster(my_id);
-#ifdef CONFIG_SMP
-	u8 lid;
-	int i;
-
-	/* Create logical APIC IDs by counting CPUs already in cluster. */
-	for (count = 0, i = nr_cpu_ids; --i >= 0; ) {
-		lid = cpu_2_logical_apicid[i];
-		if (lid != BAD_APICID && apicid_cluster(lid) == my_cluster)
-			++count;
-	}
-#endif
-	/* We only have a 4 wide bitmap in cluster mode.  If a deranged
-	 * BIOS puts 5 CPUs in one APIC cluster, we're hosed. */
-	BUG_ON(count >= XAPIC_DEST_CPUS_SHIFT);
-	id = my_cluster | (1UL << count);
-	apic_write(APIC_DFR, APIC_DFR_VALUE);
-	val = apic_read(APIC_LDR) & ~APIC_LDR_MASK;
-	val |= SET_APIC_LOGICAL_ID(id);
-	apic_write(APIC_LDR, val);
-}
-
-static inline int multi_timer_check(int apic, int irq)
-{
-	return 0;
-}
-
-static inline int apic_id_registered(void)
-{
-	return 1;
-}
-
-static inline void setup_apic_routing(void)
-{
-	printk("Enabling APIC mode:  Summit.  Using %d I/O APICs\n",
-						nr_ioapics);
-}
-
-static inline int apicid_to_node(int logical_apicid)
-{
-#ifdef CONFIG_SMP
-	return apicid_2_node[hard_smp_processor_id()];
-#else
-	return 0;
-#endif
-}
-
-/* Mapping from cpu number to logical apicid */
-static inline int cpu_to_logical_apicid(int cpu)
-{
-#ifdef CONFIG_SMP
-	if (cpu >= nr_cpu_ids)
-		return BAD_APICID;
-	return (int)cpu_2_logical_apicid[cpu];
-#else
-	return logical_smp_processor_id();
-#endif
-}
-
-static inline int cpu_present_to_apicid(int mps_cpu)
-{
-	if (mps_cpu < nr_cpu_ids)
-		return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu);
-	else
-		return BAD_APICID;
-}
-
-static inline physid_mask_t ioapic_phys_id_map(physid_mask_t phys_id_map)
-{
-	/* For clustered we don't have a good way to do this yet - hack */
-	return physids_promote(0x0F);
-}
-
-static inline physid_mask_t apicid_to_cpu_present(int apicid)
-{
-	return physid_mask_of_physid(0);
-}
-
-static inline void setup_portio_remap(void)
-{
-}
-
-static inline int check_phys_apicid_present(int boot_cpu_physical_apicid)
-{
-	return 1;
-}
-
-static inline void enable_apic_mode(void)
-{
-}
-
-static inline unsigned int cpu_mask_to_apicid(const cpumask_t *cpumask)
-{
-	int num_bits_set;
-	int cpus_found = 0;
-	int cpu;
-	int apicid;
-
-	num_bits_set = cpus_weight(*cpumask);
-	/* Return id to all */
-	if (num_bits_set >= nr_cpu_ids)
-		return (int) 0xFF;
-	/*
-	 * The cpus in the mask must all be on the apic cluster.  If are not
-	 * on the same apicid cluster return default value of TARGET_CPUS.
-	 */
-	cpu = first_cpu(*cpumask);
-	apicid = cpu_to_logical_apicid(cpu);
-	while (cpus_found < num_bits_set) {
-		if (cpu_isset(cpu, *cpumask)) {
-			int new_apicid = cpu_to_logical_apicid(cpu);
-			if (apicid_cluster(apicid) !=
-					apicid_cluster(new_apicid)){
-				printk ("%s: Not a valid mask!\n", __func__);
-				return 0xFF;
-			}
-			apicid = apicid | new_apicid;
-			cpus_found++;
-		}
-		cpu++;
-	}
-	return apicid;
-}
-
-static inline unsigned int cpu_mask_to_apicid_and(const struct cpumask *inmask,
-						  const struct cpumask *andmask)
-{
-	int apicid = cpu_to_logical_apicid(0);
-	cpumask_var_t cpumask;
-
-	if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
-		return apicid;
-
-	cpumask_and(cpumask, inmask, andmask);
-	cpumask_and(cpumask, cpumask, cpu_online_mask);
-	apicid = cpu_mask_to_apicid(cpumask);
-
-	free_cpumask_var(cpumask);
-	return apicid;
-}
-
-/* cpuid returns the value latched in the HW at reset, not the APIC ID
- * register's value.  For any box whose BIOS changes APIC IDs, like
- * clustered APIC systems, we must use hard_smp_processor_id.
- *
- * See Intel's IA-32 SW Dev's Manual Vol2 under CPUID.
- */
-static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
-{
-	return hard_smp_processor_id() >> index_msb;
-}
-
-#endif /* __ASM_SUMMIT_APIC_H */
diff --git a/arch/x86/include/asm/summit/apicdef.h b/arch/x86/include/asm/summit/apicdef.h
deleted file mode 100644
index f3fbca1f61c1..000000000000
--- a/arch/x86/include/asm/summit/apicdef.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#ifndef __ASM_SUMMIT_APICDEF_H
-#define __ASM_SUMMIT_APICDEF_H
-
-#define		APIC_ID_MASK		(0xFF<<24)
-
-static inline unsigned get_apic_id(unsigned long x)
-{
-	return (x>>24)&0xFF;
-}
-
-#define		GET_APIC_ID(x)	get_apic_id(x)
-
-#endif
diff --git a/arch/x86/include/asm/summit/ipi.h b/arch/x86/include/asm/summit/ipi.h
deleted file mode 100644
index a8a2c24f50cc..000000000000
--- a/arch/x86/include/asm/summit/ipi.h
+++ /dev/null
@@ -1,26 +0,0 @@
-#ifndef __ASM_SUMMIT_IPI_H
-#define __ASM_SUMMIT_IPI_H
-
-void send_IPI_mask_sequence(const cpumask_t *mask, int vector);
-void send_IPI_mask_allbutself(const cpumask_t *mask, int vector);
-
-static inline void send_IPI_mask(const cpumask_t *mask, int vector)
-{
-	send_IPI_mask_sequence(mask, vector);
-}
-
-static inline void send_IPI_allbutself(int vector)
-{
-	cpumask_t mask = cpu_online_map;
-	cpu_clear(smp_processor_id(), mask);
-
-	if (!cpus_empty(mask))
-		send_IPI_mask(&mask, vector);
-}
-
-static inline void send_IPI_all(int vector)
-{
-	send_IPI_mask(&cpu_online_map, vector);
-}
-
-#endif /* __ASM_SUMMIT_IPI_H */
diff --git a/arch/x86/include/asm/summit/mpparse.h b/arch/x86/include/asm/summit/mpparse.h
deleted file mode 100644
index 380e86c02363..000000000000
--- a/arch/x86/include/asm/summit/mpparse.h
+++ /dev/null
@@ -1,109 +0,0 @@
-#ifndef __ASM_SUMMIT_MPPARSE_H
-#define __ASM_SUMMIT_MPPARSE_H
-
-#include <asm/tsc.h>
-
-extern int use_cyclone;
-
-#ifdef CONFIG_X86_SUMMIT_NUMA
-extern void setup_summit(void);
-#else
-#define setup_summit()	{}
-#endif
-
-static inline int mps_oem_check(struct mpc_table *mpc, char *oem,
-		char *productid)
-{
-	if (!strncmp(oem, "IBM ENSW", 8) &&
-			(!strncmp(productid, "VIGIL SMP", 9)
-			 || !strncmp(productid, "EXA", 3)
-			 || !strncmp(productid, "RUTHLESS SMP", 12))){
-		mark_tsc_unstable("Summit based system");
-		use_cyclone = 1; /*enable cyclone-timer*/
-		setup_summit();
-		return 1;
-	}
-	return 0;
-}
-
-/* Hook from generic ACPI tables.c */
-static inline int acpi_madt_oem_check(char *oem_id, char *oem_table_id)
-{
-	if (!strncmp(oem_id, "IBM", 3) &&
-	    (!strncmp(oem_table_id, "SERVIGIL", 8)
-	     || !strncmp(oem_table_id, "EXA", 3))){
-		mark_tsc_unstable("Summit based system");
-		use_cyclone = 1; /*enable cyclone-timer*/
-		setup_summit();
-		return 1;
-	}
-	return 0;
-}
-
-struct rio_table_hdr {
-	unsigned char version;      /* Version number of this data structure           */
-	                            /* Version 3 adds chassis_num & WP_index           */
-	unsigned char num_scal_dev; /* # of Scalability devices (Twisters for Vigil)   */
-	unsigned char num_rio_dev;  /* # of RIO I/O devices (Cyclones and Winnipegs)   */
-} __attribute__((packed));
-
-struct scal_detail {
-	unsigned char node_id;      /* Scalability Node ID                             */
-	unsigned long CBAR;         /* Address of 1MB register space                   */
-	unsigned char port0node;    /* Node ID port connected to: 0xFF=None            */
-	unsigned char port0port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
-	unsigned char port1node;    /* Node ID port connected to: 0xFF = None          */
-	unsigned char port1port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
-	unsigned char port2node;    /* Node ID port connected to: 0xFF = None          */
-	unsigned char port2port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
-	unsigned char chassis_num;  /* 1 based Chassis number (1 = boot node)          */
-} __attribute__((packed));
-
-struct rio_detail {
-	unsigned char node_id;      /* RIO Node ID                                     */
-	unsigned long BBAR;         /* Address of 1MB register space                   */
-	unsigned char type;         /* Type of device                                  */
-	unsigned char owner_id;     /* For WPEG: Node ID of Cyclone that owns this WPEG*/
-	                            /* For CYC:  Node ID of Twister that owns this CYC */
-	unsigned char port0node;    /* Node ID port connected to: 0xFF=None            */
-	unsigned char port0port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
-	unsigned char port1node;    /* Node ID port connected to: 0xFF=None            */
-	unsigned char port1port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
-	unsigned char first_slot;   /* For WPEG: Lowest slot number below this WPEG    */
-	                            /* For CYC:  0                                     */
-	unsigned char status;       /* For WPEG: Bit 0 = 1 : the XAPIC is used         */
-	                            /*                 = 0 : the XAPIC is not used, ie:*/
-	                            /*                     ints fwded to another XAPIC */
-	                            /*           Bits1:7 Reserved                      */
-	                            /* For CYC:  Bits0:7 Reserved                      */
-	unsigned char WP_index;     /* For WPEG: WPEG instance index - lower ones have */
-	                            /*           lower slot numbers/PCI bus numbers    */
-	                            /* For CYC:  No meaning                            */
-	unsigned char chassis_num;  /* 1 based Chassis number                          */
-	                            /* For LookOut WPEGs this field indicates the      */
-	                            /* Expansion Chassis #, enumerated from Boot       */
-	                            /* Node WPEG external port, then Boot Node CYC     */
-	                            /* external port, then Next Vigil chassis WPEG     */
-	                            /* external port, etc.                             */
-	                            /* Shared Lookouts have only 1 chassis number (the */
-	                            /* first one assigned)                             */
-} __attribute__((packed));
-
-
-typedef enum {
-	CompatTwister = 0,  /* Compatibility Twister               */
-	AltTwister    = 1,  /* Alternate Twister of internal 8-way */
-	CompatCyclone = 2,  /* Compatibility Cyclone               */
-	AltCyclone    = 3,  /* Alternate Cyclone of internal 8-way */
-	CompatWPEG    = 4,  /* Compatibility WPEG                  */
-	AltWPEG       = 5,  /* Second Planar WPEG                  */
-	LookOutAWPEG  = 6,  /* LookOut WPEG                        */
-	LookOutBWPEG  = 7,  /* LookOut WPEG                        */
-} node_type;
-
-static inline int is_WPEG(struct rio_detail *rio){
-	return (rio->type == CompatWPEG || rio->type == AltWPEG ||
-		rio->type == LookOutAWPEG || rio->type == LookOutBWPEG);
-}
-
-#endif /* __ASM_SUMMIT_MPPARSE_H */
diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index e26d34b0bc79..7043408f6904 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -29,21 +29,21 @@ asmlinkage int sys_get_thread_area(struct user_desc __user *);
 /* X86_32 only */
 #ifdef CONFIG_X86_32
 /* kernel/process_32.c */
-asmlinkage int sys_fork(struct pt_regs);
-asmlinkage int sys_clone(struct pt_regs);
-asmlinkage int sys_vfork(struct pt_regs);
-asmlinkage int sys_execve(struct pt_regs);
+int sys_fork(struct pt_regs *);
+int sys_clone(struct pt_regs *);
+int sys_vfork(struct pt_regs *);
+int sys_execve(struct pt_regs *);
 
 /* kernel/signal_32.c */
 asmlinkage int sys_sigsuspend(int, int, old_sigset_t);
 asmlinkage int sys_sigaction(int, const struct old_sigaction __user *,
 			     struct old_sigaction __user *);
-asmlinkage int sys_sigaltstack(unsigned long);
-asmlinkage unsigned long sys_sigreturn(unsigned long);
-asmlinkage int sys_rt_sigreturn(unsigned long);
+int sys_sigaltstack(struct pt_regs *);
+unsigned long sys_sigreturn(struct pt_regs *);
+long sys_rt_sigreturn(struct pt_regs *);
 
 /* kernel/ioport.c */
-asmlinkage long sys_iopl(unsigned long);
+long sys_iopl(struct pt_regs *);
 
 /* kernel/sys_i386_32.c */
 asmlinkage long sys_mmap2(unsigned long, unsigned long, unsigned long,
@@ -59,8 +59,8 @@ struct oldold_utsname;
 asmlinkage int sys_olduname(struct oldold_utsname __user *);
 
 /* kernel/vm86_32.c */
-asmlinkage int sys_vm86old(struct pt_regs);
-asmlinkage int sys_vm86(struct pt_regs);
+int sys_vm86old(struct pt_regs *);
+int sys_vm86(struct pt_regs *);
 
 #else /* CONFIG_X86_32 */
 
@@ -82,7 +82,7 @@ asmlinkage long sys_iopl(unsigned int, struct pt_regs *);
 /* kernel/signal_64.c */
 asmlinkage long sys_sigaltstack(const stack_t __user *, stack_t __user *,
 				struct pt_regs *);
-asmlinkage long sys_rt_sigreturn(struct pt_regs *);
+long sys_rt_sigreturn(struct pt_regs *);
 
 /* kernel/sys_x86_64.c */
 asmlinkage long sys_mmap(unsigned long, unsigned long, unsigned long,
diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index 8e626ea33a1a..643c59b4bc6e 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -20,9 +20,26 @@
 struct task_struct; /* one of the stranger aspects of C forward declarations */
 struct task_struct *__switch_to(struct task_struct *prev,
 				struct task_struct *next);
+struct tss_struct;
+void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
+		      struct tss_struct *tss);
 
 #ifdef CONFIG_X86_32
 
+#ifdef CONFIG_CC_STACKPROTECTOR
+#define __switch_canary							\
+	"movl %P[task_canary](%[next]), %%ebx\n\t"			\
+	"movl %%ebx, "__percpu_arg([stack_canary])"\n\t"
+#define __switch_canary_oparam						\
+	, [stack_canary] "=m" (per_cpu_var(stack_canary))
+#define __switch_canary_iparam						\
+	, [task_canary] "i" (offsetof(struct task_struct, stack_canary))
+#else	/* CC_STACKPROTECTOR */
+#define __switch_canary
+#define __switch_canary_oparam
+#define __switch_canary_iparam
+#endif	/* CC_STACKPROTECTOR */
+
 /*
  * Saving eflags is important. It switches not only IOPL between tasks,
  * it also protects other tasks from NT leaking through sysenter etc.
@@ -44,6 +61,7 @@ do {									\
 		     "movl %[next_sp],%%esp\n\t"	/* restore ESP   */ \
 		     "movl $1f,%[prev_ip]\n\t"	/* save    EIP   */	\
 		     "pushl %[next_ip]\n\t"	/* restore EIP   */	\
+		     __switch_canary					\
 		     "jmp __switch_to\n"	/* regparm call  */	\
 		     "1:\t"						\
 		     "popl %%ebp\n\t"		/* restore EBP   */	\
@@ -58,6 +76,8 @@ do {									\
 		       "=b" (ebx), "=c" (ecx), "=d" (edx),		\
 		       "=S" (esi), "=D" (edi)				\
 		       							\
+		       __switch_canary_oparam				\
+									\
 		       /* input parameters: */				\
 		     : [next_sp]  "m" (next->thread.sp),		\
 		       [next_ip]  "m" (next->thread.ip),		\
@@ -66,6 +86,8 @@ do {									\
 		       [prev]     "a" (prev),				\
 		       [next]     "d" (next)				\
 									\
+		       __switch_canary_iparam				\
+									\
 		     : /* reloaded segment registers */			\
 			"memory");					\
 } while (0)
@@ -86,27 +108,44 @@ do {									\
 	, "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
 	  "r12", "r13", "r14", "r15"
 
+#ifdef CONFIG_CC_STACKPROTECTOR
+#define __switch_canary							  \
+	"movq %P[task_canary](%%rsi),%%r8\n\t"				  \
+	"movq %%r8,"__percpu_arg([gs_canary])"\n\t"
+#define __switch_canary_oparam						  \
+	, [gs_canary] "=m" (per_cpu_var(irq_stack_union.stack_canary))
+#define __switch_canary_iparam						  \
+	, [task_canary] "i" (offsetof(struct task_struct, stack_canary))
+#else	/* CC_STACKPROTECTOR */
+#define __switch_canary
+#define __switch_canary_oparam
+#define __switch_canary_iparam
+#endif	/* CC_STACKPROTECTOR */
+
 /* Save restore flags to clear handle leaking NT */
 #define switch_to(prev, next, last) \
-	asm volatile(SAVE_CONTEXT						    \
+	asm volatile(SAVE_CONTEXT					  \
 	     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */	  \
 	     "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */	  \
 	     "call __switch_to\n\t"					  \
 	     ".globl thread_return\n"					  \
 	     "thread_return:\n\t"					  \
-	     "movq %%gs:%P[pda_pcurrent],%%rsi\n\t"			  \
+	     "movq "__percpu_arg([current_task])",%%rsi\n\t"		  \
+	     __switch_canary						  \
 	     "movq %P[thread_info](%%rsi),%%r8\n\t"			  \
-	     LOCK_PREFIX "btr  %[tif_fork],%P[ti_flags](%%r8)\n\t"	  \
 	     "movq %%rax,%%rdi\n\t" 					  \
-	     "jc   ret_from_fork\n\t"					  \
+	     "testl  %[_tif_fork],%P[ti_flags](%%r8)\n\t"	  \
+	     "jnz   ret_from_fork\n\t"					  \
 	     RESTORE_CONTEXT						  \
 	     : "=a" (last)					  	  \
+	       __switch_canary_oparam					  \
 	     : [next] "S" (next), [prev] "D" (prev),			  \
 	       [threadrsp] "i" (offsetof(struct task_struct, thread.sp)), \
 	       [ti_flags] "i" (offsetof(struct thread_info, flags)),	  \
-	       [tif_fork] "i" (TIF_FORK),			  	  \
+	       [_tif_fork] "i" (_TIF_FORK),			  	  \
 	       [thread_info] "i" (offsetof(struct task_struct, stack)),   \
-	       [pda_pcurrent] "i" (offsetof(struct x8664_pda, pcurrent))  \
+	       [current_task] "m" (per_cpu_var(current_task))		  \
+	       __switch_canary_iparam					  \
 	     : "memory", "cc" __EXTRA_CLOBBER)
 #endif
 
@@ -165,6 +204,25 @@ extern void native_load_gs_index(unsigned);
 #define savesegment(seg, value)				\
 	asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
 
+/*
+ * x86_32 user gs accessors.
+ */
+#ifdef CONFIG_X86_32
+#ifdef CONFIG_X86_32_LAZY_GS
+#define get_user_gs(regs)	(u16)({unsigned long v; savesegment(gs, v); v;})
+#define set_user_gs(regs, v)	loadsegment(gs, (unsigned long)(v))
+#define task_user_gs(tsk)	((tsk)->thread.gs)
+#define lazy_save_gs(v)		savesegment(gs, (v))
+#define lazy_load_gs(v)		loadsegment(gs, (v))
+#else	/* X86_32_LAZY_GS */
+#define get_user_gs(regs)	(u16)((regs)->gs)
+#define set_user_gs(regs, v)	do { (regs)->gs = (v); } while (0)
+#define task_user_gs(tsk)	(task_pt_regs(tsk)->gs)
+#define lazy_save_gs(v)		do { } while (0)
+#define lazy_load_gs(v)		do { } while (0)
+#endif	/* X86_32_LAZY_GS */
+#endif	/* X86_32 */
+
 static inline unsigned long get_limit(unsigned long segment)
 {
 	unsigned long __limit;
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 98789647baa9..df9d5f78385e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -40,6 +40,7 @@ struct thread_info {
 						*/
 	__u8			supervisor_stack[0];
 #endif
+	int			uaccess_err;
 };
 
 #define INIT_THREAD_INFO(tsk)			\
@@ -194,25 +195,21 @@ static inline struct thread_info *current_thread_info(void)
 
 #else /* X86_32 */
 
-#include <asm/pda.h>
+#include <asm/percpu.h>
+#define KERNEL_STACK_OFFSET (5*8)
 
 /*
  * macros/functions for gaining access to the thread information structure
  * preempt_count needs to be 1 initially, until the scheduler is functional.
  */
 #ifndef __ASSEMBLY__
-static inline struct thread_info *current_thread_info(void)
-{
-	struct thread_info *ti;
-	ti = (void *)(read_pda(kernelstack) + PDA_STACKOFFSET - THREAD_SIZE);
-	return ti;
-}
+DECLARE_PER_CPU(unsigned long, kernel_stack);
 
-/* do not use in interrupt context */
-static inline struct thread_info *stack_thread_info(void)
+static inline struct thread_info *current_thread_info(void)
 {
 	struct thread_info *ti;
-	asm("andq %%rsp,%0; " : "=r" (ti) : "0" (~(THREAD_SIZE - 1)));
+	ti = (void *)(percpu_read(kernel_stack) +
+		      KERNEL_STACK_OFFSET - THREAD_SIZE);
 	return ti;
 }
 
@@ -220,8 +217,8 @@ static inline struct thread_info *stack_thread_info(void)
 
 /* how to get the thread information struct from ASM */
 #define GET_THREAD_INFO(reg) \
-	movq %gs:pda_kernelstack,reg ; \
-	subq $(THREAD_SIZE-PDA_STACKOFFSET),reg
+	movq PER_CPU_VAR(kernel_stack),reg ; \
+	subq $(THREAD_SIZE-KERNEL_STACK_OFFSET),reg
 
 #endif
 
diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index 4f5c24724856..a81195eaa2b3 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -3,6 +3,7 @@
 #include <linux/init.h>
 #include <linux/pm.h>
 #include <linux/percpu.h>
+#include <linux/interrupt.h>
 
 #define TICK_SIZE (tick_nsec / 1000)
 
@@ -11,8 +12,9 @@ unsigned long native_calibrate_tsc(void);
 
 #ifdef CONFIG_X86_32
 extern int timer_ack;
-#endif
 extern int recalibrate_cpu_khz(void);
+extern irqreturn_t timer_interrupt(int irq, void *dev_id);
+#endif /* CONFIG_X86_32 */
 
 extern int no_timer_check;
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 0e7bbb549116..d3539f998f88 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -113,7 +113,7 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 		__flush_tlb();
 }
 
-static inline void native_flush_tlb_others(const cpumask_t *cpumask,
+static inline void native_flush_tlb_others(const struct cpumask *cpumask,
 					   struct mm_struct *mm,
 					   unsigned long va)
 {
@@ -142,31 +142,28 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 	flush_tlb_mm(vma->vm_mm);
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumask, struct mm_struct *mm,
-			     unsigned long va);
+void native_flush_tlb_others(const struct cpumask *cpumask,
+			     struct mm_struct *mm, unsigned long va);
 
 #define TLBSTATE_OK	1
 #define TLBSTATE_LAZY	2
 
-#ifdef CONFIG_X86_32
 struct tlb_state {
 	struct mm_struct *active_mm;
 	int state;
-	char __cacheline_padding[L1_CACHE_BYTES-8];
 };
 DECLARE_PER_CPU(struct tlb_state, cpu_tlbstate);
 
-void reset_lazy_tlbstate(void);
-#else
 static inline void reset_lazy_tlbstate(void)
 {
+	percpu_write(cpu_tlbstate.state, 0);
+	percpu_write(cpu_tlbstate.active_mm, &init_mm);
 }
-#endif
 
 #endif	/* SMP */
 
 #ifndef CONFIG_PARAVIRT
-#define flush_tlb_others(mask, mm, va)	native_flush_tlb_others(&mask, mm, va)
+#define flush_tlb_others(mask, mm, va)	native_flush_tlb_others(mask, mm, va)
 #endif
 
 static inline void flush_tlb_kernel_range(unsigned long start,
@@ -175,4 +172,6 @@ static inline void flush_tlb_kernel_range(unsigned long start,
 	flush_tlb_all();
 }
 
+extern void zap_low_mappings(void);
+
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 4e2f2e0aab27..77cfb2cfb386 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -74,6 +74,8 @@ static inline const struct cpumask *cpumask_of_node(int node)
 	return &node_to_cpumask_map[node];
 }
 
+static inline void setup_node_to_cpumask_map(void) { }
+
 #else /* CONFIG_X86_64 */
 
 /* Mappings between node number and cpus on that node. */
@@ -83,7 +85,8 @@ extern cpumask_t *node_to_cpumask_map;
 DECLARE_EARLY_PER_CPU(int, x86_cpu_to_node_map);
 
 /* Returns the number of the current Node. */
-#define numa_node_id()		read_pda(nodenumber)
+DECLARE_PER_CPU(int, node_number);
+#define numa_node_id()		percpu_read(node_number)
 
 #ifdef CONFIG_DEBUG_PER_CPU_MAPS
 extern int cpu_to_node(int cpu);
@@ -102,10 +105,7 @@ static inline int cpu_to_node(int cpu)
 /* Same function but used if called before per_cpu areas are setup */
 static inline int early_cpu_to_node(int cpu)
 {
-	if (early_per_cpu_ptr(x86_cpu_to_node_map))
-		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
-
-	return per_cpu(x86_cpu_to_node_map, cpu);
+	return early_per_cpu(x86_cpu_to_node_map, cpu);
 }
 
 /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
@@ -122,6 +122,8 @@ static inline cpumask_t node_to_cpumask(int node)
 
 #endif /* !CONFIG_DEBUG_PER_CPU_MAPS */
 
+extern void setup_node_to_cpumask_map(void);
+
 /*
  * Replace default node_to_cpumask_ptr with optimized version
  * Deprecated: use "const struct cpumask *mask = cpumask_of_node(node)"
@@ -192,9 +194,20 @@ extern int __node_distance(int, int);
 
 #else /* !CONFIG_NUMA */
 
-#define numa_node_id()		0
-#define	cpu_to_node(cpu)	0
-#define	early_cpu_to_node(cpu)	0
+static inline int numa_node_id(void)
+{
+	return 0;
+}
+
+static inline int cpu_to_node(int cpu)
+{
+	return 0;
+}
+
+static inline int early_cpu_to_node(int cpu)
+{
+	return 0;
+}
 
 static inline const cpumask_t *cpumask_of_node(int node)
 {
@@ -209,6 +222,8 @@ static inline int node_to_first_cpu(int node)
 	return first_cpu(cpu_online_map);
 }
 
+static inline void setup_node_to_cpumask_map(void) { }
+
 /*
  * Replace default node_to_cpumask_ptr with optimized version
  * Deprecated: use "const struct cpumask *mask = cpumask_of_node(node)"
diff --git a/arch/x86/include/asm/trampoline.h b/arch/x86/include/asm/trampoline.h
index 780ba0ab94f9..90f06c25221d 100644
--- a/arch/x86/include/asm/trampoline.h
+++ b/arch/x86/include/asm/trampoline.h
@@ -13,6 +13,7 @@ extern unsigned char *trampoline_base;
 
 extern unsigned long init_rsp;
 extern unsigned long initial_code;
+extern unsigned long initial_gs;
 
 #define TRAMPOLINE_SIZE roundup(trampoline_end - trampoline_data, PAGE_SIZE)
 #define TRAMPOLINE_BASE 0x6000
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index cf3bb053da0b..0d5342515b86 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -41,7 +41,7 @@ dotraplinkage void do_int3(struct pt_regs *, long);
 dotraplinkage void do_overflow(struct pt_regs *, long);
 dotraplinkage void do_bounds(struct pt_regs *, long);
 dotraplinkage void do_invalid_op(struct pt_regs *, long);
-dotraplinkage void do_device_not_available(struct pt_regs);
+dotraplinkage void do_device_not_available(struct pt_regs *, long);
 dotraplinkage void do_coprocessor_segment_overrun(struct pt_regs *, long);
 dotraplinkage void do_invalid_TSS(struct pt_regs *, long);
 dotraplinkage void do_segment_not_present(struct pt_regs *, long);
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 4340055b7559..b685ece89d5c 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -121,7 +121,7 @@ extern int __get_user_bad(void);
 
 #define __get_user_x(size, ret, x, ptr)		      \
 	asm volatile("call __get_user_" #size	      \
-		     : "=a" (ret),"=d" (x)	      \
+		     : "=a" (ret), "=d" (x)	      \
 		     : "0" (ptr))		      \
 
 /* Careful: we have to cast the result to the type of the pointer
@@ -181,12 +181,12 @@ extern int __get_user_bad(void);
 
 #define __put_user_x(size, x, ptr, __ret_pu)			\
 	asm volatile("call __put_user_" #size : "=a" (__ret_pu)	\
-		     :"0" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
+		     : "0" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
 
 
 
 #ifdef CONFIG_X86_32
-#define __put_user_u64(x, addr, err)					\
+#define __put_user_asm_u64(x, addr, err, errret)			\
 	asm volatile("1:	movl %%eax,0(%2)\n"			\
 		     "2:	movl %%edx,4(%2)\n"			\
 		     "3:\n"						\
@@ -197,14 +197,24 @@ extern int __get_user_bad(void);
 		     _ASM_EXTABLE(1b, 4b)				\
 		     _ASM_EXTABLE(2b, 4b)				\
 		     : "=r" (err)					\
-		     : "A" (x), "r" (addr), "i" (-EFAULT), "0" (err))
+		     : "A" (x), "r" (addr), "i" (errret), "0" (err))
+
+#define __put_user_asm_ex_u64(x, addr)					\
+	asm volatile("1:	movl %%eax,0(%1)\n"			\
+		     "2:	movl %%edx,4(%1)\n"			\
+		     "3:\n"						\
+		     _ASM_EXTABLE(1b, 2b - 1b)				\
+		     _ASM_EXTABLE(2b, 3b - 2b)				\
+		     : : "A" (x), "r" (addr))
 
 #define __put_user_x8(x, ptr, __ret_pu)				\
 	asm volatile("call __put_user_8" : "=a" (__ret_pu)	\
 		     : "A" ((typeof(*(ptr)))(x)), "c" (ptr) : "ebx")
 #else
-#define __put_user_u64(x, ptr, retval) \
-	__put_user_asm(x, ptr, retval, "q", "", "Zr", -EFAULT)
+#define __put_user_asm_u64(x, ptr, retval, errret) \
+	__put_user_asm(x, ptr, retval, "q", "", "Zr", errret)
+#define __put_user_asm_ex_u64(x, addr)	\
+	__put_user_asm_ex(x, addr, "q", "", "Zr")
 #define __put_user_x8(x, ptr, __ret_pu) __put_user_x(8, x, ptr, __ret_pu)
 #endif
 
@@ -276,10 +286,32 @@ do {									\
 		__put_user_asm(x, ptr, retval, "w", "w", "ir", errret);	\
 		break;							\
 	case 4:								\
-		__put_user_asm(x, ptr, retval, "l", "k",  "ir", errret);\
+		__put_user_asm(x, ptr, retval, "l", "k", "ir", errret);	\
 		break;							\
 	case 8:								\
-		__put_user_u64((__typeof__(*ptr))(x), ptr, retval);	\
+		__put_user_asm_u64((__typeof__(*ptr))(x), ptr, retval,	\
+				   errret);				\
+		break;							\
+	default:							\
+		__put_user_bad();					\
+	}								\
+} while (0)
+
+#define __put_user_size_ex(x, ptr, size)				\
+do {									\
+	__chk_user_ptr(ptr);						\
+	switch (size) {							\
+	case 1:								\
+		__put_user_asm_ex(x, ptr, "b", "b", "iq");		\
+		break;							\
+	case 2:								\
+		__put_user_asm_ex(x, ptr, "w", "w", "ir");		\
+		break;							\
+	case 4:								\
+		__put_user_asm_ex(x, ptr, "l", "k", "ir");		\
+		break;							\
+	case 8:								\
+		__put_user_asm_ex_u64((__typeof__(*ptr))(x), ptr);	\
 		break;							\
 	default:							\
 		__put_user_bad();					\
@@ -311,9 +343,12 @@ do {									\
 
 #ifdef CONFIG_X86_32
 #define __get_user_asm_u64(x, ptr, retval, errret)	(x) = __get_user_bad()
+#define __get_user_asm_ex_u64(x, ptr)			(x) = __get_user_bad()
 #else
 #define __get_user_asm_u64(x, ptr, retval, errret) \
 	 __get_user_asm(x, ptr, retval, "q", "", "=r", errret)
+#define __get_user_asm_ex_u64(x, ptr) \
+	 __get_user_asm_ex(x, ptr, "q", "", "=r")
 #endif
 
 #define __get_user_size(x, ptr, size, retval, errret)			\
@@ -350,6 +385,33 @@ do {									\
 		     : "=r" (err), ltype(x)				\
 		     : "m" (__m(addr)), "i" (errret), "0" (err))
 
+#define __get_user_size_ex(x, ptr, size)				\
+do {									\
+	__chk_user_ptr(ptr);						\
+	switch (size) {							\
+	case 1:								\
+		__get_user_asm_ex(x, ptr, "b", "b", "=q");		\
+		break;							\
+	case 2:								\
+		__get_user_asm_ex(x, ptr, "w", "w", "=r");		\
+		break;							\
+	case 4:								\
+		__get_user_asm_ex(x, ptr, "l", "k", "=r");		\
+		break;							\
+	case 8:								\
+		__get_user_asm_ex_u64(x, ptr);				\
+		break;							\
+	default:							\
+		(x) = __get_user_bad();					\
+	}								\
+} while (0)
+
+#define __get_user_asm_ex(x, addr, itype, rtype, ltype)			\
+	asm volatile("1:	mov"itype" %1,%"rtype"0\n"		\
+		     "2:\n"						\
+		     _ASM_EXTABLE(1b, 2b - 1b)				\
+		     : ltype(x) : "m" (__m(addr)))
+
 #define __put_user_nocheck(x, ptr, size)			\
 ({								\
 	int __pu_err;						\
@@ -385,6 +447,26 @@ struct __large_struct { unsigned long buf[100]; };
 		     _ASM_EXTABLE(1b, 3b)				\
 		     : "=r"(err)					\
 		     : ltype(x), "m" (__m(addr)), "i" (errret), "0" (err))
+
+#define __put_user_asm_ex(x, addr, itype, rtype, ltype)			\
+	asm volatile("1:	mov"itype" %"rtype"0,%1\n"		\
+		     "2:\n"						\
+		     _ASM_EXTABLE(1b, 2b - 1b)				\
+		     : : ltype(x), "m" (__m(addr)))
+
+/*
+ * uaccess_try and catch
+ */
+#define uaccess_try	do {						\
+	int prev_err = current_thread_info()->uaccess_err;		\
+	current_thread_info()->uaccess_err = 0;				\
+	barrier();
+
+#define uaccess_catch(err)						\
+	(err) |= current_thread_info()->uaccess_err;			\
+	current_thread_info()->uaccess_err = prev_err;			\
+} while (0)
+
 /**
  * __get_user: - Get a simple variable from user space, with less checking.
  * @x:   Variable to store result.
@@ -408,6 +490,7 @@ struct __large_struct { unsigned long buf[100]; };
 
 #define __get_user(x, ptr)						\
 	__get_user_nocheck((x), (ptr), sizeof(*(ptr)))
+
 /**
  * __put_user: - Write a simple value into user space, with less checking.
  * @x:   Value to copy to user space.
@@ -435,6 +518,45 @@ struct __large_struct { unsigned long buf[100]; };
 #define __put_user_unaligned __put_user
 
 /*
+ * {get|put}_user_try and catch
+ *
+ * get_user_try {
+ *	get_user_ex(...);
+ * } get_user_catch(err)
+ */
+#define get_user_try		uaccess_try
+#define get_user_catch(err)	uaccess_catch(err)
+
+#define get_user_ex(x, ptr)	do {					\
+	unsigned long __gue_val;					\
+	__get_user_size_ex((__gue_val), (ptr), (sizeof(*(ptr))));	\
+	(x) = (__force __typeof__(*(ptr)))__gue_val;			\
+} while (0)
+
+#ifdef CONFIG_X86_WP_WORKS_OK
+
+#define put_user_try		uaccess_try
+#define put_user_catch(err)	uaccess_catch(err)
+
+#define put_user_ex(x, ptr)						\
+	__put_user_size_ex((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
+
+#else /* !CONFIG_X86_WP_WORKS_OK */
+
+#define put_user_try		do {		\
+	int __uaccess_err = 0;
+
+#define put_user_catch(err)			\
+	(err) |= __uaccess_err;			\
+} while (0)
+
+#define put_user_ex(x, ptr)	do {		\
+	__uaccess_err |= __put_user(x, ptr);	\
+} while (0)
+
+#endif /* CONFIG_X86_WP_WORKS_OK */
+
+/*
  * movsl can be slow when source and dest are not both 8-byte aligned
  */
 #ifdef CONFIG_X86_INTEL_USERCOPY
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 84210c479fca..8cc687326eb8 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -188,16 +188,16 @@ __copy_to_user_inatomic(void __user *dst, const void *src, unsigned size)
 extern long __copy_user_nocache(void *dst, const void __user *src,
 				unsigned size, int zerorest);
 
-static inline int __copy_from_user_nocache(void *dst, const void __user *src,
-					   unsigned size)
+static inline int
+__copy_from_user_nocache(void *dst, const void __user *src, unsigned size)
 {
 	might_sleep();
 	return __copy_user_nocache(dst, src, size, 1);
 }
 
-static inline int __copy_from_user_inatomic_nocache(void *dst,
-						    const void __user *src,
-						    unsigned size)
+static inline int
+__copy_from_user_inatomic_nocache(void *dst, const void __user *src,
+				  unsigned size)
 {
 	return __copy_user_nocache(dst, src, size, 0);
 }
diff --git a/arch/x86/include/asm/uv/uv.h b/arch/x86/include/asm/uv/uv.h
new file mode 100644
index 000000000000..c0a01b5d985b
--- /dev/null
+++ b/arch/x86/include/asm/uv/uv.h
@@ -0,0 +1,33 @@
+#ifndef _ASM_X86_UV_UV_H
+#define _ASM_X86_UV_UV_H
+
+enum uv_system_type {UV_NONE, UV_LEGACY_APIC, UV_X2APIC, UV_NON_UNIQUE_APIC};
+
+struct cpumask;
+struct mm_struct;
+
+#ifdef CONFIG_X86_UV
+
+extern enum uv_system_type get_uv_system_type(void);
+extern int is_uv_system(void);
+extern void uv_cpu_init(void);
+extern void uv_system_init(void);
+extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
+						 struct mm_struct *mm,
+						 unsigned long va,
+						 unsigned int cpu);
+
+#else	/* X86_UV */
+
+static inline enum uv_system_type get_uv_system_type(void) { return UV_NONE; }
+static inline int is_uv_system(void)	{ return 0; }
+static inline void uv_cpu_init(void)	{ }
+static inline void uv_system_init(void)	{ }
+static inline const struct cpumask *
+uv_flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm,
+		    unsigned long va, unsigned int cpu)
+{ return cpumask; }
+
+#endif	/* X86_UV */
+
+#endif	/* _ASM_X86_UV_UV_H */
diff --git a/arch/x86/include/asm/uv/uv_bau.h b/arch/x86/include/asm/uv/uv_bau.h
index 50423c7b56b2..9b0e61bf7a88 100644
--- a/arch/x86/include/asm/uv/uv_bau.h
+++ b/arch/x86/include/asm/uv/uv_bau.h
@@ -325,7 +325,6 @@ static inline void bau_cpubits_clear(struct bau_local_cpumask *dstp, int nbits)
 #define cpubit_isset(cpu, bau_local_cpumask) \
 	test_bit((cpu), (bau_local_cpumask).bits)
 
-extern int uv_flush_tlb_others(cpumask_t *, struct mm_struct *, unsigned long);
 extern void uv_bau_message_intr1(void);
 extern void uv_bau_timeout_intr1(void);
 
diff --git a/arch/x86/include/asm/vic.h b/arch/x86/include/asm/vic.h
deleted file mode 100644
index 53100f353612..000000000000
--- a/arch/x86/include/asm/vic.h
+++ /dev/null
@@ -1,61 +0,0 @@
-/* Copyright (C) 1999,2001
- *
- * Author: J.E.J.Bottomley@HansenPartnership.com
- *
- * Standard include definitions for the NCR Voyager Interrupt Controller */
-
-/* The eight CPI vectors.  To activate a CPI, you write a bit mask
- * corresponding to the processor set to be interrupted into the
- * relevant register.  That set of CPUs will then be interrupted with
- * the CPI */
-static const int VIC_CPI_Registers[] =
-	{0xFC00, 0xFC01, 0xFC08, 0xFC09,
-	 0xFC10, 0xFC11, 0xFC18, 0xFC19 };
-
-#define VIC_PROC_WHO_AM_I		0xfc29
-#	define	QUAD_IDENTIFIER		0xC0
-#	define  EIGHT_SLOT_IDENTIFIER	0xE0
-#define QIC_EXTENDED_PROCESSOR_SELECT	0xFC72
-#define VIC_CPI_BASE_REGISTER		0xFC41
-#define VIC_PROCESSOR_ID		0xFC21
-#	define VIC_CPU_MASQUERADE_ENABLE 0x8
-
-#define VIC_CLAIM_REGISTER_0		0xFC38
-#define VIC_CLAIM_REGISTER_1		0xFC39
-#define VIC_REDIRECT_REGISTER_0		0xFC60
-#define VIC_REDIRECT_REGISTER_1		0xFC61
-#define VIC_PRIORITY_REGISTER		0xFC20
-
-#define VIC_PRIMARY_MC_BASE		0xFC48
-#define VIC_SECONDARY_MC_BASE		0xFC49
-
-#define QIC_PROCESSOR_ID		0xFC71
-#	define	QIC_CPUID_ENABLE	0x08
-
-#define QIC_VIC_CPI_BASE_REGISTER	0xFC79
-#define QIC_CPI_BASE_REGISTER		0xFC7A
-
-#define QIC_MASK_REGISTER0		0xFC80
-/* NOTE: these are masked high, enabled low */
-#	define QIC_PERF_TIMER		0x01
-#	define QIC_LPE			0x02
-#	define QIC_SYS_INT		0x04
-#	define QIC_CMN_INT		0x08
-/* at the moment, just enable CMN_INT, disable SYS_INT */
-#	define QIC_DEFAULT_MASK0	(~(QIC_CMN_INT /* | VIC_SYS_INT */))
-#define QIC_MASK_REGISTER1		0xFC81
-#	define QIC_BOOT_CPI_MASK	0xFE
-/* Enable CPI's 1-6 inclusive */
-#	define QIC_CPI_ENABLE		0x81
-
-#define QIC_INTERRUPT_CLEAR0		0xFC8A
-#define QIC_INTERRUPT_CLEAR1		0xFC8B
-
-/* this is where we place the CPI vectors */
-#define VIC_DEFAULT_CPI_BASE		0xC0
-/* this is where we place the QIC CPI vectors */
-#define QIC_DEFAULT_CPI_BASE		0xD0
-
-#define VIC_BOOT_INTERRUPT_MASK		0xfe
-
-extern void smp_vic_timer_interrupt(void);
diff --git a/arch/x86/include/asm/voyager.h b/arch/x86/include/asm/voyager.h
deleted file mode 100644
index b3e647307625..000000000000
--- a/arch/x86/include/asm/voyager.h
+++ /dev/null
@@ -1,529 +0,0 @@
-/* Copyright (C) 1999,2001
- *
- * Author: J.E.J.Bottomley@HansenPartnership.com
- *
- * Standard include definitions for the NCR Voyager system */
-
-#undef	VOYAGER_DEBUG
-#undef	VOYAGER_CAT_DEBUG
-
-#ifdef VOYAGER_DEBUG
-#define VDEBUG(x)	printk x
-#else
-#define VDEBUG(x)
-#endif
-
-/* There are three levels of voyager machine: 3,4 and 5. The rule is
- * if it's less than 3435 it's a Level 3 except for a 3360 which is
- * a level 4.  A 3435 or above is a Level 5 */
-#define VOYAGER_LEVEL5_AND_ABOVE	0x3435
-#define VOYAGER_LEVEL4			0x3360
-
-/* The L4 DINO ASIC */
-#define VOYAGER_DINO			0x43
-
-/* voyager ports in standard I/O space */
-#define VOYAGER_MC_SETUP	0x96
-
-
-#define	VOYAGER_CAT_CONFIG_PORT			0x97
-#	define VOYAGER_CAT_DESELECT		0xff
-#define VOYAGER_SSPB_RELOCATION_PORT		0x98
-
-/* Valid CAT controller commands */
-/* start instruction register cycle */
-#define VOYAGER_CAT_IRCYC			0x01
-/* start data register cycle */
-#define VOYAGER_CAT_DRCYC			0x02
-/* move to execute state */
-#define VOYAGER_CAT_RUN				0x0F
-/* end operation */
-#define VOYAGER_CAT_END				0x80
-/* hold in idle state */
-#define VOYAGER_CAT_HOLD			0x90
-/* single step an "intest" vector */
-#define VOYAGER_CAT_STEP			0xE0
-/* return cat controller to CLEMSON mode */
-#define VOYAGER_CAT_CLEMSON			0xFF
-
-/* the default cat command header */
-#define VOYAGER_CAT_HEADER			0x7F
-
-/* the range of possible CAT module ids in the system */
-#define VOYAGER_MIN_MODULE			0x10
-#define VOYAGER_MAX_MODULE			0x1f
-
-/* The voyager registers per asic */
-#define VOYAGER_ASIC_ID_REG			0x00
-#define VOYAGER_ASIC_TYPE_REG			0x01
-/* the sub address registers can be made auto incrementing on reads */
-#define VOYAGER_AUTO_INC_REG			0x02
-#	define VOYAGER_AUTO_INC			0x04
-#	define VOYAGER_NO_AUTO_INC		0xfb
-#define VOYAGER_SUBADDRDATA			0x03
-#define VOYAGER_SCANPATH			0x05
-#	define VOYAGER_CONNECT_ASIC		0x01
-#	define VOYAGER_DISCONNECT_ASIC		0xfe
-#define VOYAGER_SUBADDRLO			0x06
-#define VOYAGER_SUBADDRHI			0x07
-#define VOYAGER_SUBMODSELECT			0x08
-#define VOYAGER_SUBMODPRESENT			0x09
-
-#define VOYAGER_SUBADDR_LO			0xff
-#define VOYAGER_SUBADDR_HI			0xffff
-
-/* the maximum size of a scan path -- used to form instructions */
-#define VOYAGER_MAX_SCAN_PATH			0x100
-/* the biggest possible register size (in bytes) */
-#define VOYAGER_MAX_REG_SIZE			4
-
-/* Total number of possible modules (including submodules) */
-#define VOYAGER_MAX_MODULES			16
-/* Largest number of asics per module */
-#define VOYAGER_MAX_ASICS_PER_MODULE		7
-
-/* the CAT asic of each module is always the first one */
-#define VOYAGER_CAT_ID				0
-#define VOYAGER_PSI				0x1a
-
-/* voyager instruction operations and registers */
-#define VOYAGER_READ_CONFIG			0x1
-#define VOYAGER_WRITE_CONFIG			0x2
-#define VOYAGER_BYPASS				0xff
-
-typedef struct voyager_asic {
-	__u8	asic_addr;	/* ASIC address; Level 4 */
-	__u8	asic_type;      /* ASIC type */
-	__u8	asic_id;	/* ASIC id */
-	__u8	jtag_id[4];	/* JTAG id */
-	__u8	asic_location;	/* Location within scan path; start w/ 0 */
-	__u8	bit_location;	/* Location within bit stream; start w/ 0 */
-	__u8	ireg_length;	/* Instruction register length */
-	__u16	subaddr;	/* Amount of sub address space */
-	struct voyager_asic *next;	/* Next asic in linked list */
-} voyager_asic_t;
-
-typedef struct voyager_module {
-	__u8	module_addr;		/* Module address */
-	__u8	scan_path_connected;	/* Scan path connected */
-	__u16   ee_size;		/* Size of the EEPROM */
-	__u16   num_asics;		/* Number of Asics */
-	__u16   inst_bits;		/* Instruction bits in the scan path */
-	__u16   largest_reg;		/* Largest register in the scan path */
-	__u16   smallest_reg;		/* Smallest register in the scan path */
-	voyager_asic_t   *asic;		/* First ASIC in scan path (CAT_I) */
-	struct   voyager_module *submodule;	/* Submodule pointer */
-	struct   voyager_module *next;		/* Next module in linked list */
-} voyager_module_t;
-
-typedef struct voyager_eeprom_hdr {
-	 __u8  module_id[4];
-	 __u8  version_id;
-	 __u8  config_id;
-	 __u16 boundry_id;	/* boundary scan id */
-	 __u16 ee_size;		/* size of EEPROM */
-	 __u8  assembly[11];	/* assembly # */
-	 __u8  assembly_rev;	/* assembly rev */
-	 __u8  tracer[4];	/* tracer number */
-	 __u16 assembly_cksum;	/* asm checksum */
-	 __u16 power_consump;	/* pwr requirements */
-	 __u16 num_asics;	/* number of asics */
-	 __u16 bist_time;	/* min. bist time */
-	 __u16 err_log_offset;	/* error log offset */
-	 __u16 scan_path_offset;/* scan path offset */
-	 __u16 cct_offset;
-	 __u16 log_length;	/* length of err log */
-	 __u16 xsum_end;	/* offset to end of
-				   checksum */
-	 __u8  reserved[4];
-	 __u8  sflag;		/* starting sentinal */
-	 __u8  part_number[13];	/* prom part number */
-	 __u8  version[10];	/* version number */
-	 __u8  signature[8];
-	 __u16 eeprom_chksum;
-	 __u32  data_stamp_offset;
-	 __u8  eflag ;		 /* ending sentinal */
-} __attribute__((packed)) voyager_eprom_hdr_t;
-
-
-
-#define VOYAGER_EPROM_SIZE_OFFSET				\
-	((__u16)(&(((voyager_eprom_hdr_t *)0)->ee_size)))
-#define VOYAGER_XSUM_END_OFFSET		0x2a
-
-/* the following three definitions are for internal table layouts
- * in the module EPROMs.  We really only care about the IDs and
- * offsets */
-typedef struct voyager_sp_table {
-	__u8 asic_id;
-	__u8 bypass_flag;
-	__u16 asic_data_offset;
-	__u16 config_data_offset;
-} __attribute__((packed)) voyager_sp_table_t;
-
-typedef struct voyager_jtag_table {
-	__u8 icode[4];
-	__u8 runbist[4];
-	__u8 intest[4];
-	__u8 samp_preld[4];
-	__u8 ireg_len;
-} __attribute__((packed)) voyager_jtt_t;
-
-typedef struct voyager_asic_data_table {
-	__u8 jtag_id[4];
-	__u16 length_bsr;
-	__u16 length_bist_reg;
-	__u32 bist_clk;
-	__u16 subaddr_bits;
-	__u16 seed_bits;
-	__u16 sig_bits;
-	__u16 jtag_offset;
-} __attribute__((packed)) voyager_at_t;
-
-/* Voyager Interrupt Controller (VIC) registers */
-
-/* Base to add to Cross Processor Interrupts (CPIs) when triggering
- * the CPU IRQ line */
-/* register defines for the WCBICs (one per processor) */
-#define VOYAGER_WCBIC0	0x41		/* bus A node P1 processor 0 */
-#define VOYAGER_WCBIC1	0x49		/* bus A node P1 processor 1 */
-#define VOYAGER_WCBIC2	0x51		/* bus A node P2 processor 0 */
-#define VOYAGER_WCBIC3	0x59		/* bus A node P2 processor 1 */
-#define VOYAGER_WCBIC4	0x61		/* bus B node P1 processor 0 */
-#define VOYAGER_WCBIC5	0x69		/* bus B node P1 processor 1 */
-#define VOYAGER_WCBIC6	0x71		/* bus B node P2 processor 0 */
-#define VOYAGER_WCBIC7	0x79		/* bus B node P2 processor 1 */
-
-
-/* top of memory registers */
-#define VOYAGER_WCBIC_TOM_L	0x4
-#define VOYAGER_WCBIC_TOM_H	0x5
-
-/* register defines for Voyager Memory Contol (VMC)
- * these are present on L4 machines only */
-#define	VOYAGER_VMC1		0x81
-#define VOYAGER_VMC2		0x91
-#define VOYAGER_VMC3		0xa1
-#define VOYAGER_VMC4		0xb1
-
-/* VMC Ports */
-#define VOYAGER_VMC_MEMORY_SETUP	0x9
-#	define VMC_Interleaving		0x01
-#	define VMC_4Way			0x02
-#	define VMC_EvenCacheLines	0x04
-#	define VMC_HighLine		0x08
-#	define VMC_Start0_Enable	0x20
-#	define VMC_Start1_Enable	0x40
-#	define VMC_Vremap		0x80
-#define VOYAGER_VMC_BANK_DENSITY	0xa
-#	define	VMC_BANK_EMPTY		0
-#	define	VMC_BANK_4MB		1
-#	define	VMC_BANK_16MB		2
-#	define	VMC_BANK_64MB		3
-#	define	VMC_BANK0_MASK		0x03
-#	define	VMC_BANK1_MASK		0x0C
-#	define	VMC_BANK2_MASK		0x30
-#	define	VMC_BANK3_MASK		0xC0
-
-/* Magellan Memory Controller (MMC) defines - present on L5 */
-#define VOYAGER_MMC_ASIC_ID		1
-/* the two memory modules corresponding to memory cards in the system */
-#define VOYAGER_MMC_MEMORY0_MODULE	0x14
-#define VOYAGER_MMC_MEMORY1_MODULE	0x15
-/* the Magellan Memory Address (MMA) defines */
-#define VOYAGER_MMA_ASIC_ID		2
-
-/* Submodule number for the Quad Baseboard */
-#define VOYAGER_QUAD_BASEBOARD		1
-
-/* ASIC defines for the Quad Baseboard */
-#define VOYAGER_QUAD_QDATA0		1
-#define VOYAGER_QUAD_QDATA1		2
-#define VOYAGER_QUAD_QABC		3
-
-/* Useful areas in extended CMOS */
-#define VOYAGER_PROCESSOR_PRESENT_MASK	0x88a
-#define VOYAGER_MEMORY_CLICKMAP		0xa23
-#define VOYAGER_DUMP_LOCATION		0xb1a
-
-/* SUS In Control bit - used to tell SUS that we don't need to be
- * babysat anymore */
-#define VOYAGER_SUS_IN_CONTROL_PORT	0x3ff
-#	define VOYAGER_IN_CONTROL_FLAG	0x80
-
-/* Voyager PSI defines */
-#define VOYAGER_PSI_STATUS_REG		0x08
-#	define PSI_DC_FAIL		0x01
-#	define PSI_MON			0x02
-#	define PSI_FAULT		0x04
-#	define PSI_ALARM		0x08
-#	define PSI_CURRENT		0x10
-#	define PSI_DVM			0x20
-#	define PSI_PSCFAULT		0x40
-#	define PSI_STAT_CHG		0x80
-
-#define VOYAGER_PSI_SUPPLY_REG		0x8000
-	/* read */
-#	define PSI_FAIL_DC		0x01
-#	define PSI_FAIL_AC		0x02
-#	define PSI_MON_INT		0x04
-#	define PSI_SWITCH_OFF		0x08
-#	define PSI_HX_OFF		0x10
-#	define PSI_SECURITY		0x20
-#	define PSI_CMOS_BATT_LOW	0x40
-#	define PSI_CMOS_BATT_FAIL	0x80
-	/* write */
-#	define PSI_CLR_SWITCH_OFF	0x13
-#	define PSI_CLR_HX_OFF		0x14
-#	define PSI_CLR_CMOS_BATT_FAIL	0x17
-
-#define VOYAGER_PSI_MASK		0x8001
-#	define PSI_MASK_MASK		0x10
-
-#define VOYAGER_PSI_AC_FAIL_REG		0x8004
-#define	AC_FAIL_STAT_CHANGE		0x80
-
-#define VOYAGER_PSI_GENERAL_REG		0x8007
-	/* read */
-#	define PSI_SWITCH_ON		0x01
-#	define PSI_SWITCH_ENABLED	0x02
-#	define PSI_ALARM_ENABLED	0x08
-#	define PSI_SECURE_ENABLED	0x10
-#	define PSI_COLD_RESET		0x20
-#	define PSI_COLD_START		0x80
-	/* write */
-#	define PSI_POWER_DOWN		0x10
-#	define PSI_SWITCH_DISABLE	0x01
-#	define PSI_SWITCH_ENABLE	0x11
-#	define PSI_CLEAR		0x12
-#	define PSI_ALARM_DISABLE	0x03
-#	define PSI_ALARM_ENABLE		0x13
-#	define PSI_CLEAR_COLD_RESET	0x05
-#	define PSI_SET_COLD_RESET	0x15
-#	define PSI_CLEAR_COLD_START	0x07
-#	define PSI_SET_COLD_START	0x17
-
-
-
-struct voyager_bios_info {
-	__u8	len;
-	__u8	major;
-	__u8	minor;
-	__u8	debug;
-	__u8	num_classes;
-	__u8	class_1;
-	__u8	class_2;
-};
-
-/* The following structures and definitions are for the Kernel/SUS
- * interface these are needed to find out how SUS initialised any Quad
- * boards in the system */
-
-#define	NUMBER_OF_MC_BUSSES	2
-#define SLOTS_PER_MC_BUS	8
-#define MAX_CPUS                16      /* 16 way CPU system */
-#define MAX_PROCESSOR_BOARDS	4	/* 4 processor slot system */
-#define MAX_CACHE_LEVELS	4	/* # of cache levels supported */
-#define MAX_SHARED_CPUS		4	/* # of CPUs that can share a LARC */
-#define NUMBER_OF_POS_REGS	8
-
-typedef struct {
-	__u8	MC_Slot;
-	__u8	POS_Values[NUMBER_OF_POS_REGS];
-} __attribute__((packed)) MC_SlotInformation_t;
-
-struct QuadDescription {
-	__u8  Type;	/* for type 0 (DYADIC or MONADIC) all fields
-			 * will be zero except for slot */
-	__u8 StructureVersion;
-	__u32 CPI_BaseAddress;
-	__u32  LARC_BankSize;
-	__u32 LocalMemoryStateBits;
-	__u8  Slot; /* Processor slots 1 - 4 */
-} __attribute__((packed));
-
-struct ProcBoardInfo {
-	__u8 Type;
-	__u8 StructureVersion;
-	__u8 NumberOfBoards;
-	struct QuadDescription QuadData[MAX_PROCESSOR_BOARDS];
-} __attribute__((packed));
-
-struct CacheDescription {
-	__u8 Level;
-	__u32 TotalSize;
-	__u16 LineSize;
-	__u8  Associativity;
-	__u8  CacheType;
-	__u8  WriteType;
-	__u8  Number_CPUs_SharedBy;
-	__u8  Shared_CPUs_Hardware_IDs[MAX_SHARED_CPUS];
-
-} __attribute__((packed));
-
-struct CPU_Description {
-	__u8 CPU_HardwareId;
-	char *FRU_String;
-	__u8 NumberOfCacheLevels;
-	struct CacheDescription CacheLevelData[MAX_CACHE_LEVELS];
-} __attribute__((packed));
-
-struct CPU_Info {
-	__u8 Type;
-	__u8 StructureVersion;
-	__u8 NumberOf_CPUs;
-	struct CPU_Description CPU_Data[MAX_CPUS];
-} __attribute__((packed));
-
-
-/*
- * This structure will be used by SUS and the OS.
- * The assumption about this structure is that no blank space is
- * packed in it by our friend the compiler.
- */
-typedef struct {
-	__u8	Mailbox_SUS;		/* Written to by SUS to give
-					   commands/response to the OS */
-	__u8	Mailbox_OS;		/* Written to by the OS to give
-					   commands/response to SUS */
-	__u8	SUS_MailboxVersion;	/* Tells the OS which iteration of the
-					   interface SUS supports */
-	__u8	OS_MailboxVersion;	/* Tells SUS which iteration of the
-					   interface the OS supports */
-	__u32	OS_Flags;		/* Flags set by the OS as info for
-					   SUS */
-	__u32	SUS_Flags;		/* Flags set by SUS as info
-					   for the OS */
-	__u32	WatchDogPeriod;		/* Watchdog period (in seconds) which
-					   the DP uses to see if the OS
-					   is dead */
-	__u32	WatchDogCount;		/* Updated by the OS on every tic. */
-	__u32	MemoryFor_SUS_ErrorLog;	/* Flat 32 bit address which tells SUS
-					   where to stuff the SUS error log
-					   on a dump */
-	MC_SlotInformation_t MC_SlotInfo[NUMBER_OF_MC_BUSSES*SLOTS_PER_MC_BUS];
-					/* Storage for MCA POS data */
-	/* All new SECOND_PASS_INTERFACE fields added from this point */
-	struct ProcBoardInfo    *BoardData;
-	struct CPU_Info         *CPU_Data;
-	/* All new fields must be added from this point */
-} Voyager_KernelSUS_Mbox_t;
-
-/* structure for finding the right memory address to send a QIC CPI to */
-struct voyager_qic_cpi {
-	/* Each cache line (32 bytes) can trigger a cpi.  The cpi
-	 * read/write may occur anywhere in the cache line---pick the
-	 * middle to be safe */
-	struct  {
-		__u32 pad1[3];
-		__u32 cpi;
-		__u32 pad2[4];
-	} qic_cpi[8];
-};
-
-struct voyager_status {
-	__u32	power_fail:1;
-	__u32	switch_off:1;
-	__u32	request_from_kernel:1;
-};
-
-struct voyager_psi_regs {
-	__u8 cat_id;
-	__u8 cat_dev;
-	__u8 cat_control;
-	__u8 subaddr;
-	__u8 dummy4;
-	__u8 checkbit;
-	__u8 subaddr_low;
-	__u8 subaddr_high;
-	__u8 intstatus;
-	__u8 stat1;
-	__u8 stat3;
-	__u8 fault;
-	__u8 tms;
-	__u8 gen;
-	__u8 sysconf;
-	__u8 dummy15;
-};
-
-struct voyager_psi_subregs {
-	__u8 supply;
-	__u8 mask;
-	__u8 present;
-	__u8 DCfail;
-	__u8 ACfail;
-	__u8 fail;
-	__u8 UPSfail;
-	__u8 genstatus;
-};
-
-struct voyager_psi {
-	struct voyager_psi_regs regs;
-	struct voyager_psi_subregs subregs;
-};
-
-struct voyager_SUS {
-#define	VOYAGER_DUMP_BUTTON_NMI		0x1
-#define VOYAGER_SUS_VALID		0x2
-#define VOYAGER_SYSINT_COMPLETE		0x3
-	__u8	SUS_mbox;
-#define VOYAGER_NO_COMMAND		0x0
-#define VOYAGER_IGNORE_DUMP		0x1
-#define VOYAGER_DO_DUMP			0x2
-#define VOYAGER_SYSINT_HANDSHAKE	0x3
-#define VOYAGER_DO_MEM_DUMP		0x4
-#define VOYAGER_SYSINT_WAS_RECOVERED	0x5
-	__u8	kernel_mbox;
-#define	VOYAGER_MAILBOX_VERSION		0x10
-	__u8	SUS_version;
-	__u8	kernel_version;
-#define VOYAGER_OS_HAS_SYSINT		0x1
-#define VOYAGER_OS_IN_PROGRESS		0x2
-#define VOYAGER_UPDATING_WDPERIOD	0x4
-	__u32	kernel_flags;
-#define VOYAGER_SUS_BOOTING		0x1
-#define VOYAGER_SUS_IN_PROGRESS		0x2
-	__u32	SUS_flags;
-	__u32	watchdog_period;
-	__u32	watchdog_count;
-	__u32	SUS_errorlog;
-	/* lots of system configuration stuff under here */
-};
-
-/* Variables exported by voyager_smp */
-extern __u32 voyager_extended_vic_processors;
-extern __u32 voyager_allowed_boot_processors;
-extern __u32 voyager_quad_processors;
-extern struct voyager_qic_cpi *voyager_quad_cpi_addr[NR_CPUS];
-extern struct voyager_SUS *voyager_SUS;
-
-/* variables exported always */
-extern struct task_struct *voyager_thread;
-extern int voyager_level;
-extern struct voyager_status voyager_status;
-
-/* functions exported by the voyager and voyager_smp modules */
-extern int voyager_cat_readb(__u8 module, __u8 asic, int reg);
-extern void voyager_cat_init(void);
-extern void voyager_detect(struct voyager_bios_info *);
-extern void voyager_trap_init(void);
-extern void voyager_setup_irqs(void);
-extern int voyager_memory_detect(int region, __u32 *addr, __u32 *length);
-extern void voyager_smp_intr_init(void);
-extern __u8 voyager_extended_cmos_read(__u16 cmos_address);
-extern void voyager_smp_dump(void);
-extern void voyager_timer_interrupt(void);
-extern void smp_local_timer_interrupt(void);
-extern void voyager_power_off(void);
-extern void smp_voyager_power_off(void *dummy);
-extern void voyager_restart(void);
-extern void voyager_cat_power_off(void);
-extern void voyager_cat_do_common_interrupt(void);
-extern void voyager_handle_nmi(void);
-extern void voyager_smp_intr_init(void);
-/* Commands for the following are */
-#define	VOYAGER_PSI_READ	0
-#define VOYAGER_PSI_WRITE	1
-#define VOYAGER_PSI_SUBREAD	2
-#define VOYAGER_PSI_SUBWRITE	3
-extern void voyager_cat_psi(__u8, __u16, __u8 *);
diff --git a/arch/x86/include/asm/xen/events.h b/arch/x86/include/asm/xen/events.h
index 19144184983a..1df35417c412 100644
--- a/arch/x86/include/asm/xen/events.h
+++ b/arch/x86/include/asm/xen/events.h
@@ -15,10 +15,4 @@ static inline int xen_irqs_disabled(struct pt_regs *regs)
 	return raw_irqs_disabled_flags(regs->flags);
 }
 
-static inline void xen_do_IRQ(int irq, struct pt_regs *regs)
-{
-	regs->orig_ax = ~irq;
-	do_IRQ(regs);
-}
-
 #endif /* _ASM_X86_XEN_EVENTS_H */
diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h
index 81fbd735aec4..d5b7e90c0edf 100644
--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -38,22 +38,30 @@ extern struct shared_info *HYPERVISOR_shared_info;
 extern struct start_info *xen_start_info;
 
 enum xen_domain_type {
-	XEN_NATIVE,
-	XEN_PV_DOMAIN,
-	XEN_HVM_DOMAIN,
+	XEN_NATIVE,		/* running on bare hardware    */
+	XEN_PV_DOMAIN,		/* running in a PV domain      */
+	XEN_HVM_DOMAIN,		/* running in a Xen hvm domain */
 };
 
-extern enum xen_domain_type xen_domain_type;
-
 #ifdef CONFIG_XEN
-#define xen_domain()		(xen_domain_type != XEN_NATIVE)
+extern enum xen_domain_type xen_domain_type;
 #else
-#define xen_domain()		(0)
+#define xen_domain_type		XEN_NATIVE
 #endif
 
-#define xen_pv_domain()		(xen_domain() && xen_domain_type == XEN_PV_DOMAIN)
-#define xen_hvm_domain()	(xen_domain() && xen_domain_type == XEN_HVM_DOMAIN)
+#define xen_domain()		(xen_domain_type != XEN_NATIVE)
+#define xen_pv_domain()		(xen_domain() &&			\
+				 xen_domain_type == XEN_PV_DOMAIN)
+#define xen_hvm_domain()	(xen_domain() &&			\
+				 xen_domain_type == XEN_HVM_DOMAIN)
+
+#ifdef CONFIG_XEN_DOM0
+#include <xen/interface/xen.h>
 
-#define xen_initial_domain()	(xen_pv_domain() && xen_start_info->flags & SIF_INITDOMAIN)
+#define xen_initial_domain()	(xen_pv_domain() && \
+				 xen_start_info->flags & SIF_INITDOMAIN)
+#else  /* !CONFIG_XEN_DOM0 */
+#define xen_initial_domain()	(0)
+#endif	/* CONFIG_XEN_DOM0 */
 
 #endif /* _ASM_X86_XEN_HYPERVISOR_H */
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 4bd990ee43df..1a918dde46b5 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -164,6 +164,7 @@ static inline pte_t __pte_ma(pteval_t x)
 
 
 xmaddr_t arbitrary_virt_to_machine(void *address);
+unsigned long arbitrary_virt_to_mfn(void *vaddr);
 void make_lowmem_page_readonly(void *vaddr);
 void make_lowmem_page_readwrite(void *vaddr);
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index d364df03c1d6..95f216bbfaf1 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -23,11 +23,12 @@ nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_vsyscall_64.o	:= $(PROFILING) -g0 $(nostackp)
 CFLAGS_hpet.o		:= $(nostackp)
 CFLAGS_tsc.o		:= $(nostackp)
+CFLAGS_paravirt.o	:= $(nostackp)
 
 obj-y			:= process_$(BITS).o signal.o entry_$(BITS).o
 obj-y			+= traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
 obj-y			+= time_$(BITS).o ioport.o ldt.o dumpstack.o
-obj-y			+= setup.o i8259.o irqinit_$(BITS).o setup_percpu.o
+obj-y			+= setup.o i8259.o irqinit_$(BITS).o
 obj-$(CONFIG_X86_VISWS)	+= visws_quirks.o
 obj-$(CONFIG_X86_32)	+= probe_roms_32.o
 obj-$(CONFIG_X86_32)	+= sys_i386_32.o i386_ksyms_32.o
@@ -49,31 +50,27 @@ obj-y				+= step.o
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
 obj-y				+= acpi/
-obj-$(CONFIG_X86_BIOS_REBOOT)	+= reboot.o
+obj-y				+= reboot.o
 obj-$(CONFIG_MCA)		+= mca_32.o
 obj-$(CONFIG_X86_MSR)		+= msr.o
 obj-$(CONFIG_X86_CPUID)		+= cpuid.o
 obj-$(CONFIG_PCI)		+= early-quirks.o
 apm-y				:= apm_32.o
 obj-$(CONFIG_APM)		+= apm.o
-obj-$(CONFIG_X86_SMP)		+= smp.o
-obj-$(CONFIG_X86_SMP)		+= smpboot.o tsc_sync.o ipi.o tlb_$(BITS).o
-obj-$(CONFIG_X86_32_SMP)	+= smpcommon.o
-obj-$(CONFIG_X86_64_SMP)	+= tsc_sync.o smpcommon.o
+obj-$(CONFIG_SMP)		+= smp.o
+obj-$(CONFIG_SMP)		+= smpboot.o tsc_sync.o
+obj-$(CONFIG_SMP)		+= setup_percpu.o
+obj-$(CONFIG_X86_64_SMP)	+= tsc_sync.o
 obj-$(CONFIG_X86_TRAMPOLINE)	+= trampoline_$(BITS).o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
-obj-$(CONFIG_X86_LOCAL_APIC)	+= apic.o nmi.o
-obj-$(CONFIG_X86_IO_APIC)	+= io_apic.o
+obj-y				+= apic/
 obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER)	+= ftrace.o
 obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_CRASH_DUMP)	+= crash_dump_$(BITS).o
-obj-$(CONFIG_X86_NUMAQ)		+= numaq_32.o
-obj-$(CONFIG_X86_ES7000)	+= es7000_32.o
-obj-$(CONFIG_X86_SUMMIT_NUMA)	+= summit_32.o
-obj-y				+= vsmp_64.o
+obj-$(CONFIG_X86_VSMP)		+= vsmp_64.o
 obj-$(CONFIG_KPROBES)		+= kprobes.o
 obj-$(CONFIG_MODULES)		+= module_$(BITS).o
 obj-$(CONFIG_EFI) 		+= efi.o efi_$(BITS).o efi_stub_$(BITS).o
@@ -114,16 +111,13 @@ obj-$(CONFIG_SWIOTLB)			+= pci-swiotlb_64.o # NB rename without _64
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
-        obj-y				+= genapic_64.o genapic_flat_64.o genx2apic_uv_x.o tlb_uv.o
-	obj-y				+= bios_uv.o uv_irq.o uv_sysfs.o
-        obj-y				+= genx2apic_cluster.o
-        obj-y				+= genx2apic_phys.o
-        obj-$(CONFIG_X86_PM_TIMER)	+= pmtimer_64.o
-        obj-$(CONFIG_AUDIT)		+= audit_64.o
-
-        obj-$(CONFIG_GART_IOMMU)	+= pci-gart_64.o aperture_64.o
-        obj-$(CONFIG_CALGARY_IOMMU)	+= pci-calgary_64.o tce_64.o
-        obj-$(CONFIG_AMD_IOMMU)		+= amd_iommu_init.o amd_iommu.o
-
-        obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
+	obj-$(CONFIG_X86_UV)		+= tlb_uv.o bios_uv.o uv_irq.o uv_sysfs.o
+	obj-$(CONFIG_X86_PM_TIMER)	+= pmtimer_64.o
+	obj-$(CONFIG_AUDIT)		+= audit_64.o
+
+	obj-$(CONFIG_GART_IOMMU)	+= pci-gart_64.o aperture_64.o
+	obj-$(CONFIG_CALGARY_IOMMU)	+= pci-calgary_64.o tce_64.o
+	obj-$(CONFIG_AMD_IOMMU)		+= amd_iommu_init.o amd_iommu.o
+
+	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
 endif
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 7678f10c4568..a18eb7ce2236 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -37,15 +37,10 @@
 #include <asm/pgtable.h>
 #include <asm/io_apic.h>
 #include <asm/apic.h>
-#include <asm/genapic.h>
 #include <asm/io.h>
 #include <asm/mpspec.h>
 #include <asm/smp.h>
 
-#ifdef CONFIG_X86_LOCAL_APIC
-# include <mach_apic.h>
-#endif
-
 static int __initdata acpi_force = 0;
 u32 acpi_rsdt_forced;
 #ifdef	CONFIG_ACPI
@@ -56,16 +51,7 @@ int acpi_disabled = 1;
 EXPORT_SYMBOL(acpi_disabled);
 
 #ifdef	CONFIG_X86_64
-
-#include <asm/proto.h>
-
-#else				/* X86 */
-
-#ifdef	CONFIG_X86_LOCAL_APIC
-#include <mach_apic.h>
-#include <mach_mpparse.h>
-#endif				/* CONFIG_X86_LOCAL_APIC */
-
+# include <asm/proto.h>
 #endif				/* X86 */
 
 #define BAD_MADT_ENTRY(entry, end) (					    \
@@ -121,35 +107,18 @@ enum acpi_irq_model_id acpi_irq_model = ACPI_IRQ_MODEL_PIC;
  */
 char *__init __acpi_map_table(unsigned long phys, unsigned long size)
 {
-	unsigned long base, offset, mapped_size;
-	int idx;
 
 	if (!phys || !size)
 		return NULL;
 
-	if (phys+size <= (max_low_pfn_mapped << PAGE_SHIFT))
-		return __va(phys);
-
-	offset = phys & (PAGE_SIZE - 1);
-	mapped_size = PAGE_SIZE - offset;
-	clear_fixmap(FIX_ACPI_END);
-	set_fixmap(FIX_ACPI_END, phys);
-	base = fix_to_virt(FIX_ACPI_END);
-
-	/*
-	 * Most cases can be covered by the below.
-	 */
-	idx = FIX_ACPI_END;
-	while (mapped_size < size) {
-		if (--idx < FIX_ACPI_BEGIN)
-			return NULL;	/* cannot handle this */
-		phys += PAGE_SIZE;
-		clear_fixmap(idx);
-		set_fixmap(idx, phys);
-		mapped_size += PAGE_SIZE;
-	}
+	return early_ioremap(phys, size);
+}
+void __init __acpi_unmap_table(char *map, unsigned long size)
+{
+	if (!map || !size)
+		return;
 
-	return ((unsigned char *)base + offset);
+	early_iounmap(map, size);
 }
 
 #ifdef CONFIG_PCI_MMCONFIG
@@ -239,7 +208,8 @@ static int __init acpi_parse_madt(struct acpi_table_header *table)
 		       madt->address);
 	}
 
-	acpi_madt_oem_check(madt->header.oem_id, madt->header.oem_table_id);
+	default_acpi_madt_oem_check(madt->header.oem_id,
+				    madt->header.oem_table_id);
 
 	return 0;
 }
@@ -884,7 +854,7 @@ static struct {
 	DECLARE_BITMAP(pin_programmed, MP_MAX_IOAPIC_PIN + 1);
 } mp_ioapic_routing[MAX_IO_APICS];
 
-static int mp_find_ioapic(int gsi)
+int mp_find_ioapic(int gsi)
 {
 	int i = 0;
 
@@ -899,6 +869,16 @@ static int mp_find_ioapic(int gsi)
 	return -1;
 }
 
+int mp_find_ioapic_pin(int ioapic, int gsi)
+{
+	if (WARN_ON(ioapic == -1))
+		return -1;
+	if (WARN_ON(gsi > mp_ioapic_routing[ioapic].gsi_end))
+		return -1;
+
+	return gsi - mp_ioapic_routing[ioapic].gsi_base;
+}
+
 static u8 __init uniq_ioapic_id(u8 id)
 {
 #ifdef CONFIG_X86_32
@@ -912,8 +892,8 @@ static u8 __init uniq_ioapic_id(u8 id)
 	DECLARE_BITMAP(used, 256);
 	bitmap_zero(used, 256);
 	for (i = 0; i < nr_ioapics; i++) {
-		struct mp_config_ioapic *ia = &mp_ioapics[i];
-		__set_bit(ia->mp_apicid, used);
+		struct mpc_ioapic *ia = &mp_ioapics[i];
+		__set_bit(ia->apicid, used);
 	}
 	if (!test_bit(id, used))
 		return id;
@@ -945,29 +925,29 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
 
 	idx = nr_ioapics;
 
-	mp_ioapics[idx].mp_type = MP_IOAPIC;
-	mp_ioapics[idx].mp_flags = MPC_APIC_USABLE;
-	mp_ioapics[idx].mp_apicaddr = address;
+	mp_ioapics[idx].type = MP_IOAPIC;
+	mp_ioapics[idx].flags = MPC_APIC_USABLE;
+	mp_ioapics[idx].apicaddr = address;
 
 	set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address);
-	mp_ioapics[idx].mp_apicid = uniq_ioapic_id(id);
+	mp_ioapics[idx].apicid = uniq_ioapic_id(id);
 #ifdef CONFIG_X86_32
-	mp_ioapics[idx].mp_apicver = io_apic_get_version(idx);
+	mp_ioapics[idx].apicver = io_apic_get_version(idx);
 #else
-	mp_ioapics[idx].mp_apicver = 0;
+	mp_ioapics[idx].apicver = 0;
 #endif
 	/*
 	 * Build basic GSI lookup table to facilitate gsi->io_apic lookups
 	 * and to prevent reprogramming of IOAPIC pins (PCI GSIs).
 	 */
-	mp_ioapic_routing[idx].apic_id = mp_ioapics[idx].mp_apicid;
+	mp_ioapic_routing[idx].apic_id = mp_ioapics[idx].apicid;
 	mp_ioapic_routing[idx].gsi_base = gsi_base;
 	mp_ioapic_routing[idx].gsi_end = gsi_base +
 	    io_apic_get_redir_entries(idx);
 
-	printk(KERN_INFO "IOAPIC[%d]: apic_id %d, version %d, address 0x%lx, "
-	       "GSI %d-%d\n", idx, mp_ioapics[idx].mp_apicid,
-	       mp_ioapics[idx].mp_apicver, mp_ioapics[idx].mp_apicaddr,
+	printk(KERN_INFO "IOAPIC[%d]: apic_id %d, version %d, address 0x%x, "
+	       "GSI %d-%d\n", idx, mp_ioapics[idx].apicid,
+	       mp_ioapics[idx].apicver, mp_ioapics[idx].apicaddr,
 	       mp_ioapic_routing[idx].gsi_base, mp_ioapic_routing[idx].gsi_end);
 
 	nr_ioapics++;
@@ -996,19 +976,19 @@ int __init acpi_probe_gsi(void)
 	return max_gsi + 1;
 }
 
-static void assign_to_mp_irq(struct mp_config_intsrc *m,
-				    struct mp_config_intsrc *mp_irq)
+static void assign_to_mp_irq(struct mpc_intsrc *m,
+				    struct mpc_intsrc *mp_irq)
 {
-	memcpy(mp_irq, m, sizeof(struct mp_config_intsrc));
+	memcpy(mp_irq, m, sizeof(struct mpc_intsrc));
 }
 
-static int mp_irq_cmp(struct mp_config_intsrc *mp_irq,
-				struct mp_config_intsrc *m)
+static int mp_irq_cmp(struct mpc_intsrc *mp_irq,
+				struct mpc_intsrc *m)
 {
-	return memcmp(mp_irq, m, sizeof(struct mp_config_intsrc));
+	return memcmp(mp_irq, m, sizeof(struct mpc_intsrc));
 }
 
-static void save_mp_irq(struct mp_config_intsrc *m)
+static void save_mp_irq(struct mpc_intsrc *m)
 {
 	int i;
 
@@ -1026,7 +1006,7 @@ void __init mp_override_legacy_irq(u8 bus_irq, u8 polarity, u8 trigger, u32 gsi)
 {
 	int ioapic;
 	int pin;
-	struct mp_config_intsrc mp_irq;
+	struct mpc_intsrc mp_irq;
 
 	/*
 	 * Convert 'gsi' to 'ioapic.pin'.
@@ -1034,7 +1014,7 @@ void __init mp_override_legacy_irq(u8 bus_irq, u8 polarity, u8 trigger, u32 gsi)
 	ioapic = mp_find_ioapic(gsi);
 	if (ioapic < 0)
 		return;
-	pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
+	pin = mp_find_ioapic_pin(ioapic, gsi);
 
 	/*
 	 * TBD: This check is for faulty timer entries, where the override
@@ -1044,13 +1024,13 @@ void __init mp_override_legacy_irq(u8 bus_irq, u8 polarity, u8 trigger, u32 gsi)
 	if ((bus_irq == 0) && (trigger == 3))
 		trigger = 1;
 
-	mp_irq.mp_type = MP_INTSRC;
-	mp_irq.mp_irqtype = mp_INT;
-	mp_irq.mp_irqflag = (trigger << 2) | polarity;
-	mp_irq.mp_srcbus = MP_ISA_BUS;
-	mp_irq.mp_srcbusirq = bus_irq;	/* IRQ */
-	mp_irq.mp_dstapic = mp_ioapics[ioapic].mp_apicid; /* APIC ID */
-	mp_irq.mp_dstirq = pin;	/* INTIN# */
+	mp_irq.type = MP_INTSRC;
+	mp_irq.irqtype = mp_INT;
+	mp_irq.irqflag = (trigger << 2) | polarity;
+	mp_irq.srcbus = MP_ISA_BUS;
+	mp_irq.srcbusirq = bus_irq;	/* IRQ */
+	mp_irq.dstapic = mp_ioapics[ioapic].apicid; /* APIC ID */
+	mp_irq.dstirq = pin;	/* INTIN# */
 
 	save_mp_irq(&mp_irq);
 }
@@ -1060,7 +1040,7 @@ void __init mp_config_acpi_legacy_irqs(void)
 	int i;
 	int ioapic;
 	unsigned int dstapic;
-	struct mp_config_intsrc mp_irq;
+	struct mpc_intsrc mp_irq;
 
 #if defined (CONFIG_MCA) || defined (CONFIG_EISA)
 	/*
@@ -1085,7 +1065,7 @@ void __init mp_config_acpi_legacy_irqs(void)
 	ioapic = mp_find_ioapic(0);
 	if (ioapic < 0)
 		return;
-	dstapic = mp_ioapics[ioapic].mp_apicid;
+	dstapic = mp_ioapics[ioapic].apicid;
 
 	/*
 	 * Use the default configuration for the IRQs 0-15.  Unless
@@ -1095,16 +1075,14 @@ void __init mp_config_acpi_legacy_irqs(void)
 		int idx;
 
 		for (idx = 0; idx < mp_irq_entries; idx++) {
-			struct mp_config_intsrc *irq = mp_irqs + idx;
+			struct mpc_intsrc *irq = mp_irqs + idx;
 
 			/* Do we already have a mapping for this ISA IRQ? */
-			if (irq->mp_srcbus == MP_ISA_BUS
-			    && irq->mp_srcbusirq == i)
+			if (irq->srcbus == MP_ISA_BUS && irq->srcbusirq == i)
 				break;
 
 			/* Do we already have a mapping for this IOAPIC pin */
-			if (irq->mp_dstapic == dstapic &&
-			    irq->mp_dstirq == i)
+			if (irq->dstapic == dstapic && irq->dstirq == i)
 				break;
 		}
 
@@ -1113,13 +1091,13 @@ void __init mp_config_acpi_legacy_irqs(void)
 			continue;	/* IRQ already used */
 		}
 
-		mp_irq.mp_type = MP_INTSRC;
-		mp_irq.mp_irqflag = 0;	/* Conforming */
-		mp_irq.mp_srcbus = MP_ISA_BUS;
-		mp_irq.mp_dstapic = dstapic;
-		mp_irq.mp_irqtype = mp_INT;
-		mp_irq.mp_srcbusirq = i; /* Identity mapped */
-		mp_irq.mp_dstirq = i;
+		mp_irq.type = MP_INTSRC;
+		mp_irq.irqflag = 0;	/* Conforming */
+		mp_irq.srcbus = MP_ISA_BUS;
+		mp_irq.dstapic = dstapic;
+		mp_irq.irqtype = mp_INT;
+		mp_irq.srcbusirq = i; /* Identity mapped */
+		mp_irq.dstirq = i;
 
 		save_mp_irq(&mp_irq);
 	}
@@ -1156,7 +1134,7 @@ int mp_register_gsi(u32 gsi, int triggering, int polarity)
 		return gsi;
 	}
 
-	ioapic_pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
+	ioapic_pin = mp_find_ioapic_pin(ioapic, gsi);
 
 #ifdef CONFIG_X86_32
 	if (ioapic_renumber_irq)
@@ -1230,22 +1208,22 @@ int mp_config_acpi_gsi(unsigned char number, unsigned int devfn, u8 pin,
 			u32 gsi, int triggering, int polarity)
 {
 #ifdef CONFIG_X86_MPPARSE
-	struct mp_config_intsrc mp_irq;
+	struct mpc_intsrc mp_irq;
 	int ioapic;
 
 	if (!acpi_ioapic)
 		return 0;
 
 	/* print the entry should happen on mptable identically */
-	mp_irq.mp_type = MP_INTSRC;
-	mp_irq.mp_irqtype = mp_INT;
-	mp_irq.mp_irqflag = (triggering == ACPI_EDGE_SENSITIVE ? 4 : 0x0c) |
+	mp_irq.type = MP_INTSRC;
+	mp_irq.irqtype = mp_INT;
+	mp_irq.irqflag = (triggering == ACPI_EDGE_SENSITIVE ? 4 : 0x0c) |
 				(polarity == ACPI_ACTIVE_HIGH ? 1 : 3);
-	mp_irq.mp_srcbus = number;
-	mp_irq.mp_srcbusirq = (((devfn >> 3) & 0x1f) << 2) | ((pin - 1) & 3);
+	mp_irq.srcbus = number;
+	mp_irq.srcbusirq = (((devfn >> 3) & 0x1f) << 2) | ((pin - 1) & 3);
 	ioapic = mp_find_ioapic(gsi);
-	mp_irq.mp_dstapic = mp_ioapic_routing[ioapic].apic_id;
-	mp_irq.mp_dstirq = gsi - mp_ioapic_routing[ioapic].gsi_base;
+	mp_irq.dstapic = mp_ioapic_routing[ioapic].apic_id;
+	mp_irq.dstirq = mp_find_ioapic_pin(ioapic, gsi);
 
 	save_mp_irq(&mp_irq);
 #endif
@@ -1372,7 +1350,7 @@ static void __init acpi_process_madt(void)
 		if (!error) {
 			acpi_lapic = 1;
 
-#ifdef CONFIG_X86_GENERICARCH
+#ifdef CONFIG_X86_BIGSMP
 			generic_bigsmp_probe();
 #endif
 			/*
@@ -1384,9 +1362,8 @@ static void __init acpi_process_madt(void)
 				acpi_ioapic = 1;
 
 				smp_found_config = 1;
-#ifdef CONFIG_X86_32
-				setup_apic_routing();
-#endif
+				if (apic->setup_apic_routing)
+					apic->setup_apic_routing();
 			}
 		}
 		if (error == -EINVAL) {
diff --git a/arch/x86/kernel/acpi/realmode/wakeup.S b/arch/x86/kernel/acpi/realmode/wakeup.S
index 3355973b12ac..580b4e296010 100644
--- a/arch/x86/kernel/acpi/realmode/wakeup.S
+++ b/arch/x86/kernel/acpi/realmode/wakeup.S
@@ -3,8 +3,8 @@
  */
 #include <asm/segment.h>
 #include <asm/msr-index.h>
-#include <asm/page.h>
-#include <asm/pgtable.h>
+#include <asm/page_types.h>
+#include <asm/pgtable_types.h>
 #include <asm/processor-flags.h>
 
 	.code16
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index a60c1f3bcb87..7c243a2c5115 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -101,6 +101,7 @@ int acpi_save_state_mem(void)
 	stack_start.sp = temp_stack + sizeof(temp_stack);
 	early_gdt_descr.address =
 			(unsigned long)get_cpu_gdt_table(smp_processor_id());
+	initial_gs = per_cpu_offset(smp_processor_id());
 #endif
 	initial_code = (unsigned long)wakeup_long64;
 	saved_magic = 0x123456789abcdef0;
diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index a12e6a9fb659..8ded418b0593 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -1,7 +1,7 @@
 	.section .text.page_aligned
 #include <linux/linkage.h>
 #include <asm/segment.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 
 # Copyright 2003, 2008 Pavel Machek <pavel@suse.cz>, distribute under GPLv2
 
diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 96258d9dc974..8ea5164cbd04 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -1,8 +1,8 @@
 .text
 #include <linux/linkage.h>
 #include <asm/segment.h>
-#include <asm/pgtable.h>
-#include <asm/page.h>
+#include <asm/pgtable_types.h>
+#include <asm/page_types.h>
 #include <asm/msr.h>
 #include <asm/asm-offsets.h>
 
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index a84ac7b570e6..6907b8e85d52 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -498,12 +498,12 @@ void *text_poke_early(void *addr, const void *opcode, size_t len)
  */
 void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 {
-	unsigned long flags;
 	char *vaddr;
 	int nr_pages = 2;
 	struct page *pages[2];
 	int i;
 
+	might_sleep();
 	if (!core_kernel_text((unsigned long)addr)) {
 		pages[0] = vmalloc_to_page(addr);
 		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
@@ -517,9 +517,9 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
 		nr_pages = 1;
 	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
 	BUG_ON(!vaddr);
-	local_irq_save(flags);
+	local_irq_disable();
 	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	local_irq_restore(flags);
+	local_irq_enable();
 	vunmap(vaddr);
 	sync_core();
 	/* Could also do a CLFLUSH here to speed up CPU recovery; but
diff --git a/arch/x86/kernel/apic/Makefile b/arch/x86/kernel/apic/Makefile
new file mode 100644
index 000000000000..da7b7b9f8bd8
--- /dev/null
+++ b/arch/x86/kernel/apic/Makefile
@@ -0,0 +1,19 @@
+#
+# Makefile for local APIC drivers and for the IO-APIC code
+#
+
+obj-$(CONFIG_X86_LOCAL_APIC)	+= apic.o probe_$(BITS).o ipi.o nmi.o
+obj-$(CONFIG_X86_IO_APIC)	+= io_apic.o
+obj-$(CONFIG_SMP)		+= ipi.o
+
+ifeq ($(CONFIG_X86_64),y)
+obj-y				+= apic_flat_64.o
+obj-$(CONFIG_X86_X2APIC)	+= x2apic_cluster.o
+obj-$(CONFIG_X86_X2APIC)	+= x2apic_phys.o
+obj-$(CONFIG_X86_UV)		+= x2apic_uv_x.o
+endif
+
+obj-$(CONFIG_X86_BIGSMP)	+= bigsmp_32.o
+obj-$(CONFIG_X86_NUMAQ)		+= numaq_32.o
+obj-$(CONFIG_X86_ES7000)	+= es7000_32.o
+obj-$(CONFIG_X86_SUMMIT)	+= summit_32.o
diff --git a/arch/x86/kernel/apic.c b/arch/x86/kernel/apic/apic.c
index 570f36e44e59..f9cecdfd05c5 100644
--- a/arch/x86/kernel/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1,7 +1,7 @@
 /*
  *	Local APIC handling, local APIC timers
  *
- *	(c) 1999, 2000 Ingo Molnar <mingo@redhat.com>
+ *	(c) 1999, 2000, 2009 Ingo Molnar <mingo@redhat.com>
  *
  *	Fixes
  *	Maciej W. Rozycki	:	Bits for genuine 82489DX APICs;
@@ -14,51 +14,69 @@
  *	Mikael Pettersson	:	PM converted to driver model.
  */
 
-#include <linux/init.h>
-
-#include <linux/mm.h>
-#include <linux/delay.h>
-#include <linux/bootmem.h>
-#include <linux/interrupt.h>
-#include <linux/mc146818rtc.h>
 #include <linux/kernel_stat.h>
-#include <linux/sysdev.h>
-#include <linux/ioport.h>
-#include <linux/cpu.h>
-#include <linux/clockchips.h>
+#include <linux/mc146818rtc.h>
 #include <linux/acpi_pmtmr.h>
+#include <linux/clockchips.h>
+#include <linux/interrupt.h>
+#include <linux/bootmem.h>
+#include <linux/ftrace.h>
+#include <linux/ioport.h>
 #include <linux/module.h>
-#include <linux/dmi.h>
+#include <linux/sysdev.h>
+#include <linux/delay.h>
+#include <linux/timex.h>
 #include <linux/dmar.h>
-#include <linux/ftrace.h>
-#include <linux/smp.h>
+#include <linux/init.h>
+#include <linux/cpu.h>
+#include <linux/dmi.h>
 #include <linux/nmi.h>
-#include <linux/timex.h>
+#include <linux/smp.h>
+#include <linux/mm.h>
 
+#include <asm/pgalloc.h>
 #include <asm/atomic.h>
-#include <asm/mtrr.h>
 #include <asm/mpspec.h>
-#include <asm/desc.h>
-#include <asm/arch_hooks.h>
-#include <asm/hpet.h>
-#include <asm/pgalloc.h>
 #include <asm/i8253.h>
-#include <asm/idle.h>
+#include <asm/i8259.h>
 #include <asm/proto.h>
 #include <asm/apic.h>
-#include <asm/i8259.h>
+#include <asm/desc.h>
+#include <asm/hpet.h>
+#include <asm/idle.h>
+#include <asm/mtrr.h>
 #include <asm/smp.h>
 
-#include <mach_apic.h>
-#include <mach_apicdef.h>
-#include <mach_ipi.h>
+unsigned int num_processors;
+
+unsigned disabled_cpus __cpuinitdata;
+
+/* Processor that is doing the boot up */
+unsigned int boot_cpu_physical_apicid = -1U;
 
 /*
- * Sanity check
+ * The highest APIC ID seen during enumeration.
+ *
+ * This determines the messaging protocol we can use: if all APIC IDs
+ * are in the 0 ... 7 range, then we can use logical addressing which
+ * has some performance advantages (better broadcasting).
+ *
+ * If there's an APIC ID above 8, we use physical addressing.
  */
-#if ((SPURIOUS_APIC_VECTOR & 0x0F) != 0x0F)
-# error SPURIOUS_APIC_VECTOR definition error
-#endif
+unsigned int max_physical_apicid;
+
+/*
+ * Bitmask of physically existing CPUs:
+ */
+physid_mask_t phys_cpu_present_map;
+
+/*
+ * Map cpu index to physical APIC ID
+ */
+DEFINE_EARLY_PER_CPU(u16, x86_cpu_to_apicid, BAD_APICID);
+DEFINE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid, BAD_APICID);
+EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid);
+EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
 
 #ifdef CONFIG_X86_32
 /*
@@ -92,11 +110,7 @@ static __init int setup_apicpmtimer(char *s)
 __setup("apicpmtimer", setup_apicpmtimer);
 #endif
 
-#ifdef CONFIG_X86_64
-#define HAVE_X2APIC
-#endif
-
-#ifdef HAVE_X2APIC
+#ifdef CONFIG_X86_X2APIC
 int x2apic;
 /* x2apic enabled before OS handover */
 static int x2apic_preenabled;
@@ -194,18 +208,13 @@ static int modern_apic(void)
 	return lapic_get_version() >= 0x14;
 }
 
-/*
- * Paravirt kernels also might be using these below ops. So we still
- * use generic apic_read()/apic_write(), which might be pointing to different
- * ops in PARAVIRT case.
- */
-void xapic_wait_icr_idle(void)
+void native_apic_wait_icr_idle(void)
 {
 	while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
 		cpu_relax();
 }
 
-u32 safe_xapic_wait_icr_idle(void)
+u32 native_safe_apic_wait_icr_idle(void)
 {
 	u32 send_status;
 	int timeout;
@@ -221,13 +230,13 @@ u32 safe_xapic_wait_icr_idle(void)
 	return send_status;
 }
 
-void xapic_icr_write(u32 low, u32 id)
+void native_apic_icr_write(u32 low, u32 id)
 {
 	apic_write(APIC_ICR2, SET_APIC_DEST_FIELD(id));
 	apic_write(APIC_ICR, low);
 }
 
-static u64 xapic_icr_read(void)
+u64 native_apic_icr_read(void)
 {
 	u32 icr1, icr2;
 
@@ -237,54 +246,6 @@ static u64 xapic_icr_read(void)
 	return icr1 | ((u64)icr2 << 32);
 }
 
-static struct apic_ops xapic_ops = {
-	.read = native_apic_mem_read,
-	.write = native_apic_mem_write,
-	.icr_read = xapic_icr_read,
-	.icr_write = xapic_icr_write,
-	.wait_icr_idle = xapic_wait_icr_idle,
-	.safe_wait_icr_idle = safe_xapic_wait_icr_idle,
-};
-
-struct apic_ops __read_mostly *apic_ops = &xapic_ops;
-EXPORT_SYMBOL_GPL(apic_ops);
-
-#ifdef HAVE_X2APIC
-static void x2apic_wait_icr_idle(void)
-{
-	/* no need to wait for icr idle in x2apic */
-	return;
-}
-
-static u32 safe_x2apic_wait_icr_idle(void)
-{
-	/* no need to wait for icr idle in x2apic */
-	return 0;
-}
-
-void x2apic_icr_write(u32 low, u32 id)
-{
-	wrmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), ((__u64) id) << 32 | low);
-}
-
-static u64 x2apic_icr_read(void)
-{
-	unsigned long val;
-
-	rdmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), val);
-	return val;
-}
-
-static struct apic_ops x2apic_ops = {
-	.read = native_apic_msr_read,
-	.write = native_apic_msr_write,
-	.icr_read = x2apic_icr_read,
-	.icr_write = x2apic_icr_write,
-	.wait_icr_idle = x2apic_wait_icr_idle,
-	.safe_wait_icr_idle = safe_x2apic_wait_icr_idle,
-};
-#endif
-
 /**
  * enable_NMI_through_LVT0 - enable NMI through local vector table 0
  */
@@ -457,7 +418,7 @@ static void lapic_timer_setup(enum clock_event_mode mode,
 static void lapic_timer_broadcast(const struct cpumask *mask)
 {
 #ifdef CONFIG_SMP
-	send_IPI_mask(mask, LOCAL_TIMER_VECTOR);
+	apic->send_IPI_mask(mask, LOCAL_TIMER_VECTOR);
 #endif
 }
 
@@ -535,7 +496,8 @@ static void __init lapic_cal_handler(struct clock_event_device *dev)
 	}
 }
 
-static int __init calibrate_by_pmtimer(long deltapm, long *delta)
+static int __init
+calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc)
 {
 	const long pm_100ms = PMTMR_TICKS_PER_SEC / 10;
 	const long pm_thresh = pm_100ms / 100;
@@ -546,7 +508,7 @@ static int __init calibrate_by_pmtimer(long deltapm, long *delta)
 	return -1;
 #endif
 
-	apic_printk(APIC_VERBOSE, "... PM timer delta = %ld\n", deltapm);
+	apic_printk(APIC_VERBOSE, "... PM-Timer delta = %ld\n", deltapm);
 
 	/* Check, if the PM timer is available */
 	if (!deltapm)
@@ -556,19 +518,30 @@ static int __init calibrate_by_pmtimer(long deltapm, long *delta)
 
 	if (deltapm > (pm_100ms - pm_thresh) &&
 	    deltapm < (pm_100ms + pm_thresh)) {
-		apic_printk(APIC_VERBOSE, "... PM timer result ok\n");
-	} else {
-		res = (((u64)deltapm) *  mult) >> 22;
-		do_div(res, 1000000);
-		pr_warning("APIC calibration not consistent "
-			"with PM Timer: %ldms instead of 100ms\n",
-			(long)res);
-		/* Correct the lapic counter value */
-		res = (((u64)(*delta)) * pm_100ms);
+		apic_printk(APIC_VERBOSE, "... PM-Timer result ok\n");
+		return 0;
+	}
+
+	res = (((u64)deltapm) *  mult) >> 22;
+	do_div(res, 1000000);
+	pr_warning("APIC calibration not consistent "
+		   "with PM-Timer: %ldms instead of 100ms\n",(long)res);
+
+	/* Correct the lapic counter value */
+	res = (((u64)(*delta)) * pm_100ms);
+	do_div(res, deltapm);
+	pr_info("APIC delta adjusted to PM-Timer: "
+		"%lu (%ld)\n", (unsigned long)res, *delta);
+	*delta = (long)res;
+
+	/* Correct the tsc counter value */
+	if (cpu_has_tsc) {
+		res = (((u64)(*deltatsc)) * pm_100ms);
 		do_div(res, deltapm);
-		pr_info("APIC delta adjusted to PM-Timer: "
-			"%lu (%ld)\n", (unsigned long)res, *delta);
-		*delta = (long)res;
+		apic_printk(APIC_VERBOSE, "TSC delta adjusted to "
+					  "PM-Timer: %lu (%ld) \n",
+					(unsigned long)res, *deltatsc);
+		*deltatsc = (long)res;
 	}
 
 	return 0;
@@ -579,7 +552,7 @@ static int __init calibrate_APIC_clock(void)
 	struct clock_event_device *levt = &__get_cpu_var(lapic_events);
 	void (*real_handler)(struct clock_event_device *dev);
 	unsigned long deltaj;
-	long delta;
+	long delta, deltatsc;
 	int pm_referenced = 0;
 
 	local_irq_disable();
@@ -609,9 +582,11 @@ static int __init calibrate_APIC_clock(void)
 	delta = lapic_cal_t1 - lapic_cal_t2;
 	apic_printk(APIC_VERBOSE, "... lapic delta = %ld\n", delta);
 
+	deltatsc = (long)(lapic_cal_tsc2 - lapic_cal_tsc1);
+
 	/* we trust the PM based calibration if possible */
 	pm_referenced = !calibrate_by_pmtimer(lapic_cal_pm2 - lapic_cal_pm1,
-					&delta);
+					&delta, &deltatsc);
 
 	/* Calculate the scaled math multiplication factor */
 	lapic_clockevent.mult = div_sc(delta, TICK_NSEC * LAPIC_CAL_LOOPS,
@@ -629,11 +604,10 @@ static int __init calibrate_APIC_clock(void)
 		    calibration_result);
 
 	if (cpu_has_tsc) {
-		delta = (long)(lapic_cal_tsc2 - lapic_cal_tsc1);
 		apic_printk(APIC_VERBOSE, "..... CPU clock speed is "
 			    "%ld.%04ld MHz.\n",
-			    (delta / LAPIC_CAL_LOOPS) / (1000000 / HZ),
-			    (delta / LAPIC_CAL_LOOPS) % (1000000 / HZ));
+			    (deltatsc / LAPIC_CAL_LOOPS) / (1000000 / HZ),
+			    (deltatsc / LAPIC_CAL_LOOPS) % (1000000 / HZ));
 	}
 
 	apic_printk(APIC_VERBOSE, "..... host bus clock speed is "
@@ -991,11 +965,11 @@ int __init verify_local_APIC(void)
 	 */
 	reg0 = apic_read(APIC_ID);
 	apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg0);
-	apic_write(APIC_ID, reg0 ^ APIC_ID_MASK);
+	apic_write(APIC_ID, reg0 ^ apic->apic_id_mask);
 	reg1 = apic_read(APIC_ID);
 	apic_printk(APIC_DEBUG, "Getting ID: %x\n", reg1);
 	apic_write(APIC_ID, reg0);
-	if (reg1 != (reg0 ^ APIC_ID_MASK))
+	if (reg1 != (reg0 ^ apic->apic_id_mask))
 		return 0;
 
 	/*
@@ -1089,7 +1063,7 @@ static void __cpuinit lapic_setup_esr(void)
 		return;
 	}
 
-	if (esr_disable) {
+	if (apic->disable_esr) {
 		/*
 		 * Something untraceable is creating bad interrupts on
 		 * secondary quads ... for the moment, just leave the
@@ -1130,9 +1104,14 @@ void __cpuinit setup_local_APIC(void)
 	unsigned int value;
 	int i, j;
 
+	if (disable_apic) {
+		arch_disable_smp_support();
+		return;
+	}
+
 #ifdef CONFIG_X86_32
 	/* Pound the ESR really hard over the head with a big hammer - mbligh */
-	if (lapic_is_integrated() && esr_disable) {
+	if (lapic_is_integrated() && apic->disable_esr) {
 		apic_write(APIC_ESR, 0);
 		apic_write(APIC_ESR, 0);
 		apic_write(APIC_ESR, 0);
@@ -1146,7 +1125,7 @@ void __cpuinit setup_local_APIC(void)
 	 * Double-check whether this APIC is really registered.
 	 * This is meaningless in clustered apic mode, so we skip it.
 	 */
-	if (!apic_id_registered())
+	if (!apic->apic_id_registered())
 		BUG();
 
 	/*
@@ -1154,7 +1133,7 @@ void __cpuinit setup_local_APIC(void)
 	 * an APIC.  See e.g. "AP-388 82489DX User's Manual" (Intel
 	 * document number 292116).  So here it goes...
 	 */
-	init_apic_ldr();
+	apic->init_apic_ldr();
 
 	/*
 	 * Set Task Priority to 'accept all'. We never change this
@@ -1282,17 +1261,12 @@ void __cpuinit end_local_APIC_setup(void)
 	apic_pm_activate();
 }
 
-#ifdef HAVE_X2APIC
+#ifdef CONFIG_X86_X2APIC
 void check_x2apic(void)
 {
-	int msr, msr2;
-
-	rdmsr(MSR_IA32_APICBASE, msr, msr2);
-
-	if (msr & X2APIC_ENABLE) {
+	if (x2apic_enabled()) {
 		pr_info("x2apic enabled by BIOS, switching to x2apic ops\n");
 		x2apic_preenabled = x2apic = 1;
-		apic_ops = &x2apic_ops;
 	}
 }
 
@@ -1300,6 +1274,9 @@ void enable_x2apic(void)
 {
 	int msr, msr2;
 
+	if (!x2apic)
+		return;
+
 	rdmsr(MSR_IA32_APICBASE, msr, msr2);
 	if (!(msr & X2APIC_ENABLE)) {
 		pr_info("Enabling x2apic\n");
@@ -1363,7 +1340,6 @@ void __init enable_IR_x2apic(void)
 
 	if (!x2apic) {
 		x2apic = 1;
-		apic_ops = &x2apic_ops;
 		enable_x2apic();
 	}
 
@@ -1401,7 +1377,7 @@ end:
 
 	return;
 }
-#endif /* HAVE_X2APIC */
+#endif /* CONFIG_X86_X2APIC */
 
 #ifdef CONFIG_X86_64
 /*
@@ -1532,7 +1508,7 @@ void __init early_init_lapic_mapping(void)
  */
 void __init init_apic_mappings(void)
 {
-#ifdef HAVE_X2APIC
+#ifdef CONFIG_X86_X2APIC
 	if (x2apic) {
 		boot_cpu_physical_apicid = read_apic_id();
 		return;
@@ -1570,11 +1546,11 @@ int apic_version[MAX_APICS];
 
 int __init APIC_init_uniprocessor(void)
 {
-#ifdef CONFIG_X86_64
 	if (disable_apic) {
 		pr_info("Apic disabled\n");
 		return -1;
 	}
+#ifdef CONFIG_X86_64
 	if (!cpu_has_apic) {
 		disable_apic = 1;
 		pr_info("Apic disabled by BIOS\n");
@@ -1596,11 +1572,9 @@ int __init APIC_init_uniprocessor(void)
 	}
 #endif
 
-#ifdef HAVE_X2APIC
 	enable_IR_x2apic();
-#endif
 #ifdef CONFIG_X86_64
-	setup_apic_routing();
+	default_setup_apic_routing();
 #endif
 
 	verify_local_APIC();
@@ -1621,35 +1595,31 @@ int __init APIC_init_uniprocessor(void)
 	physid_set_mask_of_physid(boot_cpu_physical_apicid, &phys_cpu_present_map);
 	setup_local_APIC();
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_IO_APIC
 	/*
 	 * Now enable IO-APICs, actually call clear_IO_APIC
-	 * We need clear_IO_APIC before enabling vector on BP
+	 * We need clear_IO_APIC before enabling error vector
 	 */
 	if (!skip_ioapic_setup && nr_ioapics)
 		enable_IO_APIC();
 #endif
 
-#ifdef CONFIG_X86_IO_APIC
-	if (!smp_found_config || skip_ioapic_setup || !nr_ioapics)
-#endif
-		localise_nmi_watchdog();
 	end_local_APIC_setup();
 
 #ifdef CONFIG_X86_IO_APIC
 	if (smp_found_config && !skip_ioapic_setup && nr_ioapics)
 		setup_IO_APIC();
-# ifdef CONFIG_X86_64
-	else
+	else {
 		nr_ioapics = 0;
-# endif
+		localise_nmi_watchdog();
+	}
+#else
+	localise_nmi_watchdog();
 #endif
 
+	setup_boot_clock();
 #ifdef CONFIG_X86_64
-	setup_boot_APIC_clock();
 	check_nmi_watchdog();
-#else
-	setup_boot_clock();
 #endif
 
 	return 0;
@@ -1738,7 +1708,8 @@ void __init connect_bsp_APIC(void)
 		outb(0x01, 0x23);
 	}
 #endif
-	enable_apic_mode();
+	if (apic->enable_apic_mode)
+		apic->enable_apic_mode();
 }
 
 /**
@@ -1876,29 +1847,39 @@ void __cpuinit generic_processor_info(int apicid, int version)
 	}
 #endif
 
-#if defined(CONFIG_X86_SMP) || defined(CONFIG_X86_64)
-	/* are we being called early in kernel startup? */
-	if (early_per_cpu_ptr(x86_cpu_to_apicid)) {
-		u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
-		u16 *bios_cpu_apicid = early_per_cpu_ptr(x86_bios_cpu_apicid);
-
-		cpu_to_apicid[cpu] = apicid;
-		bios_cpu_apicid[cpu] = apicid;
-	} else {
-		per_cpu(x86_cpu_to_apicid, cpu) = apicid;
-		per_cpu(x86_bios_cpu_apicid, cpu) = apicid;
-	}
+#if defined(CONFIG_SMP) || defined(CONFIG_X86_64)
+	early_per_cpu(x86_cpu_to_apicid, cpu) = apicid;
+	early_per_cpu(x86_bios_cpu_apicid, cpu) = apicid;
 #endif
 
 	set_cpu_possible(cpu, true);
 	set_cpu_present(cpu, true);
 }
 
-#ifdef CONFIG_X86_64
 int hard_smp_processor_id(void)
 {
 	return read_apic_id();
 }
+
+void default_init_apic_ldr(void)
+{
+	unsigned long val;
+
+	apic_write(APIC_DFR, APIC_DFR_VALUE);
+	val = apic_read(APIC_LDR) & ~APIC_LDR_MASK;
+	val |= SET_APIC_LOGICAL_ID(1UL << smp_processor_id());
+	apic_write(APIC_LDR, val);
+}
+
+#ifdef CONFIG_X86_32
+int default_apicid_to_node(int logical_apicid)
+{
+#ifdef CONFIG_SMP
+	return apicid_2_node[hard_smp_processor_id()];
+#else
+	return 0;
+#endif
+}
 #endif
 
 /*
@@ -1976,7 +1957,7 @@ static int lapic_resume(struct sys_device *dev)
 
 	local_irq_save(flags);
 
-#ifdef HAVE_X2APIC
+#ifdef CONFIG_X86_X2APIC
 	if (x2apic)
 		enable_x2apic();
 	else
diff --git a/arch/x86/kernel/genapic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 34185488e4fb..f933822dba18 100644
--- a/arch/x86/kernel/genapic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -17,9 +17,8 @@
 #include <linux/init.h>
 #include <linux/hardirq.h>
 #include <asm/smp.h>
+#include <asm/apic.h>
 #include <asm/ipi.h>
-#include <asm/genapic.h>
-#include <mach_apicdef.h>
 
 #ifdef CONFIG_ACPI
 #include <acpi/acpi_bus.h>
@@ -74,7 +73,7 @@ static inline void _flat_send_IPI_mask(unsigned long mask, int vector)
 	unsigned long flags;
 
 	local_irq_save(flags);
-	__send_IPI_dest_field(mask, vector, APIC_DEST_LOGICAL);
+	__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
 	local_irq_restore(flags);
 }
 
@@ -85,14 +84,15 @@ static void flat_send_IPI_mask(const struct cpumask *cpumask, int vector)
 	_flat_send_IPI_mask(mask, vector);
 }
 
-static void flat_send_IPI_mask_allbutself(const struct cpumask *cpumask,
-					  int vector)
+static void
+ flat_send_IPI_mask_allbutself(const struct cpumask *cpumask, int vector)
 {
 	unsigned long mask = cpumask_bits(cpumask)[0];
 	int cpu = smp_processor_id();
 
 	if (cpu < BITS_PER_LONG)
 		clear_bit(cpu, &mask);
+
 	_flat_send_IPI_mask(mask, vector);
 }
 
@@ -114,23 +114,27 @@ static void flat_send_IPI_allbutself(int vector)
 			_flat_send_IPI_mask(mask, vector);
 		}
 	} else if (num_online_cpus() > 1) {
-		__send_IPI_shortcut(APIC_DEST_ALLBUT, vector,APIC_DEST_LOGICAL);
+		__default_send_IPI_shortcut(APIC_DEST_ALLBUT,
+					    vector, apic->dest_logical);
 	}
 }
 
 static void flat_send_IPI_all(int vector)
 {
-	if (vector == NMI_VECTOR)
+	if (vector == NMI_VECTOR) {
 		flat_send_IPI_mask(cpu_online_mask, vector);
-	else
-		__send_IPI_shortcut(APIC_DEST_ALLINC, vector, APIC_DEST_LOGICAL);
+	} else {
+		__default_send_IPI_shortcut(APIC_DEST_ALLINC,
+					    vector, apic->dest_logical);
+	}
 }
 
-static unsigned int get_apic_id(unsigned long x)
+static unsigned int flat_get_apic_id(unsigned long x)
 {
 	unsigned int id;
 
 	id = (((x)>>24) & 0xFFu);
+
 	return id;
 }
 
@@ -146,7 +150,7 @@ static unsigned int read_xapic_id(void)
 {
 	unsigned int id;
 
-	id = get_apic_id(apic_read(APIC_ID));
+	id = flat_get_apic_id(apic_read(APIC_ID));
 	return id;
 }
 
@@ -169,31 +173,67 @@ static unsigned int flat_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
 	return mask1 & mask2;
 }
 
-static unsigned int phys_pkg_id(int index_msb)
+static int flat_phys_pkg_id(int initial_apic_id, int index_msb)
 {
 	return hard_smp_processor_id() >> index_msb;
 }
 
-struct genapic apic_flat =  {
-	.name = "flat",
-	.acpi_madt_oem_check = flat_acpi_madt_oem_check,
-	.int_delivery_mode = dest_LowestPrio,
-	.int_dest_mode = (APIC_DEST_LOGICAL != 0),
-	.target_cpus = flat_target_cpus,
-	.vector_allocation_domain = flat_vector_allocation_domain,
-	.apic_id_registered = flat_apic_id_registered,
-	.init_apic_ldr = flat_init_apic_ldr,
-	.send_IPI_all = flat_send_IPI_all,
-	.send_IPI_allbutself = flat_send_IPI_allbutself,
-	.send_IPI_mask = flat_send_IPI_mask,
-	.send_IPI_mask_allbutself = flat_send_IPI_mask_allbutself,
-	.send_IPI_self = apic_send_IPI_self,
-	.cpu_mask_to_apicid = flat_cpu_mask_to_apicid,
-	.cpu_mask_to_apicid_and = flat_cpu_mask_to_apicid_and,
-	.phys_pkg_id = phys_pkg_id,
-	.get_apic_id = get_apic_id,
-	.set_apic_id = set_apic_id,
-	.apic_id_mask = (0xFFu<<24),
+struct apic apic_flat =  {
+	.name				= "flat",
+	.probe				= NULL,
+	.acpi_madt_oem_check		= flat_acpi_madt_oem_check,
+	.apic_id_registered		= flat_apic_id_registered,
+
+	.irq_delivery_mode		= dest_LowestPrio,
+	.irq_dest_mode			= 1, /* logical */
+
+	.target_cpus			= flat_target_cpus,
+	.disable_esr			= 0,
+	.dest_logical			= APIC_DEST_LOGICAL,
+	.check_apicid_used		= NULL,
+	.check_apicid_present		= NULL,
+
+	.vector_allocation_domain	= flat_vector_allocation_domain,
+	.init_apic_ldr			= flat_init_apic_ldr,
+
+	.ioapic_phys_id_map		= NULL,
+	.setup_apic_routing		= NULL,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= NULL,
+	.cpu_to_logical_apicid		= NULL,
+	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= NULL,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= default_check_phys_apicid_present,
+	.enable_apic_mode		= NULL,
+	.phys_pkg_id			= flat_phys_pkg_id,
+	.mps_oem_check			= NULL,
+
+	.get_apic_id			= flat_get_apic_id,
+	.set_apic_id			= set_apic_id,
+	.apic_id_mask			= 0xFFu << 24,
+
+	.cpu_mask_to_apicid		= flat_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= flat_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= flat_send_IPI_mask,
+	.send_IPI_mask_allbutself	= flat_send_IPI_mask_allbutself,
+	.send_IPI_allbutself		= flat_send_IPI_allbutself,
+	.send_IPI_all			= flat_send_IPI_all,
+	.send_IPI_self			= apic_send_IPI_self,
+
+	.trampoline_phys_low		= DEFAULT_TRAMPOLINE_PHYS_LOW,
+	.trampoline_phys_high		= DEFAULT_TRAMPOLINE_PHYS_HIGH,
+	.wait_for_init_deassert		= NULL,
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= NULL,
+
+	.read				= native_apic_mem_read,
+	.write				= native_apic_mem_write,
+	.icr_read			= native_apic_icr_read,
+	.icr_write			= native_apic_icr_write,
+	.wait_icr_idle			= native_apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 };
 
 /*
@@ -232,18 +272,18 @@ static void physflat_vector_allocation_domain(int cpu, struct cpumask *retmask)
 
 static void physflat_send_IPI_mask(const struct cpumask *cpumask, int vector)
 {
-	send_IPI_mask_sequence(cpumask, vector);
+	default_send_IPI_mask_sequence_phys(cpumask, vector);
 }
 
 static void physflat_send_IPI_mask_allbutself(const struct cpumask *cpumask,
 					      int vector)
 {
-	send_IPI_mask_allbutself(cpumask, vector);
+	default_send_IPI_mask_allbutself_phys(cpumask, vector);
 }
 
 static void physflat_send_IPI_allbutself(int vector)
 {
-	send_IPI_mask_allbutself(cpu_online_mask, vector);
+	default_send_IPI_mask_allbutself_phys(cpu_online_mask, vector);
 }
 
 static void physflat_send_IPI_all(int vector)
@@ -276,32 +316,72 @@ physflat_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	for_each_cpu_and(cpu, cpumask, andmask)
+	for_each_cpu_and(cpu, cpumask, andmask) {
 		if (cpumask_test_cpu(cpu, cpu_online_mask))
 			break;
+	}
 	if (cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_apicid, cpu);
+
 	return BAD_APICID;
 }
 
-struct genapic apic_physflat =  {
-	.name = "physical flat",
-	.acpi_madt_oem_check = physflat_acpi_madt_oem_check,
-	.int_delivery_mode = dest_Fixed,
-	.int_dest_mode = (APIC_DEST_PHYSICAL != 0),
-	.target_cpus = physflat_target_cpus,
-	.vector_allocation_domain = physflat_vector_allocation_domain,
-	.apic_id_registered = flat_apic_id_registered,
-	.init_apic_ldr = flat_init_apic_ldr,/*not needed, but shouldn't hurt*/
-	.send_IPI_all = physflat_send_IPI_all,
-	.send_IPI_allbutself = physflat_send_IPI_allbutself,
-	.send_IPI_mask = physflat_send_IPI_mask,
-	.send_IPI_mask_allbutself = physflat_send_IPI_mask_allbutself,
-	.send_IPI_self = apic_send_IPI_self,
-	.cpu_mask_to_apicid = physflat_cpu_mask_to_apicid,
-	.cpu_mask_to_apicid_and = physflat_cpu_mask_to_apicid_and,
-	.phys_pkg_id = phys_pkg_id,
-	.get_apic_id = get_apic_id,
-	.set_apic_id = set_apic_id,
-	.apic_id_mask = (0xFFu<<24),
+struct apic apic_physflat =  {
+
+	.name				= "physical flat",
+	.probe				= NULL,
+	.acpi_madt_oem_check		= physflat_acpi_madt_oem_check,
+	.apic_id_registered		= flat_apic_id_registered,
+
+	.irq_delivery_mode		= dest_Fixed,
+	.irq_dest_mode			= 0, /* physical */
+
+	.target_cpus			= physflat_target_cpus,
+	.disable_esr			= 0,
+	.dest_logical			= 0,
+	.check_apicid_used		= NULL,
+	.check_apicid_present		= NULL,
+
+	.vector_allocation_domain	= physflat_vector_allocation_domain,
+	/* not needed, but shouldn't hurt: */
+	.init_apic_ldr			= flat_init_apic_ldr,
+
+	.ioapic_phys_id_map		= NULL,
+	.setup_apic_routing		= NULL,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= NULL,
+	.cpu_to_logical_apicid		= NULL,
+	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= NULL,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= default_check_phys_apicid_present,
+	.enable_apic_mode		= NULL,
+	.phys_pkg_id			= flat_phys_pkg_id,
+	.mps_oem_check			= NULL,
+
+	.get_apic_id			= flat_get_apic_id,
+	.set_apic_id			= set_apic_id,
+	.apic_id_mask			= 0xFFu << 24,
+
+	.cpu_mask_to_apicid		= physflat_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= physflat_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= physflat_send_IPI_mask,
+	.send_IPI_mask_allbutself	= physflat_send_IPI_mask_allbutself,
+	.send_IPI_allbutself		= physflat_send_IPI_allbutself,
+	.send_IPI_all			= physflat_send_IPI_all,
+	.send_IPI_self			= apic_send_IPI_self,
+
+	.trampoline_phys_low		= DEFAULT_TRAMPOLINE_PHYS_LOW,
+	.trampoline_phys_high		= DEFAULT_TRAMPOLINE_PHYS_HIGH,
+	.wait_for_init_deassert		= NULL,
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= NULL,
+
+	.read				= native_apic_mem_read,
+	.write				= native_apic_mem_write,
+	.icr_read			= native_apic_icr_read,
+	.icr_write			= native_apic_icr_write,
+	.wait_icr_idle			= native_apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
 };
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
new file mode 100644
index 000000000000..d806ecaa948f
--- /dev/null
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -0,0 +1,267 @@
+/*
+ * APIC driver for "bigsmp" xAPIC machines with more than 8 virtual CPUs.
+ *
+ * Drives the local APIC in "clustered mode".
+ */
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/dmi.h>
+#include <linux/smp.h>
+
+#include <asm/apicdef.h>
+#include <asm/fixmap.h>
+#include <asm/mpspec.h>
+#include <asm/apic.h>
+#include <asm/ipi.h>
+
+static unsigned bigsmp_get_apic_id(unsigned long x)
+{
+	return (x >> 24) & 0xFF;
+}
+
+static int bigsmp_apic_id_registered(void)
+{
+	return 1;
+}
+
+static const cpumask_t *bigsmp_target_cpus(void)
+{
+#ifdef CONFIG_SMP
+	return &cpu_online_map;
+#else
+	return &cpumask_of_cpu(0);
+#endif
+}
+
+static unsigned long bigsmp_check_apicid_used(physid_mask_t bitmap, int apicid)
+{
+	return 0;
+}
+
+static unsigned long bigsmp_check_apicid_present(int bit)
+{
+	return 1;
+}
+
+static inline unsigned long calculate_ldr(int cpu)
+{
+	unsigned long val, id;
+
+	val = apic_read(APIC_LDR) & ~APIC_LDR_MASK;
+	id = per_cpu(x86_bios_cpu_apicid, cpu);
+	val |= SET_APIC_LOGICAL_ID(id);
+
+	return val;
+}
+
+/*
+ * Set up the logical destination ID.
+ *
+ * Intel recommends to set DFR, LDR and TPR before enabling
+ * an APIC.  See e.g. "AP-388 82489DX User's Manual" (Intel
+ * document number 292116).  So here it goes...
+ */
+static void bigsmp_init_apic_ldr(void)
+{
+	unsigned long val;
+	int cpu = smp_processor_id();
+
+	apic_write(APIC_DFR, APIC_DFR_FLAT);
+	val = calculate_ldr(cpu);
+	apic_write(APIC_LDR, val);
+}
+
+static void bigsmp_setup_apic_routing(void)
+{
+	printk(KERN_INFO
+		"Enabling APIC mode:  Physflat.  Using %d I/O APICs\n",
+		nr_ioapics);
+}
+
+static int bigsmp_apicid_to_node(int logical_apicid)
+{
+	return apicid_2_node[hard_smp_processor_id()];
+}
+
+static int bigsmp_cpu_present_to_apicid(int mps_cpu)
+{
+	if (mps_cpu < nr_cpu_ids)
+		return (int) per_cpu(x86_bios_cpu_apicid, mps_cpu);
+
+	return BAD_APICID;
+}
+
+static physid_mask_t bigsmp_apicid_to_cpu_present(int phys_apicid)
+{
+	return physid_mask_of_physid(phys_apicid);
+}
+
+/* Mapping from cpu number to logical apicid */
+static inline int bigsmp_cpu_to_logical_apicid(int cpu)
+{
+	if (cpu >= nr_cpu_ids)
+		return BAD_APICID;
+	return cpu_physical_id(cpu);
+}
+
+static physid_mask_t bigsmp_ioapic_phys_id_map(physid_mask_t phys_map)
+{
+	/* For clustered we don't have a good way to do this yet - hack */
+	return physids_promote(0xFFL);
+}
+
+static int bigsmp_check_phys_apicid_present(int boot_cpu_physical_apicid)
+{
+	return 1;
+}
+
+/* As we are using single CPU as destination, pick only one CPU here */
+static unsigned int bigsmp_cpu_mask_to_apicid(const cpumask_t *cpumask)
+{
+	return bigsmp_cpu_to_logical_apicid(first_cpu(*cpumask));
+}
+
+static unsigned int bigsmp_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+			      const struct cpumask *andmask)
+{
+	int cpu;
+
+	/*
+	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
+	 * May as well be the first.
+	 */
+	for_each_cpu_and(cpu, cpumask, andmask) {
+		if (cpumask_test_cpu(cpu, cpu_online_mask))
+			break;
+	}
+	if (cpu < nr_cpu_ids)
+		return bigsmp_cpu_to_logical_apicid(cpu);
+
+	return BAD_APICID;
+}
+
+static int bigsmp_phys_pkg_id(int cpuid_apic, int index_msb)
+{
+	return cpuid_apic >> index_msb;
+}
+
+static inline void bigsmp_send_IPI_mask(const struct cpumask *mask, int vector)
+{
+	default_send_IPI_mask_sequence_phys(mask, vector);
+}
+
+static void bigsmp_send_IPI_allbutself(int vector)
+{
+	default_send_IPI_mask_allbutself_phys(cpu_online_mask, vector);
+}
+
+static void bigsmp_send_IPI_all(int vector)
+{
+	bigsmp_send_IPI_mask(cpu_online_mask, vector);
+}
+
+static int dmi_bigsmp; /* can be set by dmi scanners */
+
+static int hp_ht_bigsmp(const struct dmi_system_id *d)
+{
+	printk(KERN_NOTICE "%s detected: force use of apic=bigsmp\n", d->ident);
+	dmi_bigsmp = 1;
+
+	return 0;
+}
+
+
+static const struct dmi_system_id bigsmp_dmi_table[] = {
+	{ hp_ht_bigsmp, "HP ProLiant DL760 G2",
+		{	DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
+			DMI_MATCH(DMI_BIOS_VERSION, "P44-"),
+		}
+	},
+
+	{ hp_ht_bigsmp, "HP ProLiant DL740",
+		{	DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
+			DMI_MATCH(DMI_BIOS_VERSION, "P47-"),
+		}
+	},
+	{ } /* NULL entry stops DMI scanning */
+};
+
+static void bigsmp_vector_allocation_domain(int cpu, cpumask_t *retmask)
+{
+	cpus_clear(*retmask);
+	cpu_set(cpu, *retmask);
+}
+
+static int probe_bigsmp(void)
+{
+	if (def_to_bigsmp)
+		dmi_bigsmp = 1;
+	else
+		dmi_check_system(bigsmp_dmi_table);
+
+	return dmi_bigsmp;
+}
+
+struct apic apic_bigsmp = {
+
+	.name				= "bigsmp",
+	.probe				= probe_bigsmp,
+	.acpi_madt_oem_check		= NULL,
+	.apic_id_registered		= bigsmp_apic_id_registered,
+
+	.irq_delivery_mode		= dest_Fixed,
+	/* phys delivery to target CPU: */
+	.irq_dest_mode			= 0,
+
+	.target_cpus			= bigsmp_target_cpus,
+	.disable_esr			= 1,
+	.dest_logical			= 0,
+	.check_apicid_used		= bigsmp_check_apicid_used,
+	.check_apicid_present		= bigsmp_check_apicid_present,
+
+	.vector_allocation_domain	= bigsmp_vector_allocation_domain,
+	.init_apic_ldr			= bigsmp_init_apic_ldr,
+
+	.ioapic_phys_id_map		= bigsmp_ioapic_phys_id_map,
+	.setup_apic_routing		= bigsmp_setup_apic_routing,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= bigsmp_apicid_to_node,
+	.cpu_to_logical_apicid		= bigsmp_cpu_to_logical_apicid,
+	.cpu_present_to_apicid		= bigsmp_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= bigsmp_apicid_to_cpu_present,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= bigsmp_check_phys_apicid_present,
+	.enable_apic_mode		= NULL,
+	.phys_pkg_id			= bigsmp_phys_pkg_id,
+	.mps_oem_check			= NULL,
+
+	.get_apic_id			= bigsmp_get_apic_id,
+	.set_apic_id			= NULL,
+	.apic_id_mask			= 0xFF << 24,
+
+	.cpu_mask_to_apicid		= bigsmp_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= bigsmp_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= bigsmp_send_IPI_mask,
+	.send_IPI_mask_allbutself	= NULL,
+	.send_IPI_allbutself		= bigsmp_send_IPI_allbutself,
+	.send_IPI_all			= bigsmp_send_IPI_all,
+	.send_IPI_self			= default_send_IPI_self,
+
+	.trampoline_phys_low		= DEFAULT_TRAMPOLINE_PHYS_LOW,
+	.trampoline_phys_high		= DEFAULT_TRAMPOLINE_PHYS_HIGH,
+
+	.wait_for_init_deassert		= default_wait_for_init_deassert,
+
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= default_inquire_remote_apic,
+
+	.read				= native_apic_mem_read,
+	.write				= native_apic_mem_write,
+	.icr_read			= native_apic_icr_read,
+	.icr_write			= native_apic_icr_write,
+	.wait_icr_idle			= native_apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
+};
diff --git a/arch/x86/kernel/apic/es7000_32.c b/arch/x86/kernel/apic/es7000_32.c
new file mode 100644
index 000000000000..19588f2770ee
--- /dev/null
+++ b/arch/x86/kernel/apic/es7000_32.c
@@ -0,0 +1,780 @@
+/*
+ * Written by: Garry Forsgren, Unisys Corporation
+ *             Natalie Protasevich, Unisys Corporation
+ *
+ * This file contains the code to configure and interface
+ * with Unisys ES7000 series hardware system manager.
+ *
+ * Copyright (c) 2003 Unisys Corporation.
+ * Copyright (C) 2009, Red Hat, Inc., Ingo Molnar
+ *
+ *   All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston MA 02111-1307, USA.
+ *
+ * Contact information: Unisys Corporation, Township Line & Union Meeting
+ * Roads-A, Unisys Way, Blue Bell, Pennsylvania, 19424, or:
+ *
+ * http://www.unisys.com
+ */
+#include <linux/notifier.h>
+#include <linux/spinlock.h>
+#include <linux/cpumask.h>
+#include <linux/threads.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/reboot.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/acpi.h>
+#include <linux/init.h>
+#include <linux/nmi.h>
+#include <linux/smp.h>
+#include <linux/io.h>
+
+#include <asm/apicdef.h>
+#include <asm/atomic.h>
+#include <asm/fixmap.h>
+#include <asm/mpspec.h>
+#include <asm/setup.h>
+#include <asm/apic.h>
+#include <asm/ipi.h>
+
+/*
+ * ES7000 chipsets
+ */
+
+#define NON_UNISYS			0
+#define ES7000_CLASSIC			1
+#define ES7000_ZORRO			2
+
+#define	MIP_REG				1
+#define	MIP_PSAI_REG			4
+
+#define	MIP_BUSY			1
+#define	MIP_SPIN			0xf0000
+#define	MIP_VALID			0x0100000000000000ULL
+#define	MIP_SW_APIC			0x1020b
+
+#define	MIP_PORT(val)			((val >> 32) & 0xffff)
+
+#define	MIP_RD_LO(val)			(val & 0xffffffff)
+
+struct mip_reg {
+	unsigned long long		off_0x00;
+	unsigned long long		off_0x08;
+	unsigned long long		off_0x10;
+	unsigned long long		off_0x18;
+	unsigned long long		off_0x20;
+	unsigned long long		off_0x28;
+	unsigned long long		off_0x30;
+	unsigned long long		off_0x38;
+};
+
+struct mip_reg_info {
+	unsigned long long		mip_info;
+	unsigned long long		delivery_info;
+	unsigned long long		host_reg;
+	unsigned long long		mip_reg;
+};
+
+struct psai {
+	unsigned long long		entry_type;
+	unsigned long long		addr;
+	unsigned long long		bep_addr;
+};
+
+#ifdef CONFIG_ACPI
+
+struct es7000_oem_table {
+	struct acpi_table_header	Header;
+	u32				OEMTableAddr;
+	u32				OEMTableSize;
+};
+
+static unsigned long			oem_addrX;
+static unsigned long			oem_size;
+
+#endif
+
+/*
+ * ES7000 Globals
+ */
+
+static volatile unsigned long		*psai;
+static struct mip_reg			*mip_reg;
+static struct mip_reg			*host_reg;
+static int 				mip_port;
+static unsigned long			mip_addr;
+static unsigned long			host_addr;
+
+int					es7000_plat;
+
+/*
+ * GSI override for ES7000 platforms.
+ */
+
+static unsigned int			base;
+
+static int
+es7000_rename_gsi(int ioapic, int gsi)
+{
+	if (es7000_plat == ES7000_ZORRO)
+		return gsi;
+
+	if (!base) {
+		int i;
+		for (i = 0; i < nr_ioapics; i++)
+			base += nr_ioapic_registers[i];
+	}
+
+	if (!ioapic && (gsi < 16))
+		gsi += base;
+
+	return gsi;
+}
+
+static int wakeup_secondary_cpu_via_mip(int cpu, unsigned long eip)
+{
+	unsigned long vect = 0, psaival = 0;
+
+	if (psai == NULL)
+		return -1;
+
+	vect = ((unsigned long)__pa(eip)/0x1000) << 16;
+	psaival = (0x1000000 | vect | cpu);
+
+	while (*psai & 0x1000000)
+		;
+
+	*psai = psaival;
+
+	return 0;
+}
+
+static int es7000_apic_is_cluster(void)
+{
+	/* MPENTIUMIII */
+	if (boot_cpu_data.x86 == 6 &&
+	    (boot_cpu_data.x86_model >= 7 || boot_cpu_data.x86_model <= 11))
+		return 1;
+
+	return 0;
+}
+
+static void setup_unisys(void)
+{
+	/*
+	 * Determine the generation of the ES7000 currently running.
+	 *
+	 * es7000_plat = 1 if the machine is a 5xx ES7000 box
+	 * es7000_plat = 2 if the machine is a x86_64 ES7000 box
+	 *
+	 */
+	if (!(boot_cpu_data.x86 <= 15 && boot_cpu_data.x86_model <= 2))
+		es7000_plat = ES7000_ZORRO;
+	else
+		es7000_plat = ES7000_CLASSIC;
+	ioapic_renumber_irq = es7000_rename_gsi;
+}
+
+/*
+ * Parse the OEM Table:
+ */
+static int parse_unisys_oem(char *oemptr)
+{
+	int			i;
+	int 			success = 0;
+	unsigned char		type, size;
+	unsigned long		val;
+	char			*tp = NULL;
+	struct psai		*psaip = NULL;
+	struct mip_reg_info 	*mi;
+	struct mip_reg		*host, *mip;
+
+	tp = oemptr;
+
+	tp += 8;
+
+	for (i = 0; i <= 6; i++) {
+		type = *tp++;
+		size = *tp++;
+		tp -= 2;
+		switch (type) {
+		case MIP_REG:
+			mi = (struct mip_reg_info *)tp;
+			val = MIP_RD_LO(mi->host_reg);
+			host_addr = val;
+			host = (struct mip_reg *)val;
+			host_reg = __va(host);
+			val = MIP_RD_LO(mi->mip_reg);
+			mip_port = MIP_PORT(mi->mip_info);
+			mip_addr = val;
+			mip = (struct mip_reg *)val;
+			mip_reg = __va(mip);
+			pr_debug("es7000_mipcfg: host_reg = 0x%lx \n",
+				 (unsigned long)host_reg);
+			pr_debug("es7000_mipcfg: mip_reg = 0x%lx \n",
+				 (unsigned long)mip_reg);
+			success++;
+			break;
+		case MIP_PSAI_REG:
+			psaip = (struct psai *)tp;
+			if (tp != NULL) {
+				if (psaip->addr)
+					psai = __va(psaip->addr);
+				else
+					psai = NULL;
+				success++;
+			}
+			break;
+		default:
+			break;
+		}
+		tp += size;
+	}
+
+	if (success < 2)
+		es7000_plat = NON_UNISYS;
+	else
+		setup_unisys();
+
+	return es7000_plat;
+}
+
+#ifdef CONFIG_ACPI
+static int find_unisys_acpi_oem_table(unsigned long *oem_addr)
+{
+	struct acpi_table_header *header = NULL;
+	struct es7000_oem_table *table;
+	acpi_size tbl_size;
+	acpi_status ret;
+	int i = 0;
+
+	for (;;) {
+		ret = acpi_get_table_with_size("OEM1", i++, &header, &tbl_size);
+		if (!ACPI_SUCCESS(ret))
+			return -1;
+
+		if (!memcmp((char *) &header->oem_id, "UNISYS", 6))
+			break;
+
+		early_acpi_os_unmap_memory(header, tbl_size);
+	}
+
+	table = (void *)header;
+
+	oem_addrX	= table->OEMTableAddr;
+	oem_size	= table->OEMTableSize;
+
+	early_acpi_os_unmap_memory(header, tbl_size);
+
+	*oem_addr	= (unsigned long)__acpi_map_table(oem_addrX, oem_size);
+
+	return 0;
+}
+
+static void unmap_unisys_acpi_oem_table(unsigned long oem_addr)
+{
+	if (!oem_addr)
+		return;
+
+	__acpi_unmap_table((char *)oem_addr, oem_size);
+}
+
+static int es7000_check_dsdt(void)
+{
+	struct acpi_table_header header;
+
+	if (ACPI_SUCCESS(acpi_get_table_header(ACPI_SIG_DSDT, 0, &header)) &&
+	    !strncmp(header.oem_id, "UNISYS", 6))
+		return 1;
+	return 0;
+}
+
+static int es7000_acpi_ret;
+
+/* Hook from generic ACPI tables.c */
+static int es7000_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+	unsigned long oem_addr = 0;
+	int check_dsdt;
+	int ret = 0;
+
+	/* check dsdt at first to avoid clear fix_map for oem_addr */
+	check_dsdt = es7000_check_dsdt();
+
+	if (!find_unisys_acpi_oem_table(&oem_addr)) {
+		if (check_dsdt) {
+			ret = parse_unisys_oem((char *)oem_addr);
+		} else {
+			setup_unisys();
+			ret = 1;
+		}
+		/*
+		 * we need to unmap it
+		 */
+		unmap_unisys_acpi_oem_table(oem_addr);
+	}
+
+	es7000_acpi_ret = ret;
+
+	return ret && !es7000_apic_is_cluster();
+}
+
+static int es7000_acpi_madt_oem_check_cluster(char *oem_id, char *oem_table_id)
+{
+	int ret = es7000_acpi_ret;
+
+	return ret && es7000_apic_is_cluster();
+}
+
+#else /* !CONFIG_ACPI: */
+static int es7000_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+	return 0;
+}
+
+static int es7000_acpi_madt_oem_check_cluster(char *oem_id, char *oem_table_id)
+{
+	return 0;
+}
+#endif /* !CONFIG_ACPI */
+
+static void es7000_spin(int n)
+{
+	int i = 0;
+
+	while (i++ < n)
+		rep_nop();
+}
+
+static int es7000_mip_write(struct mip_reg *mip_reg)
+{
+	int status = 0;
+	int spin;
+
+	spin = MIP_SPIN;
+	while ((host_reg->off_0x38 & MIP_VALID) != 0) {
+		if (--spin <= 0) {
+			WARN(1,	"Timeout waiting for Host Valid Flag\n");
+			return -1;
+		}
+		es7000_spin(MIP_SPIN);
+	}
+
+	memcpy(host_reg, mip_reg, sizeof(struct mip_reg));
+	outb(1, mip_port);
+
+	spin = MIP_SPIN;
+
+	while ((mip_reg->off_0x38 & MIP_VALID) == 0) {
+		if (--spin <= 0) {
+			WARN(1,	"Timeout waiting for MIP Valid Flag\n");
+			return -1;
+		}
+		es7000_spin(MIP_SPIN);
+	}
+
+	status = (mip_reg->off_0x00 & 0xffff0000000000ULL) >> 48;
+	mip_reg->off_0x38 &= ~MIP_VALID;
+
+	return status;
+}
+
+static void es7000_enable_apic_mode(void)
+{
+	struct mip_reg es7000_mip_reg;
+	int mip_status;
+
+	if (!es7000_plat)
+		return;
+
+	printk(KERN_INFO "ES7000: Enabling APIC mode.\n");
+	memset(&es7000_mip_reg, 0, sizeof(struct mip_reg));
+	es7000_mip_reg.off_0x00 = MIP_SW_APIC;
+	es7000_mip_reg.off_0x38 = MIP_VALID;
+
+	while ((mip_status = es7000_mip_write(&es7000_mip_reg)) != 0)
+		WARN(1, "Command failed, status = %x\n", mip_status);
+}
+
+static void es7000_vector_allocation_domain(int cpu, cpumask_t *retmask)
+{
+	/* Careful. Some cpus do not strictly honor the set of cpus
+	 * specified in the interrupt destination when using lowest
+	 * priority interrupt delivery mode.
+	 *
+	 * In particular there was a hyperthreading cpu observed to
+	 * deliver interrupts to the wrong hyperthread when only one
+	 * hyperthread was specified in the interrupt desitination.
+	 */
+	*retmask = (cpumask_t){ { [0] = APIC_ALL_CPUS, } };
+}
+
+
+static void es7000_wait_for_init_deassert(atomic_t *deassert)
+{
+	while (!atomic_read(deassert))
+		cpu_relax();
+}
+
+static unsigned int es7000_get_apic_id(unsigned long x)
+{
+	return (x >> 24) & 0xFF;
+}
+
+static void es7000_send_IPI_mask(const struct cpumask *mask, int vector)
+{
+	default_send_IPI_mask_sequence_phys(mask, vector);
+}
+
+static void es7000_send_IPI_allbutself(int vector)
+{
+	default_send_IPI_mask_allbutself_phys(cpu_online_mask, vector);
+}
+
+static void es7000_send_IPI_all(int vector)
+{
+	es7000_send_IPI_mask(cpu_online_mask, vector);
+}
+
+static int es7000_apic_id_registered(void)
+{
+	return 1;
+}
+
+static const cpumask_t *target_cpus_cluster(void)
+{
+	return &CPU_MASK_ALL;
+}
+
+static const cpumask_t *es7000_target_cpus(void)
+{
+	return &cpumask_of_cpu(smp_processor_id());
+}
+
+static unsigned long
+es7000_check_apicid_used(physid_mask_t bitmap, int apicid)
+{
+	return 0;
+}
+static unsigned long es7000_check_apicid_present(int bit)
+{
+	return physid_isset(bit, phys_cpu_present_map);
+}
+
+static unsigned long calculate_ldr(int cpu)
+{
+	unsigned long id = per_cpu(x86_bios_cpu_apicid, cpu);
+
+	return SET_APIC_LOGICAL_ID(id);
+}
+
+/*
+ * Set up the logical destination ID.
+ *
+ * Intel recommends to set DFR, LdR and TPR before enabling
+ * an APIC.  See e.g. "AP-388 82489DX User's Manual" (Intel
+ * document number 292116).  So here it goes...
+ */
+static void es7000_init_apic_ldr_cluster(void)
+{
+	unsigned long val;
+	int cpu = smp_processor_id();
+
+	apic_write(APIC_DFR, APIC_DFR_CLUSTER);
+	val = calculate_ldr(cpu);
+	apic_write(APIC_LDR, val);
+}
+
+static void es7000_init_apic_ldr(void)
+{
+	unsigned long val;
+	int cpu = smp_processor_id();
+
+	apic_write(APIC_DFR, APIC_DFR_FLAT);
+	val = calculate_ldr(cpu);
+	apic_write(APIC_LDR, val);
+}
+
+static void es7000_setup_apic_routing(void)
+{
+	int apic = per_cpu(x86_bios_cpu_apicid, smp_processor_id());
+
+	printk(KERN_INFO
+	  "Enabling APIC mode:  %s. Using %d I/O APICs, target cpus %lx\n",
+		(apic_version[apic] == 0x14) ?
+			"Physical Cluster" : "Logical Cluster",
+		nr_ioapics, cpus_addr(*es7000_target_cpus())[0]);
+}
+
+static int es7000_apicid_to_node(int logical_apicid)
+{
+	return 0;
+}
+
+
+static int es7000_cpu_present_to_apicid(int mps_cpu)
+{
+	if (!mps_cpu)
+		return boot_cpu_physical_apicid;
+	else if (mps_cpu < nr_cpu_ids)
+		return per_cpu(x86_bios_cpu_apicid, mps_cpu);
+	else
+		return BAD_APICID;
+}
+
+static int cpu_id;
+
+static physid_mask_t es7000_apicid_to_cpu_present(int phys_apicid)
+{
+	physid_mask_t mask;
+
+	mask = physid_mask_of_physid(cpu_id);
+	++cpu_id;
+
+	return mask;
+}
+
+/* Mapping from cpu number to logical apicid */
+static int es7000_cpu_to_logical_apicid(int cpu)
+{
+#ifdef CONFIG_SMP
+	if (cpu >= nr_cpu_ids)
+		return BAD_APICID;
+	return cpu_2_logical_apicid[cpu];
+#else
+	return logical_smp_processor_id();
+#endif
+}
+
+static physid_mask_t es7000_ioapic_phys_id_map(physid_mask_t phys_map)
+{
+	/* For clustered we don't have a good way to do this yet - hack */
+	return physids_promote(0xff);
+}
+
+static int es7000_check_phys_apicid_present(int cpu_physical_apicid)
+{
+	boot_cpu_physical_apicid = read_apic_id();
+	return 1;
+}
+
+static unsigned int es7000_cpu_mask_to_apicid(const cpumask_t *cpumask)
+{
+	unsigned int round = 0;
+	int cpu, uninitialized_var(apicid);
+
+	/*
+	 * The cpus in the mask must all be on the apic cluster.
+	 */
+	for_each_cpu(cpu, cpumask) {
+		int new_apicid = es7000_cpu_to_logical_apicid(cpu);
+
+		if (round && APIC_CLUSTER(apicid) != APIC_CLUSTER(new_apicid)) {
+			WARN(1, "Not a valid mask!");
+
+			return BAD_APICID;
+		}
+		apicid = new_apicid;
+		round++;
+	}
+	return apicid;
+}
+
+static unsigned int
+es7000_cpu_mask_to_apicid_and(const struct cpumask *inmask,
+			      const struct cpumask *andmask)
+{
+	int apicid = es7000_cpu_to_logical_apicid(0);
+	cpumask_var_t cpumask;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+		return apicid;
+
+	cpumask_and(cpumask, inmask, andmask);
+	cpumask_and(cpumask, cpumask, cpu_online_mask);
+	apicid = es7000_cpu_mask_to_apicid(cpumask);
+
+	free_cpumask_var(cpumask);
+
+	return apicid;
+}
+
+static int es7000_phys_pkg_id(int cpuid_apic, int index_msb)
+{
+	return cpuid_apic >> index_msb;
+}
+
+static int probe_es7000(void)
+{
+	/* probed later in mptable/ACPI hooks */
+	return 0;
+}
+
+static int es7000_mps_ret;
+static int es7000_mps_oem_check(struct mpc_table *mpc, char *oem,
+		char *productid)
+{
+	int ret = 0;
+
+	if (mpc->oemptr) {
+		struct mpc_oemtable *oem_table =
+			(struct mpc_oemtable *)mpc->oemptr;
+
+		if (!strncmp(oem, "UNISYS", 6))
+			ret = parse_unisys_oem((char *)oem_table);
+	}
+
+	es7000_mps_ret = ret;
+
+	return ret && !es7000_apic_is_cluster();
+}
+
+static int es7000_mps_oem_check_cluster(struct mpc_table *mpc, char *oem,
+		char *productid)
+{
+	int ret = es7000_mps_ret;
+
+	return ret && es7000_apic_is_cluster();
+}
+
+struct apic apic_es7000_cluster = {
+
+	.name				= "es7000",
+	.probe				= probe_es7000,
+	.acpi_madt_oem_check		= es7000_acpi_madt_oem_check_cluster,
+	.apic_id_registered		= es7000_apic_id_registered,
+
+	.irq_delivery_mode		= dest_LowestPrio,
+	/* logical delivery broadcast to all procs: */
+	.irq_dest_mode			= 1,
+
+	.target_cpus			= target_cpus_cluster,
+	.disable_esr			= 1,
+	.dest_logical			= 0,
+	.check_apicid_used		= es7000_check_apicid_used,
+	.check_apicid_present		= es7000_check_apicid_present,
+
+	.vector_allocation_domain	= es7000_vector_allocation_domain,
+	.init_apic_ldr			= es7000_init_apic_ldr_cluster,
+
+	.ioapic_phys_id_map		= es7000_ioapic_phys_id_map,
+	.setup_apic_routing		= es7000_setup_apic_routing,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= es7000_apicid_to_node,
+	.cpu_to_logical_apicid		= es7000_cpu_to_logical_apicid,
+	.cpu_present_to_apicid		= es7000_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= es7000_apicid_to_cpu_present,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= es7000_check_phys_apicid_present,
+	.enable_apic_mode		= es7000_enable_apic_mode,
+	.phys_pkg_id			= es7000_phys_pkg_id,
+	.mps_oem_check			= es7000_mps_oem_check_cluster,
+
+	.get_apic_id			= es7000_get_apic_id,
+	.set_apic_id			= NULL,
+	.apic_id_mask			= 0xFF << 24,
+
+	.cpu_mask_to_apicid		= es7000_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= es7000_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= es7000_send_IPI_mask,
+	.send_IPI_mask_allbutself	= NULL,
+	.send_IPI_allbutself		= es7000_send_IPI_allbutself,
+	.send_IPI_all			= es7000_send_IPI_all,
+	.send_IPI_self			= default_send_IPI_self,
+
+	.wakeup_secondary_cpu		= wakeup_secondary_cpu_via_mip,
+
+	.trampoline_phys_low		= 0x467,
+	.trampoline_phys_high		= 0x469,
+
+	.wait_for_init_deassert		= NULL,
+
+	/* Nothing to do for most platforms, since cleared by the INIT cycle: */
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= default_inquire_remote_apic,
+
+	.read				= native_apic_mem_read,
+	.write				= native_apic_mem_write,
+	.icr_read			= native_apic_icr_read,
+	.icr_write			= native_apic_icr_write,
+	.wait_icr_idle			= native_apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
+};
+
+struct apic apic_es7000 = {
+
+	.name				= "es7000",
+	.probe				= probe_es7000,
+	.acpi_madt_oem_check		= es7000_acpi_madt_oem_check,
+	.apic_id_registered		= es7000_apic_id_registered,
+
+	.irq_delivery_mode		= dest_Fixed,
+	/* phys delivery to target CPUs: */
+	.irq_dest_mode			= 0,
+
+	.target_cpus			= es7000_target_cpus,
+	.disable_esr			= 1,
+	.dest_logical			= 0,
+	.check_apicid_used		= es7000_check_apicid_used,
+	.check_apicid_present		= es7000_check_apicid_present,
+
+	.vector_allocation_domain	= es7000_vector_allocation_domain,
+	.init_apic_ldr			= es7000_init_apic_ldr,
+
+	.ioapic_phys_id_map		= es7000_ioapic_phys_id_map,
+	.setup_apic_routing		= es7000_setup_apic_routing,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= es7000_apicid_to_node,
+	.cpu_to_logical_apicid		= es7000_cpu_to_logical_apicid,
+	.cpu_present_to_apicid		= es7000_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= es7000_apicid_to_cpu_present,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= es7000_check_phys_apicid_present,
+	.enable_apic_mode		= es7000_enable_apic_mode,
+	.phys_pkg_id			= es7000_phys_pkg_id,
+	.mps_oem_check			= es7000_mps_oem_check,
+
+	.get_apic_id			= es7000_get_apic_id,
+	.set_apic_id			= NULL,
+	.apic_id_mask			= 0xFF << 24,
+
+	.cpu_mask_to_apicid		= es7000_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= es7000_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= es7000_send_IPI_mask,
+	.send_IPI_mask_allbutself	= NULL,
+	.send_IPI_allbutself		= es7000_send_IPI_allbutself,
+	.send_IPI_all			= es7000_send_IPI_all,
+	.send_IPI_self			= default_send_IPI_self,
+
+	.trampoline_phys_low		= 0x467,
+	.trampoline_phys_high		= 0x469,
+
+	.wait_for_init_deassert		= es7000_wait_for_init_deassert,
+
+	/* Nothing to do for most platforms, since cleared by the INIT cycle: */
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= default_inquire_remote_apic,
+
+	.read				= native_apic_mem_read,
+	.write				= native_apic_mem_write,
+	.icr_read			= native_apic_icr_read,
+	.icr_write			= native_apic_icr_write,
+	.wait_icr_idle			= native_apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
+};
diff --git a/arch/x86/kernel/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index bc7ac4da90d7..00e6071cefc4 100644
--- a/arch/x86/kernel/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1,7 +1,7 @@
 /*
  *	Intel IO-APIC support for multi-Pentium hosts.
  *
- *	Copyright (C) 1997, 1998, 1999, 2000 Ingo Molnar, Hajnalka Szabo
+ *	Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo
  *
  *	Many thanks to Stig Venaas for trying out countless experimental
  *	patches and reporting/debugging problems patiently!
@@ -46,6 +46,7 @@
 #include <asm/idle.h>
 #include <asm/io.h>
 #include <asm/smp.h>
+#include <asm/cpu.h>
 #include <asm/desc.h>
 #include <asm/proto.h>
 #include <asm/acpi.h>
@@ -61,9 +62,7 @@
 #include <asm/uv/uv_hub.h>
 #include <asm/uv/uv_irq.h>
 
-#include <mach_ipi.h>
-#include <mach_apic.h>
-#include <mach_apicdef.h>
+#include <asm/apic.h>
 
 #define __apicdebuginit(type) static type __init
 
@@ -82,11 +81,11 @@ static DEFINE_SPINLOCK(vector_lock);
 int nr_ioapic_registers[MAX_IO_APICS];
 
 /* I/O APIC entries */
-struct mp_config_ioapic mp_ioapics[MAX_IO_APICS];
+struct mpc_ioapic mp_ioapics[MAX_IO_APICS];
 int nr_ioapics;
 
 /* MP IRQ source entries */
-struct mp_config_intsrc mp_irqs[MAX_IRQ_SOURCES];
+struct mpc_intsrc mp_irqs[MAX_IRQ_SOURCES];
 
 /* # of MP IRQ source entries */
 int mp_irq_entries;
@@ -99,10 +98,19 @@ DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);
 
 int skip_ioapic_setup;
 
+void arch_disable_smp_support(void)
+{
+#ifdef CONFIG_PCI
+	noioapicquirk = 1;
+	noioapicreroute = -1;
+#endif
+	skip_ioapic_setup = 1;
+}
+
 static int __init parse_noapic(char *str)
 {
 	/* disable IO-APIC */
-	disable_ioapic_setup();
+	arch_disable_smp_support();
 	return 0;
 }
 early_param("noapic", parse_noapic);
@@ -356,7 +364,7 @@ set_extra_move_desc(struct irq_desc *desc, const struct cpumask *mask)
 
 	if (!cfg->move_in_progress) {
 		/* it means that domain is not changed */
-		if (!cpumask_intersects(&desc->affinity, mask))
+		if (!cpumask_intersects(desc->affinity, mask))
 			cfg->move_desc_pending = 1;
 	}
 }
@@ -386,7 +394,7 @@ struct io_apic {
 static __attribute_const__ struct io_apic __iomem *io_apic_base(int idx)
 {
 	return (void __iomem *) __fix_to_virt(FIX_IO_APIC_BASE_0 + idx)
-		+ (mp_ioapics[idx].mp_apicaddr & ~PAGE_MASK);
+		+ (mp_ioapics[idx].apicaddr & ~PAGE_MASK);
 }
 
 static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
@@ -478,7 +486,7 @@ __ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
 	io_apic_write(apic, 0x10 + 2*pin, eu.w1);
 }
 
-static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
+void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
 {
 	unsigned long flags;
 	spin_lock_irqsave(&ioapic_lock, flags);
@@ -513,11 +521,11 @@ static void send_cleanup_vector(struct irq_cfg *cfg)
 		for_each_cpu_and(i, cfg->old_domain, cpu_online_mask)
 			cfg->move_cleanup_count++;
 		for_each_cpu_and(i, cfg->old_domain, cpu_online_mask)
-			send_IPI_mask(cpumask_of(i), IRQ_MOVE_CLEANUP_VECTOR);
+			apic->send_IPI_mask(cpumask_of(i), IRQ_MOVE_CLEANUP_VECTOR);
 	} else {
 		cpumask_and(cleanup_mask, cfg->old_domain, cpu_online_mask);
 		cfg->move_cleanup_count = cpumask_weight(cleanup_mask);
-		send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
+		apic->send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
 		free_cpumask_var(cleanup_mask);
 	}
 	cfg->move_in_progress = 0;
@@ -562,8 +570,9 @@ static int
 assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask);
 
 /*
- * Either sets desc->affinity to a valid value, and returns cpu_mask_to_apicid
- * of that, or returns BAD_APICID and leaves desc->affinity untouched.
+ * Either sets desc->affinity to a valid value, and returns
+ * ->cpu_mask_to_apicid of that, or returns BAD_APICID and
+ * leaves desc->affinity untouched.
  */
 static unsigned int
 set_desc_affinity(struct irq_desc *desc, const struct cpumask *mask)
@@ -579,9 +588,10 @@ set_desc_affinity(struct irq_desc *desc, const struct cpumask *mask)
 	if (assign_irq_vector(irq, cfg, mask))
 		return BAD_APICID;
 
-	cpumask_and(&desc->affinity, cfg->domain, mask);
+	cpumask_and(desc->affinity, cfg->domain, mask);
 	set_extra_move_desc(desc, mask);
-	return cpu_mask_to_apicid_and(&desc->affinity, cpu_online_mask);
+
+	return apic->cpu_mask_to_apicid_and(desc->affinity, cpu_online_mask);
 }
 
 static void
@@ -796,23 +806,6 @@ static void clear_IO_APIC (void)
 			clear_IO_APIC_pin(apic, pin);
 }
 
-#if !defined(CONFIG_SMP) && defined(CONFIG_X86_32)
-void send_IPI_self(int vector)
-{
-	unsigned int cfg;
-
-	/*
-	 * Wait for idle.
-	 */
-	apic_wait_icr_idle();
-	cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;
-	/*
-	 * Send the IPI. The write to APIC_ICR fires this off.
-	 */
-	apic_write(APIC_ICR, cfg);
-}
-#endif /* !CONFIG_SMP && CONFIG_X86_32*/
-
 #ifdef CONFIG_X86_32
 /*
  * support for broken MP BIOSs, enables hand-redirection of PIRQ0-7 to
@@ -820,8 +813,9 @@ void send_IPI_self(int vector)
  */
 
 #define MAX_PIRQS 8
-static int pirq_entries [MAX_PIRQS];
-static int pirqs_enabled;
+static int pirq_entries[MAX_PIRQS] = {
+	[0 ... MAX_PIRQS - 1] = -1
+};
 
 static int __init ioapic_pirq_setup(char *str)
 {
@@ -830,10 +824,6 @@ static int __init ioapic_pirq_setup(char *str)
 
 	get_options(str, ARRAY_SIZE(ints), ints);
 
-	for (i = 0; i < MAX_PIRQS; i++)
-		pirq_entries[i] = -1;
-
-	pirqs_enabled = 1;
 	apic_printk(APIC_VERBOSE, KERN_INFO
 			"PIRQ redirection, working around broken MP-BIOS.\n");
 	max = MAX_PIRQS;
@@ -944,10 +934,10 @@ static int find_irq_entry(int apic, int pin, int type)
 	int i;
 
 	for (i = 0; i < mp_irq_entries; i++)
-		if (mp_irqs[i].mp_irqtype == type &&
-		    (mp_irqs[i].mp_dstapic == mp_ioapics[apic].mp_apicid ||
-		     mp_irqs[i].mp_dstapic == MP_APIC_ALL) &&
-		    mp_irqs[i].mp_dstirq == pin)
+		if (mp_irqs[i].irqtype == type &&
+		    (mp_irqs[i].dstapic == mp_ioapics[apic].apicid ||
+		     mp_irqs[i].dstapic == MP_APIC_ALL) &&
+		    mp_irqs[i].dstirq == pin)
 			return i;
 
 	return -1;
@@ -961,13 +951,13 @@ static int __init find_isa_irq_pin(int irq, int type)
 	int i;
 
 	for (i = 0; i < mp_irq_entries; i++) {
-		int lbus = mp_irqs[i].mp_srcbus;
+		int lbus = mp_irqs[i].srcbus;
 
 		if (test_bit(lbus, mp_bus_not_pci) &&
-		    (mp_irqs[i].mp_irqtype == type) &&
-		    (mp_irqs[i].mp_srcbusirq == irq))
+		    (mp_irqs[i].irqtype == type) &&
+		    (mp_irqs[i].srcbusirq == irq))
 
-			return mp_irqs[i].mp_dstirq;
+			return mp_irqs[i].dstirq;
 	}
 	return -1;
 }
@@ -977,17 +967,17 @@ static int __init find_isa_irq_apic(int irq, int type)
 	int i;
 
 	for (i = 0; i < mp_irq_entries; i++) {
-		int lbus = mp_irqs[i].mp_srcbus;
+		int lbus = mp_irqs[i].srcbus;
 
 		if (test_bit(lbus, mp_bus_not_pci) &&
-		    (mp_irqs[i].mp_irqtype == type) &&
-		    (mp_irqs[i].mp_srcbusirq == irq))
+		    (mp_irqs[i].irqtype == type) &&
+		    (mp_irqs[i].srcbusirq == irq))
 			break;
 	}
 	if (i < mp_irq_entries) {
 		int apic;
 		for(apic = 0; apic < nr_ioapics; apic++) {
-			if (mp_ioapics[apic].mp_apicid == mp_irqs[i].mp_dstapic)
+			if (mp_ioapics[apic].apicid == mp_irqs[i].dstapic)
 				return apic;
 		}
 	}
@@ -1012,23 +1002,23 @@ int IO_APIC_get_PCI_irq_vector(int bus, int slot, int pin)
 		return -1;
 	}
 	for (i = 0; i < mp_irq_entries; i++) {
-		int lbus = mp_irqs[i].mp_srcbus;
+		int lbus = mp_irqs[i].srcbus;
 
 		for (apic = 0; apic < nr_ioapics; apic++)
-			if (mp_ioapics[apic].mp_apicid == mp_irqs[i].mp_dstapic ||
-			    mp_irqs[i].mp_dstapic == MP_APIC_ALL)
+			if (mp_ioapics[apic].apicid == mp_irqs[i].dstapic ||
+			    mp_irqs[i].dstapic == MP_APIC_ALL)
 				break;
 
 		if (!test_bit(lbus, mp_bus_not_pci) &&
-		    !mp_irqs[i].mp_irqtype &&
+		    !mp_irqs[i].irqtype &&
 		    (bus == lbus) &&
-		    (slot == ((mp_irqs[i].mp_srcbusirq >> 2) & 0x1f))) {
-			int irq = pin_2_irq(i,apic,mp_irqs[i].mp_dstirq);
+		    (slot == ((mp_irqs[i].srcbusirq >> 2) & 0x1f))) {
+			int irq = pin_2_irq(i, apic, mp_irqs[i].dstirq);
 
 			if (!(apic || IO_APIC_IRQ(irq)))
 				continue;
 
-			if (pin == (mp_irqs[i].mp_srcbusirq & 3))
+			if (pin == (mp_irqs[i].srcbusirq & 3))
 				return irq;
 			/*
 			 * Use the first all-but-pin matching entry as a
@@ -1071,7 +1061,7 @@ static int EISA_ELCR(unsigned int irq)
  * EISA conforming in the MP table, that means its trigger type must
  * be read in from the ELCR */
 
-#define default_EISA_trigger(idx)	(EISA_ELCR(mp_irqs[idx].mp_srcbusirq))
+#define default_EISA_trigger(idx)	(EISA_ELCR(mp_irqs[idx].srcbusirq))
 #define default_EISA_polarity(idx)	default_ISA_polarity(idx)
 
 /* PCI interrupts are always polarity one level triggered,
@@ -1088,13 +1078,13 @@ static int EISA_ELCR(unsigned int irq)
 
 static int MPBIOS_polarity(int idx)
 {
-	int bus = mp_irqs[idx].mp_srcbus;
+	int bus = mp_irqs[idx].srcbus;
 	int polarity;
 
 	/*
 	 * Determine IRQ line polarity (high active or low active):
 	 */
-	switch (mp_irqs[idx].mp_irqflag & 3)
+	switch (mp_irqs[idx].irqflag & 3)
 	{
 		case 0: /* conforms, ie. bus-type dependent polarity */
 			if (test_bit(bus, mp_bus_not_pci))
@@ -1130,13 +1120,13 @@ static int MPBIOS_polarity(int idx)
 
 static int MPBIOS_trigger(int idx)
 {
-	int bus = mp_irqs[idx].mp_srcbus;
+	int bus = mp_irqs[idx].srcbus;
 	int trigger;
 
 	/*
 	 * Determine IRQ trigger mode (edge or level sensitive):
 	 */
-	switch ((mp_irqs[idx].mp_irqflag>>2) & 3)
+	switch ((mp_irqs[idx].irqflag>>2) & 3)
 	{
 		case 0: /* conforms, ie. bus-type dependent */
 			if (test_bit(bus, mp_bus_not_pci))
@@ -1214,16 +1204,16 @@ int (*ioapic_renumber_irq)(int ioapic, int irq);
 static int pin_2_irq(int idx, int apic, int pin)
 {
 	int irq, i;
-	int bus = mp_irqs[idx].mp_srcbus;
+	int bus = mp_irqs[idx].srcbus;
 
 	/*
 	 * Debugging check, we are in big trouble if this message pops up!
 	 */
-	if (mp_irqs[idx].mp_dstirq != pin)
+	if (mp_irqs[idx].dstirq != pin)
 		printk(KERN_ERR "broken BIOS or MPTABLE parser, ayiee!!\n");
 
 	if (test_bit(bus, mp_bus_not_pci)) {
-		irq = mp_irqs[idx].mp_srcbusirq;
+		irq = mp_irqs[idx].srcbusirq;
 	} else {
 		/*
 		 * PCI IRQs are mapped in order
@@ -1315,7 +1305,7 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
 		int new_cpu;
 		int vector, offset;
 
-		vector_allocation_domain(cpu, tmp_mask);
+		apic->vector_allocation_domain(cpu, tmp_mask);
 
 		vector = current_vector;
 		offset = current_offset;
@@ -1485,10 +1475,10 @@ static void ioapic_register_intr(int irq, struct irq_desc *desc, unsigned long t
 					      handle_edge_irq, "edge");
 }
 
-static int setup_ioapic_entry(int apic, int irq,
-			      struct IO_APIC_route_entry *entry,
-			      unsigned int destination, int trigger,
-			      int polarity, int vector)
+int setup_ioapic_entry(int apic_id, int irq,
+		       struct IO_APIC_route_entry *entry,
+		       unsigned int destination, int trigger,
+		       int polarity, int vector)
 {
 	/*
 	 * add it to the IO-APIC irq-routing table:
@@ -1497,25 +1487,25 @@ static int setup_ioapic_entry(int apic, int irq,
 
 #ifdef CONFIG_INTR_REMAP
 	if (intr_remapping_enabled) {
-		struct intel_iommu *iommu = map_ioapic_to_ir(apic);
+		struct intel_iommu *iommu = map_ioapic_to_ir(apic_id);
 		struct irte irte;
 		struct IR_IO_APIC_route_entry *ir_entry =
 			(struct IR_IO_APIC_route_entry *) entry;
 		int index;
 
 		if (!iommu)
-			panic("No mapping iommu for ioapic %d\n", apic);
+			panic("No mapping iommu for ioapic %d\n", apic_id);
 
 		index = alloc_irte(iommu, irq, 1);
 		if (index < 0)
-			panic("Failed to allocate IRTE for ioapic %d\n", apic);
+			panic("Failed to allocate IRTE for ioapic %d\n", apic_id);
 
 		memset(&irte, 0, sizeof(irte));
 
 		irte.present = 1;
-		irte.dst_mode = INT_DEST_MODE;
+		irte.dst_mode = apic->irq_dest_mode;
 		irte.trigger_mode = trigger;
-		irte.dlvry_mode = INT_DELIVERY_MODE;
+		irte.dlvry_mode = apic->irq_delivery_mode;
 		irte.vector = vector;
 		irte.dest_id = IRTE_DEST(destination);
 
@@ -1528,8 +1518,8 @@ static int setup_ioapic_entry(int apic, int irq,
 	} else
 #endif
 	{
-		entry->delivery_mode = INT_DELIVERY_MODE;
-		entry->dest_mode = INT_DEST_MODE;
+		entry->delivery_mode = apic->irq_delivery_mode;
+		entry->dest_mode = apic->irq_dest_mode;
 		entry->dest = destination;
 	}
 
@@ -1546,7 +1536,7 @@ static int setup_ioapic_entry(int apic, int irq,
 	return 0;
 }
 
-static void setup_IO_APIC_irq(int apic, int pin, unsigned int irq, struct irq_desc *desc,
+static void setup_IO_APIC_irq(int apic_id, int pin, unsigned int irq, struct irq_desc *desc,
 			      int trigger, int polarity)
 {
 	struct irq_cfg *cfg;
@@ -1558,22 +1548,22 @@ static void setup_IO_APIC_irq(int apic, int pin, unsigned int irq, struct irq_de
 
 	cfg = desc->chip_data;
 
-	if (assign_irq_vector(irq, cfg, TARGET_CPUS))
+	if (assign_irq_vector(irq, cfg, apic->target_cpus()))
 		return;
 
-	dest = cpu_mask_to_apicid_and(cfg->domain, TARGET_CPUS);
+	dest = apic->cpu_mask_to_apicid_and(cfg->domain, apic->target_cpus());
 
 	apic_printk(APIC_VERBOSE,KERN_DEBUG
 		    "IOAPIC[%d]: Set routing entry (%d-%d -> 0x%x -> "
 		    "IRQ %d Mode:%i Active:%i)\n",
-		    apic, mp_ioapics[apic].mp_apicid, pin, cfg->vector,
+		    apic_id, mp_ioapics[apic_id].apicid, pin, cfg->vector,
 		    irq, trigger, polarity);
 
 
-	if (setup_ioapic_entry(mp_ioapics[apic].mp_apicid, irq, &entry,
+	if (setup_ioapic_entry(mp_ioapics[apic_id].apicid, irq, &entry,
 			       dest, trigger, polarity, cfg->vector)) {
 		printk("Failed to setup ioapic entry for ioapic  %d, pin %d\n",
-		       mp_ioapics[apic].mp_apicid, pin);
+		       mp_ioapics[apic_id].apicid, pin);
 		__clear_irq_vector(irq, cfg);
 		return;
 	}
@@ -1582,12 +1572,12 @@ static void setup_IO_APIC_irq(int apic, int pin, unsigned int irq, struct irq_de
 	if (irq < NR_IRQS_LEGACY)
 		disable_8259A_irq(irq);
 
-	ioapic_write_entry(apic, pin, entry);
+	ioapic_write_entry(apic_id, pin, entry);
 }
 
 static void __init setup_IO_APIC_irqs(void)
 {
-	int apic, pin, idx, irq;
+	int apic_id, pin, idx, irq;
 	int notcon = 0;
 	struct irq_desc *desc;
 	struct irq_cfg *cfg;
@@ -1595,21 +1585,19 @@ static void __init setup_IO_APIC_irqs(void)
 
 	apic_printk(APIC_VERBOSE, KERN_DEBUG "init IO_APIC IRQs\n");
 
-	for (apic = 0; apic < nr_ioapics; apic++) {
-		for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
+	for (apic_id = 0; apic_id < nr_ioapics; apic_id++) {
+		for (pin = 0; pin < nr_ioapic_registers[apic_id]; pin++) {
 
-			idx = find_irq_entry(apic, pin, mp_INT);
+			idx = find_irq_entry(apic_id, pin, mp_INT);
 			if (idx == -1) {
 				if (!notcon) {
 					notcon = 1;
 					apic_printk(APIC_VERBOSE,
 						KERN_DEBUG " %d-%d",
-						mp_ioapics[apic].mp_apicid,
-						pin);
+						mp_ioapics[apic_id].apicid, pin);
 				} else
 					apic_printk(APIC_VERBOSE, " %d-%d",
-						mp_ioapics[apic].mp_apicid,
-						pin);
+						mp_ioapics[apic_id].apicid, pin);
 				continue;
 			}
 			if (notcon) {
@@ -1618,20 +1606,25 @@ static void __init setup_IO_APIC_irqs(void)
 				notcon = 0;
 			}
 
-			irq = pin_2_irq(idx, apic, pin);
-#ifdef CONFIG_X86_32
-			if (multi_timer_check(apic, irq))
+			irq = pin_2_irq(idx, apic_id, pin);
+
+			/*
+			 * Skip the timer IRQ if there's a quirk handler
+			 * installed and if it returns 1:
+			 */
+			if (apic->multi_timer_check &&
+					apic->multi_timer_check(apic_id, irq))
 				continue;
-#endif
+
 			desc = irq_to_desc_alloc_cpu(irq, cpu);
 			if (!desc) {
 				printk(KERN_INFO "can not get irq_desc for %d\n", irq);
 				continue;
 			}
 			cfg = desc->chip_data;
-			add_pin_to_irq_cpu(cfg, cpu, apic, pin);
+			add_pin_to_irq_cpu(cfg, cpu, apic_id, pin);
 
-			setup_IO_APIC_irq(apic, pin, irq, desc,
+			setup_IO_APIC_irq(apic_id, pin, irq, desc,
 					irq_trigger(idx), irq_polarity(idx));
 		}
 	}
@@ -1644,7 +1637,7 @@ static void __init setup_IO_APIC_irqs(void)
 /*
  * Set up the timer pin, possibly with the 8259A-master behind.
  */
-static void __init setup_timer_IRQ0_pin(unsigned int apic, unsigned int pin,
+static void __init setup_timer_IRQ0_pin(unsigned int apic_id, unsigned int pin,
 					int vector)
 {
 	struct IO_APIC_route_entry entry;
@@ -1660,10 +1653,10 @@ static void __init setup_timer_IRQ0_pin(unsigned int apic, unsigned int pin,
 	 * We use logical delivery to get the timer IRQ
 	 * to the first CPU.
 	 */
-	entry.dest_mode = INT_DEST_MODE;
-	entry.mask = 1;					/* mask IRQ now */
-	entry.dest = cpu_mask_to_apicid(TARGET_CPUS);
-	entry.delivery_mode = INT_DELIVERY_MODE;
+	entry.dest_mode = apic->irq_dest_mode;
+	entry.mask = 0;			/* don't mask IRQ for edge */
+	entry.dest = apic->cpu_mask_to_apicid(apic->target_cpus());
+	entry.delivery_mode = apic->irq_delivery_mode;
 	entry.polarity = 0;
 	entry.trigger = 0;
 	entry.vector = vector;
@@ -1677,7 +1670,7 @@ static void __init setup_timer_IRQ0_pin(unsigned int apic, unsigned int pin,
 	/*
 	 * Add it to the IO-APIC irq-routing table:
 	 */
-	ioapic_write_entry(apic, pin, entry);
+	ioapic_write_entry(apic_id, pin, entry);
 }
 
 
@@ -1699,7 +1692,7 @@ __apicdebuginit(void) print_IO_APIC(void)
 	printk(KERN_DEBUG "number of MP IRQ sources: %d.\n", mp_irq_entries);
 	for (i = 0; i < nr_ioapics; i++)
 		printk(KERN_DEBUG "number of IO-APIC #%d registers: %d.\n",
-		       mp_ioapics[i].mp_apicid, nr_ioapic_registers[i]);
+		       mp_ioapics[i].apicid, nr_ioapic_registers[i]);
 
 	/*
 	 * We are a bit conservative about what we expect.  We have to
@@ -1719,7 +1712,7 @@ __apicdebuginit(void) print_IO_APIC(void)
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 
 	printk("\n");
-	printk(KERN_DEBUG "IO APIC #%d......\n", mp_ioapics[apic].mp_apicid);
+	printk(KERN_DEBUG "IO APIC #%d......\n", mp_ioapics[apic].apicid);
 	printk(KERN_DEBUG ".... register #00: %08X\n", reg_00.raw);
 	printk(KERN_DEBUG ".......    : physical APIC id: %02X\n", reg_00.bits.ID);
 	printk(KERN_DEBUG ".......    : Delivery Type: %X\n", reg_00.bits.delivery_type);
@@ -1980,13 +1973,6 @@ void __init enable_IO_APIC(void)
 	int apic;
 	unsigned long flags;
 
-#ifdef CONFIG_X86_32
-	int i;
-	if (!pirqs_enabled)
-		for (i = 0; i < MAX_PIRQS; i++)
-			pirq_entries[i] = -1;
-#endif
-
 	/*
 	 * The number of IO-APIC IRQ registers (== #pins):
 	 */
@@ -2090,7 +2076,7 @@ static void __init setup_ioapic_ids_from_mpc(void)
 {
 	union IO_APIC_reg_00 reg_00;
 	physid_mask_t phys_id_present_map;
-	int apic;
+	int apic_id;
 	int i;
 	unsigned char old_id;
 	unsigned long flags;
@@ -2109,26 +2095,26 @@ static void __init setup_ioapic_ids_from_mpc(void)
 	 * This is broken; anything with a real cpu count has to
 	 * circumvent this idiocy regardless.
 	 */
-	phys_id_present_map = ioapic_phys_id_map(phys_cpu_present_map);
+	phys_id_present_map = apic->ioapic_phys_id_map(phys_cpu_present_map);
 
 	/*
 	 * Set the IOAPIC ID to the value stored in the MPC table.
 	 */
-	for (apic = 0; apic < nr_ioapics; apic++) {
+	for (apic_id = 0; apic_id < nr_ioapics; apic_id++) {
 
 		/* Read the register 0 value */
 		spin_lock_irqsave(&ioapic_lock, flags);
-		reg_00.raw = io_apic_read(apic, 0);
+		reg_00.raw = io_apic_read(apic_id, 0);
 		spin_unlock_irqrestore(&ioapic_lock, flags);
 
-		old_id = mp_ioapics[apic].mp_apicid;
+		old_id = mp_ioapics[apic_id].apicid;
 
-		if (mp_ioapics[apic].mp_apicid >= get_physical_broadcast()) {
+		if (mp_ioapics[apic_id].apicid >= get_physical_broadcast()) {
 			printk(KERN_ERR "BIOS bug, IO-APIC#%d ID is %d in the MPC table!...\n",
-				apic, mp_ioapics[apic].mp_apicid);
+				apic_id, mp_ioapics[apic_id].apicid);
 			printk(KERN_ERR "... fixing up to %d. (tell your hw vendor)\n",
 				reg_00.bits.ID);
-			mp_ioapics[apic].mp_apicid = reg_00.bits.ID;
+			mp_ioapics[apic_id].apicid = reg_00.bits.ID;
 		}
 
 		/*
@@ -2136,10 +2122,10 @@ static void __init setup_ioapic_ids_from_mpc(void)
 		 * system must have a unique ID or we get lots of nice
 		 * 'stuck on smp_invalidate_needed IPI wait' messages.
 		 */
-		if (check_apicid_used(phys_id_present_map,
-					mp_ioapics[apic].mp_apicid)) {
+		if (apic->check_apicid_used(phys_id_present_map,
+					mp_ioapics[apic_id].apicid)) {
 			printk(KERN_ERR "BIOS bug, IO-APIC#%d ID %d is already used!...\n",
-				apic, mp_ioapics[apic].mp_apicid);
+				apic_id, mp_ioapics[apic_id].apicid);
 			for (i = 0; i < get_physical_broadcast(); i++)
 				if (!physid_isset(i, phys_id_present_map))
 					break;
@@ -2148,13 +2134,13 @@ static void __init setup_ioapic_ids_from_mpc(void)
 			printk(KERN_ERR "... fixing up to %d. (tell your hw vendor)\n",
 				i);
 			physid_set(i, phys_id_present_map);
-			mp_ioapics[apic].mp_apicid = i;
+			mp_ioapics[apic_id].apicid = i;
 		} else {
 			physid_mask_t tmp;
-			tmp = apicid_to_cpu_present(mp_ioapics[apic].mp_apicid);
+			tmp = apic->apicid_to_cpu_present(mp_ioapics[apic_id].apicid);
 			apic_printk(APIC_VERBOSE, "Setting %d in the "
 					"phys_id_present_map\n",
-					mp_ioapics[apic].mp_apicid);
+					mp_ioapics[apic_id].apicid);
 			physids_or(phys_id_present_map, phys_id_present_map, tmp);
 		}
 
@@ -2163,11 +2149,11 @@ static void __init setup_ioapic_ids_from_mpc(void)
 		 * We need to adjust the IRQ routing table
 		 * if the ID changed.
 		 */
-		if (old_id != mp_ioapics[apic].mp_apicid)
+		if (old_id != mp_ioapics[apic_id].apicid)
 			for (i = 0; i < mp_irq_entries; i++)
-				if (mp_irqs[i].mp_dstapic == old_id)
-					mp_irqs[i].mp_dstapic
-						= mp_ioapics[apic].mp_apicid;
+				if (mp_irqs[i].dstapic == old_id)
+					mp_irqs[i].dstapic
+						= mp_ioapics[apic_id].apicid;
 
 		/*
 		 * Read the right value from the MPC table and
@@ -2175,20 +2161,20 @@ static void __init setup_ioapic_ids_from_mpc(void)
 		 */
 		apic_printk(APIC_VERBOSE, KERN_INFO
 			"...changing IO-APIC physical APIC ID to %d ...",
-			mp_ioapics[apic].mp_apicid);
+			mp_ioapics[apic_id].apicid);
 
-		reg_00.bits.ID = mp_ioapics[apic].mp_apicid;
+		reg_00.bits.ID = mp_ioapics[apic_id].apicid;
 		spin_lock_irqsave(&ioapic_lock, flags);
-		io_apic_write(apic, 0, reg_00.raw);
+		io_apic_write(apic_id, 0, reg_00.raw);
 		spin_unlock_irqrestore(&ioapic_lock, flags);
 
 		/*
 		 * Sanity check
 		 */
 		spin_lock_irqsave(&ioapic_lock, flags);
-		reg_00.raw = io_apic_read(apic, 0);
+		reg_00.raw = io_apic_read(apic_id, 0);
 		spin_unlock_irqrestore(&ioapic_lock, flags);
-		if (reg_00.bits.ID != mp_ioapics[apic].mp_apicid)
+		if (reg_00.bits.ID != mp_ioapics[apic_id].apicid)
 			printk("could not set ID!\n");
 		else
 			apic_printk(APIC_VERBOSE, " ok.\n");
@@ -2291,7 +2277,7 @@ static int ioapic_retrigger_irq(unsigned int irq)
 	unsigned long flags;
 
 	spin_lock_irqsave(&vector_lock, flags);
-	send_IPI_mask(cpumask_of(cpumask_first(cfg->domain)), cfg->vector);
+	apic->send_IPI_mask(cpumask_of(cpumask_first(cfg->domain)), cfg->vector);
 	spin_unlock_irqrestore(&vector_lock, flags);
 
 	return 1;
@@ -2299,7 +2285,7 @@ static int ioapic_retrigger_irq(unsigned int irq)
 #else
 static int ioapic_retrigger_irq(unsigned int irq)
 {
-	send_IPI_self(irq_cfg(irq)->vector);
+	apic->send_IPI_self(irq_cfg(irq)->vector);
 
 	return 1;
 }
@@ -2363,7 +2349,7 @@ migrate_ioapic_irq_desc(struct irq_desc *desc, const struct cpumask *mask)
 
 	set_extra_move_desc(desc, mask);
 
-	dest = cpu_mask_to_apicid_and(cfg->domain, mask);
+	dest = apic->cpu_mask_to_apicid_and(cfg->domain, mask);
 
 	modify_ioapic_rte = desc->status & IRQ_LEVEL;
 	if (modify_ioapic_rte) {
@@ -2383,7 +2369,7 @@ migrate_ioapic_irq_desc(struct irq_desc *desc, const struct cpumask *mask)
 	if (cfg->move_in_progress)
 		send_cleanup_vector(cfg);
 
-	cpumask_copy(&desc->affinity, mask);
+	cpumask_copy(desc->affinity, mask);
 }
 
 static int migrate_irq_remapped_level_desc(struct irq_desc *desc)
@@ -2405,11 +2391,11 @@ static int migrate_irq_remapped_level_desc(struct irq_desc *desc)
 	}
 
 	/* everthing is clear. we have right of way */
-	migrate_ioapic_irq_desc(desc, &desc->pending_mask);
+	migrate_ioapic_irq_desc(desc, desc->pending_mask);
 
 	ret = 0;
 	desc->status &= ~IRQ_MOVE_PENDING;
-	cpumask_clear(&desc->pending_mask);
+	cpumask_clear(desc->pending_mask);
 
 unmask:
 	unmask_IO_APIC_irq_desc(desc);
@@ -2434,7 +2420,7 @@ static void ir_irq_migration(struct work_struct *work)
 				continue;
 			}
 
-			desc->chip->set_affinity(irq, &desc->pending_mask);
+			desc->chip->set_affinity(irq, desc->pending_mask);
 			spin_unlock_irqrestore(&desc->lock, flags);
 		}
 	}
@@ -2448,7 +2434,7 @@ static void set_ir_ioapic_affinity_irq_desc(struct irq_desc *desc,
 {
 	if (desc->status & IRQ_LEVEL) {
 		desc->status |= IRQ_MOVE_PENDING;
-		cpumask_copy(&desc->pending_mask, mask);
+		cpumask_copy(desc->pending_mask, mask);
 		migrate_irq_remapped_level_desc(desc);
 		return;
 	}
@@ -2516,7 +2502,7 @@ static void irq_complete_move(struct irq_desc **descp)
 
 		/* domain has not changed, but affinity did */
 		me = smp_processor_id();
-		if (cpu_isset(me, desc->affinity)) {
+		if (cpumask_test_cpu(me, desc->affinity)) {
 			*descp = desc = move_irq_desc(desc, me);
 			/* get the new one */
 			cfg = desc->chip_data;
@@ -2867,19 +2853,15 @@ static inline void __init check_timer(void)
 	int cpu = boot_cpu_id;
 	int apic1, pin1, apic2, pin2;
 	unsigned long flags;
-	unsigned int ver;
 	int no_pin1 = 0;
 
 	local_irq_save(flags);
 
-	ver = apic_read(APIC_LVR);
-	ver = GET_APIC_VERSION(ver);
-
 	/*
 	 * get/set the timer IRQ vector:
 	 */
 	disable_8259A_irq(0);
-	assign_irq_vector(0, cfg, TARGET_CPUS);
+	assign_irq_vector(0, cfg, apic->target_cpus());
 
 	/*
 	 * As IRQ0 is to be enabled in the 8259A, the virtual
@@ -2893,7 +2875,13 @@ static inline void __init check_timer(void)
 	apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_EXTINT);
 	init_8259A(1);
 #ifdef CONFIG_X86_32
-	timer_ack = (nmi_watchdog == NMI_IO_APIC && !APIC_INTEGRATED(ver));
+	{
+		unsigned int ver;
+
+		ver = apic_read(APIC_LVR);
+		ver = GET_APIC_VERSION(ver);
+		timer_ack = (nmi_watchdog == NMI_IO_APIC && !APIC_INTEGRATED(ver));
+	}
 #endif
 
 	pin1  = find_isa_irq_pin(0, mp_INT);
@@ -2932,8 +2920,17 @@ static inline void __init check_timer(void)
 		if (no_pin1) {
 			add_pin_to_irq_cpu(cfg, cpu, apic1, pin1);
 			setup_timer_IRQ0_pin(apic1, pin1, cfg->vector);
+		} else {
+			/* for edge trigger, setup_IO_APIC_irq already
+			 * leave it unmasked.
+			 * so only need to unmask if it is level-trigger
+			 * do we really have level trigger timer?
+			 */
+			int idx;
+			idx = find_irq_entry(apic1, pin1, mp_INT);
+			if (idx != -1 && irq_trigger(idx))
+				unmask_IO_APIC_irq_desc(desc);
 		}
-		unmask_IO_APIC_irq_desc(desc);
 		if (timer_irq_works()) {
 			if (nmi_watchdog == NMI_IO_APIC) {
 				setup_nmi();
@@ -2947,6 +2944,7 @@ static inline void __init check_timer(void)
 		if (intr_remapping_enabled)
 			panic("timer doesn't work through Interrupt-remapped IO-APIC");
 #endif
+		local_irq_disable();
 		clear_IO_APIC_pin(apic1, pin1);
 		if (!no_pin1)
 			apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
@@ -2961,7 +2959,6 @@ static inline void __init check_timer(void)
 		 */
 		replace_pin_at_irq_cpu(cfg, cpu, apic1, pin1, apic2, pin2);
 		setup_timer_IRQ0_pin(apic2, pin2, cfg->vector);
-		unmask_IO_APIC_irq_desc(desc);
 		enable_8259A_irq(0);
 		if (timer_irq_works()) {
 			apic_printk(APIC_QUIET, KERN_INFO "....... works.\n");
@@ -2976,6 +2973,7 @@ static inline void __init check_timer(void)
 		/*
 		 * Cleanup, just in case ...
 		 */
+		local_irq_disable();
 		disable_8259A_irq(0);
 		clear_IO_APIC_pin(apic2, pin2);
 		apic_printk(APIC_QUIET, KERN_INFO "....... failed.\n");
@@ -3001,6 +2999,7 @@ static inline void __init check_timer(void)
 		apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
 		goto out;
 	}
+	local_irq_disable();
 	disable_8259A_irq(0);
 	apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
 	apic_printk(APIC_QUIET, KERN_INFO "..... failed.\n");
@@ -3018,6 +3017,7 @@ static inline void __init check_timer(void)
 		apic_printk(APIC_QUIET, KERN_INFO "..... works.\n");
 		goto out;
 	}
+	local_irq_disable();
 	apic_printk(APIC_QUIET, KERN_INFO "..... failed :(.\n");
 	panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
 		"report.  Then try booting with the 'noapic' option.\n");
@@ -3047,13 +3047,9 @@ out:
 void __init setup_IO_APIC(void)
 {
 
-#ifdef CONFIG_X86_32
-	enable_IO_APIC();
-#else
 	/*
 	 * calling enable_IO_APIC() is moved to setup_local_APIC for BP
 	 */
-#endif
 
 	io_apic_irqs = ~PIC_IRQS;
 
@@ -3118,8 +3114,8 @@ static int ioapic_resume(struct sys_device *dev)
 
 	spin_lock_irqsave(&ioapic_lock, flags);
 	reg_00.raw = io_apic_read(dev->id, 0);
-	if (reg_00.bits.ID != mp_ioapics[dev->id].mp_apicid) {
-		reg_00.bits.ID = mp_ioapics[dev->id].mp_apicid;
+	if (reg_00.bits.ID != mp_ioapics[dev->id].apicid) {
+		reg_00.bits.ID = mp_ioapics[dev->id].apicid;
 		io_apic_write(dev->id, 0, reg_00.raw);
 	}
 	spin_unlock_irqrestore(&ioapic_lock, flags);
@@ -3169,6 +3165,7 @@ static int __init ioapic_init_sysfs(void)
 
 device_initcall(ioapic_init_sysfs);
 
+static int nr_irqs_gsi = NR_IRQS_LEGACY;
 /*
  * Dynamic irq allocate and deallocation
  */
@@ -3183,11 +3180,11 @@ unsigned int create_irq_nr(unsigned int irq_want)
 	struct irq_desc *desc_new = NULL;
 
 	irq = 0;
-	spin_lock_irqsave(&vector_lock, flags);
-	for (new = irq_want; new < NR_IRQS; new++) {
-		if (platform_legacy_irq(new))
-			continue;
+	if (irq_want < nr_irqs_gsi)
+		irq_want = nr_irqs_gsi;
 
+	spin_lock_irqsave(&vector_lock, flags);
+	for (new = irq_want; new < nr_irqs; new++) {
 		desc_new = irq_to_desc_alloc_cpu(new, cpu);
 		if (!desc_new) {
 			printk(KERN_INFO "can not get irq_desc for %d\n", new);
@@ -3197,7 +3194,7 @@ unsigned int create_irq_nr(unsigned int irq_want)
 
 		if (cfg_new->vector != 0)
 			continue;
-		if (__assign_irq_vector(new, cfg_new, TARGET_CPUS) == 0)
+		if (__assign_irq_vector(new, cfg_new, apic->target_cpus()) == 0)
 			irq = new;
 		break;
 	}
@@ -3212,7 +3209,6 @@ unsigned int create_irq_nr(unsigned int irq_want)
 	return irq;
 }
 
-static int nr_irqs_gsi = NR_IRQS_LEGACY;
 int create_irq(void)
 {
 	unsigned int irq_want;
@@ -3259,12 +3255,15 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_ms
 	int err;
 	unsigned dest;
 
+	if (disable_apic)
+		return -ENXIO;
+
 	cfg = irq_cfg(irq);
-	err = assign_irq_vector(irq, cfg, TARGET_CPUS);
+	err = assign_irq_vector(irq, cfg, apic->target_cpus());
 	if (err)
 		return err;
 
-	dest = cpu_mask_to_apicid_and(cfg->domain, TARGET_CPUS);
+	dest = apic->cpu_mask_to_apicid_and(cfg->domain, apic->target_cpus());
 
 #ifdef CONFIG_INTR_REMAP
 	if (irq_remapped(irq)) {
@@ -3278,9 +3277,9 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_ms
 		memset (&irte, 0, sizeof(irte));
 
 		irte.present = 1;
-		irte.dst_mode = INT_DEST_MODE;
+		irte.dst_mode = apic->irq_dest_mode;
 		irte.trigger_mode = 0; /* edge */
-		irte.dlvry_mode = INT_DELIVERY_MODE;
+		irte.dlvry_mode = apic->irq_delivery_mode;
 		irte.vector = cfg->vector;
 		irte.dest_id = IRTE_DEST(dest);
 
@@ -3298,10 +3297,10 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_ms
 		msg->address_hi = MSI_ADDR_BASE_HI;
 		msg->address_lo =
 			MSI_ADDR_BASE_LO |
-			((INT_DEST_MODE == 0) ?
+			((apic->irq_dest_mode == 0) ?
 				MSI_ADDR_DEST_MODE_PHYSICAL:
 				MSI_ADDR_DEST_MODE_LOGICAL) |
-			((INT_DELIVERY_MODE != dest_LowestPrio) ?
+			((apic->irq_delivery_mode != dest_LowestPrio) ?
 				MSI_ADDR_REDIRECTION_CPU:
 				MSI_ADDR_REDIRECTION_LOWPRI) |
 			MSI_ADDR_DEST_ID(dest);
@@ -3309,7 +3308,7 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_ms
 		msg->data =
 			MSI_DATA_TRIGGER_EDGE |
 			MSI_DATA_LEVEL_ASSERT |
-			((INT_DELIVERY_MODE != dest_LowestPrio) ?
+			((apic->irq_delivery_mode != dest_LowestPrio) ?
 				MSI_DATA_DELIVERY_FIXED:
 				MSI_DATA_DELIVERY_LOWPRI) |
 			MSI_DATA_VECTOR(cfg->vector);
@@ -3464,40 +3463,6 @@ static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
 	return 0;
 }
 
-int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc)
-{
-	unsigned int irq;
-	int ret;
-	unsigned int irq_want;
-
-	irq_want = nr_irqs_gsi;
-	irq = create_irq_nr(irq_want);
-	if (irq == 0)
-		return -1;
-
-#ifdef CONFIG_INTR_REMAP
-	if (!intr_remapping_enabled)
-		goto no_ir;
-
-	ret = msi_alloc_irte(dev, irq, 1);
-	if (ret < 0)
-		goto error;
-no_ir:
-#endif
-	ret = setup_msi_irq(dev, msidesc, irq);
-	if (ret < 0) {
-		destroy_irq(irq);
-		return ret;
-	}
-	return 0;
-
-#ifdef CONFIG_INTR_REMAP
-error:
-	destroy_irq(irq);
-	return ret;
-#endif
-}
-
 int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
 	unsigned int irq;
@@ -3514,9 +3479,9 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 	sub_handle = 0;
 	list_for_each_entry(msidesc, &dev->msi_list, list) {
 		irq = create_irq_nr(irq_want);
-		irq_want++;
 		if (irq == 0)
 			return -1;
+		irq_want = irq + 1;
 #ifdef CONFIG_INTR_REMAP
 		if (!intr_remapping_enabled)
 			goto no_ir;
@@ -3727,13 +3692,17 @@ int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
 	struct irq_cfg *cfg;
 	int err;
 
+	if (disable_apic)
+		return -ENXIO;
+
 	cfg = irq_cfg(irq);
-	err = assign_irq_vector(irq, cfg, TARGET_CPUS);
+	err = assign_irq_vector(irq, cfg, apic->target_cpus());
 	if (!err) {
 		struct ht_irq_msg msg;
 		unsigned dest;
 
-		dest = cpu_mask_to_apicid_and(cfg->domain, TARGET_CPUS);
+		dest = apic->cpu_mask_to_apicid_and(cfg->domain,
+						    apic->target_cpus());
 
 		msg.address_hi = HT_IRQ_HIGH_DEST_ID(dest);
 
@@ -3741,11 +3710,11 @@ int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
 			HT_IRQ_LOW_BASE |
 			HT_IRQ_LOW_DEST_ID(dest) |
 			HT_IRQ_LOW_VECTOR(cfg->vector) |
-			((INT_DEST_MODE == 0) ?
+			((apic->irq_dest_mode == 0) ?
 				HT_IRQ_LOW_DM_PHYSICAL :
 				HT_IRQ_LOW_DM_LOGICAL) |
 			HT_IRQ_LOW_RQEOI_EDGE |
-			((INT_DELIVERY_MODE != dest_LowestPrio) ?
+			((apic->irq_delivery_mode != dest_LowestPrio) ?
 				HT_IRQ_LOW_MT_FIXED :
 				HT_IRQ_LOW_MT_ARBITRATED) |
 			HT_IRQ_LOW_IRQ_MASKED;
@@ -3761,7 +3730,7 @@ int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
 }
 #endif /* CONFIG_HT_IRQ */
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_UV
 /*
  * Re-target the irq to the specified CPU and enable the specified MMR located
  * on the specified blade to allow the sending of MSIs to the specified CPU.
@@ -3793,12 +3762,12 @@ int arch_enable_uv_irq(char *irq_name, unsigned int irq, int cpu, int mmr_blade,
 	BUG_ON(sizeof(struct uv_IO_APIC_route_entry) != sizeof(unsigned long));
 
 	entry->vector = cfg->vector;
-	entry->delivery_mode = INT_DELIVERY_MODE;
-	entry->dest_mode = INT_DEST_MODE;
+	entry->delivery_mode = apic->irq_delivery_mode;
+	entry->dest_mode = apic->irq_dest_mode;
 	entry->polarity = 0;
 	entry->trigger = 0;
 	entry->mask = 0;
-	entry->dest = cpu_mask_to_apicid(eligible_cpu);
+	entry->dest = apic->cpu_mask_to_apicid(eligible_cpu);
 
 	mmr_pnode = uv_blade_to_pnode(mmr_blade);
 	uv_write_global_mmr64(mmr_pnode, mmr_offset, mmr_value);
@@ -3861,6 +3830,28 @@ void __init probe_nr_irqs_gsi(void)
 	printk(KERN_DEBUG "nr_irqs_gsi: %d\n", nr_irqs_gsi);
 }
 
+#ifdef CONFIG_SPARSE_IRQ
+int __init arch_probe_nr_irqs(void)
+{
+	int nr;
+
+	if (nr_irqs > (NR_VECTORS * nr_cpu_ids))
+		nr_irqs = NR_VECTORS * nr_cpu_ids;
+
+	nr = nr_irqs_gsi + 8 * nr_cpu_ids;
+#if defined(CONFIG_PCI_MSI) || defined(CONFIG_HT_IRQ)
+	/*
+	 * for MSI and HT dyn irq
+	 */
+	nr += nr_irqs_gsi * 16;
+#endif
+	if (nr < nr_irqs)
+		nr_irqs = nr;
+
+	return 0;
+}
+#endif
+
 /* --------------------------------------------------------------------------
                           ACPI-based IOAPIC Configuration
    -------------------------------------------------------------------------- */
@@ -3886,7 +3877,7 @@ int __init io_apic_get_unique_id(int ioapic, int apic_id)
 	 */
 
 	if (physids_empty(apic_id_map))
-		apic_id_map = ioapic_phys_id_map(phys_cpu_present_map);
+		apic_id_map = apic->ioapic_phys_id_map(phys_cpu_present_map);
 
 	spin_lock_irqsave(&ioapic_lock, flags);
 	reg_00.raw = io_apic_read(ioapic, 0);
@@ -3902,10 +3893,10 @@ int __init io_apic_get_unique_id(int ioapic, int apic_id)
 	 * Every APIC in a system must have a unique ID or we get lots of nice
 	 * 'stuck on smp_invalidate_needed IPI wait' messages.
 	 */
-	if (check_apicid_used(apic_id_map, apic_id)) {
+	if (apic->check_apicid_used(apic_id_map, apic_id)) {
 
 		for (i = 0; i < get_physical_broadcast(); i++) {
-			if (!check_apicid_used(apic_id_map, i))
+			if (!apic->check_apicid_used(apic_id_map, i))
 				break;
 		}
 
@@ -3918,7 +3909,7 @@ int __init io_apic_get_unique_id(int ioapic, int apic_id)
 		apic_id = i;
 	}
 
-	tmp = apicid_to_cpu_present(apic_id);
+	tmp = apic->apicid_to_cpu_present(apic_id);
 	physids_or(apic_id_map, apic_id_map, tmp);
 
 	if (reg_00.bits.ID != apic_id) {
@@ -3995,8 +3986,8 @@ int acpi_get_override_irq(int bus_irq, int *trigger, int *polarity)
 		return -1;
 
 	for (i = 0; i < mp_irq_entries; i++)
-		if (mp_irqs[i].mp_irqtype == mp_INT &&
-		    mp_irqs[i].mp_srcbusirq == bus_irq)
+		if (mp_irqs[i].irqtype == mp_INT &&
+		    mp_irqs[i].srcbusirq == bus_irq)
 			break;
 	if (i >= mp_irq_entries)
 		return -1;
@@ -4011,7 +4002,7 @@ int acpi_get_override_irq(int bus_irq, int *trigger, int *polarity)
 /*
  * This function currently is only a helper for the i386 smp boot process where
  * we need to reprogram the ioredtbls to cater for the cpus which have come online
- * so mask in all cases should simply be TARGET_CPUS
+ * so mask in all cases should simply be apic->target_cpus()
  */
 #ifdef CONFIG_SMP
 void __init setup_ioapic_dest(void)
@@ -4050,9 +4041,9 @@ void __init setup_ioapic_dest(void)
 			 */
 			if (desc->status &
 			    (IRQ_NO_BALANCING | IRQ_AFFINITY_SET))
-				mask = &desc->affinity;
+				mask = desc->affinity;
 			else
-				mask = TARGET_CPUS;
+				mask = apic->target_cpus();
 
 #ifdef CONFIG_INTR_REMAP
 			if (intr_remapping_enabled)
@@ -4111,7 +4102,7 @@ void __init ioapic_init_mappings(void)
 	ioapic_res = ioapic_setup_resources();
 	for (i = 0; i < nr_ioapics; i++) {
 		if (smp_found_config) {
-			ioapic_phys = mp_ioapics[i].mp_apicaddr;
+			ioapic_phys = mp_ioapics[i].apicaddr;
 #ifdef CONFIG_X86_32
 			if (!ioapic_phys) {
 				printk(KERN_ERR
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
new file mode 100644
index 000000000000..dbf5445727a9
--- /dev/null
+++ b/arch/x86/kernel/apic/ipi.c
@@ -0,0 +1,164 @@
+#include <linux/cpumask.h>
+#include <linux/interrupt.h>
+#include <linux/init.h>
+
+#include <linux/mm.h>
+#include <linux/delay.h>
+#include <linux/spinlock.h>
+#include <linux/kernel_stat.h>
+#include <linux/mc146818rtc.h>
+#include <linux/cache.h>
+#include <linux/cpu.h>
+#include <linux/module.h>
+
+#include <asm/smp.h>
+#include <asm/mtrr.h>
+#include <asm/tlbflush.h>
+#include <asm/mmu_context.h>
+#include <asm/apic.h>
+#include <asm/proto.h>
+#include <asm/ipi.h>
+
+void default_send_IPI_mask_sequence_phys(const struct cpumask *mask, int vector)
+{
+	unsigned long query_cpu;
+	unsigned long flags;
+
+	/*
+	 * Hack. The clustered APIC addressing mode doesn't allow us to send
+	 * to an arbitrary mask, so I do a unicast to each CPU instead.
+	 * - mbligh
+	 */
+	local_irq_save(flags);
+	for_each_cpu(query_cpu, mask) {
+		__default_send_IPI_dest_field(per_cpu(x86_cpu_to_apicid,
+				query_cpu), vector, APIC_DEST_PHYSICAL);
+	}
+	local_irq_restore(flags);
+}
+
+void default_send_IPI_mask_allbutself_phys(const struct cpumask *mask,
+						 int vector)
+{
+	unsigned int this_cpu = smp_processor_id();
+	unsigned int query_cpu;
+	unsigned long flags;
+
+	/* See Hack comment above */
+
+	local_irq_save(flags);
+	for_each_cpu(query_cpu, mask) {
+		if (query_cpu == this_cpu)
+			continue;
+		__default_send_IPI_dest_field(per_cpu(x86_cpu_to_apicid,
+				 query_cpu), vector, APIC_DEST_PHYSICAL);
+	}
+	local_irq_restore(flags);
+}
+
+void default_send_IPI_mask_sequence_logical(const struct cpumask *mask,
+						 int vector)
+{
+	unsigned long flags;
+	unsigned int query_cpu;
+
+	/*
+	 * Hack. The clustered APIC addressing mode doesn't allow us to send
+	 * to an arbitrary mask, so I do a unicasts to each CPU instead. This
+	 * should be modified to do 1 message per cluster ID - mbligh
+	 */
+
+	local_irq_save(flags);
+	for_each_cpu(query_cpu, mask)
+		__default_send_IPI_dest_field(
+			apic->cpu_to_logical_apicid(query_cpu), vector,
+			apic->dest_logical);
+	local_irq_restore(flags);
+}
+
+void default_send_IPI_mask_allbutself_logical(const struct cpumask *mask,
+						 int vector)
+{
+	unsigned long flags;
+	unsigned int query_cpu;
+	unsigned int this_cpu = smp_processor_id();
+
+	/* See Hack comment above */
+
+	local_irq_save(flags);
+	for_each_cpu(query_cpu, mask) {
+		if (query_cpu == this_cpu)
+			continue;
+		__default_send_IPI_dest_field(
+			apic->cpu_to_logical_apicid(query_cpu), vector,
+			apic->dest_logical);
+		}
+	local_irq_restore(flags);
+}
+
+#ifdef CONFIG_X86_32
+
+/*
+ * This is only used on smaller machines.
+ */
+void default_send_IPI_mask_logical(const struct cpumask *cpumask, int vector)
+{
+	unsigned long mask = cpumask_bits(cpumask)[0];
+	unsigned long flags;
+
+	local_irq_save(flags);
+	WARN_ON(mask & ~cpumask_bits(cpu_online_mask)[0]);
+	__default_send_IPI_dest_field(mask, vector, apic->dest_logical);
+	local_irq_restore(flags);
+}
+
+void default_send_IPI_allbutself(int vector)
+{
+	/*
+	 * if there are no other CPUs in the system then we get an APIC send
+	 * error if we try to broadcast, thus avoid sending IPIs in this case.
+	 */
+	if (!(num_online_cpus() > 1))
+		return;
+
+	__default_local_send_IPI_allbutself(vector);
+}
+
+void default_send_IPI_all(int vector)
+{
+	__default_local_send_IPI_all(vector);
+}
+
+void default_send_IPI_self(int vector)
+{
+	__default_send_IPI_shortcut(APIC_DEST_SELF, vector, apic->dest_logical);
+}
+
+/* must come after the send_IPI functions above for inlining */
+static int convert_apicid_to_cpu(int apic_id)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		if (per_cpu(x86_cpu_to_apicid, i) == apic_id)
+			return i;
+	}
+	return -1;
+}
+
+int safe_smp_processor_id(void)
+{
+	int apicid, cpuid;
+
+	if (!boot_cpu_has(X86_FEATURE_APIC))
+		return 0;
+
+	apicid = hard_smp_processor_id();
+	if (apicid == BAD_APICID)
+		return 0;
+
+	cpuid = convert_apicid_to_cpu(apicid);
+
+	return cpuid >= 0 ? cpuid : 0;
+}
+#endif
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/apic/nmi.c
index 7228979f1e7f..bdfad80c3cf1 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/apic/nmi.c
@@ -34,7 +34,7 @@
 
 #include <asm/mce.h>
 
-#include <mach_traps.h>
+#include <asm/mach_traps.h>
 
 int unknown_nmi_panic;
 int nmi_watchdog_enabled;
@@ -61,11 +61,7 @@ static int endflag __initdata;
 
 static inline unsigned int get_nmi_count(int cpu)
 {
-#ifdef CONFIG_X86_64
-	return cpu_pda(cpu)->__nmi_count;
-#else
-	return nmi_count(cpu);
-#endif
+	return per_cpu(irq_stat, cpu).__nmi_count;
 }
 
 static inline int mce_in_progress(void)
@@ -82,12 +78,8 @@ static inline int mce_in_progress(void)
  */
 static inline unsigned int get_timer_irqs(int cpu)
 {
-#ifdef CONFIG_X86_64
-	return read_pda(apic_timer_irqs) + read_pda(irq0_irqs);
-#else
 	return per_cpu(irq_stat, cpu).apic_timer_irqs +
 		per_cpu(irq_stat, cpu).irq0_irqs;
-#endif
 }
 
 #ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/apic/numaq_32.c b/arch/x86/kernel/apic/numaq_32.c
new file mode 100644
index 000000000000..ba2fc6465534
--- /dev/null
+++ b/arch/x86/kernel/apic/numaq_32.c
@@ -0,0 +1,557 @@
+/*
+ * Written by: Patricia Gaughen, IBM Corporation
+ *
+ * Copyright (C) 2002, IBM Corp.
+ * Copyright (C) 2009, Red Hat, Inc., Ingo Molnar
+ *
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * Send feedback to <gone@us.ibm.com>
+ */
+#include <linux/nodemask.h>
+#include <linux/topology.h>
+#include <linux/bootmem.h>
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <linux/kernel.h>
+#include <linux/mmzone.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/numa.h>
+#include <linux/smp.h>
+#include <linux/io.h>
+#include <linux/mm.h>
+
+#include <asm/processor.h>
+#include <asm/fixmap.h>
+#include <asm/mpspec.h>
+#include <asm/numaq.h>
+#include <asm/setup.h>
+#include <asm/apic.h>
+#include <asm/e820.h>
+#include <asm/ipi.h>
+
+#define	MB_TO_PAGES(addr)		((addr) << (20 - PAGE_SHIFT))
+
+int found_numaq;
+
+/*
+ * Have to match translation table entries to main table entries by counter
+ * hence the mpc_record variable .... can't see a less disgusting way of
+ * doing this ....
+ */
+struct mpc_trans {
+	unsigned char			mpc_type;
+	unsigned char			trans_len;
+	unsigned char			trans_type;
+	unsigned char			trans_quad;
+	unsigned char			trans_global;
+	unsigned char			trans_local;
+	unsigned short			trans_reserved;
+};
+
+/* x86_quirks member */
+static int				mpc_record;
+
+static struct mpc_trans			*translation_table[MAX_MPC_ENTRY];
+
+int					mp_bus_id_to_node[MAX_MP_BUSSES];
+int					mp_bus_id_to_local[MAX_MP_BUSSES];
+int					quad_local_to_mp_bus_id[NR_CPUS/4][4];
+
+
+static inline void numaq_register_node(int node, struct sys_cfg_data *scd)
+{
+	struct eachquadmem *eq = scd->eq + node;
+
+	node_set_online(node);
+
+	/* Convert to pages */
+	node_start_pfn[node] =
+		 MB_TO_PAGES(eq->hi_shrd_mem_start - eq->priv_mem_size);
+
+	node_end_pfn[node] =
+		 MB_TO_PAGES(eq->hi_shrd_mem_start + eq->hi_shrd_mem_size);
+
+	e820_register_active_regions(node, node_start_pfn[node],
+						node_end_pfn[node]);
+
+	memory_present(node, node_start_pfn[node], node_end_pfn[node]);
+
+	node_remap_size[node] = node_memmap_size_bytes(node,
+					node_start_pfn[node],
+					node_end_pfn[node]);
+}
+
+/*
+ * Function: smp_dump_qct()
+ *
+ * Description: gets memory layout from the quad config table.  This
+ * function also updates node_online_map with the nodes (quads) present.
+ */
+static void __init smp_dump_qct(void)
+{
+	struct sys_cfg_data *scd;
+	int node;
+
+	scd = (void *)__va(SYS_CFG_DATA_PRIV_ADDR);
+
+	nodes_clear(node_online_map);
+	for_each_node(node) {
+		if (scd->quads_present31_0 & (1 << node))
+			numaq_register_node(node, scd);
+	}
+}
+
+void __cpuinit numaq_tsc_disable(void)
+{
+	if (!found_numaq)
+		return;
+
+	if (num_online_nodes() > 1) {
+		printk(KERN_DEBUG "NUMAQ: disabling TSC\n");
+		setup_clear_cpu_cap(X86_FEATURE_TSC);
+	}
+}
+
+static int __init numaq_pre_time_init(void)
+{
+	numaq_tsc_disable();
+	return 0;
+}
+
+static inline int generate_logical_apicid(int quad, int phys_apicid)
+{
+	return (quad << 4) + (phys_apicid ? phys_apicid << 1 : 1);
+}
+
+/* x86_quirks member */
+static int mpc_apic_id(struct mpc_cpu *m)
+{
+	int quad = translation_table[mpc_record]->trans_quad;
+	int logical_apicid = generate_logical_apicid(quad, m->apicid);
+
+	printk(KERN_DEBUG
+		"Processor #%d %u:%u APIC version %d (quad %d, apic %d)\n",
+		 m->apicid, (m->cpufeature & CPU_FAMILY_MASK) >> 8,
+		(m->cpufeature & CPU_MODEL_MASK) >> 4,
+		 m->apicver, quad, logical_apicid);
+
+	return logical_apicid;
+}
+
+/* x86_quirks member */
+static void mpc_oem_bus_info(struct mpc_bus *m, char *name)
+{
+	int quad = translation_table[mpc_record]->trans_quad;
+	int local = translation_table[mpc_record]->trans_local;
+
+	mp_bus_id_to_node[m->busid] = quad;
+	mp_bus_id_to_local[m->busid] = local;
+
+	printk(KERN_INFO "Bus #%d is %s (node %d)\n", m->busid, name, quad);
+}
+
+/* x86_quirks member */
+static void mpc_oem_pci_bus(struct mpc_bus *m)
+{
+	int quad = translation_table[mpc_record]->trans_quad;
+	int local = translation_table[mpc_record]->trans_local;
+
+	quad_local_to_mp_bus_id[quad][local] = m->busid;
+}
+
+static void __init MP_translation_info(struct mpc_trans *m)
+{
+	printk(KERN_INFO
+	    "Translation: record %d, type %d, quad %d, global %d, local %d\n",
+	       mpc_record, m->trans_type, m->trans_quad, m->trans_global,
+	       m->trans_local);
+
+	if (mpc_record >= MAX_MPC_ENTRY)
+		printk(KERN_ERR "MAX_MPC_ENTRY exceeded!\n");
+	else
+		translation_table[mpc_record] = m; /* stash this for later */
+
+	if (m->trans_quad < MAX_NUMNODES && !node_online(m->trans_quad))
+		node_set_online(m->trans_quad);
+}
+
+static int __init mpf_checksum(unsigned char *mp, int len)
+{
+	int sum = 0;
+
+	while (len--)
+		sum += *mp++;
+
+	return sum & 0xFF;
+}
+
+/*
+ * Read/parse the MPC oem tables
+ */
+static void __init
+ smp_read_mpc_oem(struct mpc_oemtable *oemtable, unsigned short oemsize)
+{
+	int count = sizeof(*oemtable);	/* the header size */
+	unsigned char *oemptr = ((unsigned char *)oemtable) + count;
+
+	mpc_record = 0;
+	printk(KERN_INFO
+		"Found an OEM MPC table at %8p - parsing it ... \n", oemtable);
+
+	if (memcmp(oemtable->signature, MPC_OEM_SIGNATURE, 4)) {
+		printk(KERN_WARNING
+		       "SMP mpc oemtable: bad signature [%c%c%c%c]!\n",
+		       oemtable->signature[0], oemtable->signature[1],
+		       oemtable->signature[2], oemtable->signature[3]);
+		return;
+	}
+
+	if (mpf_checksum((unsigned char *)oemtable, oemtable->length)) {
+		printk(KERN_WARNING "SMP oem mptable: checksum error!\n");
+		return;
+	}
+
+	while (count < oemtable->length) {
+		switch (*oemptr) {
+		case MP_TRANSLATION:
+			{
+				struct mpc_trans *m = (void *)oemptr;
+
+				MP_translation_info(m);
+				oemptr += sizeof(*m);
+				count += sizeof(*m);
+				++mpc_record;
+				break;
+			}
+		default:
+			printk(KERN_WARNING
+			       "Unrecognised OEM table entry type! - %d\n",
+			       (int)*oemptr);
+			return;
+		}
+	}
+}
+
+static int __init numaq_setup_ioapic_ids(void)
+{
+	/* so can skip it */
+	return 1;
+}
+
+static struct x86_quirks numaq_x86_quirks __initdata = {
+	.arch_pre_time_init		= numaq_pre_time_init,
+	.arch_time_init			= NULL,
+	.arch_pre_intr_init		= NULL,
+	.arch_memory_setup		= NULL,
+	.arch_intr_init			= NULL,
+	.arch_trap_init			= NULL,
+	.mach_get_smp_config		= NULL,
+	.mach_find_smp_config		= NULL,
+	.mpc_record			= &mpc_record,
+	.mpc_apic_id			= mpc_apic_id,
+	.mpc_oem_bus_info		= mpc_oem_bus_info,
+	.mpc_oem_pci_bus		= mpc_oem_pci_bus,
+	.smp_read_mpc_oem		= smp_read_mpc_oem,
+	.setup_ioapic_ids		= numaq_setup_ioapic_ids,
+};
+
+static __init void early_check_numaq(void)
+{
+	/*
+	 * Find possible boot-time SMP configuration:
+	 */
+	early_find_smp_config();
+
+	/*
+	 * get boot-time SMP configuration:
+	 */
+	if (smp_found_config)
+		early_get_smp_config();
+
+	if (found_numaq)
+		x86_quirks = &numaq_x86_quirks;
+}
+
+int __init get_memcfg_numaq(void)
+{
+	early_check_numaq();
+	if (!found_numaq)
+		return 0;
+	smp_dump_qct();
+
+	return 1;
+}
+
+#define NUMAQ_APIC_DFR_VALUE	(APIC_DFR_CLUSTER)
+
+static inline unsigned int numaq_get_apic_id(unsigned long x)
+{
+	return (x >> 24) & 0x0F;
+}
+
+static inline void numaq_send_IPI_mask(const struct cpumask *mask, int vector)
+{
+	default_send_IPI_mask_sequence_logical(mask, vector);
+}
+
+static inline void numaq_send_IPI_allbutself(int vector)
+{
+	default_send_IPI_mask_allbutself_logical(cpu_online_mask, vector);
+}
+
+static inline void numaq_send_IPI_all(int vector)
+{
+	numaq_send_IPI_mask(cpu_online_mask, vector);
+}
+
+#define NUMAQ_TRAMPOLINE_PHYS_LOW	(0x8)
+#define NUMAQ_TRAMPOLINE_PHYS_HIGH	(0xa)
+
+/*
+ * Because we use NMIs rather than the INIT-STARTUP sequence to
+ * bootstrap the CPUs, the APIC may be in a weird state. Kick it:
+ */
+static inline void numaq_smp_callin_clear_local_apic(void)
+{
+	clear_local_APIC();
+}
+
+static inline const cpumask_t *numaq_target_cpus(void)
+{
+	return &CPU_MASK_ALL;
+}
+
+static inline unsigned long
+numaq_check_apicid_used(physid_mask_t bitmap, int apicid)
+{
+	return physid_isset(apicid, bitmap);
+}
+
+static inline unsigned long numaq_check_apicid_present(int bit)
+{
+	return physid_isset(bit, phys_cpu_present_map);
+}
+
+static inline int numaq_apic_id_registered(void)
+{
+	return 1;
+}
+
+static inline void numaq_init_apic_ldr(void)
+{
+	/* Already done in NUMA-Q firmware */
+}
+
+static inline void numaq_setup_apic_routing(void)
+{
+	printk(KERN_INFO
+		"Enabling APIC mode:  NUMA-Q.  Using %d I/O APICs\n",
+		nr_ioapics);
+}
+
+/*
+ * Skip adding the timer int on secondary nodes, which causes
+ * a small but painful rift in the time-space continuum.
+ */
+static inline int numaq_multi_timer_check(int apic, int irq)
+{
+	return apic != 0 && irq == 0;
+}
+
+static inline physid_mask_t numaq_ioapic_phys_id_map(physid_mask_t phys_map)
+{
+	/* We don't have a good way to do this yet - hack */
+	return physids_promote(0xFUL);
+}
+
+static inline int numaq_cpu_to_logical_apicid(int cpu)
+{
+	if (cpu >= nr_cpu_ids)
+		return BAD_APICID;
+	return cpu_2_logical_apicid[cpu];
+}
+
+/*
+ * Supporting over 60 cpus on NUMA-Q requires a locality-dependent
+ * cpu to APIC ID relation to properly interact with the intelligent
+ * mode of the cluster controller.
+ */
+static inline int numaq_cpu_present_to_apicid(int mps_cpu)
+{
+	if (mps_cpu < 60)
+		return ((mps_cpu >> 2) << 4) | (1 << (mps_cpu & 0x3));
+	else
+		return BAD_APICID;
+}
+
+static inline int numaq_apicid_to_node(int logical_apicid)
+{
+	return logical_apicid >> 4;
+}
+
+static inline physid_mask_t numaq_apicid_to_cpu_present(int logical_apicid)
+{
+	int node = numaq_apicid_to_node(logical_apicid);
+	int cpu = __ffs(logical_apicid & 0xf);
+
+	return physid_mask_of_physid(cpu + 4*node);
+}
+
+/* Where the IO area was mapped on multiquad, always 0 otherwise */
+void *xquad_portio;
+
+static inline int numaq_check_phys_apicid_present(int boot_cpu_physical_apicid)
+{
+	return 1;
+}
+
+/*
+ * We use physical apicids here, not logical, so just return the default
+ * physical broadcast to stop people from breaking us
+ */
+static inline unsigned int numaq_cpu_mask_to_apicid(const cpumask_t *cpumask)
+{
+	return 0x0F;
+}
+
+static inline unsigned int
+numaq_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+			     const struct cpumask *andmask)
+{
+	return 0x0F;
+}
+
+/* No NUMA-Q box has a HT CPU, but it can't hurt to use the default code. */
+static inline int numaq_phys_pkg_id(int cpuid_apic, int index_msb)
+{
+	return cpuid_apic >> index_msb;
+}
+
+static int
+numaq_mps_oem_check(struct mpc_table *mpc, char *oem, char *productid)
+{
+	if (strncmp(oem, "IBM NUMA", 8))
+		printk(KERN_ERR "Warning! Not a NUMA-Q system!\n");
+	else
+		found_numaq = 1;
+
+	return found_numaq;
+}
+
+static int probe_numaq(void)
+{
+	/* already know from get_memcfg_numaq() */
+	return found_numaq;
+}
+
+static void numaq_vector_allocation_domain(int cpu, cpumask_t *retmask)
+{
+	/* Careful. Some cpus do not strictly honor the set of cpus
+	 * specified in the interrupt destination when using lowest
+	 * priority interrupt delivery mode.
+	 *
+	 * In particular there was a hyperthreading cpu observed to
+	 * deliver interrupts to the wrong hyperthread when only one
+	 * hyperthread was specified in the interrupt desitination.
+	 */
+	*retmask = (cpumask_t){ { [0] = APIC_ALL_CPUS, } };
+}
+
+static void numaq_setup_portio_remap(void)
+{
+	int num_quads = num_online_nodes();
+
+	if (num_quads <= 1)
+		return;
+
+	printk(KERN_INFO
+		"Remapping cross-quad port I/O for %d quads\n", num_quads);
+
+	xquad_portio = ioremap(XQUAD_PORTIO_BASE, num_quads*XQUAD_PORTIO_QUAD);
+
+	printk(KERN_INFO
+		"xquad_portio vaddr 0x%08lx, len %08lx\n",
+		(u_long) xquad_portio, (u_long) num_quads*XQUAD_PORTIO_QUAD);
+}
+
+struct apic apic_numaq = {
+
+	.name				= "NUMAQ",
+	.probe				= probe_numaq,
+	.acpi_madt_oem_check		= NULL,
+	.apic_id_registered		= numaq_apic_id_registered,
+
+	.irq_delivery_mode		= dest_LowestPrio,
+	/* physical delivery on LOCAL quad: */
+	.irq_dest_mode			= 0,
+
+	.target_cpus			= numaq_target_cpus,
+	.disable_esr			= 1,
+	.dest_logical			= APIC_DEST_LOGICAL,
+	.check_apicid_used		= numaq_check_apicid_used,
+	.check_apicid_present		= numaq_check_apicid_present,
+
+	.vector_allocation_domain	= numaq_vector_allocation_domain,
+	.init_apic_ldr			= numaq_init_apic_ldr,
+
+	.ioapic_phys_id_map		= numaq_ioapic_phys_id_map,
+	.setup_apic_routing		= numaq_setup_apic_routing,
+	.multi_timer_check		= numaq_multi_timer_check,
+	.apicid_to_node			= numaq_apicid_to_node,
+	.cpu_to_logical_apicid		= numaq_cpu_to_logical_apicid,
+	.cpu_present_to_apicid		= numaq_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= numaq_apicid_to_cpu_present,
+	.setup_portio_remap		= numaq_setup_portio_remap,
+	.check_phys_apicid_present	= numaq_check_phys_apicid_present,
+	.enable_apic_mode		= NULL,
+	.phys_pkg_id			= numaq_phys_pkg_id,
+	.mps_oem_check			= numaq_mps_oem_check,
+
+	.get_apic_id			= numaq_get_apic_id,
+	.set_apic_id			= NULL,
+	.apic_id_mask			= 0x0F << 24,
+
+	.cpu_mask_to_apicid		= numaq_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= numaq_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= numaq_send_IPI_mask,
+	.send_IPI_mask_allbutself	= NULL,
+	.send_IPI_allbutself		= numaq_send_IPI_allbutself,
+	.send_IPI_all			= numaq_send_IPI_all,
+	.send_IPI_self			= default_send_IPI_self,
+
+	.wakeup_secondary_cpu		= wakeup_secondary_cpu_via_nmi,
+	.trampoline_phys_low		= NUMAQ_TRAMPOLINE_PHYS_LOW,
+	.trampoline_phys_high		= NUMAQ_TRAMPOLINE_PHYS_HIGH,
+
+	/* We don't do anything here because we use NMI's to boot instead */
+	.wait_for_init_deassert		= NULL,
+
+	.smp_callin_clear_local_apic	= numaq_smp_callin_clear_local_apic,
+	.inquire_remote_apic		= NULL,
+
+	.read				= native_apic_mem_read,
+	.write				= native_apic_mem_write,
+	.icr_read			= native_apic_icr_read,
+	.icr_write			= native_apic_icr_write,
+	.wait_icr_idle			= native_apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
+};
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
new file mode 100644
index 000000000000..141c99a1c264
--- /dev/null
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -0,0 +1,284 @@
+/*
+ * Default generic APIC driver. This handles up to 8 CPUs.
+ *
+ * Copyright 2003 Andi Kleen, SuSE Labs.
+ * Subject to the GNU Public License, v.2
+ *
+ * Generic x86 APIC driver probe layer.
+ */
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/kernel.h>
+#include <linux/ctype.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <asm/fixmap.h>
+#include <asm/mpspec.h>
+#include <asm/apicdef.h>
+#include <asm/apic.h>
+#include <asm/setup.h>
+
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <asm/mpspec.h>
+#include <asm/fixmap.h>
+#include <asm/apicdef.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/smp.h>
+#include <linux/init.h>
+#include <asm/ipi.h>
+
+#include <linux/smp.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <asm/acpi.h>
+#include <asm/e820.h>
+#include <asm/setup.h>
+
+#ifdef CONFIG_HOTPLUG_CPU
+#define DEFAULT_SEND_IPI	(1)
+#else
+#define DEFAULT_SEND_IPI	(0)
+#endif
+
+int no_broadcast = DEFAULT_SEND_IPI;
+
+static __init int no_ipi_broadcast(char *str)
+{
+	get_option(&str, &no_broadcast);
+	pr_info("Using %s mode\n",
+		no_broadcast ? "No IPI Broadcast" : "IPI Broadcast");
+	return 1;
+}
+__setup("no_ipi_broadcast=", no_ipi_broadcast);
+
+static int __init print_ipi_mode(void)
+{
+	pr_info("Using IPI %s mode\n",
+		no_broadcast ? "No-Shortcut" : "Shortcut");
+	return 0;
+}
+late_initcall(print_ipi_mode);
+
+void default_setup_apic_routing(void)
+{
+#ifdef CONFIG_X86_IO_APIC
+	printk(KERN_INFO
+		"Enabling APIC mode:  Flat.  Using %d I/O APICs\n",
+		nr_ioapics);
+#endif
+}
+
+static void default_vector_allocation_domain(int cpu, struct cpumask *retmask)
+{
+	/*
+	 * Careful. Some cpus do not strictly honor the set of cpus
+	 * specified in the interrupt destination when using lowest
+	 * priority interrupt delivery mode.
+	 *
+	 * In particular there was a hyperthreading cpu observed to
+	 * deliver interrupts to the wrong hyperthread when only one
+	 * hyperthread was specified in the interrupt desitination.
+	 */
+	*retmask = (cpumask_t) { { [0] = APIC_ALL_CPUS } };
+}
+
+/* should be called last. */
+static int probe_default(void)
+{
+	return 1;
+}
+
+struct apic apic_default = {
+
+	.name				= "default",
+	.probe				= probe_default,
+	.acpi_madt_oem_check		= NULL,
+	.apic_id_registered		= default_apic_id_registered,
+
+	.irq_delivery_mode		= dest_LowestPrio,
+	/* logical delivery broadcast to all CPUs: */
+	.irq_dest_mode			= 1,
+
+	.target_cpus			= default_target_cpus,
+	.disable_esr			= 0,
+	.dest_logical			= APIC_DEST_LOGICAL,
+	.check_apicid_used		= default_check_apicid_used,
+	.check_apicid_present		= default_check_apicid_present,
+
+	.vector_allocation_domain	= default_vector_allocation_domain,
+	.init_apic_ldr			= default_init_apic_ldr,
+
+	.ioapic_phys_id_map		= default_ioapic_phys_id_map,
+	.setup_apic_routing		= default_setup_apic_routing,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= default_apicid_to_node,
+	.cpu_to_logical_apicid		= default_cpu_to_logical_apicid,
+	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= default_apicid_to_cpu_present,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= default_check_phys_apicid_present,
+	.enable_apic_mode		= NULL,
+	.phys_pkg_id			= default_phys_pkg_id,
+	.mps_oem_check			= NULL,
+
+	.get_apic_id			= default_get_apic_id,
+	.set_apic_id			= NULL,
+	.apic_id_mask			= 0x0F << 24,
+
+	.cpu_mask_to_apicid		= default_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= default_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= default_send_IPI_mask_logical,
+	.send_IPI_mask_allbutself	= default_send_IPI_mask_allbutself_logical,
+	.send_IPI_allbutself		= default_send_IPI_allbutself,
+	.send_IPI_all			= default_send_IPI_all,
+	.send_IPI_self			= default_send_IPI_self,
+
+	.trampoline_phys_low		= DEFAULT_TRAMPOLINE_PHYS_LOW,
+	.trampoline_phys_high		= DEFAULT_TRAMPOLINE_PHYS_HIGH,
+
+	.wait_for_init_deassert		= default_wait_for_init_deassert,
+
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= default_inquire_remote_apic,
+
+	.read				= native_apic_mem_read,
+	.write				= native_apic_mem_write,
+	.icr_read			= native_apic_icr_read,
+	.icr_write			= native_apic_icr_write,
+	.wait_icr_idle			= native_apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
+};
+
+extern struct apic apic_numaq;
+extern struct apic apic_summit;
+extern struct apic apic_bigsmp;
+extern struct apic apic_es7000;
+extern struct apic apic_es7000_cluster;
+extern struct apic apic_default;
+
+struct apic *apic = &apic_default;
+EXPORT_SYMBOL_GPL(apic);
+
+static struct apic *apic_probe[] __initdata = {
+#ifdef CONFIG_X86_NUMAQ
+	&apic_numaq,
+#endif
+#ifdef CONFIG_X86_SUMMIT
+	&apic_summit,
+#endif
+#ifdef CONFIG_X86_BIGSMP
+	&apic_bigsmp,
+#endif
+#ifdef CONFIG_X86_ES7000
+	&apic_es7000,
+	&apic_es7000_cluster,
+#endif
+	&apic_default,	/* must be last */
+	NULL,
+};
+
+static int cmdline_apic __initdata;
+static int __init parse_apic(char *arg)
+{
+	int i;
+
+	if (!arg)
+		return -EINVAL;
+
+	for (i = 0; apic_probe[i]; i++) {
+		if (!strcmp(apic_probe[i]->name, arg)) {
+			apic = apic_probe[i];
+			cmdline_apic = 1;
+			return 0;
+		}
+	}
+
+	/* Parsed again by __setup for debug/verbose */
+	return 0;
+}
+early_param("apic", parse_apic);
+
+void __init generic_bigsmp_probe(void)
+{
+#ifdef CONFIG_X86_BIGSMP
+	/*
+	 * This routine is used to switch to bigsmp mode when
+	 * - There is no apic= option specified by the user
+	 * - generic_apic_probe() has chosen apic_default as the sub_arch
+	 * - we find more than 8 CPUs in acpi LAPIC listing with xAPIC support
+	 */
+
+	if (!cmdline_apic && apic == &apic_default) {
+		if (apic_bigsmp.probe()) {
+			apic = &apic_bigsmp;
+			printk(KERN_INFO "Overriding APIC driver with %s\n",
+			       apic->name);
+		}
+	}
+#endif
+}
+
+void __init generic_apic_probe(void)
+{
+	if (!cmdline_apic) {
+		int i;
+		for (i = 0; apic_probe[i]; i++) {
+			if (apic_probe[i]->probe()) {
+				apic = apic_probe[i];
+				break;
+			}
+		}
+		/* Not visible without early console */
+		if (!apic_probe[i])
+			panic("Didn't find an APIC driver");
+	}
+	printk(KERN_INFO "Using APIC driver %s\n", apic->name);
+}
+
+/* These functions can switch the APIC even after the initial ->probe() */
+
+int __init
+generic_mps_oem_check(struct mpc_table *mpc, char *oem, char *productid)
+{
+	int i;
+
+	for (i = 0; apic_probe[i]; ++i) {
+		if (!apic_probe[i]->mps_oem_check)
+			continue;
+		if (!apic_probe[i]->mps_oem_check(mpc, oem, productid))
+			continue;
+
+		if (!cmdline_apic) {
+			apic = apic_probe[i];
+			printk(KERN_INFO "Switched to APIC driver `%s'.\n",
+			       apic->name);
+		}
+		return 1;
+	}
+	return 0;
+}
+
+int __init default_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+	int i;
+
+	for (i = 0; apic_probe[i]; ++i) {
+		if (!apic_probe[i]->acpi_madt_oem_check)
+			continue;
+		if (!apic_probe[i]->acpi_madt_oem_check(oem_id, oem_table_id))
+			continue;
+
+		if (!cmdline_apic) {
+			apic = apic_probe[i];
+			printk(KERN_INFO "Switched to APIC driver `%s'.\n",
+			       apic->name);
+		}
+		return 1;
+	}
+	return 0;
+}
diff --git a/arch/x86/kernel/genapic_64.c b/arch/x86/kernel/apic/probe_64.c
index 2bced78b0b8e..8d7748efe6a8 100644
--- a/arch/x86/kernel/genapic_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -19,22 +19,27 @@
 #include <linux/dmar.h>
 
 #include <asm/smp.h>
+#include <asm/apic.h>
 #include <asm/ipi.h>
-#include <asm/genapic.h>
 #include <asm/setup.h>
 
-extern struct genapic apic_flat;
-extern struct genapic apic_physflat;
-extern struct genapic apic_x2xpic_uv_x;
-extern struct genapic apic_x2apic_phys;
-extern struct genapic apic_x2apic_cluster;
+extern struct apic apic_flat;
+extern struct apic apic_physflat;
+extern struct apic apic_x2xpic_uv_x;
+extern struct apic apic_x2apic_phys;
+extern struct apic apic_x2apic_cluster;
 
-struct genapic __read_mostly *genapic = &apic_flat;
+struct apic __read_mostly *apic = &apic_flat;
+EXPORT_SYMBOL_GPL(apic);
 
-static struct genapic *apic_probe[] __initdata = {
+static struct apic *apic_probe[] __initdata = {
+#ifdef CONFIG_X86_UV
 	&apic_x2apic_uv_x,
+#endif
+#ifdef CONFIG_X86_X2APIC
 	&apic_x2apic_phys,
 	&apic_x2apic_cluster,
+#endif
 	&apic_physflat,
 	NULL,
 };
@@ -42,39 +47,45 @@ static struct genapic *apic_probe[] __initdata = {
 /*
  * Check the APIC IDs in bios_cpu_apicid and choose the APIC mode.
  */
-void __init setup_apic_routing(void)
+void __init default_setup_apic_routing(void)
 {
-	if (genapic == &apic_x2apic_phys || genapic == &apic_x2apic_cluster) {
-		if (!intr_remapping_enabled)
-			genapic = &apic_flat;
+#ifdef CONFIG_X86_X2APIC
+	if (x2apic && (apic != &apic_x2apic_phys &&
+#ifdef CONFIG_X86_UV
+		       apic != &apic_x2apic_uv_x &&
+#endif
+		       apic != &apic_x2apic_cluster)) {
+		if (x2apic_phys)
+			apic = &apic_x2apic_phys;
+		else
+			apic = &apic_x2apic_cluster;
+		printk(KERN_INFO "Setting APIC routing to %s\n", apic->name);
 	}
+#endif
 
-	if (genapic == &apic_flat) {
+	if (apic == &apic_flat) {
 		if (max_physical_apicid >= 8)
-			genapic = &apic_physflat;
-		printk(KERN_INFO "Setting APIC routing to %s\n", genapic->name);
+			apic = &apic_physflat;
+		printk(KERN_INFO "Setting APIC routing to %s\n", apic->name);
 	}
-
-	if (x86_quirks->update_genapic)
-		x86_quirks->update_genapic();
 }
 
 /* Same for both flat and physical. */
 
 void apic_send_IPI_self(int vector)
 {
-	__send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
+	__default_send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
 }
 
-int __init acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+int __init default_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
 {
 	int i;
 
 	for (i = 0; apic_probe[i]; ++i) {
 		if (apic_probe[i]->acpi_madt_oem_check(oem_id, oem_table_id)) {
-			genapic = apic_probe[i];
+			apic = apic_probe[i];
 			printk(KERN_INFO "Setting APIC routing to %s.\n",
-				genapic->name);
+				apic->name);
 			return 1;
 		}
 	}
diff --git a/arch/x86/kernel/apic/summit_32.c b/arch/x86/kernel/apic/summit_32.c
new file mode 100644
index 000000000000..aac52fa873ff
--- /dev/null
+++ b/arch/x86/kernel/apic/summit_32.c
@@ -0,0 +1,579 @@
+/*
+ * IBM Summit-Specific Code
+ *
+ * Written By: Matthew Dobson, IBM Corporation
+ *
+ * Copyright (c) 2003 IBM Corp.
+ *
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * Send feedback to <colpatch@us.ibm.com>
+ *
+ */
+
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <asm/io.h>
+#include <asm/bios_ebda.h>
+
+/*
+ * APIC driver for the IBM "Summit" chipset.
+ */
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <asm/mpspec.h>
+#include <asm/apic.h>
+#include <asm/smp.h>
+#include <asm/fixmap.h>
+#include <asm/apicdef.h>
+#include <asm/ipi.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/gfp.h>
+#include <linux/smp.h>
+
+static unsigned summit_get_apic_id(unsigned long x)
+{
+	return (x >> 24) & 0xFF;
+}
+
+static inline void summit_send_IPI_mask(const cpumask_t *mask, int vector)
+{
+	default_send_IPI_mask_sequence_logical(mask, vector);
+}
+
+static void summit_send_IPI_allbutself(int vector)
+{
+	cpumask_t mask = cpu_online_map;
+	cpu_clear(smp_processor_id(), mask);
+
+	if (!cpus_empty(mask))
+		summit_send_IPI_mask(&mask, vector);
+}
+
+static void summit_send_IPI_all(int vector)
+{
+	summit_send_IPI_mask(&cpu_online_map, vector);
+}
+
+#include <asm/tsc.h>
+
+extern int use_cyclone;
+
+#ifdef CONFIG_X86_SUMMIT_NUMA
+static void setup_summit(void);
+#else
+static inline void setup_summit(void) {}
+#endif
+
+static int summit_mps_oem_check(struct mpc_table *mpc, char *oem,
+		char *productid)
+{
+	if (!strncmp(oem, "IBM ENSW", 8) &&
+			(!strncmp(productid, "VIGIL SMP", 9)
+			 || !strncmp(productid, "EXA", 3)
+			 || !strncmp(productid, "RUTHLESS SMP", 12))){
+		mark_tsc_unstable("Summit based system");
+		use_cyclone = 1; /*enable cyclone-timer*/
+		setup_summit();
+		return 1;
+	}
+	return 0;
+}
+
+/* Hook from generic ACPI tables.c */
+static int summit_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+{
+	if (!strncmp(oem_id, "IBM", 3) &&
+	    (!strncmp(oem_table_id, "SERVIGIL", 8)
+	     || !strncmp(oem_table_id, "EXA", 3))){
+		mark_tsc_unstable("Summit based system");
+		use_cyclone = 1; /*enable cyclone-timer*/
+		setup_summit();
+		return 1;
+	}
+	return 0;
+}
+
+struct rio_table_hdr {
+	unsigned char version;      /* Version number of this data structure           */
+	                            /* Version 3 adds chassis_num & WP_index           */
+	unsigned char num_scal_dev; /* # of Scalability devices (Twisters for Vigil)   */
+	unsigned char num_rio_dev;  /* # of RIO I/O devices (Cyclones and Winnipegs)   */
+} __attribute__((packed));
+
+struct scal_detail {
+	unsigned char node_id;      /* Scalability Node ID                             */
+	unsigned long CBAR;         /* Address of 1MB register space                   */
+	unsigned char port0node;    /* Node ID port connected to: 0xFF=None            */
+	unsigned char port0port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
+	unsigned char port1node;    /* Node ID port connected to: 0xFF = None          */
+	unsigned char port1port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
+	unsigned char port2node;    /* Node ID port connected to: 0xFF = None          */
+	unsigned char port2port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
+	unsigned char chassis_num;  /* 1 based Chassis number (1 = boot node)          */
+} __attribute__((packed));
+
+struct rio_detail {
+	unsigned char node_id;      /* RIO Node ID                                     */
+	unsigned long BBAR;         /* Address of 1MB register space                   */
+	unsigned char type;         /* Type of device                                  */
+	unsigned char owner_id;     /* For WPEG: Node ID of Cyclone that owns this WPEG*/
+	                            /* For CYC:  Node ID of Twister that owns this CYC */
+	unsigned char port0node;    /* Node ID port connected to: 0xFF=None            */
+	unsigned char port0port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
+	unsigned char port1node;    /* Node ID port connected to: 0xFF=None            */
+	unsigned char port1port;    /* Port num port connected to: 0,1,2, or 0xFF=None */
+	unsigned char first_slot;   /* For WPEG: Lowest slot number below this WPEG    */
+	                            /* For CYC:  0                                     */
+	unsigned char status;       /* For WPEG: Bit 0 = 1 : the XAPIC is used         */
+	                            /*                 = 0 : the XAPIC is not used, ie:*/
+	                            /*                     ints fwded to another XAPIC */
+	                            /*           Bits1:7 Reserved                      */
+	                            /* For CYC:  Bits0:7 Reserved                      */
+	unsigned char WP_index;     /* For WPEG: WPEG instance index - lower ones have */
+	                            /*           lower slot numbers/PCI bus numbers    */
+	                            /* For CYC:  No meaning                            */
+	unsigned char chassis_num;  /* 1 based Chassis number                          */
+	                            /* For LookOut WPEGs this field indicates the      */
+	                            /* Expansion Chassis #, enumerated from Boot       */
+	                            /* Node WPEG external port, then Boot Node CYC     */
+	                            /* external port, then Next Vigil chassis WPEG     */
+	                            /* external port, etc.                             */
+	                            /* Shared Lookouts have only 1 chassis number (the */
+	                            /* first one assigned)                             */
+} __attribute__((packed));
+
+
+typedef enum {
+	CompatTwister = 0,  /* Compatibility Twister               */
+	AltTwister    = 1,  /* Alternate Twister of internal 8-way */
+	CompatCyclone = 2,  /* Compatibility Cyclone               */
+	AltCyclone    = 3,  /* Alternate Cyclone of internal 8-way */
+	CompatWPEG    = 4,  /* Compatibility WPEG                  */
+	AltWPEG       = 5,  /* Second Planar WPEG                  */
+	LookOutAWPEG  = 6,  /* LookOut WPEG                        */
+	LookOutBWPEG  = 7,  /* LookOut WPEG                        */
+} node_type;
+
+static inline int is_WPEG(struct rio_detail *rio){
+	return (rio->type == CompatWPEG || rio->type == AltWPEG ||
+		rio->type == LookOutAWPEG || rio->type == LookOutBWPEG);
+}
+
+
+/* In clustered mode, the high nibble of APIC ID is a cluster number.
+ * The low nibble is a 4-bit bitmap. */
+#define XAPIC_DEST_CPUS_SHIFT	4
+#define XAPIC_DEST_CPUS_MASK	((1u << XAPIC_DEST_CPUS_SHIFT) - 1)
+#define XAPIC_DEST_CLUSTER_MASK	(XAPIC_DEST_CPUS_MASK << XAPIC_DEST_CPUS_SHIFT)
+
+#define SUMMIT_APIC_DFR_VALUE	(APIC_DFR_CLUSTER)
+
+static const cpumask_t *summit_target_cpus(void)
+{
+	/* CPU_MASK_ALL (0xff) has undefined behaviour with
+	 * dest_LowestPrio mode logical clustered apic interrupt routing
+	 * Just start on cpu 0.  IRQ balancing will spread load
+	 */
+	return &cpumask_of_cpu(0);
+}
+
+static unsigned long summit_check_apicid_used(physid_mask_t bitmap, int apicid)
+{
+	return 0;
+}
+
+/* we don't use the phys_cpu_present_map to indicate apicid presence */
+static unsigned long summit_check_apicid_present(int bit)
+{
+	return 1;
+}
+
+static void summit_init_apic_ldr(void)
+{
+	unsigned long val, id;
+	int count = 0;
+	u8 my_id = (u8)hard_smp_processor_id();
+	u8 my_cluster = APIC_CLUSTER(my_id);
+#ifdef CONFIG_SMP
+	u8 lid;
+	int i;
+
+	/* Create logical APIC IDs by counting CPUs already in cluster. */
+	for (count = 0, i = nr_cpu_ids; --i >= 0; ) {
+		lid = cpu_2_logical_apicid[i];
+		if (lid != BAD_APICID && APIC_CLUSTER(lid) == my_cluster)
+			++count;
+	}
+#endif
+	/* We only have a 4 wide bitmap in cluster mode.  If a deranged
+	 * BIOS puts 5 CPUs in one APIC cluster, we're hosed. */
+	BUG_ON(count >= XAPIC_DEST_CPUS_SHIFT);
+	id = my_cluster | (1UL << count);
+	apic_write(APIC_DFR, SUMMIT_APIC_DFR_VALUE);
+	val = apic_read(APIC_LDR) & ~APIC_LDR_MASK;
+	val |= SET_APIC_LOGICAL_ID(id);
+	apic_write(APIC_LDR, val);
+}
+
+static int summit_apic_id_registered(void)
+{
+	return 1;
+}
+
+static void summit_setup_apic_routing(void)
+{
+	printk("Enabling APIC mode:  Summit.  Using %d I/O APICs\n",
+						nr_ioapics);
+}
+
+static int summit_apicid_to_node(int logical_apicid)
+{
+#ifdef CONFIG_SMP
+	return apicid_2_node[hard_smp_processor_id()];
+#else
+	return 0;
+#endif
+}
+
+/* Mapping from cpu number to logical apicid */
+static inline int summit_cpu_to_logical_apicid(int cpu)
+{
+#ifdef CONFIG_SMP
+	if (cpu >= nr_cpu_ids)
+		return BAD_APICID;
+	return cpu_2_logical_apicid[cpu];
+#else
+	return logical_smp_processor_id();
+#endif
+}
+
+static int summit_cpu_present_to_apicid(int mps_cpu)
+{
+	if (mps_cpu < nr_cpu_ids)
+		return (int)per_cpu(x86_bios_cpu_apicid, mps_cpu);
+	else
+		return BAD_APICID;
+}
+
+static physid_mask_t summit_ioapic_phys_id_map(physid_mask_t phys_id_map)
+{
+	/* For clustered we don't have a good way to do this yet - hack */
+	return physids_promote(0x0F);
+}
+
+static physid_mask_t summit_apicid_to_cpu_present(int apicid)
+{
+	return physid_mask_of_physid(0);
+}
+
+static int summit_check_phys_apicid_present(int boot_cpu_physical_apicid)
+{
+	return 1;
+}
+
+static unsigned int summit_cpu_mask_to_apicid(const cpumask_t *cpumask)
+{
+	unsigned int round = 0;
+	int cpu, apicid = 0;
+
+	/*
+	 * The cpus in the mask must all be on the apic cluster.
+	 */
+	for_each_cpu(cpu, cpumask) {
+		int new_apicid = summit_cpu_to_logical_apicid(cpu);
+
+		if (round && APIC_CLUSTER(apicid) != APIC_CLUSTER(new_apicid)) {
+			printk("%s: Not a valid mask!\n", __func__);
+			return BAD_APICID;
+		}
+		apicid |= new_apicid;
+		round++;
+	}
+	return apicid;
+}
+
+static unsigned int summit_cpu_mask_to_apicid_and(const struct cpumask *inmask,
+			      const struct cpumask *andmask)
+{
+	int apicid = summit_cpu_to_logical_apicid(0);
+	cpumask_var_t cpumask;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+		return apicid;
+
+	cpumask_and(cpumask, inmask, andmask);
+	cpumask_and(cpumask, cpumask, cpu_online_mask);
+	apicid = summit_cpu_mask_to_apicid(cpumask);
+
+	free_cpumask_var(cpumask);
+
+	return apicid;
+}
+
+/*
+ * cpuid returns the value latched in the HW at reset, not the APIC ID
+ * register's value.  For any box whose BIOS changes APIC IDs, like
+ * clustered APIC systems, we must use hard_smp_processor_id.
+ *
+ * See Intel's IA-32 SW Dev's Manual Vol2 under CPUID.
+ */
+static int summit_phys_pkg_id(int cpuid_apic, int index_msb)
+{
+	return hard_smp_processor_id() >> index_msb;
+}
+
+static int probe_summit(void)
+{
+	/* probed later in mptable/ACPI hooks */
+	return 0;
+}
+
+static void summit_vector_allocation_domain(int cpu, cpumask_t *retmask)
+{
+	/* Careful. Some cpus do not strictly honor the set of cpus
+	 * specified in the interrupt destination when using lowest
+	 * priority interrupt delivery mode.
+	 *
+	 * In particular there was a hyperthreading cpu observed to
+	 * deliver interrupts to the wrong hyperthread when only one
+	 * hyperthread was specified in the interrupt desitination.
+	 */
+	*retmask = (cpumask_t){ { [0] = APIC_ALL_CPUS, } };
+}
+
+#ifdef CONFIG_X86_SUMMIT_NUMA
+static struct rio_table_hdr *rio_table_hdr;
+static struct scal_detail   *scal_devs[MAX_NUMNODES];
+static struct rio_detail    *rio_devs[MAX_NUMNODES*4];
+
+#ifndef CONFIG_X86_NUMAQ
+static int mp_bus_id_to_node[MAX_MP_BUSSES];
+#endif
+
+static int setup_pci_node_map_for_wpeg(int wpeg_num, int last_bus)
+{
+	int twister = 0, node = 0;
+	int i, bus, num_buses;
+
+	for (i = 0; i < rio_table_hdr->num_rio_dev; i++) {
+		if (rio_devs[i]->node_id == rio_devs[wpeg_num]->owner_id) {
+			twister = rio_devs[i]->owner_id;
+			break;
+		}
+	}
+	if (i == rio_table_hdr->num_rio_dev) {
+		printk(KERN_ERR "%s: Couldn't find owner Cyclone for Winnipeg!\n", __func__);
+		return last_bus;
+	}
+
+	for (i = 0; i < rio_table_hdr->num_scal_dev; i++) {
+		if (scal_devs[i]->node_id == twister) {
+			node = scal_devs[i]->node_id;
+			break;
+		}
+	}
+	if (i == rio_table_hdr->num_scal_dev) {
+		printk(KERN_ERR "%s: Couldn't find owner Twister for Cyclone!\n", __func__);
+		return last_bus;
+	}
+
+	switch (rio_devs[wpeg_num]->type) {
+	case CompatWPEG:
+		/*
+		 * The Compatibility Winnipeg controls the 2 legacy buses,
+		 * the 66MHz PCI bus [2 slots] and the 2 "extra" buses in case
+		 * a PCI-PCI bridge card is used in either slot: total 5 buses.
+		 */
+		num_buses = 5;
+		break;
+	case AltWPEG:
+		/*
+		 * The Alternate Winnipeg controls the 2 133MHz buses [1 slot
+		 * each], their 2 "extra" buses, the 100MHz bus [2 slots] and
+		 * the "extra" buses for each of those slots: total 7 buses.
+		 */
+		num_buses = 7;
+		break;
+	case LookOutAWPEG:
+	case LookOutBWPEG:
+		/*
+		 * A Lookout Winnipeg controls 3 100MHz buses [2 slots each]
+		 * & the "extra" buses for each of those slots: total 9 buses.
+		 */
+		num_buses = 9;
+		break;
+	default:
+		printk(KERN_INFO "%s: Unsupported Winnipeg type!\n", __func__);
+		return last_bus;
+	}
+
+	for (bus = last_bus; bus < last_bus + num_buses; bus++)
+		mp_bus_id_to_node[bus] = node;
+	return bus;
+}
+
+static int build_detail_arrays(void)
+{
+	unsigned long ptr;
+	int i, scal_detail_size, rio_detail_size;
+
+	if (rio_table_hdr->num_scal_dev > MAX_NUMNODES) {
+		printk(KERN_WARNING "%s: MAX_NUMNODES too low!  Defined as %d, but system has %d nodes.\n", __func__, MAX_NUMNODES, rio_table_hdr->num_scal_dev);
+		return 0;
+	}
+
+	switch (rio_table_hdr->version) {
+	default:
+		printk(KERN_WARNING "%s: Invalid Rio Grande Table Version: %d\n", __func__, rio_table_hdr->version);
+		return 0;
+	case 2:
+		scal_detail_size = 11;
+		rio_detail_size = 13;
+		break;
+	case 3:
+		scal_detail_size = 12;
+		rio_detail_size = 15;
+		break;
+	}
+
+	ptr = (unsigned long)rio_table_hdr + 3;
+	for (i = 0; i < rio_table_hdr->num_scal_dev; i++, ptr += scal_detail_size)
+		scal_devs[i] = (struct scal_detail *)ptr;
+
+	for (i = 0; i < rio_table_hdr->num_rio_dev; i++, ptr += rio_detail_size)
+		rio_devs[i] = (struct rio_detail *)ptr;
+
+	return 1;
+}
+
+void setup_summit(void)
+{
+	unsigned long		ptr;
+	unsigned short		offset;
+	int			i, next_wpeg, next_bus = 0;
+
+	/* The pointer to the EBDA is stored in the word @ phys 0x40E(40:0E) */
+	ptr = get_bios_ebda();
+	ptr = (unsigned long)phys_to_virt(ptr);
+
+	rio_table_hdr = NULL;
+	offset = 0x180;
+	while (offset) {
+		/* The block id is stored in the 2nd word */
+		if (*((unsigned short *)(ptr + offset + 2)) == 0x4752) {
+			/* set the pointer past the offset & block id */
+			rio_table_hdr = (struct rio_table_hdr *)(ptr + offset + 4);
+			break;
+		}
+		/* The next offset is stored in the 1st word.  0 means no more */
+		offset = *((unsigned short *)(ptr + offset));
+	}
+	if (!rio_table_hdr) {
+		printk(KERN_ERR "%s: Unable to locate Rio Grande Table in EBDA - bailing!\n", __func__);
+		return;
+	}
+
+	if (!build_detail_arrays())
+		return;
+
+	/* The first Winnipeg we're looking for has an index of 0 */
+	next_wpeg = 0;
+	do {
+		for (i = 0; i < rio_table_hdr->num_rio_dev; i++) {
+			if (is_WPEG(rio_devs[i]) && rio_devs[i]->WP_index == next_wpeg) {
+				/* It's the Winnipeg we're looking for! */
+				next_bus = setup_pci_node_map_for_wpeg(i, next_bus);
+				next_wpeg++;
+				break;
+			}
+		}
+		/*
+		 * If we go through all Rio devices and don't find one with
+		 * the next index, it means we've found all the Winnipegs,
+		 * and thus all the PCI buses.
+		 */
+		if (i == rio_table_hdr->num_rio_dev)
+			next_wpeg = 0;
+	} while (next_wpeg != 0);
+}
+#endif
+
+struct apic apic_summit = {
+
+	.name				= "summit",
+	.probe				= probe_summit,
+	.acpi_madt_oem_check		= summit_acpi_madt_oem_check,
+	.apic_id_registered		= summit_apic_id_registered,
+
+	.irq_delivery_mode		= dest_LowestPrio,
+	/* logical delivery broadcast to all CPUs: */
+	.irq_dest_mode			= 1,
+
+	.target_cpus			= summit_target_cpus,
+	.disable_esr			= 1,
+	.dest_logical			= APIC_DEST_LOGICAL,
+	.check_apicid_used		= summit_check_apicid_used,
+	.check_apicid_present		= summit_check_apicid_present,
+
+	.vector_allocation_domain	= summit_vector_allocation_domain,
+	.init_apic_ldr			= summit_init_apic_ldr,
+
+	.ioapic_phys_id_map		= summit_ioapic_phys_id_map,
+	.setup_apic_routing		= summit_setup_apic_routing,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= summit_apicid_to_node,
+	.cpu_to_logical_apicid		= summit_cpu_to_logical_apicid,
+	.cpu_present_to_apicid		= summit_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= summit_apicid_to_cpu_present,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= summit_check_phys_apicid_present,
+	.enable_apic_mode		= NULL,
+	.phys_pkg_id			= summit_phys_pkg_id,
+	.mps_oem_check			= summit_mps_oem_check,
+
+	.get_apic_id			= summit_get_apic_id,
+	.set_apic_id			= NULL,
+	.apic_id_mask			= 0xFF << 24,
+
+	.cpu_mask_to_apicid		= summit_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= summit_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= summit_send_IPI_mask,
+	.send_IPI_mask_allbutself	= NULL,
+	.send_IPI_allbutself		= summit_send_IPI_allbutself,
+	.send_IPI_all			= summit_send_IPI_all,
+	.send_IPI_self			= default_send_IPI_self,
+
+	.trampoline_phys_low		= DEFAULT_TRAMPOLINE_PHYS_LOW,
+	.trampoline_phys_high		= DEFAULT_TRAMPOLINE_PHYS_HIGH,
+
+	.wait_for_init_deassert		= default_wait_for_init_deassert,
+
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= default_inquire_remote_apic,
+
+	.read				= native_apic_mem_read,
+	.write				= native_apic_mem_write,
+	.icr_read			= native_apic_icr_read,
+	.icr_write			= native_apic_icr_write,
+	.wait_icr_idle			= native_apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_apic_wait_icr_idle,
+};
diff --git a/arch/x86/kernel/genx2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 6ce497cc372d..8fb87b6dd633 100644
--- a/arch/x86/kernel/genx2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -7,17 +7,14 @@
 #include <linux/dmar.h>
 
 #include <asm/smp.h>
+#include <asm/apic.h>
 #include <asm/ipi.h>
-#include <asm/genapic.h>
 
 DEFINE_PER_CPU(u32, x86_cpu_to_logical_apicid);
 
 static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
 {
-	if (cpu_has_x2apic)
-		return 1;
-
-	return 0;
+	return x2apic_enabled();
 }
 
 /* Start with all IRQs pointing to boot CPU.  IRQ balancing will shift them. */
@@ -36,8 +33,8 @@ static void x2apic_vector_allocation_domain(int cpu, struct cpumask *retmask)
 	cpumask_set_cpu(cpu, retmask);
 }
 
-static void __x2apic_send_IPI_dest(unsigned int apicid, int vector,
-				   unsigned int dest)
+static void
+ __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest)
 {
 	unsigned long cfg;
 
@@ -46,7 +43,7 @@ static void __x2apic_send_IPI_dest(unsigned int apicid, int vector,
 	/*
 	 * send the IPI.
 	 */
-	x2apic_icr_write(cfg, apicid);
+	native_x2apic_icr_write(cfg, apicid);
 }
 
 /*
@@ -57,45 +54,50 @@ static void __x2apic_send_IPI_dest(unsigned int apicid, int vector,
  */
 static void x2apic_send_IPI_mask(const struct cpumask *mask, int vector)
 {
-	unsigned long flags;
 	unsigned long query_cpu;
+	unsigned long flags;
 
 	local_irq_save(flags);
-	for_each_cpu(query_cpu, mask)
+	for_each_cpu(query_cpu, mask) {
 		__x2apic_send_IPI_dest(
 			per_cpu(x86_cpu_to_logical_apicid, query_cpu),
-			vector, APIC_DEST_LOGICAL);
+			vector, apic->dest_logical);
+	}
 	local_irq_restore(flags);
 }
 
-static void x2apic_send_IPI_mask_allbutself(const struct cpumask *mask,
-					    int vector)
+static void
+ x2apic_send_IPI_mask_allbutself(const struct cpumask *mask, int vector)
 {
-	unsigned long flags;
-	unsigned long query_cpu;
 	unsigned long this_cpu = smp_processor_id();
+	unsigned long query_cpu;
+	unsigned long flags;
 
 	local_irq_save(flags);
-	for_each_cpu(query_cpu, mask)
-		if (query_cpu != this_cpu)
-			__x2apic_send_IPI_dest(
+	for_each_cpu(query_cpu, mask) {
+		if (query_cpu == this_cpu)
+			continue;
+		__x2apic_send_IPI_dest(
 				per_cpu(x86_cpu_to_logical_apicid, query_cpu),
-				vector, APIC_DEST_LOGICAL);
+				vector, apic->dest_logical);
+	}
 	local_irq_restore(flags);
 }
 
 static void x2apic_send_IPI_allbutself(int vector)
 {
-	unsigned long flags;
-	unsigned long query_cpu;
 	unsigned long this_cpu = smp_processor_id();
+	unsigned long query_cpu;
+	unsigned long flags;
 
 	local_irq_save(flags);
-	for_each_online_cpu(query_cpu)
-		if (query_cpu != this_cpu)
-			__x2apic_send_IPI_dest(
+	for_each_online_cpu(query_cpu) {
+		if (query_cpu == this_cpu)
+			continue;
+		__x2apic_send_IPI_dest(
 				per_cpu(x86_cpu_to_logical_apicid, query_cpu),
-				vector, APIC_DEST_LOGICAL);
+				vector, apic->dest_logical);
+	}
 	local_irq_restore(flags);
 }
 
@@ -111,21 +113,21 @@ static int x2apic_apic_id_registered(void)
 
 static unsigned int x2apic_cpu_mask_to_apicid(const struct cpumask *cpumask)
 {
-	int cpu;
-
 	/*
 	 * We're using fixed IRQ delivery, can only return one logical APIC ID.
 	 * May as well be the first.
 	 */
-	cpu = cpumask_first(cpumask);
+	int cpu = cpumask_first(cpumask);
+
 	if ((unsigned)cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_logical_apicid, cpu);
 	else
 		return BAD_APICID;
 }
 
-static unsigned int x2apic_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
-						  const struct cpumask *andmask)
+static unsigned int
+x2apic_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+			      const struct cpumask *andmask)
 {
 	int cpu;
 
@@ -133,15 +135,18 @@ static unsigned int x2apic_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
 	 * We're using fixed IRQ delivery, can only return one logical APIC ID.
 	 * May as well be the first.
 	 */
-	for_each_cpu_and(cpu, cpumask, andmask)
+	for_each_cpu_and(cpu, cpumask, andmask) {
 		if (cpumask_test_cpu(cpu, cpu_online_mask))
 			break;
+	}
+
 	if (cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_logical_apicid, cpu);
+
 	return BAD_APICID;
 }
 
-static unsigned int get_apic_id(unsigned long x)
+static unsigned int x2apic_cluster_phys_get_apic_id(unsigned long x)
 {
 	unsigned int id;
 
@@ -157,7 +162,7 @@ static unsigned long set_apic_id(unsigned int id)
 	return x;
 }
 
-static unsigned int phys_pkg_id(int index_msb)
+static int x2apic_cluster_phys_pkg_id(int initial_apicid, int index_msb)
 {
 	return current_cpu_data.initial_apicid >> index_msb;
 }
@@ -172,27 +177,63 @@ static void init_x2apic_ldr(void)
 	int cpu = smp_processor_id();
 
 	per_cpu(x86_cpu_to_logical_apicid, cpu) = apic_read(APIC_LDR);
-	return;
-}
-
-struct genapic apic_x2apic_cluster = {
-	.name = "cluster x2apic",
-	.acpi_madt_oem_check = x2apic_acpi_madt_oem_check,
-	.int_delivery_mode = dest_LowestPrio,
-	.int_dest_mode = (APIC_DEST_LOGICAL != 0),
-	.target_cpus = x2apic_target_cpus,
-	.vector_allocation_domain = x2apic_vector_allocation_domain,
-	.apic_id_registered = x2apic_apic_id_registered,
-	.init_apic_ldr = init_x2apic_ldr,
-	.send_IPI_all = x2apic_send_IPI_all,
-	.send_IPI_allbutself = x2apic_send_IPI_allbutself,
-	.send_IPI_mask = x2apic_send_IPI_mask,
-	.send_IPI_mask_allbutself = x2apic_send_IPI_mask_allbutself,
-	.send_IPI_self = x2apic_send_IPI_self,
-	.cpu_mask_to_apicid = x2apic_cpu_mask_to_apicid,
-	.cpu_mask_to_apicid_and = x2apic_cpu_mask_to_apicid_and,
-	.phys_pkg_id = phys_pkg_id,
-	.get_apic_id = get_apic_id,
-	.set_apic_id = set_apic_id,
-	.apic_id_mask = (0xFFFFFFFFu),
+}
+
+struct apic apic_x2apic_cluster = {
+
+	.name				= "cluster x2apic",
+	.probe				= NULL,
+	.acpi_madt_oem_check		= x2apic_acpi_madt_oem_check,
+	.apic_id_registered		= x2apic_apic_id_registered,
+
+	.irq_delivery_mode		= dest_LowestPrio,
+	.irq_dest_mode			= 1, /* logical */
+
+	.target_cpus			= x2apic_target_cpus,
+	.disable_esr			= 0,
+	.dest_logical			= APIC_DEST_LOGICAL,
+	.check_apicid_used		= NULL,
+	.check_apicid_present		= NULL,
+
+	.vector_allocation_domain	= x2apic_vector_allocation_domain,
+	.init_apic_ldr			= init_x2apic_ldr,
+
+	.ioapic_phys_id_map		= NULL,
+	.setup_apic_routing		= NULL,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= NULL,
+	.cpu_to_logical_apicid		= NULL,
+	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= NULL,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= default_check_phys_apicid_present,
+	.enable_apic_mode		= NULL,
+	.phys_pkg_id			= x2apic_cluster_phys_pkg_id,
+	.mps_oem_check			= NULL,
+
+	.get_apic_id			= x2apic_cluster_phys_get_apic_id,
+	.set_apic_id			= set_apic_id,
+	.apic_id_mask			= 0xFFFFFFFFu,
+
+	.cpu_mask_to_apicid		= x2apic_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= x2apic_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= x2apic_send_IPI_mask,
+	.send_IPI_mask_allbutself	= x2apic_send_IPI_mask_allbutself,
+	.send_IPI_allbutself		= x2apic_send_IPI_allbutself,
+	.send_IPI_all			= x2apic_send_IPI_all,
+	.send_IPI_self			= x2apic_send_IPI_self,
+
+	.trampoline_phys_low		= DEFAULT_TRAMPOLINE_PHYS_LOW,
+	.trampoline_phys_high		= DEFAULT_TRAMPOLINE_PHYS_HIGH,
+	.wait_for_init_deassert		= NULL,
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= NULL,
+
+	.read				= native_apic_msr_read,
+	.write				= native_apic_msr_write,
+	.icr_read			= native_x2apic_icr_read,
+	.icr_write			= native_x2apic_icr_write,
+	.wait_icr_idle			= native_x2apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_x2apic_wait_icr_idle,
 };
diff --git a/arch/x86/kernel/genx2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 21bcc0e098ba..23625b9f98b2 100644
--- a/arch/x86/kernel/genx2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -7,10 +7,10 @@
 #include <linux/dmar.h>
 
 #include <asm/smp.h>
+#include <asm/apic.h>
 #include <asm/ipi.h>
-#include <asm/genapic.h>
 
-static int x2apic_phys;
+int x2apic_phys;
 
 static int set_x2apic_phys_mode(char *arg)
 {
@@ -21,10 +21,10 @@ early_param("x2apic_phys", set_x2apic_phys_mode);
 
 static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
 {
-	if (cpu_has_x2apic && x2apic_phys)
-		return 1;
-
-	return 0;
+	if (x2apic_phys)
+		return x2apic_enabled();
+	else
+		return 0;
 }
 
 /* Start with all IRQs pointing to boot CPU.  IRQ balancing will shift them. */
@@ -50,13 +50,13 @@ static void __x2apic_send_IPI_dest(unsigned int apicid, int vector,
 	/*
 	 * send the IPI.
 	 */
-	x2apic_icr_write(cfg, apicid);
+	native_x2apic_icr_write(cfg, apicid);
 }
 
 static void x2apic_send_IPI_mask(const struct cpumask *mask, int vector)
 {
-	unsigned long flags;
 	unsigned long query_cpu;
+	unsigned long flags;
 
 	local_irq_save(flags);
 	for_each_cpu(query_cpu, mask) {
@@ -66,12 +66,12 @@ static void x2apic_send_IPI_mask(const struct cpumask *mask, int vector)
 	local_irq_restore(flags);
 }
 
-static void x2apic_send_IPI_mask_allbutself(const struct cpumask *mask,
-					    int vector)
+static void
+ x2apic_send_IPI_mask_allbutself(const struct cpumask *mask, int vector)
 {
-	unsigned long flags;
-	unsigned long query_cpu;
 	unsigned long this_cpu = smp_processor_id();
+	unsigned long query_cpu;
+	unsigned long flags;
 
 	local_irq_save(flags);
 	for_each_cpu(query_cpu, mask) {
@@ -85,16 +85,17 @@ static void x2apic_send_IPI_mask_allbutself(const struct cpumask *mask,
 
 static void x2apic_send_IPI_allbutself(int vector)
 {
-	unsigned long flags;
-	unsigned long query_cpu;
 	unsigned long this_cpu = smp_processor_id();
+	unsigned long query_cpu;
+	unsigned long flags;
 
 	local_irq_save(flags);
-	for_each_online_cpu(query_cpu)
-		if (query_cpu != this_cpu)
-			__x2apic_send_IPI_dest(
-				per_cpu(x86_cpu_to_apicid, query_cpu),
-				vector, APIC_DEST_PHYSICAL);
+	for_each_online_cpu(query_cpu) {
+		if (query_cpu == this_cpu)
+			continue;
+		__x2apic_send_IPI_dest(per_cpu(x86_cpu_to_apicid, query_cpu),
+				       vector, APIC_DEST_PHYSICAL);
+	}
 	local_irq_restore(flags);
 }
 
@@ -110,21 +111,21 @@ static int x2apic_apic_id_registered(void)
 
 static unsigned int x2apic_cpu_mask_to_apicid(const struct cpumask *cpumask)
 {
-	int cpu;
-
 	/*
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	cpu = cpumask_first(cpumask);
+	int cpu = cpumask_first(cpumask);
+
 	if ((unsigned)cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_apicid, cpu);
 	else
 		return BAD_APICID;
 }
 
-static unsigned int x2apic_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
-						  const struct cpumask *andmask)
+static unsigned int
+x2apic_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+			      const struct cpumask *andmask)
 {
 	int cpu;
 
@@ -132,31 +133,28 @@ static unsigned int x2apic_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	for_each_cpu_and(cpu, cpumask, andmask)
+	for_each_cpu_and(cpu, cpumask, andmask) {
 		if (cpumask_test_cpu(cpu, cpu_online_mask))
 			break;
+	}
+
 	if (cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_apicid, cpu);
+
 	return BAD_APICID;
 }
 
-static unsigned int get_apic_id(unsigned long x)
+static unsigned int x2apic_phys_get_apic_id(unsigned long x)
 {
-	unsigned int id;
-
-	id = x;
-	return id;
+	return x;
 }
 
 static unsigned long set_apic_id(unsigned int id)
 {
-	unsigned long x;
-
-	x = id;
-	return x;
+	return id;
 }
 
-static unsigned int phys_pkg_id(int index_msb)
+static int x2apic_phys_pkg_id(int initial_apicid, int index_msb)
 {
 	return current_cpu_data.initial_apicid >> index_msb;
 }
@@ -168,27 +166,63 @@ static void x2apic_send_IPI_self(int vector)
 
 static void init_x2apic_ldr(void)
 {
-	return;
-}
-
-struct genapic apic_x2apic_phys = {
-	.name = "physical x2apic",
-	.acpi_madt_oem_check = x2apic_acpi_madt_oem_check,
-	.int_delivery_mode = dest_Fixed,
-	.int_dest_mode = (APIC_DEST_PHYSICAL != 0),
-	.target_cpus = x2apic_target_cpus,
-	.vector_allocation_domain = x2apic_vector_allocation_domain,
-	.apic_id_registered = x2apic_apic_id_registered,
-	.init_apic_ldr = init_x2apic_ldr,
-	.send_IPI_all = x2apic_send_IPI_all,
-	.send_IPI_allbutself = x2apic_send_IPI_allbutself,
-	.send_IPI_mask = x2apic_send_IPI_mask,
-	.send_IPI_mask_allbutself = x2apic_send_IPI_mask_allbutself,
-	.send_IPI_self = x2apic_send_IPI_self,
-	.cpu_mask_to_apicid = x2apic_cpu_mask_to_apicid,
-	.cpu_mask_to_apicid_and = x2apic_cpu_mask_to_apicid_and,
-	.phys_pkg_id = phys_pkg_id,
-	.get_apic_id = get_apic_id,
-	.set_apic_id = set_apic_id,
-	.apic_id_mask = (0xFFFFFFFFu),
+}
+
+struct apic apic_x2apic_phys = {
+
+	.name				= "physical x2apic",
+	.probe				= NULL,
+	.acpi_madt_oem_check		= x2apic_acpi_madt_oem_check,
+	.apic_id_registered		= x2apic_apic_id_registered,
+
+	.irq_delivery_mode		= dest_Fixed,
+	.irq_dest_mode			= 0, /* physical */
+
+	.target_cpus			= x2apic_target_cpus,
+	.disable_esr			= 0,
+	.dest_logical			= 0,
+	.check_apicid_used		= NULL,
+	.check_apicid_present		= NULL,
+
+	.vector_allocation_domain	= x2apic_vector_allocation_domain,
+	.init_apic_ldr			= init_x2apic_ldr,
+
+	.ioapic_phys_id_map		= NULL,
+	.setup_apic_routing		= NULL,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= NULL,
+	.cpu_to_logical_apicid		= NULL,
+	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= NULL,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= default_check_phys_apicid_present,
+	.enable_apic_mode		= NULL,
+	.phys_pkg_id			= x2apic_phys_pkg_id,
+	.mps_oem_check			= NULL,
+
+	.get_apic_id			= x2apic_phys_get_apic_id,
+	.set_apic_id			= set_apic_id,
+	.apic_id_mask			= 0xFFFFFFFFu,
+
+	.cpu_mask_to_apicid		= x2apic_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= x2apic_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= x2apic_send_IPI_mask,
+	.send_IPI_mask_allbutself	= x2apic_send_IPI_mask_allbutself,
+	.send_IPI_allbutself		= x2apic_send_IPI_allbutself,
+	.send_IPI_all			= x2apic_send_IPI_all,
+	.send_IPI_self			= x2apic_send_IPI_self,
+
+	.trampoline_phys_low		= DEFAULT_TRAMPOLINE_PHYS_LOW,
+	.trampoline_phys_high		= DEFAULT_TRAMPOLINE_PHYS_HIGH,
+	.wait_for_init_deassert		= NULL,
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= NULL,
+
+	.read				= native_apic_msr_read,
+	.write				= native_apic_msr_write,
+	.icr_read			= native_x2apic_icr_read,
+	.icr_write			= native_x2apic_icr_write,
+	.wait_icr_idle			= native_x2apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_x2apic_wait_icr_idle,
 };
diff --git a/arch/x86/kernel/genx2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index b193e082f6ce..1bd6da1f8fad 100644
--- a/arch/x86/kernel/genx2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -7,27 +7,28 @@
  *
  * Copyright (C) 2007-2008 Silicon Graphics, Inc. All rights reserved.
  */
-
-#include <linux/kernel.h>
-#include <linux/threads.h>
-#include <linux/cpu.h>
 #include <linux/cpumask.h>
+#include <linux/hardirq.h>
+#include <linux/proc_fs.h>
+#include <linux/threads.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
 #include <linux/string.h>
 #include <linux/ctype.h>
-#include <linux/init.h>
 #include <linux/sched.h>
-#include <linux/module.h>
-#include <linux/hardirq.h>
 #include <linux/timer.h>
-#include <linux/proc_fs.h>
-#include <asm/current.h>
-#include <asm/smp.h>
-#include <asm/ipi.h>
-#include <asm/genapic.h>
-#include <asm/pgtable.h>
+#include <linux/cpu.h>
+#include <linux/init.h>
+
 #include <asm/uv/uv_mmrs.h>
 #include <asm/uv/uv_hub.h>
+#include <asm/current.h>
+#include <asm/pgtable.h>
 #include <asm/uv/bios.h>
+#include <asm/uv/uv.h>
+#include <asm/apic.h>
+#include <asm/ipi.h>
+#include <asm/smp.h>
 
 DEFINE_PER_CPU(int, x2apic_extra_bits);
 
@@ -90,39 +91,43 @@ static void uv_vector_allocation_domain(int cpu, struct cpumask *retmask)
 	cpumask_set_cpu(cpu, retmask);
 }
 
-int uv_wakeup_secondary(int phys_apicid, unsigned int start_rip)
+static int uv_wakeup_secondary(int phys_apicid, unsigned long start_rip)
 {
+#ifdef CONFIG_SMP
 	unsigned long val;
 	int pnode;
 
 	pnode = uv_apicid_to_pnode(phys_apicid);
 	val = (1UL << UVH_IPI_INT_SEND_SHFT) |
 	    (phys_apicid << UVH_IPI_INT_APIC_ID_SHFT) |
-	    (((long)start_rip << UVH_IPI_INT_VECTOR_SHFT) >> 12) |
+	    ((start_rip << UVH_IPI_INT_VECTOR_SHFT) >> 12) |
 	    APIC_DM_INIT;
 	uv_write_global_mmr64(pnode, UVH_IPI_INT, val);
 	mdelay(10);
 
 	val = (1UL << UVH_IPI_INT_SEND_SHFT) |
 	    (phys_apicid << UVH_IPI_INT_APIC_ID_SHFT) |
-	    (((long)start_rip << UVH_IPI_INT_VECTOR_SHFT) >> 12) |
+	    ((start_rip << UVH_IPI_INT_VECTOR_SHFT) >> 12) |
 	    APIC_DM_STARTUP;
 	uv_write_global_mmr64(pnode, UVH_IPI_INT, val);
+
+	atomic_set(&init_deasserted, 1);
+#endif
 	return 0;
 }
 
 static void uv_send_IPI_one(int cpu, int vector)
 {
-	unsigned long val, apicid, lapicid;
+	unsigned long val, apicid;
 	int pnode;
 
 	apicid = per_cpu(x86_cpu_to_apicid, cpu);
-	lapicid = apicid & 0x3f;		/* ZZZ macro needed */
 	pnode = uv_apicid_to_pnode(apicid);
-	val =
-	    (1UL << UVH_IPI_INT_SEND_SHFT) | (lapicid <<
-					      UVH_IPI_INT_APIC_ID_SHFT) |
-	    (vector << UVH_IPI_INT_VECTOR_SHFT);
+
+	val = (1UL << UVH_IPI_INT_SEND_SHFT) |
+	      (apicid << UVH_IPI_INT_APIC_ID_SHFT) |
+	      (vector << UVH_IPI_INT_VECTOR_SHFT);
+
 	uv_write_global_mmr64(pnode, UVH_IPI_INT, val);
 }
 
@@ -136,22 +141,24 @@ static void uv_send_IPI_mask(const struct cpumask *mask, int vector)
 
 static void uv_send_IPI_mask_allbutself(const struct cpumask *mask, int vector)
 {
-	unsigned int cpu;
 	unsigned int this_cpu = smp_processor_id();
+	unsigned int cpu;
 
-	for_each_cpu(cpu, mask)
+	for_each_cpu(cpu, mask) {
 		if (cpu != this_cpu)
 			uv_send_IPI_one(cpu, vector);
+	}
 }
 
 static void uv_send_IPI_allbutself(int vector)
 {
-	unsigned int cpu;
 	unsigned int this_cpu = smp_processor_id();
+	unsigned int cpu;
 
-	for_each_online_cpu(cpu)
+	for_each_online_cpu(cpu) {
 		if (cpu != this_cpu)
 			uv_send_IPI_one(cpu, vector);
+	}
 }
 
 static void uv_send_IPI_all(int vector)
@@ -170,21 +177,21 @@ static void uv_init_apic_ldr(void)
 
 static unsigned int uv_cpu_mask_to_apicid(const struct cpumask *cpumask)
 {
-	int cpu;
-
 	/*
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	cpu = cpumask_first(cpumask);
+	int cpu = cpumask_first(cpumask);
+
 	if ((unsigned)cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_apicid, cpu);
 	else
 		return BAD_APICID;
 }
 
-static unsigned int uv_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
-					      const struct cpumask *andmask)
+static unsigned int
+uv_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+			  const struct cpumask *andmask)
 {
 	int cpu;
 
@@ -192,15 +199,17 @@ static unsigned int uv_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
 	 * We're using fixed IRQ delivery, can only return one phys APIC ID.
 	 * May as well be the first.
 	 */
-	for_each_cpu_and(cpu, cpumask, andmask)
+	for_each_cpu_and(cpu, cpumask, andmask) {
 		if (cpumask_test_cpu(cpu, cpu_online_mask))
 			break;
+	}
 	if (cpu < nr_cpu_ids)
 		return per_cpu(x86_cpu_to_apicid, cpu);
+
 	return BAD_APICID;
 }
 
-static unsigned int get_apic_id(unsigned long x)
+static unsigned int x2apic_get_apic_id(unsigned long x)
 {
 	unsigned int id;
 
@@ -222,10 +231,10 @@ static unsigned long set_apic_id(unsigned int id)
 static unsigned int uv_read_apic_id(void)
 {
 
-	return get_apic_id(apic_read(APIC_ID));
+	return x2apic_get_apic_id(apic_read(APIC_ID));
 }
 
-static unsigned int phys_pkg_id(int index_msb)
+static int uv_phys_pkg_id(int initial_apicid, int index_msb)
 {
 	return uv_read_apic_id() >> index_msb;
 }
@@ -235,26 +244,64 @@ static void uv_send_IPI_self(int vector)
 	apic_write(APIC_SELF_IPI, vector);
 }
 
-struct genapic apic_x2apic_uv_x = {
-	.name = "UV large system",
-	.acpi_madt_oem_check = uv_acpi_madt_oem_check,
-	.int_delivery_mode = dest_Fixed,
-	.int_dest_mode = (APIC_DEST_PHYSICAL != 0),
-	.target_cpus = uv_target_cpus,
-	.vector_allocation_domain = uv_vector_allocation_domain,
-	.apic_id_registered = uv_apic_id_registered,
-	.init_apic_ldr = uv_init_apic_ldr,
-	.send_IPI_all = uv_send_IPI_all,
-	.send_IPI_allbutself = uv_send_IPI_allbutself,
-	.send_IPI_mask = uv_send_IPI_mask,
-	.send_IPI_mask_allbutself = uv_send_IPI_mask_allbutself,
-	.send_IPI_self = uv_send_IPI_self,
-	.cpu_mask_to_apicid = uv_cpu_mask_to_apicid,
-	.cpu_mask_to_apicid_and = uv_cpu_mask_to_apicid_and,
-	.phys_pkg_id = phys_pkg_id,
-	.get_apic_id = get_apic_id,
-	.set_apic_id = set_apic_id,
-	.apic_id_mask = (0xFFFFFFFFu),
+struct apic apic_x2apic_uv_x = {
+
+	.name				= "UV large system",
+	.probe				= NULL,
+	.acpi_madt_oem_check		= uv_acpi_madt_oem_check,
+	.apic_id_registered		= uv_apic_id_registered,
+
+	.irq_delivery_mode		= dest_Fixed,
+	.irq_dest_mode			= 1, /* logical */
+
+	.target_cpus			= uv_target_cpus,
+	.disable_esr			= 0,
+	.dest_logical			= APIC_DEST_LOGICAL,
+	.check_apicid_used		= NULL,
+	.check_apicid_present		= NULL,
+
+	.vector_allocation_domain	= uv_vector_allocation_domain,
+	.init_apic_ldr			= uv_init_apic_ldr,
+
+	.ioapic_phys_id_map		= NULL,
+	.setup_apic_routing		= NULL,
+	.multi_timer_check		= NULL,
+	.apicid_to_node			= NULL,
+	.cpu_to_logical_apicid		= NULL,
+	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
+	.apicid_to_cpu_present		= NULL,
+	.setup_portio_remap		= NULL,
+	.check_phys_apicid_present	= default_check_phys_apicid_present,
+	.enable_apic_mode		= NULL,
+	.phys_pkg_id			= uv_phys_pkg_id,
+	.mps_oem_check			= NULL,
+
+	.get_apic_id			= x2apic_get_apic_id,
+	.set_apic_id			= set_apic_id,
+	.apic_id_mask			= 0xFFFFFFFFu,
+
+	.cpu_mask_to_apicid		= uv_cpu_mask_to_apicid,
+	.cpu_mask_to_apicid_and		= uv_cpu_mask_to_apicid_and,
+
+	.send_IPI_mask			= uv_send_IPI_mask,
+	.send_IPI_mask_allbutself	= uv_send_IPI_mask_allbutself,
+	.send_IPI_allbutself		= uv_send_IPI_allbutself,
+	.send_IPI_all			= uv_send_IPI_all,
+	.send_IPI_self			= uv_send_IPI_self,
+
+	.wakeup_secondary_cpu		= uv_wakeup_secondary,
+	.trampoline_phys_low		= DEFAULT_TRAMPOLINE_PHYS_LOW,
+	.trampoline_phys_high		= DEFAULT_TRAMPOLINE_PHYS_HIGH,
+	.wait_for_init_deassert		= NULL,
+	.smp_callin_clear_local_apic	= NULL,
+	.inquire_remote_apic		= NULL,
+
+	.read				= native_apic_msr_read,
+	.write				= native_apic_msr_write,
+	.icr_read			= native_x2apic_icr_read,
+	.icr_write			= native_x2apic_icr_write,
+	.wait_icr_idle			= native_x2apic_wait_icr_idle,
+	.safe_wait_icr_idle		= native_safe_x2apic_wait_icr_idle,
 };
 
 static __cpuinit void set_x2apic_extra_bits(int pnode)
@@ -322,7 +369,7 @@ static __init void map_high(char *id, unsigned long base, int shift,
 	paddr = base << shift;
 	bytes = (1UL << shift) * (max_pnode + 1);
 	printk(KERN_INFO "UV: Map %s_HI 0x%lx - 0x%lx\n", id, paddr,
-	       					paddr + bytes);
+						paddr + bytes);
 	if (map_type == map_uc)
 		init_extra_mapping_uc(paddr, bytes);
 	else
@@ -485,7 +532,7 @@ late_initcall(uv_init_heartbeat);
 
 /*
  * Called on each cpu to initialize the per_cpu UV data area.
- * 	ZZZ hotplug not supported yet
+ * FIXME: hotplug not supported yet
  */
 void __cpuinit uv_cpu_init(void)
 {
diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 266ec6c18b6c..10033fe718e0 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -301,7 +301,7 @@ extern int (*console_blank_hook)(int);
  */
 #define APM_ZERO_SEGS
 
-#include "apm.h"
+#include <asm/apm.h>
 
 /*
  * Define to re-initialize the interrupt 0 timer to 100 Hz after a suspend.
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index ee4df08feee6..fbf2f33e3080 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -75,6 +75,7 @@ void foo(void)
 	OFFSET(PT_DS,  pt_regs, ds);
 	OFFSET(PT_ES,  pt_regs, es);
 	OFFSET(PT_FS,  pt_regs, fs);
+	OFFSET(PT_GS,  pt_regs, gs);
 	OFFSET(PT_ORIG_EAX, pt_regs, orig_ax);
 	OFFSET(PT_EIP, pt_regs, ip);
 	OFFSET(PT_CS,  pt_regs, cs);
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 1d41d3f1edbc..8793ab33e2c1 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -11,7 +11,6 @@
 #include <linux/hardirq.h>
 #include <linux/suspend.h>
 #include <linux/kbuild.h>
-#include <asm/pda.h>
 #include <asm/processor.h>
 #include <asm/segment.h>
 #include <asm/thread_info.h>
@@ -48,16 +47,6 @@ int main(void)
 #endif
 	BLANK();
 #undef ENTRY
-#define ENTRY(entry) DEFINE(pda_ ## entry, offsetof(struct x8664_pda, entry))
-	ENTRY(kernelstack); 
-	ENTRY(oldrsp); 
-	ENTRY(pcurrent); 
-	ENTRY(irqcount);
-	ENTRY(cpunumber);
-	ENTRY(irqstackptr);
-	ENTRY(data_offset);
-	BLANK();
-#undef ENTRY
 #ifdef CONFIG_PARAVIRT
 	BLANK();
 	OFFSET(PARAVIRT_enabled, pv_info, paravirt_enabled);
diff --git a/arch/x86/kernel/cpu/addon_cpuid_features.c b/arch/x86/kernel/cpu/addon_cpuid_features.c
index 2cf23634b6d9..6882a735d9c0 100644
--- a/arch/x86/kernel/cpu/addon_cpuid_features.c
+++ b/arch/x86/kernel/cpu/addon_cpuid_features.c
@@ -7,7 +7,7 @@
 #include <asm/pat.h>
 #include <asm/processor.h>
 
-#include <mach_apic.h>
+#include <asm/apic.h>
 
 struct cpuid_bit {
 	u16 feature;
@@ -69,7 +69,7 @@ void __cpuinit init_scattered_cpuid_features(struct cpuinfo_x86 *c)
  */
 void __cpuinit detect_extended_topology(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_SMP
+#ifdef CONFIG_SMP
 	unsigned int eax, ebx, ecx, edx, sub_index;
 	unsigned int ht_mask_width, core_plus_mask_width;
 	unsigned int core_select_mask, core_level_siblings;
@@ -116,22 +116,14 @@ void __cpuinit detect_extended_topology(struct cpuinfo_x86 *c)
 
 	core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
 
-#ifdef CONFIG_X86_32
-	c->cpu_core_id = phys_pkg_id(c->initial_apicid, ht_mask_width)
+	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, ht_mask_width)
 						 & core_select_mask;
-	c->phys_proc_id = phys_pkg_id(c->initial_apicid, core_plus_mask_width);
+	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, core_plus_mask_width);
 	/*
 	 * Reinit the apicid, now that we have extended initial_apicid.
 	 */
-	c->apicid = phys_pkg_id(c->initial_apicid, 0);
-#else
-	c->cpu_core_id = phys_pkg_id(ht_mask_width) & core_select_mask;
-	c->phys_proc_id = phys_pkg_id(core_plus_mask_width);
-	/*
-	 * Reinit the apicid, now that we have extended initial_apicid.
-	 */
-	c->apicid = phys_pkg_id(0);
-#endif
+	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
+
 	c->x86_max_cores = (core_level_siblings / smp_num_siblings);
 
 
@@ -143,37 +135,3 @@ void __cpuinit detect_extended_topology(struct cpuinfo_x86 *c)
 	return;
 #endif
 }
-
-#ifdef CONFIG_X86_PAT
-void __cpuinit validate_pat_support(struct cpuinfo_x86 *c)
-{
-	if (!cpu_has_pat)
-		pat_disable("PAT not supported by CPU.");
-
-	switch (c->x86_vendor) {
-	case X86_VENDOR_INTEL:
-		/*
-		 * There is a known erratum on Pentium III and Core Solo
-		 * and Core Duo CPUs.
-		 * " Page with PAT set to WC while associated MTRR is UC
-		 *   may consolidate to UC "
-		 * Because of this erratum, it is better to stick with
-		 * setting WC in MTRR rather than using PAT on these CPUs.
-		 *
-		 * Enable PAT WC only on P4, Core 2 or later CPUs.
-		 */
-		if (c->x86 > 0x6 || (c->x86 == 6 && c->x86_model >= 15))
-			return;
-
-		pat_disable("PAT WC disabled due to known CPU erratum.");
-		return;
-
-	case X86_VENDOR_AMD:
-	case X86_VENDOR_CENTAUR:
-	case X86_VENDOR_TRANSMETA:
-		return;
-	}
-
-	pat_disable("PAT disabled. Not yet verified on this CPU type.");
-}
-#endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 7c878f6aa919..25423a5b80ed 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -12,8 +12,6 @@
 # include <asm/cacheflush.h>
 #endif
 
-#include <mach_apic.h>
-
 #include "cpu.h"
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 83492b1f93b1..826d5c876278 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -21,14 +21,14 @@
 #include <asm/asm.h>
 #include <asm/numa.h>
 #include <asm/smp.h>
-#ifdef CONFIG_X86_LOCAL_APIC
-#include <asm/mpspec.h>
+#include <asm/cpu.h>
+#include <asm/cpumask.h>
 #include <asm/apic.h>
-#include <mach_apic.h>
-#include <asm/genapic.h>
+
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/uv/uv.h>
 #endif
 
-#include <asm/pda.h>
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 #include <asm/desc.h>
@@ -37,6 +37,7 @@
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/hypervisor.h>
+#include <asm/stackprotector.h>
 
 #include "cpu.h"
 
@@ -50,6 +51,15 @@ cpumask_var_t cpu_initialized_mask;
 /* representing cpus for which sibling maps can be computed */
 cpumask_var_t cpu_sibling_setup_mask;
 
+/* correctly size the local cpu masks */
+void __init setup_cpu_local_masks(void)
+{
+	alloc_bootmem_cpumask_var(&cpu_initialized_mask);
+	alloc_bootmem_cpumask_var(&cpu_callin_mask);
+	alloc_bootmem_cpumask_var(&cpu_callout_mask);
+	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
+}
+
 #else /* CONFIG_X86_32 */
 
 cpumask_t cpu_callin_map;
@@ -62,23 +72,23 @@ cpumask_t cpu_sibling_setup_map;
 
 static struct cpu_dev *this_cpu __cpuinitdata;
 
+DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
 #ifdef CONFIG_X86_64
-/* We need valid kernel segments for data and code in long mode too
- * IRET will check the segment types  kkeil 2000/10/28
- * Also sysret mandates a special GDT layout
- */
-/* The TLS descriptors are currently at a different place compared to i386.
-   Hopefully nobody expects them at a fixed place (Wine?) */
-DEFINE_PER_CPU(struct gdt_page, gdt_page) = { .gdt = {
+	/*
+	 * We need valid kernel segments for data and code in long mode too
+	 * IRET will check the segment types  kkeil 2000/10/28
+	 * Also sysret mandates a special GDT layout
+	 *
+	 * The TLS descriptors are currently at a different place compared to i386.
+	 * Hopefully nobody expects them at a fixed place (Wine?)
+	 */
 	[GDT_ENTRY_KERNEL32_CS] = { { { 0x0000ffff, 0x00cf9b00 } } },
 	[GDT_ENTRY_KERNEL_CS] = { { { 0x0000ffff, 0x00af9b00 } } },
 	[GDT_ENTRY_KERNEL_DS] = { { { 0x0000ffff, 0x00cf9300 } } },
 	[GDT_ENTRY_DEFAULT_USER32_CS] = { { { 0x0000ffff, 0x00cffb00 } } },
 	[GDT_ENTRY_DEFAULT_USER_DS] = { { { 0x0000ffff, 0x00cff300 } } },
 	[GDT_ENTRY_DEFAULT_USER_CS] = { { { 0x0000ffff, 0x00affb00 } } },
-} };
 #else
-DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
 	[GDT_ENTRY_KERNEL_CS] = { { { 0x0000ffff, 0x00cf9a00 } } },
 	[GDT_ENTRY_KERNEL_DS] = { { { 0x0000ffff, 0x00cf9200 } } },
 	[GDT_ENTRY_DEFAULT_USER_CS] = { { { 0x0000ffff, 0x00cffa00 } } },
@@ -110,9 +120,10 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
 	[GDT_ENTRY_APMBIOS_BASE+2] = { { { 0x0000ffff, 0x00409200 } } },
 
 	[GDT_ENTRY_ESPFIX_SS] = { { { 0x00000000, 0x00c09200 } } },
-	[GDT_ENTRY_PERCPU] = { { { 0x00000000, 0x00000000 } } },
-} };
+	[GDT_ENTRY_PERCPU] = { { { 0x0000ffff, 0x00cf9200 } } },
+	GDT_STACK_CANARY_INIT
 #endif
+} };
 EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
 
 #ifdef CONFIG_X86_32
@@ -213,6 +224,49 @@ static inline void squash_the_stupid_serial_number(struct cpuinfo_x86 *c)
 #endif
 
 /*
+ * Some CPU features depend on higher CPUID levels, which may not always
+ * be available due to CPUID level capping or broken virtualization
+ * software.  Add those features to this table to auto-disable them.
+ */
+struct cpuid_dependent_feature {
+	u32 feature;
+	u32 level;
+};
+static const struct cpuid_dependent_feature __cpuinitconst
+cpuid_dependent_features[] = {
+	{ X86_FEATURE_MWAIT,		0x00000005 },
+	{ X86_FEATURE_DCA,		0x00000009 },
+	{ X86_FEATURE_XSAVE,		0x0000000d },
+	{ 0, 0 }
+};
+
+static void __cpuinit filter_cpuid_features(struct cpuinfo_x86 *c, bool warn)
+{
+	const struct cpuid_dependent_feature *df;
+	for (df = cpuid_dependent_features; df->feature; df++) {
+		/*
+		 * Note: cpuid_level is set to -1 if unavailable, but
+		 * extended_extended_level is set to 0 if unavailable
+		 * and the legitimate extended levels are all negative
+		 * when signed; hence the weird messing around with
+		 * signs here...
+		 */
+		if (cpu_has(c, df->feature) &&
+		    ((s32)df->level < 0 ?
+		     (u32)df->level > (u32)c->extended_cpuid_level :
+		     (s32)df->level > (s32)c->cpuid_level)) {
+			clear_cpu_cap(c, df->feature);
+			if (warn)
+				printk(KERN_WARNING
+				       "CPU: CPU feature %s disabled "
+				       "due to lack of CPUID level 0x%x\n",
+				       x86_cap_flags[df->feature],
+				       df->level);
+		}
+	}
+}
+
+/*
  * Naming convention should be: <Name> [(<Codename>)]
  * This table only is used unless init_<vendor>() below doesn't set it;
  * in particular, if CPUID levels 0x80000002..4 are supported, this isn't used
@@ -242,18 +296,29 @@ static char __cpuinit *table_lookup_model(struct cpuinfo_x86 *c)
 
 __u32 cleared_cpu_caps[NCAPINTS] __cpuinitdata;
 
+void load_percpu_segment(int cpu)
+{
+#ifdef CONFIG_X86_32
+	loadsegment(fs, __KERNEL_PERCPU);
+#else
+	loadsegment(gs, 0);
+	wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
+#endif
+	load_stack_canary_segment();
+}
+
 /* Current gdt points %fs at the "master" per-cpu area: after this,
  * it's on the real one. */
-void switch_to_new_gdt(void)
+void switch_to_new_gdt(int cpu)
 {
 	struct desc_ptr gdt_descr;
 
-	gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id());
+	gdt_descr.address = (long)get_cpu_gdt_table(cpu);
 	gdt_descr.size = GDT_SIZE - 1;
 	load_gdt(&gdt_descr);
-#ifdef CONFIG_X86_32
-	asm("mov %0, %%fs" : : "r" (__KERNEL_PERCPU) : "memory");
-#endif
+	/* Reload the per-cpu base */
+
+	load_percpu_segment(cpu);
 }
 
 static struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
@@ -383,11 +448,7 @@ void __cpuinit detect_ht(struct cpuinfo_x86 *c)
 		}
 
 		index_msb = get_count_order(smp_num_siblings);
-#ifdef CONFIG_X86_64
-		c->phys_proc_id = phys_pkg_id(index_msb);
-#else
-		c->phys_proc_id = phys_pkg_id(c->initial_apicid, index_msb);
-#endif
+		c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, index_msb);
 
 		smp_num_siblings = smp_num_siblings / c->x86_max_cores;
 
@@ -395,13 +456,8 @@ void __cpuinit detect_ht(struct cpuinfo_x86 *c)
 
 		core_bits = get_count_order(c->x86_max_cores);
 
-#ifdef CONFIG_X86_64
-		c->cpu_core_id = phys_pkg_id(index_msb) &
+		c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, index_msb) &
 					       ((1 << core_bits) - 1);
-#else
-		c->cpu_core_id = phys_pkg_id(c->initial_apicid, index_msb) &
-					       ((1 << core_bits) - 1);
-#endif
 	}
 
 out:
@@ -570,11 +626,10 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	if (this_cpu->c_early_init)
 		this_cpu->c_early_init(c);
 
-	validate_pat_support(c);
-
 #ifdef CONFIG_SMP
 	c->cpu_index = boot_cpu_id;
 #endif
+	filter_cpuid_features(c, false);
 }
 
 void __init early_cpu_init(void)
@@ -637,7 +692,7 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 *c)
 		c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
 #ifdef CONFIG_X86_32
 # ifdef CONFIG_X86_HT
-		c->apicid = phys_pkg_id(c->initial_apicid, 0);
+		c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
 # else
 		c->apicid = c->initial_apicid;
 # endif
@@ -684,7 +739,7 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
 		this_cpu->c_identify(c);
 
 #ifdef CONFIG_X86_64
-	c->apicid = phys_pkg_id(0);
+	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
 #endif
 
 	/*
@@ -708,6 +763,9 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
 	 * we do "generic changes."
 	 */
 
+	/* Filter out anything that depends on CPUID levels we don't have */
+	filter_cpuid_features(c, true);
+
 	/* If the model name is still unset, do table lookup. */
 	if (!c->x86_model_id[0]) {
 		char *p;
@@ -877,54 +935,22 @@ static __init int setup_disablecpuid(char *arg)
 __setup("clearcpuid=", setup_disablecpuid);
 
 #ifdef CONFIG_X86_64
-struct x8664_pda **_cpu_pda __read_mostly;
-EXPORT_SYMBOL(_cpu_pda);
-
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
-static char boot_cpu_stack[IRQSTACKSIZE] __page_aligned_bss;
+DEFINE_PER_CPU_FIRST(union irq_stack_union,
+		     irq_stack_union) __aligned(PAGE_SIZE);
+DEFINE_PER_CPU(char *, irq_stack_ptr) =
+	init_per_cpu_var(irq_stack_union.irq_stack) + IRQ_STACK_SIZE - 64;
 
-void __cpuinit pda_init(int cpu)
-{
-	struct x8664_pda *pda = cpu_pda(cpu);
+DEFINE_PER_CPU(unsigned long, kernel_stack) =
+	(unsigned long)&init_thread_union - KERNEL_STACK_OFFSET + THREAD_SIZE;
+EXPORT_PER_CPU_SYMBOL(kernel_stack);
 
-	/* Setup up data that may be needed in __get_free_pages early */
-	loadsegment(fs, 0);
-	loadsegment(gs, 0);
-	/* Memory clobbers used to order PDA accessed */
-	mb();
-	wrmsrl(MSR_GS_BASE, pda);
-	mb();
-
-	pda->cpunumber = cpu;
-	pda->irqcount = -1;
-	pda->kernelstack = (unsigned long)stack_thread_info() -
-				 PDA_STACKOFFSET + THREAD_SIZE;
-	pda->active_mm = &init_mm;
-	pda->mmu_state = 0;
-
-	if (cpu == 0) {
-		/* others are initialized in smpboot.c */
-		pda->pcurrent = &init_task;
-		pda->irqstackptr = boot_cpu_stack;
-		pda->irqstackptr += IRQSTACKSIZE - 64;
-	} else {
-		if (!pda->irqstackptr) {
-			pda->irqstackptr = (char *)
-				__get_free_pages(GFP_ATOMIC, IRQSTACK_ORDER);
-			if (!pda->irqstackptr)
-				panic("cannot allocate irqstack for cpu %d",
-				      cpu);
-			pda->irqstackptr += IRQSTACKSIZE - 64;
-		}
+DEFINE_PER_CPU(unsigned int, irq_count) = -1;
 
-		if (pda->nodenumber == 0 && cpu_to_node(cpu) != NUMA_NO_NODE)
-			pda->nodenumber = cpu_to_node(cpu);
-	}
-}
-
-static char boot_exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ +
-				  DEBUG_STKSZ] __page_aligned_bss;
+static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
+	[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ])
+	__aligned(PAGE_SIZE);
 
 extern asmlinkage void ignore_sysret(void);
 
@@ -957,16 +983,21 @@ unsigned long kernel_eflags;
  */
 DEFINE_PER_CPU(struct orig_ist, orig_ist);
 
-#else
+#else	/* x86_64 */
 
-/* Make sure %fs is initialized properly in idle threads */
+#ifdef CONFIG_CC_STACKPROTECTOR
+DEFINE_PER_CPU(unsigned long, stack_canary);
+#endif
+
+/* Make sure %fs and %gs are initialized properly in idle threads */
 struct pt_regs * __cpuinit idle_regs(struct pt_regs *regs)
 {
 	memset(regs, 0, sizeof(struct pt_regs));
 	regs->fs = __KERNEL_PERCPU;
+	regs->gs = __KERNEL_STACK_CANARY;
 	return regs;
 }
-#endif
+#endif	/* x86_64 */
 
 /*
  * cpu_init() initializes state that is per-CPU. Some data is already
@@ -982,15 +1013,14 @@ void __cpuinit cpu_init(void)
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
 	struct orig_ist *orig_ist = &per_cpu(orig_ist, cpu);
 	unsigned long v;
-	char *estacks = NULL;
 	struct task_struct *me;
 	int i;
 
-	/* CPU 0 is initialised in head64.c */
-	if (cpu != 0)
-		pda_init(cpu);
-	else
-		estacks = boot_exception_stacks;
+#ifdef CONFIG_NUMA
+	if (cpu != 0 && percpu_read(node_number) == 0 &&
+	    cpu_to_node(cpu) != NUMA_NO_NODE)
+		percpu_write(node_number, cpu_to_node(cpu));
+#endif
 
 	me = current;
 
@@ -1006,7 +1036,9 @@ void __cpuinit cpu_init(void)
 	 * and set up the GDT descriptor:
 	 */
 
-	switch_to_new_gdt();
+	switch_to_new_gdt(cpu);
+	loadsegment(fs, 0);
+
 	load_idt((const struct desc_ptr *)&idt_descr);
 
 	memset(me->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
@@ -1017,25 +1049,20 @@ void __cpuinit cpu_init(void)
 	barrier();
 
 	check_efer();
-	if (cpu != 0 && x2apic)
+	if (cpu != 0)
 		enable_x2apic();
 
 	/*
 	 * set up and load the per-CPU TSS
 	 */
 	if (!orig_ist->ist[0]) {
-		static const unsigned int order[N_EXCEPTION_STACKS] = {
-		  [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STACK_ORDER,
-		  [DEBUG_STACK - 1] = DEBUG_STACK_ORDER
+		static const unsigned int sizes[N_EXCEPTION_STACKS] = {
+		  [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ,
+		  [DEBUG_STACK - 1] = DEBUG_STKSZ
 		};
+		char *estacks = per_cpu(exception_stacks, cpu);
 		for (v = 0; v < N_EXCEPTION_STACKS; v++) {
-			if (cpu) {
-				estacks = (char *)__get_free_pages(GFP_ATOMIC, order[v]);
-				if (!estacks)
-					panic("Cannot allocate exception "
-					      "stack %ld %d\n", v, cpu);
-			}
-			estacks += PAGE_SIZE << order[v];
+			estacks += sizes[v];
 			orig_ist->ist[v] = t->x86_tss.ist[v] =
 					(unsigned long)estacks;
 		}
@@ -1069,22 +1096,19 @@ void __cpuinit cpu_init(void)
 	 */
 	if (kgdb_connected && arch_kgdb_ops.correct_hw_break)
 		arch_kgdb_ops.correct_hw_break();
-	else {
+	else
 #endif
-	/*
-	 * Clear all 6 debug registers:
-	 */
-
-	set_debugreg(0UL, 0);
-	set_debugreg(0UL, 1);
-	set_debugreg(0UL, 2);
-	set_debugreg(0UL, 3);
-	set_debugreg(0UL, 6);
-	set_debugreg(0UL, 7);
-#ifdef CONFIG_KGDB
-	/* If the kgdb is connected no debug regs should be altered. */
+	{
+		/*
+		 * Clear all 6 debug registers:
+		 */
+		set_debugreg(0UL, 0);
+		set_debugreg(0UL, 1);
+		set_debugreg(0UL, 2);
+		set_debugreg(0UL, 3);
+		set_debugreg(0UL, 6);
+		set_debugreg(0UL, 7);
 	}
-#endif
 
 	fpu_init();
 
@@ -1114,7 +1138,7 @@ void __cpuinit cpu_init(void)
 		clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
 
 	load_idt(&idt_descr);
-	switch_to_new_gdt();
+	switch_to_new_gdt(cpu);
 
 	/*
 	 * Set up and load the per-CPU TSS and LDT
@@ -1135,9 +1159,6 @@ void __cpuinit cpu_init(void)
 	__set_tss_desc(cpu, GDT_ENTRY_DOUBLEFAULT_TSS, &doublefault_tss);
 #endif
 
-	/* Clear %gs. */
-	asm volatile ("mov %0, %%gs" : : "r" (0));
-
 	/* Clear all 6 debug registers: */
 	set_debugreg(0, 0);
 	set_debugreg(0, 1);
diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 3babe1f1e912..23da96e57b17 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -601,7 +601,7 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	if (!data)
 		return -ENOMEM;
 
-	data->acpi_data = percpu_ptr(acpi_perf_data, cpu);
+	data->acpi_data = per_cpu_ptr(acpi_perf_data, cpu);
 	per_cpu(drv_data, cpu) = data;
 
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC))
diff --git a/arch/x86/kernel/cpu/cpufreq/e_powersaver.c b/arch/x86/kernel/cpu/cpufreq/e_powersaver.c
index 3f83ea12c47a..35a257dd4bb7 100644
--- a/arch/x86/kernel/cpu/cpufreq/e_powersaver.c
+++ b/arch/x86/kernel/cpu/cpufreq/e_powersaver.c
@@ -204,12 +204,12 @@ static int eps_cpu_init(struct cpufreq_policy *policy)
 	}
 	/* Enable Enhanced PowerSaver */
 	rdmsrl(MSR_IA32_MISC_ENABLE, val);
-	if (!(val & 1 << 16)) {
-		val |= 1 << 16;
+	if (!(val & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
+		val |= MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP;
 		wrmsrl(MSR_IA32_MISC_ENABLE, val);
 		/* Can be locked at 0 */
 		rdmsrl(MSR_IA32_MISC_ENABLE, val);
-		if (!(val & 1 << 16)) {
+		if (!(val & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
 			printk(KERN_INFO "eps: Can't enable Enhanced PowerSaver\n");
 			return -ENODEV;
 		}
diff --git a/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c b/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
index f08998278a3a..c9f1fdc02830 100644
--- a/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
+++ b/arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
@@ -390,14 +390,14 @@ static int centrino_cpu_init(struct cpufreq_policy *policy)
 	   enable it if not. */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 
-	if (!(l & (1<<16))) {
-		l |= (1<<16);
+	if (!(l & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
+		l |= MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP;
 		dprintk("trying to enable Enhanced SpeedStep (%x)\n", l);
 		wrmsr(MSR_IA32_MISC_ENABLE, l, h);
 
 		/* check to see if it stuck */
 		rdmsr(MSR_IA32_MISC_ENABLE, l, h);
-		if (!(l & (1<<16))) {
+		if (!(l & MSR_IA32_MISC_ENABLE_ENHANCED_SPEEDSTEP)) {
 			printk(KERN_INFO PFX
 				"couldn't enable Enhanced SpeedStep\n");
 			return -ENODEV;
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 5fff00c70de0..1a89a2b68d15 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -25,7 +25,6 @@
 #ifdef CONFIG_X86_LOCAL_APIC
 #include <asm/mpspec.h>
 #include <asm/apic.h>
-#include <mach_apic.h>
 #endif
 
 static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
@@ -69,6 +68,18 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 		sched_clock_stable = 1;
 	}
 
+	/*
+	 * There is a known erratum on Pentium III and Core Solo
+	 * and Core Duo CPUs.
+	 * " Page with PAT set to WC while associated MTRR is UC
+	 *   may consolidate to UC "
+	 * Because of this erratum, it is better to stick with
+	 * setting WC in MTRR rather than using PAT on these CPUs.
+	 *
+	 * Enable PAT WC only on P4, Core 2 or later CPUs.
+	 */
+	if (c->x86 == 6 && c->x86_model < 15)
+		clear_cpu_cap(c, X86_FEATURE_PAT);
 }
 
 #ifdef CONFIG_X86_32
@@ -141,10 +152,10 @@ static void __cpuinit intel_workarounds(struct cpuinfo_x86 *c)
 	 */
 	if ((c->x86 == 15) && (c->x86_model == 1) && (c->x86_mask == 1)) {
 		rdmsr(MSR_IA32_MISC_ENABLE, lo, hi);
-		if ((lo & (1<<9)) == 0) {
+		if ((lo & MSR_IA32_MISC_ENABLE_PREFETCH_DISABLE) == 0) {
 			printk (KERN_INFO "CPU: C0 stepping P4 Xeon detected.\n");
 			printk (KERN_INFO "CPU: Disabling hardware prefetching (Errata 037)\n");
-			lo |= (1<<9);	/* Disable hw prefetching */
+			lo |= MSR_IA32_MISC_ENABLE_PREFETCH_DISABLE;
 			wrmsr (MSR_IA32_MISC_ENABLE, lo, hi);
 		}
 	}
diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index da299eb85fc0..7293508d8f5c 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -147,7 +147,16 @@ struct _cpuid4_info {
 	union _cpuid4_leaf_ecx ecx;
 	unsigned long size;
 	unsigned long can_disable;
-	cpumask_t shared_cpu_map;	/* future?: only cpus/node is needed */
+	DECLARE_BITMAP(shared_cpu_map, NR_CPUS);
+};
+
+/* subset of above _cpuid4_info w/o shared_cpu_map */
+struct _cpuid4_info_regs {
+	union _cpuid4_leaf_eax eax;
+	union _cpuid4_leaf_ebx ebx;
+	union _cpuid4_leaf_ecx ecx;
+	unsigned long size;
+	unsigned long can_disable;
 };
 
 #ifdef CONFIG_PCI
@@ -278,7 +287,7 @@ amd_cpuid4(int leaf, union _cpuid4_leaf_eax *eax,
 }
 
 static void __cpuinit
-amd_check_l3_disable(int index, struct _cpuid4_info *this_leaf)
+amd_check_l3_disable(int index, struct _cpuid4_info_regs *this_leaf)
 {
 	if (index < 3)
 		return;
@@ -286,7 +295,8 @@ amd_check_l3_disable(int index, struct _cpuid4_info *this_leaf)
 }
 
 static int
-__cpuinit cpuid4_cache_lookup(int index, struct _cpuid4_info *this_leaf)
+__cpuinit cpuid4_cache_lookup_regs(int index,
+				   struct _cpuid4_info_regs *this_leaf)
 {
 	union _cpuid4_leaf_eax 	eax;
 	union _cpuid4_leaf_ebx 	ebx;
@@ -314,6 +324,15 @@ __cpuinit cpuid4_cache_lookup(int index, struct _cpuid4_info *this_leaf)
 	return 0;
 }
 
+static int
+__cpuinit cpuid4_cache_lookup(int index, struct _cpuid4_info *this_leaf)
+{
+	struct _cpuid4_info_regs *leaf_regs =
+		(struct _cpuid4_info_regs *)this_leaf;
+
+	return cpuid4_cache_lookup_regs(index, leaf_regs);
+}
+
 static int __cpuinit find_num_cache_leaves(void)
 {
 	unsigned int		eax, ebx, ecx, edx;
@@ -353,11 +372,10 @@ unsigned int __cpuinit init_intel_cacheinfo(struct cpuinfo_x86 *c)
 		 * parameters cpuid leaf to find the cache details
 		 */
 		for (i = 0; i < num_cache_leaves; i++) {
-			struct _cpuid4_info this_leaf;
-
+			struct _cpuid4_info_regs this_leaf;
 			int retval;
 
-			retval = cpuid4_cache_lookup(i, &this_leaf);
+			retval = cpuid4_cache_lookup_regs(i, &this_leaf);
 			if (retval >= 0) {
 				switch(this_leaf.eax.split.level) {
 				    case 1:
@@ -506,17 +524,20 @@ static void __cpuinit cache_shared_cpu_map_setup(unsigned int cpu, int index)
 	num_threads_sharing = 1 + this_leaf->eax.split.num_threads_sharing;
 
 	if (num_threads_sharing == 1)
-		cpu_set(cpu, this_leaf->shared_cpu_map);
+		cpumask_set_cpu(cpu, to_cpumask(this_leaf->shared_cpu_map));
 	else {
 		index_msb = get_count_order(num_threads_sharing);
 
 		for_each_online_cpu(i) {
 			if (cpu_data(i).apicid >> index_msb ==
 			    c->apicid >> index_msb) {
-				cpu_set(i, this_leaf->shared_cpu_map);
+				cpumask_set_cpu(i,
+					to_cpumask(this_leaf->shared_cpu_map));
 				if (i != cpu && per_cpu(cpuid4_info, i))  {
-					sibling_leaf = CPUID4_INFO_IDX(i, index);
-					cpu_set(cpu, sibling_leaf->shared_cpu_map);
+					sibling_leaf =
+						CPUID4_INFO_IDX(i, index);
+					cpumask_set_cpu(cpu, to_cpumask(
+						sibling_leaf->shared_cpu_map));
 				}
 			}
 		}
@@ -528,9 +549,10 @@ static void __cpuinit cache_remove_shared_cpu_map(unsigned int cpu, int index)
 	int sibling;
 
 	this_leaf = CPUID4_INFO_IDX(cpu, index);
-	for_each_cpu_mask_nr(sibling, this_leaf->shared_cpu_map) {
+	for_each_cpu(sibling, to_cpumask(this_leaf->shared_cpu_map)) {
 		sibling_leaf = CPUID4_INFO_IDX(sibling, index);
-		cpu_clear(cpu, sibling_leaf->shared_cpu_map);
+		cpumask_clear_cpu(cpu,
+				  to_cpumask(sibling_leaf->shared_cpu_map));
 	}
 }
 #else
@@ -635,8 +657,9 @@ static ssize_t show_shared_cpu_map_func(struct _cpuid4_info *this_leaf,
 	int n = 0;
 
 	if (len > 1) {
-		cpumask_t *mask = &this_leaf->shared_cpu_map;
+		const struct cpumask *mask;
 
+		mask = to_cpumask(this_leaf->shared_cpu_map);
 		n = type?
 			cpulist_scnprintf(buf, len-2, mask) :
 			cpumask_scnprintf(buf, len-2, mask);
@@ -699,7 +722,8 @@ static struct pci_dev *get_k8_northbridge(int node)
 
 static ssize_t show_cache_disable(struct _cpuid4_info *this_leaf, char *buf)
 {
-	int node = cpu_to_node(first_cpu(this_leaf->shared_cpu_map));
+	const struct cpumask *mask = to_cpumask(this_leaf->shared_cpu_map);
+	int node = cpu_to_node(cpumask_first(mask));
 	struct pci_dev *dev = NULL;
 	ssize_t ret = 0;
 	int i;
@@ -733,7 +757,8 @@ static ssize_t
 store_cache_disable(struct _cpuid4_info *this_leaf, const char *buf,
 		    size_t count)
 {
-	int node = cpu_to_node(first_cpu(this_leaf->shared_cpu_map));
+	const struct cpumask *mask = to_cpumask(this_leaf->shared_cpu_map);
+	int node = cpu_to_node(cpumask_first(mask));
 	struct pci_dev *dev = NULL;
 	unsigned int ret, index, val;
 
@@ -878,7 +903,7 @@ err_out:
 	return -ENOMEM;
 }
 
-static cpumask_t cache_dev_map = CPU_MASK_NONE;
+static DECLARE_BITMAP(cache_dev_map, NR_CPUS);
 
 /* Add/Remove cache interface for CPU device */
 static int __cpuinit cache_add_dev(struct sys_device * sys_dev)
@@ -918,7 +943,7 @@ static int __cpuinit cache_add_dev(struct sys_device * sys_dev)
 		}
 		kobject_uevent(&(this_object->kobj), KOBJ_ADD);
 	}
-	cpu_set(cpu, cache_dev_map);
+	cpumask_set_cpu(cpu, to_cpumask(cache_dev_map));
 
 	kobject_uevent(per_cpu(cache_kobject, cpu), KOBJ_ADD);
 	return 0;
@@ -931,9 +956,9 @@ static void __cpuinit cache_remove_dev(struct sys_device * sys_dev)
 
 	if (per_cpu(cpuid4_info, cpu) == NULL)
 		return;
-	if (!cpu_isset(cpu, cache_dev_map))
+	if (!cpumask_test_cpu(cpu, to_cpumask(cache_dev_map)))
 		return;
-	cpu_clear(cpu, cache_dev_map);
+	cpumask_clear_cpu(cpu, to_cpumask(cache_dev_map));
 
 	for (i = 0; i < num_cache_leaves; i++)
 		kobject_put(&(INDEX_KOBJECT_PTR(cpu,i)->kobj));
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
index f2ee0ae29bd6..9817506dd469 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
@@ -67,7 +67,7 @@ static struct threshold_block threshold_defaults = {
 struct threshold_bank {
 	struct kobject *kobj;
 	struct threshold_block *blocks;
-	cpumask_t cpus;
+	cpumask_var_t cpus;
 };
 static DEFINE_PER_CPU(struct threshold_bank *, threshold_banks[NR_BANKS]);
 
@@ -481,7 +481,7 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 
 #ifdef CONFIG_SMP
 	if (cpu_data(cpu).cpu_core_id && shared_bank[bank]) {	/* symlink */
-		i = first_cpu(per_cpu(cpu_core_map, cpu));
+		i = cpumask_first(&per_cpu(cpu_core_map, cpu));
 
 		/* first core not up yet */
 		if (cpu_data(i).cpu_core_id)
@@ -501,7 +501,7 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 		if (err)
 			goto out;
 
-		b->cpus = per_cpu(cpu_core_map, cpu);
+		cpumask_copy(b->cpus, &per_cpu(cpu_core_map, cpu));
 		per_cpu(threshold_banks, cpu)[bank] = b;
 		goto out;
 	}
@@ -512,15 +512,20 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 		err = -ENOMEM;
 		goto out;
 	}
+	if (!alloc_cpumask_var(&b->cpus, GFP_KERNEL)) {
+		kfree(b);
+		err = -ENOMEM;
+		goto out;
+	}
 
 	b->kobj = kobject_create_and_add(name, &per_cpu(device_mce, cpu).kobj);
 	if (!b->kobj)
 		goto out_free;
 
 #ifndef CONFIG_SMP
-	b->cpus = CPU_MASK_ALL;
+	cpumask_setall(b->cpus);
 #else
-	b->cpus = per_cpu(cpu_core_map, cpu);
+	cpumask_copy(b->cpus, &per_cpu(cpu_core_map, cpu));
 #endif
 
 	per_cpu(threshold_banks, cpu)[bank] = b;
@@ -529,7 +534,7 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 	if (err)
 		goto out_free;
 
-	for_each_cpu_mask_nr(i, b->cpus) {
+	for_each_cpu(i, b->cpus) {
 		if (i == cpu)
 			continue;
 
@@ -545,6 +550,7 @@ static __cpuinit int threshold_create_bank(unsigned int cpu, unsigned int bank)
 
 out_free:
 	per_cpu(threshold_banks, cpu)[bank] = NULL;
+	free_cpumask_var(b->cpus);
 	kfree(b);
 out:
 	return err;
@@ -619,7 +625,7 @@ static void threshold_remove_bank(unsigned int cpu, int bank)
 #endif
 
 	/* remove all sibling symlinks before unregistering */
-	for_each_cpu_mask_nr(i, b->cpus) {
+	for_each_cpu(i, b->cpus) {
 		if (i == cpu)
 			continue;
 
@@ -632,6 +638,7 @@ static void threshold_remove_bank(unsigned int cpu, int bank)
 free_out:
 	kobject_del(b->kobj);
 	kobject_put(b->kobj);
+	free_cpumask_var(b->cpus);
 	kfree(b);
 	per_cpu(threshold_banks, cpu)[bank] = NULL;
 }
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel_64.c b/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
index f44c36624360..aa5e287c98e0 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
@@ -7,6 +7,7 @@
 #include <linux/interrupt.h>
 #include <linux/percpu.h>
 #include <asm/processor.h>
+#include <asm/apic.h>
 #include <asm/msr.h>
 #include <asm/mce.h>
 #include <asm/hw_irq.h>
@@ -48,13 +49,13 @@ static void intel_init_thermal(struct cpuinfo_x86 *c)
 	 */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 	h = apic_read(APIC_LVTTHMR);
-	if ((l & (1 << 3)) && (h & APIC_DM_SMI)) {
+	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
 		printk(KERN_DEBUG
 		       "CPU%d: Thermal monitoring handled by SMI\n", cpu);
 		return;
 	}
 
-	if (cpu_has(c, X86_FEATURE_TM2) && (l & (1 << 13)))
+	if (cpu_has(c, X86_FEATURE_TM2) && (l & MSR_IA32_MISC_ENABLE_TM2))
 		tm2 = 1;
 
 	if (h & APIC_VECTOR_MASK) {
@@ -72,7 +73,7 @@ static void intel_init_thermal(struct cpuinfo_x86 *c)
 	wrmsr(MSR_IA32_THERM_INTERRUPT, l | 0x03, h);
 
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
-	wrmsr(MSR_IA32_MISC_ENABLE, l | (1 << 3), h);
+	wrmsr(MSR_IA32_MISC_ENABLE, l | MSR_IA32_MISC_ENABLE_TM1, h);
 
 	l = apic_read(APIC_LVTTHMR);
 	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
diff --git a/arch/x86/kernel/cpu/mcheck/p4.c b/arch/x86/kernel/cpu/mcheck/p4.c
index 9b60fce09f75..f53bdcbaf382 100644
--- a/arch/x86/kernel/cpu/mcheck/p4.c
+++ b/arch/x86/kernel/cpu/mcheck/p4.c
@@ -85,7 +85,7 @@ static void intel_init_thermal(struct cpuinfo_x86 *c)
 	 */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 	h = apic_read(APIC_LVTTHMR);
-	if ((l & (1<<3)) && (h & APIC_DM_SMI)) {
+	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
 		printk(KERN_DEBUG "CPU%d: Thermal monitoring handled by SMI\n",
 				cpu);
 		return; /* -EBUSY */
@@ -111,7 +111,7 @@ static void intel_init_thermal(struct cpuinfo_x86 *c)
 	vendor_thermal_interrupt = intel_thermal_interrupt;
 
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
-	wrmsr(MSR_IA32_MISC_ENABLE, l | (1<<3), h);
+	wrmsr(MSR_IA32_MISC_ENABLE, l | MSR_IA32_MISC_ENABLE_TM1, h);
 
 	l = apic_read(APIC_LVTTHMR);
 	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
diff --git a/arch/x86/kernel/cpu/perfctr-watchdog.c b/arch/x86/kernel/cpu/perfctr-watchdog.c
index 9abd48b22674..f6c70a164e32 100644
--- a/arch/x86/kernel/cpu/perfctr-watchdog.c
+++ b/arch/x86/kernel/cpu/perfctr-watchdog.c
@@ -19,7 +19,7 @@
 #include <linux/nmi.h>
 #include <linux/kprobes.h>
 
-#include <asm/apic.h>
+#include <asm/genapic.h>
 #include <asm/intel_arch_perfmon.h>
 
 struct nmi_watchdog_ctlblk {
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 01b1244ef1c0..d67e0e48bc2d 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -7,11 +7,10 @@
 /*
  *	Get CPU information for use by the procfs.
  */
-#ifdef CONFIG_X86_32
 static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
 			      unsigned int cpu)
 {
-#ifdef CONFIG_X86_HT
+#ifdef CONFIG_SMP
 	if (c->x86_max_cores * smp_num_siblings > 1) {
 		seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
 		seq_printf(m, "siblings\t: %d\n",
@@ -24,6 +23,7 @@ static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
 #endif
 }
 
+#ifdef CONFIG_X86_32
 static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
 {
 	/*
@@ -50,22 +50,6 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
 		   c->wp_works_ok ? "yes" : "no");
 }
 #else
-static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
-			      unsigned int cpu)
-{
-#ifdef CONFIG_SMP
-	if (c->x86_max_cores * smp_num_siblings > 1) {
-		seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
-		seq_printf(m, "siblings\t: %d\n",
-			   cpus_weight(per_cpu(cpu_core_map, cpu)));
-		seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
-		seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
-		seq_printf(m, "apicid\t\t: %d\n", c->apicid);
-		seq_printf(m, "initial apicid\t: %d\n", c->initial_apicid);
-	}
-#endif
-}
-
 static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
 {
 	seq_printf(m,
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c689d19e35ab..ff958248e61d 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -24,12 +24,10 @@
 #include <asm/apic.h>
 #include <asm/hpet.h>
 #include <linux/kdebug.h>
-#include <asm/smp.h>
+#include <asm/cpu.h>
 #include <asm/reboot.h>
 #include <asm/virtext.h>
 
-#include <mach_ipi.h>
-
 
 #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 6b1f6f6f8661..87d103ded1c3 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -99,7 +99,7 @@ print_context_stack(struct thread_info *tinfo,
 				frame = frame->next_frame;
 				bp = (unsigned long) frame;
 			} else {
-				ops->address(data, addr, bp == 0);
+				ops->address(data, addr, 0);
 			}
 			print_ftrace_graph_addr(addr, data, ops, tinfo, graph);
 		}
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index c302d0707048..d35db5993fd6 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -106,7 +106,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 		const struct stacktrace_ops *ops, void *data)
 {
 	const unsigned cpu = get_cpu();
-	unsigned long *irqstack_end = (unsigned long *)cpu_pda(cpu)->irqstackptr;
+	unsigned long *irq_stack_end =
+		(unsigned long *)per_cpu(irq_stack_ptr, cpu);
 	unsigned used = 0;
 	struct thread_info *tinfo;
 	int graph = 0;
@@ -160,23 +161,23 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 			stack = (unsigned long *) estack_end[-2];
 			continue;
 		}
-		if (irqstack_end) {
-			unsigned long *irqstack;
-			irqstack = irqstack_end -
-				(IRQSTACKSIZE - 64) / sizeof(*irqstack);
+		if (irq_stack_end) {
+			unsigned long *irq_stack;
+			irq_stack = irq_stack_end -
+				(IRQ_STACK_SIZE - 64) / sizeof(*irq_stack);
 
-			if (stack >= irqstack && stack < irqstack_end) {
+			if (stack >= irq_stack && stack < irq_stack_end) {
 				if (ops->stack(data, "IRQ") < 0)
 					break;
 				bp = print_context_stack(tinfo, stack, bp,
-					ops, data, irqstack_end, &graph);
+					ops, data, irq_stack_end, &graph);
 				/*
 				 * We link to the next stack (which would be
 				 * the process stack normally) the last
 				 * pointer (index -1 to end) in the IRQ stack:
 				 */
-				stack = (unsigned long *) (irqstack_end[-1]);
-				irqstack_end = NULL;
+				stack = (unsigned long *) (irq_stack_end[-1]);
+				irq_stack_end = NULL;
 				ops->stack(data, "EOI");
 				continue;
 			}
@@ -199,10 +200,10 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	unsigned long *stack;
 	int i;
 	const int cpu = smp_processor_id();
-	unsigned long *irqstack_end =
-		(unsigned long *) (cpu_pda(cpu)->irqstackptr);
-	unsigned long *irqstack =
-		(unsigned long *) (cpu_pda(cpu)->irqstackptr - IRQSTACKSIZE);
+	unsigned long *irq_stack_end =
+		(unsigned long *)(per_cpu(irq_stack_ptr, cpu));
+	unsigned long *irq_stack =
+		(unsigned long *)(per_cpu(irq_stack_ptr, cpu) - IRQ_STACK_SIZE);
 
 	/*
 	 * debugging aid: "show_stack(NULL, NULL);" prints the
@@ -218,9 +219,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 
 	stack = sp;
 	for (i = 0; i < kstack_depth_to_print; i++) {
-		if (stack >= irqstack && stack <= irqstack_end) {
-			if (stack == irqstack_end) {
-				stack = (unsigned long *) (irqstack_end[-1]);
+		if (stack >= irq_stack && stack <= irq_stack_end) {
+			if (stack == irq_stack_end) {
+				stack = (unsigned long *) (irq_stack_end[-1]);
 				printk(" <EOI> ");
 			}
 		} else {
@@ -241,7 +242,7 @@ void show_registers(struct pt_regs *regs)
 	int i;
 	unsigned long sp;
 	const int cpu = smp_processor_id();
-	struct task_struct *cur = cpu_pda(cpu)->pcurrent;
+	struct task_struct *cur = current;
 
 	sp = regs->sp;
 	printk("CPU %d ", cpu);
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index e85826829cf2..508bec1cee27 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -858,6 +858,9 @@ void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
  */
 void __init reserve_early(u64 start, u64 end, char *name)
 {
+	if (start >= end)
+		return;
+
 	drop_overlaps_that_are_ok(start, end);
 	__reserve_early(start, end, name, 0);
 }
diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
index 504ad198e4ad..639ad98238a2 100644
--- a/arch/x86/kernel/early_printk.c
+++ b/arch/x86/kernel/early_printk.c
@@ -13,8 +13,8 @@
 #include <asm/setup.h>
 #include <xen/hvc-console.h>
 #include <asm/pci-direct.h>
-#include <asm/pgtable.h>
 #include <asm/fixmap.h>
+#include <asm/pgtable.h>
 #include <linux/usb/ehci_def.h>
 
 /* Simple VGA output */
diff --git a/arch/x86/kernel/efi.c b/arch/x86/kernel/efi.c
index eb1ef3b67dd5..1736acc4d7aa 100644
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -366,10 +366,12 @@ void __init efi_init(void)
 					SMBIOS_TABLE_GUID)) {
 			efi.smbios = config_tables[i].table;
 			printk(" SMBIOS=0x%lx ", config_tables[i].table);
+#ifdef CONFIG_X86_UV
 		} else if (!efi_guidcmp(config_tables[i].guid,
 					UV_SYSTEM_TABLE_GUID)) {
 			efi.uv_systab = config_tables[i].table;
 			printk(" UVsystab=0x%lx ", config_tables[i].table);
+#endif
 		} else if (!efi_guidcmp(config_tables[i].guid,
 					HCDP_TABLE_GUID)) {
 			efi.hcdp = config_tables[i].table;
diff --git a/arch/x86/kernel/efi_64.c b/arch/x86/kernel/efi_64.c
index cb783b92c50c..22c3b7828c50 100644
--- a/arch/x86/kernel/efi_64.c
+++ b/arch/x86/kernel/efi_64.c
@@ -36,6 +36,7 @@
 #include <asm/proto.h>
 #include <asm/efi.h>
 #include <asm/cacheflush.h>
+#include <asm/fixmap.h>
 
 static pgd_t save_pgd __initdata;
 static unsigned long efi_flags __initdata;
diff --git a/arch/x86/kernel/efi_stub_32.S b/arch/x86/kernel/efi_stub_32.S
index ef00bb77d7e4..fbe66e626c09 100644
--- a/arch/x86/kernel/efi_stub_32.S
+++ b/arch/x86/kernel/efi_stub_32.S
@@ -6,7 +6,7 @@
  */
 
 #include <linux/linkage.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 
 /*
  * efi_call_phys(void *, ...) is a function with variable parameters.
@@ -113,6 +113,7 @@ ENTRY(efi_call_phys)
 	movl	(%edx), %ecx
 	pushl	%ecx
 	ret
+ENDPROC(efi_call_phys)
 .previous
 
 .data
diff --git a/arch/x86/kernel/efi_stub_64.S b/arch/x86/kernel/efi_stub_64.S
index 99b47d48c9f4..4c07ccab8146 100644
--- a/arch/x86/kernel/efi_stub_64.S
+++ b/arch/x86/kernel/efi_stub_64.S
@@ -41,6 +41,7 @@ ENTRY(efi_call0)
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
+ENDPROC(efi_call0)
 
 ENTRY(efi_call1)
 	SAVE_XMM
@@ -50,6 +51,7 @@ ENTRY(efi_call1)
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
+ENDPROC(efi_call1)
 
 ENTRY(efi_call2)
 	SAVE_XMM
@@ -59,6 +61,7 @@ ENTRY(efi_call2)
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
+ENDPROC(efi_call2)
 
 ENTRY(efi_call3)
 	SAVE_XMM
@@ -69,6 +72,7 @@ ENTRY(efi_call3)
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
+ENDPROC(efi_call3)
 
 ENTRY(efi_call4)
 	SAVE_XMM
@@ -80,6 +84,7 @@ ENTRY(efi_call4)
 	addq $32, %rsp
 	RESTORE_XMM
 	ret
+ENDPROC(efi_call4)
 
 ENTRY(efi_call5)
 	SAVE_XMM
@@ -92,6 +97,7 @@ ENTRY(efi_call5)
 	addq $48, %rsp
 	RESTORE_XMM
 	ret
+ENDPROC(efi_call5)
 
 ENTRY(efi_call6)
 	SAVE_XMM
@@ -107,3 +113,4 @@ ENTRY(efi_call6)
 	addq $48, %rsp
 	RESTORE_XMM
 	ret
+ENDPROC(efi_call6)
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 46469029e9d3..899e8938e79f 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -30,12 +30,13 @@
  *	1C(%esp) - %ds
  *	20(%esp) - %es
  *	24(%esp) - %fs
- *	28(%esp) - orig_eax
- *	2C(%esp) - %eip
- *	30(%esp) - %cs
- *	34(%esp) - %eflags
- *	38(%esp) - %oldesp
- *	3C(%esp) - %oldss
+ *	28(%esp) - %gs		saved iff !CONFIG_X86_32_LAZY_GS
+ *	2C(%esp) - orig_eax
+ *	30(%esp) - %eip
+ *	34(%esp) - %cs
+ *	38(%esp) - %eflags
+ *	3C(%esp) - %oldesp
+ *	40(%esp) - %oldss
  *
  * "current" is in register %ebx during any slow entries.
  */
@@ -46,7 +47,7 @@
 #include <asm/errno.h>
 #include <asm/segment.h>
 #include <asm/smp.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/desc.h>
 #include <asm/percpu.h>
 #include <asm/dwarf2.h>
@@ -101,121 +102,221 @@
 #define resume_userspace_sig	resume_userspace
 #endif
 
-#define SAVE_ALL \
-	cld; \
-	pushl %fs; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	/*CFI_REL_OFFSET fs, 0;*/\
-	pushl %es; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	/*CFI_REL_OFFSET es, 0;*/\
-	pushl %ds; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	/*CFI_REL_OFFSET ds, 0;*/\
-	pushl %eax; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	CFI_REL_OFFSET eax, 0;\
-	pushl %ebp; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	CFI_REL_OFFSET ebp, 0;\
-	pushl %edi; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	CFI_REL_OFFSET edi, 0;\
-	pushl %esi; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	CFI_REL_OFFSET esi, 0;\
-	pushl %edx; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	CFI_REL_OFFSET edx, 0;\
-	pushl %ecx; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	CFI_REL_OFFSET ecx, 0;\
-	pushl %ebx; \
-	CFI_ADJUST_CFA_OFFSET 4;\
-	CFI_REL_OFFSET ebx, 0;\
-	movl $(__USER_DS), %edx; \
-	movl %edx, %ds; \
-	movl %edx, %es; \
-	movl $(__KERNEL_PERCPU), %edx; \
+/*
+ * User gs save/restore
+ *
+ * %gs is used for userland TLS and kernel only uses it for stack
+ * canary which is required to be at %gs:20 by gcc.  Read the comment
+ * at the top of stackprotector.h for more info.
+ *
+ * Local labels 98 and 99 are used.
+ */
+#ifdef CONFIG_X86_32_LAZY_GS
+
+ /* unfortunately push/pop can't be no-op */
+.macro PUSH_GS
+	pushl $0
+	CFI_ADJUST_CFA_OFFSET 4
+.endm
+.macro POP_GS pop=0
+	addl $(4 + \pop), %esp
+	CFI_ADJUST_CFA_OFFSET -(4 + \pop)
+.endm
+.macro POP_GS_EX
+.endm
+
+ /* all the rest are no-op */
+.macro PTGS_TO_GS
+.endm
+.macro PTGS_TO_GS_EX
+.endm
+.macro GS_TO_REG reg
+.endm
+.macro REG_TO_PTGS reg
+.endm
+.macro SET_KERNEL_GS reg
+.endm
+
+#else	/* CONFIG_X86_32_LAZY_GS */
+
+.macro PUSH_GS
+	pushl %gs
+	CFI_ADJUST_CFA_OFFSET 4
+	/*CFI_REL_OFFSET gs, 0*/
+.endm
+
+.macro POP_GS pop=0
+98:	popl %gs
+	CFI_ADJUST_CFA_OFFSET -4
+	/*CFI_RESTORE gs*/
+  .if \pop <> 0
+	add $\pop, %esp
+	CFI_ADJUST_CFA_OFFSET -\pop
+  .endif
+.endm
+.macro POP_GS_EX
+.pushsection .fixup, "ax"
+99:	movl $0, (%esp)
+	jmp 98b
+.section __ex_table, "a"
+	.align 4
+	.long 98b, 99b
+.popsection
+.endm
+
+.macro PTGS_TO_GS
+98:	mov PT_GS(%esp), %gs
+.endm
+.macro PTGS_TO_GS_EX
+.pushsection .fixup, "ax"
+99:	movl $0, PT_GS(%esp)
+	jmp 98b
+.section __ex_table, "a"
+	.align 4
+	.long 98b, 99b
+.popsection
+.endm
+
+.macro GS_TO_REG reg
+	movl %gs, \reg
+	/*CFI_REGISTER gs, \reg*/
+.endm
+.macro REG_TO_PTGS reg
+	movl \reg, PT_GS(%esp)
+	/*CFI_REL_OFFSET gs, PT_GS*/
+.endm
+.macro SET_KERNEL_GS reg
+	movl $(__KERNEL_STACK_CANARY), \reg
+	movl \reg, %gs
+.endm
+
+#endif	/* CONFIG_X86_32_LAZY_GS */
+
+.macro SAVE_ALL
+	cld
+	PUSH_GS
+	pushl %fs
+	CFI_ADJUST_CFA_OFFSET 4
+	/*CFI_REL_OFFSET fs, 0;*/
+	pushl %es
+	CFI_ADJUST_CFA_OFFSET 4
+	/*CFI_REL_OFFSET es, 0;*/
+	pushl %ds
+	CFI_ADJUST_CFA_OFFSET 4
+	/*CFI_REL_OFFSET ds, 0;*/
+	pushl %eax
+	CFI_ADJUST_CFA_OFFSET 4
+	CFI_REL_OFFSET eax, 0
+	pushl %ebp
+	CFI_ADJUST_CFA_OFFSET 4
+	CFI_REL_OFFSET ebp, 0
+	pushl %edi
+	CFI_ADJUST_CFA_OFFSET 4
+	CFI_REL_OFFSET edi, 0
+	pushl %esi
+	CFI_ADJUST_CFA_OFFSET 4
+	CFI_REL_OFFSET esi, 0
+	pushl %edx
+	CFI_ADJUST_CFA_OFFSET 4
+	CFI_REL_OFFSET edx, 0
+	pushl %ecx
+	CFI_ADJUST_CFA_OFFSET 4
+	CFI_REL_OFFSET ecx, 0
+	pushl %ebx
+	CFI_ADJUST_CFA_OFFSET 4
+	CFI_REL_OFFSET ebx, 0
+	movl $(__USER_DS), %edx
+	movl %edx, %ds
+	movl %edx, %es
+	movl $(__KERNEL_PERCPU), %edx
 	movl %edx, %fs
+	SET_KERNEL_GS %edx
+.endm
 
-#define RESTORE_INT_REGS \
-	popl %ebx;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
-	CFI_RESTORE ebx;\
-	popl %ecx;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
-	CFI_RESTORE ecx;\
-	popl %edx;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
-	CFI_RESTORE edx;\
-	popl %esi;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
-	CFI_RESTORE esi;\
-	popl %edi;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
-	CFI_RESTORE edi;\
-	popl %ebp;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
-	CFI_RESTORE ebp;\
-	popl %eax;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
+.macro RESTORE_INT_REGS
+	popl %ebx
+	CFI_ADJUST_CFA_OFFSET -4
+	CFI_RESTORE ebx
+	popl %ecx
+	CFI_ADJUST_CFA_OFFSET -4
+	CFI_RESTORE ecx
+	popl %edx
+	CFI_ADJUST_CFA_OFFSET -4
+	CFI_RESTORE edx
+	popl %esi
+	CFI_ADJUST_CFA_OFFSET -4
+	CFI_RESTORE esi
+	popl %edi
+	CFI_ADJUST_CFA_OFFSET -4
+	CFI_RESTORE edi
+	popl %ebp
+	CFI_ADJUST_CFA_OFFSET -4
+	CFI_RESTORE ebp
+	popl %eax
+	CFI_ADJUST_CFA_OFFSET -4
 	CFI_RESTORE eax
+.endm
 
-#define RESTORE_REGS	\
-	RESTORE_INT_REGS; \
-1:	popl %ds;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
-	/*CFI_RESTORE ds;*/\
-2:	popl %es;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
-	/*CFI_RESTORE es;*/\
-3:	popl %fs;	\
-	CFI_ADJUST_CFA_OFFSET -4;\
-	/*CFI_RESTORE fs;*/\
-.pushsection .fixup,"ax";	\
-4:	movl $0,(%esp);	\
-	jmp 1b;		\
-5:	movl $0,(%esp);	\
-	jmp 2b;		\
-6:	movl $0,(%esp);	\
-	jmp 3b;		\
-.section __ex_table,"a";\
-	.align 4;	\
-	.long 1b,4b;	\
-	.long 2b,5b;	\
-	.long 3b,6b;	\
+.macro RESTORE_REGS pop=0
+	RESTORE_INT_REGS
+1:	popl %ds
+	CFI_ADJUST_CFA_OFFSET -4
+	/*CFI_RESTORE ds;*/
+2:	popl %es
+	CFI_ADJUST_CFA_OFFSET -4
+	/*CFI_RESTORE es;*/
+3:	popl %fs
+	CFI_ADJUST_CFA_OFFSET -4
+	/*CFI_RESTORE fs;*/
+	POP_GS \pop
+.pushsection .fixup, "ax"
+4:	movl $0, (%esp)
+	jmp 1b
+5:	movl $0, (%esp)
+	jmp 2b
+6:	movl $0, (%esp)
+	jmp 3b
+.section __ex_table, "a"
+	.align 4
+	.long 1b, 4b
+	.long 2b, 5b
+	.long 3b, 6b
 .popsection
+	POP_GS_EX
+.endm
 
-#define RING0_INT_FRAME \
-	CFI_STARTPROC simple;\
-	CFI_SIGNAL_FRAME;\
-	CFI_DEF_CFA esp, 3*4;\
-	/*CFI_OFFSET cs, -2*4;*/\
+.macro RING0_INT_FRAME
+	CFI_STARTPROC simple
+	CFI_SIGNAL_FRAME
+	CFI_DEF_CFA esp, 3*4
+	/*CFI_OFFSET cs, -2*4;*/
 	CFI_OFFSET eip, -3*4
+.endm
 
-#define RING0_EC_FRAME \
-	CFI_STARTPROC simple;\
-	CFI_SIGNAL_FRAME;\
-	CFI_DEF_CFA esp, 4*4;\
-	/*CFI_OFFSET cs, -2*4;*/\
+.macro RING0_EC_FRAME
+	CFI_STARTPROC simple
+	CFI_SIGNAL_FRAME
+	CFI_DEF_CFA esp, 4*4
+	/*CFI_OFFSET cs, -2*4;*/
 	CFI_OFFSET eip, -3*4
+.endm
 
-#define RING0_PTREGS_FRAME \
-	CFI_STARTPROC simple;\
-	CFI_SIGNAL_FRAME;\
-	CFI_DEF_CFA esp, PT_OLDESP-PT_EBX;\
-	/*CFI_OFFSET cs, PT_CS-PT_OLDESP;*/\
-	CFI_OFFSET eip, PT_EIP-PT_OLDESP;\
-	/*CFI_OFFSET es, PT_ES-PT_OLDESP;*/\
-	/*CFI_OFFSET ds, PT_DS-PT_OLDESP;*/\
-	CFI_OFFSET eax, PT_EAX-PT_OLDESP;\
-	CFI_OFFSET ebp, PT_EBP-PT_OLDESP;\
-	CFI_OFFSET edi, PT_EDI-PT_OLDESP;\
-	CFI_OFFSET esi, PT_ESI-PT_OLDESP;\
-	CFI_OFFSET edx, PT_EDX-PT_OLDESP;\
-	CFI_OFFSET ecx, PT_ECX-PT_OLDESP;\
+.macro RING0_PTREGS_FRAME
+	CFI_STARTPROC simple
+	CFI_SIGNAL_FRAME
+	CFI_DEF_CFA esp, PT_OLDESP-PT_EBX
+	/*CFI_OFFSET cs, PT_CS-PT_OLDESP;*/
+	CFI_OFFSET eip, PT_EIP-PT_OLDESP
+	/*CFI_OFFSET es, PT_ES-PT_OLDESP;*/
+	/*CFI_OFFSET ds, PT_DS-PT_OLDESP;*/
+	CFI_OFFSET eax, PT_EAX-PT_OLDESP
+	CFI_OFFSET ebp, PT_EBP-PT_OLDESP
+	CFI_OFFSET edi, PT_EDI-PT_OLDESP
+	CFI_OFFSET esi, PT_ESI-PT_OLDESP
+	CFI_OFFSET edx, PT_EDX-PT_OLDESP
+	CFI_OFFSET ecx, PT_ECX-PT_OLDESP
 	CFI_OFFSET ebx, PT_EBX-PT_OLDESP
+.endm
 
 ENTRY(ret_from_fork)
 	CFI_STARTPROC
@@ -362,6 +463,7 @@ sysenter_exit:
 	xorl %ebp,%ebp
 	TRACE_IRQS_ON
 1:	mov  PT_FS(%esp), %fs
+	PTGS_TO_GS
 	ENABLE_INTERRUPTS_SYSEXIT
 
 #ifdef CONFIG_AUDITSYSCALL
@@ -410,6 +512,7 @@ sysexit_audit:
 	.align 4
 	.long 1b,2b
 .popsection
+	PTGS_TO_GS_EX
 ENDPROC(ia32_sysenter_target)
 
 	# system call handler stub
@@ -452,8 +555,7 @@ restore_all:
 restore_nocheck:
 	TRACE_IRQS_IRET
 restore_nocheck_notrace:
-	RESTORE_REGS
-	addl $4, %esp			# skip orig_eax/error_code
+	RESTORE_REGS 4			# skip orig_eax/error_code
 	CFI_ADJUST_CFA_OFFSET -4
 irq_return:
 	INTERRUPT_RETURN
@@ -595,28 +697,50 @@ syscall_badsys:
 END(syscall_badsys)
 	CFI_ENDPROC
 
-#define FIXUP_ESPFIX_STACK \
-	/* since we are on a wrong stack, we cant make it a C code :( */ \
-	PER_CPU(gdt_page, %ebx); \
-	GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
-	addl %esp, %eax; \
-	pushl $__KERNEL_DS; \
-	CFI_ADJUST_CFA_OFFSET 4; \
-	pushl %eax; \
-	CFI_ADJUST_CFA_OFFSET 4; \
-	lss (%esp), %esp; \
-	CFI_ADJUST_CFA_OFFSET -8;
-#define UNWIND_ESPFIX_STACK \
-	movl %ss, %eax; \
-	/* see if on espfix stack */ \
-	cmpw $__ESPFIX_SS, %ax; \
-	jne 27f; \
-	movl $__KERNEL_DS, %eax; \
-	movl %eax, %ds; \
-	movl %eax, %es; \
-	/* switch to normal stack */ \
-	FIXUP_ESPFIX_STACK; \
-27:;
+/*
+ * System calls that need a pt_regs pointer.
+ */
+#define PTREGSCALL(name) \
+	ALIGN; \
+ptregs_##name: \
+	leal 4(%esp),%eax; \
+	jmp sys_##name;
+
+PTREGSCALL(iopl)
+PTREGSCALL(fork)
+PTREGSCALL(clone)
+PTREGSCALL(vfork)
+PTREGSCALL(execve)
+PTREGSCALL(sigaltstack)
+PTREGSCALL(sigreturn)
+PTREGSCALL(rt_sigreturn)
+PTREGSCALL(vm86)
+PTREGSCALL(vm86old)
+
+.macro FIXUP_ESPFIX_STACK
+	/* since we are on a wrong stack, we cant make it a C code :( */
+	PER_CPU(gdt_page, %ebx)
+	GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah)
+	addl %esp, %eax
+	pushl $__KERNEL_DS
+	CFI_ADJUST_CFA_OFFSET 4
+	pushl %eax
+	CFI_ADJUST_CFA_OFFSET 4
+	lss (%esp), %esp
+	CFI_ADJUST_CFA_OFFSET -8
+.endm
+.macro UNWIND_ESPFIX_STACK
+	movl %ss, %eax
+	/* see if on espfix stack */
+	cmpw $__ESPFIX_SS, %ax
+	jne 27f
+	movl $__KERNEL_DS, %eax
+	movl %eax, %ds
+	movl %eax, %es
+	/* switch to normal stack */
+	FIXUP_ESPFIX_STACK
+27:
+.endm
 
 /*
  * Build the entry stubs and pointer table with some assembler magic.
@@ -672,7 +796,7 @@ common_interrupt:
 ENDPROC(common_interrupt)
 	CFI_ENDPROC
 
-#define BUILD_INTERRUPT(name, nr)	\
+#define BUILD_INTERRUPT3(name, nr, fn)	\
 ENTRY(name)				\
 	RING0_INT_FRAME;		\
 	pushl $~(nr);			\
@@ -680,13 +804,15 @@ ENTRY(name)				\
 	SAVE_ALL;			\
 	TRACE_IRQS_OFF			\
 	movl %esp,%eax;			\
-	call smp_##name;		\
+	call fn;			\
 	jmp ret_from_intr;		\
 	CFI_ENDPROC;			\
 ENDPROC(name)
 
+#define BUILD_INTERRUPT(name, nr)	BUILD_INTERRUPT3(name, nr, smp_##name)
+
 /* The include is where all of the SMP etc. interrupts come from */
-#include "entry_arch.h"
+#include <asm/entry_arch.h>
 
 ENTRY(coprocessor_error)
 	RING0_INT_FRAME
@@ -1068,7 +1194,10 @@ ENTRY(page_fault)
 	CFI_ADJUST_CFA_OFFSET 4
 	ALIGN
 error_code:
-	/* the function address is in %fs's slot on the stack */
+	/* the function address is in %gs's slot on the stack */
+	pushl %fs
+	CFI_ADJUST_CFA_OFFSET 4
+	/*CFI_REL_OFFSET fs, 0*/
 	pushl %es
 	CFI_ADJUST_CFA_OFFSET 4
 	/*CFI_REL_OFFSET es, 0*/
@@ -1097,20 +1226,15 @@ error_code:
 	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET ebx, 0
 	cld
-	pushl %fs
-	CFI_ADJUST_CFA_OFFSET 4
-	/*CFI_REL_OFFSET fs, 0*/
 	movl $(__KERNEL_PERCPU), %ecx
 	movl %ecx, %fs
 	UNWIND_ESPFIX_STACK
-	popl %ecx
-	CFI_ADJUST_CFA_OFFSET -4
-	/*CFI_REGISTER es, ecx*/
-	movl PT_FS(%esp), %edi		# get the function address
+	GS_TO_REG %ecx
+	movl PT_GS(%esp), %edi		# get the function address
 	movl PT_ORIG_EAX(%esp), %edx	# get the error code
 	movl $-1, PT_ORIG_EAX(%esp)	# no syscall to restart
-	mov  %ecx, PT_FS(%esp)
-	/*CFI_REL_OFFSET fs, ES*/
+	REG_TO_PTGS %ecx
+	SET_KERNEL_GS %ecx
 	movl $(__USER_DS), %ecx
 	movl %ecx, %ds
 	movl %ecx, %es
@@ -1134,26 +1258,27 @@ END(page_fault)
  * by hand onto the new stack - while updating the return eip past
  * the instruction that would have done it for sysenter.
  */
-#define FIX_STACK(offset, ok, label)		\
-	cmpw $__KERNEL_CS,4(%esp);		\
-	jne ok;					\
-label:						\
-	movl TSS_sysenter_sp0+offset(%esp),%esp;	\
-	CFI_DEF_CFA esp, 0;			\
-	CFI_UNDEFINED eip;			\
-	pushfl;					\
-	CFI_ADJUST_CFA_OFFSET 4;		\
-	pushl $__KERNEL_CS;			\
-	CFI_ADJUST_CFA_OFFSET 4;		\
-	pushl $sysenter_past_esp;		\
-	CFI_ADJUST_CFA_OFFSET 4;		\
+.macro FIX_STACK offset ok label
+	cmpw $__KERNEL_CS, 4(%esp)
+	jne \ok
+\label:
+	movl TSS_sysenter_sp0 + \offset(%esp), %esp
+	CFI_DEF_CFA esp, 0
+	CFI_UNDEFINED eip
+	pushfl
+	CFI_ADJUST_CFA_OFFSET 4
+	pushl $__KERNEL_CS
+	CFI_ADJUST_CFA_OFFSET 4
+	pushl $sysenter_past_esp
+	CFI_ADJUST_CFA_OFFSET 4
 	CFI_REL_OFFSET eip, 0
+.endm
 
 ENTRY(debug)
 	RING0_INT_FRAME
 	cmpl $ia32_sysenter_target,(%esp)
 	jne debug_stack_correct
-	FIX_STACK(12, debug_stack_correct, debug_esp_fix_insn)
+	FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
 debug_stack_correct:
 	pushl $-1			# mark this as an int
 	CFI_ADJUST_CFA_OFFSET 4
@@ -1211,7 +1336,7 @@ nmi_stack_correct:
 
 nmi_stack_fixup:
 	RING0_INT_FRAME
-	FIX_STACK(12,nmi_stack_correct, 1)
+	FIX_STACK 12, nmi_stack_correct, 1
 	jmp nmi_stack_correct
 
 nmi_debug_stack_check:
@@ -1222,7 +1347,7 @@ nmi_debug_stack_check:
 	jb nmi_stack_correct
 	cmpl $debug_esp_fix_insn,(%esp)
 	ja nmi_stack_correct
-	FIX_STACK(24,nmi_stack_correct, 1)
+	FIX_STACK 24, nmi_stack_correct, 1
 	jmp nmi_stack_correct
 
 nmi_espfix_stack:
@@ -1234,7 +1359,7 @@ nmi_espfix_stack:
 	CFI_ADJUST_CFA_OFFSET 4
 	pushl %esp
 	CFI_ADJUST_CFA_OFFSET 4
-	addw $4, (%esp)
+	addl $4, (%esp)
 	/* copy the iret frame of 12 bytes */
 	.rept 3
 	pushl 16(%esp)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index a1346217e43c..83d1836b9467 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -48,10 +48,11 @@
 #include <asm/unistd.h>
 #include <asm/thread_info.h>
 #include <asm/hw_irq.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/irqflags.h>
 #include <asm/paravirt.h>
 #include <asm/ftrace.h>
+#include <asm/percpu.h>
 
 /* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this.  */
 #include <linux/elf-em.h>
@@ -76,20 +77,17 @@ ENTRY(ftrace_caller)
 	movq 8(%rbp), %rsi
 	subq $MCOUNT_INSN_SIZE, %rdi
 
-.globl ftrace_call
-ftrace_call:
+GLOBAL(ftrace_call)
 	call ftrace_stub
 
 	MCOUNT_RESTORE_FRAME
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-.globl ftrace_graph_call
-ftrace_graph_call:
+GLOBAL(ftrace_graph_call)
 	jmp ftrace_stub
 #endif
 
-.globl ftrace_stub
-ftrace_stub:
+GLOBAL(ftrace_stub)
 	retq
 END(ftrace_caller)
 
@@ -109,8 +107,7 @@ ENTRY(mcount)
 	jnz ftrace_graph_caller
 #endif
 
-.globl ftrace_stub
-ftrace_stub:
+GLOBAL(ftrace_stub)
 	retq
 
 trace:
@@ -147,9 +144,7 @@ ENTRY(ftrace_graph_caller)
 	retq
 END(ftrace_graph_caller)
 
-
-.globl return_to_handler
-return_to_handler:
+GLOBAL(return_to_handler)
 	subq  $80, %rsp
 
 	movq %rax, (%rsp)
@@ -187,6 +182,7 @@ return_to_handler:
 ENTRY(native_usergs_sysret64)
 	swapgs
 	sysretq
+ENDPROC(native_usergs_sysret64)
 #endif /* CONFIG_PARAVIRT */
 
 
@@ -209,7 +205,7 @@ ENTRY(native_usergs_sysret64)
 
 	/* %rsp:at FRAMEEND */
 	.macro FIXUP_TOP_OF_STACK tmp offset=0
-	movq %gs:pda_oldrsp,\tmp
+	movq PER_CPU_VAR(old_rsp),\tmp
 	movq \tmp,RSP+\offset(%rsp)
 	movq $__USER_DS,SS+\offset(%rsp)
 	movq $__USER_CS,CS+\offset(%rsp)
@@ -220,7 +216,7 @@ ENTRY(native_usergs_sysret64)
 
 	.macro RESTORE_TOP_OF_STACK tmp offset=0
 	movq RSP+\offset(%rsp),\tmp
-	movq \tmp,%gs:pda_oldrsp
+	movq \tmp,PER_CPU_VAR(old_rsp)
 	movq EFLAGS+\offset(%rsp),\tmp
 	movq \tmp,R11+\offset(%rsp)
 	.endm
@@ -336,15 +332,15 @@ ENTRY(save_args)
 	je 1f
 	SWAPGS
 	/*
-	 * irqcount is used to check if a CPU is already on an interrupt stack
+	 * irq_count is used to check if a CPU is already on an interrupt stack
 	 * or not. While this is essentially redundant with preempt_count it is
 	 * a little cheaper to use a separate counter in the PDA (short of
 	 * moving irq_enter into assembly, which would be too much work)
 	 */
-1:	incl %gs:pda_irqcount
+1:	incl PER_CPU_VAR(irq_count)
 	jne 2f
 	popq_cfi %rax			/* move return address... */
-	mov %gs:pda_irqstackptr,%rsp
+	mov PER_CPU_VAR(irq_stack_ptr),%rsp
 	EMPTY_FRAME 0
 	pushq_cfi %rbp			/* backlink for unwinder */
 	pushq_cfi %rax			/* ... to the new stack */
@@ -409,6 +405,8 @@ END(save_paranoid)
 ENTRY(ret_from_fork)
 	DEFAULT_FRAME
 
+	LOCK ; btr $TIF_FORK,TI_flags(%r8)
+
 	push kernel_eflags(%rip)
 	CFI_ADJUST_CFA_OFFSET 8
 	popf					# reset kernel eflags
@@ -468,7 +466,7 @@ END(ret_from_fork)
 ENTRY(system_call)
 	CFI_STARTPROC	simple
 	CFI_SIGNAL_FRAME
-	CFI_DEF_CFA	rsp,PDA_STACKOFFSET
+	CFI_DEF_CFA	rsp,KERNEL_STACK_OFFSET
 	CFI_REGISTER	rip,rcx
 	/*CFI_REGISTER	rflags,r11*/
 	SWAPGS_UNSAFE_STACK
@@ -479,8 +477,8 @@ ENTRY(system_call)
 	 */
 ENTRY(system_call_after_swapgs)
 
-	movq	%rsp,%gs:pda_oldrsp
-	movq	%gs:pda_kernelstack,%rsp
+	movq	%rsp,PER_CPU_VAR(old_rsp)
+	movq	PER_CPU_VAR(kernel_stack),%rsp
 	/*
 	 * No need to follow this irqs off/on section - it's straight
 	 * and short:
@@ -523,7 +521,7 @@ sysret_check:
 	CFI_REGISTER	rip,rcx
 	RESTORE_ARGS 0,-ARG_SKIP,1
 	/*CFI_REGISTER	rflags,r11*/
-	movq	%gs:pda_oldrsp, %rsp
+	movq	PER_CPU_VAR(old_rsp), %rsp
 	USERGS_SYSRET64
 
 	CFI_RESTORE_STATE
@@ -630,16 +628,14 @@ tracesys:
  * Syscall return path ending with IRET.
  * Has correct top of stack, but partial stack frame.
  */
-	.globl int_ret_from_sys_call
-	.globl int_with_check
-int_ret_from_sys_call:
+GLOBAL(int_ret_from_sys_call)
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
 	testl $3,CS-ARGOFFSET(%rsp)
 	je retint_restore_args
 	movl $_TIF_ALLWORK_MASK,%edi
 	/* edi:	mask to check */
-int_with_check:
+GLOBAL(int_with_check)
 	LOCKDEP_SYS_EXIT_IRQ
 	GET_THREAD_INFO(%rcx)
 	movl TI_flags(%rcx),%edx
@@ -833,11 +829,11 @@ common_interrupt:
 	XCPT_FRAME
 	addq $-0x80,(%rsp)		/* Adjust vector to [-256,-1] range */
 	interrupt do_IRQ
-	/* 0(%rsp): oldrsp-ARGOFFSET */
+	/* 0(%rsp): old_rsp-ARGOFFSET */
 ret_from_intr:
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
-	decl %gs:pda_irqcount
+	decl PER_CPU_VAR(irq_count)
 	leaveq
 	CFI_DEF_CFA_REGISTER	rsp
 	CFI_ADJUST_CFA_OFFSET	-8
@@ -982,8 +978,10 @@ apicinterrupt IRQ_MOVE_CLEANUP_VECTOR \
 	irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt
 #endif
 
+#ifdef CONFIG_X86_UV
 apicinterrupt UV_BAU_MESSAGE \
 	uv_bau_message_intr1 uv_bau_message_interrupt
+#endif
 apicinterrupt LOCAL_TIMER_VECTOR \
 	apic_timer_interrupt smp_apic_timer_interrupt
 
@@ -1073,10 +1071,10 @@ ENTRY(\sym)
 	TRACE_IRQS_OFF
 	movq %rsp,%rdi		/* pt_regs pointer */
 	xorl %esi,%esi		/* no error code */
-	movq %gs:pda_data_offset, %rbp
-	subq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
+	PER_CPU(init_tss, %rbp)
+	subq $EXCEPTION_STKSZ, TSS_ist + (\ist - 1) * 8(%rbp)
 	call \do_sym
-	addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
+	addq $EXCEPTION_STKSZ, TSS_ist + (\ist - 1) * 8(%rbp)
 	jmp paranoid_exit	/* %ebx: no swapgs flag */
 	CFI_ENDPROC
 END(\sym)
@@ -1138,7 +1136,7 @@ ENTRY(native_load_gs_index)
 	CFI_STARTPROC
 	pushf
 	CFI_ADJUST_CFA_OFFSET 8
-	DISABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
+	DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI)
 	SWAPGS
 gs_change:
 	movl %edi,%gs
@@ -1260,14 +1258,14 @@ ENTRY(call_softirq)
 	CFI_REL_OFFSET rbp,0
 	mov  %rsp,%rbp
 	CFI_DEF_CFA_REGISTER rbp
-	incl %gs:pda_irqcount
-	cmove %gs:pda_irqstackptr,%rsp
+	incl PER_CPU_VAR(irq_count)
+	cmove PER_CPU_VAR(irq_stack_ptr),%rsp
 	push  %rbp			# backlink for old unwinder
 	call __do_softirq
 	leaveq
 	CFI_DEF_CFA_REGISTER	rsp
 	CFI_ADJUST_CFA_OFFSET   -8
-	decl %gs:pda_irqcount
+	decl PER_CPU_VAR(irq_count)
 	ret
 	CFI_ENDPROC
 END(call_softirq)
@@ -1297,15 +1295,15 @@ ENTRY(xen_do_hypervisor_callback)   # do_hypervisor_callback(struct *pt_regs)
 	movq %rdi, %rsp            # we don't return, adjust the stack frame
 	CFI_ENDPROC
 	DEFAULT_FRAME
-11:	incl %gs:pda_irqcount
+11:	incl PER_CPU_VAR(irq_count)
 	movq %rsp,%rbp
 	CFI_DEF_CFA_REGISTER rbp
-	cmovzq %gs:pda_irqstackptr,%rsp
+	cmovzq PER_CPU_VAR(irq_stack_ptr),%rsp
 	pushq %rbp			# backlink for old unwinder
 	call xen_evtchn_do_upcall
 	popq %rsp
 	CFI_DEF_CFA_REGISTER rsp
-	decl %gs:pda_irqcount
+	decl PER_CPU_VAR(irq_count)
 	jmp  error_exit
 	CFI_ENDPROC
 END(do_hypervisor_callback)
diff --git a/arch/x86/kernel/es7000_32.c b/arch/x86/kernel/es7000_32.c
deleted file mode 100644
index 53699c931ad4..000000000000
--- a/arch/x86/kernel/es7000_32.c
+++ /dev/null
@@ -1,378 +0,0 @@
-/*
- * Written by: Garry Forsgren, Unisys Corporation
- *             Natalie Protasevich, Unisys Corporation
- * This file contains the code to configure and interface
- * with Unisys ES7000 series hardware system manager.
- *
- * Copyright (c) 2003 Unisys Corporation.  All Rights Reserved.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of version 2 of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it would be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, write the Free Software Foundation, Inc., 59
- * Temple Place - Suite 330, Boston MA 02111-1307, USA.
- *
- * Contact information: Unisys Corporation, Township Line & Union Meeting
- * Roads-A, Unisys Way, Blue Bell, Pennsylvania, 19424, or:
- *
- * http://www.unisys.com
- */
-
-#include <linux/module.h>
-#include <linux/types.h>
-#include <linux/kernel.h>
-#include <linux/smp.h>
-#include <linux/string.h>
-#include <linux/spinlock.h>
-#include <linux/errno.h>
-#include <linux/notifier.h>
-#include <linux/reboot.h>
-#include <linux/init.h>
-#include <linux/acpi.h>
-#include <asm/io.h>
-#include <asm/nmi.h>
-#include <asm/smp.h>
-#include <asm/atomic.h>
-#include <asm/apicdef.h>
-#include <mach_mpparse.h>
-#include <asm/genapic.h>
-#include <asm/setup.h>
-
-/*
- * ES7000 chipsets
- */
-
-#define NON_UNISYS		0
-#define ES7000_CLASSIC		1
-#define ES7000_ZORRO		2
-
-
-#define	MIP_REG			1
-#define	MIP_PSAI_REG		4
-
-#define	MIP_BUSY		1
-#define	MIP_SPIN		0xf0000
-#define	MIP_VALID		0x0100000000000000ULL
-#define	MIP_PORT(VALUE)	((VALUE >> 32) & 0xffff)
-
-#define	MIP_RD_LO(VALUE)	(VALUE & 0xffffffff)
-
-struct mip_reg_info {
-	unsigned long long mip_info;
-	unsigned long long delivery_info;
-	unsigned long long host_reg;
-	unsigned long long mip_reg;
-};
-
-struct part_info {
-	unsigned char type;
-	unsigned char length;
-	unsigned char part_id;
-	unsigned char apic_mode;
-	unsigned long snum;
-	char ptype[16];
-	char sname[64];
-	char pname[64];
-};
-
-struct psai {
-	unsigned long long entry_type;
-	unsigned long long addr;
-	unsigned long long bep_addr;
-};
-
-struct es7000_mem_info {
-	unsigned char type;
-	unsigned char length;
-	unsigned char resv[6];
-	unsigned long long  start;
-	unsigned long long  size;
-};
-
-struct es7000_oem_table {
-	unsigned long long hdr;
-	struct mip_reg_info mip;
-	struct part_info pif;
-	struct es7000_mem_info shm;
-	struct psai psai;
-};
-
-#ifdef CONFIG_ACPI
-
-struct oem_table {
-	struct acpi_table_header Header;
-	u32 OEMTableAddr;
-	u32 OEMTableSize;
-};
-
-extern int find_unisys_acpi_oem_table(unsigned long *oem_addr);
-extern void unmap_unisys_acpi_oem_table(unsigned long oem_addr);
-#endif
-
-struct mip_reg {
-	unsigned long long off_0;
-	unsigned long long off_8;
-	unsigned long long off_10;
-	unsigned long long off_18;
-	unsigned long long off_20;
-	unsigned long long off_28;
-	unsigned long long off_30;
-	unsigned long long off_38;
-};
-
-#define	MIP_SW_APIC		0x1020b
-#define	MIP_FUNC(VALUE)		(VALUE & 0xff)
-
-/*
- * ES7000 Globals
- */
-
-static volatile unsigned long	*psai = NULL;
-static struct mip_reg		*mip_reg;
-static struct mip_reg		*host_reg;
-static int 			mip_port;
-static unsigned long		mip_addr, host_addr;
-
-int es7000_plat;
-
-/*
- * GSI override for ES7000 platforms.
- */
-
-static unsigned int base;
-
-static int
-es7000_rename_gsi(int ioapic, int gsi)
-{
-	if (es7000_plat == ES7000_ZORRO)
-		return gsi;
-
-	if (!base) {
-		int i;
-		for (i = 0; i < nr_ioapics; i++)
-			base += nr_ioapic_registers[i];
-	}
-
-	if (!ioapic && (gsi < 16))
-		gsi += base;
-	return gsi;
-}
-
-static int wakeup_secondary_cpu_via_mip(int cpu, unsigned long eip)
-{
-	unsigned long vect = 0, psaival = 0;
-
-	if (psai == NULL)
-		return -1;
-
-	vect = ((unsigned long)__pa(eip)/0x1000) << 16;
-	psaival = (0x1000000 | vect | cpu);
-
-	while (*psai & 0x1000000)
-		;
-
-	*psai = psaival;
-
-	return 0;
-}
-
-static void noop_wait_for_deassert(atomic_t *deassert_not_used)
-{
-}
-
-static int __init es7000_update_genapic(void)
-{
-	genapic->wakeup_cpu = wakeup_secondary_cpu_via_mip;
-
-	/* MPENTIUMIII */
-	if (boot_cpu_data.x86 == 6 &&
-	    (boot_cpu_data.x86_model >= 7 || boot_cpu_data.x86_model <= 11)) {
-		es7000_update_genapic_to_cluster();
-		genapic->wait_for_init_deassert = noop_wait_for_deassert;
-		genapic->wakeup_cpu = wakeup_secondary_cpu_via_mip;
-	}
-
-	return 0;
-}
-
-void __init
-setup_unisys(void)
-{
-	/*
-	 * Determine the generation of the ES7000 currently running.
-	 *
-	 * es7000_plat = 1 if the machine is a 5xx ES7000 box
-	 * es7000_plat = 2 if the machine is a x86_64 ES7000 box
-	 *
-	 */
-	if (!(boot_cpu_data.x86 <= 15 && boot_cpu_data.x86_model <= 2))
-		es7000_plat = ES7000_ZORRO;
-	else
-		es7000_plat = ES7000_CLASSIC;
-	ioapic_renumber_irq = es7000_rename_gsi;
-
-	x86_quirks->update_genapic = es7000_update_genapic;
-}
-
-/*
- * Parse the OEM Table
- */
-
-int __init
-parse_unisys_oem (char *oemptr)
-{
-	int                     i;
-	int 			success = 0;
-	unsigned char           type, size;
-	unsigned long           val;
-	char                    *tp = NULL;
-	struct psai             *psaip = NULL;
-	struct mip_reg_info 	*mi;
-	struct mip_reg		*host, *mip;
-
-	tp = oemptr;
-
-	tp += 8;
-
-	for (i=0; i <= 6; i++) {
-		type = *tp++;
-		size = *tp++;
-		tp -= 2;
-		switch (type) {
-		case MIP_REG:
-			mi = (struct mip_reg_info *)tp;
-			val = MIP_RD_LO(mi->host_reg);
-			host_addr = val;
-			host = (struct mip_reg *)val;
-			host_reg = __va(host);
-			val = MIP_RD_LO(mi->mip_reg);
-			mip_port = MIP_PORT(mi->mip_info);
-			mip_addr = val;
-			mip = (struct mip_reg *)val;
-			mip_reg = __va(mip);
-			pr_debug("es7000_mipcfg: host_reg = 0x%lx \n",
-				 (unsigned long)host_reg);
-			pr_debug("es7000_mipcfg: mip_reg = 0x%lx \n",
-				 (unsigned long)mip_reg);
-			success++;
-			break;
-		case MIP_PSAI_REG:
-			psaip = (struct psai *)tp;
-			if (tp != NULL) {
-				if (psaip->addr)
-					psai = __va(psaip->addr);
-				else
-					psai = NULL;
-				success++;
-			}
-			break;
-		default:
-			break;
-		}
-		tp += size;
-	}
-
-	if (success < 2) {
-		es7000_plat = NON_UNISYS;
-	} else
-		setup_unisys();
-	return es7000_plat;
-}
-
-#ifdef CONFIG_ACPI
-static unsigned long oem_addrX;
-static unsigned long oem_size;
-int __init find_unisys_acpi_oem_table(unsigned long *oem_addr)
-{
-	struct acpi_table_header *header = NULL;
-	int i = 0;
-
-	while (ACPI_SUCCESS(acpi_get_table("OEM1", i++, &header))) {
-		if (!memcmp((char *) &header->oem_id, "UNISYS", 6)) {
-			struct oem_table *t = (struct oem_table *)header;
-
-			oem_addrX = t->OEMTableAddr;
-			oem_size = t->OEMTableSize;
-
-			*oem_addr = (unsigned long)__acpi_map_table(oem_addrX,
-								    oem_size);
-			return 0;
-		}
-	}
-	return -1;
-}
-
-void __init unmap_unisys_acpi_oem_table(unsigned long oem_addr)
-{
-}
-#endif
-
-static void
-es7000_spin(int n)
-{
-	int i = 0;
-
-	while (i++ < n)
-		rep_nop();
-}
-
-static int __init
-es7000_mip_write(struct mip_reg *mip_reg)
-{
-	int			status = 0;
-	int			spin;
-
-	spin = MIP_SPIN;
-	while (((unsigned long long)host_reg->off_38 &
-		(unsigned long long)MIP_VALID) != 0) {
-			if (--spin <= 0) {
-				printk("es7000_mip_write: Timeout waiting for Host Valid Flag");
-				return -1;
-			}
-		es7000_spin(MIP_SPIN);
-	}
-
-	memcpy(host_reg, mip_reg, sizeof(struct mip_reg));
-	outb(1, mip_port);
-
-	spin = MIP_SPIN;
-
-	while (((unsigned long long)mip_reg->off_38 &
-		(unsigned long long)MIP_VALID) == 0) {
-		if (--spin <= 0) {
-			printk("es7000_mip_write: Timeout waiting for MIP Valid Flag");
-			return -1;
-		}
-		es7000_spin(MIP_SPIN);
-	}
-
-	status = ((unsigned long long)mip_reg->off_0 &
-		(unsigned long long)0xffff0000000000ULL) >> 48;
-	mip_reg->off_38 = ((unsigned long long)mip_reg->off_38 &
-		(unsigned long long)~MIP_VALID);
-	return status;
-}
-
-void __init
-es7000_sw_apic(void)
-{
-	if (es7000_plat) {
-		int mip_status;
-		struct mip_reg es7000_mip_reg;
-
-		printk("ES7000: Enabling APIC mode.\n");
-        	memset(&es7000_mip_reg, 0, sizeof(struct mip_reg));
-        	es7000_mip_reg.off_0 = MIP_SW_APIC;
-        	es7000_mip_reg.off_38 = (MIP_VALID);
-        	while ((mip_status = es7000_mip_write(&es7000_mip_reg)) != 0)
-              		printk("es7000_sw_apic: command failed, status = %x\n",
-				mip_status);
-		return;
-	}
-}
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index b9a4d8c4b935..f5b272247690 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -26,27 +26,6 @@
 #include <asm/bios_ebda.h>
 #include <asm/trampoline.h>
 
-/* boot cpu pda */
-static struct x8664_pda _boot_cpu_pda;
-
-#ifdef CONFIG_SMP
-/*
- * We install an empty cpu_pda pointer table to indicate to early users
- * (numa_set_node) that the cpu_pda pointer table for cpus other than
- * the boot cpu is not yet setup.
- */
-static struct x8664_pda *__cpu_pda[NR_CPUS] __initdata;
-#else
-static struct x8664_pda *__cpu_pda[NR_CPUS] __read_mostly;
-#endif
-
-void __init x86_64_init_pda(void)
-{
-	_cpu_pda = __cpu_pda;
-	cpu_pda(0) = &_boot_cpu_pda;
-	pda_init(0);
-}
-
 static void __init zap_identity_mappings(void)
 {
 	pgd_t *pgd = pgd_offset_k(0UL);
@@ -112,8 +91,6 @@ void __init x86_64_start_kernel(char * real_mode_data)
 	if (console_loglevel == 10)
 		early_printk("Kernel alive\n");
 
-	x86_64_init_pda();
-
 	x86_64_start_reservations(real_mode_data);
 }
 
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index e835b4eea70b..c32ca19d591a 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -11,14 +11,15 @@
 #include <linux/init.h>
 #include <linux/linkage.h>
 #include <asm/segment.h>
-#include <asm/page.h>
-#include <asm/pgtable.h>
+#include <asm/page_types.h>
+#include <asm/pgtable_types.h>
 #include <asm/desc.h>
 #include <asm/cache.h>
 #include <asm/thread_info.h>
 #include <asm/asm-offsets.h>
 #include <asm/setup.h>
 #include <asm/processor-flags.h>
+#include <asm/percpu.h>
 
 /* Physical address */
 #define pa(X) ((X) - __PAGE_OFFSET)
@@ -429,14 +430,34 @@ is386:	movl $2,%ecx		# set MP
 	ljmp $(__KERNEL_CS),$1f
 1:	movl $(__KERNEL_DS),%eax	# reload all the segment registers
 	movl %eax,%ss			# after changing gdt.
-	movl %eax,%fs			# gets reset once there's real percpu
 
 	movl $(__USER_DS),%eax		# DS/ES contains default USER segment
 	movl %eax,%ds
 	movl %eax,%es
 
-	xorl %eax,%eax			# Clear GS and LDT
+	movl $(__KERNEL_PERCPU), %eax
+	movl %eax,%fs			# set this cpu's percpu
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+	/*
+	 * The linker can't handle this by relocation.  Manually set
+	 * base address in stack canary segment descriptor.
+	 */
+	cmpb $0,ready
+	jne 1f
+	movl $per_cpu__gdt_page,%eax
+	movl $per_cpu__stack_canary,%ecx
+	subl $20, %ecx
+	movw %cx, 8 * GDT_ENTRY_STACK_CANARY + 2(%eax)
+	shrl $16, %ecx
+	movb %cl, 8 * GDT_ENTRY_STACK_CANARY + 4(%eax)
+	movb %ch, 8 * GDT_ENTRY_STACK_CANARY + 7(%eax)
+1:
+#endif
+	movl $(__KERNEL_STACK_CANARY),%eax
 	movl %eax,%gs
+
+	xorl %eax,%eax			# Clear LDT
 	lldt %ax
 
 	cld			# gcc2 wants the direction flag cleared at all times
@@ -446,8 +467,6 @@ is386:	movl $2,%ecx		# set MP
 	movb $1, ready
 	cmpb $0,%cl		# the first CPU calls start_kernel
 	je   1f
-	movl $(__KERNEL_PERCPU), %eax
-	movl %eax,%fs		# set this cpu's percpu
 	movl (stack_start), %esp
 1:
 #endif /* CONFIG_SMP */
@@ -548,12 +567,8 @@ early_fault:
 	pushl %eax
 	pushl %edx		/* trapno */
 	pushl $fault_msg
-#ifdef CONFIG_EARLY_PRINTK
-	call early_printk
-#else
 	call printk
 #endif
-#endif
 	call dump_stack
 hlt_loop:
 	hlt
@@ -580,11 +595,10 @@ ignore_int:
 	pushl 32(%esp)
 	pushl 40(%esp)
 	pushl $int_msg
-#ifdef CONFIG_EARLY_PRINTK
-	call early_printk
-#else
 	call printk
-#endif
+
+	call dump_stack
+
 	addl $(5*4),%esp
 	popl %ds
 	popl %es
@@ -660,7 +674,7 @@ early_recursion_flag:
 	.long 0
 
 int_msg:
-	.asciz "Unknown interrupt or fault at EIP %p %p %p\n"
+	.asciz "Unknown interrupt or fault at: %p %p %p\n"
 
 fault_msg:
 /* fault info: */
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 0e275d495563..54b29bb24e71 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -19,6 +19,7 @@
 #include <asm/msr.h>
 #include <asm/cache.h>
 #include <asm/processor-flags.h>
+#include <asm/percpu.h>
 
 #ifdef CONFIG_PARAVIRT
 #include <asm/asm-offsets.h>
@@ -226,12 +227,15 @@ ENTRY(secondary_startup_64)
 	movl %eax,%fs
 	movl %eax,%gs
 
-	/* 
-	 * Setup up a dummy PDA. this is just for some early bootup code
-	 * that does in_interrupt() 
-	 */ 
+	/* Set up %gs.
+	 *
+	 * The base of %gs always points to the bottom of the irqstack
+	 * union.  If the stack protector canary is enabled, it is
+	 * located at %gs:40.  Note that, on SMP, the boot cpu uses
+	 * init data section till per cpu areas are set up.
+	 */
 	movl	$MSR_GS_BASE,%ecx
-	movq	$empty_zero_page,%rax
+	movq	initial_gs(%rip),%rax
 	movq    %rax,%rdx
 	shrq	$32,%rdx
 	wrmsr	
@@ -257,6 +261,8 @@ ENTRY(secondary_startup_64)
 	.align	8
 	ENTRY(initial_code)
 	.quad	x86_64_start_kernel
+	ENTRY(initial_gs)
+	.quad	INIT_PER_CPU_VAR(irq_stack_union)
 	__FINITDATA
 
 	ENTRY(stack_start)
@@ -323,8 +329,6 @@ early_idt_ripmsg:
 #endif /* CONFIG_EARLY_PRINTK */
 	.previous
 
-.balign PAGE_SIZE
-
 #define NEXT_PAGE(name) \
 	.balign	PAGE_SIZE; \
 ENTRY(name)
@@ -401,7 +405,8 @@ NEXT_PAGE(level2_spare_pgt)
 	.globl early_gdt_descr
 early_gdt_descr:
 	.word	GDT_ENTRIES*8-1
-	.quad   per_cpu__gdt_page
+early_gdt_descr_base:
+	.quad	INIT_PER_CPU_VAR(gdt_page)
 
 ENTRY(phys_base)
 	/* This must match the first entry in level2_kernel_pgt */
@@ -412,7 +417,7 @@ ENTRY(phys_base)
 	.section .bss, "aw", @nobits
 	.align L1_CACHE_BYTES
 ENTRY(idt_table)
-	.skip 256 * 16
+	.skip IDT_ENTRIES * 16
 
 	.section .bss.page_aligned, "aw", @nobits
 	.align PAGE_SIZE
diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index 11d5093eb281..df89102bef80 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -22,7 +22,6 @@
 #include <asm/pgtable.h>
 #include <asm/desc.h>
 #include <asm/apic.h>
-#include <asm/arch_hooks.h>
 #include <asm/i8259.h>
 
 /*
diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
index b12208f4dfee..99c4d308f16b 100644
--- a/arch/x86/kernel/ioport.c
+++ b/arch/x86/kernel/ioport.c
@@ -85,19 +85,8 @@ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int turn_on)
 
 	t->io_bitmap_max = bytes;
 
-#ifdef CONFIG_X86_32
-	/*
-	 * Sets the lazy trigger so that the next I/O operation will
-	 * reload the correct bitmap.
-	 * Reset the owner so that a process switch will not set
-	 * tss->io_bitmap_base to IO_BITMAP_OFFSET.
-	 */
-	tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET_LAZY;
-	tss->io_bitmap_owner = NULL;
-#else
 	/* Update the TSS: */
 	memcpy(tss->io_bitmap, t->io_bitmap_ptr, bytes_updated);
-#endif
 
 	put_cpu();
 
@@ -131,9 +120,8 @@ static int do_iopl(unsigned int level, struct pt_regs *regs)
 }
 
 #ifdef CONFIG_X86_32
-asmlinkage long sys_iopl(unsigned long regsp)
+long sys_iopl(struct pt_regs *regs)
 {
-	struct pt_regs *regs = (struct pt_regs *)&regsp;
 	unsigned int level = regs->bx;
 	struct thread_struct *t = &current->thread;
 	int rc;
diff --git a/arch/x86/kernel/ipi.c b/arch/x86/kernel/ipi.c
deleted file mode 100644
index 285bbf8831fa..000000000000
--- a/arch/x86/kernel/ipi.c
+++ /dev/null
@@ -1,190 +0,0 @@
-#include <linux/cpumask.h>
-#include <linux/interrupt.h>
-#include <linux/init.h>
-
-#include <linux/mm.h>
-#include <linux/delay.h>
-#include <linux/spinlock.h>
-#include <linux/kernel_stat.h>
-#include <linux/mc146818rtc.h>
-#include <linux/cache.h>
-#include <linux/cpu.h>
-#include <linux/module.h>
-
-#include <asm/smp.h>
-#include <asm/mtrr.h>
-#include <asm/tlbflush.h>
-#include <asm/mmu_context.h>
-#include <asm/apic.h>
-#include <asm/proto.h>
-
-#ifdef CONFIG_X86_32
-#include <mach_apic.h>
-#include <mach_ipi.h>
-
-/*
- * the following functions deal with sending IPIs between CPUs.
- *
- * We use 'broadcast', CPU->CPU IPIs and self-IPIs too.
- */
-
-static inline int __prepare_ICR(unsigned int shortcut, int vector)
-{
-	unsigned int icr = shortcut | APIC_DEST_LOGICAL;
-
-	switch (vector) {
-	default:
-		icr |= APIC_DM_FIXED | vector;
-		break;
-	case NMI_VECTOR:
-		icr |= APIC_DM_NMI;
-		break;
-	}
-	return icr;
-}
-
-static inline int __prepare_ICR2(unsigned int mask)
-{
-	return SET_APIC_DEST_FIELD(mask);
-}
-
-void __send_IPI_shortcut(unsigned int shortcut, int vector)
-{
-	/*
-	 * Subtle. In the case of the 'never do double writes' workaround
-	 * we have to lock out interrupts to be safe.  As we don't care
-	 * of the value read we use an atomic rmw access to avoid costly
-	 * cli/sti.  Otherwise we use an even cheaper single atomic write
-	 * to the APIC.
-	 */
-	unsigned int cfg;
-
-	/*
-	 * Wait for idle.
-	 */
-	apic_wait_icr_idle();
-
-	/*
-	 * No need to touch the target chip field
-	 */
-	cfg = __prepare_ICR(shortcut, vector);
-
-	/*
-	 * Send the IPI. The write to APIC_ICR fires this off.
-	 */
-	apic_write(APIC_ICR, cfg);
-}
-
-void send_IPI_self(int vector)
-{
-	__send_IPI_shortcut(APIC_DEST_SELF, vector);
-}
-
-/*
- * This is used to send an IPI with no shorthand notation (the destination is
- * specified in bits 56 to 63 of the ICR).
- */
-static inline void __send_IPI_dest_field(unsigned long mask, int vector)
-{
-	unsigned long cfg;
-
-	/*
-	 * Wait for idle.
-	 */
-	if (unlikely(vector == NMI_VECTOR))
-		safe_apic_wait_icr_idle();
-	else
-		apic_wait_icr_idle();
-
-	/*
-	 * prepare target chip field
-	 */
-	cfg = __prepare_ICR2(mask);
-	apic_write(APIC_ICR2, cfg);
-
-	/*
-	 * program the ICR
-	 */
-	cfg = __prepare_ICR(0, vector);
-
-	/*
-	 * Send the IPI. The write to APIC_ICR fires this off.
-	 */
-	apic_write(APIC_ICR, cfg);
-}
-
-/*
- * This is only used on smaller machines.
- */
-void send_IPI_mask_bitmask(const struct cpumask *cpumask, int vector)
-{
-	unsigned long mask = cpumask_bits(cpumask)[0];
-	unsigned long flags;
-
-	local_irq_save(flags);
-	WARN_ON(mask & ~cpumask_bits(cpu_online_mask)[0]);
-	__send_IPI_dest_field(mask, vector);
-	local_irq_restore(flags);
-}
-
-void send_IPI_mask_sequence(const struct cpumask *mask, int vector)
-{
-	unsigned long flags;
-	unsigned int query_cpu;
-
-	/*
-	 * Hack. The clustered APIC addressing mode doesn't allow us to send
-	 * to an arbitrary mask, so I do a unicasts to each CPU instead. This
-	 * should be modified to do 1 message per cluster ID - mbligh
-	 */
-
-	local_irq_save(flags);
-	for_each_cpu(query_cpu, mask)
-		__send_IPI_dest_field(cpu_to_logical_apicid(query_cpu), vector);
-	local_irq_restore(flags);
-}
-
-void send_IPI_mask_allbutself(const struct cpumask *mask, int vector)
-{
-	unsigned long flags;
-	unsigned int query_cpu;
-	unsigned int this_cpu = smp_processor_id();
-
-	/* See Hack comment above */
-
-	local_irq_save(flags);
-	for_each_cpu(query_cpu, mask)
-		if (query_cpu != this_cpu)
-			__send_IPI_dest_field(cpu_to_logical_apicid(query_cpu),
-					      vector);
-	local_irq_restore(flags);
-}
-
-/* must come after the send_IPI functions above for inlining */
-static int convert_apicid_to_cpu(int apic_id)
-{
-	int i;
-
-	for_each_possible_cpu(i) {
-		if (per_cpu(x86_cpu_to_apicid, i) == apic_id)
-			return i;
-	}
-	return -1;
-}
-
-int safe_smp_processor_id(void)
-{
-	int apicid, cpuid;
-
-	if (!boot_cpu_has(X86_FEATURE_APIC))
-		return 0;
-
-	apicid = hard_smp_processor_id();
-	if (apicid == BAD_APICID)
-		return 0;
-
-	cpuid = convert_apicid_to_cpu(apicid);
-
-	return cpuid >= 0 ? cpuid : 0;
-}
-#endif
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 3973e2df7f87..f13ca1650aaf 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -6,10 +6,12 @@
 #include <linux/kernel_stat.h>
 #include <linux/seq_file.h>
 #include <linux/smp.h>
+#include <linux/ftrace.h>
 
 #include <asm/apic.h>
 #include <asm/io_apic.h>
 #include <asm/irq.h>
+#include <asm/idle.h>
 
 atomic_t irq_err_count;
 
@@ -36,11 +38,7 @@ void ack_bad_irq(unsigned int irq)
 #endif
 }
 
-#ifdef CONFIG_X86_32
-# define irq_stats(x)		(&per_cpu(irq_stat, x))
-#else
-# define irq_stats(x)		cpu_pda(x)
-#endif
+#define irq_stats(x)		(&per_cpu(irq_stat, x))
 /*
  * /proc/interrupts printing:
  */
@@ -192,4 +190,40 @@ u64 arch_irq_stat(void)
 	return sum;
 }
 
+
+/*
+ * do_IRQ handles all normal device IRQ's (the special
+ * SMP cross-CPU interrupts have their own specific
+ * handlers).
+ */
+unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
+{
+	struct pt_regs *old_regs = set_irq_regs(regs);
+
+	/* high bit used in ret_from_ code  */
+	unsigned vector = ~regs->orig_ax;
+	unsigned irq;
+
+	exit_idle();
+	irq_enter();
+
+	irq = __get_cpu_var(vector_irq)[vector];
+
+	if (!handle_irq(irq, regs)) {
+#ifdef CONFIG_X86_64
+		if (!disable_apic)
+			ack_APIC_irq();
+#endif
+
+		if (printk_ratelimit())
+			printk(KERN_EMERG "%s: %d.%d No irq handler for vector (irq %d)\n",
+			       __func__, smp_processor_id(), vector, irq);
+	}
+
+	irq_exit();
+
+	set_irq_regs(old_regs);
+	return 1;
+}
+
 EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq);
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index 74b9ff7341e9..3b09634a5153 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -16,6 +16,7 @@
 #include <linux/cpu.h>
 #include <linux/delay.h>
 #include <linux/uaccess.h>
+#include <linux/percpu.h>
 
 #include <asm/apic.h>
 
@@ -55,13 +56,13 @@ static inline void print_stack_overflow(void) { }
 union irq_ctx {
 	struct thread_info      tinfo;
 	u32                     stack[THREAD_SIZE/sizeof(u32)];
-};
+} __attribute__((aligned(PAGE_SIZE)));
 
-static union irq_ctx *hardirq_ctx[NR_CPUS] __read_mostly;
-static union irq_ctx *softirq_ctx[NR_CPUS] __read_mostly;
+static DEFINE_PER_CPU(union irq_ctx *, hardirq_ctx);
+static DEFINE_PER_CPU(union irq_ctx *, softirq_ctx);
 
-static char softirq_stack[NR_CPUS * THREAD_SIZE] __page_aligned_bss;
-static char hardirq_stack[NR_CPUS * THREAD_SIZE] __page_aligned_bss;
+static DEFINE_PER_CPU_PAGE_ALIGNED(union irq_ctx, hardirq_stack);
+static DEFINE_PER_CPU_PAGE_ALIGNED(union irq_ctx, softirq_stack);
 
 static void call_on_stack(void *func, void *stack)
 {
@@ -81,7 +82,7 @@ execute_on_irq_stack(int overflow, struct irq_desc *desc, int irq)
 	u32 *isp, arg1, arg2;
 
 	curctx = (union irq_ctx *) current_thread_info();
-	irqctx = hardirq_ctx[smp_processor_id()];
+	irqctx = __get_cpu_var(hardirq_ctx);
 
 	/*
 	 * this is where we switch to the IRQ stack. However, if we are
@@ -125,34 +126,34 @@ void __cpuinit irq_ctx_init(int cpu)
 {
 	union irq_ctx *irqctx;
 
-	if (hardirq_ctx[cpu])
+	if (per_cpu(hardirq_ctx, cpu))
 		return;
 
-	irqctx = (union irq_ctx*) &hardirq_stack[cpu*THREAD_SIZE];
+	irqctx = &per_cpu(hardirq_stack, cpu);
 	irqctx->tinfo.task		= NULL;
 	irqctx->tinfo.exec_domain	= NULL;
 	irqctx->tinfo.cpu		= cpu;
 	irqctx->tinfo.preempt_count	= HARDIRQ_OFFSET;
 	irqctx->tinfo.addr_limit	= MAKE_MM_SEG(0);
 
-	hardirq_ctx[cpu] = irqctx;
+	per_cpu(hardirq_ctx, cpu) = irqctx;
 
-	irqctx = (union irq_ctx *) &softirq_stack[cpu*THREAD_SIZE];
+	irqctx = &per_cpu(softirq_stack, cpu);
 	irqctx->tinfo.task		= NULL;
 	irqctx->tinfo.exec_domain	= NULL;
 	irqctx->tinfo.cpu		= cpu;
 	irqctx->tinfo.preempt_count	= 0;
 	irqctx->tinfo.addr_limit	= MAKE_MM_SEG(0);
 
-	softirq_ctx[cpu] = irqctx;
+	per_cpu(softirq_ctx, cpu) = irqctx;
 
 	printk(KERN_DEBUG "CPU %u irqstacks, hard=%p soft=%p\n",
-	       cpu, hardirq_ctx[cpu], softirq_ctx[cpu]);
+	       cpu, per_cpu(hardirq_ctx, cpu),  per_cpu(softirq_ctx, cpu));
 }
 
 void irq_ctx_exit(int cpu)
 {
-	hardirq_ctx[cpu] = NULL;
+	per_cpu(hardirq_ctx, cpu) = NULL;
 }
 
 asmlinkage void do_softirq(void)
@@ -169,7 +170,7 @@ asmlinkage void do_softirq(void)
 
 	if (local_softirq_pending()) {
 		curctx = current_thread_info();
-		irqctx = softirq_ctx[smp_processor_id()];
+		irqctx = __get_cpu_var(softirq_ctx);
 		irqctx->tinfo.task = curctx->task;
 		irqctx->tinfo.previous_esp = current_stack_pointer;
 
@@ -191,33 +192,16 @@ static inline int
 execute_on_irq_stack(int overflow, struct irq_desc *desc, int irq) { return 0; }
 #endif
 
-/*
- * do_IRQ handles all normal device IRQ's (the special
- * SMP cross-CPU interrupts have their own specific
- * handlers).
- */
-unsigned int do_IRQ(struct pt_regs *regs)
+bool handle_irq(unsigned irq, struct pt_regs *regs)
 {
-	struct pt_regs *old_regs;
-	/* high bit used in ret_from_ code */
-	int overflow;
-	unsigned vector = ~regs->orig_ax;
 	struct irq_desc *desc;
-	unsigned irq;
-
-
-	old_regs = set_irq_regs(regs);
-	irq_enter();
-	irq = __get_cpu_var(vector_irq)[vector];
+	int overflow;
 
 	overflow = check_stack_overflow();
 
 	desc = irq_to_desc(irq);
-	if (unlikely(!desc)) {
-		printk(KERN_EMERG "%s: cannot handle IRQ %d vector %#x cpu %d\n",
-					__func__, irq, vector, smp_processor_id());
-		BUG();
-	}
+	if (unlikely(!desc))
+		return false;
 
 	if (!execute_on_irq_stack(overflow, desc, irq)) {
 		if (unlikely(overflow))
@@ -225,13 +209,10 @@ unsigned int do_IRQ(struct pt_regs *regs)
 		desc->handle_irq(irq, desc);
 	}
 
-	irq_exit();
-	set_irq_regs(old_regs);
-	return 1;
+	return true;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-#include <mach_apic.h>
 
 /* A cpu has been removed from cpu_online_mask.  Reset irq affinities. */
 void fixup_irqs(void)
@@ -248,7 +229,7 @@ void fixup_irqs(void)
 		if (irq == 2)
 			continue;
 
-		affinity = &desc->affinity;
+		affinity = desc->affinity;
 		if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
 			printk("Breaking affinity for irq %i\n", irq);
 			affinity = cpu_all_mask;
diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 63c88e6ec025..977d8b43a0dd 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -18,6 +18,13 @@
 #include <linux/smp.h>
 #include <asm/io_apic.h>
 #include <asm/idle.h>
+#include <asm/apic.h>
+
+DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
+EXPORT_PER_CPU_SYMBOL(irq_stat);
+
+DEFINE_PER_CPU(struct pt_regs *, irq_regs);
+EXPORT_PER_CPU_SYMBOL(irq_regs);
 
 /*
  * Probabilistic stack overflow check:
@@ -41,42 +48,18 @@ static inline void stack_overflow_check(struct pt_regs *regs)
 #endif
 }
 
-/*
- * do_IRQ handles all normal device IRQ's (the special
- * SMP cross-CPU interrupts have their own specific
- * handlers).
- */
-asmlinkage unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
+bool handle_irq(unsigned irq, struct pt_regs *regs)
 {
-	struct pt_regs *old_regs = set_irq_regs(regs);
 	struct irq_desc *desc;
 
-	/* high bit used in ret_from_ code  */
-	unsigned vector = ~regs->orig_ax;
-	unsigned irq;
-
-	exit_idle();
-	irq_enter();
-	irq = __get_cpu_var(vector_irq)[vector];
-
 	stack_overflow_check(regs);
 
 	desc = irq_to_desc(irq);
-	if (likely(desc))
-		generic_handle_irq_desc(irq, desc);
-	else {
-		if (!disable_apic)
-			ack_APIC_irq();
-
-		if (printk_ratelimit())
-			printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
-				__func__, smp_processor_id(), vector);
-	}
-
-	irq_exit();
+	if (unlikely(!desc))
+		return false;
 
-	set_irq_regs(old_regs);
-	return 1;
+	generic_handle_irq_desc(irq, desc);
+	return true;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -100,7 +83,7 @@ void fixup_irqs(void)
 		/* interrupt's are disabled at this point */
 		spin_lock(&desc->lock);
 
-		affinity = &desc->affinity;
+		affinity = desc->affinity;
 		if (!irq_has_action(irq) ||
 		    cpumask_equal(affinity, cpu_online_mask)) {
 			spin_unlock(&desc->lock);
diff --git a/arch/x86/kernel/irqinit_32.c b/arch/x86/kernel/irqinit_32.c
index 10a09c2f1828..50b8c3a3006c 100644
--- a/arch/x86/kernel/irqinit_32.c
+++ b/arch/x86/kernel/irqinit_32.c
@@ -18,7 +18,7 @@
 #include <asm/pgtable.h>
 #include <asm/desc.h>
 #include <asm/apic.h>
-#include <asm/arch_hooks.h>
+#include <asm/setup.h>
 #include <asm/i8259.h>
 #include <asm/traps.h>
 
@@ -78,6 +78,15 @@ void __init init_ISA_irqs(void)
 	}
 }
 
+/*
+ * IRQ2 is cascade interrupt to second interrupt controller
+ */
+static struct irqaction irq2 = {
+	.handler = no_action,
+	.mask = CPU_MASK_NONE,
+	.name = "cascade",
+};
+
 DEFINE_PER_CPU(vector_irq_t, vector_irq) = {
 	[0 ... IRQ0_VECTOR - 1] = -1,
 	[IRQ0_VECTOR] = 0,
@@ -118,8 +127,8 @@ void __init native_init_IRQ(void)
 {
 	int i;
 
-	/* all the set up before the call gates are initialised */
-	pre_intr_init_hook();
+	/* Execute any quirks before the call gates are initialised: */
+	x86_quirk_pre_intr_init();
 
 	/*
 	 * Cover the whole vector space, no vector can escape
@@ -140,8 +149,15 @@ void __init native_init_IRQ(void)
 	 */
 	alloc_intr_gate(RESCHEDULE_VECTOR, reschedule_interrupt);
 
-	/* IPI for invalidation */
-	alloc_intr_gate(INVALIDATE_TLB_VECTOR, invalidate_interrupt);
+	/* IPIs for invalidation */
+	alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+0, invalidate_interrupt0);
+	alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+1, invalidate_interrupt1);
+	alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+2, invalidate_interrupt2);
+	alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+3, invalidate_interrupt3);
+	alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+4, invalidate_interrupt4);
+	alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+5, invalidate_interrupt5);
+	alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+6, invalidate_interrupt6);
+	alloc_intr_gate(INVALIDATE_TLB_VECTOR_START+7, invalidate_interrupt7);
 
 	/* IPI for generic function call */
 	alloc_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);
@@ -169,10 +185,14 @@ void __init native_init_IRQ(void)
 	alloc_intr_gate(THERMAL_APIC_VECTOR, thermal_interrupt);
 #endif
 
-	/* setup after call gates are initialised (usually add in
-	 * the architecture specific gates)
+	if (!acpi_ioapic)
+		setup_irq(2, &irq2);
+
+	/*
+	 * Call quirks after call gates are initialised (usually add in
+	 * the architecture specific gates):
 	 */
-	intr_init_hook();
+	x86_quirk_intr_init();
 
 	/*
 	 * External FPU? Set up irq13 if so, for
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 10435a120d22..eedfaebe1063 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -46,7 +46,7 @@
 #include <asm/apicdef.h>
 #include <asm/system.h>
 
-#include <mach_ipi.h>
+#include <asm/apic.h>
 
 /*
  * Put the error code here just in case the user cares:
@@ -347,7 +347,7 @@ void kgdb_post_primary_code(struct pt_regs *regs, int e_vector, int err_code)
  */
 void kgdb_roundup_cpus(unsigned long flags)
 {
-	send_IPI_allbutself(APIC_DM_NMI);
+	apic->send_IPI_allbutself(APIC_DM_NMI);
 }
 #endif
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 652fce6d2cce..137f2e8132df 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -19,7 +19,6 @@
 #include <linux/clocksource.h>
 #include <linux/kvm_para.h>
 #include <asm/pvclock.h>
-#include <asm/arch_hooks.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
 #include <linux/percpu.h>
diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index 37f420018a41..f5fc8c781a62 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -121,7 +121,7 @@ static void machine_kexec_page_table_set_one(
 static void machine_kexec_prepare_page_tables(struct kimage *image)
 {
 	void *control_page;
-	pmd_t *pmd = 0;
+	pmd_t *pmd = NULL;
 
 	control_page = page_address(image->control_code_page);
 #ifdef CONFIG_X86_PAE
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index c43caa3a91f3..6993d51b7fd8 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -18,15 +18,6 @@
 #include <asm/mmu_context.h>
 #include <asm/io.h>
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u64 kexec_pgd[512] PAGE_ALIGNED;
-static u64 kexec_pud0[512] PAGE_ALIGNED;
-static u64 kexec_pmd0[512] PAGE_ALIGNED;
-static u64 kexec_pte0[512] PAGE_ALIGNED;
-static u64 kexec_pud1[512] PAGE_ALIGNED;
-static u64 kexec_pmd1[512] PAGE_ALIGNED;
-static u64 kexec_pte1[512] PAGE_ALIGNED;
-
 static void init_level2_page(pmd_t *level2p, unsigned long addr)
 {
 	unsigned long end_addr;
@@ -107,12 +98,65 @@ out:
 	return result;
 }
 
+static void free_transition_pgtable(struct kimage *image)
+{
+	free_page((unsigned long)image->arch.pud);
+	free_page((unsigned long)image->arch.pmd);
+	free_page((unsigned long)image->arch.pte);
+}
+
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long vaddr, paddr;
+	int result = -ENOMEM;
+
+	vaddr = (unsigned long)relocate_kernel;
+	paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+	pgd += pgd_index(vaddr);
+	if (!pgd_present(*pgd)) {
+		pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
+		if (!pud)
+			goto err;
+		image->arch.pud = pud;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+	pud = pud_offset(pgd, vaddr);
+	if (!pud_present(*pud)) {
+		pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+		if (!pmd)
+			goto err;
+		image->arch.pmd = pmd;
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+	pmd = pmd_offset(pud, vaddr);
+	if (!pmd_present(*pmd)) {
+		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+		if (!pte)
+			goto err;
+		image->arch.pte = pte;
+		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
+	}
+	pte = pte_offset_kernel(pmd, vaddr);
+	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+	return 0;
+err:
+	free_transition_pgtable(image);
+	return result;
+}
+
 
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
 	pgd_t *level4p;
+	int result;
 	level4p = (pgd_t *)__va(start_pgtable);
-	return init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+	result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+	if (result)
+		return result;
+	return init_transition_pgtable(image, level4p);
 }
 
 static void set_idt(void *newidt, u16 limit)
@@ -174,7 +218,7 @@ int machine_kexec_prepare(struct kimage *image)
 
 void machine_kexec_cleanup(struct kimage *image)
 {
-	return;
+	free_transition_pgtable(image);
 }
 
 /*
@@ -195,22 +239,6 @@ void machine_kexec(struct kimage *image)
 	memcpy(control_page, relocate_kernel, PAGE_SIZE);
 
 	page_list[PA_CONTROL_PAGE] = virt_to_phys(control_page);
-	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
-	page_list[PA_PGD] = virt_to_phys(&kexec_pgd);
-	page_list[VA_PGD] = (unsigned long)kexec_pgd;
-	page_list[PA_PUD_0] = virt_to_phys(&kexec_pud0);
-	page_list[VA_PUD_0] = (unsigned long)kexec_pud0;
-	page_list[PA_PMD_0] = virt_to_phys(&kexec_pmd0);
-	page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-	page_list[PA_PTE_0] = virt_to_phys(&kexec_pte0);
-	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-	page_list[PA_PUD_1] = virt_to_phys(&kexec_pud1);
-	page_list[VA_PUD_1] = (unsigned long)kexec_pud1;
-	page_list[PA_PMD_1] = virt_to_phys(&kexec_pmd1);
-	page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-	page_list[PA_PTE_1] = virt_to_phys(&kexec_pte1);
-	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
-
 	page_list[PA_TABLE_PAGE] =
 	  (unsigned long)__pa(page_address(image->control_code_page));
 
diff --git a/arch/x86/kernel/mca_32.c b/arch/x86/kernel/mca_32.c
index 2dc183758be3..845d80ce1ef1 100644
--- a/arch/x86/kernel/mca_32.c
+++ b/arch/x86/kernel/mca_32.c
@@ -51,7 +51,6 @@
 #include <linux/ioport.h>
 #include <asm/uaccess.h>
 #include <linux/init.h>
-#include <asm/arch_hooks.h>
 
 static unsigned char which_scsi;
 
@@ -474,6 +473,4 @@ void __kprobes mca_handle_nmi(void)
 	 * adapter was responsible for the error.
 	 */
 	bus_for_each_dev(&mca_bus_type, NULL, NULL, mca_handle_nmi_callback);
-
-	mca_nmi_hook();
-} /* mca_handle_nmi */
+}
diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c
index b7f4c929e615..5e9f4fc51385 100644
--- a/arch/x86/kernel/microcode_intel.c
+++ b/arch/x86/kernel/microcode_intel.c
@@ -87,9 +87,9 @@
 #include <linux/cpu.h>
 #include <linux/firmware.h>
 #include <linux/platform_device.h>
+#include <linux/uaccess.h>
 
 #include <asm/msr.h>
-#include <asm/uaccess.h>
 #include <asm/processor.h>
 #include <asm/microcode.h>
 
@@ -196,7 +196,7 @@ static inline int update_match_cpu(struct cpu_signature *csig, int sig, int pf)
 	return (!sigmatch(sig, csig->sig, pf, csig->pf)) ? 0 : 1;
 }
 
-static inline int 
+static inline int
 update_match_revision(struct microcode_header_intel *mc_header,	int rev)
 {
 	return (mc_header->rev <= rev) ? 0 : 1;
@@ -442,8 +442,8 @@ static int request_microcode_fw(int cpu, struct device *device)
 		return ret;
 	}
 
-	ret = generic_load_microcode(cpu, (void*)firmware->data, firmware->size,
-			&get_ucode_fw);
+	ret = generic_load_microcode(cpu, (void *)firmware->data,
+				     firmware->size, &get_ucode_fw);
 
 	release_firmware(firmware);
 
@@ -460,7 +460,7 @@ static int request_microcode_user(int cpu, const void __user *buf, size_t size)
 	/* We should bind the task to the CPU */
 	BUG_ON(cpu != raw_smp_processor_id());
 
-	return generic_load_microcode(cpu, (void*)buf, size, &get_ucode_user);
+	return generic_load_microcode(cpu, (void *)buf, size, &get_ucode_user);
 }
 
 static void microcode_fini_cpu(int cpu)
diff --git a/arch/x86/kernel/module_32.c b/arch/x86/kernel/module_32.c
index 3db0a5442eb1..0edd819050e7 100644
--- a/arch/x86/kernel/module_32.c
+++ b/arch/x86/kernel/module_32.c
@@ -42,7 +42,7 @@ void module_free(struct module *mod, void *module_region)
 {
 	vfree(module_region);
 	/* FIXME: If module_region == mod->init_region, trim exception
-           table entries. */
+	   table entries. */
 }
 
 /* We don't need anything special. */
@@ -113,13 +113,13 @@ int module_finalize(const Elf_Ehdr *hdr,
 		*para = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
-	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { 
+	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
 		if (!strcmp(".text", secstrings + s->sh_name))
 			text = s;
 		if (!strcmp(".altinstructions", secstrings + s->sh_name))
 			alt = s;
 		if (!strcmp(".smp_locks", secstrings + s->sh_name))
-			locks= s;
+			locks = s;
 		if (!strcmp(".parainstructions", secstrings + s->sh_name))
 			para = s;
 	}
diff --git a/arch/x86/kernel/module_64.c b/arch/x86/kernel/module_64.c
index 6ba87830d4b1..c23880b90b5c 100644
--- a/arch/x86/kernel/module_64.c
+++ b/arch/x86/kernel/module_64.c
@@ -30,14 +30,14 @@
 #include <asm/page.h>
 #include <asm/pgtable.h>
 
-#define DEBUGP(fmt...) 
+#define DEBUGP(fmt...)
 
 #ifndef CONFIG_UML
 void module_free(struct module *mod, void *module_region)
 {
 	vfree(module_region);
 	/* FIXME: If module_region == mod->init_region, trim exception
-           table entries. */
+	   table entries. */
 }
 
 void *module_alloc(unsigned long size)
@@ -77,7 +77,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 	Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
 	Elf64_Sym *sym;
 	void *loc;
-	u64 val; 
+	u64 val;
 
 	DEBUGP("Applying relocate section %u to %u\n", relsec,
 	       sechdrs[relsec].sh_info);
@@ -91,11 +91,11 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 		sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
 			+ ELF64_R_SYM(rel[i].r_info);
 
-	        DEBUGP("type %d st_value %Lx r_addend %Lx loc %Lx\n",
-		       (int)ELF64_R_TYPE(rel[i].r_info), 
-		       sym->st_value, rel[i].r_addend, (u64)loc);
+		DEBUGP("type %d st_value %Lx r_addend %Lx loc %Lx\n",
+			(int)ELF64_R_TYPE(rel[i].r_info),
+			sym->st_value, rel[i].r_addend, (u64)loc);
 
-		val = sym->st_value + rel[i].r_addend; 
+		val = sym->st_value + rel[i].r_addend;
 
 		switch (ELF64_R_TYPE(rel[i].r_info)) {
 		case R_X86_64_NONE:
@@ -113,16 +113,16 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			if ((s64)val != *(s32 *)loc)
 				goto overflow;
 			break;
-		case R_X86_64_PC32: 
+		case R_X86_64_PC32:
 			val -= (u64)loc;
 			*(u32 *)loc = val;
 #if 0
 			if ((s64)val != *(s32 *)loc)
-				goto overflow; 
+				goto overflow;
 #endif
 			break;
 		default:
-			printk(KERN_ERR "module %s: Unknown rela relocation: %Lu\n",
+			printk(KERN_ERR "module %s: Unknown rela relocation: %llu\n",
 			       me->name, ELF64_R_TYPE(rel[i].r_info));
 			return -ENOEXEC;
 		}
@@ -130,7 +130,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 	return 0;
 
 overflow:
-	printk(KERN_ERR "overflow in relocation type %d val %Lx\n", 
+	printk(KERN_ERR "overflow in relocation type %d val %Lx\n",
 	       (int)ELF64_R_TYPE(rel[i].r_info), val);
 	printk(KERN_ERR "`%s' likely not compiled with -mcmodel=kernel\n",
 	       me->name);
@@ -143,13 +143,13 @@ int apply_relocate(Elf_Shdr *sechdrs,
 		   unsigned int relsec,
 		   struct module *me)
 {
-	printk("non add relocation not supported\n");
+	printk(KERN_ERR "non add relocation not supported\n");
 	return -ENOSYS;
-} 
+}
 
 int module_finalize(const Elf_Ehdr *hdr,
-                    const Elf_Shdr *sechdrs,
-                    struct module *me)
+		    const Elf_Shdr *sechdrs,
+		    struct module *me)
 {
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
 		*para = NULL;
@@ -161,7 +161,7 @@ int module_finalize(const Elf_Ehdr *hdr,
 		if (!strcmp(".altinstructions", secstrings + s->sh_name))
 			alt = s;
 		if (!strcmp(".smp_locks", secstrings + s->sh_name))
-			locks= s;
+			locks = s;
 		if (!strcmp(".parainstructions", secstrings + s->sh_name))
 			para = s;
 	}
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index a649a4ccad43..37cb1bda1baf 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -3,7 +3,7 @@
  *	compliant MP-table parsing routines.
  *
  *	(c) 1995 Alan Cox, Building #3 <alan@lxorguk.ukuu.org.uk>
- *	(c) 1998, 1999, 2000 Ingo Molnar <mingo@redhat.com>
+ *	(c) 1998, 1999, 2000, 2009 Ingo Molnar <mingo@redhat.com>
  *      (c) 2008 Alexey Starikovskiy <astarikovskiy@suse.de>
  */
 
@@ -29,12 +29,7 @@
 #include <asm/setup.h>
 #include <asm/smp.h>
 
-#include <mach_apic.h>
-#ifdef CONFIG_X86_32
-#include <mach_apicdef.h>
-#include <mach_mpparse.h>
-#endif
-
+#include <asm/apic.h>
 /*
  * Checksum an MP configuration block.
  */
@@ -144,11 +139,11 @@ static void __init MP_ioapic_info(struct mpc_ioapic *m)
 	if (bad_ioapic(m->apicaddr))
 		return;
 
-	mp_ioapics[nr_ioapics].mp_apicaddr = m->apicaddr;
-	mp_ioapics[nr_ioapics].mp_apicid = m->apicid;
-	mp_ioapics[nr_ioapics].mp_type = m->type;
-	mp_ioapics[nr_ioapics].mp_apicver = m->apicver;
-	mp_ioapics[nr_ioapics].mp_flags = m->flags;
+	mp_ioapics[nr_ioapics].apicaddr = m->apicaddr;
+	mp_ioapics[nr_ioapics].apicid = m->apicid;
+	mp_ioapics[nr_ioapics].type = m->type;
+	mp_ioapics[nr_ioapics].apicver = m->apicver;
+	mp_ioapics[nr_ioapics].flags = m->flags;
 	nr_ioapics++;
 }
 
@@ -160,55 +155,55 @@ static void print_MP_intsrc_info(struct mpc_intsrc *m)
 		m->srcbusirq, m->dstapic, m->dstirq);
 }
 
-static void __init print_mp_irq_info(struct mp_config_intsrc *mp_irq)
+static void __init print_mp_irq_info(struct mpc_intsrc *mp_irq)
 {
 	apic_printk(APIC_VERBOSE, "Int: type %d, pol %d, trig %d, bus %02x,"
 		" IRQ %02x, APIC ID %x, APIC INT %02x\n",
-		mp_irq->mp_irqtype, mp_irq->mp_irqflag & 3,
-		(mp_irq->mp_irqflag >> 2) & 3, mp_irq->mp_srcbus,
-		mp_irq->mp_srcbusirq, mp_irq->mp_dstapic, mp_irq->mp_dstirq);
+		mp_irq->irqtype, mp_irq->irqflag & 3,
+		(mp_irq->irqflag >> 2) & 3, mp_irq->srcbus,
+		mp_irq->srcbusirq, mp_irq->dstapic, mp_irq->dstirq);
 }
 
 static void __init assign_to_mp_irq(struct mpc_intsrc *m,
-				    struct mp_config_intsrc *mp_irq)
+				    struct mpc_intsrc *mp_irq)
 {
-	mp_irq->mp_dstapic = m->dstapic;
-	mp_irq->mp_type = m->type;
-	mp_irq->mp_irqtype = m->irqtype;
-	mp_irq->mp_irqflag = m->irqflag;
-	mp_irq->mp_srcbus = m->srcbus;
-	mp_irq->mp_srcbusirq = m->srcbusirq;
-	mp_irq->mp_dstirq = m->dstirq;
+	mp_irq->dstapic = m->dstapic;
+	mp_irq->type = m->type;
+	mp_irq->irqtype = m->irqtype;
+	mp_irq->irqflag = m->irqflag;
+	mp_irq->srcbus = m->srcbus;
+	mp_irq->srcbusirq = m->srcbusirq;
+	mp_irq->dstirq = m->dstirq;
 }
 
-static void __init assign_to_mpc_intsrc(struct mp_config_intsrc *mp_irq,
+static void __init assign_to_mpc_intsrc(struct mpc_intsrc *mp_irq,
 					struct mpc_intsrc *m)
 {
-	m->dstapic = mp_irq->mp_dstapic;
-	m->type = mp_irq->mp_type;
-	m->irqtype = mp_irq->mp_irqtype;
-	m->irqflag = mp_irq->mp_irqflag;
-	m->srcbus = mp_irq->mp_srcbus;
-	m->srcbusirq = mp_irq->mp_srcbusirq;
-	m->dstirq = mp_irq->mp_dstirq;
+	m->dstapic = mp_irq->dstapic;
+	m->type = mp_irq->type;
+	m->irqtype = mp_irq->irqtype;
+	m->irqflag = mp_irq->irqflag;
+	m->srcbus = mp_irq->srcbus;
+	m->srcbusirq = mp_irq->srcbusirq;
+	m->dstirq = mp_irq->dstirq;
 }
 
-static int __init mp_irq_mpc_intsrc_cmp(struct mp_config_intsrc *mp_irq,
+static int __init mp_irq_mpc_intsrc_cmp(struct mpc_intsrc *mp_irq,
 					struct mpc_intsrc *m)
 {
-	if (mp_irq->mp_dstapic != m->dstapic)
+	if (mp_irq->dstapic != m->dstapic)
 		return 1;
-	if (mp_irq->mp_type != m->type)
+	if (mp_irq->type != m->type)
 		return 2;
-	if (mp_irq->mp_irqtype != m->irqtype)
+	if (mp_irq->irqtype != m->irqtype)
 		return 3;
-	if (mp_irq->mp_irqflag != m->irqflag)
+	if (mp_irq->irqflag != m->irqflag)
 		return 4;
-	if (mp_irq->mp_srcbus != m->srcbus)
+	if (mp_irq->srcbus != m->srcbus)
 		return 5;
-	if (mp_irq->mp_srcbusirq != m->srcbusirq)
+	if (mp_irq->srcbusirq != m->srcbusirq)
 		return 6;
-	if (mp_irq->mp_dstirq != m->dstirq)
+	if (mp_irq->dstirq != m->dstirq)
 		return 7;
 
 	return 0;
@@ -292,16 +287,7 @@ static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
 		return 0;
 
 #ifdef CONFIG_X86_32
-	/*
-	 * need to make sure summit and es7000's mps_oem_check is safe to be
-	 * called early via genericarch 's mps_oem_check
-	 */
-	if (early) {
-#ifdef CONFIG_X86_NUMAQ
-		numaq_mps_oem_check(mpc, oem, str);
-#endif
-	} else
-		mps_oem_check(mpc, oem, str);
+	generic_mps_oem_check(mpc, oem, str);
 #endif
 	/* save the local APIC address, it might be non-default */
 	if (!acpi_lapic)
@@ -386,13 +372,13 @@ static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
 			(*x86_quirks->mpc_record)++;
 	}
 
-#ifdef CONFIG_X86_GENERICARCH
-       generic_bigsmp_probe();
+#ifdef CONFIG_X86_BIGSMP
+	generic_bigsmp_probe();
 #endif
 
-#ifdef CONFIG_X86_32
-	setup_apic_routing();
-#endif
+	if (apic->setup_apic_routing)
+		apic->setup_apic_routing();
+
 	if (!num_processors)
 		printk(KERN_ERR "MPTABLE: no processors registered!\n");
 	return num_processors;
@@ -417,7 +403,7 @@ static void __init construct_default_ioirq_mptable(int mpc_default_type)
 	intsrc.type = MP_INTSRC;
 	intsrc.irqflag = 0;	/* conforming */
 	intsrc.srcbus = 0;
-	intsrc.dstapic = mp_ioapics[0].mp_apicid;
+	intsrc.dstapic = mp_ioapics[0].apicid;
 
 	intsrc.irqtype = mp_INT;
 
@@ -570,14 +556,14 @@ static inline void __init construct_default_ISA_mptable(int mpc_default_type)
 	}
 }
 
-static struct intel_mp_floating *mpf_found;
+static struct mpf_intel *mpf_found;
 
 /*
  * Scan the memory blocks for an SMP configuration block.
  */
 static void __init __get_smp_config(unsigned int early)
 {
-	struct intel_mp_floating *mpf = mpf_found;
+	struct mpf_intel *mpf = mpf_found;
 
 	if (!mpf)
 		return;
@@ -598,9 +584,9 @@ static void __init __get_smp_config(unsigned int early)
 	}
 
 	printk(KERN_INFO "Intel MultiProcessor Specification v1.%d\n",
-	       mpf->mpf_specification);
+	       mpf->specification);
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32)
-	if (mpf->mpf_feature2 & (1 << 7)) {
+	if (mpf->feature2 & (1 << 7)) {
 		printk(KERN_INFO "    IMCR and PIC compatibility mode.\n");
 		pic_mode = 1;
 	} else {
@@ -611,7 +597,7 @@ static void __init __get_smp_config(unsigned int early)
 	/*
 	 * Now see if we need to read further.
 	 */
-	if (mpf->mpf_feature1 != 0) {
+	if (mpf->feature1 != 0) {
 		if (early) {
 			/*
 			 * local APIC has default address
@@ -621,16 +607,16 @@ static void __init __get_smp_config(unsigned int early)
 		}
 
 		printk(KERN_INFO "Default MP configuration #%d\n",
-		       mpf->mpf_feature1);
-		construct_default_ISA_mptable(mpf->mpf_feature1);
+		       mpf->feature1);
+		construct_default_ISA_mptable(mpf->feature1);
 
-	} else if (mpf->mpf_physptr) {
+	} else if (mpf->physptr) {
 
 		/*
 		 * Read the physical hardware table.  Anything here will
 		 * override the defaults.
 		 */
-		if (!smp_read_mpc(phys_to_virt(mpf->mpf_physptr), early)) {
+		if (!smp_read_mpc(phys_to_virt(mpf->physptr), early)) {
 #ifdef CONFIG_X86_LOCAL_APIC
 			smp_found_config = 0;
 #endif
@@ -688,32 +674,32 @@ static int __init smp_scan_config(unsigned long base, unsigned long length,
 				  unsigned reserve)
 {
 	unsigned int *bp = phys_to_virt(base);
-	struct intel_mp_floating *mpf;
+	struct mpf_intel *mpf;
 
 	apic_printk(APIC_VERBOSE, "Scan SMP from %p for %ld bytes.\n",
 			bp, length);
 	BUILD_BUG_ON(sizeof(*mpf) != 16);
 
 	while (length > 0) {
-		mpf = (struct intel_mp_floating *)bp;
+		mpf = (struct mpf_intel *)bp;
 		if ((*bp == SMP_MAGIC_IDENT) &&
-		    (mpf->mpf_length == 1) &&
+		    (mpf->length == 1) &&
 		    !mpf_checksum((unsigned char *)bp, 16) &&
-		    ((mpf->mpf_specification == 1)
-		     || (mpf->mpf_specification == 4))) {
+		    ((mpf->specification == 1)
+		     || (mpf->specification == 4))) {
 #ifdef CONFIG_X86_LOCAL_APIC
 			smp_found_config = 1;
 #endif
 			mpf_found = mpf;
 
-			printk(KERN_INFO "found SMP MP-table at [%p] %08lx\n",
-			       mpf, virt_to_phys(mpf));
+			printk(KERN_INFO "found SMP MP-table at [%p] %llx\n",
+			       mpf, (u64)virt_to_phys(mpf));
 
 			if (!reserve)
 				return 1;
 			reserve_bootmem_generic(virt_to_phys(mpf), PAGE_SIZE,
 					BOOTMEM_DEFAULT);
-			if (mpf->mpf_physptr) {
+			if (mpf->physptr) {
 				unsigned long size = PAGE_SIZE;
 #ifdef CONFIG_X86_32
 				/*
@@ -722,15 +708,24 @@ static int __init smp_scan_config(unsigned long base, unsigned long length,
 				 * the bottom is mapped now.
 				 * PC-9800's MPC table places on the very last
 				 * of physical memory; so that simply reserving
-				 * PAGE_SIZE from mpg->mpf_physptr yields BUG()
+				 * PAGE_SIZE from mpf->physptr yields BUG()
 				 * in reserve_bootmem.
+				 * also need to make sure physptr is below than
+				 * max_low_pfn
+				 * we don't need reserve the area above max_low_pfn
 				 */
 				unsigned long end = max_low_pfn * PAGE_SIZE;
-				if (mpf->mpf_physptr + size > end)
-					size = end - mpf->mpf_physptr;
-#endif
-				reserve_bootmem_generic(mpf->mpf_physptr, size,
+
+				if (mpf->physptr < end) {
+					if (mpf->physptr + size > end)
+						size = end - mpf->physptr;
+					reserve_bootmem_generic(mpf->physptr, size,
+							BOOTMEM_DEFAULT);
+				}
+#else
+				reserve_bootmem_generic(mpf->physptr, size,
 						BOOTMEM_DEFAULT);
+#endif
 			}
 
 			return 1;
@@ -809,15 +804,15 @@ static int  __init get_MP_intsrc_index(struct mpc_intsrc *m)
 	/* not legacy */
 
 	for (i = 0; i < mp_irq_entries; i++) {
-		if (mp_irqs[i].mp_irqtype != mp_INT)
+		if (mp_irqs[i].irqtype != mp_INT)
 			continue;
 
-		if (mp_irqs[i].mp_irqflag != 0x0f)
+		if (mp_irqs[i].irqflag != 0x0f)
 			continue;
 
-		if (mp_irqs[i].mp_srcbus != m->srcbus)
+		if (mp_irqs[i].srcbus != m->srcbus)
 			continue;
-		if (mp_irqs[i].mp_srcbusirq != m->srcbusirq)
+		if (mp_irqs[i].srcbusirq != m->srcbusirq)
 			continue;
 		if (irq_used[i]) {
 			/* already claimed */
@@ -922,10 +917,10 @@ static int  __init replace_intsrc_all(struct mpc_table *mpc,
 		if (irq_used[i])
 			continue;
 
-		if (mp_irqs[i].mp_irqtype != mp_INT)
+		if (mp_irqs[i].irqtype != mp_INT)
 			continue;
 
-		if (mp_irqs[i].mp_irqflag != 0x0f)
+		if (mp_irqs[i].irqflag != 0x0f)
 			continue;
 
 		if (nr_m_spare > 0) {
@@ -1001,7 +996,7 @@ static int __init update_mp_table(void)
 {
 	char str[16];
 	char oem[10];
-	struct intel_mp_floating *mpf;
+	struct mpf_intel *mpf;
 	struct mpc_table *mpc, *mpc_new;
 
 	if (!enable_update_mptable)
@@ -1014,19 +1009,19 @@ static int __init update_mp_table(void)
 	/*
 	 * Now see if we need to go further.
 	 */
-	if (mpf->mpf_feature1 != 0)
+	if (mpf->feature1 != 0)
 		return 0;
 
-	if (!mpf->mpf_physptr)
+	if (!mpf->physptr)
 		return 0;
 
-	mpc = phys_to_virt(mpf->mpf_physptr);
+	mpc = phys_to_virt(mpf->physptr);
 
 	if (!smp_check_mpc(mpc, oem, str))
 		return 0;
 
-	printk(KERN_INFO "mpf: %lx\n", virt_to_phys(mpf));
-	printk(KERN_INFO "mpf_physptr: %x\n", mpf->mpf_physptr);
+	printk(KERN_INFO "mpf: %llx\n", (u64)virt_to_phys(mpf));
+	printk(KERN_INFO "physptr: %x\n", mpf->physptr);
 
 	if (mpc_new_phys && mpc->length > mpc_new_length) {
 		mpc_new_phys = 0;
@@ -1047,23 +1042,23 @@ static int __init update_mp_table(void)
 		}
 		printk(KERN_INFO "use in-positon replacing\n");
 	} else {
-		mpf->mpf_physptr = mpc_new_phys;
+		mpf->physptr = mpc_new_phys;
 		mpc_new = phys_to_virt(mpc_new_phys);
 		memcpy(mpc_new, mpc, mpc->length);
 		mpc = mpc_new;
 		/* check if we can modify that */
-		if (mpc_new_phys - mpf->mpf_physptr) {
-			struct intel_mp_floating *mpf_new;
+		if (mpc_new_phys - mpf->physptr) {
+			struct mpf_intel *mpf_new;
 			/* steal 16 bytes from [0, 1k) */
 			printk(KERN_INFO "mpf new: %x\n", 0x400 - 16);
 			mpf_new = phys_to_virt(0x400 - 16);
 			memcpy(mpf_new, mpf, 16);
 			mpf = mpf_new;
-			mpf->mpf_physptr = mpc_new_phys;
+			mpf->physptr = mpc_new_phys;
 		}
-		mpf->mpf_checksum = 0;
-		mpf->mpf_checksum -= mpf_checksum((unsigned char *)mpf, 16);
-		printk(KERN_INFO "mpf_physptr new: %x\n", mpf->mpf_physptr);
+		mpf->checksum = 0;
+		mpf->checksum -= mpf_checksum((unsigned char *)mpf, 16);
+		printk(KERN_INFO "physptr new: %x\n", mpf->physptr);
 	}
 
 	/*
diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 726266695b2c..3cf3413ec626 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -35,10 +35,10 @@
 #include <linux/device.h>
 #include <linux/cpu.h>
 #include <linux/notifier.h>
+#include <linux/uaccess.h>
 
 #include <asm/processor.h>
 #include <asm/msr.h>
-#include <asm/uaccess.h>
 #include <asm/system.h>
 
 static struct class *msr_class;
diff --git a/arch/x86/kernel/numaq_32.c b/arch/x86/kernel/numaq_32.c
deleted file mode 100644
index f2191d4f2717..000000000000
--- a/arch/x86/kernel/numaq_32.c
+++ /dev/null
@@ -1,293 +0,0 @@
-/*
- * Written by: Patricia Gaughen, IBM Corporation
- *
- * Copyright (C) 2002, IBM Corp.
- *
- * All rights reserved.          
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT.  See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send feedback to <gone@us.ibm.com>
- */
-
-#include <linux/mm.h>
-#include <linux/bootmem.h>
-#include <linux/mmzone.h>
-#include <linux/module.h>
-#include <linux/nodemask.h>
-#include <asm/numaq.h>
-#include <asm/topology.h>
-#include <asm/processor.h>
-#include <asm/genapic.h>
-#include <asm/e820.h>
-#include <asm/setup.h>
-
-#define	MB_TO_PAGES(addr) ((addr) << (20 - PAGE_SHIFT))
-
-/*
- * Function: smp_dump_qct()
- *
- * Description: gets memory layout from the quad config table.  This
- * function also updates node_online_map with the nodes (quads) present.
- */
-static void __init smp_dump_qct(void)
-{
-	int node;
-	struct eachquadmem *eq;
-	struct sys_cfg_data *scd =
-		(struct sys_cfg_data *)__va(SYS_CFG_DATA_PRIV_ADDR);
-
-	nodes_clear(node_online_map);
-	for_each_node(node) {
-		if (scd->quads_present31_0 & (1 << node)) {
-			node_set_online(node);
-			eq = &scd->eq[node];
-			/* Convert to pages */
-			node_start_pfn[node] = MB_TO_PAGES(
-				eq->hi_shrd_mem_start - eq->priv_mem_size);
-			node_end_pfn[node] = MB_TO_PAGES(
-				eq->hi_shrd_mem_start + eq->hi_shrd_mem_size);
-
-			e820_register_active_regions(node, node_start_pfn[node],
-							node_end_pfn[node]);
-			memory_present(node,
-				node_start_pfn[node], node_end_pfn[node]);
-			node_remap_size[node] = node_memmap_size_bytes(node,
-							node_start_pfn[node],
-							node_end_pfn[node]);
-		}
-	}
-}
-
-
-void __cpuinit numaq_tsc_disable(void)
-{
-	if (!found_numaq)
-		return;
-
-	if (num_online_nodes() > 1) {
-		printk(KERN_DEBUG "NUMAQ: disabling TSC\n");
-		setup_clear_cpu_cap(X86_FEATURE_TSC);
-	}
-}
-
-static int __init numaq_pre_time_init(void)
-{
-	numaq_tsc_disable();
-	return 0;
-}
-
-int found_numaq;
-/*
- * Have to match translation table entries to main table entries by counter
- * hence the mpc_record variable .... can't see a less disgusting way of
- * doing this ....
- */
-struct mpc_config_translation {
-	unsigned char mpc_type;
-	unsigned char trans_len;
-	unsigned char trans_type;
-	unsigned char trans_quad;
-	unsigned char trans_global;
-	unsigned char trans_local;
-	unsigned short trans_reserved;
-};
-
-/* x86_quirks member */
-static int mpc_record;
-static struct mpc_config_translation *translation_table[MAX_MPC_ENTRY]
-    __cpuinitdata;
-
-static inline int generate_logical_apicid(int quad, int phys_apicid)
-{
-	return (quad << 4) + (phys_apicid ? phys_apicid << 1 : 1);
-}
-
-/* x86_quirks member */
-static int mpc_apic_id(struct mpc_cpu *m)
-{
-	int quad = translation_table[mpc_record]->trans_quad;
-	int logical_apicid = generate_logical_apicid(quad, m->apicid);
-
-	printk(KERN_DEBUG "Processor #%d %u:%u APIC version %d (quad %d, apic %d)\n",
-	       m->apicid, (m->cpufeature & CPU_FAMILY_MASK) >> 8,
-	       (m->cpufeature & CPU_MODEL_MASK) >> 4,
-	       m->apicver, quad, logical_apicid);
-	return logical_apicid;
-}
-
-int mp_bus_id_to_node[MAX_MP_BUSSES];
-
-int mp_bus_id_to_local[MAX_MP_BUSSES];
-
-/* x86_quirks member */
-static void mpc_oem_bus_info(struct mpc_bus *m, char *name)
-{
-	int quad = translation_table[mpc_record]->trans_quad;
-	int local = translation_table[mpc_record]->trans_local;
-
-	mp_bus_id_to_node[m->busid] = quad;
-	mp_bus_id_to_local[m->busid] = local;
-	printk(KERN_INFO "Bus #%d is %s (node %d)\n",
-	       m->busid, name, quad);
-}
-
-int quad_local_to_mp_bus_id [NR_CPUS/4][4];
-
-/* x86_quirks member */
-static void mpc_oem_pci_bus(struct mpc_bus *m)
-{
-	int quad = translation_table[mpc_record]->trans_quad;
-	int local = translation_table[mpc_record]->trans_local;
-
-	quad_local_to_mp_bus_id[quad][local] = m->busid;
-}
-
-static void __init MP_translation_info(struct mpc_config_translation *m)
-{
-	printk(KERN_INFO
-	       "Translation: record %d, type %d, quad %d, global %d, local %d\n",
-	       mpc_record, m->trans_type, m->trans_quad, m->trans_global,
-	       m->trans_local);
-
-	if (mpc_record >= MAX_MPC_ENTRY)
-		printk(KERN_ERR "MAX_MPC_ENTRY exceeded!\n");
-	else
-		translation_table[mpc_record] = m;	/* stash this for later */
-	if (m->trans_quad < MAX_NUMNODES && !node_online(m->trans_quad))
-		node_set_online(m->trans_quad);
-}
-
-static int __init mpf_checksum(unsigned char *mp, int len)
-{
-	int sum = 0;
-
-	while (len--)
-		sum += *mp++;
-
-	return sum & 0xFF;
-}
-
-/*
- * Read/parse the MPC oem tables
- */
-
-static void __init smp_read_mpc_oem(struct mpc_oemtable *oemtable,
-				    unsigned short oemsize)
-{
-	int count = sizeof(*oemtable);	/* the header size */
-	unsigned char *oemptr = ((unsigned char *)oemtable) + count;
-
-	mpc_record = 0;
-	printk(KERN_INFO "Found an OEM MPC table at %8p - parsing it ... \n",
-	       oemtable);
-	if (memcmp(oemtable->signature, MPC_OEM_SIGNATURE, 4)) {
-		printk(KERN_WARNING
-		       "SMP mpc oemtable: bad signature [%c%c%c%c]!\n",
-		       oemtable->signature[0], oemtable->signature[1],
-		       oemtable->signature[2], oemtable->signature[3]);
-		return;
-	}
-	if (mpf_checksum((unsigned char *)oemtable, oemtable->length)) {
-		printk(KERN_WARNING "SMP oem mptable: checksum error!\n");
-		return;
-	}
-	while (count < oemtable->length) {
-		switch (*oemptr) {
-		case MP_TRANSLATION:
-			{
-				struct mpc_config_translation *m =
-				    (struct mpc_config_translation *)oemptr;
-				MP_translation_info(m);
-				oemptr += sizeof(*m);
-				count += sizeof(*m);
-				++mpc_record;
-				break;
-			}
-		default:
-			{
-				printk(KERN_WARNING
-				       "Unrecognised OEM table entry type! - %d\n",
-				       (int)*oemptr);
-				return;
-			}
-		}
-	}
-}
-
-static int __init numaq_setup_ioapic_ids(void)
-{
-	/* so can skip it */
-	return 1;
-}
-
-static int __init numaq_update_genapic(void)
-{
-	genapic->wakeup_cpu = wakeup_secondary_cpu_via_nmi;
-
-	return 0;
-}
-
-static struct x86_quirks numaq_x86_quirks __initdata = {
-	.arch_pre_time_init	= numaq_pre_time_init,
-	.arch_time_init		= NULL,
-	.arch_pre_intr_init	= NULL,
-	.arch_memory_setup	= NULL,
-	.arch_intr_init		= NULL,
-	.arch_trap_init		= NULL,
-	.mach_get_smp_config	= NULL,
-	.mach_find_smp_config	= NULL,
-	.mpc_record		= &mpc_record,
-	.mpc_apic_id		= mpc_apic_id,
-	.mpc_oem_bus_info	= mpc_oem_bus_info,
-	.mpc_oem_pci_bus	= mpc_oem_pci_bus,
-	.smp_read_mpc_oem	= smp_read_mpc_oem,
-	.setup_ioapic_ids	= numaq_setup_ioapic_ids,
-	.update_genapic		= numaq_update_genapic,
-};
-
-void numaq_mps_oem_check(struct mpc_table *mpc, char *oem, char *productid)
-{
-	if (strncmp(oem, "IBM NUMA", 8))
-		printk("Warning!  Not a NUMA-Q system!\n");
-	else
-		found_numaq = 1;
-}
-
-static __init void early_check_numaq(void)
-{
-	/*
-	 * Find possible boot-time SMP configuration:
-	 */
-	early_find_smp_config();
-	/*
-	 * get boot-time SMP configuration:
-	 */
-	if (smp_found_config)
-		early_get_smp_config();
-
-	if (found_numaq)
-		x86_quirks = &numaq_x86_quirks;
-}
-
-int __init get_memcfg_numaq(void)
-{
-	early_check_numaq();
-	if (!found_numaq)
-		return 0;
-	smp_dump_qct();
-	return 1;
-}
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 95777b0faa73..3a7c5a44082e 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -26,13 +26,3 @@ struct pv_lock_ops pv_lock_ops = {
 };
 EXPORT_SYMBOL(pv_lock_ops);
 
-void __init paravirt_use_bytelocks(void)
-{
-#ifdef CONFIG_SMP
-	pv_lock_ops.spin_is_locked = __byte_spin_is_locked;
-	pv_lock_ops.spin_is_contended = __byte_spin_is_contended;
-	pv_lock_ops.spin_lock = __byte_spin_lock;
-	pv_lock_ops.spin_trylock = __byte_spin_trylock;
-	pv_lock_ops.spin_unlock = __byte_spin_unlock;
-#endif
-}
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c6520a4e85d4..63dd358d8ee1 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -28,7 +28,6 @@
 #include <asm/paravirt.h>
 #include <asm/desc.h>
 #include <asm/setup.h>
-#include <asm/arch_hooks.h>
 #include <asm/pgtable.h>
 #include <asm/time.h>
 #include <asm/pgalloc.h>
@@ -44,6 +43,17 @@ void _paravirt_nop(void)
 {
 }
 
+/* identity function, which can be inlined */
+u32 _paravirt_ident_32(u32 x)
+{
+	return x;
+}
+
+u64 _paravirt_ident_64(u64 x)
+{
+	return x;
+}
+
 static void __init default_banner(void)
 {
 	printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
@@ -138,9 +148,16 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
 	if (opfunc == NULL)
 		/* If there's no function, patch it with a ud2a (BUG) */
 		ret = paravirt_patch_insns(insnbuf, len, ud2a, ud2a+sizeof(ud2a));
-	else if (opfunc == paravirt_nop)
+	else if (opfunc == _paravirt_nop)
 		/* If the operation is a nop, then nop the callsite */
 		ret = paravirt_patch_nop();
+
+	/* identity functions just return their single argument */
+	else if (opfunc == _paravirt_ident_32)
+		ret = paravirt_patch_ident_32(insnbuf, len);
+	else if (opfunc == _paravirt_ident_64)
+		ret = paravirt_patch_ident_64(insnbuf, len);
+
 	else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
 		 type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) ||
 		 type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) ||
@@ -318,10 +335,10 @@ struct pv_time_ops pv_time_ops = {
 
 struct pv_irq_ops pv_irq_ops = {
 	.init_IRQ = native_init_IRQ,
-	.save_fl = native_save_fl,
-	.restore_fl = native_restore_fl,
-	.irq_disable = native_irq_disable,
-	.irq_enable = native_irq_enable,
+	.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
+	.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
+	.irq_disable = __PV_IS_CALLEE_SAVE(native_irq_disable),
+	.irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
 	.safe_halt = native_safe_halt,
 	.halt = native_halt,
 #ifdef CONFIG_X86_64
@@ -399,6 +416,14 @@ struct pv_apic_ops pv_apic_ops = {
 #endif
 };
 
+#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
+/* 32-bit pagetable entries */
+#define PTE_IDENT	__PV_IS_CALLEE_SAVE(_paravirt_ident_32)
+#else
+/* 64-bit pagetable entries */
+#define PTE_IDENT	__PV_IS_CALLEE_SAVE(_paravirt_ident_64)
+#endif
+
 struct pv_mmu_ops pv_mmu_ops = {
 #ifndef CONFIG_X86_64
 	.pagetable_setup_start = native_pagetable_setup_start,
@@ -450,22 +475,23 @@ struct pv_mmu_ops pv_mmu_ops = {
 	.pmd_clear = native_pmd_clear,
 #endif
 	.set_pud = native_set_pud,
-	.pmd_val = native_pmd_val,
-	.make_pmd = native_make_pmd,
+
+	.pmd_val = PTE_IDENT,
+	.make_pmd = PTE_IDENT,
 
 #if PAGETABLE_LEVELS == 4
-	.pud_val = native_pud_val,
-	.make_pud = native_make_pud,
+	.pud_val = PTE_IDENT,
+	.make_pud = PTE_IDENT,
+
 	.set_pgd = native_set_pgd,
 #endif
 #endif /* PAGETABLE_LEVELS >= 3 */
 
-	.pte_val = native_pte_val,
-	.pte_flags = native_pte_flags,
-	.pgd_val = native_pgd_val,
+	.pte_val = PTE_IDENT,
+	.pgd_val = PTE_IDENT,
 
-	.make_pte = native_make_pte,
-	.make_pgd = native_make_pgd,
+	.make_pte = PTE_IDENT,
+	.make_pgd = PTE_IDENT,
 
 	.dup_mmap = paravirt_nop,
 	.exit_mmap = paravirt_nop,
diff --git a/arch/x86/kernel/paravirt_patch_32.c b/arch/x86/kernel/paravirt_patch_32.c
index 9fe644f4861d..d9f32e6d6ab6 100644
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -12,6 +12,18 @@ DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
 DEF_NATIVE(pv_cpu_ops, clts, "clts");
 DEF_NATIVE(pv_cpu_ops, read_tsc, "rdtsc");
 
+unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
+{
+	/* arg in %eax, return in %eax */
+	return 0;
+}
+
+unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
+{
+	/* arg in %edx:%eax, return in %edx:%eax */
+	return 0;
+}
+
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
 		      unsigned long addr, unsigned len)
 {
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index 061d01df9ae6..3f08f34f93eb 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -19,6 +19,21 @@ DEF_NATIVE(pv_cpu_ops, usergs_sysret64, "swapgs; sysretq");
 DEF_NATIVE(pv_cpu_ops, usergs_sysret32, "swapgs; sysretl");
 DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs");
 
+DEF_NATIVE(, mov32, "mov %edi, %eax");
+DEF_NATIVE(, mov64, "mov %rdi, %rax");
+
+unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
+{
+	return paravirt_patch_insns(insnbuf, len,
+				    start__mov32, end__mov32);
+}
+
+unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
+{
+	return paravirt_patch_insns(insnbuf, len,
+				    start__mov64, end__mov64);
+}
+
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
 		      unsigned long addr, unsigned len)
 {
diff --git a/arch/x86/kernel/probe_roms_32.c b/arch/x86/kernel/probe_roms_32.c
index 675a48c404a5..071e7fea42e5 100644
--- a/arch/x86/kernel/probe_roms_32.c
+++ b/arch/x86/kernel/probe_roms_32.c
@@ -18,7 +18,7 @@
 #include <asm/setup.h>
 #include <asm/sections.h>
 #include <asm/io.h>
-#include <setup_arch.h>
+#include <asm/setup_arch.h>
 
 static struct resource system_rom_resource = {
 	.name	= "System ROM",
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6d12f7e37f8c..6afa5232dbb7 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -1,8 +1,8 @@
 #include <linux/errno.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
-#include <asm/idle.h>
 #include <linux/smp.h>
+#include <linux/prctl.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/module.h>
@@ -11,6 +11,9 @@
 #include <linux/ftrace.h>
 #include <asm/system.h>
 #include <asm/apic.h>
+#include <asm/idle.h>
+#include <asm/uaccess.h>
+#include <asm/i387.h>
 
 unsigned long idle_halt;
 EXPORT_SYMBOL(idle_halt);
@@ -56,6 +59,192 @@ void arch_task_cache_init(void)
 }
 
 /*
+ * Free current thread data structures etc..
+ */
+void exit_thread(void)
+{
+	struct task_struct *me = current;
+	struct thread_struct *t = &me->thread;
+
+	if (me->thread.io_bitmap_ptr) {
+		struct tss_struct *tss = &per_cpu(init_tss, get_cpu());
+
+		kfree(t->io_bitmap_ptr);
+		t->io_bitmap_ptr = NULL;
+		clear_thread_flag(TIF_IO_BITMAP);
+		/*
+		 * Careful, clear this in the TSS too:
+		 */
+		memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
+		t->io_bitmap_max = 0;
+		put_cpu();
+	}
+
+	ds_exit_thread(current);
+}
+
+void flush_thread(void)
+{
+	struct task_struct *tsk = current;
+
+#ifdef CONFIG_X86_64
+	if (test_tsk_thread_flag(tsk, TIF_ABI_PENDING)) {
+		clear_tsk_thread_flag(tsk, TIF_ABI_PENDING);
+		if (test_tsk_thread_flag(tsk, TIF_IA32)) {
+			clear_tsk_thread_flag(tsk, TIF_IA32);
+		} else {
+			set_tsk_thread_flag(tsk, TIF_IA32);
+			current_thread_info()->status |= TS_COMPAT;
+		}
+	}
+#endif
+
+	clear_tsk_thread_flag(tsk, TIF_DEBUG);
+
+	tsk->thread.debugreg0 = 0;
+	tsk->thread.debugreg1 = 0;
+	tsk->thread.debugreg2 = 0;
+	tsk->thread.debugreg3 = 0;
+	tsk->thread.debugreg6 = 0;
+	tsk->thread.debugreg7 = 0;
+	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+	/*
+	 * Forget coprocessor state..
+	 */
+	tsk->fpu_counter = 0;
+	clear_fpu(tsk);
+	clear_used_math();
+}
+
+static void hard_disable_TSC(void)
+{
+	write_cr4(read_cr4() | X86_CR4_TSD);
+}
+
+void disable_TSC(void)
+{
+	preempt_disable();
+	if (!test_and_set_thread_flag(TIF_NOTSC))
+		/*
+		 * Must flip the CPU state synchronously with
+		 * TIF_NOTSC in the current running context.
+		 */
+		hard_disable_TSC();
+	preempt_enable();
+}
+
+static void hard_enable_TSC(void)
+{
+	write_cr4(read_cr4() & ~X86_CR4_TSD);
+}
+
+static void enable_TSC(void)
+{
+	preempt_disable();
+	if (test_and_clear_thread_flag(TIF_NOTSC))
+		/*
+		 * Must flip the CPU state synchronously with
+		 * TIF_NOTSC in the current running context.
+		 */
+		hard_enable_TSC();
+	preempt_enable();
+}
+
+int get_tsc_mode(unsigned long adr)
+{
+	unsigned int val;
+
+	if (test_thread_flag(TIF_NOTSC))
+		val = PR_TSC_SIGSEGV;
+	else
+		val = PR_TSC_ENABLE;
+
+	return put_user(val, (unsigned int __user *)adr);
+}
+
+int set_tsc_mode(unsigned int val)
+{
+	if (val == PR_TSC_SIGSEGV)
+		disable_TSC();
+	else if (val == PR_TSC_ENABLE)
+		enable_TSC();
+	else
+		return -EINVAL;
+
+	return 0;
+}
+
+void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
+		      struct tss_struct *tss)
+{
+	struct thread_struct *prev, *next;
+
+	prev = &prev_p->thread;
+	next = &next_p->thread;
+
+	if (test_tsk_thread_flag(next_p, TIF_DS_AREA_MSR) ||
+	    test_tsk_thread_flag(prev_p, TIF_DS_AREA_MSR))
+		ds_switch_to(prev_p, next_p);
+	else if (next->debugctlmsr != prev->debugctlmsr)
+		update_debugctlmsr(next->debugctlmsr);
+
+	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
+		set_debugreg(next->debugreg0, 0);
+		set_debugreg(next->debugreg1, 1);
+		set_debugreg(next->debugreg2, 2);
+		set_debugreg(next->debugreg3, 3);
+		/* no 4 and 5 */
+		set_debugreg(next->debugreg6, 6);
+		set_debugreg(next->debugreg7, 7);
+	}
+
+	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
+	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
+		/* prev and next are different */
+		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
+			hard_disable_TSC();
+		else
+			hard_enable_TSC();
+	}
+
+	if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
+		/*
+		 * Copy the relevant range of the IO bitmap.
+		 * Normally this is 128 bytes or less:
+		 */
+		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
+		       max(prev->io_bitmap_max, next->io_bitmap_max));
+	} else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
+		/*
+		 * Clear any possible leftover bits:
+		 */
+		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
+	}
+}
+
+int sys_fork(struct pt_regs *regs)
+{
+	return do_fork(SIGCHLD, regs->sp, regs, 0, NULL, NULL);
+}
+
+/*
+ * This is trivial, and on the face of it looks like it
+ * could equally well be done in user mode.
+ *
+ * Not so, for quite unobvious reasons - register pressure.
+ * In user mode vfork() cannot have a stack frame, and if
+ * done by calling the "clone()" system call directly, you
+ * do not have enough call-clobbered registers to hold all
+ * the information you need.
+ */
+int sys_vfork(struct pt_regs *regs)
+{
+	return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->sp, regs, 0,
+		       NULL, NULL);
+}
+
+
+/*
  * Idle related variables and functions
  */
 unsigned long boot_option_idle_override = 0;
@@ -350,7 +539,7 @@ static void c1e_idle(void)
 
 void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_SMP
+#ifdef CONFIG_SMP
 	if (pm_idle == poll_idle && smp_num_siblings > 1) {
 		printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
 			" performance may degrade.\n");
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index bd4da2af08ae..14014d766cad 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -11,6 +11,7 @@
 
 #include <stdarg.h>
 
+#include <linux/stackprotector.h>
 #include <linux/cpu.h>
 #include <linux/errno.h>
 #include <linux/sched.h>
@@ -66,9 +67,6 @@ asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 
-DEFINE_PER_CPU(int, cpu_number);
-EXPORT_PER_CPU_SYMBOL(cpu_number);
-
 /*
  * Return saved PC of a blocked thread.
  */
@@ -94,6 +92,15 @@ void cpu_idle(void)
 {
 	int cpu = smp_processor_id();
 
+	/*
+	 * If we're the non-boot CPU, nothing set the stack canary up
+	 * for us.  CPU0 already has it initialized but no harm in
+	 * doing it again.  This is a good place for updating it, as
+	 * we wont ever return from this function (so the invalid
+	 * canaries already on the stack wont ever trigger).
+	 */
+	boot_init_stack_canary();
+
 	current_thread_info()->status |= TS_POLLING;
 
 	/* endless idle loop with no priority at all */
@@ -108,7 +115,6 @@ void cpu_idle(void)
 				play_dead();
 
 			local_irq_disable();
-			__get_cpu_var(irq_stat).idle_timestamp = jiffies;
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
 			pm_idle();
@@ -132,7 +138,7 @@ void __show_regs(struct pt_regs *regs, int all)
 	if (user_mode_vm(regs)) {
 		sp = regs->sp;
 		ss = regs->ss & 0xffff;
-		savesegment(gs, gs);
+		gs = get_user_gs(regs);
 	} else {
 		sp = (unsigned long) (&regs->sp);
 		savesegment(ss, ss);
@@ -213,6 +219,7 @@ int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
 	regs.ds = __USER_DS;
 	regs.es = __USER_DS;
 	regs.fs = __KERNEL_PERCPU;
+	regs.gs = __KERNEL_STACK_CANARY;
 	regs.orig_ax = -1;
 	regs.ip = (unsigned long) kernel_thread_helper;
 	regs.cs = __KERNEL_CS | get_kernel_rpl();
@@ -223,55 +230,6 @@ int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
 }
 EXPORT_SYMBOL(kernel_thread);
 
-/*
- * Free current thread data structures etc..
- */
-void exit_thread(void)
-{
-	/* The process may have allocated an io port bitmap... nuke it. */
-	if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
-		struct task_struct *tsk = current;
-		struct thread_struct *t = &tsk->thread;
-		int cpu = get_cpu();
-		struct tss_struct *tss = &per_cpu(init_tss, cpu);
-
-		kfree(t->io_bitmap_ptr);
-		t->io_bitmap_ptr = NULL;
-		clear_thread_flag(TIF_IO_BITMAP);
-		/*
-		 * Careful, clear this in the TSS too:
-		 */
-		memset(tss->io_bitmap, 0xff, tss->io_bitmap_max);
-		t->io_bitmap_max = 0;
-		tss->io_bitmap_owner = NULL;
-		tss->io_bitmap_max = 0;
-		tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
-		put_cpu();
-	}
-
-	ds_exit_thread(current);
-}
-
-void flush_thread(void)
-{
-	struct task_struct *tsk = current;
-
-	tsk->thread.debugreg0 = 0;
-	tsk->thread.debugreg1 = 0;
-	tsk->thread.debugreg2 = 0;
-	tsk->thread.debugreg3 = 0;
-	tsk->thread.debugreg6 = 0;
-	tsk->thread.debugreg7 = 0;
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
-	/*
-	 * Forget coprocessor state..
-	 */
-	tsk->fpu_counter = 0;
-	clear_fpu(tsk);
-	clear_used_math();
-}
-
 void release_thread(struct task_struct *dead_task)
 {
 	BUG_ON(dead_task->mm);
@@ -305,7 +263,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long sp,
 
 	p->thread.ip = (unsigned long) ret_from_fork;
 
-	savesegment(gs, p->thread.gs);
+	task_user_gs(p) = get_user_gs(regs);
 
 	tsk = current;
 	if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
@@ -343,7 +301,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long sp,
 void
 start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
 {
-	__asm__("movl %0, %%gs" : : "r"(0));
+	set_user_gs(regs, 0);
 	regs->fs		= 0;
 	set_fs(USER_DS);
 	regs->ds		= __USER_DS;
@@ -359,127 +317,6 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
 }
 EXPORT_SYMBOL_GPL(start_thread);
 
-static void hard_disable_TSC(void)
-{
-	write_cr4(read_cr4() | X86_CR4_TSD);
-}
-
-void disable_TSC(void)
-{
-	preempt_disable();
-	if (!test_and_set_thread_flag(TIF_NOTSC))
-		/*
-		 * Must flip the CPU state synchronously with
-		 * TIF_NOTSC in the current running context.
-		 */
-		hard_disable_TSC();
-	preempt_enable();
-}
-
-static void hard_enable_TSC(void)
-{
-	write_cr4(read_cr4() & ~X86_CR4_TSD);
-}
-
-static void enable_TSC(void)
-{
-	preempt_disable();
-	if (test_and_clear_thread_flag(TIF_NOTSC))
-		/*
-		 * Must flip the CPU state synchronously with
-		 * TIF_NOTSC in the current running context.
-		 */
-		hard_enable_TSC();
-	preempt_enable();
-}
-
-int get_tsc_mode(unsigned long adr)
-{
-	unsigned int val;
-
-	if (test_thread_flag(TIF_NOTSC))
-		val = PR_TSC_SIGSEGV;
-	else
-		val = PR_TSC_ENABLE;
-
-	return put_user(val, (unsigned int __user *)adr);
-}
-
-int set_tsc_mode(unsigned int val)
-{
-	if (val == PR_TSC_SIGSEGV)
-		disable_TSC();
-	else if (val == PR_TSC_ENABLE)
-		enable_TSC();
-	else
-		return -EINVAL;
-
-	return 0;
-}
-
-static noinline void
-__switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
-		 struct tss_struct *tss)
-{
-	struct thread_struct *prev, *next;
-
-	prev = &prev_p->thread;
-	next = &next_p->thread;
-
-	if (test_tsk_thread_flag(next_p, TIF_DS_AREA_MSR) ||
-	    test_tsk_thread_flag(prev_p, TIF_DS_AREA_MSR))
-		ds_switch_to(prev_p, next_p);
-	else if (next->debugctlmsr != prev->debugctlmsr)
-		update_debugctlmsr(next->debugctlmsr);
-
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		set_debugreg(next->debugreg0, 0);
-		set_debugreg(next->debugreg1, 1);
-		set_debugreg(next->debugreg2, 2);
-		set_debugreg(next->debugreg3, 3);
-		/* no 4 and 5 */
-		set_debugreg(next->debugreg6, 6);
-		set_debugreg(next->debugreg7, 7);
-	}
-
-	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
-	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
-		/* prev and next are different */
-		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
-			hard_disable_TSC();
-		else
-			hard_enable_TSC();
-	}
-
-	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
-		/*
-		 * Disable the bitmap via an invalid offset. We still cache
-		 * the previous bitmap owner and the IO bitmap contents:
-		 */
-		tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
-		return;
-	}
-
-	if (likely(next == tss->io_bitmap_owner)) {
-		/*
-		 * Previous owner of the bitmap (hence the bitmap content)
-		 * matches the next task, we dont have to do anything but
-		 * to set a valid offset in the TSS:
-		 */
-		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
-		return;
-	}
-	/*
-	 * Lazy TSS's I/O bitmap copy. We set an invalid offset here
-	 * and we let the task to get a GPF in case an I/O instruction
-	 * is performed.  The handler of the GPF will verify that the
-	 * faulting task has a valid I/O bitmap and, it true, does the
-	 * real copy and restart the instruction.  This will save us
-	 * redundant copies when the currently switched task does not
-	 * perform any I/O during its timeslice.
-	 */
-	tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET_LAZY;
-}
 
 /*
  *	switch_to(x,yn) should switch tasks from x to y.
@@ -540,7 +377,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * used %fs or %gs (it does not today), or if the kernel is
 	 * running inside of a hypervisor layer.
 	 */
-	savesegment(gs, prev->gs);
+	lazy_save_gs(prev->gs);
 
 	/*
 	 * Load the per-thread Thread-Local Storage descriptor.
@@ -586,64 +423,44 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * Restore %gs if needed (which is common)
 	 */
 	if (prev->gs | next->gs)
-		loadsegment(gs, next->gs);
+		lazy_load_gs(next->gs);
 
-	x86_write_percpu(current_task, next_p);
+	percpu_write(current_task, next_p);
 
 	return prev_p;
 }
 
-asmlinkage int sys_fork(struct pt_regs regs)
-{
-	return do_fork(SIGCHLD, regs.sp, &regs, 0, NULL, NULL);
-}
-
-asmlinkage int sys_clone(struct pt_regs regs)
+int sys_clone(struct pt_regs *regs)
 {
 	unsigned long clone_flags;
 	unsigned long newsp;
 	int __user *parent_tidptr, *child_tidptr;
 
-	clone_flags = regs.bx;
-	newsp = regs.cx;
-	parent_tidptr = (int __user *)regs.dx;
-	child_tidptr = (int __user *)regs.di;
+	clone_flags = regs->bx;
+	newsp = regs->cx;
+	parent_tidptr = (int __user *)regs->dx;
+	child_tidptr = (int __user *)regs->di;
 	if (!newsp)
-		newsp = regs.sp;
-	return do_fork(clone_flags, newsp, &regs, 0, parent_tidptr, child_tidptr);
-}
-
-/*
- * This is trivial, and on the face of it looks like it
- * could equally well be done in user mode.
- *
- * Not so, for quite unobvious reasons - register pressure.
- * In user mode vfork() cannot have a stack frame, and if
- * done by calling the "clone()" system call directly, you
- * do not have enough call-clobbered registers to hold all
- * the information you need.
- */
-asmlinkage int sys_vfork(struct pt_regs regs)
-{
-	return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.sp, &regs, 0, NULL, NULL);
+		newsp = regs->sp;
+	return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr);
 }
 
 /*
  * sys_execve() executes a new program.
  */
-asmlinkage int sys_execve(struct pt_regs regs)
+int sys_execve(struct pt_regs *regs)
 {
 	int error;
 	char *filename;
 
-	filename = getname((char __user *) regs.bx);
+	filename = getname((char __user *) regs->bx);
 	error = PTR_ERR(filename);
 	if (IS_ERR(filename))
 		goto out;
 	error = do_execve(filename,
-			(char __user * __user *) regs.cx,
-			(char __user * __user *) regs.dx,
-			&regs);
+			(char __user * __user *) regs->cx,
+			(char __user * __user *) regs->dx,
+			regs);
 	if (error == 0) {
 		/* Make sure we don't return using sysenter.. */
 		set_thread_flag(TIF_IRET);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 85b4cb5c1980..abb7e6a7f0c6 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -16,6 +16,7 @@
 
 #include <stdarg.h>
 
+#include <linux/stackprotector.h>
 #include <linux/cpu.h>
 #include <linux/errno.h>
 #include <linux/sched.h>
@@ -47,7 +48,6 @@
 #include <asm/processor.h>
 #include <asm/i387.h>
 #include <asm/mmu_context.h>
-#include <asm/pda.h>
 #include <asm/prctl.h>
 #include <asm/desc.h>
 #include <asm/proto.h>
@@ -58,6 +58,12 @@
 
 asmlinkage extern void ret_from_fork(void);
 
+DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
+EXPORT_PER_CPU_SYMBOL(current_task);
+
+DEFINE_PER_CPU(unsigned long, old_rsp);
+static DEFINE_PER_CPU(unsigned char, is_idle);
+
 unsigned long kernel_thread_flags = CLONE_VM | CLONE_UNTRACED;
 
 static ATOMIC_NOTIFIER_HEAD(idle_notifier);
@@ -76,13 +82,13 @@ EXPORT_SYMBOL_GPL(idle_notifier_unregister);
 
 void enter_idle(void)
 {
-	write_pda(isidle, 1);
+	percpu_write(is_idle, 1);
 	atomic_notifier_call_chain(&idle_notifier, IDLE_START, NULL);
 }
 
 static void __exit_idle(void)
 {
-	if (test_and_clear_bit_pda(0, isidle) == 0)
+	if (x86_test_and_clear_bit_percpu(0, is_idle) == 0)
 		return;
 	atomic_notifier_call_chain(&idle_notifier, IDLE_END, NULL);
 }
@@ -112,6 +118,16 @@ static inline void play_dead(void)
 void cpu_idle(void)
 {
 	current_thread_info()->status |= TS_POLLING;
+
+	/*
+	 * If we're the non-boot CPU, nothing set the stack canary up
+	 * for us.  CPU0 already has it initialized but no harm in
+	 * doing it again.  This is a good place for updating it, as
+	 * we wont ever return from this function (so the invalid
+	 * canaries already on the stack wont ever trigger).
+	 */
+	boot_init_stack_canary();
+
 	/* endless idle loop with no priority at all */
 	while (1) {
 		tick_nohz_stop_sched_tick(1);
@@ -221,61 +237,6 @@ void show_regs(struct pt_regs *regs)
 	show_trace(NULL, regs, (void *)(regs + 1), regs->bp);
 }
 
-/*
- * Free current thread data structures etc..
- */
-void exit_thread(void)
-{
-	struct task_struct *me = current;
-	struct thread_struct *t = &me->thread;
-
-	if (me->thread.io_bitmap_ptr) {
-		struct tss_struct *tss = &per_cpu(init_tss, get_cpu());
-
-		kfree(t->io_bitmap_ptr);
-		t->io_bitmap_ptr = NULL;
-		clear_thread_flag(TIF_IO_BITMAP);
-		/*
-		 * Careful, clear this in the TSS too:
-		 */
-		memset(tss->io_bitmap, 0xff, t->io_bitmap_max);
-		t->io_bitmap_max = 0;
-		put_cpu();
-	}
-
-	ds_exit_thread(current);
-}
-
-void flush_thread(void)
-{
-	struct task_struct *tsk = current;
-
-	if (test_tsk_thread_flag(tsk, TIF_ABI_PENDING)) {
-		clear_tsk_thread_flag(tsk, TIF_ABI_PENDING);
-		if (test_tsk_thread_flag(tsk, TIF_IA32)) {
-			clear_tsk_thread_flag(tsk, TIF_IA32);
-		} else {
-			set_tsk_thread_flag(tsk, TIF_IA32);
-			current_thread_info()->status |= TS_COMPAT;
-		}
-	}
-	clear_tsk_thread_flag(tsk, TIF_DEBUG);
-
-	tsk->thread.debugreg0 = 0;
-	tsk->thread.debugreg1 = 0;
-	tsk->thread.debugreg2 = 0;
-	tsk->thread.debugreg3 = 0;
-	tsk->thread.debugreg6 = 0;
-	tsk->thread.debugreg7 = 0;
-	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
-	/*
-	 * Forget coprocessor state..
-	 */
-	tsk->fpu_counter = 0;
-	clear_fpu(tsk);
-	clear_used_math();
-}
-
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
@@ -397,7 +358,7 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
 	load_gs_index(0);
 	regs->ip		= new_ip;
 	regs->sp		= new_sp;
-	write_pda(oldrsp, new_sp);
+	percpu_write(old_rsp, new_sp);
 	regs->cs		= __USER_CS;
 	regs->ss		= __USER_DS;
 	regs->flags		= 0x200;
@@ -409,118 +370,6 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
 }
 EXPORT_SYMBOL_GPL(start_thread);
 
-static void hard_disable_TSC(void)
-{
-	write_cr4(read_cr4() | X86_CR4_TSD);
-}
-
-void disable_TSC(void)
-{
-	preempt_disable();
-	if (!test_and_set_thread_flag(TIF_NOTSC))
-		/*
-		 * Must flip the CPU state synchronously with
-		 * TIF_NOTSC in the current running context.
-		 */
-		hard_disable_TSC();
-	preempt_enable();
-}
-
-static void hard_enable_TSC(void)
-{
-	write_cr4(read_cr4() & ~X86_CR4_TSD);
-}
-
-static void enable_TSC(void)
-{
-	preempt_disable();
-	if (test_and_clear_thread_flag(TIF_NOTSC))
-		/*
-		 * Must flip the CPU state synchronously with
-		 * TIF_NOTSC in the current running context.
-		 */
-		hard_enable_TSC();
-	preempt_enable();
-}
-
-int get_tsc_mode(unsigned long adr)
-{
-	unsigned int val;
-
-	if (test_thread_flag(TIF_NOTSC))
-		val = PR_TSC_SIGSEGV;
-	else
-		val = PR_TSC_ENABLE;
-
-	return put_user(val, (unsigned int __user *)adr);
-}
-
-int set_tsc_mode(unsigned int val)
-{
-	if (val == PR_TSC_SIGSEGV)
-		disable_TSC();
-	else if (val == PR_TSC_ENABLE)
-		enable_TSC();
-	else
-		return -EINVAL;
-
-	return 0;
-}
-
-/*
- * This special macro can be used to load a debugging register
- */
-#define loaddebug(thread, r) set_debugreg(thread->debugreg ## r, r)
-
-static inline void __switch_to_xtra(struct task_struct *prev_p,
-				    struct task_struct *next_p,
-				    struct tss_struct *tss)
-{
-	struct thread_struct *prev, *next;
-
-	prev = &prev_p->thread,
-	next = &next_p->thread;
-
-	if (test_tsk_thread_flag(next_p, TIF_DS_AREA_MSR) ||
-	    test_tsk_thread_flag(prev_p, TIF_DS_AREA_MSR))
-		ds_switch_to(prev_p, next_p);
-	else if (next->debugctlmsr != prev->debugctlmsr)
-		update_debugctlmsr(next->debugctlmsr);
-
-	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
-		loaddebug(next, 0);
-		loaddebug(next, 1);
-		loaddebug(next, 2);
-		loaddebug(next, 3);
-		/* no 4 and 5 */
-		loaddebug(next, 6);
-		loaddebug(next, 7);
-	}
-
-	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
-	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
-		/* prev and next are different */
-		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
-			hard_disable_TSC();
-		else
-			hard_enable_TSC();
-	}
-
-	if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
-		/*
-		 * Copy the relevant range of the IO bitmap.
-		 * Normally this is 128 bytes or less:
-		 */
-		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
-		       max(prev->io_bitmap_max, next->io_bitmap_max));
-	} else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
-		/*
-		 * Clear any possible leftover bits:
-		 */
-		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
-	}
-}
-
 /*
  *	switch_to(x,y) should switch tasks from x to y.
  *
@@ -618,21 +467,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Switch the PDA and FPU contexts.
 	 */
-	prev->usersp = read_pda(oldrsp);
-	write_pda(oldrsp, next->usersp);
-	write_pda(pcurrent, next_p);
+	prev->usersp = percpu_read(old_rsp);
+	percpu_write(old_rsp, next->usersp);
+	percpu_write(current_task, next_p);
 
-	write_pda(kernelstack,
+	percpu_write(kernel_stack,
 		  (unsigned long)task_stack_page(next_p) +
-		  THREAD_SIZE - PDA_STACKOFFSET);
-#ifdef CONFIG_CC_STACKPROTECTOR
-	write_pda(stack_canary, next_p->stack_canary);
-	/*
-	 * Build time only check to make sure the stack_canary is at
-	 * offset 40 in the pda; this is a gcc ABI requirement
-	 */
-	BUILD_BUG_ON(offsetof(struct x8664_pda, stack_canary) != 40);
-#endif
+		  THREAD_SIZE - KERNEL_STACK_OFFSET);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
@@ -686,11 +527,6 @@ void set_personality_64bit(void)
 	current->personality &= ~READ_IMPLIES_EXEC;
 }
 
-asmlinkage long sys_fork(struct pt_regs *regs)
-{
-	return do_fork(SIGCHLD, regs->sp, regs, 0, NULL, NULL);
-}
-
 asmlinkage long
 sys_clone(unsigned long clone_flags, unsigned long newsp,
 	  void __user *parent_tid, void __user *child_tid, struct pt_regs *regs)
@@ -700,22 +536,6 @@ sys_clone(unsigned long clone_flags, unsigned long newsp,
 	return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid);
 }
 
-/*
- * This is trivial, and on the face of it looks like it
- * could equally well be done in user mode.
- *
- * Not so, for quite unobvious reasons - register pressure.
- * In user mode vfork() cannot have a stack frame, and if
- * done by calling the "clone()" system call directly, you
- * do not have enough call-clobbered registers to hold all
- * the information you need.
- */
-asmlinkage long sys_vfork(struct pt_regs *regs)
-{
-	return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->sp, regs, 0,
-		    NULL, NULL);
-}
-
 unsigned long get_wchan(struct task_struct *p)
 {
 	unsigned long stack;
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 06ca07f6ad86..3d9672e59c16 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -75,10 +75,7 @@ static inline bool invalid_selector(u16 value)
 static unsigned long *pt_regs_access(struct pt_regs *regs, unsigned long regno)
 {
 	BUILD_BUG_ON(offsetof(struct pt_regs, bx) != 0);
-	regno >>= 2;
-	if (regno > FS)
-		--regno;
-	return &regs->bx + regno;
+	return &regs->bx + (regno >> 2);
 }
 
 static u16 get_segment_reg(struct task_struct *task, unsigned long offset)
@@ -90,9 +87,10 @@ static u16 get_segment_reg(struct task_struct *task, unsigned long offset)
 	if (offset != offsetof(struct user_regs_struct, gs))
 		retval = *pt_regs_access(task_pt_regs(task), offset);
 	else {
-		retval = task->thread.gs;
 		if (task == current)
-			savesegment(gs, retval);
+			retval = get_user_gs(task_pt_regs(task));
+		else
+			retval = task_user_gs(task);
 	}
 	return retval;
 }
@@ -126,13 +124,10 @@ static int set_segment_reg(struct task_struct *task,
 		break;
 
 	case offsetof(struct user_regs_struct, gs):
-		task->thread.gs = value;
 		if (task == current)
-			/*
-			 * The user-mode %gs is not affected by
-			 * kernel entry, so we must update the CPU.
-			 */
-			loadsegment(gs, value);
+			set_user_gs(task_pt_regs(task), value);
+		else
+			task_user_gs(task) = value;
 	}
 
 	return 0;
@@ -273,7 +268,7 @@ static unsigned long debugreg_addr_limit(struct task_struct *task)
 	if (test_tsk_thread_flag(task, TIF_IA32))
 		return IA32_PAGE_OFFSET - 3;
 #endif
-	return TASK_SIZE64 - 7;
+	return TASK_SIZE_MAX - 7;
 }
 
 #endif	/* CONFIG_X86_32 */
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 4526b3a75ed2..2aef36d8aca2 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -14,6 +14,7 @@
 #include <asm/reboot.h>
 #include <asm/pci_x86.h>
 #include <asm/virtext.h>
+#include <asm/cpu.h>
 
 #ifdef CONFIG_X86_32
 # include <linux/dmi.h>
@@ -23,8 +24,6 @@
 # include <asm/iommu.h>
 #endif
 
-#include <mach_ipi.h>
-
 /*
  * Power off function, if any
  */
@@ -658,7 +657,7 @@ static int crash_nmi_callback(struct notifier_block *self,
 
 static void smp_send_nmi_allbutself(void)
 {
-	send_IPI_allbutself(NMI_VECTOR);
+	apic->send_IPI_allbutself(NMI_VECTOR);
 }
 
 static struct notifier_block crash_nmi_nb = {
diff --git a/arch/x86/kernel/relocate_kernel_32.S b/arch/x86/kernel/relocate_kernel_32.S
index a160f3119725..2064d0aa8d28 100644
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -7,7 +7,7 @@
  */
 
 #include <linux/linkage.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/kexec.h>
 #include <asm/processor-flags.h>
 
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index f5afe665a82b..d32cfb27a479 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -7,10 +7,10 @@
  */
 
 #include <linux/linkage.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/kexec.h>
 #include <asm/processor-flags.h>
-#include <asm/pgtable.h>
+#include <asm/pgtable_types.h>
 
 /*
  * Must be relocatable PIC code callable as a C function
@@ -29,122 +29,6 @@ relocate_kernel:
 	 * %rdx start address
 	 */
 
-	/* map the control page at its virtual address */
-
-	movq	$0x0000ff8000000000, %r10        /* mask */
-	mov	$(39 - 3), %cl                   /* bits to shift */
-	movq	PTR(VA_CONTROL_PAGE)(%rsi), %r11 /* address to map */
-
-	movq	%r11, %r9
-	andq	%r10, %r9
-	shrq	%cl, %r9
-
-	movq	PTR(VA_PGD)(%rsi), %r8
-	addq	%r8, %r9
-	movq	PTR(PA_PUD_0)(%rsi), %r8
-	orq	$PAGE_ATTR, %r8
-	movq	%r8, (%r9)
-
-	shrq	$9, %r10
-	sub	$9, %cl
-
-	movq	%r11, %r9
-	andq	%r10, %r9
-	shrq	%cl, %r9
-
-	movq	PTR(VA_PUD_0)(%rsi), %r8
-	addq	%r8, %r9
-	movq	PTR(PA_PMD_0)(%rsi), %r8
-	orq	$PAGE_ATTR, %r8
-	movq	%r8, (%r9)
-
-	shrq	$9, %r10
-	sub	$9, %cl
-
-	movq	%r11, %r9
-	andq	%r10, %r9
-	shrq	%cl, %r9
-
-	movq	PTR(VA_PMD_0)(%rsi), %r8
-	addq	%r8, %r9
-	movq	PTR(PA_PTE_0)(%rsi), %r8
-	orq	$PAGE_ATTR, %r8
-	movq	%r8, (%r9)
-
-	shrq	$9, %r10
-	sub	$9, %cl
-
-	movq	%r11, %r9
-	andq	%r10, %r9
-	shrq	%cl, %r9
-
-	movq	PTR(VA_PTE_0)(%rsi), %r8
-	addq	%r8, %r9
-	movq	PTR(PA_CONTROL_PAGE)(%rsi), %r8
-	orq	$PAGE_ATTR, %r8
-	movq	%r8, (%r9)
-
-	/* identity map the control page at its physical address */
-
-	movq	$0x0000ff8000000000, %r10        /* mask */
-	mov	$(39 - 3), %cl                   /* bits to shift */
-	movq	PTR(PA_CONTROL_PAGE)(%rsi), %r11 /* address to map */
-
-	movq	%r11, %r9
-	andq	%r10, %r9
-	shrq	%cl, %r9
-
-	movq	PTR(VA_PGD)(%rsi), %r8
-	addq	%r8, %r9
-	movq	PTR(PA_PUD_1)(%rsi), %r8
-	orq	$PAGE_ATTR, %r8
-	movq	%r8, (%r9)
-
-	shrq	$9, %r10
-	sub	$9, %cl
-
-	movq	%r11, %r9
-	andq	%r10, %r9
-	shrq	%cl, %r9
-
-	movq	PTR(VA_PUD_1)(%rsi), %r8
-	addq	%r8, %r9
-	movq	PTR(PA_PMD_1)(%rsi), %r8
-	orq	$PAGE_ATTR, %r8
-	movq	%r8, (%r9)
-
-	shrq	$9, %r10
-	sub	$9, %cl
-
-	movq	%r11, %r9
-	andq	%r10, %r9
-	shrq	%cl, %r9
-
-	movq	PTR(VA_PMD_1)(%rsi), %r8
-	addq	%r8, %r9
-	movq	PTR(PA_PTE_1)(%rsi), %r8
-	orq	$PAGE_ATTR, %r8
-	movq	%r8, (%r9)
-
-	shrq	$9, %r10
-	sub	$9, %cl
-
-	movq	%r11, %r9
-	andq	%r10, %r9
-	shrq	%cl, %r9
-
-	movq	PTR(VA_PTE_1)(%rsi), %r8
-	addq	%r8, %r9
-	movq	PTR(PA_CONTROL_PAGE)(%rsi), %r8
-	orq	$PAGE_ATTR, %r8
-	movq	%r8, (%r9)
-
-relocate_new_kernel:
-	/* %rdi indirection_page
-	 * %rsi page_list
-	 * %rdx start address
-	 */
-
 	/* zero out flags, and disable interrupts */
 	pushq $0
 	popfq
@@ -156,9 +40,8 @@ relocate_new_kernel:
 	/* get physical address of page table now too */
 	movq	PTR(PA_TABLE_PAGE)(%rsi), %rcx
 
-	/* switch to new set of page tables */
-	movq	PTR(PA_PGD)(%rsi), %r9
-	movq	%r9, %cr3
+	/* Switch to the identity mapped page tables */
+	movq	%rcx, %cr3
 
 	/* setup a new stack at the end of the physical control page */
 	lea	PAGE_SIZE(%r8), %rsp
@@ -194,9 +77,7 @@ identity_mapped:
 	jmp 1f
 1:
 
-	/* Switch to the identity mapped page tables,
-	 * and flush the TLB.
-	*/
+	/* Flush the TLB (needed?) */
 	movq	%rcx, %cr3
 
 	/* Do the copies */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6a8811a69324..b746deb9ebc6 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -74,14 +74,15 @@
 #include <asm/e820.h>
 #include <asm/mpspec.h>
 #include <asm/setup.h>
-#include <asm/arch_hooks.h>
 #include <asm/efi.h>
+#include <asm/timer.h>
+#include <asm/i8259.h>
 #include <asm/sections.h>
 #include <asm/dmi.h>
 #include <asm/io_apic.h>
 #include <asm/ist.h>
 #include <asm/vmi.h>
-#include <setup_arch.h>
+#include <asm/setup_arch.h>
 #include <asm/bios_ebda.h>
 #include <asm/cacheflush.h>
 #include <asm/processor.h>
@@ -89,7 +90,7 @@
 
 #include <asm/system.h>
 #include <asm/vsyscall.h>
-#include <asm/smp.h>
+#include <asm/cpu.h>
 #include <asm/desc.h>
 #include <asm/dma.h>
 #include <asm/iommu.h>
@@ -97,7 +98,6 @@
 #include <asm/mmu_context.h>
 #include <asm/proto.h>
 
-#include <mach_apic.h>
 #include <asm/paravirt.h>
 #include <asm/hypervisor.h>
 
@@ -112,6 +112,20 @@
 #define ARCH_SETUP
 #endif
 
+unsigned int boot_cpu_id __read_mostly;
+
+#ifdef CONFIG_X86_64
+int default_cpu_present_to_apicid(int mps_cpu)
+{
+	return __default_cpu_present_to_apicid(mps_cpu);
+}
+
+int default_check_phys_apicid_present(int boot_cpu_physical_apicid)
+{
+	return __default_check_phys_apicid_present(boot_cpu_physical_apicid);
+}
+#endif
+
 #ifndef CONFIG_DEBUG_BOOT_PARAMS
 struct boot_params __initdata boot_params;
 #else
@@ -586,20 +600,7 @@ static int __init setup_elfcorehdr(char *arg)
 early_param("elfcorehdr", setup_elfcorehdr);
 #endif
 
-static int __init default_update_genapic(void)
-{
-#ifdef CONFIG_X86_SMP
-# if defined(CONFIG_X86_GENERICARCH) || defined(CONFIG_X86_64)
-	genapic->wakeup_cpu = wakeup_secondary_cpu_via_init;
-# endif
-#endif
-
-	return 0;
-}
-
-static struct x86_quirks default_x86_quirks __initdata = {
-	.update_genapic         = default_update_genapic,
-};
+static struct x86_quirks default_x86_quirks __initdata;
 
 struct x86_quirks *x86_quirks __initdata = &default_x86_quirks;
 
@@ -656,7 +657,6 @@ void __init setup_arch(char **cmdline_p)
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 	visws_early_detect();
-	pre_setup_arch_hook();
 #else
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
 #endif
@@ -824,8 +824,7 @@ void __init setup_arch(char **cmdline_p)
 #else
 	num_physpages = max_pfn;
 
- 	if (cpu_has_x2apic)
- 		check_x2apic();
+	check_x2apic();
 
 	/* How many end-of-memory variables you have, grandma! */
 	/* need this before calling reserve_initrd */
@@ -865,9 +864,7 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_initrd();
 
-#ifdef CONFIG_X86_64
 	vsmp_init();
-#endif
 
 	io_delay_init();
 
@@ -893,12 +890,11 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	acpi_reserve_bootmem();
 #endif
-#ifdef CONFIG_X86_FIND_SMP_CONFIG
 	/*
 	 * Find and reserve possible boot-time SMP configuration:
 	 */
 	find_smp_config();
-#endif
+
 	reserve_crashkernel();
 
 #ifdef CONFIG_X86_64
@@ -925,9 +921,7 @@ void __init setup_arch(char **cmdline_p)
 	map_vsyscall();
 #endif
 
-#ifdef CONFIG_X86_GENERICARCH
 	generic_apic_probe();
-#endif
 
 	early_quirks();
 
@@ -978,4 +972,95 @@ void __init setup_arch(char **cmdline_p)
 #endif
 }
 
+#ifdef CONFIG_X86_32
 
+/**
+ * x86_quirk_pre_intr_init - initialisation prior to setting up interrupt vectors
+ *
+ * Description:
+ *	Perform any necessary interrupt initialisation prior to setting up
+ *	the "ordinary" interrupt call gates.  For legacy reasons, the ISA
+ *	interrupts should be initialised here if the machine emulates a PC
+ *	in any way.
+ **/
+void __init x86_quirk_pre_intr_init(void)
+{
+	if (x86_quirks->arch_pre_intr_init) {
+		if (x86_quirks->arch_pre_intr_init())
+			return;
+	}
+	init_ISA_irqs();
+}
+
+/**
+ * x86_quirk_intr_init - post gate setup interrupt initialisation
+ *
+ * Description:
+ *	Fill in any interrupts that may have been left out by the general
+ *	init_IRQ() routine.  interrupts having to do with the machine rather
+ *	than the devices on the I/O bus (like APIC interrupts in intel MP
+ *	systems) are started here.
+ **/
+void __init x86_quirk_intr_init(void)
+{
+	if (x86_quirks->arch_intr_init) {
+		if (x86_quirks->arch_intr_init())
+			return;
+	}
+}
+
+/**
+ * x86_quirk_trap_init - initialise system specific traps
+ *
+ * Description:
+ *	Called as the final act of trap_init().  Used in VISWS to initialise
+ *	the various board specific APIC traps.
+ **/
+void __init x86_quirk_trap_init(void)
+{
+	if (x86_quirks->arch_trap_init) {
+		if (x86_quirks->arch_trap_init())
+			return;
+	}
+}
+
+static struct irqaction irq0  = {
+	.handler = timer_interrupt,
+	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL | IRQF_TIMER,
+	.mask = CPU_MASK_NONE,
+	.name = "timer"
+};
+
+/**
+ * x86_quirk_pre_time_init - do any specific initialisations before.
+ *
+ **/
+void __init x86_quirk_pre_time_init(void)
+{
+	if (x86_quirks->arch_pre_time_init)
+		x86_quirks->arch_pre_time_init();
+}
+
+/**
+ * x86_quirk_time_init - do any specific initialisations for the system timer.
+ *
+ * Description:
+ *	Must plug the system timer interrupt source at HZ into the IRQ listed
+ *	in irq_vectors.h:TIMER_IRQ
+ **/
+void __init x86_quirk_time_init(void)
+{
+	if (x86_quirks->arch_time_init) {
+		/*
+		 * A nonzero return code does not mean failure, it means
+		 * that the architecture quirk does not want any
+		 * generic (timer) setup to be performed after this:
+		 */
+		if (x86_quirks->arch_time_init())
+			return;
+	}
+
+	irq0.mask = cpumask_of_cpu(0);
+	setup_irq(0, &irq0);
+}
+#endif /* CONFIG_X86_32 */
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 01161077a49c..400331b50a53 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -7,402 +7,439 @@
 #include <linux/crash_dump.h>
 #include <linux/smp.h>
 #include <linux/topology.h>
+#include <linux/pfn.h>
 #include <asm/sections.h>
 #include <asm/processor.h>
 #include <asm/setup.h>
 #include <asm/mpspec.h>
 #include <asm/apicdef.h>
 #include <asm/highmem.h>
+#include <asm/proto.h>
+#include <asm/cpumask.h>
+#include <asm/cpu.h>
+#include <asm/stackprotector.h>
 
-#ifdef CONFIG_X86_LOCAL_APIC
-unsigned int num_processors;
-unsigned disabled_cpus __cpuinitdata;
-/* Processor that is doing the boot up */
-unsigned int boot_cpu_physical_apicid = -1U;
-EXPORT_SYMBOL(boot_cpu_physical_apicid);
-unsigned int max_physical_apicid;
-
-/* Bitmask of physically existing CPUs */
-physid_mask_t phys_cpu_present_map;
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+# define DBG(x...) printk(KERN_DEBUG x)
+#else
+# define DBG(x...)
 #endif
 
-/* map cpu index to physical APIC ID */
-DEFINE_EARLY_PER_CPU(u16, x86_cpu_to_apicid, BAD_APICID);
-DEFINE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid, BAD_APICID);
-EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid);
-EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
-
-#if defined(CONFIG_NUMA) && defined(CONFIG_X86_64)
-#define	X86_64_NUMA	1
+DEFINE_PER_CPU(int, cpu_number);
+EXPORT_PER_CPU_SYMBOL(cpu_number);
 
-/* map cpu index to node index */
-DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
-EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
+#ifdef CONFIG_X86_64
+#define BOOT_PERCPU_OFFSET ((unsigned long)__per_cpu_load)
+#else
+#define BOOT_PERCPU_OFFSET 0
+#endif
 
-/* which logical CPUs are on which nodes */
-cpumask_t *node_to_cpumask_map;
-EXPORT_SYMBOL(node_to_cpumask_map);
+DEFINE_PER_CPU(unsigned long, this_cpu_off) = BOOT_PERCPU_OFFSET;
+EXPORT_PER_CPU_SYMBOL(this_cpu_off);
 
-/* setup node_to_cpumask_map */
-static void __init setup_node_to_cpumask_map(void);
+unsigned long __per_cpu_offset[NR_CPUS] __read_mostly = {
+	[0 ... NR_CPUS-1] = BOOT_PERCPU_OFFSET,
+};
+EXPORT_SYMBOL(__per_cpu_offset);
 
+/*
+ * On x86_64 symbols referenced from code should be reachable using
+ * 32bit relocations.  Reserve space for static percpu variables in
+ * modules so that they are always served from the first chunk which
+ * is located at the percpu segment base.  On x86_32, anything can
+ * address anywhere.  No need to reserve space in the first chunk.
+ */
+#ifdef CONFIG_X86_64
+#define PERCPU_FIRST_CHUNK_RESERVE	PERCPU_MODULE_RESERVE
 #else
-static inline void setup_node_to_cpumask_map(void) { }
+#define PERCPU_FIRST_CHUNK_RESERVE	0
 #endif
 
-#if defined(CONFIG_HAVE_SETUP_PER_CPU_AREA) && defined(CONFIG_X86_SMP)
-/*
- * Copy data used in early init routines from the initial arrays to the
- * per cpu data areas.  These arrays then become expendable and the
- * *_early_ptr's are zeroed indicating that the static arrays are gone.
+/**
+ * pcpu_need_numa - determine percpu allocation needs to consider NUMA
+ *
+ * If NUMA is not configured or there is only one NUMA node available,
+ * there is no reason to consider NUMA.  This function determines
+ * whether percpu allocation should consider NUMA or not.
+ *
+ * RETURNS:
+ * true if NUMA should be considered; otherwise, false.
  */
-static void __init setup_per_cpu_maps(void)
+static bool __init pcpu_need_numa(void)
 {
-	int cpu;
+#ifdef CONFIG_NEED_MULTIPLE_NODES
+	pg_data_t *last = NULL;
+	unsigned int cpu;
 
 	for_each_possible_cpu(cpu) {
-		per_cpu(x86_cpu_to_apicid, cpu) =
-				early_per_cpu_map(x86_cpu_to_apicid, cpu);
-		per_cpu(x86_bios_cpu_apicid, cpu) =
-				early_per_cpu_map(x86_bios_cpu_apicid, cpu);
-#ifdef X86_64_NUMA
-		per_cpu(x86_cpu_to_node_map, cpu) =
-				early_per_cpu_map(x86_cpu_to_node_map, cpu);
-#endif
-	}
+		int node = early_cpu_to_node(cpu);
 
-	/* indicate the early static arrays will soon be gone */
-	early_per_cpu_ptr(x86_cpu_to_apicid) = NULL;
-	early_per_cpu_ptr(x86_bios_cpu_apicid) = NULL;
-#ifdef X86_64_NUMA
-	early_per_cpu_ptr(x86_cpu_to_node_map) = NULL;
+		if (node_online(node) && NODE_DATA(node) &&
+		    last && last != NODE_DATA(node))
+			return true;
+
+		last = NODE_DATA(node);
+	}
 #endif
+	return false;
 }
 
-#ifdef CONFIG_X86_32
-/*
- * Great future not-so-futuristic plan: make i386 and x86_64 do it
- * the same way
- */
-unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
-EXPORT_SYMBOL(__per_cpu_offset);
-static inline void setup_cpu_pda_map(void) { }
-
-#elif !defined(CONFIG_SMP)
-static inline void setup_cpu_pda_map(void) { }
-
-#else /* CONFIG_SMP && CONFIG_X86_64 */
-
-/*
- * Allocate cpu_pda pointer table and array via alloc_bootmem.
+/**
+ * pcpu_alloc_bootmem - NUMA friendly alloc_bootmem wrapper for percpu
+ * @cpu: cpu to allocate for
+ * @size: size allocation in bytes
+ * @align: alignment
+ *
+ * Allocate @size bytes aligned at @align for cpu @cpu.  This wrapper
+ * does the right thing for NUMA regardless of the current
+ * configuration.
+ *
+ * RETURNS:
+ * Pointer to the allocated area on success, NULL on failure.
  */
-static void __init setup_cpu_pda_map(void)
+static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,
+					unsigned long align)
 {
-	char *pda;
-	struct x8664_pda **new_cpu_pda;
-	unsigned long size;
-	int cpu;
-
-	size = roundup(sizeof(struct x8664_pda), cache_line_size());
-
-	/* allocate cpu_pda array and pointer table */
-	{
-		unsigned long tsize = nr_cpu_ids * sizeof(void *);
-		unsigned long asize = size * (nr_cpu_ids - 1);
-
-		tsize = roundup(tsize, cache_line_size());
-		new_cpu_pda = alloc_bootmem(tsize + asize);
-		pda = (char *)new_cpu_pda + tsize;
-	}
-
-	/* initialize pointer table to static pda's */
-	for_each_possible_cpu(cpu) {
-		if (cpu == 0) {
-			/* leave boot cpu pda in place */
-			new_cpu_pda[0] = cpu_pda(0);
-			continue;
-		}
-		new_cpu_pda[cpu] = (struct x8664_pda *)pda;
-		new_cpu_pda[cpu]->in_bootmem = 1;
-		pda += size;
+	const unsigned long goal = __pa(MAX_DMA_ADDRESS);
+#ifdef CONFIG_NEED_MULTIPLE_NODES
+	int node = early_cpu_to_node(cpu);
+	void *ptr;
+
+	if (!node_online(node) || !NODE_DATA(node)) {
+		ptr = __alloc_bootmem_nopanic(size, align, goal);
+		pr_info("cpu %d has no node %d or node-local memory\n",
+			cpu, node);
+		pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
+			 cpu, size, __pa(ptr));
+	} else {
+		ptr = __alloc_bootmem_node_nopanic(NODE_DATA(node),
+						   size, align, goal);
+		pr_debug("per cpu data for cpu%d %lu bytes on node%d at "
+			 "%016lx\n", cpu, size, node, __pa(ptr));
 	}
-
-	/* point to new pointer table */
-	_cpu_pda = new_cpu_pda;
+	return ptr;
+#else
+	return __alloc_bootmem_nopanic(size, align, goal);
+#endif
 }
 
-#endif /* CONFIG_SMP && CONFIG_X86_64 */
-
-#ifdef CONFIG_X86_64
+/*
+ * Remap allocator
+ *
+ * This allocator uses PMD page as unit.  A PMD page is allocated for
+ * each cpu and each is remapped into vmalloc area using PMD mapping.
+ * As PMD page is quite large, only part of it is used for the first
+ * chunk.  Unused part is returned to the bootmem allocator.
+ *
+ * So, the PMD pages are mapped twice - once to the physical mapping
+ * and to the vmalloc area for the first percpu chunk.  The double
+ * mapping does add one more PMD TLB entry pressure but still is much
+ * better than only using 4k mappings while still being NUMA friendly.
+ */
+#ifdef CONFIG_NEED_MULTIPLE_NODES
+static size_t pcpur_size __initdata;
+static void **pcpur_ptrs __initdata;
 
-/* correctly size the local cpu masks */
-static void __init setup_cpu_local_masks(void)
+static struct page * __init pcpur_get_page(unsigned int cpu, int pageno)
 {
-	alloc_bootmem_cpumask_var(&cpu_initialized_mask);
-	alloc_bootmem_cpumask_var(&cpu_callin_mask);
-	alloc_bootmem_cpumask_var(&cpu_callout_mask);
-	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
-}
+	size_t off = (size_t)pageno << PAGE_SHIFT;
 
-#else /* CONFIG_X86_32 */
+	if (off >= pcpur_size)
+		return NULL;
 
-static inline void setup_cpu_local_masks(void)
-{
+	return virt_to_page(pcpur_ptrs[cpu] + off);
 }
 
-#endif /* CONFIG_X86_32 */
-
-/*
- * Great future plan:
- * Declare PDA itself and support (irqstack,tss,pgd) as per cpu data.
- * Always point %gs to its beginning
- */
-void __init setup_per_cpu_areas(void)
+static ssize_t __init setup_pcpu_remap(size_t static_size)
 {
-	ssize_t size, old_size;
-	char *ptr;
-	int cpu;
-	unsigned long align = 1;
-
-	/* Setup cpu_pda map */
-	setup_cpu_pda_map();
+	static struct vm_struct vm;
+	pg_data_t *last;
+	size_t ptrs_size, dyn_size;
+	unsigned int cpu;
+	ssize_t ret;
+
+	/*
+	 * If large page isn't supported, there's no benefit in doing
+	 * this.  Also, on non-NUMA, embedding is better.
+	 */
+	if (!cpu_has_pse || pcpu_need_numa())
+		return -EINVAL;
+
+	last = NULL;
+	for_each_possible_cpu(cpu) {
+		int node = early_cpu_to_node(cpu);
 
-	/* Copy section for each CPU (we discard the original) */
-	old_size = PERCPU_ENOUGH_ROOM;
-	align = max_t(unsigned long, PAGE_SIZE, align);
-	size = roundup(old_size, align);
+		if (node_online(node) && NODE_DATA(node) &&
+		    last && last != NODE_DATA(node))
+			goto proceed;
 
-	pr_info("NR_CPUS:%d nr_cpumask_bits:%d nr_cpu_ids:%d nr_node_ids:%d\n",
-		NR_CPUS, nr_cpumask_bits, nr_cpu_ids, nr_node_ids);
+		last = NODE_DATA(node);
+	}
+	return -EINVAL;
+
+proceed:
+	/*
+	 * Currently supports only single page.  Supporting multiple
+	 * pages won't be too difficult if it ever becomes necessary.
+	 */
+	pcpur_size = PFN_ALIGN(static_size + PERCPU_MODULE_RESERVE +
+			       PERCPU_DYNAMIC_RESERVE);
+	if (pcpur_size > PMD_SIZE) {
+		pr_warning("PERCPU: static data is larger than large page, "
+			   "can't use large page\n");
+		return -EINVAL;
+	}
+	dyn_size = pcpur_size - static_size - PERCPU_FIRST_CHUNK_RESERVE;
 
-	pr_info("PERCPU: Allocating %zd bytes of per cpu data\n", size);
+	/* allocate pointer array and alloc large pages */
+	ptrs_size = PFN_ALIGN(num_possible_cpus() * sizeof(pcpur_ptrs[0]));
+	pcpur_ptrs = alloc_bootmem(ptrs_size);
 
 	for_each_possible_cpu(cpu) {
-#ifndef CONFIG_NEED_MULTIPLE_NODES
-		ptr = __alloc_bootmem(size, align,
-				 __pa(MAX_DMA_ADDRESS));
-#else
-		int node = early_cpu_to_node(cpu);
-		if (!node_online(node) || !NODE_DATA(node)) {
-			ptr = __alloc_bootmem(size, align,
-					 __pa(MAX_DMA_ADDRESS));
-			pr_info("cpu %d has no node %d or node-local memory\n",
-				cpu, node);
-			pr_debug("per cpu data for cpu%d at %016lx\n",
-				 cpu, __pa(ptr));
-		} else {
-			ptr = __alloc_bootmem_node(NODE_DATA(node), size, align,
-							__pa(MAX_DMA_ADDRESS));
-			pr_debug("per cpu data for cpu%d on node%d at %016lx\n",
-				cpu, node, __pa(ptr));
-		}
-#endif
-		per_cpu_offset(cpu) = ptr - __per_cpu_start;
-		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+		pcpur_ptrs[cpu] = pcpu_alloc_bootmem(cpu, PMD_SIZE, PMD_SIZE);
+		if (!pcpur_ptrs[cpu])
+			goto enomem;
+
+		/*
+		 * Only use pcpur_size bytes and give back the rest.
+		 *
+		 * Ingo: The 2MB up-rounding bootmem is needed to make
+		 * sure the partial 2MB page is still fully RAM - it's
+		 * not well-specified to have a PAT-incompatible area
+		 * (unmapped RAM, device memory, etc.) in that hole.
+		 */
+		free_bootmem(__pa(pcpur_ptrs[cpu] + pcpur_size),
+			     PMD_SIZE - pcpur_size);
+
+		memcpy(pcpur_ptrs[cpu], __per_cpu_load, static_size);
 	}
 
-	/* Setup percpu data maps */
-	setup_per_cpu_maps();
-
-	/* Setup node to cpumask map */
-	setup_node_to_cpumask_map();
-
-	/* Setup cpu initialized, callin, callout masks */
-	setup_cpu_local_masks();
-}
-
-#endif
+	/* allocate address and map */
+	vm.flags = VM_ALLOC;
+	vm.size = num_possible_cpus() * PMD_SIZE;
+	vm_area_register_early(&vm, PMD_SIZE);
 
-#ifdef X86_64_NUMA
+	for_each_possible_cpu(cpu) {
+		pmd_t *pmd;
 
-/*
- * Allocate node_to_cpumask_map based on number of available nodes
- * Requires node_possible_map to be valid.
- *
- * Note: node_to_cpumask() is not valid until after this is done.
- */
-static void __init setup_node_to_cpumask_map(void)
-{
-	unsigned int node, num = 0;
-	cpumask_t *map;
-
-	/* setup nr_node_ids if not done yet */
-	if (nr_node_ids == MAX_NUMNODES) {
-		for_each_node_mask(node, node_possible_map)
-			num = node;
-		nr_node_ids = num + 1;
+		pmd = populate_extra_pmd((unsigned long)vm.addr
+					 + cpu * PMD_SIZE);
+		set_pmd(pmd, pfn_pmd(page_to_pfn(virt_to_page(pcpur_ptrs[cpu])),
+				     PAGE_KERNEL_LARGE));
 	}
 
-	/* allocate the map */
-	map = alloc_bootmem_low(nr_node_ids * sizeof(cpumask_t));
-
-	pr_debug("Node to cpumask map at %p for %d nodes\n",
-		 map, nr_node_ids);
-
-	/* node_to_cpumask() will now work */
-	node_to_cpumask_map = map;
+	/* we're ready, commit */
+	pr_info("PERCPU: Remapped at %p with large pages, static data "
+		"%zu bytes\n", vm.addr, static_size);
+
+	ret = pcpu_setup_first_chunk(pcpur_get_page, static_size,
+				     PERCPU_FIRST_CHUNK_RESERVE, dyn_size,
+				     PMD_SIZE, vm.addr, NULL);
+	goto out_free_ar;
+
+enomem:
+	for_each_possible_cpu(cpu)
+		if (pcpur_ptrs[cpu])
+			free_bootmem(__pa(pcpur_ptrs[cpu]), PMD_SIZE);
+	ret = -ENOMEM;
+out_free_ar:
+	free_bootmem(__pa(pcpur_ptrs), ptrs_size);
+	return ret;
 }
-
-void __cpuinit numa_set_node(int cpu, int node)
+#else
+static ssize_t __init setup_pcpu_remap(size_t static_size)
 {
-	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
-
-	if (cpu_pda(cpu) && node != NUMA_NO_NODE)
-		cpu_pda(cpu)->nodenumber = node;
-
-	if (cpu_to_node_map)
-		cpu_to_node_map[cpu] = node;
-
-	else if (per_cpu_offset(cpu))
-		per_cpu(x86_cpu_to_node_map, cpu) = node;
-
-	else
-		pr_debug("Setting node for non-present cpu %d\n", cpu);
+	return -EINVAL;
 }
+#endif
 
-void __cpuinit numa_clear_node(int cpu)
+/*
+ * Embedding allocator
+ *
+ * The first chunk is sized to just contain the static area plus
+ * module and dynamic reserves and embedded into linear physical
+ * mapping so that it can use PMD mapping without additional TLB
+ * pressure.
+ */
+static ssize_t __init setup_pcpu_embed(size_t static_size)
 {
-	numa_set_node(cpu, NUMA_NO_NODE);
+	size_t reserve = PERCPU_MODULE_RESERVE + PERCPU_DYNAMIC_RESERVE;
+
+	/*
+	 * If large page isn't supported, there's no benefit in doing
+	 * this.  Also, embedding allocation doesn't play well with
+	 * NUMA.
+	 */
+	if (!cpu_has_pse || pcpu_need_numa())
+		return -EINVAL;
+
+	return pcpu_embed_first_chunk(static_size, PERCPU_FIRST_CHUNK_RESERVE,
+				      reserve - PERCPU_FIRST_CHUNK_RESERVE, -1);
 }
 
-#ifndef CONFIG_DEBUG_PER_CPU_MAPS
+/*
+ * 4k page allocator
+ *
+ * This is the basic allocator.  Static percpu area is allocated
+ * page-by-page and most of initialization is done by the generic
+ * setup function.
+ */
+static struct page **pcpu4k_pages __initdata;
+static int pcpu4k_nr_static_pages __initdata;
 
-void __cpuinit numa_add_cpu(int cpu)
+static struct page * __init pcpu4k_get_page(unsigned int cpu, int pageno)
 {
-	cpu_set(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
+	if (pageno < pcpu4k_nr_static_pages)
+		return pcpu4k_pages[cpu * pcpu4k_nr_static_pages + pageno];
+	return NULL;
 }
 
-void __cpuinit numa_remove_cpu(int cpu)
+static void __init pcpu4k_populate_pte(unsigned long addr)
 {
-	cpu_clear(cpu, node_to_cpumask_map[cpu_to_node(cpu)]);
+	populate_extra_pte(addr);
 }
 
-#else /* CONFIG_DEBUG_PER_CPU_MAPS */
-
-/*
- * --------- debug versions of the numa functions ---------
- */
-static void __cpuinit numa_set_cpumask(int cpu, int enable)
+static ssize_t __init setup_pcpu_4k(size_t static_size)
 {
-	int node = cpu_to_node(cpu);
-	cpumask_t *mask;
-	char buf[64];
-
-	if (node_to_cpumask_map == NULL) {
-		printk(KERN_ERR "node_to_cpumask_map NULL\n");
-		dump_stack();
-		return;
-	}
-
-	mask = &node_to_cpumask_map[node];
-	if (enable)
-		cpu_set(cpu, *mask);
-	else
-		cpu_clear(cpu, *mask);
+	size_t pages_size;
+	unsigned int cpu;
+	int i, j;
+	ssize_t ret;
+
+	pcpu4k_nr_static_pages = PFN_UP(static_size);
+
+	/* unaligned allocations can't be freed, round up to page size */
+	pages_size = PFN_ALIGN(pcpu4k_nr_static_pages * num_possible_cpus()
+			       * sizeof(pcpu4k_pages[0]));
+	pcpu4k_pages = alloc_bootmem(pages_size);
+
+	/* allocate and copy */
+	j = 0;
+	for_each_possible_cpu(cpu)
+		for (i = 0; i < pcpu4k_nr_static_pages; i++) {
+			void *ptr;
+
+			ptr = pcpu_alloc_bootmem(cpu, PAGE_SIZE, PAGE_SIZE);
+			if (!ptr)
+				goto enomem;
+
+			memcpy(ptr, __per_cpu_load + i * PAGE_SIZE, PAGE_SIZE);
+			pcpu4k_pages[j++] = virt_to_page(ptr);
+		}
 
-	cpulist_scnprintf(buf, sizeof(buf), mask);
-	printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
-		enable ? "numa_add_cpu" : "numa_remove_cpu", cpu, node, buf);
+	/* we're ready, commit */
+	pr_info("PERCPU: Allocated %d 4k pages, static data %zu bytes\n",
+		pcpu4k_nr_static_pages, static_size);
+
+	ret = pcpu_setup_first_chunk(pcpu4k_get_page, static_size,
+				     PERCPU_FIRST_CHUNK_RESERVE, -1,
+				     -1, NULL, pcpu4k_populate_pte);
+	goto out_free_ar;
+
+enomem:
+	while (--j >= 0)
+		free_bootmem(__pa(page_address(pcpu4k_pages[j])), PAGE_SIZE);
+	ret = -ENOMEM;
+out_free_ar:
+	free_bootmem(__pa(pcpu4k_pages), pages_size);
+	return ret;
 }
 
-void __cpuinit numa_add_cpu(int cpu)
+static inline void setup_percpu_segment(int cpu)
 {
-	numa_set_cpumask(cpu, 1);
-}
-
-void __cpuinit numa_remove_cpu(int cpu)
-{
-	numa_set_cpumask(cpu, 0);
-}
+#ifdef CONFIG_X86_32
+	struct desc_struct gdt;
 
-int cpu_to_node(int cpu)
-{
-	if (early_per_cpu_ptr(x86_cpu_to_node_map)) {
-		printk(KERN_WARNING
-			"cpu_to_node(%d): usage too early!\n", cpu);
-		dump_stack();
-		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
-	}
-	return per_cpu(x86_cpu_to_node_map, cpu);
+	pack_descriptor(&gdt, per_cpu_offset(cpu), 0xFFFFF,
+			0x2 | DESCTYPE_S, 0x8);
+	gdt.s = 1;
+	write_gdt_entry(get_cpu_gdt_table(cpu),
+			GDT_ENTRY_PERCPU, &gdt, DESCTYPE_S);
+#endif
 }
-EXPORT_SYMBOL(cpu_to_node);
 
 /*
- * Same function as cpu_to_node() but used if called before the
- * per_cpu areas are setup.
+ * Great future plan:
+ * Declare PDA itself and support (irqstack,tss,pgd) as per cpu data.
+ * Always point %gs to its beginning
  */
-int early_cpu_to_node(int cpu)
+void __init setup_per_cpu_areas(void)
 {
-	if (early_per_cpu_ptr(x86_cpu_to_node_map))
-		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
-
-	if (!per_cpu_offset(cpu)) {
-		printk(KERN_WARNING
-			"early_cpu_to_node(%d): no per_cpu area!\n", cpu);
-		dump_stack();
-		return NUMA_NO_NODE;
-	}
-	return per_cpu(x86_cpu_to_node_map, cpu);
-}
+	size_t static_size = __per_cpu_end - __per_cpu_start;
+	unsigned int cpu;
+	unsigned long delta;
+	size_t pcpu_unit_size;
+	ssize_t ret;
 
+	pr_info("NR_CPUS:%d nr_cpumask_bits:%d nr_cpu_ids:%d nr_node_ids:%d\n",
+		NR_CPUS, nr_cpumask_bits, nr_cpu_ids, nr_node_ids);
 
-/* empty cpumask */
-static const cpumask_t cpu_mask_none;
-
-/*
- * Returns a pointer to the bitmask of CPUs on Node 'node'.
- */
-const cpumask_t *cpumask_of_node(int node)
-{
-	if (node_to_cpumask_map == NULL) {
-		printk(KERN_WARNING
-			"cpumask_of_node(%d): no node_to_cpumask_map!\n",
-			node);
-		dump_stack();
-		return (const cpumask_t *)&cpu_online_map;
-	}
-	if (node >= nr_node_ids) {
-		printk(KERN_WARNING
-			"cpumask_of_node(%d): node > nr_node_ids(%d)\n",
-			node, nr_node_ids);
-		dump_stack();
-		return &cpu_mask_none;
-	}
-	return &node_to_cpumask_map[node];
-}
-EXPORT_SYMBOL(cpumask_of_node);
-
-/*
- * Returns a bitmask of CPUs on Node 'node'.
- *
- * Side note: this function creates the returned cpumask on the stack
- * so with a high NR_CPUS count, excessive stack space is used.  The
- * node_to_cpumask_ptr function should be used whenever possible.
- */
-cpumask_t node_to_cpumask(int node)
-{
-	if (node_to_cpumask_map == NULL) {
-		printk(KERN_WARNING
-			"node_to_cpumask(%d): no node_to_cpumask_map!\n", node);
-		dump_stack();
-		return cpu_online_map;
-	}
-	if (node >= nr_node_ids) {
-		printk(KERN_WARNING
-			"node_to_cpumask(%d): node > nr_node_ids(%d)\n",
-			node, nr_node_ids);
-		dump_stack();
-		return cpu_mask_none;
+	/*
+	 * Allocate percpu area.  If PSE is supported, try to make use
+	 * of large page mappings.  Please read comments on top of
+	 * each allocator for details.
+	 */
+	ret = setup_pcpu_remap(static_size);
+	if (ret < 0)
+		ret = setup_pcpu_embed(static_size);
+	if (ret < 0)
+		ret = setup_pcpu_4k(static_size);
+	if (ret < 0)
+		panic("cannot allocate static percpu area (%zu bytes, err=%zd)",
+		      static_size, ret);
+
+	pcpu_unit_size = ret;
+
+	/* alrighty, percpu areas up and running */
+	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
+	for_each_possible_cpu(cpu) {
+		per_cpu_offset(cpu) = delta + cpu * pcpu_unit_size;
+		per_cpu(this_cpu_off, cpu) = per_cpu_offset(cpu);
+		per_cpu(cpu_number, cpu) = cpu;
+		setup_percpu_segment(cpu);
+		setup_stack_canary_segment(cpu);
+		/*
+		 * Copy data used in early init routines from the
+		 * initial arrays to the per cpu data areas.  These
+		 * arrays then become expendable and the *_early_ptr's
+		 * are zeroed indicating that the static arrays are
+		 * gone.
+		 */
+#ifdef CONFIG_X86_LOCAL_APIC
+		per_cpu(x86_cpu_to_apicid, cpu) =
+			early_per_cpu_map(x86_cpu_to_apicid, cpu);
+		per_cpu(x86_bios_cpu_apicid, cpu) =
+			early_per_cpu_map(x86_bios_cpu_apicid, cpu);
+#endif
+#ifdef CONFIG_X86_64
+		per_cpu(irq_stack_ptr, cpu) =
+			per_cpu(irq_stack_union.irq_stack, cpu) +
+			IRQ_STACK_SIZE - 64;
+#ifdef CONFIG_NUMA
+		per_cpu(x86_cpu_to_node_map, cpu) =
+			early_per_cpu_map(x86_cpu_to_node_map, cpu);
+#endif
+#endif
+		/*
+		 * Up to this point, the boot CPU has been using .data.init
+		 * area.  Reload any changed state for the boot CPU.
+		 */
+		if (cpu == boot_cpu_id)
+			switch_to_new_gdt(cpu);
 	}
-	return node_to_cpumask_map[node];
-}
-EXPORT_SYMBOL(node_to_cpumask);
 
-/*
- * --------- end of debug versions of the numa functions ---------
- */
-
-#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
+	/* indicate the early static arrays will soon be gone */
+#ifdef CONFIG_X86_LOCAL_APIC
+	early_per_cpu_ptr(x86_cpu_to_apicid) = NULL;
+	early_per_cpu_ptr(x86_bios_cpu_apicid) = NULL;
+#endif
+#if defined(CONFIG_X86_64) && defined(CONFIG_NUMA)
+	early_per_cpu_ptr(x86_cpu_to_node_map) = NULL;
+#endif
 
-#endif /* X86_64_NUMA */
+	/* Setup node to cpumask map */
+	setup_node_to_cpumask_map();
 
+	/* Setup cpu initialized, callin, callout masks */
+	setup_cpu_local_masks();
+}
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index df0587f24c54..d2cc6428c587 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -50,27 +50,23 @@
 # define FIX_EFLAGS	__FIX_EFLAGS
 #endif
 
-#define COPY(x)			{		\
-	err |= __get_user(regs->x, &sc->x);	\
-}
+#define COPY(x)			do {			\
+	get_user_ex(regs->x, &sc->x);			\
+} while (0)
 
-#define COPY_SEG(seg)		{			\
-		unsigned short tmp;			\
-		err |= __get_user(tmp, &sc->seg);	\
-		regs->seg = tmp;			\
-}
+#define GET_SEG(seg)		({			\
+	unsigned short tmp;				\
+	get_user_ex(tmp, &sc->seg);			\
+	tmp;						\
+})
 
-#define COPY_SEG_CPL3(seg)	{			\
-		unsigned short tmp;			\
-		err |= __get_user(tmp, &sc->seg);	\
-		regs->seg = tmp | 3;			\
-}
+#define COPY_SEG(seg)		do {			\
+	regs->seg = GET_SEG(seg);			\
+} while (0)
 
-#define GET_SEG(seg)		{			\
-		unsigned short tmp;			\
-		err |= __get_user(tmp, &sc->seg);	\
-		loadsegment(seg, tmp);			\
-}
+#define COPY_SEG_CPL3(seg)	do {			\
+	regs->seg = GET_SEG(seg) | 3;			\
+} while (0)
 
 static int
 restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
@@ -83,45 +79,49 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
 	/* Always make any pending restarted system calls return -EINTR */
 	current_thread_info()->restart_block.fn = do_no_restart_syscall;
 
+	get_user_try {
+
 #ifdef CONFIG_X86_32
-	GET_SEG(gs);
-	COPY_SEG(fs);
-	COPY_SEG(es);
-	COPY_SEG(ds);
+		set_user_gs(regs, GET_SEG(gs));
+		COPY_SEG(fs);
+		COPY_SEG(es);
+		COPY_SEG(ds);
 #endif /* CONFIG_X86_32 */
 
-	COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
-	COPY(dx); COPY(cx); COPY(ip);
+		COPY(di); COPY(si); COPY(bp); COPY(sp); COPY(bx);
+		COPY(dx); COPY(cx); COPY(ip);
 
 #ifdef CONFIG_X86_64
-	COPY(r8);
-	COPY(r9);
-	COPY(r10);
-	COPY(r11);
-	COPY(r12);
-	COPY(r13);
-	COPY(r14);
-	COPY(r15);
+		COPY(r8);
+		COPY(r9);
+		COPY(r10);
+		COPY(r11);
+		COPY(r12);
+		COPY(r13);
+		COPY(r14);
+		COPY(r15);
 #endif /* CONFIG_X86_64 */
 
 #ifdef CONFIG_X86_32
-	COPY_SEG_CPL3(cs);
-	COPY_SEG_CPL3(ss);
+		COPY_SEG_CPL3(cs);
+		COPY_SEG_CPL3(ss);
 #else /* !CONFIG_X86_32 */
-	/* Kernel saves and restores only the CS segment register on signals,
-	 * which is the bare minimum needed to allow mixed 32/64-bit code.
-	 * App's signal handler can save/restore other segments if needed. */
-	COPY_SEG_CPL3(cs);
+		/* Kernel saves and restores only the CS segment register on signals,
+		 * which is the bare minimum needed to allow mixed 32/64-bit code.
+		 * App's signal handler can save/restore other segments if needed. */
+		COPY_SEG_CPL3(cs);
 #endif /* CONFIG_X86_32 */
 
-	err |= __get_user(tmpflags, &sc->flags);
-	regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
-	regs->orig_ax = -1;		/* disable syscall checks */
+		get_user_ex(tmpflags, &sc->flags);
+		regs->flags = (regs->flags & ~FIX_EFLAGS) | (tmpflags & FIX_EFLAGS);
+		regs->orig_ax = -1;		/* disable syscall checks */
 
-	err |= __get_user(buf, &sc->fpstate);
-	err |= restore_i387_xstate(buf);
+		get_user_ex(buf, &sc->fpstate);
+		err |= restore_i387_xstate(buf);
+
+		get_user_ex(*pax, &sc->ax);
+	} get_user_catch(err);
 
-	err |= __get_user(*pax, &sc->ax);
 	return err;
 }
 
@@ -131,57 +131,55 @@ setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
 {
 	int err = 0;
 
-#ifdef CONFIG_X86_32
-	{
-		unsigned int tmp;
+	put_user_try {
 
-		savesegment(gs, tmp);
-		err |= __put_user(tmp, (unsigned int __user *)&sc->gs);
-	}
-	err |= __put_user(regs->fs, (unsigned int __user *)&sc->fs);
-	err |= __put_user(regs->es, (unsigned int __user *)&sc->es);
-	err |= __put_user(regs->ds, (unsigned int __user *)&sc->ds);
+#ifdef CONFIG_X86_32
+		put_user_ex(get_user_gs(regs), (unsigned int __user *)&sc->gs);
+		put_user_ex(regs->fs, (unsigned int __user *)&sc->fs);
+		put_user_ex(regs->es, (unsigned int __user *)&sc->es);
+		put_user_ex(regs->ds, (unsigned int __user *)&sc->ds);
 #endif /* CONFIG_X86_32 */
 
-	err |= __put_user(regs->di, &sc->di);
-	err |= __put_user(regs->si, &sc->si);
-	err |= __put_user(regs->bp, &sc->bp);
-	err |= __put_user(regs->sp, &sc->sp);
-	err |= __put_user(regs->bx, &sc->bx);
-	err |= __put_user(regs->dx, &sc->dx);
-	err |= __put_user(regs->cx, &sc->cx);
-	err |= __put_user(regs->ax, &sc->ax);
+		put_user_ex(regs->di, &sc->di);
+		put_user_ex(regs->si, &sc->si);
+		put_user_ex(regs->bp, &sc->bp);
+		put_user_ex(regs->sp, &sc->sp);
+		put_user_ex(regs->bx, &sc->bx);
+		put_user_ex(regs->dx, &sc->dx);
+		put_user_ex(regs->cx, &sc->cx);
+		put_user_ex(regs->ax, &sc->ax);
 #ifdef CONFIG_X86_64
-	err |= __put_user(regs->r8, &sc->r8);
-	err |= __put_user(regs->r9, &sc->r9);
-	err |= __put_user(regs->r10, &sc->r10);
-	err |= __put_user(regs->r11, &sc->r11);
-	err |= __put_user(regs->r12, &sc->r12);
-	err |= __put_user(regs->r13, &sc->r13);
-	err |= __put_user(regs->r14, &sc->r14);
-	err |= __put_user(regs->r15, &sc->r15);
+		put_user_ex(regs->r8, &sc->r8);
+		put_user_ex(regs->r9, &sc->r9);
+		put_user_ex(regs->r10, &sc->r10);
+		put_user_ex(regs->r11, &sc->r11);
+		put_user_ex(regs->r12, &sc->r12);
+		put_user_ex(regs->r13, &sc->r13);
+		put_user_ex(regs->r14, &sc->r14);
+		put_user_ex(regs->r15, &sc->r15);
 #endif /* CONFIG_X86_64 */
 
-	err |= __put_user(current->thread.trap_no, &sc->trapno);
-	err |= __put_user(current->thread.error_code, &sc->err);
-	err |= __put_user(regs->ip, &sc->ip);
+		put_user_ex(current->thread.trap_no, &sc->trapno);
+		put_user_ex(current->thread.error_code, &sc->err);
+		put_user_ex(regs->ip, &sc->ip);
 #ifdef CONFIG_X86_32
-	err |= __put_user(regs->cs, (unsigned int __user *)&sc->cs);
-	err |= __put_user(regs->flags, &sc->flags);
-	err |= __put_user(regs->sp, &sc->sp_at_signal);
-	err |= __put_user(regs->ss, (unsigned int __user *)&sc->ss);
+		put_user_ex(regs->cs, (unsigned int __user *)&sc->cs);
+		put_user_ex(regs->flags, &sc->flags);
+		put_user_ex(regs->sp, &sc->sp_at_signal);
+		put_user_ex(regs->ss, (unsigned int __user *)&sc->ss);
 #else /* !CONFIG_X86_32 */
-	err |= __put_user(regs->flags, &sc->flags);
-	err |= __put_user(regs->cs, &sc->cs);
-	err |= __put_user(0, &sc->gs);
-	err |= __put_user(0, &sc->fs);
+		put_user_ex(regs->flags, &sc->flags);
+		put_user_ex(regs->cs, &sc->cs);
+		put_user_ex(0, &sc->gs);
+		put_user_ex(0, &sc->fs);
 #endif /* CONFIG_X86_32 */
 
-	err |= __put_user(fpstate, &sc->fpstate);
+		put_user_ex(fpstate, &sc->fpstate);
 
-	/* non-iBCS2 extensions.. */
-	err |= __put_user(mask, &sc->oldmask);
-	err |= __put_user(current->thread.cr2, &sc->cr2);
+		/* non-iBCS2 extensions.. */
+		put_user_ex(mask, &sc->oldmask);
+		put_user_ex(current->thread.cr2, &sc->cr2);
+	} put_user_catch(err);
 
 	return err;
 }
@@ -189,40 +187,35 @@ setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
 /*
  * Set up a signal frame.
  */
-#ifdef CONFIG_X86_32
-static const struct {
-	u16 poplmovl;
-	u32 val;
-	u16 int80;
-} __attribute__((packed)) retcode = {
-	0xb858,		/* popl %eax; movl $..., %eax */
-	__NR_sigreturn,
-	0x80cd,		/* int $0x80 */
-};
-
-static const struct {
-	u8  movl;
-	u32 val;
-	u16 int80;
-	u8  pad;
-} __attribute__((packed)) rt_retcode = {
-	0xb8,		/* movl $..., %eax */
-	__NR_rt_sigreturn,
-	0x80cd,		/* int $0x80 */
-	0
-};
 
 /*
  * Determine which stack to use..
  */
+static unsigned long align_sigframe(unsigned long sp)
+{
+#ifdef CONFIG_X86_32
+	/*
+	 * Align the stack pointer according to the i386 ABI,
+	 * i.e. so that on function entry ((sp + 4) & 15) == 0.
+	 */
+	sp = ((sp + 4) & -16ul) - 4;
+#else /* !CONFIG_X86_32 */
+	sp = round_down(sp, 16) - 8;
+#endif
+	return sp;
+}
+
 static inline void __user *
 get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
-	     void **fpstate)
+	     void __user **fpstate)
 {
-	unsigned long sp;
-
 	/* Default to using normal stack */
-	sp = regs->sp;
+	unsigned long sp = regs->sp;
+
+#ifdef CONFIG_X86_64
+	/* redzone */
+	sp -= 128;
+#endif /* CONFIG_X86_64 */
 
 	/*
 	 * If we are on the alternate signal stack and would overflow it, don't.
@@ -236,30 +229,52 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 		if (sas_ss_flags(sp) == 0)
 			sp = current->sas_ss_sp + current->sas_ss_size;
 	} else {
+#ifdef CONFIG_X86_32
 		/* This is the legacy signal stack switching. */
 		if ((regs->ss & 0xffff) != __USER_DS &&
 			!(ka->sa.sa_flags & SA_RESTORER) &&
 				ka->sa.sa_restorer)
 			sp = (unsigned long) ka->sa.sa_restorer;
+#endif /* CONFIG_X86_32 */
 	}
 
 	if (used_math()) {
-		sp = sp - sig_xstate_size;
-		*fpstate = (struct _fpstate *) sp;
+		sp -= sig_xstate_size;
+#ifdef CONFIG_X86_64
+		sp = round_down(sp, 64);
+#endif /* CONFIG_X86_64 */
+		*fpstate = (void __user *)sp;
+
 		if (save_i387_xstate(*fpstate) < 0)
 			return (void __user *)-1L;
 	}
 
-	sp -= frame_size;
-	/*
-	 * Align the stack pointer according to the i386 ABI,
-	 * i.e. so that on function entry ((sp + 4) & 15) == 0.
-	 */
-	sp = ((sp + 4) & -16ul) - 4;
-
-	return (void __user *) sp;
+	return (void __user *)align_sigframe(sp - frame_size);
 }
 
+#ifdef CONFIG_X86_32
+static const struct {
+	u16 poplmovl;
+	u32 val;
+	u16 int80;
+} __attribute__((packed)) retcode = {
+	0xb858,		/* popl %eax; movl $..., %eax */
+	__NR_sigreturn,
+	0x80cd,		/* int $0x80 */
+};
+
+static const struct {
+	u8  movl;
+	u32 val;
+	u16 int80;
+	u8  pad;
+} __attribute__((packed)) rt_retcode = {
+	0xb8,		/* movl $..., %eax */
+	__NR_rt_sigreturn,
+	0x80cd,		/* int $0x80 */
+	0
+};
+
 static int
 __setup_frame(int sig, struct k_sigaction *ka, sigset_t *set,
 	      struct pt_regs *regs)
@@ -336,43 +351,41 @@ static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
 		return -EFAULT;
 
-	err |= __put_user(sig, &frame->sig);
-	err |= __put_user(&frame->info, &frame->pinfo);
-	err |= __put_user(&frame->uc, &frame->puc);
-	err |= copy_siginfo_to_user(&frame->info, info);
-	if (err)
-		return -EFAULT;
+	put_user_try {
+		put_user_ex(sig, &frame->sig);
+		put_user_ex(&frame->info, &frame->pinfo);
+		put_user_ex(&frame->uc, &frame->puc);
+		err |= copy_siginfo_to_user(&frame->info, info);
 
-	/* Create the ucontext.  */
-	if (cpu_has_xsave)
-		err |= __put_user(UC_FP_XSTATE, &frame->uc.uc_flags);
-	else
-		err |= __put_user(0, &frame->uc.uc_flags);
-	err |= __put_user(0, &frame->uc.uc_link);
-	err |= __put_user(current->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
-	err |= __put_user(sas_ss_flags(regs->sp),
-			  &frame->uc.uc_stack.ss_flags);
-	err |= __put_user(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
-	err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
-				regs, set->sig[0]);
-	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
-	if (err)
-		return -EFAULT;
-
-	/* Set up to return from userspace.  */
-	restorer = VDSO32_SYMBOL(current->mm->context.vdso, rt_sigreturn);
-	if (ka->sa.sa_flags & SA_RESTORER)
-		restorer = ka->sa.sa_restorer;
-	err |= __put_user(restorer, &frame->pretcode);
+		/* Create the ucontext.  */
+		if (cpu_has_xsave)
+			put_user_ex(UC_FP_XSTATE, &frame->uc.uc_flags);
+		else
+			put_user_ex(0, &frame->uc.uc_flags);
+		put_user_ex(0, &frame->uc.uc_link);
+		put_user_ex(current->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
+		put_user_ex(sas_ss_flags(regs->sp),
+			    &frame->uc.uc_stack.ss_flags);
+		put_user_ex(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
+		err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
+					regs, set->sig[0]);
+		err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
+
+		/* Set up to return from userspace.  */
+		restorer = VDSO32_SYMBOL(current->mm->context.vdso, rt_sigreturn);
+		if (ka->sa.sa_flags & SA_RESTORER)
+			restorer = ka->sa.sa_restorer;
+		put_user_ex(restorer, &frame->pretcode);
 
-	/*
-	 * This is movl $__NR_rt_sigreturn, %ax ; int $0x80
-	 *
-	 * WE DO NOT USE IT ANY MORE! It's only left here for historical
-	 * reasons and because gdb uses it as a signature to notice
-	 * signal handler stack frames.
-	 */
-	err |= __put_user(*((u64 *)&rt_retcode), (u64 *)frame->retcode);
+		/*
+		 * This is movl $__NR_rt_sigreturn, %ax ; int $0x80
+		 *
+		 * WE DO NOT USE IT ANY MORE! It's only left here for historical
+		 * reasons and because gdb uses it as a signature to notice
+		 * signal handler stack frames.
+		 */
+		put_user_ex(*((u64 *)&rt_retcode), (u64 *)frame->retcode);
+	} put_user_catch(err);
 
 	if (err)
 		return -EFAULT;
@@ -392,24 +405,6 @@ static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	return 0;
 }
 #else /* !CONFIG_X86_32 */
-/*
- * Determine which stack to use..
- */
-static void __user *
-get_stack(struct k_sigaction *ka, unsigned long sp, unsigned long size)
-{
-	/* Default to using normal stack - redzone*/
-	sp -= 128;
-
-	/* This is the X/Open sanctioned signal stack switching.  */
-	if (ka->sa.sa_flags & SA_ONSTACK) {
-		if (sas_ss_flags(sp) == 0)
-			sp = current->sas_ss_sp + current->sas_ss_size;
-	}
-
-	return (void __user *)round_down(sp - size, 64);
-}
-
 static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 			    sigset_t *set, struct pt_regs *regs)
 {
@@ -418,15 +413,7 @@ static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 	int err = 0;
 	struct task_struct *me = current;
 
-	if (used_math()) {
-		fp = get_stack(ka, regs->sp, sig_xstate_size);
-		frame = (void __user *)round_down(
-			(unsigned long)fp - sizeof(struct rt_sigframe), 16) - 8;
-
-		if (save_i387_xstate(fp) < 0)
-			return -EFAULT;
-	} else
-		frame = get_stack(ka, regs->sp, sizeof(struct rt_sigframe)) - 8;
+	frame = get_sigframe(ka, regs, sizeof(struct rt_sigframe), &fp);
 
 	if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
 		return -EFAULT;
@@ -436,28 +423,30 @@ static int __setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 			return -EFAULT;
 	}
 
-	/* Create the ucontext.  */
-	if (cpu_has_xsave)
-		err |= __put_user(UC_FP_XSTATE, &frame->uc.uc_flags);
-	else
-		err |= __put_user(0, &frame->uc.uc_flags);
-	err |= __put_user(0, &frame->uc.uc_link);
-	err |= __put_user(me->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
-	err |= __put_user(sas_ss_flags(regs->sp),
-			  &frame->uc.uc_stack.ss_flags);
-	err |= __put_user(me->sas_ss_size, &frame->uc.uc_stack.ss_size);
-	err |= setup_sigcontext(&frame->uc.uc_mcontext, fp, regs, set->sig[0]);
-	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
-
-	/* Set up to return from userspace.  If provided, use a stub
-	   already in userspace.  */
-	/* x86-64 should always use SA_RESTORER. */
-	if (ka->sa.sa_flags & SA_RESTORER) {
-		err |= __put_user(ka->sa.sa_restorer, &frame->pretcode);
-	} else {
-		/* could use a vstub here */
-		return -EFAULT;
-	}
+	put_user_try {
+		/* Create the ucontext.  */
+		if (cpu_has_xsave)
+			put_user_ex(UC_FP_XSTATE, &frame->uc.uc_flags);
+		else
+			put_user_ex(0, &frame->uc.uc_flags);
+		put_user_ex(0, &frame->uc.uc_link);
+		put_user_ex(me->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
+		put_user_ex(sas_ss_flags(regs->sp),
+			    &frame->uc.uc_stack.ss_flags);
+		put_user_ex(me->sas_ss_size, &frame->uc.uc_stack.ss_size);
+		err |= setup_sigcontext(&frame->uc.uc_mcontext, fp, regs, set->sig[0]);
+		err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
+
+		/* Set up to return from userspace.  If provided, use a stub
+		   already in userspace.  */
+		/* x86-64 should always use SA_RESTORER. */
+		if (ka->sa.sa_flags & SA_RESTORER) {
+			put_user_ex(ka->sa.sa_restorer, &frame->pretcode);
+		} else {
+			/* could use a vstub here */
+			err |= -EFAULT;
+		}
+	} put_user_catch(err);
 
 	if (err)
 		return -EFAULT;
@@ -509,31 +498,41 @@ sys_sigaction(int sig, const struct old_sigaction __user *act,
 	      struct old_sigaction __user *oact)
 {
 	struct k_sigaction new_ka, old_ka;
-	int ret;
+	int ret = 0;
 
 	if (act) {
 		old_sigset_t mask;
 
-		if (!access_ok(VERIFY_READ, act, sizeof(*act)) ||
-		    __get_user(new_ka.sa.sa_handler, &act->sa_handler) ||
-		    __get_user(new_ka.sa.sa_restorer, &act->sa_restorer))
+		if (!access_ok(VERIFY_READ, act, sizeof(*act)))
 			return -EFAULT;
 
-		__get_user(new_ka.sa.sa_flags, &act->sa_flags);
-		__get_user(mask, &act->sa_mask);
+		get_user_try {
+			get_user_ex(new_ka.sa.sa_handler, &act->sa_handler);
+			get_user_ex(new_ka.sa.sa_flags, &act->sa_flags);
+			get_user_ex(mask, &act->sa_mask);
+			get_user_ex(new_ka.sa.sa_restorer, &act->sa_restorer);
+		} get_user_catch(ret);
+
+		if (ret)
+			return -EFAULT;
 		siginitset(&new_ka.sa.sa_mask, mask);
 	}
 
 	ret = do_sigaction(sig, act ? &new_ka : NULL, oact ? &old_ka : NULL);
 
 	if (!ret && oact) {
-		if (!access_ok(VERIFY_WRITE, oact, sizeof(*oact)) ||
-		    __put_user(old_ka.sa.sa_handler, &oact->sa_handler) ||
-		    __put_user(old_ka.sa.sa_restorer, &oact->sa_restorer))
+		if (!access_ok(VERIFY_WRITE, oact, sizeof(*oact)))
 			return -EFAULT;
 
-		__put_user(old_ka.sa.sa_flags, &oact->sa_flags);
-		__put_user(old_ka.sa.sa_mask.sig[0], &oact->sa_mask);
+		put_user_try {
+			put_user_ex(old_ka.sa.sa_handler, &oact->sa_handler);
+			put_user_ex(old_ka.sa.sa_flags, &oact->sa_flags);
+			put_user_ex(old_ka.sa.sa_mask.sig[0], &oact->sa_mask);
+			put_user_ex(old_ka.sa.sa_restorer, &oact->sa_restorer);
+		} put_user_catch(ret);
+
+		if (ret)
+			return -EFAULT;
 	}
 
 	return ret;
@@ -541,14 +540,9 @@ sys_sigaction(int sig, const struct old_sigaction __user *act,
 #endif /* CONFIG_X86_32 */
 
 #ifdef CONFIG_X86_32
-asmlinkage int sys_sigaltstack(unsigned long bx)
+int sys_sigaltstack(struct pt_regs *regs)
 {
-	/*
-	 * This is needed to make gcc realize it doesn't own the
-	 * "struct pt_regs"
-	 */
-	struct pt_regs *regs = (struct pt_regs *)&bx;
-	const stack_t __user *uss = (const stack_t __user *)bx;
+	const stack_t __user *uss = (const stack_t __user *)regs->bx;
 	stack_t __user *uoss = (stack_t __user *)regs->cx;
 
 	return do_sigaltstack(uss, uoss, regs->sp);
@@ -566,14 +560,12 @@ sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss,
  * Do a signal return; undo the signal stack.
  */
 #ifdef CONFIG_X86_32
-asmlinkage unsigned long sys_sigreturn(unsigned long __unused)
+unsigned long sys_sigreturn(struct pt_regs *regs)
 {
 	struct sigframe __user *frame;
-	struct pt_regs *regs;
 	unsigned long ax;
 	sigset_t set;
 
-	regs = (struct pt_regs *) &__unused;
 	frame = (struct sigframe __user *)(regs->sp - 8);
 
 	if (!access_ok(VERIFY_READ, frame, sizeof(*frame)))
@@ -600,7 +592,7 @@ badframe:
 }
 #endif /* CONFIG_X86_32 */
 
-static long do_rt_sigreturn(struct pt_regs *regs)
+long sys_rt_sigreturn(struct pt_regs *regs)
 {
 	struct rt_sigframe __user *frame;
 	unsigned long ax;
@@ -631,25 +623,6 @@ badframe:
 	return 0;
 }
 
-#ifdef CONFIG_X86_32
-/*
- * Note: do not pass in pt_regs directly as with tail-call optimization
- * GCC will incorrectly stomp on the caller's frame and corrupt user-space
- * register state:
- */
-asmlinkage int sys_rt_sigreturn(unsigned long __unused)
-{
-	struct pt_regs *regs = (struct pt_regs *)&__unused;
-
-	return do_rt_sigreturn(regs);
-}
-#else /* !CONFIG_X86_32 */
-asmlinkage long sys_rt_sigreturn(struct pt_regs *regs)
-{
-	return do_rt_sigreturn(regs);
-}
-#endif /* CONFIG_X86_32 */
-
 /*
  * OK, we're invoking a handler:
  */
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index e6faa3316bd2..13f33ea8ccaa 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -2,7 +2,7 @@
  *	Intel SMP support routines.
  *
  *	(c) 1995 Alan Cox, Building #3 <alan@lxorguk.ukuu.org.uk>
- *	(c) 1998-99, 2000 Ingo Molnar <mingo@redhat.com>
+ *	(c) 1998-99, 2000, 2009 Ingo Molnar <mingo@redhat.com>
  *      (c) 2002,2003 Andi Kleen, SuSE Labs.
  *
  *	i386 and x86_64 integration by Glauber Costa <gcosta@redhat.com>
@@ -26,8 +26,7 @@
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
 #include <asm/proto.h>
-#include <mach_ipi.h>
-#include <mach_apic.h>
+#include <asm/apic.h>
 /*
  *	Some notes on x86 processor bugs affecting SMP operation:
  *
@@ -118,12 +117,12 @@ static void native_smp_send_reschedule(int cpu)
 		WARN_ON(1);
 		return;
 	}
-	send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
+	apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
 }
 
 void native_send_call_func_single_ipi(int cpu)
 {
-	send_IPI_mask(cpumask_of(cpu), CALL_FUNCTION_SINGLE_VECTOR);
+	apic->send_IPI_mask(cpumask_of(cpu), CALL_FUNCTION_SINGLE_VECTOR);
 }
 
 void native_send_call_func_ipi(const struct cpumask *mask)
@@ -131,7 +130,7 @@ void native_send_call_func_ipi(const struct cpumask *mask)
 	cpumask_var_t allbutself;
 
 	if (!alloc_cpumask_var(&allbutself, GFP_ATOMIC)) {
-		send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
 		return;
 	}
 
@@ -140,9 +139,9 @@ void native_send_call_func_ipi(const struct cpumask *mask)
 
 	if (cpumask_equal(mask, allbutself) &&
 	    cpumask_equal(cpu_online_mask, cpu_callout_mask))
-		send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+		apic->send_IPI_allbutself(CALL_FUNCTION_VECTOR);
 	else
-		send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
+		apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
 
 	free_cpumask_var(allbutself);
 }
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bb1a3b1fc87f..249334f5080a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -2,7 +2,7 @@
  *	x86 SMP booting functions
  *
  *	(c) 1995 Alan Cox, Building #3 <alan@lxorguk.ukuu.org.uk>
- *	(c) 1998, 1999, 2000 Ingo Molnar <mingo@redhat.com>
+ *	(c) 1998, 1999, 2000, 2009 Ingo Molnar <mingo@redhat.com>
  *	Copyright 2001 Andi Kleen, SuSE Labs.
  *
  *	Much of the core SMP work is based on previous work by Thomas Radke, to
@@ -53,7 +53,6 @@
 #include <asm/nmi.h>
 #include <asm/irq.h>
 #include <asm/idle.h>
-#include <asm/smp.h>
 #include <asm/trampoline.h>
 #include <asm/cpu.h>
 #include <asm/numa.h>
@@ -61,13 +60,12 @@
 #include <asm/tlbflush.h>
 #include <asm/mtrr.h>
 #include <asm/vmi.h>
-#include <asm/genapic.h>
+#include <asm/apic.h>
 #include <asm/setup.h>
+#include <asm/uv/uv.h>
 #include <linux/mc146818rtc.h>
 
-#include <mach_apic.h>
-#include <mach_wakecpu.h>
-#include <smpboot_hooks.h>
+#include <asm/smpboot_hooks.h>
 
 #ifdef CONFIG_X86_32
 u8 apicid_2_node[MAX_APICID];
@@ -114,7 +112,7 @@ EXPORT_PER_CPU_SYMBOL(cpu_core_map);
 DEFINE_PER_CPU_SHARED_ALIGNED(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
-static atomic_t init_deasserted;
+atomic_t init_deasserted;
 
 
 /* Set if we find a B stepping CPU */
@@ -163,7 +161,7 @@ static void map_cpu_to_logical_apicid(void)
 {
 	int cpu = smp_processor_id();
 	int apicid = logical_smp_processor_id();
-	int node = apicid_to_node(apicid);
+	int node = apic->apicid_to_node(apicid);
 
 	if (!node_online(node))
 		node = first_online_node;
@@ -196,7 +194,8 @@ static void __cpuinit smp_callin(void)
 	 * our local APIC.  We have to wait for the IPI or we'll
 	 * lock up on an APIC access.
 	 */
-	wait_for_init_deassert(&init_deasserted);
+	if (apic->wait_for_init_deassert)
+		apic->wait_for_init_deassert(&init_deasserted);
 
 	/*
 	 * (This works even if the APIC is not enabled.)
@@ -243,7 +242,8 @@ static void __cpuinit smp_callin(void)
 	 */
 
 	pr_debug("CALLIN, before setup_local_APIC().\n");
-	smp_callin_clear_local_apic();
+	if (apic->smp_callin_clear_local_apic)
+		apic->smp_callin_clear_local_apic();
 	setup_local_APIC();
 	end_local_APIC_setup();
 	map_cpu_to_logical_apicid();
@@ -583,7 +583,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
 	/* Target chip */
 	/* Boot on the stack */
 	/* Kick the second */
-	apic_icr_write(APIC_DM_NMI | APIC_DEST_LOGICAL, logical_apicid);
+	apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid);
 
 	pr_debug("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -614,12 +614,6 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
 	unsigned long send_status, accept_status = 0;
 	int maxlvt, num_starts, j;
 
-	if (get_uv_system_type() == UV_NON_UNIQUE_APIC) {
-		send_status = uv_wakeup_secondary(phys_apicid, start_eip);
-		atomic_set(&init_deasserted, 1);
-		return send_status;
-	}
-
 	maxlvt = lapic_get_maxlvt();
 
 	/*
@@ -745,78 +739,23 @@ static void __cpuinit do_fork_idle(struct work_struct *work)
 	complete(&c_idle->done);
 }
 
-#ifdef CONFIG_X86_64
-
-/* __ref because it's safe to call free_bootmem when after_bootmem == 0. */
-static void __ref free_bootmem_pda(struct x8664_pda *oldpda)
-{
-	if (!after_bootmem)
-		free_bootmem((unsigned long)oldpda, sizeof(*oldpda));
-}
-
-/*
- * Allocate node local memory for the AP pda.
- *
- * Must be called after the _cpu_pda pointer table is initialized.
- */
-int __cpuinit get_local_pda(int cpu)
-{
-	struct x8664_pda *oldpda, *newpda;
-	unsigned long size = sizeof(struct x8664_pda);
-	int node = cpu_to_node(cpu);
-
-	if (cpu_pda(cpu) && !cpu_pda(cpu)->in_bootmem)
-		return 0;
-
-	oldpda = cpu_pda(cpu);
-	newpda = kmalloc_node(size, GFP_ATOMIC, node);
-	if (!newpda) {
-		printk(KERN_ERR "Could not allocate node local PDA "
-			"for CPU %d on node %d\n", cpu, node);
-
-		if (oldpda)
-			return 0;	/* have a usable pda */
-		else
-			return -1;
-	}
-
-	if (oldpda) {
-		memcpy(newpda, oldpda, size);
-		free_bootmem_pda(oldpda);
-	}
-
-	newpda->in_bootmem = 0;
-	cpu_pda(cpu) = newpda;
-	return 0;
-}
-#endif /* CONFIG_X86_64 */
-
-static int __cpuinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
- * Returns zero if CPU booted OK, else error code from wakeup_secondary_cpu.
+ * Returns zero if CPU booted OK, else error code from
+ * ->wakeup_secondary_cpu.
  */
+static int __cpuinit do_boot_cpu(int apicid, int cpu)
 {
 	unsigned long boot_error = 0;
-	int timeout;
 	unsigned long start_ip;
-	unsigned short nmi_high = 0, nmi_low = 0;
+	int timeout;
 	struct create_idle c_idle = {
-		.cpu = cpu,
-		.done = COMPLETION_INITIALIZER_ONSTACK(c_idle.done),
+		.cpu	= cpu,
+		.done	= COMPLETION_INITIALIZER_ONSTACK(c_idle.done),
 	};
-	INIT_WORK(&c_idle.work, do_fork_idle);
 
-#ifdef CONFIG_X86_64
-	/* Allocate node local memory for AP pdas */
-	if (cpu > 0) {
-		boot_error = get_local_pda(cpu);
-		if (boot_error)
-			goto restore_state;
-			/* if can't get pda memory, can't start cpu */
-	}
-#endif
+	INIT_WORK(&c_idle.work, do_fork_idle);
 
 	alternatives_smp_switch(1);
 
@@ -847,14 +786,16 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu)
 
 	set_idle_for_cpu(cpu, c_idle.idle);
 do_rest:
-#ifdef CONFIG_X86_32
 	per_cpu(current_task, cpu) = c_idle.idle;
-	init_gdt(cpu);
+#ifdef CONFIG_X86_32
 	/* Stack for startup_32 can be just as for start_secondary onwards */
 	irq_ctx_init(cpu);
 #else
-	cpu_pda(cpu)->pcurrent = c_idle.idle;
 	clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
+	initial_gs = per_cpu_offset(cpu);
+	per_cpu(kernel_stack, cpu) =
+		(unsigned long)task_stack_page(c_idle.idle) -
+		KERNEL_STACK_OFFSET + THREAD_SIZE;
 #endif
 	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
 	initial_code = (unsigned long)start_secondary;
@@ -878,8 +819,6 @@ do_rest:
 
 		pr_debug("Setting warm reset code and vector.\n");
 
-		store_NMI_vector(&nmi_high, &nmi_low);
-
 		smpboot_setup_warm_reset_vector(start_ip);
 		/*
 		 * Be paranoid about clearing APIC errors.
@@ -891,9 +830,13 @@ do_rest:
 	}
 
 	/*
-	 * Starting actual IPI sequence...
+	 * Kick the secondary CPU. Use the method in the APIC driver
+	 * if it's defined - or use an INIT boot APIC message otherwise:
 	 */
-	boot_error = wakeup_secondary_cpu(apicid, start_ip);
+	if (apic->wakeup_secondary_cpu)
+		boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
+	else
+		boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
 
 	if (!boot_error) {
 		/*
@@ -927,13 +870,11 @@ do_rest:
 			else
 				/* trampoline code not run */
 				printk(KERN_ERR "Not responding.\n");
-			if (get_uv_system_type() != UV_NON_UNIQUE_APIC)
-				inquire_remote_apic(apicid);
+			if (apic->inquire_remote_apic)
+				apic->inquire_remote_apic(apicid);
 		}
 	}
-#ifdef CONFIG_X86_64
-restore_state:
-#endif
+
 	if (boot_error) {
 		/* Try to put things back the way they were before ... */
 		numa_remove_cpu(cpu); /* was set by numa_add_cpu */
@@ -961,7 +902,7 @@ restore_state:
 
 int __cpuinit native_cpu_up(unsigned int cpu)
 {
-	int apicid = cpu_present_to_apicid(cpu);
+	int apicid = apic->cpu_present_to_apicid(cpu);
 	unsigned long flags;
 	int err;
 
@@ -1054,14 +995,14 @@ static int __init smp_sanity_check(unsigned max_cpus)
 {
 	preempt_disable();
 
-#if defined(CONFIG_X86_PC) && defined(CONFIG_X86_32)
+#if !defined(CONFIG_X86_BIGSMP) && defined(CONFIG_X86_32)
 	if (def_to_bigsmp && nr_cpu_ids > 8) {
 		unsigned int cpu;
 		unsigned nr;
 
 		printk(KERN_WARNING
 		       "More than 8 CPUs detected - skipping them.\n"
-		       "Use CONFIG_X86_GENERICARCH and CONFIG_X86_BIGSMP.\n");
+		       "Use CONFIG_X86_BIGSMP.\n");
 
 		nr = 0;
 		for_each_present_cpu(cpu) {
@@ -1107,7 +1048,7 @@ static int __init smp_sanity_check(unsigned max_cpus)
 	 * Should not be necessary because the MP table should list the boot
 	 * CPU too, but we do it for the sake of robustness anyway.
 	 */
-	if (!check_phys_apicid_present(boot_cpu_physical_apicid)) {
+	if (!apic->check_phys_apicid_present(boot_cpu_physical_apicid)) {
 		printk(KERN_NOTICE
 			"weird, boot CPU (#%d) not listed by the BIOS.\n",
 			boot_cpu_physical_apicid);
@@ -1125,6 +1066,7 @@ static int __init smp_sanity_check(unsigned max_cpus)
 		printk(KERN_ERR "... forcing use of dummy APIC emulation."
 				"(tell your hw vendor)\n");
 		smpboot_clear_io_apic();
+		arch_disable_smp_support();
 		return -1;
 	}
 
@@ -1181,9 +1123,9 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	current_thread_info()->cpu = 0;  /* needed? */
 	set_cpu_sibling_map(0);
 
-#ifdef CONFIG_X86_64
 	enable_IR_x2apic();
-	setup_apic_routing();
+#ifdef CONFIG_X86_64
+	default_setup_apic_routing();
 #endif
 
 	if (smp_sanity_check(max_cpus) < 0) {
@@ -1207,18 +1149,18 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	 */
 	setup_local_APIC();
 
-#ifdef CONFIG_X86_64
 	/*
 	 * Enable IO APIC before setting up error vector
 	 */
 	if (!skip_ioapic_setup && nr_ioapics)
 		enable_IO_APIC();
-#endif
+
 	end_local_APIC_setup();
 
 	map_cpu_to_logical_apicid();
 
-	setup_portio_remap();
+	if (apic->setup_portio_remap)
+		apic->setup_portio_remap();
 
 	smpboot_setup_io_apic();
 	/*
@@ -1240,10 +1182,7 @@ out:
 void __init native_smp_prepare_boot_cpu(void)
 {
 	int me = smp_processor_id();
-#ifdef CONFIG_X86_32
-	init_gdt(me);
-#endif
-	switch_to_new_gdt();
+	switch_to_new_gdt(me);
 	/* already set me in cpu_online_mask in boot_cpu_init() */
 	cpumask_set_cpu(me, cpu_callout_mask);
 	per_cpu(cpu_state, me) = CPU_ONLINE;
diff --git a/arch/x86/kernel/smpcommon.c b/arch/x86/kernel/smpcommon.c
deleted file mode 100644
index 397e309839dd..000000000000
--- a/arch/x86/kernel/smpcommon.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/*
- * SMP stuff which is common to all sub-architectures.
- */
-#include <linux/module.h>
-#include <asm/smp.h>
-
-#ifdef CONFIG_X86_32
-DEFINE_PER_CPU(unsigned long, this_cpu_off);
-EXPORT_PER_CPU_SYMBOL(this_cpu_off);
-
-/*
- * Initialize the CPU's GDT.  This is either the boot CPU doing itself
- * (still using the master per-cpu area), or a CPU doing it for a
- * secondary which will soon come up.
- */
-__cpuinit void init_gdt(int cpu)
-{
-	struct desc_struct gdt;
-
-	pack_descriptor(&gdt, __per_cpu_offset[cpu], 0xFFFFF,
-			0x2 | DESCTYPE_S, 0x8);
-	gdt.s = 1;
-
-	write_gdt_entry(get_cpu_gdt_table(cpu),
-			GDT_ENTRY_PERCPU, &gdt, DESCTYPE_S);
-
-	per_cpu(this_cpu_off, cpu) = __per_cpu_offset[cpu];
-	per_cpu(cpu_number, cpu) = cpu;
-}
-#endif
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 10786af95545..f7bddc2e37d1 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -1,7 +1,7 @@
 /*
  * Stack trace management functions
  *
- *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *  Copyright (C) 2006-2009 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
  */
 #include <linux/sched.h>
 #include <linux/stacktrace.h>
diff --git a/arch/x86/kernel/summit_32.c b/arch/x86/kernel/summit_32.c
deleted file mode 100644
index 7b987852e876..000000000000
--- a/arch/x86/kernel/summit_32.c
+++ /dev/null
@@ -1,188 +0,0 @@
-/*
- * IBM Summit-Specific Code
- *
- * Written By: Matthew Dobson, IBM Corporation
- *
- * Copyright (c) 2003 IBM Corp.
- *
- * All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or (at
- * your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT.  See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send feedback to <colpatch@us.ibm.com>
- *
- */
-
-#include <linux/mm.h>
-#include <linux/init.h>
-#include <asm/io.h>
-#include <asm/bios_ebda.h>
-#include <asm/summit/mpparse.h>
-
-static struct rio_table_hdr *rio_table_hdr __initdata;
-static struct scal_detail   *scal_devs[MAX_NUMNODES] __initdata;
-static struct rio_detail    *rio_devs[MAX_NUMNODES*4] __initdata;
-
-#ifndef CONFIG_X86_NUMAQ
-static int mp_bus_id_to_node[MAX_MP_BUSSES] __initdata;
-#endif
-
-static int __init setup_pci_node_map_for_wpeg(int wpeg_num, int last_bus)
-{
-	int twister = 0, node = 0;
-	int i, bus, num_buses;
-
-	for (i = 0; i < rio_table_hdr->num_rio_dev; i++) {
-		if (rio_devs[i]->node_id == rio_devs[wpeg_num]->owner_id) {
-			twister = rio_devs[i]->owner_id;
-			break;
-		}
-	}
-	if (i == rio_table_hdr->num_rio_dev) {
-		printk(KERN_ERR "%s: Couldn't find owner Cyclone for Winnipeg!\n", __func__);
-		return last_bus;
-	}
-
-	for (i = 0; i < rio_table_hdr->num_scal_dev; i++) {
-		if (scal_devs[i]->node_id == twister) {
-			node = scal_devs[i]->node_id;
-			break;
-		}
-	}
-	if (i == rio_table_hdr->num_scal_dev) {
-		printk(KERN_ERR "%s: Couldn't find owner Twister for Cyclone!\n", __func__);
-		return last_bus;
-	}
-
-	switch (rio_devs[wpeg_num]->type) {
-	case CompatWPEG:
-		/*
-		 * The Compatibility Winnipeg controls the 2 legacy buses,
-		 * the 66MHz PCI bus [2 slots] and the 2 "extra" buses in case
-		 * a PCI-PCI bridge card is used in either slot: total 5 buses.
-		 */
-		num_buses = 5;
-		break;
-	case AltWPEG:
-		/*
-		 * The Alternate Winnipeg controls the 2 133MHz buses [1 slot
-		 * each], their 2 "extra" buses, the 100MHz bus [2 slots] and
-		 * the "extra" buses for each of those slots: total 7 buses.
-		 */
-		num_buses = 7;
-		break;
-	case LookOutAWPEG:
-	case LookOutBWPEG:
-		/*
-		 * A Lookout Winnipeg controls 3 100MHz buses [2 slots each]
-		 * & the "extra" buses for each of those slots: total 9 buses.
-		 */
-		num_buses = 9;
-		break;
-	default:
-		printk(KERN_INFO "%s: Unsupported Winnipeg type!\n", __func__);
-		return last_bus;
-	}
-
-	for (bus = last_bus; bus < last_bus + num_buses; bus++)
-		mp_bus_id_to_node[bus] = node;
-	return bus;
-}
-
-static int __init build_detail_arrays(void)
-{
-	unsigned long ptr;
-	int i, scal_detail_size, rio_detail_size;
-
-	if (rio_table_hdr->num_scal_dev > MAX_NUMNODES) {
-		printk(KERN_WARNING "%s: MAX_NUMNODES too low!  Defined as %d, but system has %d nodes.\n", __func__, MAX_NUMNODES, rio_table_hdr->num_scal_dev);
-		return 0;
-	}
-
-	switch (rio_table_hdr->version) {
-	default:
-		printk(KERN_WARNING "%s: Invalid Rio Grande Table Version: %d\n", __func__, rio_table_hdr->version);
-		return 0;
-	case 2:
-		scal_detail_size = 11;
-		rio_detail_size = 13;
-		break;
-	case 3:
-		scal_detail_size = 12;
-		rio_detail_size = 15;
-		break;
-	}
-
-	ptr = (unsigned long)rio_table_hdr + 3;
-	for (i = 0; i < rio_table_hdr->num_scal_dev; i++, ptr += scal_detail_size)
-		scal_devs[i] = (struct scal_detail *)ptr;
-
-	for (i = 0; i < rio_table_hdr->num_rio_dev; i++, ptr += rio_detail_size)
-		rio_devs[i] = (struct rio_detail *)ptr;
-
-	return 1;
-}
-
-void __init setup_summit(void)
-{
-	unsigned long		ptr;
-	unsigned short		offset;
-	int			i, next_wpeg, next_bus = 0;
-
-	/* The pointer to the EBDA is stored in the word @ phys 0x40E(40:0E) */
-	ptr = get_bios_ebda();
-	ptr = (unsigned long)phys_to_virt(ptr);
-
-	rio_table_hdr = NULL;
-	offset = 0x180;
-	while (offset) {
-		/* The block id is stored in the 2nd word */
-		if (*((unsigned short *)(ptr + offset + 2)) == 0x4752) {
-			/* set the pointer past the offset & block id */
-			rio_table_hdr = (struct rio_table_hdr *)(ptr + offset + 4);
-			break;
-		}
-		/* The next offset is stored in the 1st word.  0 means no more */
-		offset = *((unsigned short *)(ptr + offset));
-	}
-	if (!rio_table_hdr) {
-		printk(KERN_ERR "%s: Unable to locate Rio Grande Table in EBDA - bailing!\n", __func__);
-		return;
-	}
-
-	if (!build_detail_arrays())
-		return;
-
-	/* The first Winnipeg we're looking for has an index of 0 */
-	next_wpeg = 0;
-	do {
-		for (i = 0; i < rio_table_hdr->num_rio_dev; i++) {
-			if (is_WPEG(rio_devs[i]) && rio_devs[i]->WP_index == next_wpeg) {
-				/* It's the Winnipeg we're looking for! */
-				next_bus = setup_pci_node_map_for_wpeg(i, next_bus);
-				next_wpeg++;
-				break;
-			}
-		}
-		/*
-		 * If we go through all Rio devices and don't find one with
-		 * the next index, it means we've found all the Winnipegs,
-		 * and thus all the PCI buses.
-		 */
-		if (i == rio_table_hdr->num_rio_dev)
-			next_wpeg = 0;
-	} while (next_wpeg != 0);
-}
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index e2e86a08f31d..3bdb64829b82 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -1,7 +1,7 @@
 ENTRY(sys_call_table)
 	.long sys_restart_syscall	/* 0 - old "setup()" system call, used for restarting */
 	.long sys_exit
-	.long sys_fork
+	.long ptregs_fork
 	.long sys_read
 	.long sys_write
 	.long sys_open		/* 5 */
@@ -10,7 +10,7 @@ ENTRY(sys_call_table)
 	.long sys_creat
 	.long sys_link
 	.long sys_unlink	/* 10 */
-	.long sys_execve
+	.long ptregs_execve
 	.long sys_chdir
 	.long sys_time
 	.long sys_mknod
@@ -109,17 +109,17 @@ ENTRY(sys_call_table)
 	.long sys_newlstat
 	.long sys_newfstat
 	.long sys_uname
-	.long sys_iopl		/* 110 */
+	.long ptregs_iopl	/* 110 */
 	.long sys_vhangup
 	.long sys_ni_syscall	/* old "idle" system call */
-	.long sys_vm86old
+	.long ptregs_vm86old
 	.long sys_wait4
 	.long sys_swapoff	/* 115 */
 	.long sys_sysinfo
 	.long sys_ipc
 	.long sys_fsync
-	.long sys_sigreturn
-	.long sys_clone		/* 120 */
+	.long ptregs_sigreturn
+	.long ptregs_clone	/* 120 */
 	.long sys_setdomainname
 	.long sys_newuname
 	.long sys_modify_ldt
@@ -165,14 +165,14 @@ ENTRY(sys_call_table)
 	.long sys_mremap
 	.long sys_setresuid16
 	.long sys_getresuid16	/* 165 */
-	.long sys_vm86
+	.long ptregs_vm86
 	.long sys_ni_syscall	/* Old sys_query_module */
 	.long sys_poll
 	.long sys_nfsservctl
 	.long sys_setresgid16	/* 170 */
 	.long sys_getresgid16
 	.long sys_prctl
-	.long sys_rt_sigreturn
+	.long ptregs_rt_sigreturn
 	.long sys_rt_sigaction
 	.long sys_rt_sigprocmask	/* 175 */
 	.long sys_rt_sigpending
@@ -185,11 +185,11 @@ ENTRY(sys_call_table)
 	.long sys_getcwd
 	.long sys_capget
 	.long sys_capset	/* 185 */
-	.long sys_sigaltstack
+	.long ptregs_sigaltstack
 	.long sys_sendfile
 	.long sys_ni_syscall	/* reserved for streams1 */
 	.long sys_ni_syscall	/* reserved for streams2 */
-	.long sys_vfork		/* 190 */
+	.long ptregs_vfork	/* 190 */
 	.long sys_getrlimit
 	.long sys_mmap2
 	.long sys_truncate64
diff --git a/arch/x86/kernel/time_32.c b/arch/x86/kernel/time_32.c
index 3985cac0ed47..5c5d87f0b2e1 100644
--- a/arch/x86/kernel/time_32.c
+++ b/arch/x86/kernel/time_32.c
@@ -33,12 +33,12 @@
 #include <linux/time.h>
 #include <linux/mca.h>
 
-#include <asm/arch_hooks.h>
+#include <asm/setup.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
 #include <asm/timer.h>
 
-#include "do_timer.h"
+#include <asm/do_timer.h>
 
 int timer_ack;
 
@@ -118,7 +118,7 @@ void __init hpet_time_init(void)
 {
 	if (!hpet_enable())
 		setup_pit_timer();
-	time_init_hook();
+	x86_quirk_time_init();
 }
 
 /*
@@ -131,7 +131,7 @@ void __init hpet_time_init(void)
  */
 void __init time_init(void)
 {
-	pre_time_init_hook();
+	x86_quirk_pre_time_init();
 	tsc_init();
 	late_time_init = choose_time_init();
 }
diff --git a/arch/x86/kernel/tlb_32.c b/arch/x86/kernel/tlb_32.c
deleted file mode 100644
index ce5054642247..000000000000
--- a/arch/x86/kernel/tlb_32.c
+++ /dev/null
@@ -1,256 +0,0 @@
-#include <linux/spinlock.h>
-#include <linux/cpu.h>
-#include <linux/interrupt.h>
-
-#include <asm/tlbflush.h>
-
-DEFINE_PER_CPU(struct tlb_state, cpu_tlbstate)
-			____cacheline_aligned = { &init_mm, 0, };
-
-/* must come after the send_IPI functions above for inlining */
-#include <mach_ipi.h>
-
-/*
- *	Smarter SMP flushing macros.
- *		c/o Linus Torvalds.
- *
- *	These mean you can really definitely utterly forget about
- *	writing to user space from interrupts. (Its not allowed anyway).
- *
- *	Optimizations Manfred Spraul <manfred@colorfullife.com>
- */
-
-static cpumask_t flush_cpumask;
-static struct mm_struct *flush_mm;
-static unsigned long flush_va;
-static DEFINE_SPINLOCK(tlbstate_lock);
-
-/*
- * We cannot call mmdrop() because we are in interrupt context,
- * instead update mm->cpu_vm_mask.
- *
- * We need to reload %cr3 since the page tables may be going
- * away from under us..
- */
-void leave_mm(int cpu)
-{
-	BUG_ON(x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK);
-	cpu_clear(cpu, x86_read_percpu(cpu_tlbstate.active_mm)->cpu_vm_mask);
-	load_cr3(swapper_pg_dir);
-}
-EXPORT_SYMBOL_GPL(leave_mm);
-
-/*
- *
- * The flush IPI assumes that a thread switch happens in this order:
- * [cpu0: the cpu that switches]
- * 1) switch_mm() either 1a) or 1b)
- * 1a) thread switch to a different mm
- * 1a1) cpu_clear(cpu, old_mm->cpu_vm_mask);
- * 	Stop ipi delivery for the old mm. This is not synchronized with
- * 	the other cpus, but smp_invalidate_interrupt ignore flush ipis
- * 	for the wrong mm, and in the worst case we perform a superfluous
- * 	tlb flush.
- * 1a2) set cpu_tlbstate to TLBSTATE_OK
- * 	Now the smp_invalidate_interrupt won't call leave_mm if cpu0
- *	was in lazy tlb mode.
- * 1a3) update cpu_tlbstate[].active_mm
- * 	Now cpu0 accepts tlb flushes for the new mm.
- * 1a4) cpu_set(cpu, new_mm->cpu_vm_mask);
- * 	Now the other cpus will send tlb flush ipis.
- * 1a4) change cr3.
- * 1b) thread switch without mm change
- *	cpu_tlbstate[].active_mm is correct, cpu0 already handles
- *	flush ipis.
- * 1b1) set cpu_tlbstate to TLBSTATE_OK
- * 1b2) test_and_set the cpu bit in cpu_vm_mask.
- * 	Atomically set the bit [other cpus will start sending flush ipis],
- * 	and test the bit.
- * 1b3) if the bit was 0: leave_mm was called, flush the tlb.
- * 2) switch %%esp, ie current
- *
- * The interrupt must handle 2 special cases:
- * - cr3 is changed before %%esp, ie. it cannot use current->{active_,}mm.
- * - the cpu performs speculative tlb reads, i.e. even if the cpu only
- *   runs in kernel space, the cpu could load tlb entries for user space
- *   pages.
- *
- * The good news is that cpu_tlbstate is local to each cpu, no
- * write/read ordering problems.
- */
-
-/*
- * TLB flush IPI:
- *
- * 1) Flush the tlb entries if the cpu uses the mm that's being flushed.
- * 2) Leave the mm if we are in the lazy tlb mode.
- */
-
-void smp_invalidate_interrupt(struct pt_regs *regs)
-{
-	unsigned long cpu;
-
-	cpu = get_cpu();
-
-	if (!cpu_isset(cpu, flush_cpumask))
-		goto out;
-		/*
-		 * This was a BUG() but until someone can quote me the
-		 * line from the intel manual that guarantees an IPI to
-		 * multiple CPUs is retried _only_ on the erroring CPUs
-		 * its staying as a return
-		 *
-		 * BUG();
-		 */
-
-	if (flush_mm == x86_read_percpu(cpu_tlbstate.active_mm)) {
-		if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_OK) {
-			if (flush_va == TLB_FLUSH_ALL)
-				local_flush_tlb();
-			else
-				__flush_tlb_one(flush_va);
-		} else
-			leave_mm(cpu);
-	}
-	ack_APIC_irq();
-	smp_mb__before_clear_bit();
-	cpu_clear(cpu, flush_cpumask);
-	smp_mb__after_clear_bit();
-out:
-	put_cpu_no_resched();
-	inc_irq_stat(irq_tlb_count);
-}
-
-void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
-			     unsigned long va)
-{
-	cpumask_t cpumask = *cpumaskp;
-
-	/*
-	 * A couple of (to be removed) sanity checks:
-	 *
-	 * - current CPU must not be in mask
-	 * - mask must exist :)
-	 */
-	BUG_ON(cpus_empty(cpumask));
-	BUG_ON(cpu_isset(smp_processor_id(), cpumask));
-	BUG_ON(!mm);
-
-#ifdef CONFIG_HOTPLUG_CPU
-	/* If a CPU which we ran on has gone down, OK. */
-	cpus_and(cpumask, cpumask, cpu_online_map);
-	if (unlikely(cpus_empty(cpumask)))
-		return;
-#endif
-
-	/*
-	 * i'm not happy about this global shared spinlock in the
-	 * MM hot path, but we'll see how contended it is.
-	 * AK: x86-64 has a faster method that could be ported.
-	 */
-	spin_lock(&tlbstate_lock);
-
-	flush_mm = mm;
-	flush_va = va;
-	cpus_or(flush_cpumask, cpumask, flush_cpumask);
-
-	/*
-	 * Make the above memory operations globally visible before
-	 * sending the IPI.
-	 */
-	smp_mb();
-	/*
-	 * We have to send the IPI only to
-	 * CPUs affected.
-	 */
-	send_IPI_mask(&cpumask, INVALIDATE_TLB_VECTOR);
-
-	while (!cpus_empty(flush_cpumask))
-		/* nothing. lockup detection does not belong here */
-		cpu_relax();
-
-	flush_mm = NULL;
-	flush_va = 0;
-	spin_unlock(&tlbstate_lock);
-}
-
-void flush_tlb_current_task(void)
-{
-	struct mm_struct *mm = current->mm;
-	cpumask_t cpu_mask;
-
-	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
-	local_flush_tlb();
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
-	preempt_enable();
-}
-
-void flush_tlb_mm(struct mm_struct *mm)
-{
-	cpumask_t cpu_mask;
-
-	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
-	if (current->active_mm == mm) {
-		if (current->mm)
-			local_flush_tlb();
-		else
-			leave_mm(smp_processor_id());
-	}
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
-
-	preempt_enable();
-}
-
-void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	cpumask_t cpu_mask;
-
-	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
-
-	if (current->active_mm == mm) {
-		if (current->mm)
-			__flush_tlb_one(va);
-		 else
-			leave_mm(smp_processor_id());
-	}
-
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, va);
-
-	preempt_enable();
-}
-EXPORT_SYMBOL(flush_tlb_page);
-
-static void do_flush_tlb_all(void *info)
-{
-	unsigned long cpu = smp_processor_id();
-
-	__flush_tlb_all();
-	if (x86_read_percpu(cpu_tlbstate.state) == TLBSTATE_LAZY)
-		leave_mm(cpu);
-}
-
-void flush_tlb_all(void)
-{
-	on_each_cpu(do_flush_tlb_all, NULL, 1);
-}
-
-void reset_lazy_tlbstate(void)
-{
-	int cpu = raw_smp_processor_id();
-
-	per_cpu(cpu_tlbstate, cpu).state = 0;
-	per_cpu(cpu_tlbstate, cpu).active_mm = &init_mm;
-}
-
diff --git a/arch/x86/kernel/tlb_uv.c b/arch/x86/kernel/tlb_uv.c
index 6812b829ed83..d038b9c45cf8 100644
--- a/arch/x86/kernel/tlb_uv.c
+++ b/arch/x86/kernel/tlb_uv.c
@@ -11,16 +11,15 @@
 #include <linux/kernel.h>
 
 #include <asm/mmu_context.h>
+#include <asm/uv/uv.h>
 #include <asm/uv/uv_mmrs.h>
 #include <asm/uv/uv_hub.h>
 #include <asm/uv/uv_bau.h>
-#include <asm/genapic.h>
+#include <asm/apic.h>
 #include <asm/idle.h>
 #include <asm/tsc.h>
 #include <asm/irq_vectors.h>
 
-#include <mach_apic.h>
-
 static struct bau_control	**uv_bau_table_bases __read_mostly;
 static int			uv_bau_retry_limit __read_mostly;
 
@@ -210,14 +209,15 @@ static int uv_wait_completion(struct bau_desc *bau_desc,
  *
  * Send a broadcast and wait for a broadcast message to complete.
  *
- * The cpumaskp mask contains the cpus the broadcast was sent to.
+ * The flush_mask contains the cpus the broadcast was sent to.
  *
- * Returns 1 if all remote flushing was done. The mask is zeroed.
- * Returns 0 if some remote flushing remains to be done. The mask is left
- * unchanged.
+ * Returns NULL if all remote flushing was done. The mask is zeroed.
+ * Returns @flush_mask if some remote flushing remains to be done. The
+ * mask will have some bits still set.
  */
-int uv_flush_send_and_wait(int cpu, int this_blade, struct bau_desc *bau_desc,
-			   cpumask_t *cpumaskp)
+const struct cpumask *uv_flush_send_and_wait(int cpu, int this_blade,
+					     struct bau_desc *bau_desc,
+					     struct cpumask *flush_mask)
 {
 	int completion_status = 0;
 	int right_shift;
@@ -257,66 +257,74 @@ int uv_flush_send_and_wait(int cpu, int this_blade, struct bau_desc *bau_desc,
 		 * the cpu's, all of which are still in the mask.
 		 */
 		__get_cpu_var(ptcstats).ptc_i++;
-		return 0;
+		return flush_mask;
 	}
 
 	/*
 	 * Success, so clear the remote cpu's from the mask so we don't
 	 * use the IPI method of shootdown on them.
 	 */
-	for_each_cpu_mask(bit, *cpumaskp) {
+	for_each_cpu(bit, flush_mask) {
 		blade = uv_cpu_to_blade_id(bit);
 		if (blade == this_blade)
 			continue;
-		cpu_clear(bit, *cpumaskp);
+		cpumask_clear_cpu(bit, flush_mask);
 	}
-	if (!cpus_empty(*cpumaskp))
-		return 0;
-	return 1;
+	if (!cpumask_empty(flush_mask))
+		return flush_mask;
+	return NULL;
 }
 
 /**
  * uv_flush_tlb_others - globally purge translation cache of a virtual
  * address or all TLB's
- * @cpumaskp: mask of all cpu's in which the address is to be removed
+ * @cpumask: mask of all cpu's in which the address is to be removed
  * @mm: mm_struct containing virtual address range
  * @va: virtual address to be removed (or TLB_FLUSH_ALL for all TLB's on cpu)
+ * @cpu: the current cpu
  *
  * This is the entry point for initiating any UV global TLB shootdown.
  *
  * Purges the translation caches of all specified processors of the given
  * virtual address, or purges all TLB's on specified processors.
  *
- * The caller has derived the cpumaskp from the mm_struct and has subtracted
- * the local cpu from the mask.  This function is called only if there
- * are bits set in the mask. (e.g. flush_tlb_page())
+ * The caller has derived the cpumask from the mm_struct.  This function
+ * is called only if there are bits set in the mask. (e.g. flush_tlb_page())
  *
- * The cpumaskp is converted into a nodemask of the nodes containing
+ * The cpumask is converted into a nodemask of the nodes containing
  * the cpus.
  *
- * Returns 1 if all remote flushing was done.
- * Returns 0 if some remote flushing remains to be done.
+ * Note that this function should be called with preemption disabled.
+ *
+ * Returns NULL if all remote flushing was done.
+ * Returns pointer to cpumask if some remote flushing remains to be
+ * done.  The returned pointer is valid till preemption is re-enabled.
  */
-int uv_flush_tlb_others(cpumask_t *cpumaskp, struct mm_struct *mm,
-			unsigned long va)
+const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
+					  struct mm_struct *mm,
+					  unsigned long va, unsigned int cpu)
 {
+	static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask);
+	struct cpumask *flush_mask = &__get_cpu_var(flush_tlb_mask);
 	int i;
 	int bit;
 	int blade;
-	int cpu;
+	int uv_cpu;
 	int this_blade;
 	int locals = 0;
 	struct bau_desc *bau_desc;
 
-	cpu = uv_blade_processor_id();
+	cpumask_andnot(flush_mask, cpumask, cpumask_of(cpu));
+
+	uv_cpu = uv_blade_processor_id();
 	this_blade = uv_numa_blade_id();
 	bau_desc = __get_cpu_var(bau_control).descriptor_base;
-	bau_desc += UV_ITEMS_PER_DESCRIPTOR * cpu;
+	bau_desc += UV_ITEMS_PER_DESCRIPTOR * uv_cpu;
 
 	bau_nodes_clear(&bau_desc->distribution, UV_DISTRIBUTION_SIZE);
 
 	i = 0;
-	for_each_cpu_mask(bit, *cpumaskp) {
+	for_each_cpu(bit, flush_mask) {
 		blade = uv_cpu_to_blade_id(bit);
 		BUG_ON(blade > (UV_DISTRIBUTION_SIZE - 1));
 		if (blade == this_blade) {
@@ -331,17 +339,17 @@ int uv_flush_tlb_others(cpumask_t *cpumaskp, struct mm_struct *mm,
 		 * no off_node flushing; return status for local node
 		 */
 		if (locals)
-			return 0;
+			return flush_mask;
 		else
-			return 1;
+			return NULL;
 	}
 	__get_cpu_var(ptcstats).requestor++;
 	__get_cpu_var(ptcstats).ntargeted += i;
 
 	bau_desc->payload.address = va;
-	bau_desc->payload.sending_cpu = smp_processor_id();
+	bau_desc->payload.sending_cpu = cpu;
 
-	return uv_flush_send_and_wait(cpu, this_blade, bau_desc, cpumaskp);
+	return uv_flush_send_and_wait(uv_cpu, this_blade, bau_desc, flush_mask);
 }
 
 /*
diff --git a/arch/x86/kernel/trampoline_32.S b/arch/x86/kernel/trampoline_32.S
index d8ccc3c6552f..66d874e5404c 100644
--- a/arch/x86/kernel/trampoline_32.S
+++ b/arch/x86/kernel/trampoline_32.S
@@ -29,7 +29,7 @@
 
 #include <linux/linkage.h>
 #include <asm/segment.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 
 /* We can free up trampoline after bootup if cpu hotplug is not supported. */
 #ifndef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86/kernel/trampoline_64.S b/arch/x86/kernel/trampoline_64.S
index 894293c598db..cddfb8d386b9 100644
--- a/arch/x86/kernel/trampoline_64.S
+++ b/arch/x86/kernel/trampoline_64.S
@@ -25,10 +25,11 @@
  */
 
 #include <linux/linkage.h>
-#include <asm/pgtable.h>
-#include <asm/page.h>
+#include <asm/pgtable_types.h>
+#include <asm/page_types.h>
 #include <asm/msr.h>
 #include <asm/segment.h>
+#include <asm/processor-flags.h>
 
 .section .rodata, "a", @progbits
 
@@ -37,7 +38,7 @@
 ENTRY(trampoline_data)
 r_base = .
 	cli			# We should be safe anyway
-	wbinvd	
+	wbinvd
 	mov	%cs, %ax	# Code and data in the same place
 	mov	%ax, %ds
 	mov	%ax, %es
@@ -73,9 +74,8 @@ r_base = .
 	lidtl	tidt - r_base	# load idt with 0, 0
 	lgdtl	tgdt - r_base	# load gdt with whatever is appropriate
 
-	xor	%ax, %ax
-	inc	%ax		# protected mode (PE) bit
-	lmsw	%ax		# into protected mode
+	mov	$X86_CR0_PE, %ax	# protected mode (PE) bit
+	lmsw	%ax			# into protected mode
 
 	# flush prefetch and jump to startup_32
 	ljmpl	*(startup_32_vector - r_base)
@@ -86,9 +86,8 @@ startup_32:
 	movl	$__KERNEL_DS, %eax	# Initialize the %ds segment register
 	movl	%eax, %ds
 
-	xorl	%eax, %eax
-	btsl	$5, %eax		# Enable PAE mode
-	movl	%eax, %cr4
+	movl	$X86_CR4_PAE, %eax
+	movl	%eax, %cr4		# Enable PAE mode
 
 					# Setup trampoline 4 level pagetables
 	leal	(trampoline_level4_pgt - r_base)(%esi), %eax
@@ -99,9 +98,9 @@ startup_32:
 	xorl	%edx, %edx
 	wrmsr
 
-	xorl	%eax, %eax
-	btsl	$31, %eax		# Enable paging and in turn activate Long Mode
-	btsl	$0, %eax		# Enable protected mode
+	# Enable paging and in turn activate Long Mode
+	# Enable protected mode
+	movl	$(X86_CR0_PG | X86_CR0_PE), %eax
 	movl	%eax, %cr0
 
 	/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a9e7548e1790..a1d288327ff0 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -54,15 +54,14 @@
 #include <asm/desc.h>
 #include <asm/i387.h>
 
-#include <mach_traps.h>
+#include <asm/mach_traps.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/pgalloc.h>
 #include <asm/proto.h>
-#include <asm/pda.h>
 #else
 #include <asm/processor-flags.h>
-#include <asm/arch_hooks.h>
+#include <asm/setup.h>
 #include <asm/traps.h>
 
 #include "cpu/mcheck/mce.h"
@@ -119,47 +118,6 @@ die_if_kernel(const char *str, struct pt_regs *regs, long err)
 	if (!user_mode_vm(regs))
 		die(str, regs, err);
 }
-
-/*
- * Perform the lazy TSS's I/O bitmap copy. If the TSS has an
- * invalid offset set (the LAZY one) and the faulting thread has
- * a valid I/O bitmap pointer, we copy the I/O bitmap in the TSS,
- * we set the offset field correctly and return 1.
- */
-static int lazy_iobitmap_copy(void)
-{
-	struct thread_struct *thread;
-	struct tss_struct *tss;
-	int cpu;
-
-	cpu = get_cpu();
-	tss = &per_cpu(init_tss, cpu);
-	thread = &current->thread;
-
-	if (tss->x86_tss.io_bitmap_base == INVALID_IO_BITMAP_OFFSET_LAZY &&
-	    thread->io_bitmap_ptr) {
-		memcpy(tss->io_bitmap, thread->io_bitmap_ptr,
-		       thread->io_bitmap_max);
-		/*
-		 * If the previously set map was extending to higher ports
-		 * than the current one, pad extra space with 0xff (no access).
-		 */
-		if (thread->io_bitmap_max < tss->io_bitmap_max) {
-			memset((char *) tss->io_bitmap +
-				thread->io_bitmap_max, 0xff,
-				tss->io_bitmap_max - thread->io_bitmap_max);
-		}
-		tss->io_bitmap_max = thread->io_bitmap_max;
-		tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET;
-		tss->io_bitmap_owner = thread;
-		put_cpu();
-
-		return 1;
-	}
-	put_cpu();
-
-	return 0;
-}
 #endif
 
 static void __kprobes
@@ -310,11 +268,6 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	conditional_sti(regs);
 
 #ifdef CONFIG_X86_32
-	if (lazy_iobitmap_copy()) {
-		/* restart the faulting instruction */
-		return;
-	}
-
 	if (regs->flags & X86_VM_MASK)
 		goto gp_in_vm86;
 #endif
@@ -914,19 +867,20 @@ void math_emulate(struct math_emu_info *info)
 }
 #endif /* CONFIG_MATH_EMULATION */
 
-dotraplinkage void __kprobes do_device_not_available(struct pt_regs regs)
+dotraplinkage void __kprobes
+do_device_not_available(struct pt_regs *regs, long error_code)
 {
 #ifdef CONFIG_X86_32
 	if (read_cr0() & X86_CR0_EM) {
 		struct math_emu_info info = { };
 
-		conditional_sti(&regs);
+		conditional_sti(regs);
 
-		info.regs = &regs;
+		info.regs = regs;
 		math_emulate(&info);
 	} else {
 		math_state_restore(); /* interrupts still off */
-		conditional_sti(&regs);
+		conditional_sti(regs);
 	}
 #else
 	math_state_restore();
@@ -942,7 +896,7 @@ dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
 	info.si_signo = SIGILL;
 	info.si_errno = 0;
 	info.si_code = ILL_BADSTK;
-	info.si_addr = 0;
+	info.si_addr = NULL;
 	if (notify_die(DIE_TRAP, "iret exception",
 			regs, error_code, 32, SIGILL) == NOTIFY_STOP)
 		return;
@@ -1026,6 +980,6 @@ void __init trap_init(void)
 	cpu_init();
 
 #ifdef CONFIG_X86_32
-	trap_init_hook();
+	x86_quirk_trap_init();
 #endif
 }
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 08afa1579e6d..7a567ebe6361 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -791,7 +791,7 @@ __cpuinit int unsynchronized_tsc(void)
 	if (!cpu_has_tsc || tsc_unstable)
 		return 1;
 
-#ifdef CONFIG_X86_SMP
+#ifdef CONFIG_SMP
 	if (apic_is_clustered_box())
 		return 1;
 #endif
diff --git a/arch/x86/kernel/visws_quirks.c b/arch/x86/kernel/visws_quirks.c
index d801d06af068..191a876e9e87 100644
--- a/arch/x86/kernel/visws_quirks.c
+++ b/arch/x86/kernel/visws_quirks.c
@@ -24,18 +24,14 @@
 
 #include <asm/visws/cobalt.h>
 #include <asm/visws/piix4.h>
-#include <asm/arch_hooks.h>
 #include <asm/io_apic.h>
 #include <asm/fixmap.h>
 #include <asm/reboot.h>
 #include <asm/setup.h>
+#include <asm/apic.h>
 #include <asm/e820.h>
 #include <asm/io.h>
 
-#include <mach_ipi.h>
-
-#include "mach_apic.h"
-
 #include <linux/kernel_stat.h>
 
 #include <asm/i8259.h>
@@ -49,8 +45,6 @@
 
 extern int no_broadcast;
 
-#include <asm/apic.h>
-
 char visws_board_type	= -1;
 char visws_board_rev	= -1;
 
@@ -200,7 +194,7 @@ static void __init MP_processor_info(struct mpc_cpu *m)
 		return;
 	}
 
-	apic_cpus = apicid_to_cpu_present(m->apicid);
+	apic_cpus = apic->apicid_to_cpu_present(m->apicid);
 	physids_or(phys_cpu_present_map, phys_cpu_present_map, apic_cpus);
 	/*
 	 * Validate version
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 4eeb5cf9720d..d7ac84e7fc1c 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -158,7 +158,7 @@ struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
 	ret = KVM86->regs32;
 
 	ret->fs = current->thread.saved_fs;
-	loadsegment(gs, current->thread.saved_gs);
+	set_user_gs(ret, current->thread.saved_gs);
 
 	return ret;
 }
@@ -197,9 +197,9 @@ out:
 static int do_vm86_irq_handling(int subfunction, int irqnumber);
 static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk);
 
-asmlinkage int sys_vm86old(struct pt_regs regs)
+int sys_vm86old(struct pt_regs *regs)
 {
-	struct vm86_struct __user *v86 = (struct vm86_struct __user *)regs.bx;
+	struct vm86_struct __user *v86 = (struct vm86_struct __user *)regs->bx;
 	struct kernel_vm86_struct info; /* declare this _on top_,
 					 * this avoids wasting of stack space.
 					 * This remains on the stack until we
@@ -218,7 +218,7 @@ asmlinkage int sys_vm86old(struct pt_regs regs)
 	if (tmp)
 		goto out;
 	memset(&info.vm86plus, 0, (int)&info.regs32 - (int)&info.vm86plus);
-	info.regs32 = &regs;
+	info.regs32 = regs;
 	tsk->thread.vm86_info = v86;
 	do_sys_vm86(&info, tsk);
 	ret = 0;	/* we never return here */
@@ -227,7 +227,7 @@ out:
 }
 
 
-asmlinkage int sys_vm86(struct pt_regs regs)
+int sys_vm86(struct pt_regs *regs)
 {
 	struct kernel_vm86_struct info; /* declare this _on top_,
 					 * this avoids wasting of stack space.
@@ -239,12 +239,12 @@ asmlinkage int sys_vm86(struct pt_regs regs)
 	struct vm86plus_struct __user *v86;
 
 	tsk = current;
-	switch (regs.bx) {
+	switch (regs->bx) {
 	case VM86_REQUEST_IRQ:
 	case VM86_FREE_IRQ:
 	case VM86_GET_IRQ_BITS:
 	case VM86_GET_AND_RESET_IRQ:
-		ret = do_vm86_irq_handling(regs.bx, (int)regs.cx);
+		ret = do_vm86_irq_handling(regs->bx, (int)regs->cx);
 		goto out;
 	case VM86_PLUS_INSTALL_CHECK:
 		/*
@@ -261,14 +261,14 @@ asmlinkage int sys_vm86(struct pt_regs regs)
 	ret = -EPERM;
 	if (tsk->thread.saved_sp0)
 		goto out;
-	v86 = (struct vm86plus_struct __user *)regs.cx;
+	v86 = (struct vm86plus_struct __user *)regs->cx;
 	tmp = copy_vm86_regs_from_user(&info.regs, &v86->regs,
 				       offsetof(struct kernel_vm86_struct, regs32) -
 				       sizeof(info.regs));
 	ret = -EFAULT;
 	if (tmp)
 		goto out;
-	info.regs32 = &regs;
+	info.regs32 = regs;
 	info.vm86plus.is_vm86pus = 1;
 	tsk->thread.vm86_info = (struct vm86_struct __user *)v86;
 	do_sys_vm86(&info, tsk);
@@ -323,7 +323,7 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
 	info->regs32->ax = 0;
 	tsk->thread.saved_sp0 = tsk->thread.sp0;
 	tsk->thread.saved_fs = info->regs32->fs;
-	savesegment(gs, tsk->thread.saved_gs);
+	tsk->thread.saved_gs = get_user_gs(info->regs32);
 
 	tss = &per_cpu(init_tss, get_cpu());
 	tsk->thread.sp0 = (unsigned long) &info->VM86_TSS_ESP0;
diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
index bef58b4982db..2cc4a90e2cb3 100644
--- a/arch/x86/kernel/vmi_32.c
+++ b/arch/x86/kernel/vmi_32.c
@@ -680,10 +680,11 @@ static inline int __init activate_vmi(void)
 	para_fill(pv_mmu_ops.write_cr2, SetCR2);
 	para_fill(pv_mmu_ops.write_cr3, SetCR3);
 	para_fill(pv_cpu_ops.write_cr4, SetCR4);
-	para_fill(pv_irq_ops.save_fl, GetInterruptMask);
-	para_fill(pv_irq_ops.restore_fl, SetInterruptMask);
-	para_fill(pv_irq_ops.irq_disable, DisableInterrupts);
-	para_fill(pv_irq_ops.irq_enable, EnableInterrupts);
+
+	para_fill(pv_irq_ops.save_fl.func, GetInterruptMask);
+	para_fill(pv_irq_ops.restore_fl.func, SetInterruptMask);
+	para_fill(pv_irq_ops.irq_disable.func, DisableInterrupts);
+	para_fill(pv_irq_ops.irq_enable.func, EnableInterrupts);
 
 	para_fill(pv_cpu_ops.wbinvd, WBINVD);
 	para_fill(pv_cpu_ops.read_tsc, RDTSC);
@@ -797,8 +798,8 @@ static inline int __init activate_vmi(void)
 #endif
 
 #ifdef CONFIG_X86_LOCAL_APIC
-       para_fill(apic_ops->read, APICRead);
-       para_fill(apic_ops->write, APICWrite);
+       para_fill(apic->read, APICRead);
+       para_fill(apic->write, APICWrite);
 #endif
 
 	/*
diff --git a/arch/x86/kernel/vmiclock_32.c b/arch/x86/kernel/vmiclock_32.c
index e5b088fffa40..33a788d5879c 100644
--- a/arch/x86/kernel/vmiclock_32.c
+++ b/arch/x86/kernel/vmiclock_32.c
@@ -28,7 +28,6 @@
 
 #include <asm/vmi.h>
 #include <asm/vmi_time.h>
-#include <asm/arch_hooks.h>
 #include <asm/apicdef.h>
 #include <asm/apic.h>
 #include <asm/timer.h>
@@ -256,7 +255,7 @@ void __devinit vmi_time_bsp_init(void)
 	 */
 	clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);
 	local_irq_disable();
-#ifdef CONFIG_X86_SMP
+#ifdef CONFIG_SMP
 	/*
 	 * XXX handle_percpu_irq only defined for SMP; we need to switch over
 	 * to using it, since this is a local interrupt, which each CPU must
@@ -288,8 +287,7 @@ static struct clocksource clocksource_vmi;
 static cycle_t read_real_cycles(void)
 {
 	cycle_t ret = (cycle_t)vmi_timer_ops.get_cycle_counter(VMI_CYCLES_REAL);
-	return ret >= clocksource_vmi.cycle_last ?
-		ret : clocksource_vmi.cycle_last;
+	return max(ret, clocksource_vmi.cycle_last);
 }
 
 static struct clocksource clocksource_vmi = {
diff --git a/arch/x86/kernel/vmlinux_32.lds.S b/arch/x86/kernel/vmlinux_32.lds.S
index 82c67559dde7..0d860963f268 100644
--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -12,7 +12,7 @@
 
 #include <asm-generic/vmlinux.lds.h>
 #include <asm/thread_info.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/cache.h>
 #include <asm/boot.h>
 
@@ -178,14 +178,7 @@ SECTIONS
 	__initramfs_end = .;
   }
 #endif
-  . = ALIGN(PAGE_SIZE);
-  .data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
-	__per_cpu_start = .;
-	*(.data.percpu.page_aligned)
-	*(.data.percpu)
-	*(.data.percpu.shared_aligned)
-	__per_cpu_end = .;
-  }
+  PERCPU(PAGE_SIZE)
   . = ALIGN(PAGE_SIZE);
   /* freed after init ends here */
 
diff --git a/arch/x86/kernel/vmlinux_64.lds.S b/arch/x86/kernel/vmlinux_64.lds.S
index 1a614c0e6bef..fbfced6f6800 100644
--- a/arch/x86/kernel/vmlinux_64.lds.S
+++ b/arch/x86/kernel/vmlinux_64.lds.S
@@ -5,7 +5,8 @@
 #define LOAD_OFFSET __START_KERNEL_map
 
 #include <asm-generic/vmlinux.lds.h>
-#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/page_types.h>
 
 #undef i386	/* in case the preprocessor is a 32bit one */
 
@@ -13,12 +14,15 @@ OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
 OUTPUT_ARCH(i386:x86-64)
 ENTRY(phys_startup_64)
 jiffies_64 = jiffies;
-_proxy_pda = 1;
 PHDRS {
 	text PT_LOAD FLAGS(5);	/* R_E */
 	data PT_LOAD FLAGS(7);	/* RWE */
 	user PT_LOAD FLAGS(7);	/* RWE */
 	data.init PT_LOAD FLAGS(7);	/* RWE */
+#ifdef CONFIG_SMP
+	percpu PT_LOAD FLAGS(7);	/* RWE */
+#endif
+	data.init2 PT_LOAD FLAGS(7);	/* RWE */
 	note PT_NOTE FLAGS(0);	/* ___ */
 }
 SECTIONS
@@ -208,14 +212,28 @@ SECTIONS
   __initramfs_end = .;
 #endif
 
+#ifdef CONFIG_SMP
+  /*
+   * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
+   * output PHDR, so the next output section - __data_nosave - should
+   * start another section data.init2.  Also, pda should be at the head of
+   * percpu area.  Preallocate it and define the percpu offset symbol
+   * so that it can be accessed as a percpu variable.
+   */
+  . = ALIGN(PAGE_SIZE);
+  PERCPU_VADDR(0, :percpu)
+#else
   PERCPU(PAGE_SIZE)
+#endif
 
   . = ALIGN(PAGE_SIZE);
   __init_end = .;
 
   . = ALIGN(PAGE_SIZE);
   __nosave_begin = .;
-  .data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) { *(.data.nosave) }
+  .data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) {
+      *(.data.nosave)
+  } :data.init2 /* use another section data.init2, see PERCPU_VADDR() above */
   . = ALIGN(PAGE_SIZE);
   __nosave_end = .;
 
@@ -239,8 +257,21 @@ SECTIONS
   DWARF_DEBUG
 }
 
+ /*
+  * Per-cpu symbols which need to be offset from __per_cpu_load
+  * for the boot processor.
+  */
+#define INIT_PER_CPU(x) init_per_cpu__##x = per_cpu__##x + __per_cpu_load
+INIT_PER_CPU(gdt_page);
+INIT_PER_CPU(irq_stack_union);
+
 /*
  * Build-time check on the image size:
  */
 ASSERT((_end - _text <= KERNEL_IMAGE_SIZE),
 	"kernel image bigger than KERNEL_IMAGE_SIZE")
+
+#ifdef CONFIG_SMP
+ASSERT((per_cpu__irq_stack_union == 0),
+        "irq_stack_union is not at start of per-cpu area");
+#endif
diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
index a688f3bfaec2..74de562812cc 100644
--- a/arch/x86/kernel/vsmp_64.c
+++ b/arch/x86/kernel/vsmp_64.c
@@ -22,7 +22,7 @@
 #include <asm/paravirt.h>
 #include <asm/setup.h>
 
-#if defined CONFIG_PCI && defined CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT
 /*
  * Interrupt control on vSMPowered systems:
  * ~AC is a shadow of IF.  If IF is 'on' AC should be 'off'
@@ -37,6 +37,7 @@ static unsigned long vsmp_save_fl(void)
 		flags &= ~X86_EFLAGS_IF;
 	return flags;
 }
+PV_CALLEE_SAVE_REGS_THUNK(vsmp_save_fl);
 
 static void vsmp_restore_fl(unsigned long flags)
 {
@@ -46,6 +47,7 @@ static void vsmp_restore_fl(unsigned long flags)
 		flags |= X86_EFLAGS_AC;
 	native_restore_fl(flags);
 }
+PV_CALLEE_SAVE_REGS_THUNK(vsmp_restore_fl);
 
 static void vsmp_irq_disable(void)
 {
@@ -53,6 +55,7 @@ static void vsmp_irq_disable(void)
 
 	native_restore_fl((flags & ~X86_EFLAGS_IF) | X86_EFLAGS_AC);
 }
+PV_CALLEE_SAVE_REGS_THUNK(vsmp_irq_disable);
 
 static void vsmp_irq_enable(void)
 {
@@ -60,6 +63,7 @@ static void vsmp_irq_enable(void)
 
 	native_restore_fl((flags | X86_EFLAGS_IF) & (~X86_EFLAGS_AC));
 }
+PV_CALLEE_SAVE_REGS_THUNK(vsmp_irq_enable);
 
 static unsigned __init_or_module vsmp_patch(u8 type, u16 clobbers, void *ibuf,
 				  unsigned long addr, unsigned len)
@@ -90,10 +94,10 @@ static void __init set_vsmp_pv_ops(void)
 	       cap, ctl);
 	if (cap & ctl & (1 << 4)) {
 		/* Setup irq ops and turn on vSMP  IRQ fastpath handling */
-		pv_irq_ops.irq_disable = vsmp_irq_disable;
-		pv_irq_ops.irq_enable  = vsmp_irq_enable;
-		pv_irq_ops.save_fl  = vsmp_save_fl;
-		pv_irq_ops.restore_fl  = vsmp_restore_fl;
+		pv_irq_ops.irq_disable = PV_CALLEE_SAVE(vsmp_irq_disable);
+		pv_irq_ops.irq_enable  = PV_CALLEE_SAVE(vsmp_irq_enable);
+		pv_irq_ops.save_fl  = PV_CALLEE_SAVE(vsmp_save_fl);
+		pv_irq_ops.restore_fl  = PV_CALLEE_SAVE(vsmp_restore_fl);
 		pv_init_ops.patch = vsmp_patch;
 
 		ctl &= ~(1 << 4);
@@ -110,7 +114,6 @@ static void __init set_vsmp_pv_ops(void)
 }
 #endif
 
-#ifdef CONFIG_PCI
 static int is_vsmp = -1;
 
 static void __init detect_vsmp_box(void)
@@ -135,15 +138,6 @@ int is_vsmp_box(void)
 		return 0;
 	}
 }
-#else
-static void __init detect_vsmp_box(void)
-{
-}
-int is_vsmp_box(void)
-{
-	return 0;
-}
-#endif
 
 void __init vsmp_init(void)
 {
diff --git a/arch/x86/kernel/x8664_ksyms_64.c b/arch/x86/kernel/x8664_ksyms_64.c
index 695e426aa354..3909e3ba5ce3 100644
--- a/arch/x86/kernel/x8664_ksyms_64.c
+++ b/arch/x86/kernel/x8664_ksyms_64.c
@@ -58,5 +58,3 @@ EXPORT_SYMBOL(__memcpy);
 EXPORT_SYMBOL(empty_zero_page);
 EXPORT_SYMBOL(init_level4_pgt);
 EXPORT_SYMBOL(load_gs_index);
-
-EXPORT_SYMBOL(_proxy_pda);
diff --git a/arch/x86/lguest/Kconfig b/arch/x86/lguest/Kconfig
index c70e12b1a637..8dab8f7844d3 100644
--- a/arch/x86/lguest/Kconfig
+++ b/arch/x86/lguest/Kconfig
@@ -3,7 +3,6 @@ config LGUEST_GUEST
 	select PARAVIRT
 	depends on X86_32
 	depends on !X86_PAE
-	depends on !X86_VOYAGER
 	select VIRTIO
 	select VIRTIO_RING
 	select VIRTIO_CONSOLE
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 960a8d9c049c..9fe4ddaa8f6f 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -173,24 +173,29 @@ static unsigned long save_fl(void)
 {
 	return lguest_data.irq_enabled;
 }
+PV_CALLEE_SAVE_REGS_THUNK(save_fl);
 
 /* restore_flags() just sets the flags back to the value given. */
 static void restore_fl(unsigned long flags)
 {
 	lguest_data.irq_enabled = flags;
 }
+PV_CALLEE_SAVE_REGS_THUNK(restore_fl);
 
 /* Interrupts go off... */
 static void irq_disable(void)
 {
 	lguest_data.irq_enabled = 0;
 }
+PV_CALLEE_SAVE_REGS_THUNK(irq_disable);
 
 /* Interrupts go on... */
 static void irq_enable(void)
 {
 	lguest_data.irq_enabled = X86_EFLAGS_IF;
 }
+PV_CALLEE_SAVE_REGS_THUNK(irq_enable);
+
 /*:*/
 /*M:003 Note that we don't check for outstanding interrupts when we re-enable
  * them (or when we unmask an interrupt).  This seems to work for the moment,
@@ -278,7 +283,7 @@ static void lguest_load_tls(struct thread_struct *t, unsigned int cpu)
 	/* There's one problem which normal hardware doesn't have: the Host
 	 * can't handle us removing entries we're currently using.  So we clear
 	 * the GS register here: if it's needed it'll be reloaded anyway. */
-	loadsegment(gs, 0);
+	lazy_load_gs(0);
 	lazy_hcall(LHCALL_LOAD_TLS, __pa(&t->tls_array), cpu, 0);
 }
 
@@ -830,13 +835,14 @@ static u32 lguest_apic_safe_wait_icr_idle(void)
 	return 0;
 }
 
-static struct apic_ops lguest_basic_apic_ops = {
-	.read = lguest_apic_read,
-	.write = lguest_apic_write,
-	.icr_read = lguest_apic_icr_read,
-	.icr_write = lguest_apic_icr_write,
-	.wait_icr_idle = lguest_apic_wait_icr_idle,
-	.safe_wait_icr_idle = lguest_apic_safe_wait_icr_idle,
+static void set_lguest_basic_apic_ops(void)
+{
+	apic->read = lguest_apic_read;
+	apic->write = lguest_apic_write;
+	apic->icr_read = lguest_apic_icr_read;
+	apic->icr_write = lguest_apic_icr_write;
+	apic->wait_icr_idle = lguest_apic_wait_icr_idle;
+	apic->safe_wait_icr_idle = lguest_apic_safe_wait_icr_idle;
 };
 #endif
 
@@ -991,10 +997,10 @@ __init void lguest_init(void)
 
 	/* interrupt-related operations */
 	pv_irq_ops.init_IRQ = lguest_init_IRQ;
-	pv_irq_ops.save_fl = save_fl;
-	pv_irq_ops.restore_fl = restore_fl;
-	pv_irq_ops.irq_disable = irq_disable;
-	pv_irq_ops.irq_enable = irq_enable;
+	pv_irq_ops.save_fl = PV_CALLEE_SAVE(save_fl);
+	pv_irq_ops.restore_fl = PV_CALLEE_SAVE(restore_fl);
+	pv_irq_ops.irq_disable = PV_CALLEE_SAVE(irq_disable);
+	pv_irq_ops.irq_enable = PV_CALLEE_SAVE(irq_enable);
 	pv_irq_ops.safe_halt = lguest_safe_halt;
 
 	/* init-time operations */
@@ -1037,7 +1043,7 @@ __init void lguest_init(void)
 
 #ifdef CONFIG_X86_LOCAL_APIC
 	/* apic read/write intercepts */
-	apic_ops = &lguest_basic_apic_ops;
+	set_lguest_basic_apic_ops();
 #endif
 
 	/* time operations */
diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index ad374003742f..51f1504cddd9 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -28,7 +28,7 @@
 
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/errno.h>
 #include <asm/asm-offsets.h>
 #include <asm/thread_info.h>
diff --git a/arch/x86/mach-default/Makefile b/arch/x86/mach-default/Makefile
deleted file mode 100644
index 012fe34459e6..000000000000
--- a/arch/x86/mach-default/Makefile
+++ /dev/null
@@ -1,5 +0,0 @@
-#
-# Makefile for the linux kernel.
-#
-
-obj-y				:= setup.o
diff --git a/arch/x86/mach-default/setup.c b/arch/x86/mach-default/setup.c
deleted file mode 100644
index 50b591871128..000000000000
--- a/arch/x86/mach-default/setup.c
+++ /dev/null
@@ -1,174 +0,0 @@
-/*
- *	Machine specific setup for generic
- */
-
-#include <linux/smp.h>
-#include <linux/init.h>
-#include <linux/interrupt.h>
-#include <asm/acpi.h>
-#include <asm/arch_hooks.h>
-#include <asm/e820.h>
-#include <asm/setup.h>
-
-#include <mach_ipi.h>
-
-#ifdef CONFIG_HOTPLUG_CPU
-#define DEFAULT_SEND_IPI	(1)
-#else
-#define DEFAULT_SEND_IPI	(0)
-#endif
-
-int no_broadcast = DEFAULT_SEND_IPI;
-
-/**
- * pre_intr_init_hook - initialisation prior to setting up interrupt vectors
- *
- * Description:
- *	Perform any necessary interrupt initialisation prior to setting up
- *	the "ordinary" interrupt call gates.  For legacy reasons, the ISA
- *	interrupts should be initialised here if the machine emulates a PC
- *	in any way.
- **/
-void __init pre_intr_init_hook(void)
-{
-	if (x86_quirks->arch_pre_intr_init) {
-		if (x86_quirks->arch_pre_intr_init())
-			return;
-	}
-	init_ISA_irqs();
-}
-
-/*
- * IRQ2 is cascade interrupt to second interrupt controller
- */
-static struct irqaction irq2 = {
-	.handler = no_action,
-	.mask = CPU_MASK_NONE,
-	.name = "cascade",
-};
-
-/**
- * intr_init_hook - post gate setup interrupt initialisation
- *
- * Description:
- *	Fill in any interrupts that may have been left out by the general
- *	init_IRQ() routine.  interrupts having to do with the machine rather
- *	than the devices on the I/O bus (like APIC interrupts in intel MP
- *	systems) are started here.
- **/
-void __init intr_init_hook(void)
-{
-	if (x86_quirks->arch_intr_init) {
-		if (x86_quirks->arch_intr_init())
-			return;
-	}
-	if (!acpi_ioapic)
-		setup_irq(2, &irq2);
-
-}
-
-/**
- * pre_setup_arch_hook - hook called prior to any setup_arch() execution
- *
- * Description:
- *	generally used to activate any machine specific identification
- *	routines that may be needed before setup_arch() runs.  On Voyager
- *	this is used to get the board revision and type.
- **/
-void __init pre_setup_arch_hook(void)
-{
-}
-
-/**
- * trap_init_hook - initialise system specific traps
- *
- * Description:
- *	Called as the final act of trap_init().  Used in VISWS to initialise
- *	the various board specific APIC traps.
- **/
-void __init trap_init_hook(void)
-{
-	if (x86_quirks->arch_trap_init) {
-		if (x86_quirks->arch_trap_init())
-			return;
-	}
-}
-
-static struct irqaction irq0  = {
-	.handler = timer_interrupt,
-	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL | IRQF_TIMER,
-	.mask = CPU_MASK_NONE,
-	.name = "timer"
-};
-
-/**
- * pre_time_init_hook - do any specific initialisations before.
- *
- **/
-void __init pre_time_init_hook(void)
-{
-	if (x86_quirks->arch_pre_time_init)
-		x86_quirks->arch_pre_time_init();
-}
-
-/**
- * time_init_hook - do any specific initialisations for the system timer.
- *
- * Description:
- *	Must plug the system timer interrupt source at HZ into the IRQ listed
- *	in irq_vectors.h:TIMER_IRQ
- **/
-void __init time_init_hook(void)
-{
-	if (x86_quirks->arch_time_init) {
-		/*
-		 * A nonzero return code does not mean failure, it means
-		 * that the architecture quirk does not want any
-		 * generic (timer) setup to be performed after this:
-		 */
-		if (x86_quirks->arch_time_init())
-			return;
-	}
-
-	irq0.mask = cpumask_of_cpu(0);
-	setup_irq(0, &irq0);
-}
-
-#ifdef CONFIG_MCA
-/**
- * mca_nmi_hook - hook into MCA specific NMI chain
- *
- * Description:
- *	The MCA (Microchannel Architecture) has an NMI chain for NMI sources
- *	along the MCA bus.  Use this to hook into that chain if you will need
- *	it.
- **/
-void mca_nmi_hook(void)
-{
-	/*
-	 * If I recall correctly, there's a whole bunch of other things that
-	 * we can do to check for NMI problems, but that's all I know about
-	 * at the moment.
-	 */
-	pr_warning("NMI generated from unknown source!\n");
-}
-#endif
-
-static __init int no_ipi_broadcast(char *str)
-{
-	get_option(&str, &no_broadcast);
-	pr_info("Using %s mode\n",
-		no_broadcast ? "No IPI Broadcast" : "IPI Broadcast");
-	return 1;
-}
-__setup("no_ipi_broadcast=", no_ipi_broadcast);
-
-static int __init print_ipi_mode(void)
-{
-	pr_info("Using IPI %s mode\n",
-		no_broadcast ? "No-Shortcut" : "Shortcut");
-	return 0;
-}
-
-late_initcall(print_ipi_mode);
-
diff --git a/arch/x86/mach-generic/Makefile b/arch/x86/mach-generic/Makefile
deleted file mode 100644
index 6730f4e7c744..000000000000
--- a/arch/x86/mach-generic/Makefile
+++ /dev/null
@@ -1,11 +0,0 @@
-#
-# Makefile for the generic architecture
-#
-
-EXTRA_CFLAGS			:= -Iarch/x86/kernel
-
-obj-y				:= probe.o default.o
-obj-$(CONFIG_X86_NUMAQ)		+= numaq.o
-obj-$(CONFIG_X86_SUMMIT)	+= summit.o
-obj-$(CONFIG_X86_BIGSMP)	+= bigsmp.o
-obj-$(CONFIG_X86_ES7000)	+= es7000.o
diff --git a/arch/x86/mach-generic/bigsmp.c b/arch/x86/mach-generic/bigsmp.c
deleted file mode 100644
index bc4c7840b2a8..000000000000
--- a/arch/x86/mach-generic/bigsmp.c
+++ /dev/null
@@ -1,60 +0,0 @@
-/*
- * APIC driver for "bigsmp" XAPIC machines with more than 8 virtual CPUs.
- * Drives the local APIC in "clustered mode".
- */
-#define APIC_DEFINITION 1
-#include <linux/threads.h>
-#include <linux/cpumask.h>
-#include <asm/mpspec.h>
-#include <asm/genapic.h>
-#include <asm/fixmap.h>
-#include <asm/apicdef.h>
-#include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/dmi.h>
-#include <asm/bigsmp/apicdef.h>
-#include <linux/smp.h>
-#include <asm/bigsmp/apic.h>
-#include <asm/bigsmp/ipi.h>
-#include <asm/mach-default/mach_mpparse.h>
-#include <asm/mach-default/mach_wakecpu.h>
-
-static int dmi_bigsmp; /* can be set by dmi scanners */
-
-static int hp_ht_bigsmp(const struct dmi_system_id *d)
-{
-	printk(KERN_NOTICE "%s detected: force use of apic=bigsmp\n", d->ident);
-	dmi_bigsmp = 1;
-	return 0;
-}
-
-
-static const struct dmi_system_id bigsmp_dmi_table[] = {
-	{ hp_ht_bigsmp, "HP ProLiant DL760 G2",
-	{ DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
-	DMI_MATCH(DMI_BIOS_VERSION, "P44-"),}
-	},
-
-	{ hp_ht_bigsmp, "HP ProLiant DL740",
-	{ DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
-	DMI_MATCH(DMI_BIOS_VERSION, "P47-"),}
-	},
-	 { }
-};
-
-static void vector_allocation_domain(int cpu, cpumask_t *retmask)
-{
-	cpus_clear(*retmask);
-	cpu_set(cpu, *retmask);
-}
-
-static int probe_bigsmp(void)
-{
-	if (def_to_bigsmp)
-		dmi_bigsmp = 1;
-	else
-		dmi_check_system(bigsmp_dmi_table);
-	return dmi_bigsmp;
-}
-
-struct genapic apic_bigsmp = APIC_INIT("bigsmp", probe_bigsmp);
diff --git a/arch/x86/mach-generic/default.c b/arch/x86/mach-generic/default.c
deleted file mode 100644
index e63a4a76d8cd..000000000000
--- a/arch/x86/mach-generic/default.c
+++ /dev/null
@@ -1,27 +0,0 @@
-/*
- * Default generic APIC driver. This handles up to 8 CPUs.
- */
-#define APIC_DEFINITION 1
-#include <linux/threads.h>
-#include <linux/cpumask.h>
-#include <asm/mpspec.h>
-#include <asm/mach-default/mach_apicdef.h>
-#include <asm/genapic.h>
-#include <asm/fixmap.h>
-#include <asm/apicdef.h>
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <linux/smp.h>
-#include <linux/init.h>
-#include <asm/mach-default/mach_apic.h>
-#include <asm/mach-default/mach_ipi.h>
-#include <asm/mach-default/mach_mpparse.h>
-#include <asm/mach-default/mach_wakecpu.h>
-
-/* should be called last. */
-static int probe_default(void)
-{
-	return 1;
-}
-
-struct genapic apic_default = APIC_INIT("default", probe_default);
diff --git a/arch/x86/mach-generic/es7000.c b/arch/x86/mach-generic/es7000.c
deleted file mode 100644
index c2ded1448024..000000000000
--- a/arch/x86/mach-generic/es7000.c
+++ /dev/null
@@ -1,103 +0,0 @@
-/*
- * APIC driver for the Unisys ES7000 chipset.
- */
-#define APIC_DEFINITION 1
-#include <linux/threads.h>
-#include <linux/cpumask.h>
-#include <asm/mpspec.h>
-#include <asm/genapic.h>
-#include <asm/fixmap.h>
-#include <asm/apicdef.h>
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <linux/init.h>
-#include <asm/es7000/apicdef.h>
-#include <linux/smp.h>
-#include <asm/es7000/apic.h>
-#include <asm/es7000/ipi.h>
-#include <asm/es7000/mpparse.h>
-#include <asm/mach-default/mach_wakecpu.h>
-
-void __init es7000_update_genapic_to_cluster(void)
-{
-	genapic->target_cpus = target_cpus_cluster;
-	genapic->int_delivery_mode = INT_DELIVERY_MODE_CLUSTER;
-	genapic->int_dest_mode = INT_DEST_MODE_CLUSTER;
-	genapic->no_balance_irq = NO_BALANCE_IRQ_CLUSTER;
-
-	genapic->init_apic_ldr = init_apic_ldr_cluster;
-
-	genapic->cpu_mask_to_apicid = cpu_mask_to_apicid_cluster;
-}
-
-static int probe_es7000(void)
-{
-	/* probed later in mptable/ACPI hooks */
-	return 0;
-}
-
-extern void es7000_sw_apic(void);
-static void __init enable_apic_mode(void)
-{
-	es7000_sw_apic();
-	return;
-}
-
-static __init int
-mps_oem_check(struct mpc_table *mpc, char *oem, char *productid)
-{
-	if (mpc->oemptr) {
-		struct mpc_oemtable *oem_table =
-			(struct mpc_oemtable *)mpc->oemptr;
-		if (!strncmp(oem, "UNISYS", 6))
-			return parse_unisys_oem((char *)oem_table);
-	}
-	return 0;
-}
-
-#ifdef CONFIG_ACPI
-/* Hook from generic ACPI tables.c */
-static int __init acpi_madt_oem_check(char *oem_id, char *oem_table_id)
-{
-	unsigned long oem_addr = 0;
-	int check_dsdt;
-	int ret = 0;
-
-	/* check dsdt at first to avoid clear fix_map for oem_addr */
-	check_dsdt = es7000_check_dsdt();
-
-	if (!find_unisys_acpi_oem_table(&oem_addr)) {
-		if (check_dsdt)
-			ret = parse_unisys_oem((char *)oem_addr);
-		else {
-			setup_unisys();
-			ret = 1;
-		}
-		/*
-		 * we need to unmap it
-		 */
-		unmap_unisys_acpi_oem_table(oem_addr);
-	}
-	return ret;
-}
-#else
-static int __init acpi_madt_oem_check(char *oem_id, char *oem_table_id)
-{
-	return 0;
-}
-#endif
-
-static void vector_allocation_domain(int cpu, cpumask_t *retmask)
-{
-	/* Careful. Some cpus do not strictly honor the set of cpus
-	 * specified in the interrupt destination when using lowest
-	 * priority interrupt delivery mode.
-	 *
-	 * In particular there was a hyperthreading cpu observed to
-	 * deliver interrupts to the wrong hyperthread when only one
-	 * hyperthread was specified in the interrupt desitination.
-	 */
-	*retmask = (cpumask_t){ { [0] = APIC_ALL_CPUS, } };
-}
-
-struct genapic __initdata_refok apic_es7000 = APIC_INIT("es7000", probe_es7000);
diff --git a/arch/x86/mach-generic/numaq.c b/arch/x86/mach-generic/numaq.c
deleted file mode 100644
index 3679e2255645..000000000000
--- a/arch/x86/mach-generic/numaq.c
+++ /dev/null
@@ -1,53 +0,0 @@
-/*
- * APIC driver for the IBM NUMAQ chipset.
- */
-#define APIC_DEFINITION 1
-#include <linux/threads.h>
-#include <linux/cpumask.h>
-#include <asm/mpspec.h>
-#include <asm/genapic.h>
-#include <asm/fixmap.h>
-#include <asm/apicdef.h>
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <linux/init.h>
-#include <asm/numaq/apicdef.h>
-#include <linux/smp.h>
-#include <asm/numaq/apic.h>
-#include <asm/numaq/ipi.h>
-#include <asm/numaq/mpparse.h>
-#include <asm/numaq/wakecpu.h>
-#include <asm/numaq.h>
-
-static int mps_oem_check(struct mpc_table *mpc, char *oem, char *productid)
-{
-	numaq_mps_oem_check(mpc, oem, productid);
-	return found_numaq;
-}
-
-static int probe_numaq(void)
-{
-	/* already know from get_memcfg_numaq() */
-	return found_numaq;
-}
-
-/* Hook from generic ACPI tables.c */
-static int acpi_madt_oem_check(char *oem_id, char *oem_table_id)
-{
-	return 0;
-}
-
-static void vector_allocation_domain(int cpu, cpumask_t *retmask)
-{
-	/* Careful. Some cpus do not strictly honor the set of cpus
-	 * specified in the interrupt destination when using lowest
-	 * priority interrupt delivery mode.
-	 *
-	 * In particular there was a hyperthreading cpu observed to
-	 * deliver interrupts to the wrong hyperthread when only one
-	 * hyperthread was specified in the interrupt desitination.
-	 */
-	*retmask = (cpumask_t){ { [0] = APIC_ALL_CPUS, } };
-}
-
-struct genapic apic_numaq = APIC_INIT("NUMAQ", probe_numaq);
diff --git a/arch/x86/mach-generic/probe.c b/arch/x86/mach-generic/probe.c
deleted file mode 100644
index 15a38daef1a8..000000000000
--- a/arch/x86/mach-generic/probe.c
+++ /dev/null
@@ -1,152 +0,0 @@
-/*
- * Copyright 2003 Andi Kleen, SuSE Labs.
- * Subject to the GNU Public License, v.2
- *
- * Generic x86 APIC driver probe layer.
- */
-#include <linux/threads.h>
-#include <linux/cpumask.h>
-#include <linux/string.h>
-#include <linux/kernel.h>
-#include <linux/ctype.h>
-#include <linux/init.h>
-#include <linux/errno.h>
-#include <asm/fixmap.h>
-#include <asm/mpspec.h>
-#include <asm/apicdef.h>
-#include <asm/genapic.h>
-#include <asm/setup.h>
-
-extern struct genapic apic_numaq;
-extern struct genapic apic_summit;
-extern struct genapic apic_bigsmp;
-extern struct genapic apic_es7000;
-extern struct genapic apic_default;
-
-struct genapic *genapic = &apic_default;
-
-static struct genapic *apic_probe[] __initdata = {
-#ifdef CONFIG_X86_NUMAQ
-	&apic_numaq,
-#endif
-#ifdef CONFIG_X86_SUMMIT
-	&apic_summit,
-#endif
-#ifdef CONFIG_X86_BIGSMP
-	&apic_bigsmp,
-#endif
-#ifdef CONFIG_X86_ES7000
-	&apic_es7000,
-#endif
-	&apic_default,	/* must be last */
-	NULL,
-};
-
-static int cmdline_apic __initdata;
-static int __init parse_apic(char *arg)
-{
-	int i;
-
-	if (!arg)
-		return -EINVAL;
-
-	for (i = 0; apic_probe[i]; i++) {
-		if (!strcmp(apic_probe[i]->name, arg)) {
-			genapic = apic_probe[i];
-			cmdline_apic = 1;
-			return 0;
-		}
-	}
-
-	if (x86_quirks->update_genapic)
-		x86_quirks->update_genapic();
-
-	/* Parsed again by __setup for debug/verbose */
-	return 0;
-}
-early_param("apic", parse_apic);
-
-void __init generic_bigsmp_probe(void)
-{
-#ifdef CONFIG_X86_BIGSMP
-	/*
-	 * This routine is used to switch to bigsmp mode when
-	 * - There is no apic= option specified by the user
-	 * - generic_apic_probe() has chosen apic_default as the sub_arch
-	 * - we find more than 8 CPUs in acpi LAPIC listing with xAPIC support
-	 */
-
-	if (!cmdline_apic && genapic == &apic_default) {
-		if (apic_bigsmp.probe()) {
-			genapic = &apic_bigsmp;
-			if (x86_quirks->update_genapic)
-				x86_quirks->update_genapic();
-			printk(KERN_INFO "Overriding APIC driver with %s\n",
-			       genapic->name);
-		}
-	}
-#endif
-}
-
-void __init generic_apic_probe(void)
-{
-	if (!cmdline_apic) {
-		int i;
-		for (i = 0; apic_probe[i]; i++) {
-			if (apic_probe[i]->probe()) {
-				genapic = apic_probe[i];
-				break;
-			}
-		}
-		/* Not visible without early console */
-		if (!apic_probe[i])
-			panic("Didn't find an APIC driver");
-
-		if (x86_quirks->update_genapic)
-			x86_quirks->update_genapic();
-	}
-	printk(KERN_INFO "Using APIC driver %s\n", genapic->name);
-}
-
-/* These functions can switch the APIC even after the initial ->probe() */
-
-int __init mps_oem_check(struct mpc_table *mpc, char *oem, char *productid)
-{
-	int i;
-	for (i = 0; apic_probe[i]; ++i) {
-		if (apic_probe[i]->mps_oem_check(mpc, oem, productid)) {
-			if (!cmdline_apic) {
-				genapic = apic_probe[i];
-				if (x86_quirks->update_genapic)
-					x86_quirks->update_genapic();
-				printk(KERN_INFO "Switched to APIC driver `%s'.\n",
-				       genapic->name);
-			}
-			return 1;
-		}
-	}
-	return 0;
-}
-
-int __init acpi_madt_oem_check(char *oem_id, char *oem_table_id)
-{
-	int i;
-	for (i = 0; apic_probe[i]; ++i) {
-		if (apic_probe[i]->acpi_madt_oem_check(oem_id, oem_table_id)) {
-			if (!cmdline_apic) {
-				genapic = apic_probe[i];
-				if (x86_quirks->update_genapic)
-					x86_quirks->update_genapic();
-				printk(KERN_INFO "Switched to APIC driver `%s'.\n",
-				       genapic->name);
-			}
-			return 1;
-		}
-	}
-	return 0;
-}
-
-int hard_smp_processor_id(void)
-{
-	return genapic->get_apic_id(*(unsigned long *)(APIC_BASE+APIC_ID));
-}
diff --git a/arch/x86/mach-generic/summit.c b/arch/x86/mach-generic/summit.c
deleted file mode 100644
index 2821ffc188b5..000000000000
--- a/arch/x86/mach-generic/summit.c
+++ /dev/null
@@ -1,40 +0,0 @@
-/*
- * APIC driver for the IBM "Summit" chipset.
- */
-#define APIC_DEFINITION 1
-#include <linux/threads.h>
-#include <linux/cpumask.h>
-#include <asm/mpspec.h>
-#include <asm/genapic.h>
-#include <asm/fixmap.h>
-#include <asm/apicdef.h>
-#include <linux/kernel.h>
-#include <linux/string.h>
-#include <linux/init.h>
-#include <asm/summit/apicdef.h>
-#include <linux/smp.h>
-#include <asm/summit/apic.h>
-#include <asm/summit/ipi.h>
-#include <asm/summit/mpparse.h>
-#include <asm/mach-default/mach_wakecpu.h>
-
-static int probe_summit(void)
-{
-	/* probed later in mptable/ACPI hooks */
-	return 0;
-}
-
-static void vector_allocation_domain(int cpu, cpumask_t *retmask)
-{
-	/* Careful. Some cpus do not strictly honor the set of cpus
-	 * specified in the interrupt destination when using lowest
-	 * priority interrupt delivery mode.
-	 *
-	 * In particular there was a hyperthreading cpu observed to
-	 * deliver interrupts to the wrong hyperthread when only one
-	 * hyperthread was specified in the interrupt desitination.
-	 */
-	*retmask = (cpumask_t){ { [0] = APIC_ALL_CPUS, } };
-}
-
-struct genapic apic_summit = APIC_INIT("summit", probe_summit);
diff --git a/arch/x86/mach-rdc321x/Makefile b/arch/x86/mach-rdc321x/Makefile
deleted file mode 100644
index 8325b4ca431c..000000000000
--- a/arch/x86/mach-rdc321x/Makefile
+++ /dev/null
@@ -1,5 +0,0 @@
-#
-# Makefile for the RDC321x specific parts of the kernel
-#
-obj-$(CONFIG_X86_RDC321X)        := gpio.o platform.o
-
diff --git a/arch/x86/mach-rdc321x/gpio.c b/arch/x86/mach-rdc321x/gpio.c
deleted file mode 100644
index 247f33d3a407..000000000000
--- a/arch/x86/mach-rdc321x/gpio.c
+++ /dev/null
@@ -1,194 +0,0 @@
-/*
- *  GPIO support for RDC SoC R3210/R8610
- *
- *  Copyright (C) 2007, Florian Fainelli <florian@openwrt.org>
- *  Copyright (C) 2008, Volker Weiss <dev@tintuc.de>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- */
-
-
-#include <linux/spinlock.h>
-#include <linux/io.h>
-#include <linux/types.h>
-#include <linux/module.h>
-
-#include <asm/gpio.h>
-#include <asm/mach-rdc321x/rdc321x_defs.h>
-
-
-/* spin lock to protect our private copy of GPIO data register plus
-   the access to PCI conf registers. */
-static DEFINE_SPINLOCK(gpio_lock);
-
-/* copy of GPIO data registers */
-static u32 gpio_data_reg1;
-static u32 gpio_data_reg2;
-
-static u32 gpio_request_data[2];
-
-
-static inline void rdc321x_conf_write(unsigned addr, u32 value)
-{
-	outl((1 << 31) | (7 << 11) | addr, RDC3210_CFGREG_ADDR);
-	outl(value, RDC3210_CFGREG_DATA);
-}
-
-static inline void rdc321x_conf_or(unsigned addr, u32 value)
-{
-	outl((1 << 31) | (7 << 11) | addr, RDC3210_CFGREG_ADDR);
-	value |= inl(RDC3210_CFGREG_DATA);
-	outl(value, RDC3210_CFGREG_DATA);
-}
-
-static inline u32 rdc321x_conf_read(unsigned addr)
-{
-	outl((1 << 31) | (7 << 11) | addr, RDC3210_CFGREG_ADDR);
-
-	return inl(RDC3210_CFGREG_DATA);
-}
-
-/* configure pin as GPIO */
-static void rdc321x_configure_gpio(unsigned gpio)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&gpio_lock, flags);
-	rdc321x_conf_or(gpio < 32
-		? RDC321X_GPIO_CTRL_REG1 : RDC321X_GPIO_CTRL_REG2,
-		1 << (gpio & 0x1f));
-	spin_unlock_irqrestore(&gpio_lock, flags);
-}
-
-/* initially setup the 2 copies of the gpio data registers.
-   This function must be called by the platform setup code. */
-void __init rdc321x_gpio_setup()
-{
-	/* this might not be, what others (BIOS, bootloader, etc.)
-	   wrote to these registers before, but it's a good guess. Still
-	   better than just using 0xffffffff. */
-
-	gpio_data_reg1 = rdc321x_conf_read(RDC321X_GPIO_DATA_REG1);
-	gpio_data_reg2 = rdc321x_conf_read(RDC321X_GPIO_DATA_REG2);
-}
-
-/* determine, if gpio number is valid */
-static inline int rdc321x_is_gpio(unsigned gpio)
-{
-	return gpio <= RDC321X_MAX_GPIO;
-}
-
-/* request GPIO */
-int rdc_gpio_request(unsigned gpio, const char *label)
-{
-	unsigned long flags;
-
-	if (!rdc321x_is_gpio(gpio))
-		return -EINVAL;
-
-	spin_lock_irqsave(&gpio_lock, flags);
-	if (gpio_request_data[(gpio & 0x20) ? 1 : 0] & (1 << (gpio & 0x1f)))
-		goto inuse;
-	gpio_request_data[(gpio & 0x20) ? 1 : 0] |= (1 << (gpio & 0x1f));
-	spin_unlock_irqrestore(&gpio_lock, flags);
-
-	return 0;
-inuse:
-	spin_unlock_irqrestore(&gpio_lock, flags);
-	return -EINVAL;
-}
-EXPORT_SYMBOL(rdc_gpio_request);
-
-/* release previously-claimed GPIO */
-void rdc_gpio_free(unsigned gpio)
-{
-	unsigned long flags;
-
-	if (!rdc321x_is_gpio(gpio))
-		return;
-
-	spin_lock_irqsave(&gpio_lock, flags);
-	gpio_request_data[(gpio & 0x20) ? 1 : 0] &= ~(1 << (gpio & 0x1f));
-	spin_unlock_irqrestore(&gpio_lock, flags);
-}
-EXPORT_SYMBOL(rdc_gpio_free);
-
-/* read GPIO pin */
-int rdc_gpio_get_value(unsigned gpio)
-{
-	u32 reg;
-	unsigned long flags;
-
-	spin_lock_irqsave(&gpio_lock, flags);
-	reg = rdc321x_conf_read(gpio < 32
-		? RDC321X_GPIO_DATA_REG1 : RDC321X_GPIO_DATA_REG2);
-	spin_unlock_irqrestore(&gpio_lock, flags);
-
-	return (1 << (gpio & 0x1f)) & reg ? 1 : 0;
-}
-EXPORT_SYMBOL(rdc_gpio_get_value);
-
-/* set GPIO pin to value */
-void rdc_gpio_set_value(unsigned gpio, int value)
-{
-	unsigned long flags;
-	u32 reg;
-
-	reg = 1 << (gpio & 0x1f);
-	if (gpio < 32) {
-		spin_lock_irqsave(&gpio_lock, flags);
-		if (value)
-			gpio_data_reg1 |= reg;
-		else
-			gpio_data_reg1 &= ~reg;
-		rdc321x_conf_write(RDC321X_GPIO_DATA_REG1, gpio_data_reg1);
-		spin_unlock_irqrestore(&gpio_lock, flags);
-	} else {
-		spin_lock_irqsave(&gpio_lock, flags);
-		if (value)
-			gpio_data_reg2 |= reg;
-		else
-			gpio_data_reg2 &= ~reg;
-		rdc321x_conf_write(RDC321X_GPIO_DATA_REG2, gpio_data_reg2);
-		spin_unlock_irqrestore(&gpio_lock, flags);
-	}
-}
-EXPORT_SYMBOL(rdc_gpio_set_value);
-
-/* configure GPIO pin as input */
-int rdc_gpio_direction_input(unsigned gpio)
-{
-	if (!rdc321x_is_gpio(gpio))
-		return -EINVAL;
-
-	rdc321x_configure_gpio(gpio);
-
-	return 0;
-}
-EXPORT_SYMBOL(rdc_gpio_direction_input);
-
-/* configure GPIO pin as output and set value */
-int rdc_gpio_direction_output(unsigned gpio, int value)
-{
-	if (!rdc321x_is_gpio(gpio))
-		return -EINVAL;
-
-	gpio_set_value(gpio, value);
-	rdc321x_configure_gpio(gpio);
-
-	return 0;
-}
-EXPORT_SYMBOL(rdc_gpio_direction_output);
diff --git a/arch/x86/mach-rdc321x/platform.c b/arch/x86/mach-rdc321x/platform.c
deleted file mode 100644
index 4f4e50c3ad3b..000000000000
--- a/arch/x86/mach-rdc321x/platform.c
+++ /dev/null
@@ -1,69 +0,0 @@
-/*
- *  Generic RDC321x platform devices
- *
- *  Copyright (C) 2007 Florian Fainelli <florian@openwrt.org>
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version 2
- *  of the License, or (at your option) any later version.
- *
- *  This program is distributed in the hope that it will be useful,
- *  but WITHOUT ANY WARRANTY; without even the implied warranty of
- *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *  GNU General Public License for more details.
- *
- *  You should have received a copy of the GNU General Public License
- *  along with this program; if not, write to the
- *  Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
- *  Boston, MA  02110-1301, USA.
- *
- */
-
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/list.h>
-#include <linux/device.h>
-#include <linux/platform_device.h>
-#include <linux/leds.h>
-
-#include <asm/gpio.h>
-
-/* LEDS */
-static struct gpio_led default_leds[] = {
-	{ .name = "rdc:dmz", .gpio = 1, },
-};
-
-static struct gpio_led_platform_data rdc321x_led_data = {
-	.num_leds = ARRAY_SIZE(default_leds),
-	.leds = default_leds,
-};
-
-static struct platform_device rdc321x_leds = {
-	.name = "leds-gpio",
-	.id = -1,
-	.dev = {
-		.platform_data = &rdc321x_led_data,
-	}
-};
-
-/* Watchdog */
-static struct platform_device rdc321x_wdt = {
-	.name = "rdc321x-wdt",
-	.id = -1,
-	.num_resources = 0,
-};
-
-static struct platform_device *rdc321x_devs[] = {
-	&rdc321x_leds,
-	&rdc321x_wdt
-};
-
-static int __init rdc_board_setup(void)
-{
-	rdc321x_gpio_setup();
-
-	return platform_add_devices(rdc321x_devs, ARRAY_SIZE(rdc321x_devs));
-}
-
-arch_initcall(rdc_board_setup);
diff --git a/arch/x86/mach-voyager/Makefile b/arch/x86/mach-voyager/Makefile
deleted file mode 100644
index 15c250b371d3..000000000000
--- a/arch/x86/mach-voyager/Makefile
+++ /dev/null
@@ -1,8 +0,0 @@
-#
-# Makefile for the linux kernel.
-#
-
-EXTRA_CFLAGS	:= -Iarch/x86/kernel
-obj-y			:= setup.o voyager_basic.o voyager_thread.o
-
-obj-$(CONFIG_SMP)	+= voyager_smp.o voyager_cat.o
diff --git a/arch/x86/mach-voyager/setup.c b/arch/x86/mach-voyager/setup.c
deleted file mode 100644
index 8e5118371f0f..000000000000
--- a/arch/x86/mach-voyager/setup.c
+++ /dev/null
@@ -1,118 +0,0 @@
-/*
- *	Machine specific setup for generic
- */
-
-#include <linux/init.h>
-#include <linux/interrupt.h>
-#include <asm/arch_hooks.h>
-#include <asm/voyager.h>
-#include <asm/e820.h>
-#include <asm/io.h>
-#include <asm/setup.h>
-
-void __init pre_intr_init_hook(void)
-{
-	init_ISA_irqs();
-}
-
-/*
- * IRQ2 is cascade interrupt to second interrupt controller
- */
-static struct irqaction irq2 = {
-	.handler = no_action,
-	.mask = CPU_MASK_NONE,
-	.name = "cascade",
-};
-
-void __init intr_init_hook(void)
-{
-#ifdef CONFIG_SMP
-	voyager_smp_intr_init();
-#endif
-
-	setup_irq(2, &irq2);
-}
-
-static void voyager_disable_tsc(void)
-{
-	/* Voyagers run their CPUs from independent clocks, so disable
-	 * the TSC code because we can't sync them */
-	setup_clear_cpu_cap(X86_FEATURE_TSC);
-}
-
-void __init pre_setup_arch_hook(void)
-{
-	voyager_disable_tsc();
-}
-
-void __init pre_time_init_hook(void)
-{
-	voyager_disable_tsc();
-}
-
-void __init trap_init_hook(void)
-{
-}
-
-static struct irqaction irq0 = {
-	.handler = timer_interrupt,
-	.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL | IRQF_TIMER,
-	.mask = CPU_MASK_NONE,
-	.name = "timer"
-};
-
-void __init time_init_hook(void)
-{
-	irq0.mask = cpumask_of_cpu(safe_smp_processor_id());
-	setup_irq(0, &irq0);
-}
-
-/* Hook for machine specific memory setup. */
-
-char *__init machine_specific_memory_setup(void)
-{
-	char *who;
-	int new_nr;
-
-	who = "NOT VOYAGER";
-
-	if (voyager_level == 5) {
-		__u32 addr, length;
-		int i;
-
-		who = "Voyager-SUS";
-
-		e820.nr_map = 0;
-		for (i = 0; voyager_memory_detect(i, &addr, &length); i++) {
-			e820_add_region(addr, length, E820_RAM);
-		}
-		return who;
-	} else if (voyager_level == 4) {
-		__u32 tom;
-		__u16 catbase = inb(VOYAGER_SSPB_RELOCATION_PORT) << 8;
-		/* select the DINO config space */
-		outb(VOYAGER_DINO, VOYAGER_CAT_CONFIG_PORT);
-		/* Read DINO top of memory register */
-		tom = ((inb(catbase + 0x4) & 0xf0) << 16)
-		    + ((inb(catbase + 0x5) & 0x7f) << 24);
-
-		if (inb(catbase) != VOYAGER_DINO) {
-			printk(KERN_ERR
-			       "Voyager: Failed to get DINO for L4, setting tom to EXT_MEM_K\n");
-			tom = (boot_params.screen_info.ext_mem_k) << 10;
-		}
-		who = "Voyager-TOM";
-		e820_add_region(0, 0x9f000, E820_RAM);
-		/* map from 1M to top of memory */
-		e820_add_region(1 * 1024 * 1024, tom - 1 * 1024 * 1024,
-				  E820_RAM);
-		/* FIXME: Should check the ASICs to see if I need to
-		 * take out the 8M window.  Just do it at the moment
-		 * */
-		e820_add_region(8 * 1024 * 1024, 8 * 1024 * 1024,
-				  E820_RESERVED);
-		return who;
-	}
-
-	return default_machine_specific_memory_setup();
-}
diff --git a/arch/x86/mach-voyager/voyager_basic.c b/arch/x86/mach-voyager/voyager_basic.c
deleted file mode 100644
index 46d6f8067690..000000000000
--- a/arch/x86/mach-voyager/voyager_basic.c
+++ /dev/null
@@ -1,317 +0,0 @@
-/* Copyright (C) 1999,2001 
- *
- * Author: J.E.J.Bottomley@HansenPartnership.com
- *
- * This file contains all the voyager specific routines for getting
- * initialisation of the architecture to function.  For additional
- * features see:
- *
- *	voyager_cat.c - Voyager CAT bus interface
- *	voyager_smp.c - Voyager SMP hal (emulates linux smp.c)
- */
-
-#include <linux/module.h>
-#include <linux/types.h>
-#include <linux/sched.h>
-#include <linux/ptrace.h>
-#include <linux/ioport.h>
-#include <linux/interrupt.h>
-#include <linux/init.h>
-#include <linux/delay.h>
-#include <linux/reboot.h>
-#include <linux/sysrq.h>
-#include <linux/smp.h>
-#include <linux/nodemask.h>
-#include <asm/io.h>
-#include <asm/voyager.h>
-#include <asm/vic.h>
-#include <linux/pm.h>
-#include <asm/tlbflush.h>
-#include <asm/arch_hooks.h>
-#include <asm/i8253.h>
-
-/*
- * Power off function, if any
- */
-void (*pm_power_off) (void);
-EXPORT_SYMBOL(pm_power_off);
-
-int voyager_level = 0;
-
-struct voyager_SUS *voyager_SUS = NULL;
-
-#ifdef CONFIG_SMP
-static void voyager_dump(int dummy1, struct tty_struct *dummy3)
-{
-	/* get here via a sysrq */
-	voyager_smp_dump();
-}
-
-static struct sysrq_key_op sysrq_voyager_dump_op = {
-	.handler = voyager_dump,
-	.help_msg = "Voyager",
-	.action_msg = "Dump Voyager Status",
-};
-#endif
-
-void voyager_detect(struct voyager_bios_info *bios)
-{
-	if (bios->len != 0xff) {
-		int class = (bios->class_1 << 8)
-		    | (bios->class_2 & 0xff);
-
-		printk("Voyager System detected.\n"
-		       "        Class %x, Revision %d.%d\n",
-		       class, bios->major, bios->minor);
-		if (class == VOYAGER_LEVEL4)
-			voyager_level = 4;
-		else if (class < VOYAGER_LEVEL5_AND_ABOVE)
-			voyager_level = 3;
-		else
-			voyager_level = 5;
-		printk("        Architecture Level %d\n", voyager_level);
-		if (voyager_level < 4)
-			printk
-			    ("\n**WARNING**: Voyager HAL only supports Levels 4 and 5 Architectures at the moment\n\n");
-		/* install the power off handler */
-		pm_power_off = voyager_power_off;
-#ifdef CONFIG_SMP
-		register_sysrq_key('v', &sysrq_voyager_dump_op);
-#endif
-	} else {
-		printk("\n\n**WARNING**: No Voyager Subsystem Found\n");
-	}
-}
-
-void voyager_system_interrupt(int cpl, void *dev_id)
-{
-	printk("Voyager: detected system interrupt\n");
-}
-
-/* Routine to read information from the extended CMOS area */
-__u8 voyager_extended_cmos_read(__u16 addr)
-{
-	outb(addr & 0xff, 0x74);
-	outb((addr >> 8) & 0xff, 0x75);
-	return inb(0x76);
-}
-
-/* internal definitions for the SUS Click Map of memory */
-
-#define CLICK_ENTRIES	16
-#define CLICK_SIZE	4096	/* click to byte conversion for Length */
-
-typedef struct ClickMap {
-	struct Entry {
-		__u32 Address;
-		__u32 Length;
-	} Entry[CLICK_ENTRIES];
-} ClickMap_t;
-
-/* This routine is pretty much an awful hack to read the bios clickmap by
- * mapping it into page 0.  There are usually three regions in the map:
- * 	Base Memory
- * 	Extended Memory
- *	zero length marker for end of map
- *
- * Returns are 0 for failure and 1 for success on extracting region.
- */
-int __init voyager_memory_detect(int region, __u32 * start, __u32 * length)
-{
-	int i;
-	int retval = 0;
-	__u8 cmos[4];
-	ClickMap_t *map;
-	unsigned long map_addr;
-	unsigned long old;
-
-	if (region >= CLICK_ENTRIES) {
-		printk("Voyager: Illegal ClickMap region %d\n", region);
-		return 0;
-	}
-
-	for (i = 0; i < sizeof(cmos); i++)
-		cmos[i] =
-		    voyager_extended_cmos_read(VOYAGER_MEMORY_CLICKMAP + i);
-
-	map_addr = *(unsigned long *)cmos;
-
-	/* steal page 0 for this */
-	old = pg0[0];
-	pg0[0] = ((map_addr & PAGE_MASK) | _PAGE_RW | _PAGE_PRESENT);
-	local_flush_tlb();
-	/* now clear everything out but page 0 */
-	map = (ClickMap_t *) (map_addr & (~PAGE_MASK));
-
-	/* zero length is the end of the clickmap */
-	if (map->Entry[region].Length != 0) {
-		*length = map->Entry[region].Length * CLICK_SIZE;
-		*start = map->Entry[region].Address;
-		retval = 1;
-	}
-
-	/* replace the mapping */
-	pg0[0] = old;
-	local_flush_tlb();
-	return retval;
-}
-
-/* voyager specific handling code for timer interrupts.  Used to hand
- * off the timer tick to the SMP code, since the VIC doesn't have an
- * internal timer (The QIC does, but that's another story). */
-void voyager_timer_interrupt(void)
-{
-	if ((jiffies & 0x3ff) == 0) {
-
-		/* There seems to be something flaky in either
-		 * hardware or software that is resetting the timer 0
-		 * count to something much higher than it should be
-		 * This seems to occur in the boot sequence, just
-		 * before root is mounted.  Therefore, every 10
-		 * seconds or so, we sanity check the timer zero count
-		 * and kick it back to where it should be.
-		 *
-		 * FIXME: This is the most awful hack yet seen.  I
-		 * should work out exactly what is interfering with
-		 * the timer count settings early in the boot sequence
-		 * and swiftly introduce it to something sharp and
-		 * pointy.  */
-		__u16 val;
-
-		spin_lock(&i8253_lock);
-
-		outb_p(0x00, 0x43);
-		val = inb_p(0x40);
-		val |= inb(0x40) << 8;
-		spin_unlock(&i8253_lock);
-
-		if (val > LATCH) {
-			printk
-			    ("\nVOYAGER: countdown timer value too high (%d), resetting\n\n",
-			     val);
-			spin_lock(&i8253_lock);
-			outb(0x34, 0x43);
-			outb_p(LATCH & 0xff, 0x40);	/* LSB */
-			outb(LATCH >> 8, 0x40);	/* MSB */
-			spin_unlock(&i8253_lock);
-		}
-	}
-#ifdef CONFIG_SMP
-	smp_vic_timer_interrupt();
-#endif
-}
-
-void voyager_power_off(void)
-{
-	printk("VOYAGER Power Off\n");
-
-	if (voyager_level == 5) {
-		voyager_cat_power_off();
-	} else if (voyager_level == 4) {
-		/* This doesn't apparently work on most L4 machines,
-		 * but the specs say to do this to get automatic power
-		 * off.  Unfortunately, if it doesn't power off the
-		 * machine, it ends up doing a cold restart, which
-		 * isn't really intended, so comment out the code */
-#if 0
-		int port;
-
-		/* enable the voyager Configuration Space */
-		outb((inb(VOYAGER_MC_SETUP) & 0xf0) | 0x8, VOYAGER_MC_SETUP);
-		/* the port for the power off flag is an offset from the
-		   floating base */
-		port = (inb(VOYAGER_SSPB_RELOCATION_PORT) << 8) + 0x21;
-		/* set the power off flag */
-		outb(inb(port) | 0x1, port);
-#endif
-	}
-	/* and wait for it to happen */
-	local_irq_disable();
-	for (;;)
-		halt();
-}
-
-/* copied from process.c */
-static inline void kb_wait(void)
-{
-	int i;
-
-	for (i = 0; i < 0x10000; i++)
-		if ((inb_p(0x64) & 0x02) == 0)
-			break;
-}
-
-void machine_shutdown(void)
-{
-	/* Architecture specific shutdown needed before a kexec */
-}
-
-void machine_restart(char *cmd)
-{
-	printk("Voyager Warm Restart\n");
-	kb_wait();
-
-	if (voyager_level == 5) {
-		/* write magic values to the RTC to inform system that
-		 * shutdown is beginning */
-		outb(0x8f, 0x70);
-		outb(0x5, 0x71);
-
-		udelay(50);
-		outb(0xfe, 0x64);	/* pull reset low */
-	} else if (voyager_level == 4) {
-		__u16 catbase = inb(VOYAGER_SSPB_RELOCATION_PORT) << 8;
-		__u8 basebd = inb(VOYAGER_MC_SETUP);
-
-		outb(basebd | 0x08, VOYAGER_MC_SETUP);
-		outb(0x02, catbase + 0x21);
-	}
-	local_irq_disable();
-	for (;;)
-		halt();
-}
-
-void machine_emergency_restart(void)
-{
-	/*for now, just hook this to a warm restart */
-	machine_restart(NULL);
-}
-
-void mca_nmi_hook(void)
-{
-	__u8 dumpval __maybe_unused = inb(0xf823);
-	__u8 swnmi __maybe_unused = inb(0xf813);
-
-	/* FIXME: assume dump switch pressed */
-	/* check to see if the dump switch was pressed */
-	VDEBUG(("VOYAGER: dumpval = 0x%x, swnmi = 0x%x\n", dumpval, swnmi));
-	/* clear swnmi */
-	outb(0xff, 0xf813);
-	/* tell SUS to ignore dump */
-	if (voyager_level == 5 && voyager_SUS != NULL) {
-		if (voyager_SUS->SUS_mbox == VOYAGER_DUMP_BUTTON_NMI) {
-			voyager_SUS->kernel_mbox = VOYAGER_NO_COMMAND;
-			voyager_SUS->kernel_flags |= VOYAGER_OS_IN_PROGRESS;
-			udelay(1000);
-			voyager_SUS->kernel_mbox = VOYAGER_IGNORE_DUMP;
-			voyager_SUS->kernel_flags &= ~VOYAGER_OS_IN_PROGRESS;
-		}
-	}
-	printk(KERN_ERR
-	       "VOYAGER: Dump switch pressed, printing CPU%d tracebacks\n",
-	       smp_processor_id());
-	show_stack(NULL, NULL);
-	show_state();
-}
-
-void machine_halt(void)
-{
-	/* treat a halt like a power off */
-	machine_power_off();
-}
-
-void machine_power_off(void)
-{
-	if (pm_power_off)
-		pm_power_off();
-}
diff --git a/arch/x86/mach-voyager/voyager_cat.c b/arch/x86/mach-voyager/voyager_cat.c
deleted file mode 100644
index 2ad598c104af..000000000000
--- a/arch/x86/mach-voyager/voyager_cat.c
+++ /dev/null
@@ -1,1197 +0,0 @@
-/* -*- mode: c; c-basic-offset: 8 -*- */
-
-/* Copyright (C) 1999,2001
- *
- * Author: J.E.J.Bottomley@HansenPartnership.com
- *
- * This file contains all the logic for manipulating the CAT bus
- * in a level 5 machine.
- *
- * The CAT bus is a serial configuration and test bus.  Its primary
- * uses are to probe the initial configuration of the system and to
- * diagnose error conditions when a system interrupt occurs.  The low
- * level interface is fairly primitive, so most of this file consists
- * of bit shift manipulations to send and receive packets on the
- * serial bus */
-
-#include <linux/types.h>
-#include <linux/completion.h>
-#include <linux/sched.h>
-#include <asm/voyager.h>
-#include <asm/vic.h>
-#include <linux/ioport.h>
-#include <linux/init.h>
-#include <linux/slab.h>
-#include <linux/delay.h>
-#include <asm/io.h>
-
-#ifdef VOYAGER_CAT_DEBUG
-#define CDEBUG(x)	printk x
-#else
-#define CDEBUG(x)
-#endif
-
-/* the CAT command port */
-#define CAT_CMD		(sspb + 0xe)
-/* the CAT data port */
-#define CAT_DATA	(sspb + 0xd)
-
-/* the internal cat functions */
-static void cat_pack(__u8 * msg, __u16 start_bit, __u8 * data, __u16 num_bits);
-static void cat_unpack(__u8 * msg, __u16 start_bit, __u8 * data,
-		       __u16 num_bits);
-static void cat_build_header(__u8 * header, const __u16 len,
-			     const __u16 smallest_reg_bits,
-			     const __u16 longest_reg_bits);
-static int cat_sendinst(voyager_module_t * modp, voyager_asic_t * asicp,
-			__u8 reg, __u8 op);
-static int cat_getdata(voyager_module_t * modp, voyager_asic_t * asicp,
-		       __u8 reg, __u8 * value);
-static int cat_shiftout(__u8 * data, __u16 data_bytes, __u16 header_bytes,
-			__u8 pad_bits);
-static int cat_write(voyager_module_t * modp, voyager_asic_t * asicp, __u8 reg,
-		     __u8 value);
-static int cat_read(voyager_module_t * modp, voyager_asic_t * asicp, __u8 reg,
-		    __u8 * value);
-static int cat_subread(voyager_module_t * modp, voyager_asic_t * asicp,
-		       __u16 offset, __u16 len, void *buf);
-static int cat_senddata(voyager_module_t * modp, voyager_asic_t * asicp,
-			__u8 reg, __u8 value);
-static int cat_disconnect(voyager_module_t * modp, voyager_asic_t * asicp);
-static int cat_connect(voyager_module_t * modp, voyager_asic_t * asicp);
-
-static inline const char *cat_module_name(int module_id)
-{
-	switch (module_id) {
-	case 0x10:
-		return "Processor Slot 0";
-	case 0x11:
-		return "Processor Slot 1";
-	case 0x12:
-		return "Processor Slot 2";
-	case 0x13:
-		return "Processor Slot 4";
-	case 0x14:
-		return "Memory Slot 0";
-	case 0x15:
-		return "Memory Slot 1";
-	case 0x18:
-		return "Primary Microchannel";
-	case 0x19:
-		return "Secondary Microchannel";
-	case 0x1a:
-		return "Power Supply Interface";
-	case 0x1c:
-		return "Processor Slot 5";
-	case 0x1d:
-		return "Processor Slot 6";
-	case 0x1e:
-		return "Processor Slot 7";
-	case 0x1f:
-		return "Processor Slot 8";
-	default:
-		return "Unknown Module";
-	}
-}
-
-static int sspb = 0;		/* stores the super port location */
-int voyager_8slot = 0;		/* set to true if a 51xx monster */
-
-voyager_module_t *voyager_cat_list;
-
-/* the I/O port assignments for the VIC and QIC */
-static struct resource vic_res = {
-	.name = "Voyager Interrupt Controller",
-	.start = 0xFC00,
-	.end = 0xFC6F
-};
-static struct resource qic_res = {
-	.name = "Quad Interrupt Controller",
-	.start = 0xFC70,
-	.end = 0xFCFF
-};
-
-/* This function is used to pack a data bit stream inside a message.
- * It writes num_bits of the data buffer in msg starting at start_bit.
- * Note: This function assumes that any unused bit in the data stream
- * is set to zero so that the ors will work correctly */
-static void
-cat_pack(__u8 * msg, const __u16 start_bit, __u8 * data, const __u16 num_bits)
-{
-	/* compute initial shift needed */
-	const __u16 offset = start_bit % BITS_PER_BYTE;
-	__u16 len = num_bits / BITS_PER_BYTE;
-	__u16 byte = start_bit / BITS_PER_BYTE;
-	__u16 residue = (num_bits % BITS_PER_BYTE) + offset;
-	int i;
-
-	/* adjust if we have more than a byte of residue */
-	if (residue >= BITS_PER_BYTE) {
-		residue -= BITS_PER_BYTE;
-		len++;
-	}
-
-	/* clear out the bits.  We assume here that if len==0 then
-	 * residue >= offset.  This is always true for the catbus
-	 * operations */
-	msg[byte] &= 0xff << (BITS_PER_BYTE - offset);
-	msg[byte++] |= data[0] >> offset;
-	if (len == 0)
-		return;
-	for (i = 1; i < len; i++)
-		msg[byte++] = (data[i - 1] << (BITS_PER_BYTE - offset))
-		    | (data[i] >> offset);
-	if (residue != 0) {
-		__u8 mask = 0xff >> residue;
-		__u8 last_byte = data[i - 1] << (BITS_PER_BYTE - offset)
-		    | (data[i] >> offset);
-
-		last_byte &= ~mask;
-		msg[byte] &= mask;
-		msg[byte] |= last_byte;
-	}
-	return;
-}
-
-/* unpack the data again (same arguments as cat_pack()). data buffer
- * must be zero populated.
- *
- * Function: given a message string move to start_bit and copy num_bits into
- * data (starting at bit 0 in data).
- */
-static void
-cat_unpack(__u8 * msg, const __u16 start_bit, __u8 * data, const __u16 num_bits)
-{
-	/* compute initial shift needed */
-	const __u16 offset = start_bit % BITS_PER_BYTE;
-	__u16 len = num_bits / BITS_PER_BYTE;
-	const __u8 last_bits = num_bits % BITS_PER_BYTE;
-	__u16 byte = start_bit / BITS_PER_BYTE;
-	int i;
-
-	if (last_bits != 0)
-		len++;
-
-	/* special case: want < 8 bits from msg and we can get it from
-	 * a single byte of the msg */
-	if (len == 0 && BITS_PER_BYTE - offset >= num_bits) {
-		data[0] = msg[byte] << offset;
-		data[0] &= 0xff >> (BITS_PER_BYTE - num_bits);
-		return;
-	}
-	for (i = 0; i < len; i++) {
-		/* this annoying if has to be done just in case a read of
-		 * msg one beyond the array causes a panic */
-		if (offset != 0) {
-			data[i] = msg[byte++] << offset;
-			data[i] |= msg[byte] >> (BITS_PER_BYTE - offset);
-		} else {
-			data[i] = msg[byte++];
-		}
-	}
-	/* do we need to truncate the final byte */
-	if (last_bits != 0) {
-		data[i - 1] &= 0xff << (BITS_PER_BYTE - last_bits);
-	}
-	return;
-}
-
-static void
-cat_build_header(__u8 * header, const __u16 len, const __u16 smallest_reg_bits,
-		 const __u16 longest_reg_bits)
-{
-	int i;
-	__u16 start_bit = (smallest_reg_bits - 1) % BITS_PER_BYTE;
-	__u8 *last_byte = &header[len - 1];
-
-	if (start_bit == 0)
-		start_bit = 1;	/* must have at least one bit in the hdr */
-
-	for (i = 0; i < len; i++)
-		header[i] = 0;
-
-	for (i = start_bit; i > 0; i--)
-		*last_byte = ((*last_byte) << 1) + 1;
-
-}
-
-static int
-cat_sendinst(voyager_module_t * modp, voyager_asic_t * asicp, __u8 reg, __u8 op)
-{
-	__u8 parity, inst, inst_buf[4] = { 0 };
-	__u8 iseq[VOYAGER_MAX_SCAN_PATH], hseq[VOYAGER_MAX_REG_SIZE];
-	__u16 ibytes, hbytes, padbits;
-	int i;
-
-	/* 
-	 * Parity is the parity of the register number + 1 (READ_REGISTER
-	 * and WRITE_REGISTER always add '1' to the number of bits == 1)
-	 */
-	parity = (__u8) (1 + (reg & 0x01) +
-			 ((__u8) (reg & 0x02) >> 1) +
-			 ((__u8) (reg & 0x04) >> 2) +
-			 ((__u8) (reg & 0x08) >> 3)) % 2;
-
-	inst = ((parity << 7) | (reg << 2) | op);
-
-	outb(VOYAGER_CAT_IRCYC, CAT_CMD);
-	if (!modp->scan_path_connected) {
-		if (asicp->asic_id != VOYAGER_CAT_ID) {
-			printk
-			    ("**WARNING***: cat_sendinst has disconnected scan path not to CAT asic\n");
-			return 1;
-		}
-		outb(VOYAGER_CAT_HEADER, CAT_DATA);
-		outb(inst, CAT_DATA);
-		if (inb(CAT_DATA) != VOYAGER_CAT_HEADER) {
-			CDEBUG(("VOYAGER CAT: cat_sendinst failed to get CAT_HEADER\n"));
-			return 1;
-		}
-		return 0;
-	}
-	ibytes = modp->inst_bits / BITS_PER_BYTE;
-	if ((padbits = modp->inst_bits % BITS_PER_BYTE) != 0) {
-		padbits = BITS_PER_BYTE - padbits;
-		ibytes++;
-	}
-	hbytes = modp->largest_reg / BITS_PER_BYTE;
-	if (modp->largest_reg % BITS_PER_BYTE)
-		hbytes++;
-	CDEBUG(("cat_sendinst: ibytes=%d, hbytes=%d\n", ibytes, hbytes));
-	/* initialise the instruction sequence to 0xff */
-	for (i = 0; i < ibytes + hbytes; i++)
-		iseq[i] = 0xff;
-	cat_build_header(hseq, hbytes, modp->smallest_reg, modp->largest_reg);
-	cat_pack(iseq, modp->inst_bits, hseq, hbytes * BITS_PER_BYTE);
-	inst_buf[0] = inst;
-	inst_buf[1] = 0xFF >> (modp->largest_reg % BITS_PER_BYTE);
-	cat_pack(iseq, asicp->bit_location, inst_buf, asicp->ireg_length);
-#ifdef VOYAGER_CAT_DEBUG
-	printk("ins = 0x%x, iseq: ", inst);
-	for (i = 0; i < ibytes + hbytes; i++)
-		printk("0x%x ", iseq[i]);
-	printk("\n");
-#endif
-	if (cat_shiftout(iseq, ibytes, hbytes, padbits)) {
-		CDEBUG(("VOYAGER CAT: cat_sendinst: cat_shiftout failed\n"));
-		return 1;
-	}
-	CDEBUG(("CAT SHIFTOUT DONE\n"));
-	return 0;
-}
-
-static int
-cat_getdata(voyager_module_t * modp, voyager_asic_t * asicp, __u8 reg,
-	    __u8 * value)
-{
-	if (!modp->scan_path_connected) {
-		if (asicp->asic_id != VOYAGER_CAT_ID) {
-			CDEBUG(("VOYAGER CAT: ERROR: cat_getdata to CAT asic with scan path connected\n"));
-			return 1;
-		}
-		if (reg > VOYAGER_SUBADDRHI)
-			outb(VOYAGER_CAT_RUN, CAT_CMD);
-		outb(VOYAGER_CAT_DRCYC, CAT_CMD);
-		outb(VOYAGER_CAT_HEADER, CAT_DATA);
-		*value = inb(CAT_DATA);
-		outb(0xAA, CAT_DATA);
-		if (inb(CAT_DATA) != VOYAGER_CAT_HEADER) {
-			CDEBUG(("cat_getdata: failed to get VOYAGER_CAT_HEADER\n"));
-			return 1;
-		}
-		return 0;
-	} else {
-		__u16 sbits = modp->num_asics - 1 + asicp->ireg_length;
-		__u16 sbytes = sbits / BITS_PER_BYTE;
-		__u16 tbytes;
-		__u8 string[VOYAGER_MAX_SCAN_PATH],
-		    trailer[VOYAGER_MAX_REG_SIZE];
-		__u8 padbits;
-		int i;
-
-		outb(VOYAGER_CAT_DRCYC, CAT_CMD);
-
-		if ((padbits = sbits % BITS_PER_BYTE) != 0) {
-			padbits = BITS_PER_BYTE - padbits;
-			sbytes++;
-		}
-		tbytes = asicp->ireg_length / BITS_PER_BYTE;
-		if (asicp->ireg_length % BITS_PER_BYTE)
-			tbytes++;
-		CDEBUG(("cat_getdata: tbytes = %d, sbytes = %d, padbits = %d\n",
-			tbytes, sbytes, padbits));
-		cat_build_header(trailer, tbytes, 1, asicp->ireg_length);
-
-		for (i = tbytes - 1; i >= 0; i--) {
-			outb(trailer[i], CAT_DATA);
-			string[sbytes + i] = inb(CAT_DATA);
-		}
-
-		for (i = sbytes - 1; i >= 0; i--) {
-			outb(0xaa, CAT_DATA);
-			string[i] = inb(CAT_DATA);
-		}
-		*value = 0;
-		cat_unpack(string,
-			   padbits + (tbytes * BITS_PER_BYTE) +
-			   asicp->asic_location, value, asicp->ireg_length);
-#ifdef VOYAGER_CAT_DEBUG
-		printk("value=0x%x, string: ", *value);
-		for (i = 0; i < tbytes + sbytes; i++)
-			printk("0x%x ", string[i]);
-		printk("\n");
-#endif
-
-		/* sanity check the rest of the return */
-		for (i = 0; i < tbytes; i++) {
-			__u8 input = 0;
-
-			cat_unpack(string, padbits + (i * BITS_PER_BYTE),
-				   &input, BITS_PER_BYTE);
-			if (trailer[i] != input) {
-				CDEBUG(("cat_getdata: failed to sanity check rest of ret(%d) 0x%x != 0x%x\n", i, input, trailer[i]));
-				return 1;
-			}
-		}
-		CDEBUG(("cat_getdata DONE\n"));
-		return 0;
-	}
-}
-
-static int
-cat_shiftout(__u8 * data, __u16 data_bytes, __u16 header_bytes, __u8 pad_bits)
-{
-	int i;
-
-	for (i = data_bytes + header_bytes - 1; i >= header_bytes; i--)
-		outb(data[i], CAT_DATA);
-
-	for (i = header_bytes - 1; i >= 0; i--) {
-		__u8 header = 0;
-		__u8 input;
-
-		outb(data[i], CAT_DATA);
-		input = inb(CAT_DATA);
-		CDEBUG(("cat_shiftout: returned 0x%x\n", input));
-		cat_unpack(data, ((data_bytes + i) * BITS_PER_BYTE) - pad_bits,
-			   &header, BITS_PER_BYTE);
-		if (input != header) {
-			CDEBUG(("VOYAGER CAT: cat_shiftout failed to return header 0x%x != 0x%x\n", input, header));
-			return 1;
-		}
-	}
-	return 0;
-}
-
-static int
-cat_senddata(voyager_module_t * modp, voyager_asic_t * asicp,
-	     __u8 reg, __u8 value)
-{
-	outb(VOYAGER_CAT_DRCYC, CAT_CMD);
-	if (!modp->scan_path_connected) {
-		if (asicp->asic_id != VOYAGER_CAT_ID) {
-			CDEBUG(("VOYAGER CAT: ERROR: scan path disconnected when asic != CAT\n"));
-			return 1;
-		}
-		outb(VOYAGER_CAT_HEADER, CAT_DATA);
-		outb(value, CAT_DATA);
-		if (inb(CAT_DATA) != VOYAGER_CAT_HEADER) {
-			CDEBUG(("cat_senddata: failed to get correct header response to sent data\n"));
-			return 1;
-		}
-		if (reg > VOYAGER_SUBADDRHI) {
-			outb(VOYAGER_CAT_RUN, CAT_CMD);
-			outb(VOYAGER_CAT_END, CAT_CMD);
-			outb(VOYAGER_CAT_RUN, CAT_CMD);
-		}
-
-		return 0;
-	} else {
-		__u16 hbytes = asicp->ireg_length / BITS_PER_BYTE;
-		__u16 dbytes =
-		    (modp->num_asics - 1 + asicp->ireg_length) / BITS_PER_BYTE;
-		__u8 padbits, dseq[VOYAGER_MAX_SCAN_PATH],
-		    hseq[VOYAGER_MAX_REG_SIZE];
-		int i;
-
-		if ((padbits = (modp->num_asics - 1
-				+ asicp->ireg_length) % BITS_PER_BYTE) != 0) {
-			padbits = BITS_PER_BYTE - padbits;
-			dbytes++;
-		}
-		if (asicp->ireg_length % BITS_PER_BYTE)
-			hbytes++;
-
-		cat_build_header(hseq, hbytes, 1, asicp->ireg_length);
-
-		for (i = 0; i < dbytes + hbytes; i++)
-			dseq[i] = 0xff;
-		CDEBUG(("cat_senddata: dbytes=%d, hbytes=%d, padbits=%d\n",
-			dbytes, hbytes, padbits));
-		cat_pack(dseq, modp->num_asics - 1 + asicp->ireg_length,
-			 hseq, hbytes * BITS_PER_BYTE);
-		cat_pack(dseq, asicp->asic_location, &value,
-			 asicp->ireg_length);
-#ifdef VOYAGER_CAT_DEBUG
-		printk("dseq ");
-		for (i = 0; i < hbytes + dbytes; i++) {
-			printk("0x%x ", dseq[i]);
-		}
-		printk("\n");
-#endif
-		return cat_shiftout(dseq, dbytes, hbytes, padbits);
-	}
-}
-
-static int
-cat_write(voyager_module_t * modp, voyager_asic_t * asicp, __u8 reg, __u8 value)
-{
-	if (cat_sendinst(modp, asicp, reg, VOYAGER_WRITE_CONFIG))
-		return 1;
-	return cat_senddata(modp, asicp, reg, value);
-}
-
-static int
-cat_read(voyager_module_t * modp, voyager_asic_t * asicp, __u8 reg,
-	 __u8 * value)
-{
-	if (cat_sendinst(modp, asicp, reg, VOYAGER_READ_CONFIG))
-		return 1;
-	return cat_getdata(modp, asicp, reg, value);
-}
-
-static int
-cat_subaddrsetup(voyager_module_t * modp, voyager_asic_t * asicp, __u16 offset,
-		 __u16 len)
-{
-	__u8 val;
-
-	if (len > 1) {
-		/* set auto increment */
-		__u8 newval;
-
-		if (cat_read(modp, asicp, VOYAGER_AUTO_INC_REG, &val)) {
-			CDEBUG(("cat_subaddrsetup: read of VOYAGER_AUTO_INC_REG failed\n"));
-			return 1;
-		}
-		CDEBUG(("cat_subaddrsetup: VOYAGER_AUTO_INC_REG = 0x%x\n",
-			val));
-		newval = val | VOYAGER_AUTO_INC;
-		if (newval != val) {
-			if (cat_write(modp, asicp, VOYAGER_AUTO_INC_REG, val)) {
-				CDEBUG(("cat_subaddrsetup: write to VOYAGER_AUTO_INC_REG failed\n"));
-				return 1;
-			}
-		}
-	}
-	if (cat_write(modp, asicp, VOYAGER_SUBADDRLO, (__u8) (offset & 0xff))) {
-		CDEBUG(("cat_subaddrsetup: write to SUBADDRLO failed\n"));
-		return 1;
-	}
-	if (asicp->subaddr > VOYAGER_SUBADDR_LO) {
-		if (cat_write
-		    (modp, asicp, VOYAGER_SUBADDRHI, (__u8) (offset >> 8))) {
-			CDEBUG(("cat_subaddrsetup: write to SUBADDRHI failed\n"));
-			return 1;
-		}
-		cat_read(modp, asicp, VOYAGER_SUBADDRHI, &val);
-		CDEBUG(("cat_subaddrsetup: offset = %d, hi = %d\n", offset,
-			val));
-	}
-	cat_read(modp, asicp, VOYAGER_SUBADDRLO, &val);
-	CDEBUG(("cat_subaddrsetup: offset = %d, lo = %d\n", offset, val));
-	return 0;
-}
-
-static int
-cat_subwrite(voyager_module_t * modp, voyager_asic_t * asicp, __u16 offset,
-	     __u16 len, void *buf)
-{
-	int i, retval;
-
-	/* FIXME: need special actions for VOYAGER_CAT_ID here */
-	if (asicp->asic_id == VOYAGER_CAT_ID) {
-		CDEBUG(("cat_subwrite: ATTEMPT TO WRITE TO CAT ASIC\n"));
-		/* FIXME -- This is supposed to be handled better
-		 * There is a problem writing to the cat asic in the
-		 * PSI.  The 30us delay seems to work, though */
-		udelay(30);
-	}
-
-	if ((retval = cat_subaddrsetup(modp, asicp, offset, len)) != 0) {
-		printk("cat_subwrite: cat_subaddrsetup FAILED\n");
-		return retval;
-	}
-
-	if (cat_sendinst
-	    (modp, asicp, VOYAGER_SUBADDRDATA, VOYAGER_WRITE_CONFIG)) {
-		printk("cat_subwrite: cat_sendinst FAILED\n");
-		return 1;
-	}
-	for (i = 0; i < len; i++) {
-		if (cat_senddata(modp, asicp, 0xFF, ((__u8 *) buf)[i])) {
-			printk
-			    ("cat_subwrite: cat_sendata element at %d FAILED\n",
-			     i);
-			return 1;
-		}
-	}
-	return 0;
-}
-static int
-cat_subread(voyager_module_t * modp, voyager_asic_t * asicp, __u16 offset,
-	    __u16 len, void *buf)
-{
-	int i, retval;
-
-	if ((retval = cat_subaddrsetup(modp, asicp, offset, len)) != 0) {
-		CDEBUG(("cat_subread: cat_subaddrsetup FAILED\n"));
-		return retval;
-	}
-
-	if (cat_sendinst(modp, asicp, VOYAGER_SUBADDRDATA, VOYAGER_READ_CONFIG)) {
-		CDEBUG(("cat_subread: cat_sendinst failed\n"));
-		return 1;
-	}
-	for (i = 0; i < len; i++) {
-		if (cat_getdata(modp, asicp, 0xFF, &((__u8 *) buf)[i])) {
-			CDEBUG(("cat_subread: cat_getdata element %d failed\n",
-				i));
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/* buffer for storing EPROM data read in during initialisation */
-static __initdata __u8 eprom_buf[0xFFFF];
-static voyager_module_t *voyager_initial_module;
-
-/* Initialise the cat bus components.  We assume this is called by the
- * boot cpu *after* all memory initialisation has been done (so we can
- * use kmalloc) but before smp initialisation, so we can probe the SMP
- * configuration and pick up necessary information.  */
-void __init voyager_cat_init(void)
-{
-	voyager_module_t **modpp = &voyager_initial_module;
-	voyager_asic_t **asicpp;
-	voyager_asic_t *qabc_asic = NULL;
-	int i, j;
-	unsigned long qic_addr = 0;
-	__u8 qabc_data[0x20];
-	__u8 num_submodules, val;
-	voyager_eprom_hdr_t *eprom_hdr = (voyager_eprom_hdr_t *) & eprom_buf[0];
-
-	__u8 cmos[4];
-	unsigned long addr;
-
-	/* initiallise the SUS mailbox */
-	for (i = 0; i < sizeof(cmos); i++)
-		cmos[i] = voyager_extended_cmos_read(VOYAGER_DUMP_LOCATION + i);
-	addr = *(unsigned long *)cmos;
-	if ((addr & 0xff000000) != 0xff000000) {
-		printk(KERN_ERR
-		       "Voyager failed to get SUS mailbox (addr = 0x%lx\n",
-		       addr);
-	} else {
-		static struct resource res;
-
-		res.name = "voyager SUS";
-		res.start = addr;
-		res.end = addr + 0x3ff;
-
-		request_resource(&iomem_resource, &res);
-		voyager_SUS = (struct voyager_SUS *)
-		    ioremap(addr, 0x400);
-		printk(KERN_NOTICE "Voyager SUS mailbox version 0x%x\n",
-		       voyager_SUS->SUS_version);
-		voyager_SUS->kernel_version = VOYAGER_MAILBOX_VERSION;
-		voyager_SUS->kernel_flags = VOYAGER_OS_HAS_SYSINT;
-	}
-
-	/* clear the processor counts */
-	voyager_extended_vic_processors = 0;
-	voyager_quad_processors = 0;
-
-	printk("VOYAGER: beginning CAT bus probe\n");
-	/* set up the SuperSet Port Block which tells us where the
-	 * CAT communication port is */
-	sspb = inb(VOYAGER_SSPB_RELOCATION_PORT) * 0x100;
-	VDEBUG(("VOYAGER DEBUG: sspb = 0x%x\n", sspb));
-
-	/* now find out if were 8 slot or normal */
-	if ((inb(VIC_PROC_WHO_AM_I) & EIGHT_SLOT_IDENTIFIER)
-	    == EIGHT_SLOT_IDENTIFIER) {
-		voyager_8slot = 1;
-		printk(KERN_NOTICE
-		       "Voyager: Eight slot 51xx configuration detected\n");
-	}
-
-	for (i = VOYAGER_MIN_MODULE; i <= VOYAGER_MAX_MODULE; i++) {
-		__u8 input;
-		int asic;
-		__u16 eprom_size;
-		__u16 sp_offset;
-
-		outb(VOYAGER_CAT_DESELECT, VOYAGER_CAT_CONFIG_PORT);
-		outb(i, VOYAGER_CAT_CONFIG_PORT);
-
-		/* check the presence of the module */
-		outb(VOYAGER_CAT_RUN, CAT_CMD);
-		outb(VOYAGER_CAT_IRCYC, CAT_CMD);
-		outb(VOYAGER_CAT_HEADER, CAT_DATA);
-		/* stream series of alternating 1's and 0's to stimulate
-		 * response */
-		outb(0xAA, CAT_DATA);
-		input = inb(CAT_DATA);
-		outb(VOYAGER_CAT_END, CAT_CMD);
-		if (input != VOYAGER_CAT_HEADER) {
-			continue;
-		}
-		CDEBUG(("VOYAGER DEBUG: found module id 0x%x, %s\n", i,
-			cat_module_name(i)));
-		*modpp = kmalloc(sizeof(voyager_module_t), GFP_KERNEL);	/*&voyager_module_storage[cat_count++]; */
-		if (*modpp == NULL) {
-			printk("**WARNING** kmalloc failure in cat_init\n");
-			continue;
-		}
-		memset(*modpp, 0, sizeof(voyager_module_t));
-		/* need temporary asic for cat_subread.  It will be
-		 * filled in correctly later */
-		(*modpp)->asic = kmalloc(sizeof(voyager_asic_t), GFP_KERNEL);	/*&voyager_asic_storage[asic_count]; */
-		if ((*modpp)->asic == NULL) {
-			printk("**WARNING** kmalloc failure in cat_init\n");
-			continue;
-		}
-		memset((*modpp)->asic, 0, sizeof(voyager_asic_t));
-		(*modpp)->asic->asic_id = VOYAGER_CAT_ID;
-		(*modpp)->asic->subaddr = VOYAGER_SUBADDR_HI;
-		(*modpp)->module_addr = i;
-		(*modpp)->scan_path_connected = 0;
-		if (i == VOYAGER_PSI) {
-			/* Exception leg for modules with no EEPROM */
-			printk("Module \"%s\"\n", cat_module_name(i));
-			continue;
-		}
-
-		CDEBUG(("cat_init: Reading eeprom for module 0x%x at offset %d\n", i, VOYAGER_XSUM_END_OFFSET));
-		outb(VOYAGER_CAT_RUN, CAT_CMD);
-		cat_disconnect(*modpp, (*modpp)->asic);
-		if (cat_subread(*modpp, (*modpp)->asic,
-				VOYAGER_XSUM_END_OFFSET, sizeof(eprom_size),
-				&eprom_size)) {
-			printk
-			    ("**WARNING**: Voyager couldn't read EPROM size for module 0x%x\n",
-			     i);
-			outb(VOYAGER_CAT_END, CAT_CMD);
-			continue;
-		}
-		if (eprom_size > sizeof(eprom_buf)) {
-			printk
-			    ("**WARNING**: Voyager insufficient size to read EPROM data, module 0x%x.  Need %d\n",
-			     i, eprom_size);
-			outb(VOYAGER_CAT_END, CAT_CMD);
-			continue;
-		}
-		outb(VOYAGER_CAT_END, CAT_CMD);
-		outb(VOYAGER_CAT_RUN, CAT_CMD);
-		CDEBUG(("cat_init: module 0x%x, eeprom_size %d\n", i,
-			eprom_size));
-		if (cat_subread
-		    (*modpp, (*modpp)->asic, 0, eprom_size, eprom_buf)) {
-			outb(VOYAGER_CAT_END, CAT_CMD);
-			continue;
-		}
-		outb(VOYAGER_CAT_END, CAT_CMD);
-		printk("Module \"%s\", version 0x%x, tracer 0x%x, asics %d\n",
-		       cat_module_name(i), eprom_hdr->version_id,
-		       *((__u32 *) eprom_hdr->tracer), eprom_hdr->num_asics);
-		(*modpp)->ee_size = eprom_hdr->ee_size;
-		(*modpp)->num_asics = eprom_hdr->num_asics;
-		asicpp = &((*modpp)->asic);
-		sp_offset = eprom_hdr->scan_path_offset;
-		/* All we really care about are the Quad cards.  We
-		 * identify them because they are in a processor slot
-		 * and have only four asics */
-		if ((i < 0x10 || (i >= 0x14 && i < 0x1c) || i > 0x1f)) {
-			modpp = &((*modpp)->next);
-			continue;
-		}
-		/* Now we know it's in a processor slot, does it have
-		 * a quad baseboard submodule */
-		outb(VOYAGER_CAT_RUN, CAT_CMD);
-		cat_read(*modpp, (*modpp)->asic, VOYAGER_SUBMODPRESENT,
-			 &num_submodules);
-		/* lowest two bits, active low */
-		num_submodules = ~(0xfc | num_submodules);
-		CDEBUG(("VOYAGER CAT: %d submodules present\n",
-			num_submodules));
-		if (num_submodules == 0) {
-			/* fill in the dyadic extended processors */
-			__u8 cpu = i & 0x07;
-
-			printk("Module \"%s\": Dyadic Processor Card\n",
-			       cat_module_name(i));
-			voyager_extended_vic_processors |= (1 << cpu);
-			cpu += 4;
-			voyager_extended_vic_processors |= (1 << cpu);
-			outb(VOYAGER_CAT_END, CAT_CMD);
-			continue;
-		}
-
-		/* now we want to read the asics on the first submodule,
-		 * which should be the quad base board */
-
-		cat_read(*modpp, (*modpp)->asic, VOYAGER_SUBMODSELECT, &val);
-		CDEBUG(("cat_init: SUBMODSELECT value = 0x%x\n", val));
-		val = (val & 0x7c) | VOYAGER_QUAD_BASEBOARD;
-		cat_write(*modpp, (*modpp)->asic, VOYAGER_SUBMODSELECT, val);
-
-		outb(VOYAGER_CAT_END, CAT_CMD);
-
-		CDEBUG(("cat_init: Reading eeprom for module 0x%x at offset %d\n", i, VOYAGER_XSUM_END_OFFSET));
-		outb(VOYAGER_CAT_RUN, CAT_CMD);
-		cat_disconnect(*modpp, (*modpp)->asic);
-		if (cat_subread(*modpp, (*modpp)->asic,
-				VOYAGER_XSUM_END_OFFSET, sizeof(eprom_size),
-				&eprom_size)) {
-			printk
-			    ("**WARNING**: Voyager couldn't read EPROM size for module 0x%x\n",
-			     i);
-			outb(VOYAGER_CAT_END, CAT_CMD);
-			continue;
-		}
-		if (eprom_size > sizeof(eprom_buf)) {
-			printk
-			    ("**WARNING**: Voyager insufficient size to read EPROM data, module 0x%x.  Need %d\n",
-			     i, eprom_size);
-			outb(VOYAGER_CAT_END, CAT_CMD);
-			continue;
-		}
-		outb(VOYAGER_CAT_END, CAT_CMD);
-		outb(VOYAGER_CAT_RUN, CAT_CMD);
-		CDEBUG(("cat_init: module 0x%x, eeprom_size %d\n", i,
-			eprom_size));
-		if (cat_subread
-		    (*modpp, (*modpp)->asic, 0, eprom_size, eprom_buf)) {
-			outb(VOYAGER_CAT_END, CAT_CMD);
-			continue;
-		}
-		outb(VOYAGER_CAT_END, CAT_CMD);
-		/* Now do everything for the QBB submodule 1 */
-		(*modpp)->ee_size = eprom_hdr->ee_size;
-		(*modpp)->num_asics = eprom_hdr->num_asics;
-		asicpp = &((*modpp)->asic);
-		sp_offset = eprom_hdr->scan_path_offset;
-		/* get rid of the dummy CAT asic and read the real one */
-		kfree((*modpp)->asic);
-		for (asic = 0; asic < (*modpp)->num_asics; asic++) {
-			int j;
-			voyager_asic_t *asicp = *asicpp = kzalloc(sizeof(voyager_asic_t), GFP_KERNEL);	/*&voyager_asic_storage[asic_count++]; */
-			voyager_sp_table_t *sp_table;
-			voyager_at_t *asic_table;
-			voyager_jtt_t *jtag_table;
-
-			if (asicp == NULL) {
-				printk
-				    ("**WARNING** kmalloc failure in cat_init\n");
-				continue;
-			}
-			asicpp = &(asicp->next);
-			asicp->asic_location = asic;
-			sp_table =
-			    (voyager_sp_table_t *) (eprom_buf + sp_offset);
-			asicp->asic_id = sp_table->asic_id;
-			asic_table =
-			    (voyager_at_t *) (eprom_buf +
-					      sp_table->asic_data_offset);
-			for (j = 0; j < 4; j++)
-				asicp->jtag_id[j] = asic_table->jtag_id[j];
-			jtag_table =
-			    (voyager_jtt_t *) (eprom_buf +
-					       asic_table->jtag_offset);
-			asicp->ireg_length = jtag_table->ireg_len;
-			asicp->bit_location = (*modpp)->inst_bits;
-			(*modpp)->inst_bits += asicp->ireg_length;
-			if (asicp->ireg_length > (*modpp)->largest_reg)
-				(*modpp)->largest_reg = asicp->ireg_length;
-			if (asicp->ireg_length < (*modpp)->smallest_reg ||
-			    (*modpp)->smallest_reg == 0)
-				(*modpp)->smallest_reg = asicp->ireg_length;
-			CDEBUG(("asic 0x%x, ireg_length=%d, bit_location=%d\n",
-				asicp->asic_id, asicp->ireg_length,
-				asicp->bit_location));
-			if (asicp->asic_id == VOYAGER_QUAD_QABC) {
-				CDEBUG(("VOYAGER CAT: QABC ASIC found\n"));
-				qabc_asic = asicp;
-			}
-			sp_offset += sizeof(voyager_sp_table_t);
-		}
-		CDEBUG(("Module inst_bits = %d, largest_reg = %d, smallest_reg=%d\n", (*modpp)->inst_bits, (*modpp)->largest_reg, (*modpp)->smallest_reg));
-		/* OK, now we have the QUAD ASICs set up, use them.
-		 * we need to:
-		 *
-		 * 1. Find the Memory area for the Quad CPIs.
-		 * 2. Find the Extended VIC processor
-		 * 3. Configure a second extended VIC processor (This
-		 *    cannot be done for the 51xx.
-		 * */
-		outb(VOYAGER_CAT_RUN, CAT_CMD);
-		cat_connect(*modpp, (*modpp)->asic);
-		CDEBUG(("CAT CONNECTED!!\n"));
-		cat_subread(*modpp, qabc_asic, 0, sizeof(qabc_data), qabc_data);
-		qic_addr = qabc_data[5] << 8;
-		qic_addr = (qic_addr | qabc_data[6]) << 8;
-		qic_addr = (qic_addr | qabc_data[7]) << 8;
-		printk
-		    ("Module \"%s\": Quad Processor Card; CPI 0x%lx, SET=0x%x\n",
-		     cat_module_name(i), qic_addr, qabc_data[8]);
-#if 0				/* plumbing fails---FIXME */
-		if ((qabc_data[8] & 0xf0) == 0) {
-			/* FIXME: 32 way 8 CPU slot monster cannot be
-			 * plumbed this way---need to check for it */
-
-			printk("Plumbing second Extended Quad Processor\n");
-			/* second VIC line hardwired to Quad CPU 1 */
-			qabc_data[8] |= 0x20;
-			cat_subwrite(*modpp, qabc_asic, 8, 1, &qabc_data[8]);
-#ifdef VOYAGER_CAT_DEBUG
-			/* verify plumbing */
-			cat_subread(*modpp, qabc_asic, 8, 1, &qabc_data[8]);
-			if ((qabc_data[8] & 0xf0) == 0) {
-				CDEBUG(("PLUMBING FAILED: 0x%x\n",
-					qabc_data[8]));
-			}
-#endif
-		}
-#endif
-
-		{
-			struct resource *res =
-			    kzalloc(sizeof(struct resource), GFP_KERNEL);
-			res->name = kmalloc(128, GFP_KERNEL);
-			sprintf((char *)res->name, "Voyager %s Quad CPI",
-				cat_module_name(i));
-			res->start = qic_addr;
-			res->end = qic_addr + 0x3ff;
-			request_resource(&iomem_resource, res);
-		}
-
-		qic_addr = (unsigned long)ioremap_cache(qic_addr, 0x400);
-
-		for (j = 0; j < 4; j++) {
-			__u8 cpu;
-
-			if (voyager_8slot) {
-				/* 8 slot has a different mapping,
-				 * each slot has only one vic line, so
-				 * 1 cpu in each slot must be < 8 */
-				cpu = (i & 0x07) + j * 8;
-			} else {
-				cpu = (i & 0x03) + j * 4;
-			}
-			if ((qabc_data[8] & (1 << j))) {
-				voyager_extended_vic_processors |= (1 << cpu);
-			}
-			if (qabc_data[8] & (1 << (j + 4))) {
-				/* Second SET register plumbed: Quad
-				 * card has two VIC connected CPUs.
-				 * Secondary cannot be booted as a VIC
-				 * CPU */
-				voyager_extended_vic_processors |= (1 << cpu);
-				voyager_allowed_boot_processors &=
-				    (~(1 << cpu));
-			}
-
-			voyager_quad_processors |= (1 << cpu);
-			voyager_quad_cpi_addr[cpu] = (struct voyager_qic_cpi *)
-			    (qic_addr + (j << 8));
-			CDEBUG(("CPU%d: CPI address 0x%lx\n", cpu,
-				(unsigned long)voyager_quad_cpi_addr[cpu]));
-		}
-		outb(VOYAGER_CAT_END, CAT_CMD);
-
-		*asicpp = NULL;
-		modpp = &((*modpp)->next);
-	}
-	*modpp = NULL;
-	printk
-	    ("CAT Bus Initialisation finished: extended procs 0x%x, quad procs 0x%x, allowed vic boot = 0x%x\n",
-	     voyager_extended_vic_processors, voyager_quad_processors,
-	     voyager_allowed_boot_processors);
-	request_resource(&ioport_resource, &vic_res);
-	if (voyager_quad_processors)
-		request_resource(&ioport_resource, &qic_res);
-	/* set up the front power switch */
-}
-
-int voyager_cat_readb(__u8 module, __u8 asic, int reg)
-{
-	return 0;
-}
-
-static int cat_disconnect(voyager_module_t * modp, voyager_asic_t * asicp)
-{
-	__u8 val;
-	int err = 0;
-
-	if (!modp->scan_path_connected)
-		return 0;
-	if (asicp->asic_id != VOYAGER_CAT_ID) {
-		CDEBUG(("cat_disconnect: ASIC is not CAT\n"));
-		return 1;
-	}
-	err = cat_read(modp, asicp, VOYAGER_SCANPATH, &val);
-	if (err) {
-		CDEBUG(("cat_disconnect: failed to read SCANPATH\n"));
-		return err;
-	}
-	val &= VOYAGER_DISCONNECT_ASIC;
-	err = cat_write(modp, asicp, VOYAGER_SCANPATH, val);
-	if (err) {
-		CDEBUG(("cat_disconnect: failed to write SCANPATH\n"));
-		return err;
-	}
-	outb(VOYAGER_CAT_END, CAT_CMD);
-	outb(VOYAGER_CAT_RUN, CAT_CMD);
-	modp->scan_path_connected = 0;
-
-	return 0;
-}
-
-static int cat_connect(voyager_module_t * modp, voyager_asic_t * asicp)
-{
-	__u8 val;
-	int err = 0;
-
-	if (modp->scan_path_connected)
-		return 0;
-	if (asicp->asic_id != VOYAGER_CAT_ID) {
-		CDEBUG(("cat_connect: ASIC is not CAT\n"));
-		return 1;
-	}
-
-	err = cat_read(modp, asicp, VOYAGER_SCANPATH, &val);
-	if (err) {
-		CDEBUG(("cat_connect: failed to read SCANPATH\n"));
-		return err;
-	}
-	val |= VOYAGER_CONNECT_ASIC;
-	err = cat_write(modp, asicp, VOYAGER_SCANPATH, val);
-	if (err) {
-		CDEBUG(("cat_connect: failed to write SCANPATH\n"));
-		return err;
-	}
-	outb(VOYAGER_CAT_END, CAT_CMD);
-	outb(VOYAGER_CAT_RUN, CAT_CMD);
-	modp->scan_path_connected = 1;
-
-	return 0;
-}
-
-void voyager_cat_power_off(void)
-{
-	/* Power the machine off by writing to the PSI over the CAT
-	 * bus */
-	__u8 data;
-	voyager_module_t psi = { 0 };
-	voyager_asic_t psi_asic = { 0 };
-
-	psi.asic = &psi_asic;
-	psi.asic->asic_id = VOYAGER_CAT_ID;
-	psi.asic->subaddr = VOYAGER_SUBADDR_HI;
-	psi.module_addr = VOYAGER_PSI;
-	psi.scan_path_connected = 0;
-
-	outb(VOYAGER_CAT_END, CAT_CMD);
-	/* Connect the PSI to the CAT Bus */
-	outb(VOYAGER_CAT_DESELECT, VOYAGER_CAT_CONFIG_PORT);
-	outb(VOYAGER_PSI, VOYAGER_CAT_CONFIG_PORT);
-	outb(VOYAGER_CAT_RUN, CAT_CMD);
-	cat_disconnect(&psi, &psi_asic);
-	/* Read the status */
-	cat_subread(&psi, &psi_asic, VOYAGER_PSI_GENERAL_REG, 1, &data);
-	outb(VOYAGER_CAT_END, CAT_CMD);
-	CDEBUG(("PSI STATUS 0x%x\n", data));
-	/* These two writes are power off prep and perform */
-	data = PSI_CLEAR;
-	outb(VOYAGER_CAT_RUN, CAT_CMD);
-	cat_subwrite(&psi, &psi_asic, VOYAGER_PSI_GENERAL_REG, 1, &data);
-	outb(VOYAGER_CAT_END, CAT_CMD);
-	data = PSI_POWER_DOWN;
-	outb(VOYAGER_CAT_RUN, CAT_CMD);
-	cat_subwrite(&psi, &psi_asic, VOYAGER_PSI_GENERAL_REG, 1, &data);
-	outb(VOYAGER_CAT_END, CAT_CMD);
-}
-
-struct voyager_status voyager_status = { 0 };
-
-void voyager_cat_psi(__u8 cmd, __u16 reg, __u8 * data)
-{
-	voyager_module_t psi = { 0 };
-	voyager_asic_t psi_asic = { 0 };
-
-	psi.asic = &psi_asic;
-	psi.asic->asic_id = VOYAGER_CAT_ID;
-	psi.asic->subaddr = VOYAGER_SUBADDR_HI;
-	psi.module_addr = VOYAGER_PSI;
-	psi.scan_path_connected = 0;
-
-	outb(VOYAGER_CAT_END, CAT_CMD);
-	/* Connect the PSI to the CAT Bus */
-	outb(VOYAGER_CAT_DESELECT, VOYAGER_CAT_CONFIG_PORT);
-	outb(VOYAGER_PSI, VOYAGER_CAT_CONFIG_PORT);
-	outb(VOYAGER_CAT_RUN, CAT_CMD);
-	cat_disconnect(&psi, &psi_asic);
-	switch (cmd) {
-	case VOYAGER_PSI_READ:
-		cat_read(&psi, &psi_asic, reg, data);
-		break;
-	case VOYAGER_PSI_WRITE:
-		cat_write(&psi, &psi_asic, reg, *data);
-		break;
-	case VOYAGER_PSI_SUBREAD:
-		cat_subread(&psi, &psi_asic, reg, 1, data);
-		break;
-	case VOYAGER_PSI_SUBWRITE:
-		cat_subwrite(&psi, &psi_asic, reg, 1, data);
-		break;
-	default:
-		printk(KERN_ERR "Voyager PSI, unrecognised command %d\n", cmd);
-		break;
-	}
-	outb(VOYAGER_CAT_END, CAT_CMD);
-}
-
-void voyager_cat_do_common_interrupt(void)
-{
-	/* This is caused either by a memory parity error or something
-	 * in the PSI */
-	__u8 data;
-	voyager_module_t psi = { 0 };
-	voyager_asic_t psi_asic = { 0 };
-	struct voyager_psi psi_reg;
-	int i;
-      re_read:
-	psi.asic = &psi_asic;
-	psi.asic->asic_id = VOYAGER_CAT_ID;
-	psi.asic->subaddr = VOYAGER_SUBADDR_HI;
-	psi.module_addr = VOYAGER_PSI;
-	psi.scan_path_connected = 0;
-
-	outb(VOYAGER_CAT_END, CAT_CMD);
-	/* Connect the PSI to the CAT Bus */
-	outb(VOYAGER_CAT_DESELECT, VOYAGER_CAT_CONFIG_PORT);
-	outb(VOYAGER_PSI, VOYAGER_CAT_CONFIG_PORT);
-	outb(VOYAGER_CAT_RUN, CAT_CMD);
-	cat_disconnect(&psi, &psi_asic);
-	/* Read the status.  NOTE: Need to read *all* the PSI regs here
-	 * otherwise the cmn int will be reasserted */
-	for (i = 0; i < sizeof(psi_reg.regs); i++) {
-		cat_read(&psi, &psi_asic, i, &((__u8 *) & psi_reg.regs)[i]);
-	}
-	outb(VOYAGER_CAT_END, CAT_CMD);
-	if ((psi_reg.regs.checkbit & 0x02) == 0) {
-		psi_reg.regs.checkbit |= 0x02;
-		cat_write(&psi, &psi_asic, 5, psi_reg.regs.checkbit);
-		printk("VOYAGER RE-READ PSI\n");
-		goto re_read;
-	}
-	outb(VOYAGER_CAT_RUN, CAT_CMD);
-	for (i = 0; i < sizeof(psi_reg.subregs); i++) {
-		/* This looks strange, but the PSI doesn't do auto increment
-		 * correctly */
-		cat_subread(&psi, &psi_asic, VOYAGER_PSI_SUPPLY_REG + i,
-			    1, &((__u8 *) & psi_reg.subregs)[i]);
-	}
-	outb(VOYAGER_CAT_END, CAT_CMD);
-#ifdef VOYAGER_CAT_DEBUG
-	printk("VOYAGER PSI: ");
-	for (i = 0; i < sizeof(psi_reg.regs); i++)
-		printk("%02x ", ((__u8 *) & psi_reg.regs)[i]);
-	printk("\n           ");
-	for (i = 0; i < sizeof(psi_reg.subregs); i++)
-		printk("%02x ", ((__u8 *) & psi_reg.subregs)[i]);
-	printk("\n");
-#endif
-	if (psi_reg.regs.intstatus & PSI_MON) {
-		/* switch off or power fail */
-
-		if (psi_reg.subregs.supply & PSI_SWITCH_OFF) {
-			if (voyager_status.switch_off) {
-				printk(KERN_ERR
-				       "Voyager front panel switch turned off again---Immediate power off!\n");
-				voyager_cat_power_off();
-				/* not reached */
-			} else {
-				printk(KERN_ERR
-				       "Voyager front panel switch turned off\n");
-				voyager_status.switch_off = 1;
-				voyager_status.request_from_kernel = 1;
-				wake_up_process(voyager_thread);
-			}
-			/* Tell the hardware we're taking care of the
-			 * shutdown, otherwise it will power the box off
-			 * within 3 seconds of the switch being pressed and,
-			 * which is much more important to us, continue to 
-			 * assert the common interrupt */
-			data = PSI_CLR_SWITCH_OFF;
-			outb(VOYAGER_CAT_RUN, CAT_CMD);
-			cat_subwrite(&psi, &psi_asic, VOYAGER_PSI_SUPPLY_REG,
-				     1, &data);
-			outb(VOYAGER_CAT_END, CAT_CMD);
-		} else {
-
-			VDEBUG(("Voyager ac fail reg 0x%x\n",
-				psi_reg.subregs.ACfail));
-			if ((psi_reg.subregs.ACfail & AC_FAIL_STAT_CHANGE) == 0) {
-				/* No further update */
-				return;
-			}
-#if 0
-			/* Don't bother trying to find out who failed.
-			 * FIXME: This probably makes the code incorrect on
-			 * anything other than a 345x */
-			for (i = 0; i < 5; i++) {
-				if (psi_reg.subregs.ACfail & (1 << i)) {
-					break;
-				}
-			}
-			printk(KERN_NOTICE "AC FAIL IN SUPPLY %d\n", i);
-#endif
-			/* DON'T do this: it shuts down the AC PSI 
-			   outb(VOYAGER_CAT_RUN, CAT_CMD);
-			   data = PSI_MASK_MASK | i;
-			   cat_subwrite(&psi, &psi_asic, VOYAGER_PSI_MASK,
-			   1, &data);
-			   outb(VOYAGER_CAT_END, CAT_CMD);
-			 */
-			printk(KERN_ERR "Voyager AC power failure\n");
-			outb(VOYAGER_CAT_RUN, CAT_CMD);
-			data = PSI_COLD_START;
-			cat_subwrite(&psi, &psi_asic, VOYAGER_PSI_GENERAL_REG,
-				     1, &data);
-			outb(VOYAGER_CAT_END, CAT_CMD);
-			voyager_status.power_fail = 1;
-			voyager_status.request_from_kernel = 1;
-			wake_up_process(voyager_thread);
-		}
-
-	} else if (psi_reg.regs.intstatus & PSI_FAULT) {
-		/* Major fault! */
-		printk(KERN_ERR
-		       "Voyager PSI Detected major fault, immediate power off!\n");
-		voyager_cat_power_off();
-		/* not reached */
-	} else if (psi_reg.regs.intstatus & (PSI_DC_FAIL | PSI_ALARM
-					     | PSI_CURRENT | PSI_DVM
-					     | PSI_PSCFAULT | PSI_STAT_CHG)) {
-		/* other psi fault */
-
-		printk(KERN_WARNING "Voyager PSI status 0x%x\n", data);
-		/* clear the PSI fault */
-		outb(VOYAGER_CAT_RUN, CAT_CMD);
-		cat_write(&psi, &psi_asic, VOYAGER_PSI_STATUS_REG, 0);
-		outb(VOYAGER_CAT_END, CAT_CMD);
-	}
-}
diff --git a/arch/x86/mach-voyager/voyager_smp.c b/arch/x86/mach-voyager/voyager_smp.c
deleted file mode 100644
index b9cc84a2a4fc..000000000000
--- a/arch/x86/mach-voyager/voyager_smp.c
+++ /dev/null
@@ -1,1807 +0,0 @@
-/* -*- mode: c; c-basic-offset: 8 -*- */
-
-/* Copyright (C) 1999,2001
- *
- * Author: J.E.J.Bottomley@HansenPartnership.com
- *
- * This file provides all the same external entries as smp.c but uses
- * the voyager hal to provide the functionality
- */
-#include <linux/cpu.h>
-#include <linux/module.h>
-#include <linux/mm.h>
-#include <linux/kernel_stat.h>
-#include <linux/delay.h>
-#include <linux/mc146818rtc.h>
-#include <linux/cache.h>
-#include <linux/interrupt.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/bootmem.h>
-#include <linux/completion.h>
-#include <asm/desc.h>
-#include <asm/voyager.h>
-#include <asm/vic.h>
-#include <asm/mtrr.h>
-#include <asm/pgalloc.h>
-#include <asm/tlbflush.h>
-#include <asm/arch_hooks.h>
-#include <asm/trampoline.h>
-
-/* TLB state -- visible externally, indexed physically */
-DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = { &init_mm, 0 };
-
-/* CPU IRQ affinity -- set to all ones initially */
-static unsigned long cpu_irq_affinity[NR_CPUS] __cacheline_aligned =
-	{[0 ... NR_CPUS-1]  = ~0UL };
-
-/* per CPU data structure (for /proc/cpuinfo et al), visible externally
- * indexed physically */
-DEFINE_PER_CPU_SHARED_ALIGNED(struct cpuinfo_x86, cpu_info);
-EXPORT_PER_CPU_SYMBOL(cpu_info);
-
-/* physical ID of the CPU used to boot the system */
-unsigned char boot_cpu_id;
-
-/* The memory line addresses for the Quad CPIs */
-struct voyager_qic_cpi *voyager_quad_cpi_addr[NR_CPUS] __cacheline_aligned;
-
-/* The masks for the Extended VIC processors, filled in by cat_init */
-__u32 voyager_extended_vic_processors = 0;
-
-/* Masks for the extended Quad processors which cannot be VIC booted */
-__u32 voyager_allowed_boot_processors = 0;
-
-/* The mask for the Quad Processors (both extended and non-extended) */
-__u32 voyager_quad_processors = 0;
-
-/* Total count of live CPUs, used in process.c to display
- * the CPU information and in irq.c for the per CPU irq
- * activity count.  Finally exported by i386_ksyms.c */
-static int voyager_extended_cpus = 1;
-
-/* Used for the invalidate map that's also checked in the spinlock */
-static volatile unsigned long smp_invalidate_needed;
-
-/* Bitmask of CPUs present in the system - exported by i386_syms.c, used
- * by scheduler but indexed physically */
-static cpumask_t voyager_phys_cpu_present_map = CPU_MASK_NONE;
-
-/* The internal functions */
-static void send_CPI(__u32 cpuset, __u8 cpi);
-static void ack_CPI(__u8 cpi);
-static int ack_QIC_CPI(__u8 cpi);
-static void ack_special_QIC_CPI(__u8 cpi);
-static void ack_VIC_CPI(__u8 cpi);
-static void send_CPI_allbutself(__u8 cpi);
-static void mask_vic_irq(unsigned int irq);
-static void unmask_vic_irq(unsigned int irq);
-static unsigned int startup_vic_irq(unsigned int irq);
-static void enable_local_vic_irq(unsigned int irq);
-static void disable_local_vic_irq(unsigned int irq);
-static void before_handle_vic_irq(unsigned int irq);
-static void after_handle_vic_irq(unsigned int irq);
-static void set_vic_irq_affinity(unsigned int irq, const struct cpumask *mask);
-static void ack_vic_irq(unsigned int irq);
-static void vic_enable_cpi(void);
-static void do_boot_cpu(__u8 cpuid);
-static void do_quad_bootstrap(void);
-static void initialize_secondary(void);
-
-int hard_smp_processor_id(void);
-int safe_smp_processor_id(void);
-
-/* Inline functions */
-static inline void send_one_QIC_CPI(__u8 cpu, __u8 cpi)
-{
-	voyager_quad_cpi_addr[cpu]->qic_cpi[cpi].cpi =
-	    (smp_processor_id() << 16) + cpi;
-}
-
-static inline void send_QIC_CPI(__u32 cpuset, __u8 cpi)
-{
-	int cpu;
-
-	for_each_online_cpu(cpu) {
-		if (cpuset & (1 << cpu)) {
-#ifdef VOYAGER_DEBUG
-			if (!cpu_online(cpu))
-				VDEBUG(("CPU%d sending cpi %d to CPU%d not in "
-					"cpu_online_map\n",
-					hard_smp_processor_id(), cpi, cpu));
-#endif
-			send_one_QIC_CPI(cpu, cpi - QIC_CPI_OFFSET);
-		}
-	}
-}
-
-static inline void wrapper_smp_local_timer_interrupt(void)
-{
-	irq_enter();
-	smp_local_timer_interrupt();
-	irq_exit();
-}
-
-static inline void send_one_CPI(__u8 cpu, __u8 cpi)
-{
-	if (voyager_quad_processors & (1 << cpu))
-		send_one_QIC_CPI(cpu, cpi - QIC_CPI_OFFSET);
-	else
-		send_CPI(1 << cpu, cpi);
-}
-
-static inline void send_CPI_allbutself(__u8 cpi)
-{
-	__u8 cpu = smp_processor_id();
-	__u32 mask = cpus_addr(cpu_online_map)[0] & ~(1 << cpu);
-	send_CPI(mask, cpi);
-}
-
-static inline int is_cpu_quad(void)
-{
-	__u8 cpumask = inb(VIC_PROC_WHO_AM_I);
-	return ((cpumask & QUAD_IDENTIFIER) == QUAD_IDENTIFIER);
-}
-
-static inline int is_cpu_extended(void)
-{
-	__u8 cpu = hard_smp_processor_id();
-
-	return (voyager_extended_vic_processors & (1 << cpu));
-}
-
-static inline int is_cpu_vic_boot(void)
-{
-	__u8 cpu = hard_smp_processor_id();
-
-	return (voyager_extended_vic_processors
-		& voyager_allowed_boot_processors & (1 << cpu));
-}
-
-static inline void ack_CPI(__u8 cpi)
-{
-	switch (cpi) {
-	case VIC_CPU_BOOT_CPI:
-		if (is_cpu_quad() && !is_cpu_vic_boot())
-			ack_QIC_CPI(cpi);
-		else
-			ack_VIC_CPI(cpi);
-		break;
-	case VIC_SYS_INT:
-	case VIC_CMN_INT:
-		/* These are slightly strange.  Even on the Quad card,
-		 * They are vectored as VIC CPIs */
-		if (is_cpu_quad())
-			ack_special_QIC_CPI(cpi);
-		else
-			ack_VIC_CPI(cpi);
-		break;
-	default:
-		printk("VOYAGER ERROR: CPI%d is in common CPI code\n", cpi);
-		break;
-	}
-}
-
-/* local variables */
-
-/* The VIC IRQ descriptors -- these look almost identical to the
- * 8259 IRQs except that masks and things must be kept per processor
- */
-static struct irq_chip vic_chip = {
-	.name = "VIC",
-	.startup = startup_vic_irq,
-	.mask = mask_vic_irq,
-	.unmask = unmask_vic_irq,
-	.set_affinity = set_vic_irq_affinity,
-};
-
-/* used to count up as CPUs are brought on line (starts at 0) */
-static int cpucount = 0;
-
-/* The per cpu profile stuff - used in smp_local_timer_interrupt */
-static DEFINE_PER_CPU(int, prof_multiplier) = 1;
-static DEFINE_PER_CPU(int, prof_old_multiplier) = 1;
-static DEFINE_PER_CPU(int, prof_counter) = 1;
-
-/* the map used to check if a CPU has booted */
-static __u32 cpu_booted_map;
-
-/* the synchronize flag used to hold all secondary CPUs spinning in
- * a tight loop until the boot sequence is ready for them */
-static cpumask_t smp_commenced_mask = CPU_MASK_NONE;
-
-/* This is for the new dynamic CPU boot code */
-
-/* The per processor IRQ masks (these are usually kept in sync) */
-static __u16 vic_irq_mask[NR_CPUS] __cacheline_aligned;
-
-/* the list of IRQs to be enabled by the VIC_ENABLE_IRQ_CPI */
-static __u16 vic_irq_enable_mask[NR_CPUS] __cacheline_aligned = { 0 };
-
-/* Lock for enable/disable of VIC interrupts */
-static __cacheline_aligned DEFINE_SPINLOCK(vic_irq_lock);
-
-/* The boot processor is correctly set up in PC mode when it
- * comes up, but the secondaries need their master/slave 8259
- * pairs initializing correctly */
-
-/* Interrupt counters (per cpu) and total - used to try to
- * even up the interrupt handling routines */
-static long vic_intr_total = 0;
-static long vic_intr_count[NR_CPUS] __cacheline_aligned = { 0 };
-static unsigned long vic_tick[NR_CPUS] __cacheline_aligned = { 0 };
-
-/* Since we can only use CPI0, we fake all the other CPIs */
-static unsigned long vic_cpi_mailbox[NR_CPUS] __cacheline_aligned;
-
-/* debugging routine to read the isr of the cpu's pic */
-static inline __u16 vic_read_isr(void)
-{
-	__u16 isr;
-
-	outb(0x0b, 0xa0);
-	isr = inb(0xa0) << 8;
-	outb(0x0b, 0x20);
-	isr |= inb(0x20);
-
-	return isr;
-}
-
-static __init void qic_setup(void)
-{
-	if (!is_cpu_quad()) {
-		/* not a quad, no setup */
-		return;
-	}
-	outb(QIC_DEFAULT_MASK0, QIC_MASK_REGISTER0);
-	outb(QIC_CPI_ENABLE, QIC_MASK_REGISTER1);
-
-	if (is_cpu_extended()) {
-		/* the QIC duplicate of the VIC base register */
-		outb(VIC_DEFAULT_CPI_BASE, QIC_VIC_CPI_BASE_REGISTER);
-		outb(QIC_DEFAULT_CPI_BASE, QIC_CPI_BASE_REGISTER);
-
-		/* FIXME: should set up the QIC timer and memory parity
-		 * error vectors here */
-	}
-}
-
-static __init void vic_setup_pic(void)
-{
-	outb(1, VIC_REDIRECT_REGISTER_1);
-	/* clear the claim registers for dynamic routing */
-	outb(0, VIC_CLAIM_REGISTER_0);
-	outb(0, VIC_CLAIM_REGISTER_1);
-
-	outb(0, VIC_PRIORITY_REGISTER);
-	/* Set the Primary and Secondary Microchannel vector
-	 * bases to be the same as the ordinary interrupts
-	 *
-	 * FIXME: This would be more efficient using separate
-	 * vectors. */
-	outb(FIRST_EXTERNAL_VECTOR, VIC_PRIMARY_MC_BASE);
-	outb(FIRST_EXTERNAL_VECTOR, VIC_SECONDARY_MC_BASE);
-	/* Now initiallise the master PIC belonging to this CPU by
-	 * sending the four ICWs */
-
-	/* ICW1: level triggered, ICW4 needed */
-	outb(0x19, 0x20);
-
-	/* ICW2: vector base */
-	outb(FIRST_EXTERNAL_VECTOR, 0x21);
-
-	/* ICW3: slave at line 2 */
-	outb(0x04, 0x21);
-
-	/* ICW4: 8086 mode */
-	outb(0x01, 0x21);
-
-	/* now the same for the slave PIC */
-
-	/* ICW1: level trigger, ICW4 needed */
-	outb(0x19, 0xA0);
-
-	/* ICW2: slave vector base */
-	outb(FIRST_EXTERNAL_VECTOR + 8, 0xA1);
-
-	/* ICW3: slave ID */
-	outb(0x02, 0xA1);
-
-	/* ICW4: 8086 mode */
-	outb(0x01, 0xA1);
-}
-
-static void do_quad_bootstrap(void)
-{
-	if (is_cpu_quad() && is_cpu_vic_boot()) {
-		int i;
-		unsigned long flags;
-		__u8 cpuid = hard_smp_processor_id();
-
-		local_irq_save(flags);
-
-		for (i = 0; i < 4; i++) {
-			/* FIXME: this would be >>3 &0x7 on the 32 way */
-			if (((cpuid >> 2) & 0x03) == i)
-				/* don't lower our own mask! */
-				continue;
-
-			/* masquerade as local Quad CPU */
-			outb(QIC_CPUID_ENABLE | i, QIC_PROCESSOR_ID);
-			/* enable the startup CPI */
-			outb(QIC_BOOT_CPI_MASK, QIC_MASK_REGISTER1);
-			/* restore cpu id */
-			outb(0, QIC_PROCESSOR_ID);
-		}
-		local_irq_restore(flags);
-	}
-}
-
-void prefill_possible_map(void)
-{
-	/* This is empty on voyager because we need a much
-	 * earlier detection which is done in find_smp_config */
-}
-
-/* Set up all the basic stuff: read the SMP config and make all the
- * SMP information reflect only the boot cpu.  All others will be
- * brought on-line later. */
-void __init find_smp_config(void)
-{
-	int i;
-
-	boot_cpu_id = hard_smp_processor_id();
-
-	printk("VOYAGER SMP: Boot cpu is %d\n", boot_cpu_id);
-
-	/* initialize the CPU structures (moved from smp_boot_cpus) */
-	for (i = 0; i < nr_cpu_ids; i++)
-		cpu_irq_affinity[i] = ~0;
-	cpu_online_map = cpumask_of_cpu(boot_cpu_id);
-
-	/* The boot CPU must be extended */
-	voyager_extended_vic_processors = 1 << boot_cpu_id;
-	/* initially, all of the first 8 CPUs can boot */
-	voyager_allowed_boot_processors = 0xff;
-	/* set up everything for just this CPU, we can alter
-	 * this as we start the other CPUs later */
-	/* now get the CPU disposition from the extended CMOS */
-	cpus_addr(voyager_phys_cpu_present_map)[0] =
-	    voyager_extended_cmos_read(VOYAGER_PROCESSOR_PRESENT_MASK);
-	cpus_addr(voyager_phys_cpu_present_map)[0] |=
-	    voyager_extended_cmos_read(VOYAGER_PROCESSOR_PRESENT_MASK + 1) << 8;
-	cpus_addr(voyager_phys_cpu_present_map)[0] |=
-	    voyager_extended_cmos_read(VOYAGER_PROCESSOR_PRESENT_MASK +
-				       2) << 16;
-	cpus_addr(voyager_phys_cpu_present_map)[0] |=
-	    voyager_extended_cmos_read(VOYAGER_PROCESSOR_PRESENT_MASK +
-				       3) << 24;
-	init_cpu_possible(&voyager_phys_cpu_present_map);
-	printk("VOYAGER SMP: voyager_phys_cpu_present_map = 0x%lx\n",
-	       cpus_addr(voyager_phys_cpu_present_map)[0]);
-	/* Here we set up the VIC to enable SMP */
-	/* enable the CPIs by writing the base vector to their register */
-	outb(VIC_DEFAULT_CPI_BASE, VIC_CPI_BASE_REGISTER);
-	outb(1, VIC_REDIRECT_REGISTER_1);
-	/* set the claim registers for static routing --- Boot CPU gets
-	 * all interrupts untill all other CPUs started */
-	outb(0xff, VIC_CLAIM_REGISTER_0);
-	outb(0xff, VIC_CLAIM_REGISTER_1);
-	/* Set the Primary and Secondary Microchannel vector
-	 * bases to be the same as the ordinary interrupts
-	 *
-	 * FIXME: This would be more efficient using separate
-	 * vectors. */
-	outb(FIRST_EXTERNAL_VECTOR, VIC_PRIMARY_MC_BASE);
-	outb(FIRST_EXTERNAL_VECTOR, VIC_SECONDARY_MC_BASE);
-
-	/* Finally tell the firmware that we're driving */
-	outb(inb(VOYAGER_SUS_IN_CONTROL_PORT) | VOYAGER_IN_CONTROL_FLAG,
-	     VOYAGER_SUS_IN_CONTROL_PORT);
-
-	current_thread_info()->cpu = boot_cpu_id;
-	x86_write_percpu(cpu_number, boot_cpu_id);
-}
-
-/*
- *	The bootstrap kernel entry code has set these up. Save them
- *	for a given CPU, id is physical */
-void __init smp_store_cpu_info(int id)
-{
-	struct cpuinfo_x86 *c = &cpu_data(id);
-
-	*c = boot_cpu_data;
-	c->cpu_index = id;
-
-	identify_secondary_cpu(c);
-}
-
-/* Routine initially called when a non-boot CPU is brought online */
-static void __init start_secondary(void *unused)
-{
-	__u8 cpuid = hard_smp_processor_id();
-
-	cpu_init();
-
-	/* OK, we're in the routine */
-	ack_CPI(VIC_CPU_BOOT_CPI);
-
-	/* setup the 8259 master slave pair belonging to this CPU ---
-	 * we won't actually receive any until the boot CPU
-	 * relinquishes it's static routing mask */
-	vic_setup_pic();
-
-	qic_setup();
-
-	if (is_cpu_quad() && !is_cpu_vic_boot()) {
-		/* clear the boot CPI */
-		__u8 dummy;
-
-		dummy =
-		    voyager_quad_cpi_addr[cpuid]->qic_cpi[VIC_CPU_BOOT_CPI].cpi;
-		printk("read dummy %d\n", dummy);
-	}
-
-	/* lower the mask to receive CPIs */
-	vic_enable_cpi();
-
-	VDEBUG(("VOYAGER SMP: CPU%d, stack at about %p\n", cpuid, &cpuid));
-
-	notify_cpu_starting(cpuid);
-
-	/* enable interrupts */
-	local_irq_enable();
-
-	/* get our bogomips */
-	calibrate_delay();
-
-	/* save our processor parameters */
-	smp_store_cpu_info(cpuid);
-
-	/* if we're a quad, we may need to bootstrap other CPUs */
-	do_quad_bootstrap();
-
-	/* FIXME: this is rather a poor hack to prevent the CPU
-	 * activating softirqs while it's supposed to be waiting for
-	 * permission to proceed.  Without this, the new per CPU stuff
-	 * in the softirqs will fail */
-	local_irq_disable();
-	cpu_set(cpuid, cpu_callin_map);
-
-	/* signal that we're done */
-	cpu_booted_map = 1;
-
-	while (!cpu_isset(cpuid, smp_commenced_mask))
-		rep_nop();
-	local_irq_enable();
-
-	local_flush_tlb();
-
-	cpu_set(cpuid, cpu_online_map);
-	wmb();
-	cpu_idle();
-}
-
-/* Routine to kick start the given CPU and wait for it to report ready
- * (or timeout in startup).  When this routine returns, the requested
- * CPU is either fully running and configured or known to be dead.
- *
- * We call this routine sequentially 1 CPU at a time, so no need for
- * locking */
-
-static void __init do_boot_cpu(__u8 cpu)
-{
-	struct task_struct *idle;
-	int timeout;
-	unsigned long flags;
-	int quad_boot = (1 << cpu) & voyager_quad_processors
-	    & ~(voyager_extended_vic_processors
-		& voyager_allowed_boot_processors);
-
-	/* This is the format of the CPI IDT gate (in real mode) which
-	 * we're hijacking to boot the CPU */
-	union IDTFormat {
-		struct seg {
-			__u16 Offset;
-			__u16 Segment;
-		} idt;
-		__u32 val;
-	} hijack_source;
-
-	__u32 *hijack_vector;
-	__u32 start_phys_address = setup_trampoline();
-
-	/* There's a clever trick to this: The linux trampoline is
-	 * compiled to begin at absolute location zero, so make the
-	 * address zero but have the data segment selector compensate
-	 * for the actual address */
-	hijack_source.idt.Offset = start_phys_address & 0x000F;
-	hijack_source.idt.Segment = (start_phys_address >> 4) & 0xFFFF;
-
-	cpucount++;
-	alternatives_smp_switch(1);
-
-	idle = fork_idle(cpu);
-	if (IS_ERR(idle))
-		panic("failed fork for CPU%d", cpu);
-	idle->thread.ip = (unsigned long)start_secondary;
-	/* init_tasks (in sched.c) is indexed logically */
-	stack_start.sp = (void *)idle->thread.sp;
-
-	init_gdt(cpu);
-	per_cpu(current_task, cpu) = idle;
-	early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
-	irq_ctx_init(cpu);
-
-	/* Note: Don't modify initial ss override */
-	VDEBUG(("VOYAGER SMP: Booting CPU%d at 0x%lx[%x:%x], stack %p\n", cpu,
-		(unsigned long)hijack_source.val, hijack_source.idt.Segment,
-		hijack_source.idt.Offset, stack_start.sp));
-
-	/* init lowmem identity mapping */
-	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + KERNEL_PGD_BOUNDARY,
-			min_t(unsigned long, KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-	flush_tlb_all();
-
-	if (quad_boot) {
-		printk("CPU %d: non extended Quad boot\n", cpu);
-		hijack_vector =
-		    (__u32 *)
-		    phys_to_virt((VIC_CPU_BOOT_CPI + QIC_DEFAULT_CPI_BASE) * 4);
-		*hijack_vector = hijack_source.val;
-	} else {
-		printk("CPU%d: extended VIC boot\n", cpu);
-		hijack_vector =
-		    (__u32 *)
-		    phys_to_virt((VIC_CPU_BOOT_CPI + VIC_DEFAULT_CPI_BASE) * 4);
-		*hijack_vector = hijack_source.val;
-		/* VIC errata, may also receive interrupt at this address */
-		hijack_vector =
-		    (__u32 *)
-		    phys_to_virt((VIC_CPU_BOOT_ERRATA_CPI +
-				  VIC_DEFAULT_CPI_BASE) * 4);
-		*hijack_vector = hijack_source.val;
-	}
-	/* All non-boot CPUs start with interrupts fully masked.  Need
-	 * to lower the mask of the CPI we're about to send.  We do
-	 * this in the VIC by masquerading as the processor we're
-	 * about to boot and lowering its interrupt mask */
-	local_irq_save(flags);
-	if (quad_boot) {
-		send_one_QIC_CPI(cpu, VIC_CPU_BOOT_CPI);
-	} else {
-		outb(VIC_CPU_MASQUERADE_ENABLE | cpu, VIC_PROCESSOR_ID);
-		/* here we're altering registers belonging to `cpu' */
-
-		outb(VIC_BOOT_INTERRUPT_MASK, 0x21);
-		/* now go back to our original identity */
-		outb(boot_cpu_id, VIC_PROCESSOR_ID);
-
-		/* and boot the CPU */
-
-		send_CPI((1 << cpu), VIC_CPU_BOOT_CPI);
-	}
-	cpu_booted_map = 0;
-	local_irq_restore(flags);
-
-	/* now wait for it to become ready (or timeout) */
-	for (timeout = 0; timeout < 50000; timeout++) {
-		if (cpu_booted_map)
-			break;
-		udelay(100);
-	}
-	/* reset the page table */
-	zap_low_mappings();
-
-	if (cpu_booted_map) {
-		VDEBUG(("CPU%d: Booted successfully, back in CPU %d\n",
-			cpu, smp_processor_id()));
-
-		printk("CPU%d: ", cpu);
-		print_cpu_info(&cpu_data(cpu));
-		wmb();
-		cpu_set(cpu, cpu_callout_map);
-		cpu_set(cpu, cpu_present_map);
-	} else {
-		printk("CPU%d FAILED TO BOOT: ", cpu);
-		if (*
-		    ((volatile unsigned char *)phys_to_virt(start_phys_address))
-		    == 0xA5)
-			printk("Stuck.\n");
-		else
-			printk("Not responding.\n");
-
-		cpucount--;
-	}
-}
-
-void __init smp_boot_cpus(void)
-{
-	int i;
-
-	/* CAT BUS initialisation must be done after the memory */
-	/* FIXME: The L4 has a catbus too, it just needs to be
-	 * accessed in a totally different way */
-	if (voyager_level == 5) {
-		voyager_cat_init();
-
-		/* now that the cat has probed the Voyager System Bus, sanity
-		 * check the cpu map */
-		if (((voyager_quad_processors | voyager_extended_vic_processors)
-		     & cpus_addr(voyager_phys_cpu_present_map)[0]) !=
-		    cpus_addr(voyager_phys_cpu_present_map)[0]) {
-			/* should panic */
-			printk("\n\n***WARNING*** "
-			       "Sanity check of CPU present map FAILED\n");
-		}
-	} else if (voyager_level == 4)
-		voyager_extended_vic_processors =
-		    cpus_addr(voyager_phys_cpu_present_map)[0];
-
-	/* this sets up the idle task to run on the current cpu */
-	voyager_extended_cpus = 1;
-	/* Remove the global_irq_holder setting, it triggers a BUG() on
-	 * schedule at the moment */
-	//global_irq_holder = boot_cpu_id;
-
-	/* FIXME: Need to do something about this but currently only works
-	 * on CPUs with a tsc which none of mine have.
-	 smp_tune_scheduling();
-	 */
-	smp_store_cpu_info(boot_cpu_id);
-	/* setup the jump vector */
-	initial_code = (unsigned long)initialize_secondary;
-	printk("CPU%d: ", boot_cpu_id);
-	print_cpu_info(&cpu_data(boot_cpu_id));
-
-	if (is_cpu_quad()) {
-		/* booting on a Quad CPU */
-		printk("VOYAGER SMP: Boot CPU is Quad\n");
-		qic_setup();
-		do_quad_bootstrap();
-	}
-
-	/* enable our own CPIs */
-	vic_enable_cpi();
-
-	cpu_set(boot_cpu_id, cpu_online_map);
-	cpu_set(boot_cpu_id, cpu_callout_map);
-
-	/* loop over all the extended VIC CPUs and boot them.  The
-	 * Quad CPUs must be bootstrapped by their extended VIC cpu */
-	for (i = 0; i < nr_cpu_ids; i++) {
-		if (i == boot_cpu_id || !cpu_isset(i, voyager_phys_cpu_present_map))
-			continue;
-		do_boot_cpu(i);
-		/* This udelay seems to be needed for the Quad boots
-		 * don't remove unless you know what you're doing */
-		udelay(1000);
-	}
-	/* we could compute the total bogomips here, but why bother?,
-	 * Code added from smpboot.c */
-	{
-		unsigned long bogosum = 0;
-
-		for_each_online_cpu(i)
-			bogosum += cpu_data(i).loops_per_jiffy;
-		printk(KERN_INFO "Total of %d processors activated "
-		       "(%lu.%02lu BogoMIPS).\n",
-		       cpucount + 1, bogosum / (500000 / HZ),
-		       (bogosum / (5000 / HZ)) % 100);
-	}
-	voyager_extended_cpus = hweight32(voyager_extended_vic_processors);
-	printk("VOYAGER: Extended (interrupt handling CPUs): "
-	       "%d, non-extended: %d\n", voyager_extended_cpus,
-	       num_booting_cpus() - voyager_extended_cpus);
-	/* that's it, switch to symmetric mode */
-	outb(0, VIC_PRIORITY_REGISTER);
-	outb(0, VIC_CLAIM_REGISTER_0);
-	outb(0, VIC_CLAIM_REGISTER_1);
-
-	VDEBUG(("VOYAGER SMP: Booted with %d CPUs\n", num_booting_cpus()));
-}
-
-/* Reload the secondary CPUs task structure (this function does not
- * return ) */
-static void __init initialize_secondary(void)
-{
-#if 0
-	// AC kernels only
-	set_current(hard_get_current());
-#endif
-
-	/*
-	 * We don't actually need to load the full TSS,
-	 * basically just the stack pointer and the eip.
-	 */
-
-	asm volatile ("movl %0,%%esp\n\t"
-		      "jmp *%1"::"r" (current->thread.sp),
-		      "r"(current->thread.ip));
-}
-
-/* handle a Voyager SYS_INT -- If we don't, the base board will
- * panic the system.
- *
- * System interrupts occur because some problem was detected on the
- * various busses.  To find out what you have to probe all the
- * hardware via the CAT bus.  FIXME: At the moment we do nothing. */
-void smp_vic_sys_interrupt(struct pt_regs *regs)
-{
-	ack_CPI(VIC_SYS_INT);
-	printk("Voyager SYSTEM INTERRUPT\n");
-}
-
-/* Handle a voyager CMN_INT; These interrupts occur either because of
- * a system status change or because a single bit memory error
- * occurred.  FIXME: At the moment, ignore all this. */
-void smp_vic_cmn_interrupt(struct pt_regs *regs)
-{
-	static __u8 in_cmn_int = 0;
-	static DEFINE_SPINLOCK(cmn_int_lock);
-
-	/* common ints are broadcast, so make sure we only do this once */
-	_raw_spin_lock(&cmn_int_lock);
-	if (in_cmn_int)
-		goto unlock_end;
-
-	in_cmn_int++;
-	_raw_spin_unlock(&cmn_int_lock);
-
-	VDEBUG(("Voyager COMMON INTERRUPT\n"));
-
-	if (voyager_level == 5)
-		voyager_cat_do_common_interrupt();
-
-	_raw_spin_lock(&cmn_int_lock);
-	in_cmn_int = 0;
-      unlock_end:
-	_raw_spin_unlock(&cmn_int_lock);
-	ack_CPI(VIC_CMN_INT);
-}
-
-/*
- * Reschedule call back. Nothing to do, all the work is done
- * automatically when we return from the interrupt.  */
-static void smp_reschedule_interrupt(void)
-{
-	/* do nothing */
-}
-
-static struct mm_struct *flush_mm;
-static unsigned long flush_va;
-static DEFINE_SPINLOCK(tlbstate_lock);
-
-/*
- * We cannot call mmdrop() because we are in interrupt context,
- * instead update mm->cpu_vm_mask.
- *
- * We need to reload %cr3 since the page tables may be going
- * away from under us..
- */
-static inline void voyager_leave_mm(unsigned long cpu)
-{
-	if (per_cpu(cpu_tlbstate, cpu).state == TLBSTATE_OK)
-		BUG();
-	cpu_clear(cpu, per_cpu(cpu_tlbstate, cpu).active_mm->cpu_vm_mask);
-	load_cr3(swapper_pg_dir);
-}
-
-/*
- * Invalidate call-back
- */
-static void smp_invalidate_interrupt(void)
-{
-	__u8 cpu = smp_processor_id();
-
-	if (!test_bit(cpu, &smp_invalidate_needed))
-		return;
-	/* This will flood messages.  Don't uncomment unless you see
-	 * Problems with cross cpu invalidation
-	 VDEBUG(("VOYAGER SMP: CPU%d received INVALIDATE_CPI\n",
-	 smp_processor_id()));
-	 */
-
-	if (flush_mm == per_cpu(cpu_tlbstate, cpu).active_mm) {
-		if (per_cpu(cpu_tlbstate, cpu).state == TLBSTATE_OK) {
-			if (flush_va == TLB_FLUSH_ALL)
-				local_flush_tlb();
-			else
-				__flush_tlb_one(flush_va);
-		} else
-			voyager_leave_mm(cpu);
-	}
-	smp_mb__before_clear_bit();
-	clear_bit(cpu, &smp_invalidate_needed);
-	smp_mb__after_clear_bit();
-}
-
-/* All the new flush operations for 2.4 */
-
-/* This routine is called with a physical cpu mask */
-static void
-voyager_flush_tlb_others(unsigned long cpumask, struct mm_struct *mm,
-			 unsigned long va)
-{
-	int stuck = 50000;
-
-	if (!cpumask)
-		BUG();
-	if ((cpumask & cpus_addr(cpu_online_map)[0]) != cpumask)
-		BUG();
-	if (cpumask & (1 << smp_processor_id()))
-		BUG();
-	if (!mm)
-		BUG();
-
-	spin_lock(&tlbstate_lock);
-
-	flush_mm = mm;
-	flush_va = va;
-	atomic_set_mask(cpumask, &smp_invalidate_needed);
-	/*
-	 * We have to send the CPI only to
-	 * CPUs affected.
-	 */
-	send_CPI(cpumask, VIC_INVALIDATE_CPI);
-
-	while (smp_invalidate_needed) {
-		mb();
-		if (--stuck == 0) {
-			printk("***WARNING*** Stuck doing invalidate CPI "
-			       "(CPU%d)\n", smp_processor_id());
-			break;
-		}
-	}
-
-	/* Uncomment only to debug invalidation problems
-	   VDEBUG(("VOYAGER SMP: Completed invalidate CPI (CPU%d)\n", cpu));
-	 */
-
-	flush_mm = NULL;
-	flush_va = 0;
-	spin_unlock(&tlbstate_lock);
-}
-
-void flush_tlb_current_task(void)
-{
-	struct mm_struct *mm = current->mm;
-	unsigned long cpu_mask;
-
-	preempt_disable();
-
-	cpu_mask = cpus_addr(mm->cpu_vm_mask)[0] & ~(1 << smp_processor_id());
-	local_flush_tlb();
-	if (cpu_mask)
-		voyager_flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
-
-	preempt_enable();
-}
-
-void flush_tlb_mm(struct mm_struct *mm)
-{
-	unsigned long cpu_mask;
-
-	preempt_disable();
-
-	cpu_mask = cpus_addr(mm->cpu_vm_mask)[0] & ~(1 << smp_processor_id());
-
-	if (current->active_mm == mm) {
-		if (current->mm)
-			local_flush_tlb();
-		else
-			voyager_leave_mm(smp_processor_id());
-	}
-	if (cpu_mask)
-		voyager_flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
-
-	preempt_enable();
-}
-
-void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	unsigned long cpu_mask;
-
-	preempt_disable();
-
-	cpu_mask = cpus_addr(mm->cpu_vm_mask)[0] & ~(1 << smp_processor_id());
-	if (current->active_mm == mm) {
-		if (current->mm)
-			__flush_tlb_one(va);
-		else
-			voyager_leave_mm(smp_processor_id());
-	}
-
-	if (cpu_mask)
-		voyager_flush_tlb_others(cpu_mask, mm, va);
-
-	preempt_enable();
-}
-
-EXPORT_SYMBOL(flush_tlb_page);
-
-/* enable the requested IRQs */
-static void smp_enable_irq_interrupt(void)
-{
-	__u8 irq;
-	__u8 cpu = get_cpu();
-
-	VDEBUG(("VOYAGER SMP: CPU%d enabling irq mask 0x%x\n", cpu,
-		vic_irq_enable_mask[cpu]));
-
-	spin_lock(&vic_irq_lock);
-	for (irq = 0; irq < 16; irq++) {
-		if (vic_irq_enable_mask[cpu] & (1 << irq))
-			enable_local_vic_irq(irq);
-	}
-	vic_irq_enable_mask[cpu] = 0;
-	spin_unlock(&vic_irq_lock);
-
-	put_cpu_no_resched();
-}
-
-/*
- *	CPU halt call-back
- */
-static void smp_stop_cpu_function(void *dummy)
-{
-	VDEBUG(("VOYAGER SMP: CPU%d is STOPPING\n", smp_processor_id()));
-	cpu_clear(smp_processor_id(), cpu_online_map);
-	local_irq_disable();
-	for (;;)
-		halt();
-}
-
-/* execute a thread on a new CPU.  The function to be called must be
- * previously set up.  This is used to schedule a function for
- * execution on all CPUs - set up the function then broadcast a
- * function_interrupt CPI to come here on each CPU */
-static void smp_call_function_interrupt(void)
-{
-	irq_enter();
-	generic_smp_call_function_interrupt();
-	__get_cpu_var(irq_stat).irq_call_count++;
-	irq_exit();
-}
-
-static void smp_call_function_single_interrupt(void)
-{
-	irq_enter();
-	generic_smp_call_function_single_interrupt();
-	__get_cpu_var(irq_stat).irq_call_count++;
-	irq_exit();
-}
-
-/* Sorry about the name.  In an APIC based system, the APICs
- * themselves are programmed to send a timer interrupt.  This is used
- * by linux to reschedule the processor.  Voyager doesn't have this,
- * so we use the system clock to interrupt one processor, which in
- * turn, broadcasts a timer CPI to all the others --- we receive that
- * CPI here.  We don't use this actually for counting so losing
- * ticks doesn't matter
- *
- * FIXME: For those CPUs which actually have a local APIC, we could
- * try to use it to trigger this interrupt instead of having to
- * broadcast the timer tick.  Unfortunately, all my pentium DYADs have
- * no local APIC, so I can't do this
- *
- * This function is currently a placeholder and is unused in the code */
-void smp_apic_timer_interrupt(struct pt_regs *regs)
-{
-	struct pt_regs *old_regs = set_irq_regs(regs);
-	wrapper_smp_local_timer_interrupt();
-	set_irq_regs(old_regs);
-}
-
-/* All of the QUAD interrupt GATES */
-void smp_qic_timer_interrupt(struct pt_regs *regs)
-{
-	struct pt_regs *old_regs = set_irq_regs(regs);
-	ack_QIC_CPI(QIC_TIMER_CPI);
-	wrapper_smp_local_timer_interrupt();
-	set_irq_regs(old_regs);
-}
-
-void smp_qic_invalidate_interrupt(struct pt_regs *regs)
-{
-	ack_QIC_CPI(QIC_INVALIDATE_CPI);
-	smp_invalidate_interrupt();
-}
-
-void smp_qic_reschedule_interrupt(struct pt_regs *regs)
-{
-	ack_QIC_CPI(QIC_RESCHEDULE_CPI);
-	smp_reschedule_interrupt();
-}
-
-void smp_qic_enable_irq_interrupt(struct pt_regs *regs)
-{
-	ack_QIC_CPI(QIC_ENABLE_IRQ_CPI);
-	smp_enable_irq_interrupt();
-}
-
-void smp_qic_call_function_interrupt(struct pt_regs *regs)
-{
-	ack_QIC_CPI(QIC_CALL_FUNCTION_CPI);
-	smp_call_function_interrupt();
-}
-
-void smp_qic_call_function_single_interrupt(struct pt_regs *regs)
-{
-	ack_QIC_CPI(QIC_CALL_FUNCTION_SINGLE_CPI);
-	smp_call_function_single_interrupt();
-}
-
-void smp_vic_cpi_interrupt(struct pt_regs *regs)
-{
-	struct pt_regs *old_regs = set_irq_regs(regs);
-	__u8 cpu = smp_processor_id();
-
-	if (is_cpu_quad())
-		ack_QIC_CPI(VIC_CPI_LEVEL0);
-	else
-		ack_VIC_CPI(VIC_CPI_LEVEL0);
-
-	if (test_and_clear_bit(VIC_TIMER_CPI, &vic_cpi_mailbox[cpu]))
-		wrapper_smp_local_timer_interrupt();
-	if (test_and_clear_bit(VIC_INVALIDATE_CPI, &vic_cpi_mailbox[cpu]))
-		smp_invalidate_interrupt();
-	if (test_and_clear_bit(VIC_RESCHEDULE_CPI, &vic_cpi_mailbox[cpu]))
-		smp_reschedule_interrupt();
-	if (test_and_clear_bit(VIC_ENABLE_IRQ_CPI, &vic_cpi_mailbox[cpu]))
-		smp_enable_irq_interrupt();
-	if (test_and_clear_bit(VIC_CALL_FUNCTION_CPI, &vic_cpi_mailbox[cpu]))
-		smp_call_function_interrupt();
-	if (test_and_clear_bit(VIC_CALL_FUNCTION_SINGLE_CPI, &vic_cpi_mailbox[cpu]))
-		smp_call_function_single_interrupt();
-	set_irq_regs(old_regs);
-}
-
-static void do_flush_tlb_all(void *info)
-{
-	unsigned long cpu = smp_processor_id();
-
-	__flush_tlb_all();
-	if (per_cpu(cpu_tlbstate, cpu).state == TLBSTATE_LAZY)
-		voyager_leave_mm(cpu);
-}
-
-/* flush the TLB of every active CPU in the system */
-void flush_tlb_all(void)
-{
-	on_each_cpu(do_flush_tlb_all, 0, 1);
-}
-
-/* send a reschedule CPI to one CPU by physical CPU number*/
-static void voyager_smp_send_reschedule(int cpu)
-{
-	send_one_CPI(cpu, VIC_RESCHEDULE_CPI);
-}
-
-int hard_smp_processor_id(void)
-{
-	__u8 i;
-	__u8 cpumask = inb(VIC_PROC_WHO_AM_I);
-	if ((cpumask & QUAD_IDENTIFIER) == QUAD_IDENTIFIER)
-		return cpumask & 0x1F;
-
-	for (i = 0; i < 8; i++) {
-		if (cpumask & (1 << i))
-			return i;
-	}
-	printk("** WARNING ** Illegal cpuid returned by VIC: %d", cpumask);
-	return 0;
-}
-
-int safe_smp_processor_id(void)
-{
-	return hard_smp_processor_id();
-}
-
-/* broadcast a halt to all other CPUs */
-static void voyager_smp_send_stop(void)
-{
-	smp_call_function(smp_stop_cpu_function, NULL, 1);
-}
-
-/* this function is triggered in time.c when a clock tick fires
- * we need to re-broadcast the tick to all CPUs */
-void smp_vic_timer_interrupt(void)
-{
-	send_CPI_allbutself(VIC_TIMER_CPI);
-	smp_local_timer_interrupt();
-}
-
-/* local (per CPU) timer interrupt.  It does both profiling and
- * process statistics/rescheduling.
- *
- * We do profiling in every local tick, statistics/rescheduling
- * happen only every 'profiling multiplier' ticks. The default
- * multiplier is 1 and it can be changed by writing the new multiplier
- * value into /proc/profile.
- */
-void smp_local_timer_interrupt(void)
-{
-	int cpu = smp_processor_id();
-	long weight;
-
-	profile_tick(CPU_PROFILING);
-	if (--per_cpu(prof_counter, cpu) <= 0) {
-		/*
-		 * The multiplier may have changed since the last time we got
-		 * to this point as a result of the user writing to
-		 * /proc/profile. In this case we need to adjust the APIC
-		 * timer accordingly.
-		 *
-		 * Interrupts are already masked off at this point.
-		 */
-		per_cpu(prof_counter, cpu) = per_cpu(prof_multiplier, cpu);
-		if (per_cpu(prof_counter, cpu) !=
-		    per_cpu(prof_old_multiplier, cpu)) {
-			/* FIXME: need to update the vic timer tick here */
-			per_cpu(prof_old_multiplier, cpu) =
-			    per_cpu(prof_counter, cpu);
-		}
-
-		update_process_times(user_mode_vm(get_irq_regs()));
-	}
-
-	if (((1 << cpu) & voyager_extended_vic_processors) == 0)
-		/* only extended VIC processors participate in
-		 * interrupt distribution */
-		return;
-
-	/*
-	 * We take the 'long' return path, and there every subsystem
-	 * grabs the appropriate locks (kernel lock/ irq lock).
-	 *
-	 * we might want to decouple profiling from the 'long path',
-	 * and do the profiling totally in assembly.
-	 *
-	 * Currently this isn't too much of an issue (performance wise),
-	 * we can take more than 100K local irqs per second on a 100 MHz P5.
-	 */
-
-	if ((++vic_tick[cpu] & 0x7) != 0)
-		return;
-	/* get here every 16 ticks (about every 1/6 of a second) */
-
-	/* Change our priority to give someone else a chance at getting
-	 * the IRQ. The algorithm goes like this:
-	 *
-	 * In the VIC, the dynamically routed interrupt is always
-	 * handled by the lowest priority eligible (i.e. receiving
-	 * interrupts) CPU.  If >1 eligible CPUs are equal lowest, the
-	 * lowest processor number gets it.
-	 *
-	 * The priority of a CPU is controlled by a special per-CPU
-	 * VIC priority register which is 3 bits wide 0 being lowest
-	 * and 7 highest priority..
-	 *
-	 * Therefore we subtract the average number of interrupts from
-	 * the number we've fielded.  If this number is negative, we
-	 * lower the activity count and if it is positive, we raise
-	 * it.
-	 *
-	 * I'm afraid this still leads to odd looking interrupt counts:
-	 * the totals are all roughly equal, but the individual ones
-	 * look rather skewed.
-	 *
-	 * FIXME: This algorithm is total crap when mixed with SMP
-	 * affinity code since we now try to even up the interrupt
-	 * counts when an affinity binding is keeping them on a
-	 * particular CPU*/
-	weight = (vic_intr_count[cpu] * voyager_extended_cpus
-		  - vic_intr_total) >> 4;
-	weight += 4;
-	if (weight > 7)
-		weight = 7;
-	if (weight < 0)
-		weight = 0;
-
-	outb((__u8) weight, VIC_PRIORITY_REGISTER);
-
-#ifdef VOYAGER_DEBUG
-	if ((vic_tick[cpu] & 0xFFF) == 0) {
-		/* print this message roughly every 25 secs */
-		printk("VOYAGER SMP: vic_tick[%d] = %lu, weight = %ld\n",
-		       cpu, vic_tick[cpu], weight);
-	}
-#endif
-}
-
-/* setup the profiling timer */
-int setup_profiling_timer(unsigned int multiplier)
-{
-	int i;
-
-	if ((!multiplier))
-		return -EINVAL;
-
-	/*
-	 * Set the new multiplier for each CPU. CPUs don't start using the
-	 * new values until the next timer interrupt in which they do process
-	 * accounting.
-	 */
-	for (i = 0; i < nr_cpu_ids; ++i)
-		per_cpu(prof_multiplier, i) = multiplier;
-
-	return 0;
-}
-
-/* This is a bit of a mess, but forced on us by the genirq changes
- * there's no genirq handler that really does what voyager wants
- * so hack it up with the simple IRQ handler */
-static void handle_vic_irq(unsigned int irq, struct irq_desc *desc)
-{
-	before_handle_vic_irq(irq);
-	handle_simple_irq(irq, desc);
-	after_handle_vic_irq(irq);
-}
-
-/*  The CPIs are handled in the per cpu 8259s, so they must be
- *  enabled to be received: FIX: enabling the CPIs in the early
- *  boot sequence interferes with bug checking; enable them later
- *  on in smp_init */
-#define VIC_SET_GATE(cpi, vector) \
-	set_intr_gate((cpi) + VIC_DEFAULT_CPI_BASE, (vector))
-#define QIC_SET_GATE(cpi, vector) \
-	set_intr_gate((cpi) + QIC_DEFAULT_CPI_BASE, (vector))
-
-void __init voyager_smp_intr_init(void)
-{
-	int i;
-
-	/* initialize the per cpu irq mask to all disabled */
-	for (i = 0; i < nr_cpu_ids; i++)
-		vic_irq_mask[i] = 0xFFFF;
-
-	VIC_SET_GATE(VIC_CPI_LEVEL0, vic_cpi_interrupt);
-
-	VIC_SET_GATE(VIC_SYS_INT, vic_sys_interrupt);
-	VIC_SET_GATE(VIC_CMN_INT, vic_cmn_interrupt);
-
-	QIC_SET_GATE(QIC_TIMER_CPI, qic_timer_interrupt);
-	QIC_SET_GATE(QIC_INVALIDATE_CPI, qic_invalidate_interrupt);
-	QIC_SET_GATE(QIC_RESCHEDULE_CPI, qic_reschedule_interrupt);
-	QIC_SET_GATE(QIC_ENABLE_IRQ_CPI, qic_enable_irq_interrupt);
-	QIC_SET_GATE(QIC_CALL_FUNCTION_CPI, qic_call_function_interrupt);
-
-	/* now put the VIC descriptor into the first 48 IRQs
-	 *
-	 * This is for later: first 16 correspond to PC IRQs; next 16
-	 * are Primary MC IRQs and final 16 are Secondary MC IRQs */
-	for (i = 0; i < 48; i++)
-		set_irq_chip_and_handler(i, &vic_chip, handle_vic_irq);
-}
-
-/* send a CPI at level cpi to a set of cpus in cpuset (set 1 bit per
- * processor to receive CPI */
-static void send_CPI(__u32 cpuset, __u8 cpi)
-{
-	int cpu;
-	__u32 quad_cpuset = (cpuset & voyager_quad_processors);
-
-	if (cpi < VIC_START_FAKE_CPI) {
-		/* fake CPI are only used for booting, so send to the
-		 * extended quads as well---Quads must be VIC booted */
-		outb((__u8) (cpuset), VIC_CPI_Registers[cpi]);
-		return;
-	}
-	if (quad_cpuset)
-		send_QIC_CPI(quad_cpuset, cpi);
-	cpuset &= ~quad_cpuset;
-	cpuset &= 0xff;		/* only first 8 CPUs vaild for VIC CPI */
-	if (cpuset == 0)
-		return;
-	for_each_online_cpu(cpu) {
-		if (cpuset & (1 << cpu))
-			set_bit(cpi, &vic_cpi_mailbox[cpu]);
-	}
-	if (cpuset)
-		outb((__u8) cpuset, VIC_CPI_Registers[VIC_CPI_LEVEL0]);
-}
-
-/* Acknowledge receipt of CPI in the QIC, clear in QIC hardware and
- * set the cache line to shared by reading it.
- *
- * DON'T make this inline otherwise the cache line read will be
- * optimised away
- * */
-static int ack_QIC_CPI(__u8 cpi)
-{
-	__u8 cpu = hard_smp_processor_id();
-
-	cpi &= 7;
-
-	outb(1 << cpi, QIC_INTERRUPT_CLEAR1);
-	return voyager_quad_cpi_addr[cpu]->qic_cpi[cpi].cpi;
-}
-
-static void ack_special_QIC_CPI(__u8 cpi)
-{
-	switch (cpi) {
-	case VIC_CMN_INT:
-		outb(QIC_CMN_INT, QIC_INTERRUPT_CLEAR0);
-		break;
-	case VIC_SYS_INT:
-		outb(QIC_SYS_INT, QIC_INTERRUPT_CLEAR0);
-		break;
-	}
-	/* also clear at the VIC, just in case (nop for non-extended proc) */
-	ack_VIC_CPI(cpi);
-}
-
-/* Acknowledge receipt of CPI in the VIC (essentially an EOI) */
-static void ack_VIC_CPI(__u8 cpi)
-{
-#ifdef VOYAGER_DEBUG
-	unsigned long flags;
-	__u16 isr;
-	__u8 cpu = smp_processor_id();
-
-	local_irq_save(flags);
-	isr = vic_read_isr();
-	if ((isr & (1 << (cpi & 7))) == 0) {
-		printk("VOYAGER SMP: CPU%d lost CPI%d\n", cpu, cpi);
-	}
-#endif
-	/* send specific EOI; the two system interrupts have
-	 * bit 4 set for a separate vector but behave as the
-	 * corresponding 3 bit intr */
-	outb_p(0x60 | (cpi & 7), 0x20);
-
-#ifdef VOYAGER_DEBUG
-	if ((vic_read_isr() & (1 << (cpi & 7))) != 0) {
-		printk("VOYAGER SMP: CPU%d still asserting CPI%d\n", cpu, cpi);
-	}
-	local_irq_restore(flags);
-#endif
-}
-
-/* cribbed with thanks from irq.c */
-#define __byte(x,y)	(((unsigned char *)&(y))[x])
-#define cached_21(cpu)	(__byte(0,vic_irq_mask[cpu]))
-#define cached_A1(cpu)	(__byte(1,vic_irq_mask[cpu]))
-
-static unsigned int startup_vic_irq(unsigned int irq)
-{
-	unmask_vic_irq(irq);
-
-	return 0;
-}
-
-/* The enable and disable routines.  This is where we run into
- * conflicting architectural philosophy.  Fundamentally, the voyager
- * architecture does not expect to have to disable interrupts globally
- * (the IRQ controllers belong to each CPU).  The processor masquerade
- * which is used to start the system shouldn't be used in a running OS
- * since it will cause great confusion if two separate CPUs drive to
- * the same IRQ controller (I know, I've tried it).
- *
- * The solution is a variant on the NCR lazy SPL design:
- *
- * 1) To disable an interrupt, do nothing (other than set the
- *    IRQ_DISABLED flag).  This dares the interrupt actually to arrive.
- *
- * 2) If the interrupt dares to come in, raise the local mask against
- *    it (this will result in all the CPU masks being raised
- *    eventually).
- *
- * 3) To enable the interrupt, lower the mask on the local CPU and
- *    broadcast an Interrupt enable CPI which causes all other CPUs to
- *    adjust their masks accordingly.  */
-
-static void unmask_vic_irq(unsigned int irq)
-{
-	/* linux doesn't to processor-irq affinity, so enable on
-	 * all CPUs we know about */
-	int cpu = smp_processor_id(), real_cpu;
-	__u16 mask = (1 << irq);
-	__u32 processorList = 0;
-	unsigned long flags;
-
-	VDEBUG(("VOYAGER: unmask_vic_irq(%d) CPU%d affinity 0x%lx\n",
-		irq, cpu, cpu_irq_affinity[cpu]));
-	spin_lock_irqsave(&vic_irq_lock, flags);
-	for_each_online_cpu(real_cpu) {
-		if (!(voyager_extended_vic_processors & (1 << real_cpu)))
-			continue;
-		if (!(cpu_irq_affinity[real_cpu] & mask)) {
-			/* irq has no affinity for this CPU, ignore */
-			continue;
-		}
-		if (real_cpu == cpu) {
-			enable_local_vic_irq(irq);
-		} else if (vic_irq_mask[real_cpu] & mask) {
-			vic_irq_enable_mask[real_cpu] |= mask;
-			processorList |= (1 << real_cpu);
-		}
-	}
-	spin_unlock_irqrestore(&vic_irq_lock, flags);
-	if (processorList)
-		send_CPI(processorList, VIC_ENABLE_IRQ_CPI);
-}
-
-static void mask_vic_irq(unsigned int irq)
-{
-	/* lazy disable, do nothing */
-}
-
-static void enable_local_vic_irq(unsigned int irq)
-{
-	__u8 cpu = smp_processor_id();
-	__u16 mask = ~(1 << irq);
-	__u16 old_mask = vic_irq_mask[cpu];
-
-	vic_irq_mask[cpu] &= mask;
-	if (vic_irq_mask[cpu] == old_mask)
-		return;
-
-	VDEBUG(("VOYAGER DEBUG: Enabling irq %d in hardware on CPU %d\n",
-		irq, cpu));
-
-	if (irq & 8) {
-		outb_p(cached_A1(cpu), 0xA1);
-		(void)inb_p(0xA1);
-	} else {
-		outb_p(cached_21(cpu), 0x21);
-		(void)inb_p(0x21);
-	}
-}
-
-static void disable_local_vic_irq(unsigned int irq)
-{
-	__u8 cpu = smp_processor_id();
-	__u16 mask = (1 << irq);
-	__u16 old_mask = vic_irq_mask[cpu];
-
-	if (irq == 7)
-		return;
-
-	vic_irq_mask[cpu] |= mask;
-	if (old_mask == vic_irq_mask[cpu])
-		return;
-
-	VDEBUG(("VOYAGER DEBUG: Disabling irq %d in hardware on CPU %d\n",
-		irq, cpu));
-
-	if (irq & 8) {
-		outb_p(cached_A1(cpu), 0xA1);
-		(void)inb_p(0xA1);
-	} else {
-		outb_p(cached_21(cpu), 0x21);
-		(void)inb_p(0x21);
-	}
-}
-
-/* The VIC is level triggered, so the ack can only be issued after the
- * interrupt completes.  However, we do Voyager lazy interrupt
- * handling here: It is an extremely expensive operation to mask an
- * interrupt in the vic, so we merely set a flag (IRQ_DISABLED).  If
- * this interrupt actually comes in, then we mask and ack here to push
- * the interrupt off to another CPU */
-static void before_handle_vic_irq(unsigned int irq)
-{
-	irq_desc_t *desc = irq_to_desc(irq);
-	__u8 cpu = smp_processor_id();
-
-	_raw_spin_lock(&vic_irq_lock);
-	vic_intr_total++;
-	vic_intr_count[cpu]++;
-
-	if (!(cpu_irq_affinity[cpu] & (1 << irq))) {
-		/* The irq is not in our affinity mask, push it off
-		 * onto another CPU */
-		VDEBUG(("VOYAGER DEBUG: affinity triggered disable of irq %d "
-			"on cpu %d\n", irq, cpu));
-		disable_local_vic_irq(irq);
-		/* set IRQ_INPROGRESS to prevent the handler in irq.c from
-		 * actually calling the interrupt routine */
-		desc->status |= IRQ_REPLAY | IRQ_INPROGRESS;
-	} else if (desc->status & IRQ_DISABLED) {
-		/* Damn, the interrupt actually arrived, do the lazy
-		 * disable thing. The interrupt routine in irq.c will
-		 * not handle a IRQ_DISABLED interrupt, so nothing more
-		 * need be done here */
-		VDEBUG(("VOYAGER DEBUG: lazy disable of irq %d on CPU %d\n",
-			irq, cpu));
-		disable_local_vic_irq(irq);
-		desc->status |= IRQ_REPLAY;
-	} else {
-		desc->status &= ~IRQ_REPLAY;
-	}
-
-	_raw_spin_unlock(&vic_irq_lock);
-}
-
-/* Finish the VIC interrupt: basically mask */
-static void after_handle_vic_irq(unsigned int irq)
-{
-	irq_desc_t *desc = irq_to_desc(irq);
-
-	_raw_spin_lock(&vic_irq_lock);
-	{
-		unsigned int status = desc->status & ~IRQ_INPROGRESS;
-#ifdef VOYAGER_DEBUG
-		__u16 isr;
-#endif
-
-		desc->status = status;
-		if ((status & IRQ_DISABLED))
-			disable_local_vic_irq(irq);
-#ifdef VOYAGER_DEBUG
-		/* DEBUG: before we ack, check what's in progress */
-		isr = vic_read_isr();
-		if ((isr & (1 << irq) && !(status & IRQ_REPLAY)) == 0) {
-			int i;
-			__u8 cpu = smp_processor_id();
-			__u8 real_cpu;
-			int mask;	/* Um... initialize me??? --RR */
-
-			printk("VOYAGER SMP: CPU%d lost interrupt %d\n",
-			       cpu, irq);
-			for_each_possible_cpu(real_cpu, mask) {
-
-				outb(VIC_CPU_MASQUERADE_ENABLE | real_cpu,
-				     VIC_PROCESSOR_ID);
-				isr = vic_read_isr();
-				if (isr & (1 << irq)) {
-					printk
-					    ("VOYAGER SMP: CPU%d ack irq %d\n",
-					     real_cpu, irq);
-					ack_vic_irq(irq);
-				}
-				outb(cpu, VIC_PROCESSOR_ID);
-			}
-		}
-#endif /* VOYAGER_DEBUG */
-		/* as soon as we ack, the interrupt is eligible for
-		 * receipt by another CPU so everything must be in
-		 * order here  */
-		ack_vic_irq(irq);
-		if (status & IRQ_REPLAY) {
-			/* replay is set if we disable the interrupt
-			 * in the before_handle_vic_irq() routine, so
-			 * clear the in progress bit here to allow the
-			 * next CPU to handle this correctly */
-			desc->status &= ~(IRQ_REPLAY | IRQ_INPROGRESS);
-		}
-#ifdef VOYAGER_DEBUG
-		isr = vic_read_isr();
-		if ((isr & (1 << irq)) != 0)
-			printk("VOYAGER SMP: after_handle_vic_irq() after "
-			       "ack irq=%d, isr=0x%x\n", irq, isr);
-#endif /* VOYAGER_DEBUG */
-	}
-	_raw_spin_unlock(&vic_irq_lock);
-
-	/* All code after this point is out of the main path - the IRQ
-	 * may be intercepted by another CPU if reasserted */
-}
-
-/* Linux processor - interrupt affinity manipulations.
- *
- * For each processor, we maintain a 32 bit irq affinity mask.
- * Initially it is set to all 1's so every processor accepts every
- * interrupt.  In this call, we change the processor's affinity mask:
- *
- * Change from enable to disable:
- *
- * If the interrupt ever comes in to the processor, we will disable it
- * and ack it to push it off to another CPU, so just accept the mask here.
- *
- * Change from disable to enable:
- *
- * change the mask and then do an interrupt enable CPI to re-enable on
- * the selected processors */
-
-void set_vic_irq_affinity(unsigned int irq, const struct cpumask *mask)
-{
-	/* Only extended processors handle interrupts */
-	unsigned long real_mask;
-	unsigned long irq_mask = 1 << irq;
-	int cpu;
-
-	real_mask = cpus_addr(*mask)[0] & voyager_extended_vic_processors;
-
-	if (cpus_addr(*mask)[0] == 0)
-		/* can't have no CPUs to accept the interrupt -- extremely
-		 * bad things will happen */
-		return;
-
-	if (irq == 0)
-		/* can't change the affinity of the timer IRQ.  This
-		 * is due to the constraint in the voyager
-		 * architecture that the CPI also comes in on and IRQ
-		 * line and we have chosen IRQ0 for this.  If you
-		 * raise the mask on this interrupt, the processor
-		 * will no-longer be able to accept VIC CPIs */
-		return;
-
-	if (irq >= 32)
-		/* You can only have 32 interrupts in a voyager system
-		 * (and 32 only if you have a secondary microchannel
-		 * bus) */
-		return;
-
-	for_each_online_cpu(cpu) {
-		unsigned long cpu_mask = 1 << cpu;
-
-		if (cpu_mask & real_mask) {
-			/* enable the interrupt for this cpu */
-			cpu_irq_affinity[cpu] |= irq_mask;
-		} else {
-			/* disable the interrupt for this cpu */
-			cpu_irq_affinity[cpu] &= ~irq_mask;
-		}
-	}
-	/* this is magic, we now have the correct affinity maps, so
-	 * enable the interrupt.  This will send an enable CPI to
-	 * those CPUs who need to enable it in their local masks,
-	 * causing them to correct for the new affinity . If the
-	 * interrupt is currently globally disabled, it will simply be
-	 * disabled again as it comes in (voyager lazy disable).  If
-	 * the affinity map is tightened to disable the interrupt on a
-	 * cpu, it will be pushed off when it comes in */
-	unmask_vic_irq(irq);
-}
-
-static void ack_vic_irq(unsigned int irq)
-{
-	if (irq & 8) {
-		outb(0x62, 0x20);	/* Specific EOI to cascade */
-		outb(0x60 | (irq & 7), 0xA0);
-	} else {
-		outb(0x60 | (irq & 7), 0x20);
-	}
-}
-
-/* enable the CPIs.  In the VIC, the CPIs are delivered by the 8259
- * but are not vectored by it.  This means that the 8259 mask must be
- * lowered to receive them */
-static __init void vic_enable_cpi(void)
-{
-	__u8 cpu = smp_processor_id();
-
-	/* just take a copy of the current mask (nop for boot cpu) */
-	vic_irq_mask[cpu] = vic_irq_mask[boot_cpu_id];
-
-	enable_local_vic_irq(VIC_CPI_LEVEL0);
-	enable_local_vic_irq(VIC_CPI_LEVEL1);
-	/* for sys int and cmn int */
-	enable_local_vic_irq(7);
-
-	if (is_cpu_quad()) {
-		outb(QIC_DEFAULT_MASK0, QIC_MASK_REGISTER0);
-		outb(QIC_CPI_ENABLE, QIC_MASK_REGISTER1);
-		VDEBUG(("VOYAGER SMP: QIC ENABLE CPI: CPU%d: MASK 0x%x\n",
-			cpu, QIC_CPI_ENABLE));
-	}
-
-	VDEBUG(("VOYAGER SMP: ENABLE CPI: CPU%d: MASK 0x%x\n",
-		cpu, vic_irq_mask[cpu]));
-}
-
-void voyager_smp_dump()
-{
-	int old_cpu = smp_processor_id(), cpu;
-
-	/* dump the interrupt masks of each processor */
-	for_each_online_cpu(cpu) {
-		__u16 imr, isr, irr;
-		unsigned long flags;
-
-		local_irq_save(flags);
-		outb(VIC_CPU_MASQUERADE_ENABLE | cpu, VIC_PROCESSOR_ID);
-		imr = (inb(0xa1) << 8) | inb(0x21);
-		outb(0x0a, 0xa0);
-		irr = inb(0xa0) << 8;
-		outb(0x0a, 0x20);
-		irr |= inb(0x20);
-		outb(0x0b, 0xa0);
-		isr = inb(0xa0) << 8;
-		outb(0x0b, 0x20);
-		isr |= inb(0x20);
-		outb(old_cpu, VIC_PROCESSOR_ID);
-		local_irq_restore(flags);
-		printk("\tCPU%d: mask=0x%x, IMR=0x%x, IRR=0x%x, ISR=0x%x\n",
-		       cpu, vic_irq_mask[cpu], imr, irr, isr);
-#if 0
-		/* These lines are put in to try to unstick an un ack'd irq */
-		if (isr != 0) {
-			int irq;
-			for (irq = 0; irq < 16; irq++) {
-				if (isr & (1 << irq)) {
-					printk("\tCPU%d: ack irq %d\n",
-					       cpu, irq);
-					local_irq_save(flags);
-					outb(VIC_CPU_MASQUERADE_ENABLE | cpu,
-					     VIC_PROCESSOR_ID);
-					ack_vic_irq(irq);
-					outb(old_cpu, VIC_PROCESSOR_ID);
-					local_irq_restore(flags);
-				}
-			}
-		}
-#endif
-	}
-}
-
-void smp_voyager_power_off(void *dummy)
-{
-	if (smp_processor_id() == boot_cpu_id)
-		voyager_power_off();
-	else
-		smp_stop_cpu_function(NULL);
-}
-
-static void __init voyager_smp_prepare_cpus(unsigned int max_cpus)
-{
-	/* FIXME: ignore max_cpus for now */
-	smp_boot_cpus();
-}
-
-static void __cpuinit voyager_smp_prepare_boot_cpu(void)
-{
-	init_gdt(smp_processor_id());
-	switch_to_new_gdt();
-
-	cpu_online_map = cpumask_of_cpu(smp_processor_id());
-	cpu_callout_map = cpumask_of_cpu(smp_processor_id());
-	cpu_callin_map = CPU_MASK_NONE;
-	cpu_present_map = cpumask_of_cpu(smp_processor_id());
-
-}
-
-static int __cpuinit voyager_cpu_up(unsigned int cpu)
-{
-	/* This only works at boot for x86.  See "rewrite" above. */
-	if (cpu_isset(cpu, smp_commenced_mask))
-		return -ENOSYS;
-
-	/* In case one didn't come up */
-	if (!cpu_isset(cpu, cpu_callin_map))
-		return -EIO;
-	/* Unleash the CPU! */
-	cpu_set(cpu, smp_commenced_mask);
-	while (!cpu_online(cpu))
-		mb();
-	return 0;
-}
-
-static void __init voyager_smp_cpus_done(unsigned int max_cpus)
-{
-	zap_low_mappings();
-}
-
-void __init smp_setup_processor_id(void)
-{
-	current_thread_info()->cpu = hard_smp_processor_id();
-	x86_write_percpu(cpu_number, hard_smp_processor_id());
-}
-
-static void voyager_send_call_func(const struct cpumask *callmask)
-{
-	__u32 mask = cpus_addr(*callmask)[0] & ~(1 << smp_processor_id());
-	send_CPI(mask, VIC_CALL_FUNCTION_CPI);
-}
-
-static void voyager_send_call_func_single(int cpu)
-{
-	send_CPI(1 << cpu, VIC_CALL_FUNCTION_SINGLE_CPI);
-}
-
-struct smp_ops smp_ops = {
-	.smp_prepare_boot_cpu = voyager_smp_prepare_boot_cpu,
-	.smp_prepare_cpus = voyager_smp_prepare_cpus,
-	.cpu_up = voyager_cpu_up,
-	.smp_cpus_done = voyager_smp_cpus_done,
-
-	.smp_send_stop = voyager_smp_send_stop,
-	.smp_send_reschedule = voyager_smp_send_reschedule,
-
-	.send_call_func_ipi = voyager_send_call_func,
-	.send_call_func_single_ipi = voyager_send_call_func_single,
-};
diff --git a/arch/x86/mach-voyager/voyager_thread.c b/arch/x86/mach-voyager/voyager_thread.c
deleted file mode 100644
index 15464a20fb38..000000000000
--- a/arch/x86/mach-voyager/voyager_thread.c
+++ /dev/null
@@ -1,128 +0,0 @@
-/* -*- mode: c; c-basic-offset: 8 -*- */
-
-/* Copyright (C) 2001
- *
- * Author: J.E.J.Bottomley@HansenPartnership.com
- *
- * This module provides the machine status monitor thread for the
- * voyager architecture.  This allows us to monitor the machine
- * environment (temp, voltage, fan function) and the front panel and
- * internal UPS.  If a fault is detected, this thread takes corrective
- * action (usually just informing init)
- * */
-
-#include <linux/module.h>
-#include <linux/mm.h>
-#include <linux/kernel_stat.h>
-#include <linux/delay.h>
-#include <linux/mc146818rtc.h>
-#include <linux/init.h>
-#include <linux/bootmem.h>
-#include <linux/kmod.h>
-#include <linux/completion.h>
-#include <linux/sched.h>
-#include <linux/kthread.h>
-#include <asm/desc.h>
-#include <asm/voyager.h>
-#include <asm/vic.h>
-#include <asm/mtrr.h>
-#include <asm/msr.h>
-
-struct task_struct *voyager_thread;
-static __u8 set_timeout;
-
-static int execute(const char *string)
-{
-	int ret;
-
-	char *envp[] = {
-		"HOME=/",
-		"TERM=linux",
-		"PATH=/sbin:/usr/sbin:/bin:/usr/bin",
-		NULL,
-	};
-	char *argv[] = {
-		"/bin/bash",
-		"-c",
-		(char *)string,
-		NULL,
-	};
-
-	if ((ret =
-	     call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC)) != 0) {
-		printk(KERN_ERR "Voyager failed to run \"%s\": %i\n", string,
-		       ret);
-	}
-	return ret;
-}
-
-static void check_from_kernel(void)
-{
-	if (voyager_status.switch_off) {
-
-		/* FIXME: This should be configurable via proc */
-		execute("umask 600; echo 0 > /etc/initrunlvl; kill -HUP 1");
-	} else if (voyager_status.power_fail) {
-		VDEBUG(("Voyager daemon detected AC power failure\n"));
-
-		/* FIXME: This should be configureable via proc */
-		execute("umask 600; echo F > /etc/powerstatus; kill -PWR 1");
-		set_timeout = 1;
-	}
-}
-
-static void check_continuing_condition(void)
-{
-	if (voyager_status.power_fail) {
-		__u8 data;
-		voyager_cat_psi(VOYAGER_PSI_SUBREAD,
-				VOYAGER_PSI_AC_FAIL_REG, &data);
-		if ((data & 0x1f) == 0) {
-			/* all power restored */
-			printk(KERN_NOTICE
-			       "VOYAGER AC power restored, cancelling shutdown\n");
-			/* FIXME: should be user configureable */
-			execute
-			    ("umask 600; echo O > /etc/powerstatus; kill -PWR 1");
-			set_timeout = 0;
-		}
-	}
-}
-
-static int thread(void *unused)
-{
-	printk(KERN_NOTICE "Voyager starting monitor thread\n");
-
-	for (;;) {
-		set_current_state(TASK_INTERRUPTIBLE);
-		schedule_timeout(set_timeout ? HZ : MAX_SCHEDULE_TIMEOUT);
-
-		VDEBUG(("Voyager Daemon awoken\n"));
-		if (voyager_status.request_from_kernel == 0) {
-			/* probably awoken from timeout */
-			check_continuing_condition();
-		} else {
-			check_from_kernel();
-			voyager_status.request_from_kernel = 0;
-		}
-	}
-}
-
-static int __init voyager_thread_start(void)
-{
-	voyager_thread = kthread_run(thread, NULL, "kvoyagerd");
-	if (IS_ERR(voyager_thread)) {
-		printk(KERN_ERR
-		       "Voyager: Failed to create system monitor thread.\n");
-		return PTR_ERR(voyager_thread);
-	}
-	return 0;
-}
-
-static void __exit voyager_thread_stop(void)
-{
-	kthread_stop(voyager_thread);
-}
-
-module_init(voyager_thread_start);
-module_exit(voyager_thread_stop);
diff --git a/arch/x86/math-emu/get_address.c b/arch/x86/math-emu/get_address.c
index 420b3b6e3915..6ef5e99380f9 100644
--- a/arch/x86/math-emu/get_address.c
+++ b/arch/x86/math-emu/get_address.c
@@ -150,11 +150,9 @@ static long pm_address(u_char FPU_modrm, u_char segment,
 #endif /* PARANOID */
 
 	switch (segment) {
-		/* gs isn't used by the kernel, so it still has its
-		   user-space value. */
 	case PREFIX_GS_ - 1:
-		/* N.B. - movl %seg, mem is a 2 byte write regardless of prefix */
-		savesegment(gs, addr->selector);
+		/* user gs handling can be lazy, use special accessors */
+		addr->selector = get_user_gs(FPU_info->regs);
 		break;
 	default:
 		addr->selector = PM_REG_(segment);
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index d8cc96a2738f..08537747cb58 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -1,6 +1,8 @@
-obj-y	:=  init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
+obj-y	:=  init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
 	    pat.o pgtable.o gup.o
 
+obj-$(CONFIG_SMP)		+= tlb.o
+
 obj-$(CONFIG_X86_32)		+= pgtable_32.o iomap_32.o
 
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 7e8db53528a7..61b41ca3b5a2 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -23,6 +23,12 @@ int fixup_exception(struct pt_regs *regs)
 
 	fixup = search_exception_tables(regs->ip);
 	if (fixup) {
+		/* If fixup is less than 16, it means uaccess error */
+		if (fixup->fixup < 16) {
+			current_thread_info()->uaccess_err = -EFAULT;
+			regs->ip += fixup->fixup;
+			return 1;
+		}
 		regs->ip = fixup->fixup;
 		return 1;
 	}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c76ef1d701c9..a03b7279efa0 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1,73 +1,79 @@
 /*
  *  Copyright (C) 1995  Linus Torvalds
- *  Copyright (C) 2001,2002 Andi Kleen, SuSE Labs.
+ *  Copyright (C) 2001, 2002 Andi Kleen, SuSE Labs.
+ *  Copyright (C) 2008-2009, Red Hat Inc., Ingo Molnar
  */
-
-#include <linux/signal.h>
-#include <linux/sched.h>
-#include <linux/kernel.h>
-#include <linux/errno.h>
-#include <linux/string.h>
-#include <linux/types.h>
-#include <linux/ptrace.h>
-#include <linux/mmiotrace.h>
-#include <linux/mman.h>
-#include <linux/mm.h>
-#include <linux/smp.h>
 #include <linux/interrupt.h>
-#include <linux/init.h>
-#include <linux/tty.h>
-#include <linux/vt_kern.h>		/* For unblank_screen() */
+#include <linux/mmiotrace.h>
+#include <linux/bootmem.h>
 #include <linux/compiler.h>
 #include <linux/highmem.h>
-#include <linux/bootmem.h>		/* for max_low_pfn */
-#include <linux/vmalloc.h>
-#include <linux/module.h>
 #include <linux/kprobes.h>
 #include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+#include <linux/vt_kern.h>
+#include <linux/signal.h>
+#include <linux/kernel.h>
+#include <linux/ptrace.h>
+#include <linux/string.h>
+#include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/errno.h>
+#include <linux/magic.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/mman.h>
+#include <linux/tty.h>
+#include <linux/smp.h>
+#include <linux/mm.h>
+
+#include <asm-generic/sections.h>
 
-#include <asm/system.h>
-#include <asm/desc.h>
-#include <asm/segment.h>
-#include <asm/pgalloc.h>
-#include <asm/smp.h>
 #include <asm/tlbflush.h>
+#include <asm/pgalloc.h>
+#include <asm/segment.h>
+#include <asm/system.h>
 #include <asm/proto.h>
-#include <asm-generic/sections.h>
 #include <asm/traps.h>
+#include <asm/desc.h>
 
 /*
- * Page fault error code bits
- *	bit 0 == 0 means no page found, 1 means protection fault
- *	bit 1 == 0 means read, 1 means write
- *	bit 2 == 0 means kernel, 1 means user-mode
- *	bit 3 == 1 means use of reserved bit detected
- *	bit 4 == 1 means fault was an instruction fetch
+ * Page fault error code bits:
+ *
+ *   bit 0 ==	 0: no page found	1: protection fault
+ *   bit 1 ==	 0: read access		1: write access
+ *   bit 2 ==	 0: kernel-mode access	1: user-mode access
+ *   bit 3 ==				1: use of reserved bit detected
+ *   bit 4 ==				1: fault was an instruction fetch
  */
-#define PF_PROT		(1<<0)
-#define PF_WRITE	(1<<1)
-#define PF_USER		(1<<2)
-#define PF_RSVD		(1<<3)
-#define PF_INSTR	(1<<4)
+enum x86_pf_error_code {
+
+	PF_PROT		=		1 << 0,
+	PF_WRITE	=		1 << 1,
+	PF_USER		=		1 << 2,
+	PF_RSVD		=		1 << 3,
+	PF_INSTR	=		1 << 4,
+};
 
+/*
+ * Returns 0 if mmiotrace is disabled, or if the fault is not
+ * handled by mmiotrace:
+ */
 static inline int kmmio_fault(struct pt_regs *regs, unsigned long addr)
 {
-#ifdef CONFIG_MMIOTRACE
 	if (unlikely(is_kmmio_active()))
 		if (kmmio_handler(regs, addr) == 1)
 			return -1;
-#endif
 	return 0;
 }
 
 static inline int notify_page_fault(struct pt_regs *regs)
 {
-#ifdef CONFIG_KPROBES
 	int ret = 0;
 
 	/* kprobe_running() needs smp_processor_id() */
-	if (!user_mode_vm(regs)) {
+	if (kprobes_built_in() && !user_mode_vm(regs)) {
 		preempt_disable();
 		if (kprobe_running() && kprobe_fault_handler(regs, 14))
 			ret = 1;
@@ -75,29 +81,76 @@ static inline int notify_page_fault(struct pt_regs *regs)
 	}
 
 	return ret;
-#else
-	return 0;
-#endif
 }
 
 /*
- * X86_32
- * Sometimes AMD Athlon/Opteron CPUs report invalid exceptions on prefetch.
- * Check that here and ignore it.
+ * Prefetch quirks:
+ *
+ * 32-bit mode:
+ *
+ *   Sometimes AMD Athlon/Opteron CPUs report invalid exceptions on prefetch.
+ *   Check that here and ignore it.
+ *
+ * 64-bit mode:
  *
- * X86_64
- * Sometimes the CPU reports invalid exceptions on prefetch.
- * Check that here and ignore it.
+ *   Sometimes the CPU reports invalid exceptions on prefetch.
+ *   Check that here and ignore it.
  *
- * Opcode checker based on code by Richard Brunner
+ * Opcode checker based on code by Richard Brunner.
  */
-static int is_prefetch(struct pt_regs *regs, unsigned long addr,
-		       unsigned long error_code)
+static inline int
+check_prefetch_opcode(struct pt_regs *regs, unsigned char *instr,
+		      unsigned char opcode, int *prefetch)
 {
+	unsigned char instr_hi = opcode & 0xf0;
+	unsigned char instr_lo = opcode & 0x0f;
+
+	switch (instr_hi) {
+	case 0x20:
+	case 0x30:
+		/*
+		 * Values 0x26,0x2E,0x36,0x3E are valid x86 prefixes.
+		 * In X86_64 long mode, the CPU will signal invalid
+		 * opcode if some of these prefixes are present so
+		 * X86_64 will never get here anyway
+		 */
+		return ((instr_lo & 7) == 0x6);
+#ifdef CONFIG_X86_64
+	case 0x40:
+		/*
+		 * In AMD64 long mode 0x40..0x4F are valid REX prefixes
+		 * Need to figure out under what instruction mode the
+		 * instruction was issued. Could check the LDT for lm,
+		 * but for now it's good enough to assume that long
+		 * mode only uses well known segments or kernel.
+		 */
+		return (!user_mode(regs)) || (regs->cs == __USER_CS);
+#endif
+	case 0x60:
+		/* 0x64 thru 0x67 are valid prefixes in all modes. */
+		return (instr_lo & 0xC) == 0x4;
+	case 0xF0:
+		/* 0xF0, 0xF2, 0xF3 are valid prefixes in all modes. */
+		return !instr_lo || (instr_lo>>1) == 1;
+	case 0x00:
+		/* Prefetch instruction is 0x0F0D or 0x0F18 */
+		if (probe_kernel_address(instr, opcode))
+			return 0;
+
+		*prefetch = (instr_lo == 0xF) &&
+			(opcode == 0x0D || opcode == 0x18);
+		return 0;
+	default:
+		return 0;
+	}
+}
+
+static int
+is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
+{
+	unsigned char *max_instr;
 	unsigned char *instr;
-	int scan_more = 1;
 	int prefetch = 0;
-	unsigned char *max_instr;
 
 	/*
 	 * If it was a exec (instruction fetch) fault on NX page, then
@@ -106,106 +159,170 @@ static int is_prefetch(struct pt_regs *regs, unsigned long addr,
 	if (error_code & PF_INSTR)
 		return 0;
 
-	instr = (unsigned char *)convert_ip_to_linear(current, regs);
+	instr = (void *)convert_ip_to_linear(current, regs);
 	max_instr = instr + 15;
 
 	if (user_mode(regs) && instr >= (unsigned char *)TASK_SIZE)
 		return 0;
 
-	while (scan_more && instr < max_instr) {
+	while (instr < max_instr) {
 		unsigned char opcode;
-		unsigned char instr_hi;
-		unsigned char instr_lo;
 
 		if (probe_kernel_address(instr, opcode))
 			break;
 
-		instr_hi = opcode & 0xf0;
-		instr_lo = opcode & 0x0f;
 		instr++;
 
-		switch (instr_hi) {
-		case 0x20:
-		case 0x30:
-			/*
-			 * Values 0x26,0x2E,0x36,0x3E are valid x86 prefixes.
-			 * In X86_64 long mode, the CPU will signal invalid
-			 * opcode if some of these prefixes are present so
-			 * X86_64 will never get here anyway
-			 */
-			scan_more = ((instr_lo & 7) == 0x6);
+		if (!check_prefetch_opcode(regs, instr, opcode, &prefetch))
 			break;
-#ifdef CONFIG_X86_64
-		case 0x40:
-			/*
-			 * In AMD64 long mode 0x40..0x4F are valid REX prefixes
-			 * Need to figure out under what instruction mode the
-			 * instruction was issued. Could check the LDT for lm,
-			 * but for now it's good enough to assume that long
-			 * mode only uses well known segments or kernel.
-			 */
-			scan_more = (!user_mode(regs)) || (regs->cs == __USER_CS);
-			break;
-#endif
-		case 0x60:
-			/* 0x64 thru 0x67 are valid prefixes in all modes. */
-			scan_more = (instr_lo & 0xC) == 0x4;
-			break;
-		case 0xF0:
-			/* 0xF0, 0xF2, 0xF3 are valid prefixes in all modes. */
-			scan_more = !instr_lo || (instr_lo>>1) == 1;
-			break;
-		case 0x00:
-			/* Prefetch instruction is 0x0F0D or 0x0F18 */
-			scan_more = 0;
-
-			if (probe_kernel_address(instr, opcode))
-				break;
-			prefetch = (instr_lo == 0xF) &&
-				(opcode == 0x0D || opcode == 0x18);
-			break;
-		default:
-			scan_more = 0;
-			break;
-		}
 	}
 	return prefetch;
 }
 
-static void force_sig_info_fault(int si_signo, int si_code,
-	unsigned long address, struct task_struct *tsk)
+static void
+force_sig_info_fault(int si_signo, int si_code, unsigned long address,
+		     struct task_struct *tsk)
 {
 	siginfo_t info;
 
-	info.si_signo = si_signo;
-	info.si_errno = 0;
-	info.si_code = si_code;
-	info.si_addr = (void __user *)address;
+	info.si_signo	= si_signo;
+	info.si_errno	= 0;
+	info.si_code	= si_code;
+	info.si_addr	= (void __user *)address;
+
 	force_sig_info(si_signo, &info, tsk);
 }
 
-#ifdef CONFIG_X86_64
-static int bad_address(void *p)
+DEFINE_SPINLOCK(pgd_lock);
+LIST_HEAD(pgd_list);
+
+#ifdef CONFIG_X86_32
+static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 {
-	unsigned long dummy;
-	return probe_kernel_address((unsigned long *)p, dummy);
+	unsigned index = pgd_index(address);
+	pgd_t *pgd_k;
+	pud_t *pud, *pud_k;
+	pmd_t *pmd, *pmd_k;
+
+	pgd += index;
+	pgd_k = init_mm.pgd + index;
+
+	if (!pgd_present(*pgd_k))
+		return NULL;
+
+	/*
+	 * set_pgd(pgd, *pgd_k); here would be useless on PAE
+	 * and redundant with the set_pmd() on non-PAE. As would
+	 * set_pud.
+	 */
+	pud = pud_offset(pgd, address);
+	pud_k = pud_offset(pgd_k, address);
+	if (!pud_present(*pud_k))
+		return NULL;
+
+	pmd = pmd_offset(pud, address);
+	pmd_k = pmd_offset(pud_k, address);
+	if (!pmd_present(*pmd_k))
+		return NULL;
+
+	if (!pmd_present(*pmd)) {
+		set_pmd(pmd, *pmd_k);
+		arch_flush_lazy_mmu_mode();
+	} else {
+		BUG_ON(pmd_page(*pmd) != pmd_page(*pmd_k));
+	}
+
+	return pmd_k;
+}
+
+void vmalloc_sync_all(void)
+{
+	unsigned long address;
+
+	if (SHARED_KERNEL_PMD)
+		return;
+
+	for (address = VMALLOC_START & PMD_MASK;
+	     address >= TASK_SIZE && address < FIXADDR_TOP;
+	     address += PMD_SIZE) {
+
+		unsigned long flags;
+		struct page *page;
+
+		spin_lock_irqsave(&pgd_lock, flags);
+		list_for_each_entry(page, &pgd_list, lru) {
+			if (!vmalloc_sync_one(page_address(page), address))
+				break;
+		}
+		spin_unlock_irqrestore(&pgd_lock, flags);
+	}
+}
+
+/*
+ * 32-bit:
+ *
+ *   Handle a fault on the vmalloc or module mapping area
+ */
+static noinline int vmalloc_fault(unsigned long address)
+{
+	unsigned long pgd_paddr;
+	pmd_t *pmd_k;
+	pte_t *pte_k;
+
+	/* Make sure we are in vmalloc area: */
+	if (!(address >= VMALLOC_START && address < VMALLOC_END))
+		return -1;
+
+	/*
+	 * Synchronize this task's top level page-table
+	 * with the 'reference' page table.
+	 *
+	 * Do _not_ use "current" here. We might be inside
+	 * an interrupt in the middle of a task switch..
+	 */
+	pgd_paddr = read_cr3();
+	pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);
+	if (!pmd_k)
+		return -1;
+
+	pte_k = pte_offset_kernel(pmd_k, address);
+	if (!pte_present(*pte_k))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * Did it hit the DOS screen memory VA from vm86 mode?
+ */
+static inline void
+check_v8086_mode(struct pt_regs *regs, unsigned long address,
+		 struct task_struct *tsk)
+{
+	unsigned long bit;
+
+	if (!v8086_mode(regs))
+		return;
+
+	bit = (address - 0xA0000) >> PAGE_SHIFT;
+	if (bit < 32)
+		tsk->thread.screen_bitmap |= 1 << bit;
 }
-#endif
 
 static void dump_pagetable(unsigned long address)
 {
-#ifdef CONFIG_X86_32
 	__typeof__(pte_val(__pte(0))) page;
 
 	page = read_cr3();
 	page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
+
 #ifdef CONFIG_X86_PAE
 	printk("*pdpt = %016Lx ", page);
 	if ((page >> PAGE_SHIFT) < max_low_pfn
 	    && page & _PAGE_PRESENT) {
 		page &= PAGE_MASK;
 		page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
-		                                         & (PTRS_PER_PMD - 1)];
+							& (PTRS_PER_PMD - 1)];
 		printk(KERN_CONT "*pde = %016Lx ", page);
 		page &= ~_PAGE_NX;
 	}
@@ -217,19 +334,145 @@ static void dump_pagetable(unsigned long address)
 	 * We must not directly access the pte in the highpte
 	 * case if the page table is located in highmem.
 	 * And let's rather not kmap-atomic the pte, just in case
-	 * it's allocated already.
+	 * it's allocated already:
 	 */
 	if ((page >> PAGE_SHIFT) < max_low_pfn
 	    && (page & _PAGE_PRESENT)
 	    && !(page & _PAGE_PSE)) {
+
 		page &= PAGE_MASK;
 		page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
-		                                         & (PTRS_PER_PTE - 1)];
+							& (PTRS_PER_PTE - 1)];
 		printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
 	}
 
 	printk("\n");
-#else /* CONFIG_X86_64 */
+}
+
+#else /* CONFIG_X86_64: */
+
+void vmalloc_sync_all(void)
+{
+	unsigned long address;
+
+	for (address = VMALLOC_START & PGDIR_MASK; address <= VMALLOC_END;
+	     address += PGDIR_SIZE) {
+
+		const pgd_t *pgd_ref = pgd_offset_k(address);
+		unsigned long flags;
+		struct page *page;
+
+		if (pgd_none(*pgd_ref))
+			continue;
+
+		spin_lock_irqsave(&pgd_lock, flags);
+		list_for_each_entry(page, &pgd_list, lru) {
+			pgd_t *pgd;
+			pgd = (pgd_t *)page_address(page) + pgd_index(address);
+			if (pgd_none(*pgd))
+				set_pgd(pgd, *pgd_ref);
+			else
+				BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+		}
+		spin_unlock_irqrestore(&pgd_lock, flags);
+	}
+}
+
+/*
+ * 64-bit:
+ *
+ *   Handle a fault on the vmalloc area
+ *
+ * This assumes no large pages in there.
+ */
+static noinline int vmalloc_fault(unsigned long address)
+{
+	pgd_t *pgd, *pgd_ref;
+	pud_t *pud, *pud_ref;
+	pmd_t *pmd, *pmd_ref;
+	pte_t *pte, *pte_ref;
+
+	/* Make sure we are in vmalloc area: */
+	if (!(address >= VMALLOC_START && address < VMALLOC_END))
+		return -1;
+
+	/*
+	 * Copy kernel mappings over when needed. This can also
+	 * happen within a race in page table update. In the later
+	 * case just flush:
+	 */
+	pgd = pgd_offset(current->active_mm, address);
+	pgd_ref = pgd_offset_k(address);
+	if (pgd_none(*pgd_ref))
+		return -1;
+
+	if (pgd_none(*pgd))
+		set_pgd(pgd, *pgd_ref);
+	else
+		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+
+	/*
+	 * Below here mismatches are bugs because these lower tables
+	 * are shared:
+	 */
+
+	pud = pud_offset(pgd, address);
+	pud_ref = pud_offset(pgd_ref, address);
+	if (pud_none(*pud_ref))
+		return -1;
+
+	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
+		BUG();
+
+	pmd = pmd_offset(pud, address);
+	pmd_ref = pmd_offset(pud_ref, address);
+	if (pmd_none(*pmd_ref))
+		return -1;
+
+	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
+		BUG();
+
+	pte_ref = pte_offset_kernel(pmd_ref, address);
+	if (!pte_present(*pte_ref))
+		return -1;
+
+	pte = pte_offset_kernel(pmd, address);
+
+	/*
+	 * Don't use pte_page here, because the mappings can point
+	 * outside mem_map, and the NUMA hash lookup cannot handle
+	 * that:
+	 */
+	if (!pte_present(*pte) || pte_pfn(*pte) != pte_pfn(*pte_ref))
+		BUG();
+
+	return 0;
+}
+
+static const char errata93_warning[] =
+KERN_ERR "******* Your BIOS seems to not contain a fix for K8 errata #93\n"
+KERN_ERR "******* Working around it, but it may cause SEGVs or burn power.\n"
+KERN_ERR "******* Please consider a BIOS update.\n"
+KERN_ERR "******* Disabling USB legacy in the BIOS may also help.\n";
+
+/*
+ * No vm86 mode in 64-bit mode:
+ */
+static inline void
+check_v8086_mode(struct pt_regs *regs, unsigned long address,
+		 struct task_struct *tsk)
+{
+}
+
+static int bad_address(void *p)
+{
+	unsigned long dummy;
+
+	return probe_kernel_address((unsigned long *)p, dummy);
+}
+
+static void dump_pagetable(unsigned long address)
+{
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
@@ -238,102 +481,77 @@ static void dump_pagetable(unsigned long address)
 	pgd = (pgd_t *)read_cr3();
 
 	pgd = __va((unsigned long)pgd & PHYSICAL_PAGE_MASK);
+
 	pgd += pgd_index(address);
-	if (bad_address(pgd)) goto bad;
+	if (bad_address(pgd))
+		goto bad;
+
 	printk("PGD %lx ", pgd_val(*pgd));
-	if (!pgd_present(*pgd)) goto ret;
+
+	if (!pgd_present(*pgd))
+		goto out;
 
 	pud = pud_offset(pgd, address);
-	if (bad_address(pud)) goto bad;
+	if (bad_address(pud))
+		goto bad;
+
 	printk("PUD %lx ", pud_val(*pud));
 	if (!pud_present(*pud) || pud_large(*pud))
-		goto ret;
+		goto out;
 
 	pmd = pmd_offset(pud, address);
-	if (bad_address(pmd)) goto bad;
+	if (bad_address(pmd))
+		goto bad;
+
 	printk("PMD %lx ", pmd_val(*pmd));
-	if (!pmd_present(*pmd) || pmd_large(*pmd)) goto ret;
+	if (!pmd_present(*pmd) || pmd_large(*pmd))
+		goto out;
 
 	pte = pte_offset_kernel(pmd, address);
-	if (bad_address(pte)) goto bad;
+	if (bad_address(pte))
+		goto bad;
+
 	printk("PTE %lx", pte_val(*pte));
-ret:
+out:
 	printk("\n");
 	return;
 bad:
 	printk("BAD\n");
-#endif
-}
-
-#ifdef CONFIG_X86_32
-static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
-{
-	unsigned index = pgd_index(address);
-	pgd_t *pgd_k;
-	pud_t *pud, *pud_k;
-	pmd_t *pmd, *pmd_k;
-
-	pgd += index;
-	pgd_k = init_mm.pgd + index;
-
-	if (!pgd_present(*pgd_k))
-		return NULL;
-
-	/*
-	 * set_pgd(pgd, *pgd_k); here would be useless on PAE
-	 * and redundant with the set_pmd() on non-PAE. As would
-	 * set_pud.
-	 */
-
-	pud = pud_offset(pgd, address);
-	pud_k = pud_offset(pgd_k, address);
-	if (!pud_present(*pud_k))
-		return NULL;
-
-	pmd = pmd_offset(pud, address);
-	pmd_k = pmd_offset(pud_k, address);
-	if (!pmd_present(*pmd_k))
-		return NULL;
-	if (!pmd_present(*pmd)) {
-		set_pmd(pmd, *pmd_k);
-		arch_flush_lazy_mmu_mode();
-	} else
-		BUG_ON(pmd_page(*pmd) != pmd_page(*pmd_k));
-	return pmd_k;
 }
-#endif
 
-#ifdef CONFIG_X86_64
-static const char errata93_warning[] =
-KERN_ERR "******* Your BIOS seems to not contain a fix for K8 errata #93\n"
-KERN_ERR "******* Working around it, but it may cause SEGVs or burn power.\n"
-KERN_ERR "******* Please consider a BIOS update.\n"
-KERN_ERR "******* Disabling USB legacy in the BIOS may also help.\n";
-#endif
+#endif /* CONFIG_X86_64 */
 
-/* Workaround for K8 erratum #93 & buggy BIOS.
-   BIOS SMM functions are required to use a specific workaround
-   to avoid corruption of the 64bit RIP register on C stepping K8.
-   A lot of BIOS that didn't get tested properly miss this.
-   The OS sees this as a page fault with the upper 32bits of RIP cleared.
-   Try to work around it here.
-   Note we only handle faults in kernel here.
-   Does nothing for X86_32
+/*
+ * Workaround for K8 erratum #93 & buggy BIOS.
+ *
+ * BIOS SMM functions are required to use a specific workaround
+ * to avoid corruption of the 64bit RIP register on C stepping K8.
+ *
+ * A lot of BIOS that didn't get tested properly miss this.
+ *
+ * The OS sees this as a page fault with the upper 32bits of RIP cleared.
+ * Try to work around it here.
+ *
+ * Note we only handle faults in kernel here.
+ * Does nothing on 32-bit.
  */
 static int is_errata93(struct pt_regs *regs, unsigned long address)
 {
 #ifdef CONFIG_X86_64
-	static int warned;
+	static int once;
+
 	if (address != regs->ip)
 		return 0;
+
 	if ((address >> 32) != 0)
 		return 0;
+
 	address |= 0xffffffffUL << 32;
 	if ((address >= (u64)_stext && address <= (u64)_etext) ||
 	    (address >= MODULES_VADDR && address <= MODULES_END)) {
-		if (!warned) {
+		if (!once) {
 			printk(errata93_warning);
-			warned = 1;
+			once = 1;
 		}
 		regs->ip = address;
 		return 1;
@@ -343,16 +561,17 @@ static int is_errata93(struct pt_regs *regs, unsigned long address)
 }
 
 /*
- * Work around K8 erratum #100 K8 in compat mode occasionally jumps to illegal
- * addresses >4GB.  We catch this in the page fault handler because these
- * addresses are not reachable. Just detect this case and return.  Any code
+ * Work around K8 erratum #100 K8 in compat mode occasionally jumps
+ * to illegal addresses >4GB.
+ *
+ * We catch this in the page fault handler because these addresses
+ * are not reachable. Just detect this case and return.  Any code
  * segment in LDT is compatibility mode.
  */
 static int is_errata100(struct pt_regs *regs, unsigned long address)
 {
 #ifdef CONFIG_X86_64
-	if ((regs->cs == __USER32_CS || (regs->cs & (1<<2))) &&
-	    (address >> 32))
+	if ((regs->cs == __USER32_CS || (regs->cs & (1<<2))) && (address >> 32))
 		return 1;
 #endif
 	return 0;
@@ -362,8 +581,9 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
 {
 #ifdef CONFIG_X86_F00F_BUG
 	unsigned long nr;
+
 	/*
-	 * Pentium F0 0F C7 C8 bug workaround.
+	 * Pentium F0 0F C7 C8 bug workaround:
 	 */
 	if (boot_cpu_data.f00f_bug) {
 		nr = (address - idt_descr.address) >> 3;
@@ -377,62 +597,277 @@ static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
 	return 0;
 }
 
-static void show_fault_oops(struct pt_regs *regs, unsigned long error_code,
-			    unsigned long address)
+static const char nx_warning[] = KERN_CRIT
+"kernel tried to execute NX-protected page - exploit attempt? (uid: %d)\n";
+
+static void
+show_fault_oops(struct pt_regs *regs, unsigned long error_code,
+		unsigned long address)
 {
-#ifdef CONFIG_X86_32
 	if (!oops_may_print())
 		return;
-#endif
 
-#ifdef CONFIG_X86_PAE
 	if (error_code & PF_INSTR) {
 		unsigned int level;
+
 		pte_t *pte = lookup_address(address, &level);
 
 		if (pte && pte_present(*pte) && !pte_exec(*pte))
-			printk(KERN_CRIT "kernel tried to execute "
-				"NX-protected page - exploit attempt? "
-				"(uid: %d)\n", current_uid());
+			printk(nx_warning, current_uid());
 	}
-#endif
 
 	printk(KERN_ALERT "BUG: unable to handle kernel ");
 	if (address < PAGE_SIZE)
 		printk(KERN_CONT "NULL pointer dereference");
 	else
 		printk(KERN_CONT "paging request");
+
 	printk(KERN_CONT " at %p\n", (void *) address);
 	printk(KERN_ALERT "IP:");
 	printk_address(regs->ip, 1);
+
 	dump_pagetable(address);
 }
 
-#ifdef CONFIG_X86_64
-static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
-				 unsigned long error_code)
+static noinline void
+pgtable_bad(struct pt_regs *regs, unsigned long error_code,
+	    unsigned long address)
 {
-	unsigned long flags = oops_begin();
-	int sig = SIGKILL;
 	struct task_struct *tsk;
+	unsigned long flags;
+	int sig;
+
+	flags = oops_begin();
+	tsk = current;
+	sig = SIGKILL;
 
 	printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
-	       current->comm, address);
+	       tsk->comm, address);
 	dump_pagetable(address);
-	tsk = current;
-	tsk->thread.cr2 = address;
-	tsk->thread.trap_no = 14;
-	tsk->thread.error_code = error_code;
+
+	tsk->thread.cr2		= address;
+	tsk->thread.trap_no	= 14;
+	tsk->thread.error_code	= error_code;
+
 	if (__die("Bad pagetable", regs, error_code))
 		sig = 0;
+
 	oops_end(flags, regs, sig);
 }
-#endif
+
+static noinline void
+no_context(struct pt_regs *regs, unsigned long error_code,
+	   unsigned long address)
+{
+	struct task_struct *tsk = current;
+	unsigned long *stackend;
+	unsigned long flags;
+	int sig;
+
+	/* Are we prepared to handle this kernel fault? */
+	if (fixup_exception(regs))
+		return;
+
+	/*
+	 * 32-bit:
+	 *
+	 *   Valid to do another page fault here, because if this fault
+	 *   had been triggered by is_prefetch fixup_exception would have
+	 *   handled it.
+	 *
+	 * 64-bit:
+	 *
+	 *   Hall of shame of CPU/BIOS bugs.
+	 */
+	if (is_prefetch(regs, error_code, address))
+		return;
+
+	if (is_errata93(regs, address))
+		return;
+
+	/*
+	 * Oops. The kernel tried to access some bad page. We'll have to
+	 * terminate things with extreme prejudice:
+	 */
+	flags = oops_begin();
+
+	show_fault_oops(regs, error_code, address);
+
+	stackend = end_of_stack(tsk);
+	if (*stackend != STACK_END_MAGIC)
+		printk(KERN_ALERT "Thread overran stack, or stack corrupted\n");
+
+	tsk->thread.cr2		= address;
+	tsk->thread.trap_no	= 14;
+	tsk->thread.error_code	= error_code;
+
+	sig = SIGKILL;
+	if (__die("Oops", regs, error_code))
+		sig = 0;
+
+	/* Executive summary in case the body of the oops scrolled away */
+	printk(KERN_EMERG "CR2: %016lx\n", address);
+
+	oops_end(flags, regs, sig);
+}
+
+/*
+ * Print out info about fatal segfaults, if the show_unhandled_signals
+ * sysctl is set:
+ */
+static inline void
+show_signal_msg(struct pt_regs *regs, unsigned long error_code,
+		unsigned long address, struct task_struct *tsk)
+{
+	if (!unhandled_signal(tsk, SIGSEGV))
+		return;
+
+	if (!printk_ratelimit())
+		return;
+
+	printk(KERN_CONT "%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
+		task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
+		tsk->comm, task_pid_nr(tsk), address,
+		(void *)regs->ip, (void *)regs->sp, error_code);
+
+	print_vma_addr(KERN_CONT " in ", regs->ip);
+
+	printk(KERN_CONT "\n");
+}
+
+static void
+__bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
+		       unsigned long address, int si_code)
+{
+	struct task_struct *tsk = current;
+
+	/* User mode accesses just cause a SIGSEGV */
+	if (error_code & PF_USER) {
+		/*
+		 * It's possible to have interrupts off here:
+		 */
+		local_irq_enable();
+
+		/*
+		 * Valid to do another page fault here because this one came
+		 * from user space:
+		 */
+		if (is_prefetch(regs, error_code, address))
+			return;
+
+		if (is_errata100(regs, address))
+			return;
+
+		if (unlikely(show_unhandled_signals))
+			show_signal_msg(regs, error_code, address, tsk);
+
+		/* Kernel addresses are always protection faults: */
+		tsk->thread.cr2		= address;
+		tsk->thread.error_code	= error_code | (address >= TASK_SIZE);
+		tsk->thread.trap_no	= 14;
+
+		force_sig_info_fault(SIGSEGV, si_code, address, tsk);
+
+		return;
+	}
+
+	if (is_f00f_bug(regs, address))
+		return;
+
+	no_context(regs, error_code, address);
+}
+
+static noinline void
+bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
+		     unsigned long address)
+{
+	__bad_area_nosemaphore(regs, error_code, address, SEGV_MAPERR);
+}
+
+static void
+__bad_area(struct pt_regs *regs, unsigned long error_code,
+	   unsigned long address, int si_code)
+{
+	struct mm_struct *mm = current->mm;
+
+	/*
+	 * Something tried to access memory that isn't in our memory map..
+	 * Fix it, but check if it's kernel or user first..
+	 */
+	up_read(&mm->mmap_sem);
+
+	__bad_area_nosemaphore(regs, error_code, address, si_code);
+}
+
+static noinline void
+bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+{
+	__bad_area(regs, error_code, address, SEGV_MAPERR);
+}
+
+static noinline void
+bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
+		      unsigned long address)
+{
+	__bad_area(regs, error_code, address, SEGV_ACCERR);
+}
+
+/* TODO: fixup for "mm-invoke-oom-killer-from-page-fault.patch" */
+static void
+out_of_memory(struct pt_regs *regs, unsigned long error_code,
+	      unsigned long address)
+{
+	/*
+	 * We ran out of memory, call the OOM killer, and return the userspace
+	 * (which will retry the fault, or kill us if we got oom-killed):
+	 */
+	up_read(&current->mm->mmap_sem);
+
+	pagefault_out_of_memory();
+}
+
+static void
+do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+{
+	struct task_struct *tsk = current;
+	struct mm_struct *mm = tsk->mm;
+
+	up_read(&mm->mmap_sem);
+
+	/* Kernel mode? Handle exceptions or die: */
+	if (!(error_code & PF_USER))
+		no_context(regs, error_code, address);
+
+	/* User-space => ok to do another page fault: */
+	if (is_prefetch(regs, error_code, address))
+		return;
+
+	tsk->thread.cr2		= address;
+	tsk->thread.error_code	= error_code;
+	tsk->thread.trap_no	= 14;
+
+	force_sig_info_fault(SIGBUS, BUS_ADRERR, address, tsk);
+}
+
+static noinline void
+mm_fault_error(struct pt_regs *regs, unsigned long error_code,
+	       unsigned long address, unsigned int fault)
+{
+	if (fault & VM_FAULT_OOM) {
+		out_of_memory(regs, error_code, address);
+	} else {
+		if (fault & VM_FAULT_SIGBUS)
+			do_sigbus(regs, error_code, address);
+		else
+			BUG();
+	}
+}
 
 static int spurious_fault_check(unsigned long error_code, pte_t *pte)
 {
 	if ((error_code & PF_WRITE) && !pte_write(*pte))
 		return 0;
+
 	if ((error_code & PF_INSTR) && !pte_exec(*pte))
 		return 0;
 
@@ -440,21 +875,25 @@ static int spurious_fault_check(unsigned long error_code, pte_t *pte)
 }
 
 /*
- * Handle a spurious fault caused by a stale TLB entry.  This allows
- * us to lazily refresh the TLB when increasing the permissions of a
- * kernel page (RO -> RW or NX -> X).  Doing it eagerly is very
- * expensive since that implies doing a full cross-processor TLB
- * flush, even if no stale TLB entries exist on other processors.
+ * Handle a spurious fault caused by a stale TLB entry.
+ *
+ * This allows us to lazily refresh the TLB when increasing the
+ * permissions of a kernel page (RO -> RW or NX -> X).  Doing it
+ * eagerly is very expensive since that implies doing a full
+ * cross-processor TLB flush, even if no stale TLB entries exist
+ * on other processors.
+ *
  * There are no security implications to leaving a stale TLB when
  * increasing the permissions on a page.
  */
-static int spurious_fault(unsigned long address,
-			  unsigned long error_code)
+static noinline int
+spurious_fault(unsigned long error_code, unsigned long address)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
+	int ret;
 
 	/* Reserved-bit violation or user access to kernel space? */
 	if (error_code & (PF_USER | PF_RSVD))
@@ -482,127 +921,71 @@ static int spurious_fault(unsigned long address,
 	if (!pte_present(*pte))
 		return 0;
 
-	return spurious_fault_check(error_code, pte);
-}
-
-/*
- * X86_32
- * Handle a fault on the vmalloc or module mapping area
- *
- * X86_64
- * Handle a fault on the vmalloc area
- *
- * This assumes no large pages in there.
- */
-static int vmalloc_fault(unsigned long address)
-{
-#ifdef CONFIG_X86_32
-	unsigned long pgd_paddr;
-	pmd_t *pmd_k;
-	pte_t *pte_k;
-
-	/* Make sure we are in vmalloc area */
-	if (!(address >= VMALLOC_START && address < VMALLOC_END))
-		return -1;
+	ret = spurious_fault_check(error_code, pte);
+	if (!ret)
+		return 0;
 
 	/*
-	 * Synchronize this task's top level page-table
-	 * with the 'reference' page table.
-	 *
-	 * Do _not_ use "current" here. We might be inside
-	 * an interrupt in the middle of a task switch..
+	 * Make sure we have permissions in PMD.
+	 * If not, then there's a bug in the page tables:
 	 */
-	pgd_paddr = read_cr3();
-	pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);
-	if (!pmd_k)
-		return -1;
-	pte_k = pte_offset_kernel(pmd_k, address);
-	if (!pte_present(*pte_k))
-		return -1;
-	return 0;
-#else
-	pgd_t *pgd, *pgd_ref;
-	pud_t *pud, *pud_ref;
-	pmd_t *pmd, *pmd_ref;
-	pte_t *pte, *pte_ref;
+	ret = spurious_fault_check(error_code, (pte_t *) pmd);
+	WARN_ONCE(!ret, "PMD has incorrect permission bits\n");
 
-	/* Make sure we are in vmalloc area */
-	if (!(address >= VMALLOC_START && address < VMALLOC_END))
-		return -1;
+	return ret;
+}
 
-	/* Copy kernel mappings over when needed. This can also
-	   happen within a race in page table update. In the later
-	   case just flush. */
+int show_unhandled_signals = 1;
 
-	pgd = pgd_offset(current->active_mm, address);
-	pgd_ref = pgd_offset_k(address);
-	if (pgd_none(*pgd_ref))
-		return -1;
-	if (pgd_none(*pgd))
-		set_pgd(pgd, *pgd_ref);
-	else
-		BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
+static inline int
+access_error(unsigned long error_code, int write, struct vm_area_struct *vma)
+{
+	if (write) {
+		/* write, present and write, not present: */
+		if (unlikely(!(vma->vm_flags & VM_WRITE)))
+			return 1;
+		return 0;
+	}
 
-	/* Below here mismatches are bugs because these lower tables
-	   are shared */
+	/* read, present: */
+	if (unlikely(error_code & PF_PROT))
+		return 1;
+
+	/* read, not present: */
+	if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))))
+		return 1;
 
-	pud = pud_offset(pgd, address);
-	pud_ref = pud_offset(pgd_ref, address);
-	if (pud_none(*pud_ref))
-		return -1;
-	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
-		BUG();
-	pmd = pmd_offset(pud, address);
-	pmd_ref = pmd_offset(pud_ref, address);
-	if (pmd_none(*pmd_ref))
-		return -1;
-	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
-		BUG();
-	pte_ref = pte_offset_kernel(pmd_ref, address);
-	if (!pte_present(*pte_ref))
-		return -1;
-	pte = pte_offset_kernel(pmd, address);
-	/* Don't use pte_page here, because the mappings can point
-	   outside mem_map, and the NUMA hash lookup cannot handle
-	   that. */
-	if (!pte_present(*pte) || pte_pfn(*pte) != pte_pfn(*pte_ref))
-		BUG();
 	return 0;
-#endif
 }
 
-int show_unhandled_signals = 1;
+static int fault_in_kernel_space(unsigned long address)
+{
+	return address >= TASK_SIZE_MAX;
+}
 
 /*
  * This routine handles page faults.  It determines the address,
  * and the problem, and then passes it off to one of the appropriate
  * routines.
  */
-#ifdef CONFIG_X86_64
-asmlinkage
-#endif
-void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
+dotraplinkage void __kprobes
+do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
-	struct task_struct *tsk;
-	struct mm_struct *mm;
 	struct vm_area_struct *vma;
+	struct task_struct *tsk;
 	unsigned long address;
-	int write, si_code;
+	struct mm_struct *mm;
+	int write;
 	int fault;
-#ifdef CONFIG_X86_64
-	unsigned long flags;
-	int sig;
-#endif
 
 	tsk = current;
 	mm = tsk->mm;
+
 	prefetchw(&mm->mmap_sem);
 
-	/* get the address */
+	/* Get the faulting address: */
 	address = read_cr2();
 
-	si_code = SEGV_MAPERR;
-
 	if (unlikely(kmmio_fault(regs, address)))
 		return;
 
@@ -619,319 +1002,147 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	 * (error_code & 4) == 0, and that the fault was not a
 	 * protection error (error_code & 9) == 0.
 	 */
-#ifdef CONFIG_X86_32
-	if (unlikely(address >= TASK_SIZE)) {
-#else
-	if (unlikely(address >= TASK_SIZE64)) {
-#endif
+	if (unlikely(fault_in_kernel_space(address))) {
 		if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
 		    vmalloc_fault(address) >= 0)
 			return;
 
-		/* Can handle a stale RO->RW TLB */
-		if (spurious_fault(address, error_code))
+		/* Can handle a stale RO->RW TLB: */
+		if (spurious_fault(error_code, address))
 			return;
 
-		/* kprobes don't want to hook the spurious faults. */
+		/* kprobes don't want to hook the spurious faults: */
 		if (notify_page_fault(regs))
 			return;
 		/*
 		 * Don't take the mm semaphore here. If we fixup a prefetch
-		 * fault we could otherwise deadlock.
+		 * fault we could otherwise deadlock:
 		 */
-		goto bad_area_nosemaphore;
-	}
+		bad_area_nosemaphore(regs, error_code, address);
 
-	/* kprobes don't want to hook the spurious faults. */
-	if (notify_page_fault(regs))
 		return;
+	}
 
+	/* kprobes don't want to hook the spurious faults: */
+	if (unlikely(notify_page_fault(regs)))
+		return;
 	/*
 	 * It's safe to allow irq's after cr2 has been saved and the
 	 * vmalloc fault has been handled.
 	 *
 	 * User-mode registers count as a user access even for any
-	 * potential system fault or CPU buglet.
+	 * potential system fault or CPU buglet:
 	 */
 	if (user_mode_vm(regs)) {
 		local_irq_enable();
 		error_code |= PF_USER;
-	} else if (regs->flags & X86_EFLAGS_IF)
-		local_irq_enable();
+	} else {
+		if (regs->flags & X86_EFLAGS_IF)
+			local_irq_enable();
+	}
 
-#ifdef CONFIG_X86_64
 	if (unlikely(error_code & PF_RSVD))
-		pgtable_bad(address, regs, error_code);
-#endif
+		pgtable_bad(regs, error_code, address);
 
 	/*
-	 * If we're in an interrupt, have no user context or are running in an
-	 * atomic region then we must not take the fault.
+	 * If we're in an interrupt, have no user context or are running
+	 * in an atomic region then we must not take the fault:
 	 */
-	if (unlikely(in_atomic() || !mm))
-		goto bad_area_nosemaphore;
+	if (unlikely(in_atomic() || !mm)) {
+		bad_area_nosemaphore(regs, error_code, address);
+		return;
+	}
 
 	/*
 	 * When running in the kernel we expect faults to occur only to
-	 * addresses in user space.  All other faults represent errors in the
-	 * kernel and should generate an OOPS.  Unfortunately, in the case of an
-	 * erroneous fault occurring in a code path which already holds mmap_sem
-	 * we will deadlock attempting to validate the fault against the
-	 * address space.  Luckily the kernel only validly references user
-	 * space from well defined areas of code, which are listed in the
-	 * exceptions table.
+	 * addresses in user space.  All other faults represent errors in
+	 * the kernel and should generate an OOPS.  Unfortunately, in the
+	 * case of an erroneous fault occurring in a code path which already
+	 * holds mmap_sem we will deadlock attempting to validate the fault
+	 * against the address space.  Luckily the kernel only validly
+	 * references user space from well defined areas of code, which are
+	 * listed in the exceptions table.
 	 *
 	 * As the vast majority of faults will be valid we will only perform
-	 * the source reference check when there is a possibility of a deadlock.
-	 * Attempt to lock the address space, if we cannot we then validate the
-	 * source.  If this is invalid we can skip the address space check,
-	 * thus avoiding the deadlock.
+	 * the source reference check when there is a possibility of a
+	 * deadlock. Attempt to lock the address space, if we cannot we then
+	 * validate the source. If this is invalid we can skip the address
+	 * space check, thus avoiding the deadlock:
 	 */
-	if (!down_read_trylock(&mm->mmap_sem)) {
+	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
 		if ((error_code & PF_USER) == 0 &&
-		    !search_exception_tables(regs->ip))
-			goto bad_area_nosemaphore;
+		    !search_exception_tables(regs->ip)) {
+			bad_area_nosemaphore(regs, error_code, address);
+			return;
+		}
 		down_read(&mm->mmap_sem);
+	} else {
+		/*
+		 * The above down_read_trylock() might have succeeded in
+		 * which case we'll have missed the might_sleep() from
+		 * down_read():
+		 */
+		might_sleep();
 	}
 
 	vma = find_vma(mm, address);
-	if (!vma)
-		goto bad_area;
-	if (vma->vm_start <= address)
+	if (unlikely(!vma)) {
+		bad_area(regs, error_code, address);
+		return;
+	}
+	if (likely(vma->vm_start <= address))
 		goto good_area;
-	if (!(vma->vm_flags & VM_GROWSDOWN))
-		goto bad_area;
+	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
+		bad_area(regs, error_code, address);
+		return;
+	}
 	if (error_code & PF_USER) {
 		/*
 		 * Accessing the stack below %sp is always a bug.
 		 * The large cushion allows instructions like enter
-		 * and pusha to work.  ("enter $65535,$31" pushes
+		 * and pusha to work. ("enter $65535, $31" pushes
 		 * 32 pointers and then decrements %sp by 65535.)
 		 */
-		if (address + 65536 + 32 * sizeof(unsigned long) < regs->sp)
-			goto bad_area;
+		if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
+			bad_area(regs, error_code, address);
+			return;
+		}
 	}
-	if (expand_stack(vma, address))
-		goto bad_area;
-/*
- * Ok, we have a good vm_area for this memory access, so
- * we can handle it..
- */
+	if (unlikely(expand_stack(vma, address))) {
+		bad_area(regs, error_code, address);
+		return;
+	}
+
+	/*
+	 * Ok, we have a good vm_area for this memory access, so
+	 * we can handle it..
+	 */
 good_area:
-	si_code = SEGV_ACCERR;
-	write = 0;
-	switch (error_code & (PF_PROT|PF_WRITE)) {
-	default:	/* 3: write, present */
-		/* fall through */
-	case PF_WRITE:		/* write, not present */
-		if (!(vma->vm_flags & VM_WRITE))
-			goto bad_area;
-		write++;
-		break;
-	case PF_PROT:		/* read, present */
-		goto bad_area;
-	case 0:			/* read, not present */
-		if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
-			goto bad_area;
+	write = error_code & PF_WRITE;
+
+	if (unlikely(access_error(error_code, write, vma))) {
+		bad_area_access_error(regs, error_code, address);
+		return;
 	}
 
 	/*
 	 * If for any reason at all we couldn't handle the fault,
 	 * make sure we exit gracefully rather than endlessly redo
-	 * the fault.
+	 * the fault:
 	 */
 	fault = handle_mm_fault(mm, vma, address, write);
+
 	if (unlikely(fault & VM_FAULT_ERROR)) {
-		if (fault & VM_FAULT_OOM)
-			goto out_of_memory;
-		else if (fault & VM_FAULT_SIGBUS)
-			goto do_sigbus;
-		BUG();
+		mm_fault_error(regs, error_code, address, fault);
+		return;
 	}
+
 	if (fault & VM_FAULT_MAJOR)
 		tsk->maj_flt++;
 	else
 		tsk->min_flt++;
 
-#ifdef CONFIG_X86_32
-	/*
-	 * Did it hit the DOS screen memory VA from vm86 mode?
-	 */
-	if (v8086_mode(regs)) {
-		unsigned long bit = (address - 0xA0000) >> PAGE_SHIFT;
-		if (bit < 32)
-			tsk->thread.screen_bitmap |= 1 << bit;
-	}
-#endif
-	up_read(&mm->mmap_sem);
-	return;
+	check_v8086_mode(regs, address, tsk);
 
-/*
- * Something tried to access memory that isn't in our memory map..
- * Fix it, but check if it's kernel or user first..
- */
-bad_area:
 	up_read(&mm->mmap_sem);
-
-bad_area_nosemaphore:
-	/* User mode accesses just cause a SIGSEGV */
-	if (error_code & PF_USER) {
-		/*
-		 * It's possible to have interrupts off here.
-		 */
-		local_irq_enable();
-
-		/*
-		 * Valid to do another page fault here because this one came
-		 * from user space.
-		 */
-		if (is_prefetch(regs, address, error_code))
-			return;
-
-		if (is_errata100(regs, address))
-			return;
-
-		if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
-		    printk_ratelimit()) {
-			printk(
-			"%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
-			task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
-			tsk->comm, task_pid_nr(tsk), address,
-			(void *) regs->ip, (void *) regs->sp, error_code);
-			print_vma_addr(" in ", regs->ip);
-			printk("\n");
-		}
-
-		tsk->thread.cr2 = address;
-		/* Kernel addresses are always protection faults */
-		tsk->thread.error_code = error_code | (address >= TASK_SIZE);
-		tsk->thread.trap_no = 14;
-		force_sig_info_fault(SIGSEGV, si_code, address, tsk);
-		return;
-	}
-
-	if (is_f00f_bug(regs, address))
-		return;
-
-no_context:
-	/* Are we prepared to handle this kernel fault?  */
-	if (fixup_exception(regs))
-		return;
-
-	/*
-	 * X86_32
-	 * Valid to do another page fault here, because if this fault
-	 * had been triggered by is_prefetch fixup_exception would have
-	 * handled it.
-	 *
-	 * X86_64
-	 * Hall of shame of CPU/BIOS bugs.
-	 */
-	if (is_prefetch(regs, address, error_code))
-		return;
-
-	if (is_errata93(regs, address))
-		return;
-
-/*
- * Oops. The kernel tried to access some bad page. We'll have to
- * terminate things with extreme prejudice.
- */
-#ifdef CONFIG_X86_32
-	bust_spinlocks(1);
-#else
-	flags = oops_begin();
-#endif
-
-	show_fault_oops(regs, error_code, address);
-
-	tsk->thread.cr2 = address;
-	tsk->thread.trap_no = 14;
-	tsk->thread.error_code = error_code;
-
-#ifdef CONFIG_X86_32
-	die("Oops", regs, error_code);
-	bust_spinlocks(0);
-	do_exit(SIGKILL);
-#else
-	sig = SIGKILL;
-	if (__die("Oops", regs, error_code))
-		sig = 0;
-	/* Executive summary in case the body of the oops scrolled away */
-	printk(KERN_EMERG "CR2: %016lx\n", address);
-	oops_end(flags, regs, sig);
-#endif
-
-out_of_memory:
-	/*
-	 * We ran out of memory, call the OOM killer, and return the userspace
-	 * (which will retry the fault, or kill us if we got oom-killed).
-	 */
-	up_read(&mm->mmap_sem);
-	pagefault_out_of_memory();
-	return;
-
-do_sigbus:
-	up_read(&mm->mmap_sem);
-
-	/* Kernel mode? Handle exceptions or die */
-	if (!(error_code & PF_USER))
-		goto no_context;
-#ifdef CONFIG_X86_32
-	/* User space => ok to do another page fault */
-	if (is_prefetch(regs, address, error_code))
-		return;
-#endif
-	tsk->thread.cr2 = address;
-	tsk->thread.error_code = error_code;
-	tsk->thread.trap_no = 14;
-	force_sig_info_fault(SIGBUS, BUS_ADRERR, address, tsk);
-}
-
-DEFINE_SPINLOCK(pgd_lock);
-LIST_HEAD(pgd_list);
-
-void vmalloc_sync_all(void)
-{
-	unsigned long address;
-
-#ifdef CONFIG_X86_32
-	if (SHARED_KERNEL_PMD)
-		return;
-
-	for (address = VMALLOC_START & PMD_MASK;
-	     address >= TASK_SIZE && address < FIXADDR_TOP;
-	     address += PMD_SIZE) {
-		unsigned long flags;
-		struct page *page;
-
-		spin_lock_irqsave(&pgd_lock, flags);
-		list_for_each_entry(page, &pgd_list, lru) {
-			if (!vmalloc_sync_one(page_address(page),
-					      address))
-				break;
-		}
-		spin_unlock_irqrestore(&pgd_lock, flags);
-	}
-#else /* CONFIG_X86_64 */
-	for (address = VMALLOC_START & PGDIR_MASK; address <= VMALLOC_END;
-	     address += PGDIR_SIZE) {
-		const pgd_t *pgd_ref = pgd_offset_k(address);
-		unsigned long flags;
-		struct page *page;
-
-		if (pgd_none(*pgd_ref))
-			continue;
-		spin_lock_irqsave(&pgd_lock, flags);
-		list_for_each_entry(page, &pgd_list, lru) {
-			pgd_t *pgd;
-			pgd = (pgd_t *)page_address(page) + pgd_index(address);
-			if (pgd_none(*pgd))
-				set_pgd(pgd, *pgd_ref);
-			else
-				BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
-		}
-		spin_unlock_irqrestore(&pgd_lock, flags);
-	}
-#endif
 }
diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index bcc079c282dd..00f127c80b0e 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -1,5 +1,6 @@
 #include <linux/highmem.h>
 #include <linux/module.h>
+#include <linux/swap.h> /* for totalram_pages */
 
 void *kmap(struct page *page)
 {
@@ -156,3 +157,36 @@ EXPORT_SYMBOL(kmap);
 EXPORT_SYMBOL(kunmap);
 EXPORT_SYMBOL(kmap_atomic);
 EXPORT_SYMBOL(kunmap_atomic);
+
+#ifdef CONFIG_NUMA
+void __init set_highmem_pages_init(void)
+{
+	struct zone *zone;
+	int nid;
+
+	for_each_zone(zone) {
+		unsigned long zone_start_pfn, zone_end_pfn;
+
+		if (!is_highmem(zone))
+			continue;
+
+		zone_start_pfn = zone->zone_start_pfn;
+		zone_end_pfn = zone_start_pfn + zone->spanned_pages;
+
+		nid = zone_to_nid(zone);
+		printk(KERN_INFO "Initializing %s for node %d (%08lx:%08lx)\n",
+				zone->name, nid, zone_start_pfn, zone_end_pfn);
+
+		add_highpages_with_active_regions(nid, zone_start_pfn,
+				 zone_end_pfn);
+	}
+	totalram_pages += totalhigh_pages;
+}
+#else
+void __init set_highmem_pages_init(void)
+{
+	add_highpages_with_active_regions(0, highstart_pfn, highend_pfn);
+
+	totalram_pages += totalhigh_pages;
+}
+#endif /* CONFIG_NUMA */
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
new file mode 100644
index 000000000000..ce6a722587d8
--- /dev/null
+++ b/arch/x86/mm/init.c
@@ -0,0 +1,49 @@
+#include <linux/swap.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+#include <asm/sections.h>
+#include <asm/system.h>
+
+void free_init_pages(char *what, unsigned long begin, unsigned long end)
+{
+	unsigned long addr = begin;
+
+	if (addr >= end)
+		return;
+
+	/*
+	 * If debugging page accesses then do not free this memory but
+	 * mark them not present - any buggy init-section access will
+	 * create a kernel page fault:
+	 */
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	printk(KERN_INFO "debug: unmapping init memory %08lx..%08lx\n",
+		begin, PAGE_ALIGN(end));
+	set_memory_np(begin, (end - begin) >> PAGE_SHIFT);
+#else
+	/*
+	 * We just marked the kernel text read only above, now that
+	 * we are going to free part of that, we need to make that
+	 * writeable first.
+	 */
+	set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
+
+	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
+
+	for (; addr < end; addr += PAGE_SIZE) {
+		ClearPageReserved(virt_to_page(addr));
+		init_page_count(virt_to_page(addr));
+		memset((void *)(addr & ~(PAGE_SIZE-1)),
+			POISON_FREE_INITMEM, PAGE_SIZE);
+		free_page(addr);
+		totalram_pages++;
+	}
+#endif
+}
+
+void free_initmem(void)
+{
+	free_init_pages("unused kernel memory",
+			(unsigned long)(&__init_begin),
+			(unsigned long)(&__init_end));
+}
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 2cef05074413..47df0e1bbeb9 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -49,9 +49,6 @@
 #include <asm/paravirt.h>
 #include <asm/setup.h>
 #include <asm/cacheflush.h>
-#include <asm/smp.h>
-
-unsigned int __VMALLOC_RESERVE = 128 << 20;
 
 unsigned long max_low_pfn_mapped;
 unsigned long max_pfn_mapped;
@@ -138,6 +135,23 @@ static pte_t * __init one_page_table_init(pmd_t *pmd)
 	return pte_offset_kernel(pmd, 0);
 }
 
+pmd_t * __init populate_extra_pmd(unsigned long vaddr)
+{
+	int pgd_idx = pgd_index(vaddr);
+	int pmd_idx = pmd_index(vaddr);
+
+	return one_md_table_init(swapper_pg_dir + pgd_idx) + pmd_idx;
+}
+
+pte_t * __init populate_extra_pte(unsigned long vaddr)
+{
+	int pte_idx = pte_index(vaddr);
+	pmd_t *pmd;
+
+	pmd = populate_extra_pmd(vaddr);
+	return one_page_table_init(pmd) + pte_idx;
+}
+
 static pte_t *__init page_table_kmap_check(pte_t *pte, pmd_t *pmd,
 					   unsigned long vaddr, pte_t *lastpte)
 {
@@ -470,22 +484,10 @@ void __init add_highpages_with_active_regions(int nid, unsigned long start_pfn,
 	work_with_active_regions(nid, add_highpages_work_fn, &data);
 }
 
-#ifndef CONFIG_NUMA
-static void __init set_highmem_pages_init(void)
-{
-	add_highpages_with_active_regions(0, highstart_pfn, highend_pfn);
-
-	totalram_pages += totalhigh_pages;
-}
-#endif /* !CONFIG_NUMA */
-
 #else
 static inline void permanent_kmaps_init(pgd_t *pgd_base)
 {
 }
-static inline void set_highmem_pages_init(void)
-{
-}
 #endif /* CONFIG_HIGHMEM */
 
 void __init native_pagetable_setup_start(pgd_t *base)
@@ -675,75 +677,97 @@ static int __init parse_highmem(char *arg)
 }
 early_param("highmem", parse_highmem);
 
+#define MSG_HIGHMEM_TOO_BIG \
+	"highmem size (%luMB) is bigger than pages available (%luMB)!\n"
+
+#define MSG_LOWMEM_TOO_SMALL \
+	"highmem size (%luMB) results in <64MB lowmem, ignoring it!\n"
 /*
- * Determine low and high memory ranges:
+ * All of RAM fits into lowmem - but if user wants highmem
+ * artificially via the highmem=x boot parameter then create
+ * it:
  */
-void __init find_low_pfn_range(void)
+void __init lowmem_pfn_init(void)
 {
-	/* it could update max_pfn */
-
 	/* max_low_pfn is 0, we already have early_res support */
-
 	max_low_pfn = max_pfn;
-	if (max_low_pfn > MAXMEM_PFN) {
-		if (highmem_pages == -1)
-			highmem_pages = max_pfn - MAXMEM_PFN;
-		if (highmem_pages + MAXMEM_PFN < max_pfn)
-			max_pfn = MAXMEM_PFN + highmem_pages;
-		if (highmem_pages + MAXMEM_PFN > max_pfn) {
-			printk(KERN_WARNING "only %luMB highmem pages "
-				"available, ignoring highmem size of %uMB.\n",
-				pages_to_mb(max_pfn - MAXMEM_PFN),
+
+	if (highmem_pages == -1)
+		highmem_pages = 0;
+#ifdef CONFIG_HIGHMEM
+	if (highmem_pages >= max_pfn) {
+		printk(KERN_ERR MSG_HIGHMEM_TOO_BIG,
+			pages_to_mb(highmem_pages), pages_to_mb(max_pfn));
+		highmem_pages = 0;
+	}
+	if (highmem_pages) {
+		if (max_low_pfn - highmem_pages < 64*1024*1024/PAGE_SIZE) {
+			printk(KERN_ERR MSG_LOWMEM_TOO_SMALL,
 				pages_to_mb(highmem_pages));
 			highmem_pages = 0;
 		}
-		max_low_pfn = MAXMEM_PFN;
+		max_low_pfn -= highmem_pages;
+	}
+#else
+	if (highmem_pages)
+		printk(KERN_ERR "ignoring highmem size on non-highmem kernel!\n");
+#endif
+}
+
+#define MSG_HIGHMEM_TOO_SMALL \
+	"only %luMB highmem pages available, ignoring highmem size of %luMB!\n"
+
+#define MSG_HIGHMEM_TRIMMED \
+	"Warning: only 4GB will be used. Use a HIGHMEM64G enabled kernel!\n"
+/*
+ * We have more RAM than fits into lowmem - we try to put it into
+ * highmem, also taking the highmem=x boot parameter into account:
+ */
+void __init highmem_pfn_init(void)
+{
+	max_low_pfn = MAXMEM_PFN;
+
+	if (highmem_pages == -1)
+		highmem_pages = max_pfn - MAXMEM_PFN;
+
+	if (highmem_pages + MAXMEM_PFN < max_pfn)
+		max_pfn = MAXMEM_PFN + highmem_pages;
+
+	if (highmem_pages + MAXMEM_PFN > max_pfn) {
+		printk(KERN_WARNING MSG_HIGHMEM_TOO_SMALL,
+			pages_to_mb(max_pfn - MAXMEM_PFN),
+			pages_to_mb(highmem_pages));
+		highmem_pages = 0;
+	}
 #ifndef CONFIG_HIGHMEM
-		/* Maximum memory usable is what is directly addressable */
-		printk(KERN_WARNING "Warning only %ldMB will be used.\n",
-					MAXMEM>>20);
-		if (max_pfn > MAX_NONPAE_PFN)
-			printk(KERN_WARNING
-				 "Use a HIGHMEM64G enabled kernel.\n");
-		else
-			printk(KERN_WARNING "Use a HIGHMEM enabled kernel.\n");
-		max_pfn = MAXMEM_PFN;
+	/* Maximum memory usable is what is directly addressable */
+	printk(KERN_WARNING "Warning only %ldMB will be used.\n", MAXMEM>>20);
+	if (max_pfn > MAX_NONPAE_PFN)
+		printk(KERN_WARNING "Use a HIGHMEM64G enabled kernel.\n");
+	else
+		printk(KERN_WARNING "Use a HIGHMEM enabled kernel.\n");
+	max_pfn = MAXMEM_PFN;
 #else /* !CONFIG_HIGHMEM */
 #ifndef CONFIG_HIGHMEM64G
-		if (max_pfn > MAX_NONPAE_PFN) {
-			max_pfn = MAX_NONPAE_PFN;
-			printk(KERN_WARNING "Warning only 4GB will be used."
-				"Use a HIGHMEM64G enabled kernel.\n");
-		}
+	if (max_pfn > MAX_NONPAE_PFN) {
+		max_pfn = MAX_NONPAE_PFN;
+		printk(KERN_WARNING MSG_HIGHMEM_TRIMMED);
+	}
 #endif /* !CONFIG_HIGHMEM64G */
 #endif /* !CONFIG_HIGHMEM */
-	} else {
-		if (highmem_pages == -1)
-			highmem_pages = 0;
-#ifdef CONFIG_HIGHMEM
-		if (highmem_pages >= max_pfn) {
-			printk(KERN_ERR "highmem size specified (%uMB) is "
-				"bigger than pages available (%luMB)!.\n",
-				pages_to_mb(highmem_pages),
-				pages_to_mb(max_pfn));
-			highmem_pages = 0;
-		}
-		if (highmem_pages) {
-			if (max_low_pfn - highmem_pages <
-			    64*1024*1024/PAGE_SIZE){
-				printk(KERN_ERR "highmem size %uMB results in "
-				"smaller than 64MB lowmem, ignoring it.\n"
-					, pages_to_mb(highmem_pages));
-				highmem_pages = 0;
-			}
-			max_low_pfn -= highmem_pages;
-		}
-#else
-		if (highmem_pages)
-			printk(KERN_ERR "ignoring highmem size on non-highmem"
-					" kernel!\n");
-#endif
-	}
+}
+
+/*
+ * Determine low and high memory ranges:
+ */
+void __init find_low_pfn_range(void)
+{
+	/* it could update max_pfn */
+
+	if (max_pfn <= MAXMEM_PFN)
+		lowmem_pfn_init();
+	else
+		highmem_pfn_init();
 }
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
@@ -826,10 +850,10 @@ static void __init find_early_table_space(unsigned long end, int use_pse)
 	unsigned long puds, pmds, ptes, tables, start;
 
 	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
-	tables = PAGE_ALIGN(puds * sizeof(pud_t));
+	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
 
 	pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
-	tables += PAGE_ALIGN(pmds * sizeof(pmd_t));
+	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
 
 	if (use_pse) {
 		unsigned long extra;
@@ -840,10 +864,10 @@ static void __init find_early_table_space(unsigned long end, int use_pse)
 	} else
 		ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
-	tables += PAGE_ALIGN(ptes * sizeof(pte_t));
+	tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);
 
 	/* for fixmap */
-	tables += PAGE_ALIGN(__end_of_fixed_addresses * sizeof(pte_t));
+	tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);
 
 	/*
 	 * RED-PEN putting page tables only on node 0 could
@@ -1193,45 +1217,6 @@ void mark_rodata_ro(void)
 }
 #endif
 
-void free_init_pages(char *what, unsigned long begin, unsigned long end)
-{
-#ifdef CONFIG_DEBUG_PAGEALLOC
-	/*
-	 * If debugging page accesses then do not free this memory but
-	 * mark them not present - any buggy init-section access will
-	 * create a kernel page fault:
-	 */
-	printk(KERN_INFO "debug: unmapping init memory %08lx..%08lx\n",
-		begin, PAGE_ALIGN(end));
-	set_memory_np(begin, (end - begin) >> PAGE_SHIFT);
-#else
-	unsigned long addr;
-
-	/*
-	 * We just marked the kernel text read only above, now that
-	 * we are going to free part of that, we need to make that
-	 * writeable first.
-	 */
-	set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
-
-	for (addr = begin; addr < end; addr += PAGE_SIZE) {
-		ClearPageReserved(virt_to_page(addr));
-		init_page_count(virt_to_page(addr));
-		memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
-		free_page(addr);
-		totalram_pages++;
-	}
-	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
-#endif
-}
-
-void free_initmem(void)
-{
-	free_init_pages("unused kernel memory",
-			(unsigned long)(&__init_begin),
-			(unsigned long)(&__init_end));
-}
-
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b1352250096e..07f44d491df1 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -168,34 +168,51 @@ static __ref void *spp_getpage(void)
 	return ptr;
 }
 
-void
-set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte)
+static pud_t *fill_pud(pgd_t *pgd, unsigned long vaddr)
 {
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
+	if (pgd_none(*pgd)) {
+		pud_t *pud = (pud_t *)spp_getpage();
+		pgd_populate(&init_mm, pgd, pud);
+		if (pud != pud_offset(pgd, 0))
+			printk(KERN_ERR "PAGETABLE BUG #00! %p <-> %p\n",
+			       pud, pud_offset(pgd, 0));
+	}
+	return pud_offset(pgd, vaddr);
+}
 
-	pud = pud_page + pud_index(vaddr);
+static pmd_t *fill_pmd(pud_t *pud, unsigned long vaddr)
+{
 	if (pud_none(*pud)) {
-		pmd = (pmd_t *) spp_getpage();
+		pmd_t *pmd = (pmd_t *) spp_getpage();
 		pud_populate(&init_mm, pud, pmd);
-		if (pmd != pmd_offset(pud, 0)) {
+		if (pmd != pmd_offset(pud, 0))
 			printk(KERN_ERR "PAGETABLE BUG #01! %p <-> %p\n",
-				pmd, pmd_offset(pud, 0));
-			return;
-		}
+			       pmd, pmd_offset(pud, 0));
 	}
-	pmd = pmd_offset(pud, vaddr);
+	return pmd_offset(pud, vaddr);
+}
+
+static pte_t *fill_pte(pmd_t *pmd, unsigned long vaddr)
+{
 	if (pmd_none(*pmd)) {
-		pte = (pte_t *) spp_getpage();
+		pte_t *pte = (pte_t *) spp_getpage();
 		pmd_populate_kernel(&init_mm, pmd, pte);
-		if (pte != pte_offset_kernel(pmd, 0)) {
+		if (pte != pte_offset_kernel(pmd, 0))
 			printk(KERN_ERR "PAGETABLE BUG #02!\n");
-			return;
-		}
 	}
+	return pte_offset_kernel(pmd, vaddr);
+}
+
+void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pud = pud_page + pud_index(vaddr);
+	pmd = fill_pmd(pud, vaddr);
+	pte = fill_pte(pmd, vaddr);
 
-	pte = pte_offset_kernel(pmd, vaddr);
 	set_pte(pte, new_pte);
 
 	/*
@@ -205,8 +222,7 @@ set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte)
 	__flush_tlb_one(vaddr);
 }
 
-void
-set_pte_vaddr(unsigned long vaddr, pte_t pteval)
+void set_pte_vaddr(unsigned long vaddr, pte_t pteval)
 {
 	pgd_t *pgd;
 	pud_t *pud_page;
@@ -223,6 +239,24 @@ set_pte_vaddr(unsigned long vaddr, pte_t pteval)
 	set_pte_vaddr_pud(pud_page, vaddr, pteval);
 }
 
+pmd_t * __init populate_extra_pmd(unsigned long vaddr)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+
+	pgd = pgd_offset_k(vaddr);
+	pud = fill_pud(pgd, vaddr);
+	return fill_pmd(pud, vaddr);
+}
+
+pte_t * __init populate_extra_pte(unsigned long vaddr)
+{
+	pmd_t *pmd;
+
+	pmd = populate_extra_pmd(vaddr);
+	return fill_pte(pmd, vaddr);
+}
+
 /*
  * Create large page table mappings for a range of physical addresses.
  */
@@ -947,43 +981,6 @@ void __init mem_init(void)
 		initsize >> 10);
 }
 
-void free_init_pages(char *what, unsigned long begin, unsigned long end)
-{
-	unsigned long addr = begin;
-
-	if (addr >= end)
-		return;
-
-	/*
-	 * If debugging page accesses then do not free this memory but
-	 * mark them not present - any buggy init-section access will
-	 * create a kernel page fault:
-	 */
-#ifdef CONFIG_DEBUG_PAGEALLOC
-	printk(KERN_INFO "debug: unmapping init memory %08lx..%08lx\n",
-		begin, PAGE_ALIGN(end));
-	set_memory_np(begin, (end - begin) >> PAGE_SHIFT);
-#else
-	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
-
-	for (; addr < end; addr += PAGE_SIZE) {
-		ClearPageReserved(virt_to_page(addr));
-		init_page_count(virt_to_page(addr));
-		memset((void *)(addr & ~(PAGE_SIZE-1)),
-			POISON_FREE_INITMEM, PAGE_SIZE);
-		free_page(addr);
-		totalram_pages++;
-	}
-#endif
-}
-
-void free_initmem(void)
-{
-	free_init_pages("unused kernel memory",
-			(unsigned long)(&__init_begin),
-			(unsigned long)(&__init_end));
-}
-
 #ifdef CONFIG_DEBUG_RODATA
 const int rodata_test_data = 0xC3;
 EXPORT_SYMBOL_GPL(rodata_test_data);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index f45d5e29a72e..433f7bd4648a 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -348,7 +348,7 @@ EXPORT_SYMBOL(ioremap_nocache);
  *
  * Must be freed with iounmap.
  */
-void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size)
+void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
 	if (pat_enabled)
 		return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC,
diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index 9cab18b0b857..0bcd7883d036 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -9,44 +9,44 @@
 
 #include <asm/e820.h>
 
-static void __init memtest(unsigned long start_phys, unsigned long size,
-				 unsigned pattern)
+static u64 patterns[] __initdata = {
+	0,
+	0xffffffffffffffffULL,
+	0x5555555555555555ULL,
+	0xaaaaaaaaaaaaaaaaULL,
+	0x1111111111111111ULL,
+	0x2222222222222222ULL,
+	0x4444444444444444ULL,
+	0x8888888888888888ULL,
+	0x3333333333333333ULL,
+	0x6666666666666666ULL,
+	0x9999999999999999ULL,
+	0xccccccccccccccccULL,
+	0x7777777777777777ULL,
+	0xbbbbbbbbbbbbbbbbULL,
+	0xddddddddddddddddULL,
+	0xeeeeeeeeeeeeeeeeULL,
+	0x7a6c7258554e494cULL, /* yeah ;-) */
+};
+
+static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
 {
-	unsigned long i;
-	unsigned long *start;
-	unsigned long start_bad;
-	unsigned long last_bad;
-	unsigned long val;
-	unsigned long start_phys_aligned;
-	unsigned long count;
-	unsigned long incr;
-
-	switch (pattern) {
-	case 0:
-		val = 0UL;
-		break;
-	case 1:
-		val = -1UL;
-		break;
-	case 2:
-#ifdef CONFIG_X86_64
-		val = 0x5555555555555555UL;
-#else
-		val = 0x55555555UL;
-#endif
-		break;
-	case 3:
-#ifdef CONFIG_X86_64
-		val = 0xaaaaaaaaaaaaaaaaUL;
-#else
-		val = 0xaaaaaaaaUL;
-#endif
-		break;
-	default:
-		return;
-	}
+	printk(KERN_INFO "  %016llx bad mem addr %010llx - %010llx reserved\n",
+	       (unsigned long long) pattern,
+	       (unsigned long long) start_bad,
+	       (unsigned long long) end_bad);
+	reserve_early(start_bad, end_bad, "BAD RAM");
+}
 
-	incr = sizeof(unsigned long);
+static void __init memtest(u64 pattern, u64 start_phys, u64 size)
+{
+	u64 i, count;
+	u64 *start;
+	u64 start_bad, last_bad;
+	u64 start_phys_aligned;
+	size_t incr;
+
+	incr = sizeof(pattern);
 	start_phys_aligned = ALIGN(start_phys, incr);
 	count = (size - (start_phys_aligned - start_phys))/incr;
 	start = __va(start_phys_aligned);
@@ -54,25 +54,42 @@ static void __init memtest(unsigned long start_phys, unsigned long size,
 	last_bad = 0;
 
 	for (i = 0; i < count; i++)
-		start[i] = val;
+		start[i] = pattern;
 	for (i = 0; i < count; i++, start++, start_phys_aligned += incr) {
-		if (*start != val) {
-			if (start_phys_aligned == last_bad + incr) {
-				last_bad += incr;
-			} else {
-				if (start_bad) {
-					printk(KERN_CONT "\n  %016lx bad mem addr %010lx - %010lx reserved",
-						val, start_bad, last_bad + incr);
-					reserve_early(start_bad, last_bad + incr, "BAD RAM");
-				}
-				start_bad = last_bad = start_phys_aligned;
-			}
+		if (*start == pattern)
+			continue;
+		if (start_phys_aligned == last_bad + incr) {
+			last_bad += incr;
+			continue;
 		}
+		if (start_bad)
+			reserve_bad_mem(pattern, start_bad, last_bad + incr);
+		start_bad = last_bad = start_phys_aligned;
 	}
-	if (start_bad) {
-		printk(KERN_CONT "\n  %016lx bad mem addr %010lx - %010lx reserved",
-			val, start_bad, last_bad + incr);
-		reserve_early(start_bad, last_bad + incr, "BAD RAM");
+	if (start_bad)
+		reserve_bad_mem(pattern, start_bad, last_bad + incr);
+}
+
+static void __init do_one_pass(u64 pattern, u64 start, u64 end)
+{
+	u64 size = 0;
+
+	while (start < end) {
+		start = find_e820_area_size(start, &size, 1);
+
+		/* done ? */
+		if (start >= end)
+			break;
+		if (start + size > end)
+			size = end - start;
+
+		printk(KERN_INFO "  %010llx - %010llx pattern %016llx\n",
+		       (unsigned long long) start,
+		       (unsigned long long) start + size,
+		       (unsigned long long) cpu_to_be64(pattern));
+		memtest(pattern, start, size);
+
+		start += size;
 	}
 }
 
@@ -90,33 +107,22 @@ early_param("memtest", parse_memtest);
 
 void __init early_memtest(unsigned long start, unsigned long end)
 {
-	u64 t_start, t_size;
-	unsigned pattern;
+	unsigned int i;
+	unsigned int idx = 0;
 
 	if (!memtest_pattern)
 		return;
 
-	printk(KERN_INFO "early_memtest: pattern num %d", memtest_pattern);
-	for (pattern = 0; pattern < memtest_pattern; pattern++) {
-		t_start = start;
-		t_size = 0;
-		while (t_start < end) {
-			t_start = find_e820_area_size(t_start, &t_size, 1);
-
-			/* done ? */
-			if (t_start >= end)
-				break;
-			if (t_start + t_size > end)
-				t_size = end - t_start;
-
-			printk(KERN_CONT "\n  %010llx - %010llx pattern %d",
-				(unsigned long long)t_start,
-				(unsigned long long)t_start + t_size, pattern);
-
-			memtest(t_start, t_size, pattern);
+	printk(KERN_INFO "early_memtest: # of tests: %d\n", memtest_pattern);
+	for (i = 0; i < memtest_pattern; i++) {
+		idx = i % ARRAY_SIZE(patterns);
+		do_one_pass(patterns[idx], start, end);
+	}
 
-			t_start += t_size;
-		}
+	if (idx > 0) {
+		printk(KERN_INFO "early_memtest: wipe out "
+		       "test pattern from memory\n");
+		/* additional test with pattern 0 will do this */
+		do_one_pass(0, start, end);
 	}
-	printk(KERN_CONT "\n");
 }
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 56fe7124fbec..165829600566 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -4,7 +4,7 @@
  * Based on code by Ingo Molnar and Andi Kleen, copyrighted
  * as follows:
  *
- * Copyright 2003-2004 Red Hat Inc., Durham, North Carolina.
+ * Copyright 2003-2009 Red Hat Inc.
  * All Rights Reserved.
  * Copyright 2005 Andi Kleen, SUSE Labs.
  * Copyright 2007 Jiri Kosina, SUSE Labs.
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index d1f7439d173c..451fe95a0352 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -194,7 +194,7 @@ void *alloc_remap(int nid, unsigned long size)
 	size = ALIGN(size, L1_CACHE_BYTES);
 
 	if (!allocation || (allocation + size) >= node_remap_end_vaddr[nid])
-		return 0;
+		return NULL;
 
 	node_remap_alloc_vaddr[nid] += size;
 	memset(allocation, 0, size);
@@ -423,32 +423,6 @@ void __init initmem_init(unsigned long start_pfn,
 	setup_bootmem_allocator();
 }
 
-void __init set_highmem_pages_init(void)
-{
-#ifdef CONFIG_HIGHMEM
-	struct zone *zone;
-	int nid;
-
-	for_each_zone(zone) {
-		unsigned long zone_start_pfn, zone_end_pfn;
-
-		if (!is_highmem(zone))
-			continue;
-
-		zone_start_pfn = zone->zone_start_pfn;
-		zone_end_pfn = zone_start_pfn + zone->spanned_pages;
-
-		nid = zone_to_nid(zone);
-		printk(KERN_INFO "Initializing %s for node %d (%08lx:%08lx)\n",
-				zone->name, nid, zone_start_pfn, zone_end_pfn);
-
-		add_highpages_with_active_regions(nid, zone_start_pfn,
-				 zone_end_pfn);
-	}
-	totalram_pages += totalhigh_pages;
-#endif
-}
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 static int paddr_to_nid(u64 addr)
 {
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index f3516da035d1..64c9cf043cdd 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -20,6 +20,12 @@
 #include <asm/acpi.h>
 #include <asm/k8.h>
 
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+# define DBG(x...) printk(KERN_DEBUG x)
+#else
+# define DBG(x...)
+#endif
+
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
 
@@ -33,6 +39,21 @@ int numa_off __initdata;
 static unsigned long __initdata nodemap_addr;
 static unsigned long __initdata nodemap_size;
 
+DEFINE_PER_CPU(int, node_number) = 0;
+EXPORT_PER_CPU_SYMBOL(node_number);
+
+/*
+ * Map cpu index to node index
+ */
+DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
+EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
+
+/*
+ * Which logical CPUs are on which nodes
+ */
+cpumask_t *node_to_cpumask_map;
+EXPORT_SYMBOL(node_to_cpumask_map);
+
 /*
  * Given a shift value, try to populate memnodemap[]
  * Returns :
@@ -640,3 +661,199 @@ void __init init_cpu_to_node(void)
 #endif
 
 
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: node_to_cpumask() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
+ */
+void __init setup_node_to_cpumask_map(void)
+{
+	unsigned int node, num = 0;
+	cpumask_t *map;
+
+	/* setup nr_node_ids if not done yet */
+	if (nr_node_ids == MAX_NUMNODES) {
+		for_each_node_mask(node, node_possible_map)
+			num = node;
+		nr_node_ids = num + 1;
+	}
+
+	/* allocate the map */
+	map = alloc_bootmem_low(nr_node_ids * sizeof(cpumask_t));
+	DBG("node_to_cpumask_map at %p for %d nodes\n", map, nr_node_ids);
+
+	pr_debug("Node to cpumask map at %p for %d nodes\n",
+		 map, nr_node_ids);
+
+	/* node_to_cpumask() will now work */
+	node_to_cpumask_map = map;
+}
+
+void __cpuinit numa_set_node(int cpu, int node)
+{
+	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
+
+	/* early setting, no percpu area yet */
+	if (cpu_to_node_map) {
+		cpu_to_node_map[cpu] = node;
+		return;
+	}
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+	if (cpu >= nr_cpu_ids || !cpu_possible(cpu)) {
+		printk(KERN_ERR "numa_set_node: invalid cpu# (%d)\n", cpu);
+		dump_stack();
+		return;
+	}
+#endif
+	per_cpu(x86_cpu_to_node_map, cpu) = node;
+
+	if (node != NUMA_NO_NODE)
+		per_cpu(node_number, cpu) = node;
+}
+
+void __cpuinit numa_clear_node(int cpu)
+{
+	numa_set_node(cpu, NUMA_NO_NODE);
+}
+
+#ifndef CONFIG_DEBUG_PER_CPU_MAPS
+
+void __cpuinit numa_add_cpu(int cpu)
+{
+	cpu_set(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
+}
+
+void __cpuinit numa_remove_cpu(int cpu)
+{
+	cpu_clear(cpu, node_to_cpumask_map[early_cpu_to_node(cpu)]);
+}
+
+#else /* CONFIG_DEBUG_PER_CPU_MAPS */
+
+/*
+ * --------- debug versions of the numa functions ---------
+ */
+static void __cpuinit numa_set_cpumask(int cpu, int enable)
+{
+	int node = early_cpu_to_node(cpu);
+	cpumask_t *mask;
+	char buf[64];
+
+	if (node_to_cpumask_map == NULL) {
+		printk(KERN_ERR "node_to_cpumask_map NULL\n");
+		dump_stack();
+		return;
+	}
+
+	mask = &node_to_cpumask_map[node];
+	if (enable)
+		cpu_set(cpu, *mask);
+	else
+		cpu_clear(cpu, *mask);
+
+	cpulist_scnprintf(buf, sizeof(buf), mask);
+	printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
+		enable ? "numa_add_cpu" : "numa_remove_cpu", cpu, node, buf);
+}
+
+void __cpuinit numa_add_cpu(int cpu)
+{
+	numa_set_cpumask(cpu, 1);
+}
+
+void __cpuinit numa_remove_cpu(int cpu)
+{
+	numa_set_cpumask(cpu, 0);
+}
+
+int cpu_to_node(int cpu)
+{
+	if (early_per_cpu_ptr(x86_cpu_to_node_map)) {
+		printk(KERN_WARNING
+			"cpu_to_node(%d): usage too early!\n", cpu);
+		dump_stack();
+		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
+	}
+	return per_cpu(x86_cpu_to_node_map, cpu);
+}
+EXPORT_SYMBOL(cpu_to_node);
+
+/*
+ * Same function as cpu_to_node() but used if called before the
+ * per_cpu areas are setup.
+ */
+int early_cpu_to_node(int cpu)
+{
+	if (early_per_cpu_ptr(x86_cpu_to_node_map))
+		return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
+
+	if (!cpu_possible(cpu)) {
+		printk(KERN_WARNING
+			"early_cpu_to_node(%d): no per_cpu area!\n", cpu);
+		dump_stack();
+		return NUMA_NO_NODE;
+	}
+	return per_cpu(x86_cpu_to_node_map, cpu);
+}
+
+
+/* empty cpumask */
+static const cpumask_t cpu_mask_none;
+
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+const cpumask_t *cpumask_of_node(int node)
+{
+	if (node_to_cpumask_map == NULL) {
+		printk(KERN_WARNING
+			"cpumask_of_node(%d): no node_to_cpumask_map!\n",
+			node);
+		dump_stack();
+		return (const cpumask_t *)&cpu_online_map;
+	}
+	if (node >= nr_node_ids) {
+		printk(KERN_WARNING
+			"cpumask_of_node(%d): node > nr_node_ids(%d)\n",
+			node, nr_node_ids);
+		dump_stack();
+		return &cpu_mask_none;
+	}
+	return &node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(cpumask_of_node);
+
+/*
+ * Returns a bitmask of CPUs on Node 'node'.
+ *
+ * Side note: this function creates the returned cpumask on the stack
+ * so with a high NR_CPUS count, excessive stack space is used.  The
+ * node_to_cpumask_ptr function should be used whenever possible.
+ */
+cpumask_t node_to_cpumask(int node)
+{
+	if (node_to_cpumask_map == NULL) {
+		printk(KERN_WARNING
+			"node_to_cpumask(%d): no node_to_cpumask_map!\n", node);
+		dump_stack();
+		return cpu_online_map;
+	}
+	if (node >= nr_node_ids) {
+		printk(KERN_WARNING
+			"node_to_cpumask(%d): node > nr_node_ids(%d)\n",
+			node, nr_node_ids);
+		dump_stack();
+		return cpu_mask_none;
+	}
+	return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(node_to_cpumask);
+
+/*
+ * --------- end of debug versions of the numa functions ---------
+ */
+
+#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 7233bd7e357b..9c4294986af7 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -482,6 +482,13 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	pbase = (pte_t *)page_address(base);
 	paravirt_alloc_pte(&init_mm, page_to_pfn(base));
 	ref_prot = pte_pgprot(pte_clrhuge(*kpte));
+	/*
+	 * If we ever want to utilize the PAT bit, we need to
+	 * update this function to make sure it's converted from
+	 * bit 12 to bit 7 when we cross from the 2MB level to
+	 * the 4K level:
+	 */
+	WARN_ON_ONCE(pgprot_val(ref_prot) & _PAGE_PAT_LARGE);
 
 #ifdef CONFIG_X86_64
 	if (level == PG_LEVEL_1G) {
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index e0ab173b6974..2ed37158012d 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -31,7 +31,7 @@
 #ifdef CONFIG_X86_PAT
 int __read_mostly pat_enabled = 1;
 
-void __cpuinit pat_disable(char *reason)
+void __cpuinit pat_disable(const char *reason)
 {
 	pat_enabled = 0;
 	printk(KERN_INFO "%s\n", reason);
@@ -43,6 +43,11 @@ static int __init nopat(char *str)
 	return 0;
 }
 early_param("nopat", nopat);
+#else
+static inline void pat_disable(const char *reason)
+{
+	(void)reason;
+}
 #endif
 
 
@@ -79,16 +84,20 @@ void pat_init(void)
 	if (!pat_enabled)
 		return;
 
-	/* Paranoia check. */
-	if (!cpu_has_pat && boot_pat_state) {
-		/*
-		 * If this happens we are on a secondary CPU, but
-		 * switched to PAT on the boot CPU. We have no way to
-		 * undo PAT.
-		 */
-		printk(KERN_ERR "PAT enabled, "
-		       "but not supported by secondary CPU\n");
-		BUG();
+	if (!cpu_has_pat) {
+		if (!boot_pat_state) {
+			pat_disable("PAT not supported by CPU.");
+			return;
+		} else {
+			/*
+			 * If this happens we are on a secondary CPU, but
+			 * switched to PAT on the boot CPU. We have no way to
+			 * undo PAT.
+			 */
+			printk(KERN_ERR "PAT enabled, "
+			       "but not supported by secondary CPU\n");
+			BUG();
+		}
 	}
 
 	/* Set PWT to Write-Combining. All other bits stay the same */
@@ -626,6 +635,33 @@ void unmap_devmem(unsigned long pfn, unsigned long size, pgprot_t vma_prot)
 }
 
 /*
+ * Change the memory type for the physial address range in kernel identity
+ * mapping space if that range is a part of identity map.
+ */
+int kernel_map_sync_memtype(u64 base, unsigned long size, unsigned long flags)
+{
+	unsigned long id_sz;
+
+	if (!pat_enabled || base >= __pa(high_memory))
+		return 0;
+
+	id_sz = (__pa(high_memory) < base + size) ?
+				__pa(high_memory) - base :
+				size;
+
+	if (ioremap_change_attr((unsigned long)__va(base), id_sz, flags) < 0) {
+		printk(KERN_INFO
+			"%s:%d ioremap_change_attr failed %s "
+			"for %Lx-%Lx\n",
+			current->comm, current->pid,
+			cattr_name(flags),
+			base, (unsigned long long)(base + size));
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
  * Internal interface to reserve a range of physical memory with prot.
  * Reserved non RAM regions only and after successful reserve_memtype,
  * this func also keeps identity mapping (if any) in sync with this new prot.
@@ -634,7 +670,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 				int strict_prot)
 {
 	int is_ram = 0;
-	int id_sz, ret;
+	int ret;
 	unsigned long flags;
 	unsigned long want_flags = (pgprot_val(*vma_prot) & _PAGE_CACHE_MASK);
 
@@ -671,23 +707,8 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 				     flags);
 	}
 
-	/* Need to keep identity mapping in sync */
-	if (paddr >= __pa(high_memory))
-		return 0;
-
-	id_sz = (__pa(high_memory) < paddr + size) ?
-				__pa(high_memory) - paddr :
-				size;
-
-	if (ioremap_change_attr((unsigned long)__va(paddr), id_sz, flags) < 0) {
+	if (kernel_map_sync_memtype(paddr, size, flags) < 0) {
 		free_memtype(paddr, paddr + size);
-		printk(KERN_ERR
-			"%s:%d reserve_pfn_range ioremap_change_attr failed %s "
-			"for %Lx-%Lx\n",
-			current->comm, current->pid,
-			cattr_name(flags),
-			(unsigned long long)paddr,
-			(unsigned long long)(paddr + size));
 		return -EINVAL;
 	}
 	return 0;
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 86f2ffc43c3d..5b7c7c8464fe 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -313,6 +313,24 @@ int ptep_clear_flush_young(struct vm_area_struct *vma,
 	return young;
 }
 
+/**
+ * reserve_top_address - reserves a hole in the top of kernel address space
+ * @reserve - size of hole to reserve
+ *
+ * Can be used to relocate the fixmap area and poke a hole in the top
+ * of kernel address space to make room for a hypervisor.
+ */
+void __init reserve_top_address(unsigned long reserve)
+{
+#ifdef CONFIG_X86_32
+	BUG_ON(fixmaps_set > 0);
+	printk(KERN_INFO "Reserving virtual address space above 0x%08x\n",
+	       (int)-reserve);
+	__FIXADDR_TOP = -reserve - PAGE_SIZE;
+	__VMALLOC_RESERVE += reserve;
+#endif
+}
+
 int fixmaps_set;
 
 void __native_set_fixmap(enum fixed_addresses idx, pte_t pte)
diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
index 0951db9ee519..f2e477c91c1b 100644
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -20,6 +20,8 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 
+unsigned int __VMALLOC_RESERVE = 128 << 20;
+
 /*
  * Associate a virtual page frame with a given physical page frame 
  * and protection flags for that frame.
@@ -97,22 +99,6 @@ void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags)
 unsigned long __FIXADDR_TOP = 0xfffff000;
 EXPORT_SYMBOL(__FIXADDR_TOP);
 
-/**
- * reserve_top_address - reserves a hole in the top of kernel address space
- * @reserve - size of hole to reserve
- *
- * Can be used to relocate the fixmap area and poke a hole in the top
- * of kernel address space to make room for a hypervisor.
- */
-void __init reserve_top_address(unsigned long reserve)
-{
-	BUG_ON(fixmaps_set > 0);
-	printk(KERN_INFO "Reserving virtual address space above 0x%08x\n",
-	       (int)-reserve);
-	__FIXADDR_TOP = -reserve - PAGE_SIZE;
-	__VMALLOC_RESERVE += reserve;
-}
-
 /*
  * vmalloc=size forces the vmalloc area to be exactly 'size'
  * bytes. This can be used to increase (or decrease) the
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 09737c8af074..574c8bc95ef0 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -20,7 +20,8 @@
 #include <asm/proto.h>
 #include <asm/numa.h>
 #include <asm/e820.h>
-#include <asm/genapic.h>
+#include <asm/apic.h>
+#include <asm/uv/uv.h>
 
 int acpi_numa __initdata;
 
diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/mm/tlb.c
index f8be6f1d2e48..a654d59e4483 100644
--- a/arch/x86/kernel/tlb_64.c
+++ b/arch/x86/mm/tlb.c
@@ -1,24 +1,19 @@
 #include <linux/init.h>
 
 #include <linux/mm.h>
-#include <linux/delay.h>
 #include <linux/spinlock.h>
 #include <linux/smp.h>
-#include <linux/kernel_stat.h>
-#include <linux/mc146818rtc.h>
 #include <linux/interrupt.h>
+#include <linux/module.h>
 
-#include <asm/mtrr.h>
-#include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
-#include <asm/proto.h>
-#include <asm/apicdef.h>
-#include <asm/idle.h>
-#include <asm/uv/uv_hub.h>
-#include <asm/uv/uv_bau.h>
+#include <asm/apic.h>
+#include <asm/uv/uv.h>
+
+DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate)
+			= { &init_mm, 0, };
 
-#include <mach_ipi.h>
 /*
  *	Smarter SMP flushing macros.
  *		c/o Linus Torvalds.
@@ -33,7 +28,7 @@
  *	To avoid global state use 8 different call vectors.
  *	Each CPU uses a specific vector to trigger flushes on other
  *	CPUs. Depending on the received vector the target CPUs look into
- *	the right per cpu variable for the flush data.
+ *	the right array slot for the flush data.
  *
  *	With more than 8 CPUs they are hashed to the 8 available
  *	vectors. The limited global vector space forces us to this right now.
@@ -43,18 +38,18 @@
 
 union smp_flush_state {
 	struct {
-		cpumask_t flush_cpumask;
 		struct mm_struct *flush_mm;
 		unsigned long flush_va;
 		spinlock_t tlbstate_lock;
+		DECLARE_BITMAP(flush_cpumask, NR_CPUS);
 	};
-	char pad[SMP_CACHE_BYTES];
-} ____cacheline_aligned;
+	char pad[CONFIG_X86_INTERNODE_CACHE_BYTES];
+} ____cacheline_internodealigned_in_smp;
 
 /* State is put into the per CPU data section, but padded
    to a full cache line because other CPUs can access it and we don't
    want false sharing in the per cpu data segment. */
-static DEFINE_PER_CPU(union smp_flush_state, flush_state);
+static union smp_flush_state flush_state[NUM_INVALIDATE_TLB_VECTORS];
 
 /*
  * We cannot call mmdrop() because we are in interrupt context,
@@ -62,9 +57,9 @@ static DEFINE_PER_CPU(union smp_flush_state, flush_state);
  */
 void leave_mm(int cpu)
 {
-	if (read_pda(mmu_state) == TLBSTATE_OK)
+	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpu_clear(cpu, read_pda(active_mm)->cpu_vm_mask);
+	cpu_clear(cpu, percpu_read(cpu_tlbstate.active_mm)->cpu_vm_mask);
 	load_cr3(swapper_pg_dir);
 }
 EXPORT_SYMBOL_GPL(leave_mm);
@@ -117,10 +112,20 @@ EXPORT_SYMBOL_GPL(leave_mm);
  * Interrupts are disabled.
  */
 
-asmlinkage void smp_invalidate_interrupt(struct pt_regs *regs)
+/*
+ * FIXME: use of asmlinkage is not consistent.  On x86_64 it's noop
+ * but still used for documentation purpose but the usage is slightly
+ * inconsistent.  On x86_32, asmlinkage is regparm(0) but interrupt
+ * entry calls in with the first parameter in %eax.  Maybe define
+ * intrlinkage?
+ */
+#ifdef CONFIG_X86_64
+asmlinkage
+#endif
+void smp_invalidate_interrupt(struct pt_regs *regs)
 {
-	int cpu;
-	int sender;
+	unsigned int cpu;
+	unsigned int sender;
 	union smp_flush_state *f;
 
 	cpu = smp_processor_id();
@@ -129,9 +134,9 @@ asmlinkage void smp_invalidate_interrupt(struct pt_regs *regs)
 	 * Use that to determine where the sender put the data.
 	 */
 	sender = ~regs->orig_ax - INVALIDATE_TLB_VECTOR_START;
-	f = &per_cpu(flush_state, sender);
+	f = &flush_state[sender];
 
-	if (!cpu_isset(cpu, f->flush_cpumask))
+	if (!cpumask_test_cpu(cpu, to_cpumask(f->flush_cpumask)))
 		goto out;
 		/*
 		 * This was a BUG() but until someone can quote me the
@@ -142,8 +147,8 @@ asmlinkage void smp_invalidate_interrupt(struct pt_regs *regs)
 		 * BUG();
 		 */
 
-	if (f->flush_mm == read_pda(active_mm)) {
-		if (read_pda(mmu_state) == TLBSTATE_OK) {
+	if (f->flush_mm == percpu_read(cpu_tlbstate.active_mm)) {
+		if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK) {
 			if (f->flush_va == TLB_FLUSH_ALL)
 				local_flush_tlb();
 			else
@@ -153,23 +158,21 @@ asmlinkage void smp_invalidate_interrupt(struct pt_regs *regs)
 	}
 out:
 	ack_APIC_irq();
-	cpu_clear(cpu, f->flush_cpumask);
+	smp_mb__before_clear_bit();
+	cpumask_clear_cpu(cpu, to_cpumask(f->flush_cpumask));
+	smp_mb__after_clear_bit();
 	inc_irq_stat(irq_tlb_count);
 }
 
-void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
-			     unsigned long va)
+static void flush_tlb_others_ipi(const struct cpumask *cpumask,
+				 struct mm_struct *mm, unsigned long va)
 {
-	int sender;
+	unsigned int sender;
 	union smp_flush_state *f;
-	cpumask_t cpumask = *cpumaskp;
-
-	if (is_uv_system() && uv_flush_tlb_others(&cpumask, mm, va))
-		return;
 
 	/* Caller has disabled preemption */
 	sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
-	f = &per_cpu(flush_state, sender);
+	f = &flush_state[sender];
 
 	/*
 	 * Could avoid this lock when
@@ -180,7 +183,8 @@ void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
 
 	f->flush_mm = mm;
 	f->flush_va = va;
-	cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask);
+	cpumask_andnot(to_cpumask(f->flush_cpumask),
+		       cpumask, cpumask_of(smp_processor_id()));
 
 	/*
 	 * Make the above memory operations globally visible before
@@ -191,9 +195,10 @@ void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(&cpumask, INVALIDATE_TLB_VECTOR_START + sender);
+	apic->send_IPI_mask(to_cpumask(f->flush_cpumask),
+		      INVALIDATE_TLB_VECTOR_START + sender);
 
-	while (!cpus_empty(f->flush_cpumask))
+	while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
 		cpu_relax();
 
 	f->flush_mm = NULL;
@@ -201,12 +206,28 @@ void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
 	spin_unlock(&f->tlbstate_lock);
 }
 
+void native_flush_tlb_others(const struct cpumask *cpumask,
+			     struct mm_struct *mm, unsigned long va)
+{
+	if (is_uv_system()) {
+		unsigned int cpu;
+
+		cpu = get_cpu();
+		cpumask = uv_flush_tlb_others(cpumask, mm, va, cpu);
+		if (cpumask)
+			flush_tlb_others_ipi(cpumask, mm, va);
+		put_cpu();
+		return;
+	}
+	flush_tlb_others_ipi(cpumask, mm, va);
+}
+
 static int __cpuinit init_smp_flush(void)
 {
 	int i;
 
-	for_each_possible_cpu(i)
-		spin_lock_init(&per_cpu(flush_state, i).tlbstate_lock);
+	for (i = 0; i < ARRAY_SIZE(flush_state); i++)
+		spin_lock_init(&flush_state[i].tlbstate_lock);
 
 	return 0;
 }
@@ -215,25 +236,18 @@ core_initcall(init_smp_flush);
 void flush_tlb_current_task(void)
 {
 	struct mm_struct *mm = current->mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
 
 	local_flush_tlb();
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(&mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 	preempt_enable();
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-	cpumask_t cpu_mask;
-
 	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -241,8 +255,8 @@ void flush_tlb_mm(struct mm_struct *mm)
 		else
 			leave_mm(smp_processor_id());
 	}
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, TLB_FLUSH_ALL);
+	if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(&mm->cpu_vm_mask, mm, TLB_FLUSH_ALL);
 
 	preempt_enable();
 }
@@ -250,11 +264,8 @@ void flush_tlb_mm(struct mm_struct *mm)
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	cpumask_t cpu_mask;
 
 	preempt_disable();
-	cpu_mask = mm->cpu_vm_mask;
-	cpu_clear(smp_processor_id(), cpu_mask);
 
 	if (current->active_mm == mm) {
 		if (current->mm)
@@ -263,8 +274,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long va)
 			leave_mm(smp_processor_id());
 	}
 
-	if (!cpus_empty(cpu_mask))
-		flush_tlb_others(cpu_mask, mm, va);
+	if (cpumask_any_but(&mm->cpu_vm_mask, smp_processor_id()) < nr_cpu_ids)
+		flush_tlb_others(&mm->cpu_vm_mask, mm, va);
 
 	preempt_enable();
 }
@@ -274,7 +285,7 @@ static void do_flush_tlb_all(void *info)
 	unsigned long cpu = smp_processor_id();
 
 	__flush_tlb_all();
-	if (read_pda(mmu_state) == TLBSTATE_LAZY)
+	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_LAZY)
 		leave_mm(cpu);
 }
 
diff --git a/arch/x86/pci/numaq_32.c b/arch/x86/pci/numaq_32.c
index 2089354968a2..8eb295e116f6 100644
--- a/arch/x86/pci/numaq_32.c
+++ b/arch/x86/pci/numaq_32.c
@@ -5,7 +5,7 @@
 #include <linux/pci.h>
 #include <linux/init.h>
 #include <linux/nodemask.h>
-#include <mach_apic.h>
+#include <asm/apic.h>
 #include <asm/mpspec.h>
 #include <asm/pci_x86.h>
 
@@ -18,10 +18,6 @@
 
 #define QUADLOCAL2BUS(quad,local) (quad_local_to_mp_bus_id[quad][local])
 
-/* Where the IO area was mapped on multiquad, always 0 otherwise */
-void *xquad_portio;
-EXPORT_SYMBOL(xquad_portio);
-
 #define XQUAD_PORT_ADDR(port, quad) (xquad_portio + (XQUAD_PORTIO_QUAD*quad) + port)
 
 #define PCI_CONF1_MQ_ADDRESS(bus, devfn, reg) \
diff --git a/arch/x86/pci/pcbios.c b/arch/x86/pci/pcbios.c
index b82cae970dfd..1c975cc9839e 100644
--- a/arch/x86/pci/pcbios.c
+++ b/arch/x86/pci/pcbios.c
@@ -7,7 +7,7 @@
 #include <linux/module.h>
 #include <linux/uaccess.h>
 #include <asm/pci_x86.h>
-#include <asm/mach-default/pci-functions.h>
+#include <asm/pci-functions.h>
 
 /* BIOS32 signature: "_32_" */
 #define BIOS32_SIGNATURE	(('_' << 0) + ('3' << 8) + ('2' << 16) + ('_' << 24))
diff --git a/arch/x86/power/hibernate_asm_32.S b/arch/x86/power/hibernate_asm_32.S
index d1e9b53f9d33..b641388d8286 100644
--- a/arch/x86/power/hibernate_asm_32.S
+++ b/arch/x86/power/hibernate_asm_32.S
@@ -8,7 +8,7 @@
 
 #include <linux/linkage.h>
 #include <asm/segment.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/asm-offsets.h>
 #include <asm/processor-flags.h>
 
diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index 000415947d93..9356547d8c01 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -18,7 +18,7 @@
 	.text
 #include <linux/linkage.h>
 #include <asm/segment.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 #include <asm/asm-offsets.h>
 #include <asm/processor-flags.h>
 
diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 4d6ef0a336d6..16a9020c8f11 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -38,7 +38,7 @@ $(obj)/%.so: $(obj)/%.so.dbg FORCE
 	$(call if_changed,objcopy)
 
 CFL := $(PROFILING) -mcmodel=small -fPIC -O2 -fasynchronous-unwind-tables -m64 \
-       $(filter -g%,$(KBUILD_CFLAGS))
+       $(filter -g%,$(KBUILD_CFLAGS)) $(call cc-option, -fno-stack-protector)
 
 $(vobjs): KBUILD_CFLAGS += $(CFL)
 
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 9c98cc6ba978..7133cdf9098b 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -85,8 +85,8 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
 	unsigned long addr, end;
 	unsigned offset;
 	end = (start + PMD_SIZE - 1) & PMD_MASK;
-	if (end >= TASK_SIZE64)
-		end = TASK_SIZE64;
+	if (end >= TASK_SIZE_MAX)
+		end = TASK_SIZE_MAX;
 	end -= len;
 	/* This loses some more bits than a modulo, but is cheaper */
 	offset = get_random_int() & (PTRS_PER_PTE - 1);
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 87b9ab166423..b83e119fbeb0 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -6,7 +6,7 @@ config XEN
 	bool "Xen guest support"
 	select PARAVIRT
 	select PARAVIRT_CLOCK
-	depends on X86_64 || (X86_32 && X86_PAE && !(X86_VISWS || X86_VOYAGER))
+	depends on X86_64 || (X86_32 && X86_PAE && !X86_VISWS)
 	depends on X86_CMPXCHG && X86_TSC
 	help
 	  This is the Linux Xen port.  Enabling this will allow the
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 6dcefba7836f..3b767d03fd6a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -6,7 +6,8 @@ CFLAGS_REMOVE_irq.o = -pg
 endif
 
 obj-y		:= enlighten.o setup.o multicalls.o mmu.o irq.o \
-			time.o xen-asm_$(BITS).o grant-table.o suspend.o
+			time.o xen-asm.o xen-asm_$(BITS).o \
+			grant-table.o suspend.o
 
 obj-$(CONFIG_SMP)		+= smp.o spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
 \ No newline at end of file
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index b58e96338149..82cd39a6cbd3 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -61,40 +61,13 @@ DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info);
 enum xen_domain_type xen_domain_type = XEN_NATIVE;
 EXPORT_SYMBOL_GPL(xen_domain_type);
 
-/*
- * Identity map, in addition to plain kernel map.  This needs to be
- * large enough to allocate page table pages to allocate the rest.
- * Each page can map 2MB.
- */
-static pte_t level1_ident_pgt[PTRS_PER_PTE * 4] __page_aligned_bss;
-
-#ifdef CONFIG_X86_64
-/* l3 pud for userspace vsyscall mapping */
-static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss;
-#endif /* CONFIG_X86_64 */
-
-/*
- * Note about cr3 (pagetable base) values:
- *
- * xen_cr3 contains the current logical cr3 value; it contains the
- * last set cr3.  This may not be the current effective cr3, because
- * its update may be being lazily deferred.  However, a vcpu looking
- * at its own cr3 can use this value knowing that it everything will
- * be self-consistent.
- *
- * xen_current_cr3 contains the actual vcpu cr3; it is set once the
- * hypercall to set the vcpu cr3 is complete (so it may be a little
- * out of date, but it will never be set early).  If one vcpu is
- * looking at another vcpu's cr3 value, it should use this variable.
- */
-DEFINE_PER_CPU(unsigned long, xen_cr3);	 /* cr3 stored as physaddr */
-DEFINE_PER_CPU(unsigned long, xen_current_cr3);	 /* actual vcpu cr3 */
-
 struct start_info *xen_start_info;
 EXPORT_SYMBOL_GPL(xen_start_info);
 
 struct shared_info xen_dummy_shared_info;
 
+void *xen_initial_gdt;
+
 /*
  * Point at some empty memory to start with. We map the real shared_info
  * page as soon as fixmap is up and running.
@@ -114,14 +87,7 @@ struct shared_info *HYPERVISOR_shared_info = (void *)&xen_dummy_shared_info;
  *
  * 0: not available, 1: available
  */
-static int have_vcpu_info_placement =
-#ifdef CONFIG_X86_32
-	1
-#else
-	0
-#endif
-	;
-
+static int have_vcpu_info_placement = 1;
 
 static void xen_vcpu_setup(int cpu)
 {
@@ -137,7 +103,7 @@ static void xen_vcpu_setup(int cpu)
 
 	vcpup = &per_cpu(xen_vcpu_info, cpu);
 
-	info.mfn = virt_to_mfn(vcpup);
+	info.mfn = arbitrary_virt_to_mfn(vcpup);
 	info.offset = offset_in_page(vcpup);
 
 	printk(KERN_DEBUG "trying to map vcpu_info %d at %p, mfn %llx, offset %d\n",
@@ -237,7 +203,7 @@ static unsigned long xen_get_debugreg(int reg)
 	return HYPERVISOR_get_debugreg(reg);
 }
 
-static void xen_leave_lazy(void)
+void xen_leave_lazy(void)
 {
 	paravirt_leave_lazy(paravirt_get_lazy_mode());
 	xen_mc_flush();
@@ -335,8 +301,10 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
 	frames = mcs.args;
 
 	for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) {
-		frames[f] = virt_to_mfn(va);
+		frames[f] = arbitrary_virt_to_mfn((void *)va);
+
 		make_lowmem_page_readonly((void *)va);
+		make_lowmem_page_readonly(mfn_to_virt(frames[f]));
 	}
 
 	MULTI_set_gdt(mcs.mc, frames, size / sizeof(struct desc_struct));
@@ -348,7 +316,7 @@ static void load_TLS_descriptor(struct thread_struct *t,
 				unsigned int cpu, unsigned int i)
 {
 	struct desc_struct *gdt = get_cpu_gdt_table(cpu);
-	xmaddr_t maddr = virt_to_machine(&gdt[GDT_ENTRY_TLS_MIN+i]);
+	xmaddr_t maddr = arbitrary_virt_to_machine(&gdt[GDT_ENTRY_TLS_MIN+i]);
 	struct multicall_space mc = __xen_mc_entry(0);
 
 	MULTI_update_descriptor(mc.mc, maddr.maddr, t->tls_array[i]);
@@ -357,13 +325,14 @@ static void load_TLS_descriptor(struct thread_struct *t,
 static void xen_load_tls(struct thread_struct *t, unsigned int cpu)
 {
 	/*
-	 * XXX sleazy hack: If we're being called in a lazy-cpu zone,
-	 * it means we're in a context switch, and %gs has just been
-	 * saved.  This means we can zero it out to prevent faults on
-	 * exit from the hypervisor if the next process has no %gs.
-	 * Either way, it has been saved, and the new value will get
-	 * loaded properly.  This will go away as soon as Xen has been
-	 * modified to not save/restore %gs for normal hypercalls.
+	 * XXX sleazy hack: If we're being called in a lazy-cpu zone
+	 * and lazy gs handling is enabled, it means we're in a
+	 * context switch, and %gs has just been saved.  This means we
+	 * can zero it out to prevent faults on exit from the
+	 * hypervisor if the next process has no %gs.  Either way, it
+	 * has been saved, and the new value will get loaded properly.
+	 * This will go away as soon as Xen has been modified to not
+	 * save/restore %gs for normal hypercalls.
 	 *
 	 * On x86_64, this hack is not used for %gs, because gs points
 	 * to KERNEL_GS_BASE (and uses it for PDA references), so we
@@ -375,7 +344,7 @@ static void xen_load_tls(struct thread_struct *t, unsigned int cpu)
 	 */
 	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU) {
 #ifdef CONFIG_X86_32
-		loadsegment(gs, 0);
+		lazy_load_gs(0);
 #else
 		loadsegment(fs, 0);
 #endif
@@ -521,7 +490,7 @@ static void xen_write_gdt_entry(struct desc_struct *dt, int entry,
 		break;
 
 	default: {
-		xmaddr_t maddr = virt_to_machine(&dt[entry]);
+		xmaddr_t maddr = arbitrary_virt_to_machine(&dt[entry]);
 
 		xen_mc_flush();
 		if (HYPERVISOR_update_descriptor(maddr.maddr, *(u64 *)desc))
@@ -587,94 +556,18 @@ static u32 xen_safe_apic_wait_icr_idle(void)
         return 0;
 }
 
-static struct apic_ops xen_basic_apic_ops = {
-	.read = xen_apic_read,
-	.write = xen_apic_write,
-	.icr_read = xen_apic_icr_read,
-	.icr_write = xen_apic_icr_write,
-	.wait_icr_idle = xen_apic_wait_icr_idle,
-	.safe_wait_icr_idle = xen_safe_apic_wait_icr_idle,
-};
-
-#endif
-
-static void xen_flush_tlb(void)
-{
-	struct mmuext_op *op;
-	struct multicall_space mcs;
-
-	preempt_disable();
-
-	mcs = xen_mc_entry(sizeof(*op));
-
-	op = mcs.args;
-	op->cmd = MMUEXT_TLB_FLUSH_LOCAL;
-	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
-
-	xen_mc_issue(PARAVIRT_LAZY_MMU);
-
-	preempt_enable();
-}
-
-static void xen_flush_tlb_single(unsigned long addr)
+static void set_xen_basic_apic_ops(void)
 {
-	struct mmuext_op *op;
-	struct multicall_space mcs;
-
-	preempt_disable();
-
-	mcs = xen_mc_entry(sizeof(*op));
-	op = mcs.args;
-	op->cmd = MMUEXT_INVLPG_LOCAL;
-	op->arg1.linear_addr = addr & PAGE_MASK;
-	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
-
-	xen_mc_issue(PARAVIRT_LAZY_MMU);
-
-	preempt_enable();
+	apic->read = xen_apic_read;
+	apic->write = xen_apic_write;
+	apic->icr_read = xen_apic_icr_read;
+	apic->icr_write = xen_apic_icr_write;
+	apic->wait_icr_idle = xen_apic_wait_icr_idle;
+	apic->safe_wait_icr_idle = xen_safe_apic_wait_icr_idle;
 }
 
-static void xen_flush_tlb_others(const cpumask_t *cpus, struct mm_struct *mm,
-				 unsigned long va)
-{
-	struct {
-		struct mmuext_op op;
-		cpumask_t mask;
-	} *args;
-	cpumask_t cpumask = *cpus;
-	struct multicall_space mcs;
-
-	/*
-	 * A couple of (to be removed) sanity checks:
-	 *
-	 * - current CPU must not be in mask
-	 * - mask must exist :)
-	 */
-	BUG_ON(cpus_empty(cpumask));
-	BUG_ON(cpu_isset(smp_processor_id(), cpumask));
-	BUG_ON(!mm);
-
-	/* If a CPU which we ran on has gone down, OK. */
-	cpus_and(cpumask, cpumask, cpu_online_map);
-	if (cpus_empty(cpumask))
-		return;
-
-	mcs = xen_mc_entry(sizeof(*args));
-	args = mcs.args;
-	args->mask = cpumask;
-	args->op.arg2.vcpumask = &args->mask;
-
-	if (va == TLB_FLUSH_ALL) {
-		args->op.cmd = MMUEXT_TLB_FLUSH_MULTI;
-	} else {
-		args->op.cmd = MMUEXT_INVLPG_MULTI;
-		args->op.arg1.linear_addr = va;
-	}
-
-	MULTI_mmuext_op(mcs.mc, &args->op, 1, NULL, DOMID_SELF);
+#endif
 
-	xen_mc_issue(PARAVIRT_LAZY_MMU);
-}
 
 static void xen_clts(void)
 {
@@ -700,21 +593,6 @@ static void xen_write_cr0(unsigned long cr0)
 	xen_mc_issue(PARAVIRT_LAZY_CPU);
 }
 
-static void xen_write_cr2(unsigned long cr2)
-{
-	x86_read_percpu(xen_vcpu)->arch.cr2 = cr2;
-}
-
-static unsigned long xen_read_cr2(void)
-{
-	return x86_read_percpu(xen_vcpu)->arch.cr2;
-}
-
-static unsigned long xen_read_cr2_direct(void)
-{
-	return x86_read_percpu(xen_vcpu_info.arch.cr2);
-}
-
 static void xen_write_cr4(unsigned long cr4)
 {
 	cr4 &= ~X86_CR4_PGE;
@@ -723,71 +601,6 @@ static void xen_write_cr4(unsigned long cr4)
 	native_write_cr4(cr4);
 }
 
-static unsigned long xen_read_cr3(void)
-{
-	return x86_read_percpu(xen_cr3);
-}
-
-static void set_current_cr3(void *v)
-{
-	x86_write_percpu(xen_current_cr3, (unsigned long)v);
-}
-
-static void __xen_write_cr3(bool kernel, unsigned long cr3)
-{
-	struct mmuext_op *op;
-	struct multicall_space mcs;
-	unsigned long mfn;
-
-	if (cr3)
-		mfn = pfn_to_mfn(PFN_DOWN(cr3));
-	else
-		mfn = 0;
-
-	WARN_ON(mfn == 0 && kernel);
-
-	mcs = __xen_mc_entry(sizeof(*op));
-
-	op = mcs.args;
-	op->cmd = kernel ? MMUEXT_NEW_BASEPTR : MMUEXT_NEW_USER_BASEPTR;
-	op->arg1.mfn = mfn;
-
-	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
-
-	if (kernel) {
-		x86_write_percpu(xen_cr3, cr3);
-
-		/* Update xen_current_cr3 once the batch has actually
-		   been submitted. */
-		xen_mc_callback(set_current_cr3, (void *)cr3);
-	}
-}
-
-static void xen_write_cr3(unsigned long cr3)
-{
-	BUG_ON(preemptible());
-
-	xen_mc_batch();  /* disables interrupts */
-
-	/* Update while interrupts are disabled, so its atomic with
-	   respect to ipis */
-	x86_write_percpu(xen_cr3, cr3);
-
-	__xen_write_cr3(true, cr3);
-
-#ifdef CONFIG_X86_64
-	{
-		pgd_t *user_pgd = xen_get_user_pgd(__va(cr3));
-		if (user_pgd)
-			__xen_write_cr3(false, __pa(user_pgd));
-		else
-			__xen_write_cr3(false, 0);
-	}
-#endif
-
-	xen_mc_issue(PARAVIRT_LAZY_CPU);  /* interrupts restored */
-}
-
 static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
 {
 	int ret;
@@ -829,185 +642,6 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
 	return ret;
 }
 
-/* Early in boot, while setting up the initial pagetable, assume
-   everything is pinned. */
-static __init void xen_alloc_pte_init(struct mm_struct *mm, unsigned long pfn)
-{
-#ifdef CONFIG_FLATMEM
-	BUG_ON(mem_map);	/* should only be used early */
-#endif
-	make_lowmem_page_readonly(__va(PFN_PHYS(pfn)));
-}
-
-/* Early release_pte assumes that all pts are pinned, since there's
-   only init_mm and anything attached to that is pinned. */
-static void xen_release_pte_init(unsigned long pfn)
-{
-	make_lowmem_page_readwrite(__va(PFN_PHYS(pfn)));
-}
-
-static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
-{
-	struct mmuext_op op;
-	op.cmd = cmd;
-	op.arg1.mfn = pfn_to_mfn(pfn);
-	if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
-		BUG();
-}
-
-/* This needs to make sure the new pte page is pinned iff its being
-   attached to a pinned pagetable. */
-static void xen_alloc_ptpage(struct mm_struct *mm, unsigned long pfn, unsigned level)
-{
-	struct page *page = pfn_to_page(pfn);
-
-	if (PagePinned(virt_to_page(mm->pgd))) {
-		SetPagePinned(page);
-
-		vm_unmap_aliases();
-		if (!PageHighMem(page)) {
-			make_lowmem_page_readonly(__va(PFN_PHYS((unsigned long)pfn)));
-			if (level == PT_PTE && USE_SPLIT_PTLOCKS)
-				pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);
-		} else {
-			/* make sure there are no stray mappings of
-			   this page */
-			kmap_flush_unused();
-		}
-	}
-}
-
-static void xen_alloc_pte(struct mm_struct *mm, unsigned long pfn)
-{
-	xen_alloc_ptpage(mm, pfn, PT_PTE);
-}
-
-static void xen_alloc_pmd(struct mm_struct *mm, unsigned long pfn)
-{
-	xen_alloc_ptpage(mm, pfn, PT_PMD);
-}
-
-static int xen_pgd_alloc(struct mm_struct *mm)
-{
-	pgd_t *pgd = mm->pgd;
-	int ret = 0;
-
-	BUG_ON(PagePinned(virt_to_page(pgd)));
-
-#ifdef CONFIG_X86_64
-	{
-		struct page *page = virt_to_page(pgd);
-		pgd_t *user_pgd;
-
-		BUG_ON(page->private != 0);
-
-		ret = -ENOMEM;
-
-		user_pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-		page->private = (unsigned long)user_pgd;
-
-		if (user_pgd != NULL) {
-			user_pgd[pgd_index(VSYSCALL_START)] =
-				__pgd(__pa(level3_user_vsyscall) | _PAGE_TABLE);
-			ret = 0;
-		}
-
-		BUG_ON(PagePinned(virt_to_page(xen_get_user_pgd(pgd))));
-	}
-#endif
-
-	return ret;
-}
-
-static void xen_pgd_free(struct mm_struct *mm, pgd_t *pgd)
-{
-#ifdef CONFIG_X86_64
-	pgd_t *user_pgd = xen_get_user_pgd(pgd);
-
-	if (user_pgd)
-		free_page((unsigned long)user_pgd);
-#endif
-}
-
-/* This should never happen until we're OK to use struct page */
-static void xen_release_ptpage(unsigned long pfn, unsigned level)
-{
-	struct page *page = pfn_to_page(pfn);
-
-	if (PagePinned(page)) {
-		if (!PageHighMem(page)) {
-			if (level == PT_PTE && USE_SPLIT_PTLOCKS)
-				pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, pfn);
-			make_lowmem_page_readwrite(__va(PFN_PHYS(pfn)));
-		}
-		ClearPagePinned(page);
-	}
-}
-
-static void xen_release_pte(unsigned long pfn)
-{
-	xen_release_ptpage(pfn, PT_PTE);
-}
-
-static void xen_release_pmd(unsigned long pfn)
-{
-	xen_release_ptpage(pfn, PT_PMD);
-}
-
-#if PAGETABLE_LEVELS == 4
-static void xen_alloc_pud(struct mm_struct *mm, unsigned long pfn)
-{
-	xen_alloc_ptpage(mm, pfn, PT_PUD);
-}
-
-static void xen_release_pud(unsigned long pfn)
-{
-	xen_release_ptpage(pfn, PT_PUD);
-}
-#endif
-
-#ifdef CONFIG_HIGHPTE
-static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
-{
-	pgprot_t prot = PAGE_KERNEL;
-
-	if (PagePinned(page))
-		prot = PAGE_KERNEL_RO;
-
-	if (0 && PageHighMem(page))
-		printk("mapping highpte %lx type %d prot %s\n",
-		       page_to_pfn(page), type,
-		       (unsigned long)pgprot_val(prot) & _PAGE_RW ? "WRITE" : "READ");
-
-	return kmap_atomic_prot(page, type, prot);
-}
-#endif
-
-#ifdef CONFIG_X86_32
-static __init pte_t mask_rw_pte(pte_t *ptep, pte_t pte)
-{
-	/* If there's an existing pte, then don't allow _PAGE_RW to be set */
-	if (pte_val_ma(*ptep) & _PAGE_PRESENT)
-		pte = __pte_ma(((pte_val_ma(*ptep) & _PAGE_RW) | ~_PAGE_RW) &
-			       pte_val_ma(pte));
-
-	return pte;
-}
-
-/* Init-time set_pte while constructing initial pagetables, which
-   doesn't allow RO pagetable pages to be remapped RW */
-static __init void xen_set_pte_init(pte_t *ptep, pte_t pte)
-{
-	pte = mask_rw_pte(ptep, pte);
-
-	xen_set_pte(ptep, pte);
-}
-#endif
-
-static __init void xen_pagetable_setup_start(pgd_t *base)
-{
-}
-
 void xen_setup_shared_info(void)
 {
 	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
@@ -1028,37 +662,6 @@ void xen_setup_shared_info(void)
 	xen_setup_mfn_list_list();
 }
 
-static __init void xen_pagetable_setup_done(pgd_t *base)
-{
-	xen_setup_shared_info();
-}
-
-static __init void xen_post_allocator_init(void)
-{
-	pv_mmu_ops.set_pte = xen_set_pte;
-	pv_mmu_ops.set_pmd = xen_set_pmd;
-	pv_mmu_ops.set_pud = xen_set_pud;
-#if PAGETABLE_LEVELS == 4
-	pv_mmu_ops.set_pgd = xen_set_pgd;
-#endif
-
-	/* This will work as long as patching hasn't happened yet
-	   (which it hasn't) */
-	pv_mmu_ops.alloc_pte = xen_alloc_pte;
-	pv_mmu_ops.alloc_pmd = xen_alloc_pmd;
-	pv_mmu_ops.release_pte = xen_release_pte;
-	pv_mmu_ops.release_pmd = xen_release_pmd;
-#if PAGETABLE_LEVELS == 4
-	pv_mmu_ops.alloc_pud = xen_alloc_pud;
-	pv_mmu_ops.release_pud = xen_release_pud;
-#endif
-
-#ifdef CONFIG_X86_64
-	SetPagePinned(virt_to_page(level3_user_vsyscall));
-#endif
-	xen_mark_init_mm_pinned();
-}
-
 /* This is called once we have the cpu_possible_map */
 void xen_setup_vcpu_info_placement(void)
 {
@@ -1072,10 +675,10 @@ void xen_setup_vcpu_info_placement(void)
 	if (have_vcpu_info_placement) {
 		printk(KERN_INFO "Xen: using vcpu_info placement\n");
 
-		pv_irq_ops.save_fl = xen_save_fl_direct;
-		pv_irq_ops.restore_fl = xen_restore_fl_direct;
-		pv_irq_ops.irq_disable = xen_irq_disable_direct;
-		pv_irq_ops.irq_enable = xen_irq_enable_direct;
+		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
+		pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
+		pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
+		pv_irq_ops.irq_enable = __PV_IS_CALLEE_SAVE(xen_irq_enable_direct);
 		pv_mmu_ops.read_cr2 = xen_read_cr2_direct;
 	}
 }
@@ -1133,49 +736,6 @@ static unsigned xen_patch(u8 type, u16 clobbers, void *insnbuf,
 	return ret;
 }
 
-static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
-{
-	pte_t pte;
-
-	phys >>= PAGE_SHIFT;
-
-	switch (idx) {
-	case FIX_BTMAP_END ... FIX_BTMAP_BEGIN:
-#ifdef CONFIG_X86_F00F_BUG
-	case FIX_F00F_IDT:
-#endif
-#ifdef CONFIG_X86_32
-	case FIX_WP_TEST:
-	case FIX_VDSO:
-# ifdef CONFIG_HIGHMEM
-	case FIX_KMAP_BEGIN ... FIX_KMAP_END:
-# endif
-#else
-	case VSYSCALL_LAST_PAGE ... VSYSCALL_FIRST_PAGE:
-#endif
-#ifdef CONFIG_X86_LOCAL_APIC
-	case FIX_APIC_BASE:	/* maps dummy local APIC */
-#endif
-		pte = pfn_pte(phys, prot);
-		break;
-
-	default:
-		pte = mfn_pte(phys, prot);
-		break;
-	}
-
-	__native_set_fixmap(idx, pte);
-
-#ifdef CONFIG_X86_64
-	/* Replicate changes to map the vsyscall page into the user
-	   pagetable vsyscall mapping. */
-	if (idx >= VSYSCALL_LAST_PAGE && idx <= VSYSCALL_FIRST_PAGE) {
-		unsigned long vaddr = __fix_to_virt(idx);
-		set_pte_vaddr_pud(level3_user_vsyscall, vaddr, pte);
-	}
-#endif
-}
-
 static const struct pv_info xen_info __initdata = {
 	.paravirt_enabled = 1,
 	.shared_kernel_pmd = 0,
@@ -1271,87 +831,6 @@ static const struct pv_apic_ops xen_apic_ops __initdata = {
 #endif
 };
 
-static const struct pv_mmu_ops xen_mmu_ops __initdata = {
-	.pagetable_setup_start = xen_pagetable_setup_start,
-	.pagetable_setup_done = xen_pagetable_setup_done,
-
-	.read_cr2 = xen_read_cr2,
-	.write_cr2 = xen_write_cr2,
-
-	.read_cr3 = xen_read_cr3,
-	.write_cr3 = xen_write_cr3,
-
-	.flush_tlb_user = xen_flush_tlb,
-	.flush_tlb_kernel = xen_flush_tlb,
-	.flush_tlb_single = xen_flush_tlb_single,
-	.flush_tlb_others = xen_flush_tlb_others,
-
-	.pte_update = paravirt_nop,
-	.pte_update_defer = paravirt_nop,
-
-	.pgd_alloc = xen_pgd_alloc,
-	.pgd_free = xen_pgd_free,
-
-	.alloc_pte = xen_alloc_pte_init,
-	.release_pte = xen_release_pte_init,
-	.alloc_pmd = xen_alloc_pte_init,
-	.alloc_pmd_clone = paravirt_nop,
-	.release_pmd = xen_release_pte_init,
-
-#ifdef CONFIG_HIGHPTE
-	.kmap_atomic_pte = xen_kmap_atomic_pte,
-#endif
-
-#ifdef CONFIG_X86_64
-	.set_pte = xen_set_pte,
-#else
-	.set_pte = xen_set_pte_init,
-#endif
-	.set_pte_at = xen_set_pte_at,
-	.set_pmd = xen_set_pmd_hyper,
-
-	.ptep_modify_prot_start = __ptep_modify_prot_start,
-	.ptep_modify_prot_commit = __ptep_modify_prot_commit,
-
-	.pte_val = xen_pte_val,
-	.pte_flags = native_pte_flags,
-	.pgd_val = xen_pgd_val,
-
-	.make_pte = xen_make_pte,
-	.make_pgd = xen_make_pgd,
-
-#ifdef CONFIG_X86_PAE
-	.set_pte_atomic = xen_set_pte_atomic,
-	.set_pte_present = xen_set_pte_at,
-	.pte_clear = xen_pte_clear,
-	.pmd_clear = xen_pmd_clear,
-#endif	/* CONFIG_X86_PAE */
-	.set_pud = xen_set_pud_hyper,
-
-	.make_pmd = xen_make_pmd,
-	.pmd_val = xen_pmd_val,
-
-#if PAGETABLE_LEVELS == 4
-	.pud_val = xen_pud_val,
-	.make_pud = xen_make_pud,
-	.set_pgd = xen_set_pgd_hyper,
-
-	.alloc_pud = xen_alloc_pte_init,
-	.release_pud = xen_release_pte_init,
-#endif	/* PAGETABLE_LEVELS == 4 */
-
-	.activate_mm = xen_activate_mm,
-	.dup_mmap = xen_dup_mmap,
-	.exit_mmap = xen_exit_mmap,
-
-	.lazy_mode = {
-		.enter = paravirt_enter_lazy_mmu,
-		.leave = xen_leave_lazy,
-	},
-
-	.set_fixmap = xen_set_fixmap,
-};
-
 static void xen_reboot(int reason)
 {
 	struct sched_shutdown r = { .reason = reason };
@@ -1394,223 +873,6 @@ static const struct machine_ops __initdata xen_machine_ops = {
 };
 
 
-static void __init xen_reserve_top(void)
-{
-#ifdef CONFIG_X86_32
-	unsigned long top = HYPERVISOR_VIRT_START;
-	struct xen_platform_parameters pp;
-
-	if (HYPERVISOR_xen_version(XENVER_platform_parameters, &pp) == 0)
-		top = pp.virt_start;
-
-	reserve_top_address(-top);
-#endif	/* CONFIG_X86_32 */
-}
-
-/*
- * Like __va(), but returns address in the kernel mapping (which is
- * all we have until the physical memory mapping has been set up.
- */
-static void *__ka(phys_addr_t paddr)
-{
-#ifdef CONFIG_X86_64
-	return (void *)(paddr + __START_KERNEL_map);
-#else
-	return __va(paddr);
-#endif
-}
-
-/* Convert a machine address to physical address */
-static unsigned long m2p(phys_addr_t maddr)
-{
-	phys_addr_t paddr;
-
-	maddr &= PTE_PFN_MASK;
-	paddr = mfn_to_pfn(maddr >> PAGE_SHIFT) << PAGE_SHIFT;
-
-	return paddr;
-}
-
-/* Convert a machine address to kernel virtual */
-static void *m2v(phys_addr_t maddr)
-{
-	return __ka(m2p(maddr));
-}
-
-static void set_page_prot(void *addr, pgprot_t prot)
-{
-	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
-	pte_t pte = pfn_pte(pfn, prot);
-
-	if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, 0))
-		BUG();
-}
-
-static __init void xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
-{
-	unsigned pmdidx, pteidx;
-	unsigned ident_pte;
-	unsigned long pfn;
-
-	ident_pte = 0;
-	pfn = 0;
-	for (pmdidx = 0; pmdidx < PTRS_PER_PMD && pfn < max_pfn; pmdidx++) {
-		pte_t *pte_page;
-
-		/* Reuse or allocate a page of ptes */
-		if (pmd_present(pmd[pmdidx]))
-			pte_page = m2v(pmd[pmdidx].pmd);
-		else {
-			/* Check for free pte pages */
-			if (ident_pte == ARRAY_SIZE(level1_ident_pgt))
-				break;
-
-			pte_page = &level1_ident_pgt[ident_pte];
-			ident_pte += PTRS_PER_PTE;
-
-			pmd[pmdidx] = __pmd(__pa(pte_page) | _PAGE_TABLE);
-		}
-
-		/* Install mappings */
-		for (pteidx = 0; pteidx < PTRS_PER_PTE; pteidx++, pfn++) {
-			pte_t pte;
-
-			if (pfn > max_pfn_mapped)
-				max_pfn_mapped = pfn;
-
-			if (!pte_none(pte_page[pteidx]))
-				continue;
-
-			pte = pfn_pte(pfn, PAGE_KERNEL_EXEC);
-			pte_page[pteidx] = pte;
-		}
-	}
-
-	for (pteidx = 0; pteidx < ident_pte; pteidx += PTRS_PER_PTE)
-		set_page_prot(&level1_ident_pgt[pteidx], PAGE_KERNEL_RO);
-
-	set_page_prot(pmd, PAGE_KERNEL_RO);
-}
-
-#ifdef CONFIG_X86_64
-static void convert_pfn_mfn(void *v)
-{
-	pte_t *pte = v;
-	int i;
-
-	/* All levels are converted the same way, so just treat them
-	   as ptes. */
-	for (i = 0; i < PTRS_PER_PTE; i++)
-		pte[i] = xen_make_pte(pte[i].pte);
-}
-
-/*
- * Set up the inital kernel pagetable.
- *
- * We can construct this by grafting the Xen provided pagetable into
- * head_64.S's preconstructed pagetables.  We copy the Xen L2's into
- * level2_ident_pgt, level2_kernel_pgt and level2_fixmap_pgt.  This
- * means that only the kernel has a physical mapping to start with -
- * but that's enough to get __va working.  We need to fill in the rest
- * of the physical mapping once some sort of allocator has been set
- * up.
- */
-static __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
-						unsigned long max_pfn)
-{
-	pud_t *l3;
-	pmd_t *l2;
-
-	/* Zap identity mapping */
-	init_level4_pgt[0] = __pgd(0);
-
-	/* Pre-constructed entries are in pfn, so convert to mfn */
-	convert_pfn_mfn(init_level4_pgt);
-	convert_pfn_mfn(level3_ident_pgt);
-	convert_pfn_mfn(level3_kernel_pgt);
-
-	l3 = m2v(pgd[pgd_index(__START_KERNEL_map)].pgd);
-	l2 = m2v(l3[pud_index(__START_KERNEL_map)].pud);
-
-	memcpy(level2_ident_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
-	memcpy(level2_kernel_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
-
-	l3 = m2v(pgd[pgd_index(__START_KERNEL_map + PMD_SIZE)].pgd);
-	l2 = m2v(l3[pud_index(__START_KERNEL_map + PMD_SIZE)].pud);
-	memcpy(level2_fixmap_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
-
-	/* Set up identity map */
-	xen_map_identity_early(level2_ident_pgt, max_pfn);
-
-	/* Make pagetable pieces RO */
-	set_page_prot(init_level4_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level3_ident_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level3_kernel_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level3_user_vsyscall, PAGE_KERNEL_RO);
-	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
-
-	/* Pin down new L4 */
-	pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
-			  PFN_DOWN(__pa_symbol(init_level4_pgt)));
-
-	/* Unpin Xen-provided one */
-	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
-
-	/* Switch over */
-	pgd = init_level4_pgt;
-
-	/*
-	 * At this stage there can be no user pgd, and no page
-	 * structure to attach it to, so make sure we just set kernel
-	 * pgd.
-	 */
-	xen_mc_batch();
-	__xen_write_cr3(true, __pa(pgd));
-	xen_mc_issue(PARAVIRT_LAZY_CPU);
-
-	reserve_early(__pa(xen_start_info->pt_base),
-		      __pa(xen_start_info->pt_base +
-			   xen_start_info->nr_pt_frames * PAGE_SIZE),
-		      "XEN PAGETABLES");
-
-	return pgd;
-}
-#else	/* !CONFIG_X86_64 */
-static pmd_t level2_kernel_pgt[PTRS_PER_PMD] __page_aligned_bss;
-
-static __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
-						unsigned long max_pfn)
-{
-	pmd_t *kernel_pmd;
-
-	init_pg_tables_start = __pa(pgd);
-	init_pg_tables_end = __pa(pgd) + xen_start_info->nr_pt_frames*PAGE_SIZE;
-	max_pfn_mapped = PFN_DOWN(init_pg_tables_end + 512*1024);
-
-	kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
-	memcpy(level2_kernel_pgt, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
-
-	xen_map_identity_early(level2_kernel_pgt, max_pfn);
-
-	memcpy(swapper_pg_dir, pgd, sizeof(pgd_t) * PTRS_PER_PGD);
-	set_pgd(&swapper_pg_dir[KERNEL_PGD_BOUNDARY],
-			__pgd(__pa(level2_kernel_pgt) | _PAGE_PRESENT));
-
-	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
-	set_page_prot(swapper_pg_dir, PAGE_KERNEL_RO);
-	set_page_prot(empty_zero_page, PAGE_KERNEL_RO);
-
-	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
-
-	xen_write_cr3(__pa(swapper_pg_dir));
-
-	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
-
-	return swapper_pg_dir;
-}
-#endif	/* CONFIG_X86_64 */
-
 /* First C function to be called on Xen boot */
 asmlinkage void __init xen_start_kernel(void)
 {
@@ -1639,7 +901,7 @@ asmlinkage void __init xen_start_kernel(void)
 	/*
 	 * set up the basic apic ops.
 	 */
-	apic_ops = &xen_basic_apic_ops;
+	set_xen_basic_apic_ops();
 #endif
 
 	if (xen_feature(XENFEAT_mmu_pt_update_preserve_ad)) {
@@ -1650,10 +912,18 @@ asmlinkage void __init xen_start_kernel(void)
 	machine_ops = xen_machine_ops;
 
 #ifdef CONFIG_X86_64
-	/* Disable until direct per-cpu data access. */
-	have_vcpu_info_placement = 0;
-	x86_64_init_pda();
+	/*
+	 * Setup percpu state.  We only need to do this for 64-bit
+	 * because 32-bit already has %fs set properly.
+	 */
+	load_percpu_segment(0);
 #endif
+	/*
+	 * The only reliable way to retain the initial address of the
+	 * percpu gdt_page is to remember it here, so we can go and
+	 * mark it RW later, when the initial percpu area is freed.
+	 */
+	xen_initial_gdt = &per_cpu(gdt_page, 0);
 
 	xen_smp_init();
 
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index bb042608c602..cfd17799bd6d 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -19,27 +19,12 @@ void xen_force_evtchn_callback(void)
 	(void)HYPERVISOR_xen_version(0, NULL);
 }
 
-static void __init __xen_init_IRQ(void)
-{
-	int i;
-
-	/* Create identity vector->irq map */
-	for(i = 0; i < NR_VECTORS; i++) {
-		int cpu;
-
-		for_each_possible_cpu(cpu)
-			per_cpu(vector_irq, cpu)[i] = i;
-	}
-
-	xen_init_IRQ();
-}
-
 static unsigned long xen_save_fl(void)
 {
 	struct vcpu_info *vcpu;
 	unsigned long flags;
 
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 
 	/* flag has opposite sense of mask */
 	flags = !vcpu->evtchn_upcall_mask;
@@ -50,6 +35,7 @@ static unsigned long xen_save_fl(void)
 	*/
 	return (-flags) & X86_EFLAGS_IF;
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_save_fl);
 
 static void xen_restore_fl(unsigned long flags)
 {
@@ -62,7 +48,7 @@ static void xen_restore_fl(unsigned long flags)
 	   make sure we're don't switch CPUs between getting the vcpu
 	   pointer and updating the mask. */
 	preempt_disable();
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 	vcpu->evtchn_upcall_mask = flags;
 	preempt_enable_no_resched();
 
@@ -76,6 +62,7 @@ static void xen_restore_fl(unsigned long flags)
 			xen_force_evtchn_callback();
 	}
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_restore_fl);
 
 static void xen_irq_disable(void)
 {
@@ -83,9 +70,10 @@ static void xen_irq_disable(void)
 	   make sure we're don't switch CPUs between getting the vcpu
 	   pointer and updating the mask. */
 	preempt_disable();
-	x86_read_percpu(xen_vcpu)->evtchn_upcall_mask = 1;
+	percpu_read(xen_vcpu)->evtchn_upcall_mask = 1;
 	preempt_enable_no_resched();
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable);
 
 static void xen_irq_enable(void)
 {
@@ -96,7 +84,7 @@ static void xen_irq_enable(void)
 	   the caller is confused and is trying to re-enable interrupts
 	   on an indeterminate processor. */
 
-	vcpu = x86_read_percpu(xen_vcpu);
+	vcpu = percpu_read(xen_vcpu);
 	vcpu->evtchn_upcall_mask = 0;
 
 	/* Doesn't matter if we get preempted here, because any
@@ -106,6 +94,7 @@ static void xen_irq_enable(void)
 	if (unlikely(vcpu->evtchn_upcall_pending))
 		xen_force_evtchn_callback();
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_irq_enable);
 
 static void xen_safe_halt(void)
 {
@@ -123,11 +112,13 @@ static void xen_halt(void)
 }
 
 static const struct pv_irq_ops xen_irq_ops __initdata = {
-	.init_IRQ = __xen_init_IRQ,
-	.save_fl = xen_save_fl,
-	.restore_fl = xen_restore_fl,
-	.irq_disable = xen_irq_disable,
-	.irq_enable = xen_irq_enable,
+	.init_IRQ = xen_init_IRQ,
+
+	.save_fl = PV_CALLEE_SAVE(xen_save_fl),
+	.restore_fl = PV_CALLEE_SAVE(xen_restore_fl),
+	.irq_disable = PV_CALLEE_SAVE(xen_irq_disable),
+	.irq_enable = PV_CALLEE_SAVE(xen_irq_enable),
+
 	.safe_halt = xen_safe_halt,
 	.halt = xen_halt,
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 503c240e26c7..cb6afa4ec95c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -47,6 +47,7 @@
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
 #include <asm/mmu_context.h>
+#include <asm/setup.h>
 #include <asm/paravirt.h>
 #include <asm/linkage.h>
 
@@ -55,6 +56,8 @@
 
 #include <xen/page.h>
 #include <xen/interface/xen.h>
+#include <xen/interface/version.h>
+#include <xen/hvc-console.h>
 
 #include "multicalls.h"
 #include "mmu.h"
@@ -114,6 +117,37 @@ static inline void check_zero(void)
 
 #endif /* CONFIG_XEN_DEBUG_FS */
 
+
+/*
+ * Identity map, in addition to plain kernel map.  This needs to be
+ * large enough to allocate page table pages to allocate the rest.
+ * Each page can map 2MB.
+ */
+static pte_t level1_ident_pgt[PTRS_PER_PTE * 4] __page_aligned_bss;
+
+#ifdef CONFIG_X86_64
+/* l3 pud for userspace vsyscall mapping */
+static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss;
+#endif /* CONFIG_X86_64 */
+
+/*
+ * Note about cr3 (pagetable base) values:
+ *
+ * xen_cr3 contains the current logical cr3 value; it contains the
+ * last set cr3.  This may not be the current effective cr3, because
+ * its update may be being lazily deferred.  However, a vcpu looking
+ * at its own cr3 can use this value knowing that it everything will
+ * be self-consistent.
+ *
+ * xen_current_cr3 contains the actual vcpu cr3; it is set once the
+ * hypercall to set the vcpu cr3 is complete (so it may be a little
+ * out of date, but it will never be set early).  If one vcpu is
+ * looking at another vcpu's cr3 value, it should use this variable.
+ */
+DEFINE_PER_CPU(unsigned long, xen_cr3);	 /* cr3 stored as physaddr */
+DEFINE_PER_CPU(unsigned long, xen_current_cr3);	 /* actual vcpu cr3 */
+
+
 /*
  * Just beyond the highest usermode address.  STACK_TOP_MAX has a
  * redzone above it, so round it up to a PGD boundary.
@@ -242,6 +276,13 @@ void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 	p2m_top[topidx][idx] = mfn;
 }
 
+unsigned long arbitrary_virt_to_mfn(void *vaddr)
+{
+	xmaddr_t maddr = arbitrary_virt_to_machine(vaddr);
+
+	return PFN_DOWN(maddr.maddr);
+}
+
 xmaddr_t arbitrary_virt_to_machine(void *vaddr)
 {
 	unsigned long address = (unsigned long)vaddr;
@@ -458,28 +499,33 @@ pteval_t xen_pte_val(pte_t pte)
 {
 	return pte_mfn_to_pfn(pte.pte);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_pte_val);
 
 pgdval_t xen_pgd_val(pgd_t pgd)
 {
 	return pte_mfn_to_pfn(pgd.pgd);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_pgd_val);
 
 pte_t xen_make_pte(pteval_t pte)
 {
 	pte = pte_pfn_to_mfn(pte);
 	return native_make_pte(pte);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte);
 
 pgd_t xen_make_pgd(pgdval_t pgd)
 {
 	pgd = pte_pfn_to_mfn(pgd);
 	return native_make_pgd(pgd);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pgd);
 
 pmdval_t xen_pmd_val(pmd_t pmd)
 {
 	return pte_mfn_to_pfn(pmd.pmd);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_pmd_val);
 
 void xen_set_pud_hyper(pud_t *ptr, pud_t val)
 {
@@ -556,12 +602,14 @@ pmd_t xen_make_pmd(pmdval_t pmd)
 	pmd = pte_pfn_to_mfn(pmd);
 	return native_make_pmd(pmd);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pmd);
 
 #if PAGETABLE_LEVELS == 4
 pudval_t xen_pud_val(pud_t pud)
 {
 	return pte_mfn_to_pfn(pud.pud);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_pud_val);
 
 pud_t xen_make_pud(pudval_t pud)
 {
@@ -569,6 +617,7 @@ pud_t xen_make_pud(pudval_t pud)
 
 	return native_make_pud(pud);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pud);
 
 pgd_t *xen_get_user_pgd(pgd_t *pgd)
 {
@@ -1063,18 +1112,14 @@ static void drop_other_mm_ref(void *info)
 	struct mm_struct *mm = info;
 	struct mm_struct *active_mm;
 
-#ifdef CONFIG_X86_64
-	active_mm = read_pda(active_mm);
-#else
-	active_mm = __get_cpu_var(cpu_tlbstate).active_mm;
-#endif
+	active_mm = percpu_read(cpu_tlbstate.active_mm);
 
 	if (active_mm == mm)
 		leave_mm(smp_processor_id());
 
 	/* If this cpu still has a stale cr3 reference, then make sure
 	   it has been flushed. */
-	if (x86_read_percpu(xen_current_cr3) == __pa(mm->pgd)) {
+	if (percpu_read(xen_current_cr3) == __pa(mm->pgd)) {
 		load_cr3(swapper_pg_dir);
 		arch_flush_lazy_cpu_mode();
 	}
@@ -1156,6 +1201,706 @@ void xen_exit_mmap(struct mm_struct *mm)
 	spin_unlock(&mm->page_table_lock);
 }
 
+static __init void xen_pagetable_setup_start(pgd_t *base)
+{
+}
+
+static __init void xen_pagetable_setup_done(pgd_t *base)
+{
+	xen_setup_shared_info();
+}
+
+static void xen_write_cr2(unsigned long cr2)
+{
+	percpu_read(xen_vcpu)->arch.cr2 = cr2;
+}
+
+static unsigned long xen_read_cr2(void)
+{
+	return percpu_read(xen_vcpu)->arch.cr2;
+}
+
+unsigned long xen_read_cr2_direct(void)
+{
+	return percpu_read(xen_vcpu_info.arch.cr2);
+}
+
+static void xen_flush_tlb(void)
+{
+	struct mmuext_op *op;
+	struct multicall_space mcs;
+
+	preempt_disable();
+
+	mcs = xen_mc_entry(sizeof(*op));
+
+	op = mcs.args;
+	op->cmd = MMUEXT_TLB_FLUSH_LOCAL;
+	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
+
+	xen_mc_issue(PARAVIRT_LAZY_MMU);
+
+	preempt_enable();
+}
+
+static void xen_flush_tlb_single(unsigned long addr)
+{
+	struct mmuext_op *op;
+	struct multicall_space mcs;
+
+	preempt_disable();
+
+	mcs = xen_mc_entry(sizeof(*op));
+	op = mcs.args;
+	op->cmd = MMUEXT_INVLPG_LOCAL;
+	op->arg1.linear_addr = addr & PAGE_MASK;
+	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
+
+	xen_mc_issue(PARAVIRT_LAZY_MMU);
+
+	preempt_enable();
+}
+
+static void xen_flush_tlb_others(const struct cpumask *cpus,
+				 struct mm_struct *mm, unsigned long va)
+{
+	struct {
+		struct mmuext_op op;
+		DECLARE_BITMAP(mask, NR_CPUS);
+	} *args;
+	struct multicall_space mcs;
+
+	BUG_ON(cpumask_empty(cpus));
+	BUG_ON(!mm);
+
+	mcs = xen_mc_entry(sizeof(*args));
+	args = mcs.args;
+	args->op.arg2.vcpumask = to_cpumask(args->mask);
+
+	/* Remove us, and any offline CPUS. */
+	cpumask_and(to_cpumask(args->mask), cpus, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), to_cpumask(args->mask));
+
+	if (va == TLB_FLUSH_ALL) {
+		args->op.cmd = MMUEXT_TLB_FLUSH_MULTI;
+	} else {
+		args->op.cmd = MMUEXT_INVLPG_MULTI;
+		args->op.arg1.linear_addr = va;
+	}
+
+	MULTI_mmuext_op(mcs.mc, &args->op, 1, NULL, DOMID_SELF);
+
+	xen_mc_issue(PARAVIRT_LAZY_MMU);
+}
+
+static unsigned long xen_read_cr3(void)
+{
+	return percpu_read(xen_cr3);
+}
+
+static void set_current_cr3(void *v)
+{
+	percpu_write(xen_current_cr3, (unsigned long)v);
+}
+
+static void __xen_write_cr3(bool kernel, unsigned long cr3)
+{
+	struct mmuext_op *op;
+	struct multicall_space mcs;
+	unsigned long mfn;
+
+	if (cr3)
+		mfn = pfn_to_mfn(PFN_DOWN(cr3));
+	else
+		mfn = 0;
+
+	WARN_ON(mfn == 0 && kernel);
+
+	mcs = __xen_mc_entry(sizeof(*op));
+
+	op = mcs.args;
+	op->cmd = kernel ? MMUEXT_NEW_BASEPTR : MMUEXT_NEW_USER_BASEPTR;
+	op->arg1.mfn = mfn;
+
+	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
+
+	if (kernel) {
+		percpu_write(xen_cr3, cr3);
+
+		/* Update xen_current_cr3 once the batch has actually
+		   been submitted. */
+		xen_mc_callback(set_current_cr3, (void *)cr3);
+	}
+}
+
+static void xen_write_cr3(unsigned long cr3)
+{
+	BUG_ON(preemptible());
+
+	xen_mc_batch();  /* disables interrupts */
+
+	/* Update while interrupts are disabled, so its atomic with
+	   respect to ipis */
+	percpu_write(xen_cr3, cr3);
+
+	__xen_write_cr3(true, cr3);
+
+#ifdef CONFIG_X86_64
+	{
+		pgd_t *user_pgd = xen_get_user_pgd(__va(cr3));
+		if (user_pgd)
+			__xen_write_cr3(false, __pa(user_pgd));
+		else
+			__xen_write_cr3(false, 0);
+	}
+#endif
+
+	xen_mc_issue(PARAVIRT_LAZY_CPU);  /* interrupts restored */
+}
+
+static int xen_pgd_alloc(struct mm_struct *mm)
+{
+	pgd_t *pgd = mm->pgd;
+	int ret = 0;
+
+	BUG_ON(PagePinned(virt_to_page(pgd)));
+
+#ifdef CONFIG_X86_64
+	{
+		struct page *page = virt_to_page(pgd);
+		pgd_t *user_pgd;
+
+		BUG_ON(page->private != 0);
+
+		ret = -ENOMEM;
+
+		user_pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		page->private = (unsigned long)user_pgd;
+
+		if (user_pgd != NULL) {
+			user_pgd[pgd_index(VSYSCALL_START)] =
+				__pgd(__pa(level3_user_vsyscall) | _PAGE_TABLE);
+			ret = 0;
+		}
+
+		BUG_ON(PagePinned(virt_to_page(xen_get_user_pgd(pgd))));
+	}
+#endif
+
+	return ret;
+}
+
+static void xen_pgd_free(struct mm_struct *mm, pgd_t *pgd)
+{
+#ifdef CONFIG_X86_64
+	pgd_t *user_pgd = xen_get_user_pgd(pgd);
+
+	if (user_pgd)
+		free_page((unsigned long)user_pgd);
+#endif
+}
+
+#ifdef CONFIG_HIGHPTE
+static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
+{
+	pgprot_t prot = PAGE_KERNEL;
+
+	if (PagePinned(page))
+		prot = PAGE_KERNEL_RO;
+
+	if (0 && PageHighMem(page))
+		printk("mapping highpte %lx type %d prot %s\n",
+		       page_to_pfn(page), type,
+		       (unsigned long)pgprot_val(prot) & _PAGE_RW ? "WRITE" : "READ");
+
+	return kmap_atomic_prot(page, type, prot);
+}
+#endif
+
+#ifdef CONFIG_X86_32
+static __init pte_t mask_rw_pte(pte_t *ptep, pte_t pte)
+{
+	/* If there's an existing pte, then don't allow _PAGE_RW to be set */
+	if (pte_val_ma(*ptep) & _PAGE_PRESENT)
+		pte = __pte_ma(((pte_val_ma(*ptep) & _PAGE_RW) | ~_PAGE_RW) &
+			       pte_val_ma(pte));
+
+	return pte;
+}
+
+/* Init-time set_pte while constructing initial pagetables, which
+   doesn't allow RO pagetable pages to be remapped RW */
+static __init void xen_set_pte_init(pte_t *ptep, pte_t pte)
+{
+	pte = mask_rw_pte(ptep, pte);
+
+	xen_set_pte(ptep, pte);
+}
+#endif
+
+/* Early in boot, while setting up the initial pagetable, assume
+   everything is pinned. */
+static __init void xen_alloc_pte_init(struct mm_struct *mm, unsigned long pfn)
+{
+#ifdef CONFIG_FLATMEM
+	BUG_ON(mem_map);	/* should only be used early */
+#endif
+	make_lowmem_page_readonly(__va(PFN_PHYS(pfn)));
+}
+
+/* Early release_pte assumes that all pts are pinned, since there's
+   only init_mm and anything attached to that is pinned. */
+static void xen_release_pte_init(unsigned long pfn)
+{
+	make_lowmem_page_readwrite(__va(PFN_PHYS(pfn)));
+}
+
+static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
+{
+	struct mmuext_op op;
+	op.cmd = cmd;
+	op.arg1.mfn = pfn_to_mfn(pfn);
+	if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
+		BUG();
+}
+
+/* This needs to make sure the new pte page is pinned iff its being
+   attached to a pinned pagetable. */
+static void xen_alloc_ptpage(struct mm_struct *mm, unsigned long pfn, unsigned level)
+{
+	struct page *page = pfn_to_page(pfn);
+
+	if (PagePinned(virt_to_page(mm->pgd))) {
+		SetPagePinned(page);
+
+		vm_unmap_aliases();
+		if (!PageHighMem(page)) {
+			make_lowmem_page_readonly(__va(PFN_PHYS((unsigned long)pfn)));
+			if (level == PT_PTE && USE_SPLIT_PTLOCKS)
+				pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);
+		} else {
+			/* make sure there are no stray mappings of
+			   this page */
+			kmap_flush_unused();
+		}
+	}
+}
+
+static void xen_alloc_pte(struct mm_struct *mm, unsigned long pfn)
+{
+	xen_alloc_ptpage(mm, pfn, PT_PTE);
+}
+
+static void xen_alloc_pmd(struct mm_struct *mm, unsigned long pfn)
+{
+	xen_alloc_ptpage(mm, pfn, PT_PMD);
+}
+
+/* This should never happen until we're OK to use struct page */
+static void xen_release_ptpage(unsigned long pfn, unsigned level)
+{
+	struct page *page = pfn_to_page(pfn);
+
+	if (PagePinned(page)) {
+		if (!PageHighMem(page)) {
+			if (level == PT_PTE && USE_SPLIT_PTLOCKS)
+				pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, pfn);
+			make_lowmem_page_readwrite(__va(PFN_PHYS(pfn)));
+		}
+		ClearPagePinned(page);
+	}
+}
+
+static void xen_release_pte(unsigned long pfn)
+{
+	xen_release_ptpage(pfn, PT_PTE);
+}
+
+static void xen_release_pmd(unsigned long pfn)
+{
+	xen_release_ptpage(pfn, PT_PMD);
+}
+
+#if PAGETABLE_LEVELS == 4
+static void xen_alloc_pud(struct mm_struct *mm, unsigned long pfn)
+{
+	xen_alloc_ptpage(mm, pfn, PT_PUD);
+}
+
+static void xen_release_pud(unsigned long pfn)
+{
+	xen_release_ptpage(pfn, PT_PUD);
+}
+#endif
+
+void __init xen_reserve_top(void)
+{
+#ifdef CONFIG_X86_32
+	unsigned long top = HYPERVISOR_VIRT_START;
+	struct xen_platform_parameters pp;
+
+	if (HYPERVISOR_xen_version(XENVER_platform_parameters, &pp) == 0)
+		top = pp.virt_start;
+
+	reserve_top_address(-top);
+#endif	/* CONFIG_X86_32 */
+}
+
+/*
+ * Like __va(), but returns address in the kernel mapping (which is
+ * all we have until the physical memory mapping has been set up.
+ */
+static void *__ka(phys_addr_t paddr)
+{
+#ifdef CONFIG_X86_64
+	return (void *)(paddr + __START_KERNEL_map);
+#else
+	return __va(paddr);
+#endif
+}
+
+/* Convert a machine address to physical address */
+static unsigned long m2p(phys_addr_t maddr)
+{
+	phys_addr_t paddr;
+
+	maddr &= PTE_PFN_MASK;
+	paddr = mfn_to_pfn(maddr >> PAGE_SHIFT) << PAGE_SHIFT;
+
+	return paddr;
+}
+
+/* Convert a machine address to kernel virtual */
+static void *m2v(phys_addr_t maddr)
+{
+	return __ka(m2p(maddr));
+}
+
+static void set_page_prot(void *addr, pgprot_t prot)
+{
+	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
+	pte_t pte = pfn_pte(pfn, prot);
+
+	if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, 0))
+		BUG();
+}
+
+static __init void xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
+{
+	unsigned pmdidx, pteidx;
+	unsigned ident_pte;
+	unsigned long pfn;
+
+	ident_pte = 0;
+	pfn = 0;
+	for (pmdidx = 0; pmdidx < PTRS_PER_PMD && pfn < max_pfn; pmdidx++) {
+		pte_t *pte_page;
+
+		/* Reuse or allocate a page of ptes */
+		if (pmd_present(pmd[pmdidx]))
+			pte_page = m2v(pmd[pmdidx].pmd);
+		else {
+			/* Check for free pte pages */
+			if (ident_pte == ARRAY_SIZE(level1_ident_pgt))
+				break;
+
+			pte_page = &level1_ident_pgt[ident_pte];
+			ident_pte += PTRS_PER_PTE;
+
+			pmd[pmdidx] = __pmd(__pa(pte_page) | _PAGE_TABLE);
+		}
+
+		/* Install mappings */
+		for (pteidx = 0; pteidx < PTRS_PER_PTE; pteidx++, pfn++) {
+			pte_t pte;
+
+			if (pfn > max_pfn_mapped)
+				max_pfn_mapped = pfn;
+
+			if (!pte_none(pte_page[pteidx]))
+				continue;
+
+			pte = pfn_pte(pfn, PAGE_KERNEL_EXEC);
+			pte_page[pteidx] = pte;
+		}
+	}
+
+	for (pteidx = 0; pteidx < ident_pte; pteidx += PTRS_PER_PTE)
+		set_page_prot(&level1_ident_pgt[pteidx], PAGE_KERNEL_RO);
+
+	set_page_prot(pmd, PAGE_KERNEL_RO);
+}
+
+#ifdef CONFIG_X86_64
+static void convert_pfn_mfn(void *v)
+{
+	pte_t *pte = v;
+	int i;
+
+	/* All levels are converted the same way, so just treat them
+	   as ptes. */
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		pte[i] = xen_make_pte(pte[i].pte);
+}
+
+/*
+ * Set up the inital kernel pagetable.
+ *
+ * We can construct this by grafting the Xen provided pagetable into
+ * head_64.S's preconstructed pagetables.  We copy the Xen L2's into
+ * level2_ident_pgt, level2_kernel_pgt and level2_fixmap_pgt.  This
+ * means that only the kernel has a physical mapping to start with -
+ * but that's enough to get __va working.  We need to fill in the rest
+ * of the physical mapping once some sort of allocator has been set
+ * up.
+ */
+__init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
+					 unsigned long max_pfn)
+{
+	pud_t *l3;
+	pmd_t *l2;
+
+	/* Zap identity mapping */
+	init_level4_pgt[0] = __pgd(0);
+
+	/* Pre-constructed entries are in pfn, so convert to mfn */
+	convert_pfn_mfn(init_level4_pgt);
+	convert_pfn_mfn(level3_ident_pgt);
+	convert_pfn_mfn(level3_kernel_pgt);
+
+	l3 = m2v(pgd[pgd_index(__START_KERNEL_map)].pgd);
+	l2 = m2v(l3[pud_index(__START_KERNEL_map)].pud);
+
+	memcpy(level2_ident_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+	memcpy(level2_kernel_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+
+	l3 = m2v(pgd[pgd_index(__START_KERNEL_map + PMD_SIZE)].pgd);
+	l2 = m2v(l3[pud_index(__START_KERNEL_map + PMD_SIZE)].pud);
+	memcpy(level2_fixmap_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+
+	/* Set up identity map */
+	xen_map_identity_early(level2_ident_pgt, max_pfn);
+
+	/* Make pagetable pieces RO */
+	set_page_prot(init_level4_pgt, PAGE_KERNEL_RO);
+	set_page_prot(level3_ident_pgt, PAGE_KERNEL_RO);
+	set_page_prot(level3_kernel_pgt, PAGE_KERNEL_RO);
+	set_page_prot(level3_user_vsyscall, PAGE_KERNEL_RO);
+	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
+	set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
+
+	/* Pin down new L4 */
+	pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
+			  PFN_DOWN(__pa_symbol(init_level4_pgt)));
+
+	/* Unpin Xen-provided one */
+	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
+
+	/* Switch over */
+	pgd = init_level4_pgt;
+
+	/*
+	 * At this stage there can be no user pgd, and no page
+	 * structure to attach it to, so make sure we just set kernel
+	 * pgd.
+	 */
+	xen_mc_batch();
+	__xen_write_cr3(true, __pa(pgd));
+	xen_mc_issue(PARAVIRT_LAZY_CPU);
+
+	reserve_early(__pa(xen_start_info->pt_base),
+		      __pa(xen_start_info->pt_base +
+			   xen_start_info->nr_pt_frames * PAGE_SIZE),
+		      "XEN PAGETABLES");
+
+	return pgd;
+}
+#else	/* !CONFIG_X86_64 */
+static pmd_t level2_kernel_pgt[PTRS_PER_PMD] __page_aligned_bss;
+
+__init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
+					 unsigned long max_pfn)
+{
+	pmd_t *kernel_pmd;
+
+	init_pg_tables_start = __pa(pgd);
+	init_pg_tables_end = __pa(pgd) + xen_start_info->nr_pt_frames*PAGE_SIZE;
+	max_pfn_mapped = PFN_DOWN(init_pg_tables_end + 512*1024);
+
+	kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
+	memcpy(level2_kernel_pgt, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
+
+	xen_map_identity_early(level2_kernel_pgt, max_pfn);
+
+	memcpy(swapper_pg_dir, pgd, sizeof(pgd_t) * PTRS_PER_PGD);
+	set_pgd(&swapper_pg_dir[KERNEL_PGD_BOUNDARY],
+			__pgd(__pa(level2_kernel_pgt) | _PAGE_PRESENT));
+
+	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
+	set_page_prot(swapper_pg_dir, PAGE_KERNEL_RO);
+	set_page_prot(empty_zero_page, PAGE_KERNEL_RO);
+
+	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
+
+	xen_write_cr3(__pa(swapper_pg_dir));
+
+	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
+
+	return swapper_pg_dir;
+}
+#endif	/* CONFIG_X86_64 */
+
+static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
+{
+	pte_t pte;
+
+	phys >>= PAGE_SHIFT;
+
+	switch (idx) {
+	case FIX_BTMAP_END ... FIX_BTMAP_BEGIN:
+#ifdef CONFIG_X86_F00F_BUG
+	case FIX_F00F_IDT:
+#endif
+#ifdef CONFIG_X86_32
+	case FIX_WP_TEST:
+	case FIX_VDSO:
+# ifdef CONFIG_HIGHMEM
+	case FIX_KMAP_BEGIN ... FIX_KMAP_END:
+# endif
+#else
+	case VSYSCALL_LAST_PAGE ... VSYSCALL_FIRST_PAGE:
+#endif
+#ifdef CONFIG_X86_LOCAL_APIC
+	case FIX_APIC_BASE:	/* maps dummy local APIC */
+#endif
+		pte = pfn_pte(phys, prot);
+		break;
+
+	default:
+		pte = mfn_pte(phys, prot);
+		break;
+	}
+
+	__native_set_fixmap(idx, pte);
+
+#ifdef CONFIG_X86_64
+	/* Replicate changes to map the vsyscall page into the user
+	   pagetable vsyscall mapping. */
+	if (idx >= VSYSCALL_LAST_PAGE && idx <= VSYSCALL_FIRST_PAGE) {
+		unsigned long vaddr = __fix_to_virt(idx);
+		set_pte_vaddr_pud(level3_user_vsyscall, vaddr, pte);
+	}
+#endif
+}
+
+__init void xen_post_allocator_init(void)
+{
+	pv_mmu_ops.set_pte = xen_set_pte;
+	pv_mmu_ops.set_pmd = xen_set_pmd;
+	pv_mmu_ops.set_pud = xen_set_pud;
+#if PAGETABLE_LEVELS == 4
+	pv_mmu_ops.set_pgd = xen_set_pgd;
+#endif
+
+	/* This will work as long as patching hasn't happened yet
+	   (which it hasn't) */
+	pv_mmu_ops.alloc_pte = xen_alloc_pte;
+	pv_mmu_ops.alloc_pmd = xen_alloc_pmd;
+	pv_mmu_ops.release_pte = xen_release_pte;
+	pv_mmu_ops.release_pmd = xen_release_pmd;
+#if PAGETABLE_LEVELS == 4
+	pv_mmu_ops.alloc_pud = xen_alloc_pud;
+	pv_mmu_ops.release_pud = xen_release_pud;
+#endif
+
+#ifdef CONFIG_X86_64
+	SetPagePinned(virt_to_page(level3_user_vsyscall));
+#endif
+	xen_mark_init_mm_pinned();
+}
+
+
+const struct pv_mmu_ops xen_mmu_ops __initdata = {
+	.pagetable_setup_start = xen_pagetable_setup_start,
+	.pagetable_setup_done = xen_pagetable_setup_done,
+
+	.read_cr2 = xen_read_cr2,
+	.write_cr2 = xen_write_cr2,
+
+	.read_cr3 = xen_read_cr3,
+	.write_cr3 = xen_write_cr3,
+
+	.flush_tlb_user = xen_flush_tlb,
+	.flush_tlb_kernel = xen_flush_tlb,
+	.flush_tlb_single = xen_flush_tlb_single,
+	.flush_tlb_others = xen_flush_tlb_others,
+
+	.pte_update = paravirt_nop,
+	.pte_update_defer = paravirt_nop,
+
+	.pgd_alloc = xen_pgd_alloc,
+	.pgd_free = xen_pgd_free,
+
+	.alloc_pte = xen_alloc_pte_init,
+	.release_pte = xen_release_pte_init,
+	.alloc_pmd = xen_alloc_pte_init,
+	.alloc_pmd_clone = paravirt_nop,
+	.release_pmd = xen_release_pte_init,
+
+#ifdef CONFIG_HIGHPTE
+	.kmap_atomic_pte = xen_kmap_atomic_pte,
+#endif
+
+#ifdef CONFIG_X86_64
+	.set_pte = xen_set_pte,
+#else
+	.set_pte = xen_set_pte_init,
+#endif
+	.set_pte_at = xen_set_pte_at,
+	.set_pmd = xen_set_pmd_hyper,
+
+	.ptep_modify_prot_start = __ptep_modify_prot_start,
+	.ptep_modify_prot_commit = __ptep_modify_prot_commit,
+
+	.pte_val = PV_CALLEE_SAVE(xen_pte_val),
+	.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
+
+	.make_pte = PV_CALLEE_SAVE(xen_make_pte),
+	.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),
+
+#ifdef CONFIG_X86_PAE
+	.set_pte_atomic = xen_set_pte_atomic,
+	.set_pte_present = xen_set_pte_at,
+	.pte_clear = xen_pte_clear,
+	.pmd_clear = xen_pmd_clear,
+#endif	/* CONFIG_X86_PAE */
+	.set_pud = xen_set_pud_hyper,
+
+	.make_pmd = PV_CALLEE_SAVE(xen_make_pmd),
+	.pmd_val = PV_CALLEE_SAVE(xen_pmd_val),
+
+#if PAGETABLE_LEVELS == 4
+	.pud_val = PV_CALLEE_SAVE(xen_pud_val),
+	.make_pud = PV_CALLEE_SAVE(xen_make_pud),
+	.set_pgd = xen_set_pgd_hyper,
+
+	.alloc_pud = xen_alloc_pte_init,
+	.release_pud = xen_release_pte_init,
+#endif	/* PAGETABLE_LEVELS == 4 */
+
+	.activate_mm = xen_activate_mm,
+	.dup_mmap = xen_dup_mmap,
+	.exit_mmap = xen_exit_mmap,
+
+	.lazy_mode = {
+		.enter = paravirt_enter_lazy_mmu,
+		.leave = xen_leave_lazy,
+	},
+
+	.set_fixmap = xen_set_fixmap,
+};
+
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_mmu_debug;
diff --git a/arch/x86/xen/mmu.h b/arch/x86/xen/mmu.h
index 98d71659da5a..24d1b44a337d 100644
--- a/arch/x86/xen/mmu.h
+++ b/arch/x86/xen/mmu.h
@@ -54,4 +54,7 @@ pte_t xen_ptep_modify_prot_start(struct mm_struct *mm, unsigned long addr, pte_t
 void  xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
 				  pte_t *ptep, pte_t pte);
 
+unsigned long xen_read_cr2_direct(void);
+
+extern const struct pv_mmu_ops xen_mmu_ops;
 #endif	/* _XEN_MMU_H */
diff --git a/arch/x86/xen/multicalls.c b/arch/x86/xen/multicalls.c
index c738644b5435..8bff7e7c290b 100644
--- a/arch/x86/xen/multicalls.c
+++ b/arch/x86/xen/multicalls.c
@@ -39,6 +39,7 @@ struct mc_buffer {
 	struct multicall_entry entries[MC_BATCH];
 #if MC_DEBUG
 	struct multicall_entry debug[MC_BATCH];
+	void *caller[MC_BATCH];
 #endif
 	unsigned char args[MC_ARGS];
 	struct callback {
@@ -154,11 +155,12 @@ void xen_mc_flush(void)
 			       ret, smp_processor_id());
 			dump_stack();
 			for (i = 0; i < b->mcidx; i++) {
-				printk(KERN_DEBUG "  call %2d/%d: op=%lu arg=[%lx] result=%ld\n",
+				printk(KERN_DEBUG "  call %2d/%d: op=%lu arg=[%lx] result=%ld\t%pF\n",
 				       i+1, b->mcidx,
 				       b->debug[i].op,
 				       b->debug[i].args[0],
-				       b->entries[i].result);
+				       b->entries[i].result,
+				       b->caller[i]);
 			}
 		}
 #endif
@@ -168,8 +170,6 @@ void xen_mc_flush(void)
 	} else
 		BUG_ON(b->argidx != 0);
 
-	local_irq_restore(flags);
-
 	for (i = 0; i < b->cbidx; i++) {
 		struct callback *cb = &b->callbacks[i];
 
@@ -177,7 +177,9 @@ void xen_mc_flush(void)
 	}
 	b->cbidx = 0;
 
-	BUG_ON(ret);
+	local_irq_restore(flags);
+
+	WARN_ON(ret);
 }
 
 struct multicall_space __xen_mc_entry(size_t args)
@@ -197,6 +199,9 @@ struct multicall_space __xen_mc_entry(size_t args)
 	}
 
 	ret.mc = &b->entries[b->mcidx];
+#ifdef MC_DEBUG
+	b->caller[b->mcidx] = __builtin_return_address(0);
+#endif
 	b->mcidx++;
 	ret.args = &b->args[argidx];
 	b->argidx = argidx + args;
diff --git a/arch/x86/xen/multicalls.h b/arch/x86/xen/multicalls.h
index fa3e10725d98..9e565da5d1f7 100644
--- a/arch/x86/xen/multicalls.h
+++ b/arch/x86/xen/multicalls.h
@@ -41,7 +41,7 @@ static inline void xen_mc_issue(unsigned mode)
 		xen_mc_flush();
 
 	/* restore flags saved in xen_mc_batch */
-	local_irq_restore(x86_read_percpu(xen_mc_irq_flags));
+	local_irq_restore(percpu_read(xen_mc_irq_flags));
 }
 
 /* Set up a callback to be called when the current batch is flushed */
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index c44e2069c7c7..8d470562ffc9 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -50,11 +50,7 @@ static irqreturn_t xen_call_function_single_interrupt(int irq, void *dev_id);
  */
 static irqreturn_t xen_reschedule_interrupt(int irq, void *dev_id)
 {
-#ifdef CONFIG_X86_32
-	__get_cpu_var(irq_stat).irq_resched_count++;
-#else
-	add_pda(irq_resched_count, 1);
-#endif
+	inc_irq_stat(irq_resched_count);
 
 	return IRQ_HANDLED;
 }
@@ -78,7 +74,7 @@ static __cpuinit void cpu_bringup(void)
 	xen_setup_cpu_clockevents();
 
 	cpu_set(cpu, cpu_online_map);
-	x86_write_percpu(cpu_state, CPU_ONLINE);
+	percpu_write(cpu_state, CPU_ONLINE);
 	wmb();
 
 	/* We can take interrupts now: we're officially "up". */
@@ -174,7 +170,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
 	/* We've switched to the "real" per-cpu gdt, so make sure the
 	   old memory can be recycled */
-	make_lowmem_page_readwrite(&per_cpu_var(gdt_page));
+	make_lowmem_page_readwrite(xen_initial_gdt);
 
 	xen_setup_vcpu_info_placement();
 }
@@ -223,6 +219,7 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 {
 	struct vcpu_guest_context *ctxt;
 	struct desc_struct *gdt;
+	unsigned long gdt_mfn;
 
 	if (cpumask_test_and_set_cpu(cpu, xen_cpu_initialized_map))
 		return 0;
@@ -239,6 +236,8 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 	ctxt->user_regs.ss = __KERNEL_DS;
 #ifdef CONFIG_X86_32
 	ctxt->user_regs.fs = __KERNEL_PERCPU;
+#else
+	ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
 	ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
 	ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
@@ -250,9 +249,12 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 	ctxt->ldt_ents = 0;
 
 	BUG_ON((unsigned long)gdt & ~PAGE_MASK);
+
+	gdt_mfn = arbitrary_virt_to_mfn(gdt);
 	make_lowmem_page_readonly(gdt);
+	make_lowmem_page_readonly(mfn_to_virt(gdt_mfn));
 
-	ctxt->gdt_frames[0] = virt_to_mfn(gdt);
+	ctxt->gdt_frames[0] = gdt_mfn;
 	ctxt->gdt_ents      = GDT_ENTRIES;
 
 	ctxt->user_regs.cs = __KERNEL_CS;
@@ -283,23 +285,14 @@ static int __cpuinit xen_cpu_up(unsigned int cpu)
 	struct task_struct *idle = idle_task(cpu);
 	int rc;
 
-#ifdef CONFIG_X86_64
-	/* Allocate node local memory for AP pdas */
-	WARN_ON(cpu == 0);
-	if (cpu > 0) {
-		rc = get_local_pda(cpu);
-		if (rc)
-			return rc;
-	}
-#endif
-
-#ifdef CONFIG_X86_32
-	init_gdt(cpu);
 	per_cpu(current_task, cpu) = idle;
+#ifdef CONFIG_X86_32
 	irq_ctx_init(cpu);
 #else
-	cpu_pda(cpu)->pcurrent = idle;
 	clear_tsk_thread_flag(idle, TIF_FORK);
+	per_cpu(kernel_stack, cpu) =
+		(unsigned long)task_stack_page(idle) -
+		KERNEL_STACK_OFFSET + THREAD_SIZE;
 #endif
 	xen_setup_timer(cpu);
 	xen_init_lock_cpu(cpu);
@@ -445,11 +438,7 @@ static irqreturn_t xen_call_function_interrupt(int irq, void *dev_id)
 {
 	irq_enter();
 	generic_smp_call_function_interrupt();
-#ifdef CONFIG_X86_32
-	__get_cpu_var(irq_stat).irq_call_count++;
-#else
-	add_pda(irq_call_count, 1);
-#endif
+	inc_irq_stat(irq_call_count);
 	irq_exit();
 
 	return IRQ_HANDLED;
@@ -459,11 +448,7 @@ static irqreturn_t xen_call_function_single_interrupt(int irq, void *dev_id)
 {
 	irq_enter();
 	generic_smp_call_function_single_interrupt();
-#ifdef CONFIG_X86_32
-	__get_cpu_var(irq_stat).irq_call_count++;
-#else
-	add_pda(irq_call_count, 1);
-#endif
+	inc_irq_stat(irq_call_count);
 	irq_exit();
 
 	return IRQ_HANDLED;
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 212ffe012b76..95be7b434724 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -6,6 +6,7 @@
 
 #include <asm/xen/hypercall.h>
 #include <asm/xen/page.h>
+#include <asm/fixmap.h>
 
 #include "xen-ops.h"
 #include "mmu.h"
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
new file mode 100644
index 000000000000..79d7362ad6d1
--- /dev/null
+++ b/arch/x86/xen/xen-asm.S
@@ -0,0 +1,142 @@
+/*
+ * Asm versions of Xen pv-ops, suitable for either direct use or
+ * inlining.  The inline versions are the same as the direct-use
+ * versions, with the pre- and post-amble chopped off.
+ *
+ * This code is encoded for size rather than absolute efficiency, with
+ * a view to being able to inline as much as possible.
+ *
+ * We only bother with direct forms (ie, vcpu in percpu data) of the
+ * operations here; the indirect forms are better handled in C, since
+ * they're generally too large to inline anyway.
+ */
+
+#include <asm/asm-offsets.h>
+#include <asm/percpu.h>
+#include <asm/processor-flags.h>
+
+#include "xen-asm.h"
+
+/*
+ * Enable events.  This clears the event mask and tests the pending
+ * event status with one and operation.  If there are pending events,
+ * then enter the hypervisor to get them handled.
+ */
+ENTRY(xen_irq_enable_direct)
+	/* Unmask events */
+	movb $0, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+
+	/*
+	 * Preempt here doesn't matter because that will deal with any
+	 * pending interrupts.  The pending check may end up being run
+	 * on the wrong CPU, but that doesn't hurt.
+	 */
+
+	/* Test for pending */
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	jz 1f
+
+2:	call check_events
+1:
+ENDPATCH(xen_irq_enable_direct)
+	ret
+	ENDPROC(xen_irq_enable_direct)
+	RELOC(xen_irq_enable_direct, 2b+1)
+
+
+/*
+ * Disabling events is simply a matter of making the event mask
+ * non-zero.
+ */
+ENTRY(xen_irq_disable_direct)
+	movb $1, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+ENDPATCH(xen_irq_disable_direct)
+	ret
+	ENDPROC(xen_irq_disable_direct)
+	RELOC(xen_irq_disable_direct, 0)
+
+/*
+ * (xen_)save_fl is used to get the current interrupt enable status.
+ * Callers expect the status to be in X86_EFLAGS_IF, and other bits
+ * may be set in the return value.  We take advantage of this by
+ * making sure that X86_EFLAGS_IF has the right value (and other bits
+ * in that byte are 0), but other bits in the return value are
+ * undefined.  We need to toggle the state of the bit, because Xen and
+ * x86 use opposite senses (mask vs enable).
+ */
+ENTRY(xen_save_fl_direct)
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	setz %ah
+	addb %ah, %ah
+ENDPATCH(xen_save_fl_direct)
+	ret
+	ENDPROC(xen_save_fl_direct)
+	RELOC(xen_save_fl_direct, 0)
+
+
+/*
+ * In principle the caller should be passing us a value return from
+ * xen_save_fl_direct, but for robustness sake we test only the
+ * X86_EFLAGS_IF flag rather than the whole byte. After setting the
+ * interrupt mask state, it checks for unmasked pending events and
+ * enters the hypervisor to get them delivered if so.
+ */
+ENTRY(xen_restore_fl_direct)
+#ifdef CONFIG_X86_64
+	testw $X86_EFLAGS_IF, %di
+#else
+	testb $X86_EFLAGS_IF>>8, %ah
+#endif
+	setz PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	/*
+	 * Preempt here doesn't matter because that will deal with any
+	 * pending interrupts.  The pending check may end up being run
+	 * on the wrong CPU, but that doesn't hurt.
+	 */
+
+	/* check for unmasked and pending */
+	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	jz 1f
+2:	call check_events
+1:
+ENDPATCH(xen_restore_fl_direct)
+	ret
+	ENDPROC(xen_restore_fl_direct)
+	RELOC(xen_restore_fl_direct, 2b+1)
+
+
+/*
+ * Force an event check by making a hypercall, but preserve regs
+ * before making the call.
+ */
+check_events:
+#ifdef CONFIG_X86_32
+	push %eax
+	push %ecx
+	push %edx
+	call xen_force_evtchn_callback
+	pop %edx
+	pop %ecx
+	pop %eax
+#else
+	push %rax
+	push %rcx
+	push %rdx
+	push %rsi
+	push %rdi
+	push %r8
+	push %r9
+	push %r10
+	push %r11
+	call xen_force_evtchn_callback
+	pop %r11
+	pop %r10
+	pop %r9
+	pop %r8
+	pop %rdi
+	pop %rsi
+	pop %rdx
+	pop %rcx
+	pop %rax
+#endif
+	ret
diff --git a/arch/x86/xen/xen-asm.h b/arch/x86/xen/xen-asm.h
new file mode 100644
index 000000000000..465276467a47
--- /dev/null
+++ b/arch/x86/xen/xen-asm.h
@@ -0,0 +1,12 @@
+#ifndef _XEN_XEN_ASM_H
+#define _XEN_XEN_ASM_H
+
+#include <linux/linkage.h>
+
+#define RELOC(x, v)	.globl x##_reloc; x##_reloc=v
+#define ENDPATCH(x)	.globl x##_end; x##_end=.
+
+/* Pseudo-flag used for virtual NMI, which we don't implement yet */
+#define XEN_EFLAGS_NMI	0x80000000
+
+#endif
diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S
index 42786f59d9c0..88e15deb8b82 100644
--- a/arch/x86/xen/xen-asm_32.S
+++ b/arch/x86/xen/xen-asm_32.S
@@ -1,117 +1,43 @@
 /*
-	Asm versions of Xen pv-ops, suitable for either direct use or inlining.
-	The inline versions are the same as the direct-use versions, with the
-	pre- and post-amble chopped off.
-
-	This code is encoded for size rather than absolute efficiency,
-	with a view to being able to inline as much as possible.
-
-	We only bother with direct forms (ie, vcpu in pda) of the operations
-	here; the indirect forms are better handled in C, since they're
-	generally too large to inline anyway.
+ * Asm versions of Xen pv-ops, suitable for either direct use or
+ * inlining.  The inline versions are the same as the direct-use
+ * versions, with the pre- and post-amble chopped off.
+ *
+ * This code is encoded for size rather than absolute efficiency, with
+ * a view to being able to inline as much as possible.
+ *
+ * We only bother with direct forms (ie, vcpu in pda) of the
+ * operations here; the indirect forms are better handled in C, since
+ * they're generally too large to inline anyway.
  */
 
-#include <linux/linkage.h>
-
-#include <asm/asm-offsets.h>
 #include <asm/thread_info.h>
-#include <asm/percpu.h>
 #include <asm/processor-flags.h>
 #include <asm/segment.h>
 
 #include <xen/interface/xen.h>
 
-#define RELOC(x, v)	.globl x##_reloc; x##_reloc=v
-#define ENDPATCH(x)	.globl x##_end; x##_end=.
-
-/* Pseudo-flag used for virtual NMI, which we don't implement yet */
-#define XEN_EFLAGS_NMI	0x80000000
-
-/*
-	Enable events.  This clears the event mask and tests the pending
-	event status with one and operation.  If there are pending
-	events, then enter the hypervisor to get them handled.
- */
-ENTRY(xen_irq_enable_direct)
-	/* Unmask events */
-	movb $0, PER_CPU_VAR(xen_vcpu_info)+XEN_vcpu_info_mask
-
-	/* Preempt here doesn't matter because that will deal with
-	   any pending interrupts.  The pending check may end up being
-	   run on the wrong CPU, but that doesn't hurt. */
-
-	/* Test for pending */
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info)+XEN_vcpu_info_pending
-	jz 1f
-
-2:	call check_events
-1:
-ENDPATCH(xen_irq_enable_direct)
-	ret
-	ENDPROC(xen_irq_enable_direct)
-	RELOC(xen_irq_enable_direct, 2b+1)
-
-
-/*
-	Disabling events is simply a matter of making the event mask
-	non-zero.
- */
-ENTRY(xen_irq_disable_direct)
-	movb $1, PER_CPU_VAR(xen_vcpu_info)+XEN_vcpu_info_mask
-ENDPATCH(xen_irq_disable_direct)
-	ret
-	ENDPROC(xen_irq_disable_direct)
-	RELOC(xen_irq_disable_direct, 0)
+#include "xen-asm.h"
 
 /*
-	(xen_)save_fl is used to get the current interrupt enable status.
-	Callers expect the status to be in X86_EFLAGS_IF, and other bits
-	may be set in the return value.  We take advantage of this by
-	making sure that X86_EFLAGS_IF has the right value (and other bits
-	in that byte are 0), but other bits in the return value are
-	undefined.  We need to toggle the state of the bit, because
-	Xen and x86 use opposite senses (mask vs enable).
+ * Force an event check by making a hypercall, but preserve regs
+ * before making the call.
  */
-ENTRY(xen_save_fl_direct)
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info)+XEN_vcpu_info_mask
-	setz %ah
-	addb %ah,%ah
-ENDPATCH(xen_save_fl_direct)
-	ret
-	ENDPROC(xen_save_fl_direct)
-	RELOC(xen_save_fl_direct, 0)
-
-
-/*
-	In principle the caller should be passing us a value return
-	from xen_save_fl_direct, but for robustness sake we test only
-	the X86_EFLAGS_IF flag rather than the whole byte. After
-	setting the interrupt mask state, it checks for unmasked
-	pending events and enters the hypervisor to get them delivered
-	if so.
- */
-ENTRY(xen_restore_fl_direct)
-	testb $X86_EFLAGS_IF>>8, %ah
-	setz PER_CPU_VAR(xen_vcpu_info)+XEN_vcpu_info_mask
-	/* Preempt here doesn't matter because that will deal with
-	   any pending interrupts.  The pending check may end up being
-	   run on the wrong CPU, but that doesn't hurt. */
-
-	/* check for unmasked and pending */
-	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info)+XEN_vcpu_info_pending
-	jz 1f
-2:	call check_events
-1:
-ENDPATCH(xen_restore_fl_direct)
+check_events:
+	push %eax
+	push %ecx
+	push %edx
+	call xen_force_evtchn_callback
+	pop %edx
+	pop %ecx
+	pop %eax
 	ret
-	ENDPROC(xen_restore_fl_direct)
-	RELOC(xen_restore_fl_direct, 2b+1)
 
 /*
-	We can't use sysexit directly, because we're not running in ring0.
-	But we can easily fake it up using iret.  Assuming xen_sysexit
-	is jumped to with a standard stack frame, we can just strip it
-	back to a standard iret frame and use iret.
+ * We can't use sysexit directly, because we're not running in ring0.
+ * But we can easily fake it up using iret.  Assuming xen_sysexit is
+ * jumped to with a standard stack frame, we can just strip it back to
+ * a standard iret frame and use iret.
  */
 ENTRY(xen_sysexit)
 	movl PT_EAX(%esp), %eax			/* Shouldn't be necessary? */
@@ -122,33 +48,31 @@ ENTRY(xen_sysexit)
 ENDPROC(xen_sysexit)
 
 /*
-	This is run where a normal iret would be run, with the same stack setup:
-	      8: eflags
-	      4: cs
-	esp-> 0: eip
-
-	This attempts to make sure that any pending events are dealt
-	with on return to usermode, but there is a small window in
-	which an event can happen just before entering usermode.  If
-	the nested interrupt ends up setting one of the TIF_WORK_MASK
-	pending work flags, they will not be tested again before
-	returning to usermode. This means that a process can end up
-	with pending work, which will be unprocessed until the process
-	enters and leaves the kernel again, which could be an
-	unbounded amount of time.  This means that a pending signal or
-	reschedule event could be indefinitely delayed.
-
-	The fix is to notice a nested interrupt in the critical
-	window, and if one occurs, then fold the nested interrupt into
-	the current interrupt stack frame, and re-process it
-	iteratively rather than recursively.  This means that it will
-	exit via the normal path, and all pending work will be dealt
-	with appropriately.
-
-	Because the nested interrupt handler needs to deal with the
-	current stack state in whatever form its in, we keep things
-	simple by only using a single register which is pushed/popped
-	on the stack.
+ * This is run where a normal iret would be run, with the same stack setup:
+ *	8: eflags
+ *	4: cs
+ *	esp-> 0: eip
+ *
+ * This attempts to make sure that any pending events are dealt with
+ * on return to usermode, but there is a small window in which an
+ * event can happen just before entering usermode.  If the nested
+ * interrupt ends up setting one of the TIF_WORK_MASK pending work
+ * flags, they will not be tested again before returning to
+ * usermode. This means that a process can end up with pending work,
+ * which will be unprocessed until the process enters and leaves the
+ * kernel again, which could be an unbounded amount of time.  This
+ * means that a pending signal or reschedule event could be
+ * indefinitely delayed.
+ *
+ * The fix is to notice a nested interrupt in the critical window, and
+ * if one occurs, then fold the nested interrupt into the current
+ * interrupt stack frame, and re-process it iteratively rather than
+ * recursively.  This means that it will exit via the normal path, and
+ * all pending work will be dealt with appropriately.
+ *
+ * Because the nested interrupt handler needs to deal with the current
+ * stack state in whatever form its in, we keep things simple by only
+ * using a single register which is pushed/popped on the stack.
  */
 ENTRY(xen_iret)
 	/* test eflags for special cases */
@@ -158,13 +82,15 @@ ENTRY(xen_iret)
 	push %eax
 	ESP_OFFSET=4	# bytes pushed onto stack
 
-	/* Store vcpu_info pointer for easy access.  Do it this
-	   way to avoid having to reload %fs */
+	/*
+	 * Store vcpu_info pointer for easy access.  Do it this way to
+	 * avoid having to reload %fs
+	 */
 #ifdef CONFIG_SMP
 	GET_THREAD_INFO(%eax)
-	movl TI_cpu(%eax),%eax
-	movl __per_cpu_offset(,%eax,4),%eax
-	mov per_cpu__xen_vcpu(%eax),%eax
+	movl TI_cpu(%eax), %eax
+	movl __per_cpu_offset(,%eax,4), %eax
+	mov per_cpu__xen_vcpu(%eax), %eax
 #else
 	movl per_cpu__xen_vcpu, %eax
 #endif
@@ -172,37 +98,46 @@ ENTRY(xen_iret)
 	/* check IF state we're restoring */
 	testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp)
 
-	/* Maybe enable events.  Once this happens we could get a
-	   recursive event, so the critical region starts immediately
-	   afterwards.  However, if that happens we don't end up
-	   resuming the code, so we don't have to be worried about
-	   being preempted to another CPU. */
+	/*
+	 * Maybe enable events.  Once this happens we could get a
+	 * recursive event, so the critical region starts immediately
+	 * afterwards.  However, if that happens we don't end up
+	 * resuming the code, so we don't have to be worried about
+	 * being preempted to another CPU.
+	 */
 	setz XEN_vcpu_info_mask(%eax)
 xen_iret_start_crit:
 
 	/* check for unmasked and pending */
 	cmpw $0x0001, XEN_vcpu_info_pending(%eax)
 
-	/* If there's something pending, mask events again so we
-	   can jump back into xen_hypervisor_callback */
+	/*
+	 * If there's something pending, mask events again so we can
+	 * jump back into xen_hypervisor_callback
+	 */
 	sete XEN_vcpu_info_mask(%eax)
 
 	popl %eax
 
-	/* From this point on the registers are restored and the stack
-	   updated, so we don't need to worry about it if we're preempted */
+	/*
+	 * From this point on the registers are restored and the stack
+	 * updated, so we don't need to worry about it if we're
+	 * preempted
+	 */
 iret_restore_end:
 
-	/* Jump to hypervisor_callback after fixing up the stack.
-	   Events are masked, so jumping out of the critical
-	   region is OK. */
+	/*
+	 * Jump to hypervisor_callback after fixing up the stack.
+	 * Events are masked, so jumping out of the critical region is
+	 * OK.
+	 */
 	je xen_hypervisor_callback
 
 1:	iret
 xen_iret_end_crit:
-.section __ex_table,"a"
+.section __ex_table, "a"
 	.align 4
-	.long 1b,iret_exc
+	.long 1b, iret_exc
 .previous
 
 hyper_iret:
@@ -212,55 +147,55 @@ hyper_iret:
 	.globl xen_iret_start_crit, xen_iret_end_crit
 
 /*
-   This is called by xen_hypervisor_callback in entry.S when it sees
-   that the EIP at the time of interrupt was between xen_iret_start_crit
-   and xen_iret_end_crit.  We're passed the EIP in %eax so we can do
-   a more refined determination of what to do.
-
-   The stack format at this point is:
-	----------------
-	 ss		: (ss/esp may be present if we came from usermode)
-	 esp		:
-	 eflags		}  outer exception info
-	 cs		}
-	 eip		}
-	---------------- <- edi (copy dest)
-	 eax		:  outer eax if it hasn't been restored
-	----------------
-	 eflags		}  nested exception info
-	 cs		}   (no ss/esp because we're nested
-	 eip		}    from the same ring)
-	 orig_eax	}<- esi (copy src)
-	 - - - - - - - -
-	 fs		}
-	 es		}
-	 ds		}  SAVE_ALL state
-	 eax		}
-	  :		:
-	 ebx		}<- esp
-	----------------
-
-   In order to deliver the nested exception properly, we need to shift
-   everything from the return addr up to the error code so it
-   sits just under the outer exception info.  This means that when we
-   handle the exception, we do it in the context of the outer exception
-   rather than starting a new one.
-
-   The only caveat is that if the outer eax hasn't been
-   restored yet (ie, it's still on stack), we need to insert
-   its value into the SAVE_ALL state before going on, since
-   it's usermode state which we eventually need to restore.
+ * This is called by xen_hypervisor_callback in entry.S when it sees
+ * that the EIP at the time of interrupt was between
+ * xen_iret_start_crit and xen_iret_end_crit.  We're passed the EIP in
+ * %eax so we can do a more refined determination of what to do.
+ *
+ * The stack format at this point is:
+ *	----------------
+ *	 ss		: (ss/esp may be present if we came from usermode)
+ *	 esp		:
+ *	 eflags		}  outer exception info
+ *	 cs		}
+ *	 eip		}
+ *	---------------- <- edi (copy dest)
+ *	 eax		:  outer eax if it hasn't been restored
+ *	----------------
+ *	 eflags		}  nested exception info
+ *	 cs		}   (no ss/esp because we're nested
+ *	 eip		}    from the same ring)
+ *	 orig_eax	}<- esi (copy src)
+ *	 - - - - - - - -
+ *	 fs		}
+ *	 es		}
+ *	 ds		}  SAVE_ALL state
+ *	 eax		}
+ *	  :		:
+ *	 ebx		}<- esp
+ *	----------------
+ *
+ * In order to deliver the nested exception properly, we need to shift
+ * everything from the return addr up to the error code so it sits
+ * just under the outer exception info.  This means that when we
+ * handle the exception, we do it in the context of the outer
+ * exception rather than starting a new one.
+ *
+ * The only caveat is that if the outer eax hasn't been restored yet
+ * (ie, it's still on stack), we need to insert its value into the
+ * SAVE_ALL state before going on, since it's usermode state which we
+ * eventually need to restore.
  */
 ENTRY(xen_iret_crit_fixup)
 	/*
-	   Paranoia: Make sure we're really coming from kernel space.
-	   One could imagine a case where userspace jumps into the
-	   critical range address, but just before the CPU delivers a GP,
-	   it decides to deliver an interrupt instead.  Unlikely?
-	   Definitely.  Easy to avoid?  Yes.  The Intel documents
-	   explicitly say that the reported EIP for a bad jump is the
-	   jump instruction itself, not the destination, but some virtual
-	   environments get this wrong.
+	 * Paranoia: Make sure we're really coming from kernel space.
+	 * One could imagine a case where userspace jumps into the
+	 * critical range address, but just before the CPU delivers a
+	 * GP, it decides to deliver an interrupt instead.  Unlikely?
+	 * Definitely.  Easy to avoid?  Yes.  The Intel documents
+	 * explicitly say that the reported EIP for a bad jump is the
+	 * jump instruction itself, not the destination, but some
+	 * virtual environments get this wrong.
 	 */
 	movl PT_CS(%esp), %ecx
 	andl $SEGMENT_RPL_MASK, %ecx
@@ -270,15 +205,17 @@ ENTRY(xen_iret_crit_fixup)
 	lea PT_ORIG_EAX(%esp), %esi
 	lea PT_EFLAGS(%esp), %edi
 
-	/* If eip is before iret_restore_end then stack
-	   hasn't been restored yet. */
+	/*
+	 * If eip is before iret_restore_end then stack
+	 * hasn't been restored yet.
+	 */
 	cmp $iret_restore_end, %eax
 	jae 1f
 
-	movl 0+4(%edi),%eax		/* copy EAX (just above top of frame) */
+	movl 0+4(%edi), %eax		/* copy EAX (just above top of frame) */
 	movl %eax, PT_EAX(%esp)
 
-	lea ESP_OFFSET(%edi),%edi	/* move dest up over saved regs */
+	lea ESP_OFFSET(%edi), %edi	/* move dest up over saved regs */
 
 	/* set up the copy */
 1:	std
@@ -286,20 +223,6 @@ ENTRY(xen_iret_crit_fixup)
 	rep movsl
 	cld
 
-	lea 4(%edi),%esp		/* point esp to new frame */
+	lea 4(%edi), %esp		/* point esp to new frame */
 2:	jmp xen_do_upcall
 
-
-/*
-	Force an event check by making a hypercall,
-	but preserve regs before making the call.
- */
-check_events:
-	push %eax
-	push %ecx
-	push %edx
-	call xen_force_evtchn_callback
-	pop %edx
-	pop %ecx
-	pop %eax
-	ret
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index 05794c566e87..02f496a8dbaa 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -1,174 +1,45 @@
 /*
-	Asm versions of Xen pv-ops, suitable for either direct use or inlining.
-	The inline versions are the same as the direct-use versions, with the
-	pre- and post-amble chopped off.
-
-	This code is encoded for size rather than absolute efficiency,
-	with a view to being able to inline as much as possible.
-
-	We only bother with direct forms (ie, vcpu in pda) of the operations
-	here; the indirect forms are better handled in C, since they're
-	generally too large to inline anyway.
+ * Asm versions of Xen pv-ops, suitable for either direct use or
+ * inlining.  The inline versions are the same as the direct-use
+ * versions, with the pre- and post-amble chopped off.
+ *
+ * This code is encoded for size rather than absolute efficiency, with
+ * a view to being able to inline as much as possible.
+ *
+ * We only bother with direct forms (ie, vcpu in pda) of the
+ * operations here; the indirect forms are better handled in C, since
+ * they're generally too large to inline anyway.
  */
 
-#include <linux/linkage.h>
-
-#include <asm/asm-offsets.h>
-#include <asm/processor-flags.h>
 #include <asm/errno.h>
+#include <asm/percpu.h>
+#include <asm/processor-flags.h>
 #include <asm/segment.h>
 
 #include <xen/interface/xen.h>
 
-#define RELOC(x, v)	.globl x##_reloc; x##_reloc=v
-#define ENDPATCH(x)	.globl x##_end; x##_end=.
-
-/* Pseudo-flag used for virtual NMI, which we don't implement yet */
-#define XEN_EFLAGS_NMI	0x80000000
-
-#if 1
-/*
-	x86-64 does not yet support direct access to percpu variables
-	via a segment override, so we just need to make sure this code
-	never gets used
- */
-#define BUG			ud2a
-#define PER_CPU_VAR(var, off)	0xdeadbeef
-#endif
-
-/*
-	Enable events.  This clears the event mask and tests the pending
-	event status with one and operation.  If there are pending
-	events, then enter the hypervisor to get them handled.
- */
-ENTRY(xen_irq_enable_direct)
-	BUG
-
-	/* Unmask events */
-	movb $0, PER_CPU_VAR(xen_vcpu_info, XEN_vcpu_info_mask)
-
-	/* Preempt here doesn't matter because that will deal with
-	   any pending interrupts.  The pending check may end up being
-	   run on the wrong CPU, but that doesn't hurt. */
-
-	/* Test for pending */
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info, XEN_vcpu_info_pending)
-	jz 1f
-
-2:	call check_events
-1:
-ENDPATCH(xen_irq_enable_direct)
-	ret
-	ENDPROC(xen_irq_enable_direct)
-	RELOC(xen_irq_enable_direct, 2b+1)
-
-/*
-	Disabling events is simply a matter of making the event mask
-	non-zero.
- */
-ENTRY(xen_irq_disable_direct)
-	BUG
-
-	movb $1, PER_CPU_VAR(xen_vcpu_info, XEN_vcpu_info_mask)
-ENDPATCH(xen_irq_disable_direct)
-	ret
-	ENDPROC(xen_irq_disable_direct)
-	RELOC(xen_irq_disable_direct, 0)
-
-/*
-	(xen_)save_fl is used to get the current interrupt enable status.
-	Callers expect the status to be in X86_EFLAGS_IF, and other bits
-	may be set in the return value.  We take advantage of this by
-	making sure that X86_EFLAGS_IF has the right value (and other bits
-	in that byte are 0), but other bits in the return value are
-	undefined.  We need to toggle the state of the bit, because
-	Xen and x86 use opposite senses (mask vs enable).
- */
-ENTRY(xen_save_fl_direct)
-	BUG
-
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info, XEN_vcpu_info_mask)
-	setz %ah
-	addb %ah,%ah
-ENDPATCH(xen_save_fl_direct)
-	ret
-	ENDPROC(xen_save_fl_direct)
-	RELOC(xen_save_fl_direct, 0)
-
-/*
-	In principle the caller should be passing us a value return
-	from xen_save_fl_direct, but for robustness sake we test only
-	the X86_EFLAGS_IF flag rather than the whole byte. After
-	setting the interrupt mask state, it checks for unmasked
-	pending events and enters the hypervisor to get them delivered
-	if so.
- */
-ENTRY(xen_restore_fl_direct)
-	BUG
-
-	testb $X86_EFLAGS_IF>>8, %ah
-	setz PER_CPU_VAR(xen_vcpu_info, XEN_vcpu_info_mask)
-	/* Preempt here doesn't matter because that will deal with
-	   any pending interrupts.  The pending check may end up being
-	   run on the wrong CPU, but that doesn't hurt. */
-
-	/* check for unmasked and pending */
-	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info, XEN_vcpu_info_pending)
-	jz 1f
-2:	call check_events
-1:
-ENDPATCH(xen_restore_fl_direct)
-	ret
-	ENDPROC(xen_restore_fl_direct)
-	RELOC(xen_restore_fl_direct, 2b+1)
-
-
-/*
-	Force an event check by making a hypercall,
-	but preserve regs before making the call.
- */
-check_events:
-	push %rax
-	push %rcx
-	push %rdx
-	push %rsi
-	push %rdi
-	push %r8
-	push %r9
-	push %r10
-	push %r11
-	call xen_force_evtchn_callback
-	pop %r11
-	pop %r10
-	pop %r9
-	pop %r8
-	pop %rdi
-	pop %rsi
-	pop %rdx
-	pop %rcx
-	pop %rax
-	ret
+#include "xen-asm.h"
 
 ENTRY(xen_adjust_exception_frame)
-	mov 8+0(%rsp),%rcx
-	mov 8+8(%rsp),%r11
+	mov 8+0(%rsp), %rcx
+	mov 8+8(%rsp), %r11
 	ret $16
 
 hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
 /*
-	Xen64 iret frame:
-
-	ss
-	rsp
-	rflags
-	cs
-	rip		<-- standard iret frame
-
-	flags
-
-	rcx		}
-	r11		}<-- pushed by hypercall page
-rsp ->	rax		}
+ * Xen64 iret frame:
+ *
+ *	ss
+ *	rsp
+ *	rflags
+ *	cs
+ *	rip		<-- standard iret frame
+ *
+ *	flags
+ *
+ *	rcx		}
+ *	r11		}<-- pushed by hypercall page
+ * rsp->rax		}
  */
 ENTRY(xen_iret)
 	pushq $0
@@ -177,8 +48,8 @@ ENDPATCH(xen_iret)
 RELOC(xen_iret, 1b+1)
 
 /*
-	sysexit is not used for 64-bit processes, so it's
-	only ever used to return to 32-bit compat userspace.
+ * sysexit is not used for 64-bit processes, so it's only ever used to
+ * return to 32-bit compat userspace.
  */
 ENTRY(xen_sysexit)
 	pushq $__USER32_DS
@@ -193,13 +64,15 @@ ENDPATCH(xen_sysexit)
 RELOC(xen_sysexit, 1b+1)
 
 ENTRY(xen_sysret64)
-	/* We're already on the usermode stack at this point, but still
-	   with the kernel gs, so we can easily switch back */
-	movq %rsp, %gs:pda_oldrsp
-	movq %gs:pda_kernelstack,%rsp
+	/*
+	 * We're already on the usermode stack at this point, but
+	 * still with the kernel gs, so we can easily switch back
+	 */
+	movq %rsp, PER_CPU_VAR(old_rsp)
+	movq PER_CPU_VAR(kernel_stack), %rsp
 
 	pushq $__USER_DS
-	pushq %gs:pda_oldrsp
+	pushq PER_CPU_VAR(old_rsp)
 	pushq %r11
 	pushq $__USER_CS
 	pushq %rcx
@@ -210,13 +83,15 @@ ENDPATCH(xen_sysret64)
 RELOC(xen_sysret64, 1b+1)
 
 ENTRY(xen_sysret32)
-	/* We're already on the usermode stack at this point, but still
-	   with the kernel gs, so we can easily switch back */
-	movq %rsp, %gs:pda_oldrsp
-	movq %gs:pda_kernelstack, %rsp
+	/*
+	 * We're already on the usermode stack at this point, but
+	 * still with the kernel gs, so we can easily switch back
+	 */
+	movq %rsp, PER_CPU_VAR(old_rsp)
+	movq PER_CPU_VAR(kernel_stack), %rsp
 
 	pushq $__USER32_DS
-	pushq %gs:pda_oldrsp
+	pushq PER_CPU_VAR(old_rsp)
 	pushq %r11
 	pushq $__USER32_CS
 	pushq %rcx
@@ -227,28 +102,27 @@ ENDPATCH(xen_sysret32)
 RELOC(xen_sysret32, 1b+1)
 
 /*
-	Xen handles syscall callbacks much like ordinary exceptions,
-	which means we have:
-	 - kernel gs
-	 - kernel rsp
-	 - an iret-like stack frame on the stack (including rcx and r11):
-		ss
-		rsp
-		rflags
-		cs
-		rip
-		r11
-	rsp->	rcx
-
-	In all the entrypoints, we undo all that to make it look
-	like a CPU-generated syscall/sysenter and jump to the normal
-	entrypoint.
+ * Xen handles syscall callbacks much like ordinary exceptions, which
+ * means we have:
+ * - kernel gs
+ * - kernel rsp
+ * - an iret-like stack frame on the stack (including rcx and r11):
+ *	ss
+ *	rsp
+ *	rflags
+ *	cs
+ *	rip
+ *	r11
+ * rsp->rcx
+ *
+ * In all the entrypoints, we undo all that to make it look like a
+ * CPU-generated syscall/sysenter and jump to the normal entrypoint.
  */
 
 .macro undo_xen_syscall
-	mov 0*8(%rsp),%rcx
-	mov 1*8(%rsp),%r11
-	mov 5*8(%rsp),%rsp
+	mov 0*8(%rsp), %rcx
+	mov 1*8(%rsp), %r11
+	mov 5*8(%rsp), %rsp
 .endm
 
 /* Normal 64-bit system call target */
@@ -275,7 +149,7 @@ ENDPROC(xen_sysenter_target)
 
 ENTRY(xen_syscall32_target)
 ENTRY(xen_sysenter_target)
-	lea 16(%rsp), %rsp	/* strip %rcx,%r11 */
+	lea 16(%rsp), %rsp	/* strip %rcx, %r11 */
 	mov $-ENOSYS, %rax
 	pushq $VGCF_in_syscall
 	jmp hypercall_iret
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 63d49a523ed3..1a5ff24e29c0 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -8,7 +8,7 @@
 
 #include <asm/boot.h>
 #include <asm/asm.h>
-#include <asm/page.h>
+#include <asm/page_types.h>
 
 #include <xen/interface/elfnote.h>
 #include <asm/xen/interface.h>
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index c1f8faf0a2c5..2f5ef2632ea2 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -10,9 +10,12 @@
 extern const char xen_hypervisor_callback[];
 extern const char xen_failsafe_callback[];
 
+extern void *xen_initial_gdt;
+
 struct trap_info;
 void xen_copy_trap_info(struct trap_info *traps);
 
+DECLARE_PER_CPU(struct vcpu_info, xen_vcpu_info);
 DECLARE_PER_CPU(unsigned long, xen_cr3);
 DECLARE_PER_CPU(unsigned long, xen_current_cr3);
 
@@ -22,6 +25,13 @@ extern struct shared_info *HYPERVISOR_shared_info;
 
 void xen_setup_mfn_list_list(void);
 void xen_setup_shared_info(void);
+void xen_setup_machphys_mapping(void);
+pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
+void xen_ident_map_ISA(void);
+void xen_reserve_top(void);
+
+void xen_leave_lazy(void);
+void xen_post_allocator_init(void);
 
 char * __init xen_memory_setup(void);
 void __init xen_arch_setup(void);
diff --git a/block/blktrace.c b/block/blktrace.c
index 7cf9d1ff45a0..028120a0965a 100644
--- a/block/blktrace.c
+++ b/block/blktrace.c
@@ -363,7 +363,7 @@ int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
 	if (!bt->sequence)
 		goto err;
 
-	bt->msg_data = __alloc_percpu(BLK_TN_MAX_MSG);
+	bt->msg_data = __alloc_percpu(BLK_TN_MAX_MSG, __alignof__(char));
 	if (!bt->msg_data)
 		goto err;
 
diff --git a/block/cmd-filter.c b/block/cmd-filter.c
index 504b275e1b90..572bbc2f900d 100644
--- a/block/cmd-filter.c
+++ b/block/cmd-filter.c
@@ -22,6 +22,7 @@
 #include <linux/spinlock.h>
 #include <linux/capability.h>
 #include <linux/bitops.h>
+#include <linux/blkdev.h>
 
 #include <scsi/scsi.h>
 #include <linux/cdrom.h>
diff --git a/drivers/acpi/acpica/tbxface.c b/drivers/acpi/acpica/tbxface.c
index c3e841f3cde9..ab0aff3c7d6a 100644
--- a/drivers/acpi/acpica/tbxface.c
+++ b/drivers/acpi/acpica/tbxface.c
@@ -365,7 +365,7 @@ ACPI_EXPORT_SYMBOL(acpi_unload_table_id)
 
 /*******************************************************************************
  *
- * FUNCTION:    acpi_get_table
+ * FUNCTION:    acpi_get_table_with_size
  *
  * PARAMETERS:  Signature           - ACPI signature of needed table
  *              Instance            - Which instance (for SSDTs)
@@ -377,8 +377,9 @@ ACPI_EXPORT_SYMBOL(acpi_unload_table_id)
  *
  *****************************************************************************/
 acpi_status
-acpi_get_table(char *signature,
-	       u32 instance, struct acpi_table_header **out_table)
+acpi_get_table_with_size(char *signature,
+	       u32 instance, struct acpi_table_header **out_table,
+	       acpi_size *tbl_size)
 {
        u32 i;
        u32 j;
@@ -408,6 +409,7 @@ acpi_get_table(char *signature,
 		    acpi_tb_verify_table(&acpi_gbl_root_table_list.tables[i]);
 		if (ACPI_SUCCESS(status)) {
 			*out_table = acpi_gbl_root_table_list.tables[i].pointer;
+			*tbl_size = acpi_gbl_root_table_list.tables[i].length;
 		}
 
 		if (!acpi_gbl_permanent_mmap) {
@@ -420,6 +422,15 @@ acpi_get_table(char *signature,
 	return (AE_NOT_FOUND);
 }
 
+acpi_status
+acpi_get_table(char *signature,
+	       u32 instance, struct acpi_table_header **out_table)
+{
+	acpi_size tbl_size;
+
+	return acpi_get_table_with_size(signature,
+		       instance, out_table, &tbl_size);
+}
 ACPI_EXPORT_SYMBOL(acpi_get_table)
 
 /*******************************************************************************
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 1e35f342957c..eb8980d67368 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -272,14 +272,21 @@ acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
 }
 EXPORT_SYMBOL_GPL(acpi_os_map_memory);
 
-void acpi_os_unmap_memory(void __iomem * virt, acpi_size size)
+void __ref acpi_os_unmap_memory(void __iomem *virt, acpi_size size)
 {
-	if (acpi_gbl_permanent_mmap) {
+	if (acpi_gbl_permanent_mmap)
 		iounmap(virt);
-	}
+	else
+		__acpi_unmap_table(virt, size);
 }
 EXPORT_SYMBOL_GPL(acpi_os_unmap_memory);
 
+void __init early_acpi_os_unmap_memory(void __iomem *virt, acpi_size size)
+{
+	if (!acpi_gbl_permanent_mmap)
+		__acpi_unmap_table(virt, size);
+}
+
 #ifdef ACPI_FUTURE_USAGE
 acpi_status
 acpi_os_get_physical_address(void *virt, acpi_physical_address * phys)
diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 9cc769b587ff..68fd3d292799 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -516,12 +516,12 @@ int acpi_processor_preregister_performance(
 			continue;
 		}
 
-		if (!performance || !percpu_ptr(performance, i)) {
+		if (!performance || !per_cpu_ptr(performance, i)) {
 			retval = -EINVAL;
 			continue;
 		}
 
-		pr->performance = percpu_ptr(performance, i);
+		pr->performance = per_cpu_ptr(performance, i);
 		cpumask_set_cpu(i, pr->performance->shared_cpu_map);
 		if (acpi_processor_get_psd(pr)) {
 			retval = -EINVAL;
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index a8852952fac4..fec1ae36d431 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -181,14 +181,15 @@ acpi_table_parse_entries(char *id,
 	struct acpi_subtable_header *entry;
 	unsigned int count = 0;
 	unsigned long table_end;
+	acpi_size tbl_size;
 
 	if (!handler)
 		return -EINVAL;
 
 	if (strncmp(id, ACPI_SIG_MADT, 4) == 0)
-		acpi_get_table(id, acpi_apic_instance, &table_header);
+		acpi_get_table_with_size(id, acpi_apic_instance, &table_header, &tbl_size);
 	else
-		acpi_get_table(id, 0, &table_header);
+		acpi_get_table_with_size(id, 0, &table_header, &tbl_size);
 
 	if (!table_header) {
 		printk(KERN_WARNING PREFIX "%4.4s not present\n", id);
@@ -206,8 +207,10 @@ acpi_table_parse_entries(char *id,
 	       table_end) {
 		if (entry->type == entry_id
 		    && (!max_entries || count++ < max_entries))
-			if (handler(entry, table_end))
+			if (handler(entry, table_end)) {
+				early_acpi_os_unmap_memory((char *)table_header, tbl_size);
 				return -EINVAL;
+			}
 
 		entry = (struct acpi_subtable_header *)
 		    ((unsigned long)entry + entry->length);
@@ -217,6 +220,7 @@ acpi_table_parse_entries(char *id,
 		       "%i found\n", id, entry_id, count - max_entries, count);
 	}
 
+	early_acpi_os_unmap_memory((char *)table_header, tbl_size);
 	return count;
 }
 
@@ -241,17 +245,19 @@ acpi_table_parse_madt(enum acpi_madt_type id,
 int __init acpi_table_parse(char *id, acpi_table_handler handler)
 {
 	struct acpi_table_header *table = NULL;
+	acpi_size tbl_size;
 
 	if (!handler)
 		return -EINVAL;
 
 	if (strncmp(id, ACPI_SIG_MADT, 4) == 0)
-		acpi_get_table(id, acpi_apic_instance, &table);
+		acpi_get_table_with_size(id, acpi_apic_instance, &table, &tbl_size);
 	else
-		acpi_get_table(id, 0, &table);
+		acpi_get_table_with_size(id, 0, &table, &tbl_size);
 
 	if (table) {
 		handler(table);
+		early_acpi_os_unmap_memory(table, tbl_size);
 		return 0;
 	} else
 		return 1;
@@ -265,8 +271,9 @@ int __init acpi_table_parse(char *id, acpi_table_handler handler)
 static void __init check_multiple_madt(void)
 {
 	struct acpi_table_header *table = NULL;
+	acpi_size tbl_size;
 
-	acpi_get_table(ACPI_SIG_MADT, 2, &table);
+	acpi_get_table_with_size(ACPI_SIG_MADT, 2, &table, &tbl_size);
 	if (table) {
 		printk(KERN_WARNING PREFIX
 		       "BIOS bug: multiple APIC/MADT found,"
@@ -275,6 +282,7 @@ static void __init check_multiple_madt(void)
 		       "If \"acpi_apic_instance=%d\" works better, "
 		       "notify linux-acpi@vger.kernel.org\n",
 		       acpi_apic_instance ? 0 : 2);
+		early_acpi_os_unmap_memory(table, tbl_size);
 
 	} else
 		acpi_apic_instance = 0;
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 719ee5c1c8d9..5b257a57bc57 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -107,7 +107,7 @@ static SYSDEV_ATTR(crash_notes, 0400, show_crash_notes, NULL);
 /*
  * Print cpu online, possible, present, and system maps
  */
-static ssize_t print_cpus_map(char *buf, cpumask_t *map)
+static ssize_t print_cpus_map(char *buf, const struct cpumask *map)
 {
 	int n = cpulist_scnprintf(buf, PAGE_SIZE-2, map);
 
diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index a778fb52b11f..bf6b13206d00 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -31,7 +31,10 @@
 #include <linux/hardirq.h>
 #include <linux/topology.h>
 
-#define define_one_ro(_name) 		\
+#define define_one_ro_named(_name, _func)				\
+static SYSDEV_ATTR(_name, 0444, _func, NULL)
+
+#define define_one_ro(_name)				\
 static SYSDEV_ATTR(_name, 0444, show_##_name, NULL)
 
 #define define_id_show_func(name)				\
@@ -42,8 +45,8 @@ static ssize_t show_##name(struct sys_device *dev,		\
 	return sprintf(buf, "%d\n", topology_##name(cpu));	\
 }
 
-#if defined(topology_thread_siblings) || defined(topology_core_siblings)
-static ssize_t show_cpumap(int type, cpumask_t *mask, char *buf)
+#if defined(topology_thread_cpumask) || defined(topology_core_cpumask)
+static ssize_t show_cpumap(int type, const struct cpumask *mask, char *buf)
 {
 	ptrdiff_t len = PTR_ALIGN(buf + PAGE_SIZE - 1, PAGE_SIZE) - buf;
 	int n = 0;
@@ -65,7 +68,7 @@ static ssize_t show_##name(struct sys_device *dev,			\
 			   struct sysdev_attribute *attr, char *buf)	\
 {									\
 	unsigned int cpu = dev->id;					\
-	return show_cpumap(0, &(topology_##name(cpu)), buf);		\
+	return show_cpumap(0, topology_##name(cpu), buf);		\
 }
 
 #define define_siblings_show_list(name)					\
@@ -74,7 +77,7 @@ static ssize_t show_##name##_list(struct sys_device *dev,		\
 				  char *buf)				\
 {									\
 	unsigned int cpu = dev->id;					\
-	return show_cpumap(1, &(topology_##name(cpu)), buf);		\
+	return show_cpumap(1, topology_##name(cpu), buf);		\
 }
 
 #else
@@ -82,9 +85,7 @@ static ssize_t show_##name##_list(struct sys_device *dev,		\
 static ssize_t show_##name(struct sys_device *dev,			\
 			   struct sysdev_attribute *attr, char *buf)	\
 {									\
-	unsigned int cpu = dev->id;					\
-	cpumask_t mask = topology_##name(cpu);				\
-	return show_cpumap(0, &mask, buf);				\
+	return show_cpumap(0, topology_##name(dev->id), buf);		\
 }
 
 #define define_siblings_show_list(name)					\
@@ -92,9 +93,7 @@ static ssize_t show_##name##_list(struct sys_device *dev,		\
 				  struct sysdev_attribute *attr,	\
 				  char *buf)				\
 {									\
-	unsigned int cpu = dev->id;					\
-	cpumask_t mask = topology_##name(cpu);				\
-	return show_cpumap(1, &mask, buf);				\
+	return show_cpumap(1, topology_##name(dev->id), buf);		\
 }
 #endif
 
@@ -107,13 +106,13 @@ define_one_ro(physical_package_id);
 define_id_show_func(core_id);
 define_one_ro(core_id);
 
-define_siblings_show_func(thread_siblings);
-define_one_ro(thread_siblings);
-define_one_ro(thread_siblings_list);
+define_siblings_show_func(thread_cpumask);
+define_one_ro_named(thread_siblings, show_thread_cpumask);
+define_one_ro_named(thread_siblings_list, show_thread_cpumask_list);
 
-define_siblings_show_func(core_siblings);
-define_one_ro(core_siblings);
-define_one_ro(core_siblings_list);
+define_siblings_show_func(core_cpumask);
+define_one_ro_named(core_siblings, show_core_cpumask);
+define_one_ro_named(core_siblings_list, show_core_cpumask_list);
 
 static struct attribute *default_attrs[] = {
 	&attr_physical_package_id.attr,
diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index e1129fad96dd..ee19b6e8fcb4 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -143,7 +143,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_LE,
 #endif
 
 #ifndef CONFIG_X86_64
-#include "mach_timer.h"
+#include <asm/mach_timer.h>
 #define PMTMR_EXPECTED_RATE \
   ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))
 /*
diff --git a/drivers/clocksource/cyclone.c b/drivers/clocksource/cyclone.c
index 1bde303b970b..8615059a8729 100644
--- a/drivers/clocksource/cyclone.c
+++ b/drivers/clocksource/cyclone.c
@@ -7,7 +7,7 @@
 #include <asm/pgtable.h>
 #include <asm/io.h>
 
-#include "mach_timer.h"
+#include <asm/mach_timer.h>
 
 #define CYCLONE_CBAR_ADDR	0xFEB00CD0	/* base address ptr */
 #define CYCLONE_PMCC_OFFSET	0x51A0		/* offset to control register */
diff --git a/drivers/eisa/Kconfig b/drivers/eisa/Kconfig
index c0646576cf47..2705284f6223 100644
--- a/drivers/eisa/Kconfig
+++ b/drivers/eisa/Kconfig
@@ -3,7 +3,7 @@
 #
 config EISA_VLB_PRIMING
 	bool "Vesa Local Bus priming"
-	depends on X86_PC && EISA
+	depends on X86 && EISA
 	default n
 	---help---
 	  Activate this option if your system contains a Vesa Local
@@ -24,11 +24,11 @@ config EISA_PCI_EISA
 	  When in doubt, say Y.
 
 # Using EISA_VIRTUAL_ROOT on something other than an Alpha or
-# an X86_PC may lead to crashes...
+# an X86 may lead to crashes...
 
 config EISA_VIRTUAL_ROOT
 	bool "EISA virtual root device"
-	depends on EISA && (ALPHA || X86_PC)
+	depends on EISA && (ALPHA || X86)
 	default y
 	---help---
 	  Activate this option if your system only have EISA bus
diff --git a/drivers/firmware/dcdbas.c b/drivers/firmware/dcdbas.c
index 777fba48d2d3..3009e0171e54 100644
--- a/drivers/firmware/dcdbas.c
+++ b/drivers/firmware/dcdbas.c
@@ -244,7 +244,7 @@ static ssize_t host_control_on_shutdown_store(struct device *dev,
  */
 int dcdbas_smi_request(struct smi_cmd *smi_cmd)
 {
-	cpumask_t old_mask;
+	cpumask_var_t old_mask;
 	int ret = 0;
 
 	if (smi_cmd->magic != SMI_CMD_MAGIC) {
@@ -254,8 +254,11 @@ int dcdbas_smi_request(struct smi_cmd *smi_cmd)
 	}
 
 	/* SMI requires CPU 0 */
-	old_mask = current->cpus_allowed;
-	set_cpus_allowed_ptr(current, &cpumask_of_cpu(0));
+	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_copy(old_mask, &current->cpus_allowed);
+	set_cpus_allowed_ptr(current, cpumask_of(0));
 	if (smp_processor_id() != 0) {
 		dev_dbg(&dcdbas_pdev->dev, "%s: failed to get CPU 0\n",
 			__func__);
@@ -275,7 +278,8 @@ int dcdbas_smi_request(struct smi_cmd *smi_cmd)
 	);
 
 out:
-	set_cpus_allowed_ptr(current, &old_mask);
+	set_cpus_allowed_ptr(current, old_mask);
+	free_cpumask_var(old_mask);
 	return ret;
 }
 
diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c
index 3ab3e4a41d67..7b7ddc2d51c9 100644
--- a/drivers/firmware/iscsi_ibft.c
+++ b/drivers/firmware/iscsi_ibft.c
@@ -938,8 +938,8 @@ static int __init ibft_init(void)
 		return -ENOMEM;
 
 	if (ibft_addr) {
-		printk(KERN_INFO "iBFT detected at 0x%lx.\n",
-		       virt_to_phys((void *)ibft_addr));
+		printk(KERN_INFO "iBFT detected at 0x%llx.\n",
+		       (u64)virt_to_phys((void *)ibft_addr));
 
 		rc = ibft_check_device();
 		if (rc)
diff --git a/drivers/gpu/drm/drm_info.c b/drivers/gpu/drm/drm_info.c
index fc98952b9033..1b699768ccfb 100644
--- a/drivers/gpu/drm/drm_info.c
+++ b/drivers/gpu/drm/drm_info.c
@@ -286,9 +286,9 @@ int drm_vma_info(struct seq_file *m, void *data)
 #endif
 
 	mutex_lock(&dev->struct_mutex);
-	seq_printf(m, "vma use count: %d, high_memory = %p, 0x%08lx\n",
+	seq_printf(m, "vma use count: %d, high_memory = %p, 0x%08llx\n",
 		   atomic_read(&dev->vma_count),
-		   high_memory, virt_to_phys(high_memory));
+		   high_memory, (u64)virt_to_phys(high_memory));
 
 	list_for_each_entry(pt, &dev->vmalist, head) {
 		vma = pt->vma;
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 12876392516e..13d7674b293d 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -168,7 +168,7 @@ iscsi_iser_mtask_xmit(struct iscsi_conn *conn, struct iscsi_task *task)
 {
 	int error = 0;
 
-	debug_scsi("task deq [cid %d itt 0x%x]\n", conn->id, task->itt);
+	iser_dbg("task deq [cid %d itt 0x%x]\n", conn->id, task->itt);
 
 	error = iser_send_control(conn, task);
 
@@ -195,7 +195,7 @@ iscsi_iser_task_xmit_unsol_data(struct iscsi_conn *conn,
 	/* Send data-out PDUs while there's still unsolicited data to send */
 	while (iscsi_task_has_unsol_data(task)) {
 		iscsi_prep_data_out_pdu(task, r2t, &hdr);
-		debug_scsi("Sending data-out: itt 0x%x, data count %d\n",
+		iser_dbg("Sending data-out: itt 0x%x, data count %d\n",
 			   hdr.itt, r2t->data_count);
 
 		/* the buffer description has been passed with the command */
@@ -206,7 +206,7 @@ iscsi_iser_task_xmit_unsol_data(struct iscsi_conn *conn,
 			goto iscsi_iser_task_xmit_unsol_data_exit;
 		}
 		r2t->sent += r2t->data_count;
-		debug_scsi("Need to send %d more as data-out PDUs\n",
+		iser_dbg("Need to send %d more as data-out PDUs\n",
 			   r2t->data_length - r2t->sent);
 	}
 
@@ -227,12 +227,12 @@ iscsi_iser_task_xmit(struct iscsi_task *task)
 	if (task->sc->sc_data_direction == DMA_TO_DEVICE) {
 		BUG_ON(scsi_bufflen(task->sc) == 0);
 
-		debug_scsi("cmd [itt %x total %d imm %d unsol_data %d\n",
+		iser_dbg("cmd [itt %x total %d imm %d unsol_data %d\n",
 			   task->itt, scsi_bufflen(task->sc),
 			   task->imm_count, task->unsol_r2t.data_length);
 	}
 
-	debug_scsi("task deq [cid %d itt 0x%x]\n",
+	iser_dbg("task deq [cid %d itt 0x%x]\n",
 		   conn->id, task->itt);
 
 	/* Send the cmd PDU */
@@ -397,14 +397,14 @@ static void iscsi_iser_session_destroy(struct iscsi_cls_session *cls_session)
 static struct iscsi_cls_session *
 iscsi_iser_session_create(struct iscsi_endpoint *ep,
 			  uint16_t cmds_max, uint16_t qdepth,
-			  uint32_t initial_cmdsn, uint32_t *hostno)
+			  uint32_t initial_cmdsn)
 {
 	struct iscsi_cls_session *cls_session;
 	struct iscsi_session *session;
 	struct Scsi_Host *shost;
 	struct iser_conn *ib_conn;
 
-	shost = iscsi_host_alloc(&iscsi_iser_sht, 0, ISCSI_MAX_CMD_PER_LUN);
+	shost = iscsi_host_alloc(&iscsi_iser_sht, 0, 1);
 	if (!shost)
 		return NULL;
 	shost->transportt = iscsi_iser_scsi_transport;
@@ -423,7 +423,6 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
 	if (iscsi_host_add(shost,
 			   ep ? ib_conn->device->ib_device->dma_device : NULL))
 		goto free_host;
-	*hostno = shost->host_no;
 
 	/*
 	 * we do not support setting can_queue cmd_per_lun from userspace yet
@@ -596,7 +595,7 @@ static struct scsi_host_template iscsi_iser_sht = {
 	.change_queue_depth	= iscsi_change_queue_depth,
 	.sg_tablesize           = ISCSI_ISER_SG_TABLESIZE,
 	.max_sectors		= 1024,
-	.cmd_per_lun            = ISCSI_MAX_CMD_PER_LUN,
+	.cmd_per_lun            = ISER_DEF_CMD_PER_LUN,
 	.eh_abort_handler       = iscsi_eh_abort,
 	.eh_device_reset_handler= iscsi_eh_device_reset,
 	.eh_target_reset_handler= iscsi_eh_target_reset,
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 861119593f2b..9d529cae1f0d 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -93,7 +93,7 @@
 
 					/* support upto 512KB in one RDMA */
 #define ISCSI_ISER_SG_TABLESIZE         (0x80000 >> SHIFT_4K)
-#define ISCSI_ISER_MAX_LUN		256
+#define ISER_DEF_CMD_PER_LUN		128
 
 /* QP settings */
 /* Maximal bounds on received asynchronous PDUs */
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index e209cb8dd948..9de640200ad3 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -661,7 +661,7 @@ void iser_snd_completion(struct iser_desc *tx_desc)
 
 	if (resume_tx) {
 		iser_dbg("%ld resuming tx\n",jiffies);
-		scsi_queue_work(conn->session->host, &conn->xmitwork);
+		iscsi_conn_queue_work(conn);
 	}
 
 	if (tx_desc->type == ISCSI_TX_CONTROL) {
diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig
index 35561689ff38..ea2638b41982 100644
--- a/drivers/input/keyboard/Kconfig
+++ b/drivers/input/keyboard/Kconfig
@@ -13,11 +13,11 @@ menuconfig INPUT_KEYBOARD
 if INPUT_KEYBOARD
 
 config KEYBOARD_ATKBD
-	tristate "AT keyboard" if EMBEDDED || !X86_PC
+	tristate "AT keyboard" if EMBEDDED || !X86
 	default y
 	select SERIO
 	select SERIO_LIBPS2
-	select SERIO_I8042 if X86_PC
+	select SERIO_I8042 if X86
 	select SERIO_GSCPS2 if GSC
 	help
 	  Say Y here if you want to use a standard AT or PS/2 keyboard. Usually
diff --git a/drivers/input/mouse/Kconfig b/drivers/input/mouse/Kconfig
index 9705f3a00a3d..4f38e6f7dfdd 100644
--- a/drivers/input/mouse/Kconfig
+++ b/drivers/input/mouse/Kconfig
@@ -17,7 +17,7 @@ config MOUSE_PS2
 	default y
 	select SERIO
 	select SERIO_LIBPS2
-	select SERIO_I8042 if X86_PC
+	select SERIO_I8042 if X86
 	select SERIO_GSCPS2 if GSC
 	help
 	  Say Y here if you have a PS/2 mouse connected to your system. This
diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
index 76f2b36881c3..a3d3cbab359a 100644
--- a/drivers/lguest/Kconfig
+++ b/drivers/lguest/Kconfig
@@ -1,6 +1,6 @@
 config LGUEST
 	tristate "Linux hypervisor example code"
-	depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX && !X86_VOYAGER
+	depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
 	select HVC_DRIVER
 	---help---
 	  This is a very simple module which allows you to run
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index c64e6798878a..1c484084ed4f 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -162,7 +162,7 @@ config ENCLOSURE_SERVICES
 config SGI_XP
 	tristate "Support communication between SGI SSIs"
 	depends on NET
-	depends on (IA64_GENERIC || IA64_SGI_SN2 || IA64_SGI_UV || X86_64) && SMP
+	depends on (IA64_GENERIC || IA64_SGI_SN2 || IA64_SGI_UV || X86_UV) && SMP
 	select IA64_UNCACHED_ALLOCATOR if IA64_GENERIC || IA64_SGI_SN2
 	select GENERIC_ALLOCATOR if IA64_GENERIC || IA64_SGI_SN2
 	select SGI_GRU if (IA64_GENERIC || IA64_SGI_UV || X86_64) && SMP
@@ -189,7 +189,7 @@ config HP_ILO
 
 config SGI_GRU
 	tristate "SGI GRU driver"
-	depends on (X86_64 || IA64_SGI_UV || IA64_GENERIC) && SMP
+	depends on (X86_UV || IA64_SGI_UV || IA64_GENERIC) && SMP
 	default n
 	select MMU_NOTIFIER
 	---help---
diff --git a/drivers/misc/sgi-gru/grufile.c b/drivers/misc/sgi-gru/grufile.c
index 650983806392..c67e4e8bd62c 100644
--- a/drivers/misc/sgi-gru/grufile.c
+++ b/drivers/misc/sgi-gru/grufile.c
@@ -36,23 +36,11 @@
 #include <linux/interrupt.h>
 #include <linux/proc_fs.h>
 #include <linux/uaccess.h>
+#include <asm/uv/uv.h>
 #include "gru.h"
 #include "grulib.h"
 #include "grutables.h"
 
-#if defined CONFIG_X86_64
-#include <asm/genapic.h>
-#include <asm/irq.h>
-#define IS_UV()		is_uv_system()
-#elif defined CONFIG_IA64
-#include <asm/system.h>
-#include <asm/sn/simulator.h>
-/* temp support for running on hardware simulator */
-#define IS_UV()		IS_MEDUSA() || ia64_platform_is("uv")
-#else
-#define IS_UV()		0
-#endif
-
 #include <asm/uv/uv_hub.h>
 #include <asm/uv/uv_mmrs.h>
 
@@ -381,7 +369,7 @@ static int __init gru_init(void)
 	char id[10];
 	void *gru_start_vaddr;
 
-	if (!IS_UV())
+	if (!is_uv_system())
 		return 0;
 
 #if defined CONFIG_IA64
@@ -451,7 +439,7 @@ static void __exit gru_exit(void)
 	int order = get_order(sizeof(struct gru_state) *
 			      GRU_CHIPLETS_PER_BLADE);
 
-	if (!IS_UV())
+	if (!is_uv_system())
 		return;
 
 	for (i = 0; i < GRU_CHIPLETS_PER_BLADE; i++)
diff --git a/drivers/misc/sgi-xp/xp.h b/drivers/misc/sgi-xp/xp.h
index 7b4cbd5e03e9..2275126cb334 100644
--- a/drivers/misc/sgi-xp/xp.h
+++ b/drivers/misc/sgi-xp/xp.h
@@ -15,19 +15,19 @@
 
 #include <linux/mutex.h>
 
-#ifdef CONFIG_IA64
+#if defined CONFIG_X86_UV || defined CONFIG_IA64_SGI_UV
+#include <asm/uv/uv.h>
+#define is_uv()		is_uv_system()
+#endif
+
+#ifndef is_uv
+#define is_uv()		0
+#endif
+
+#if defined CONFIG_IA64
 #include <asm/system.h>
 #include <asm/sn/arch.h>	/* defines is_shub1() and is_shub2() */
 #define is_shub()	ia64_platform_is("sn2")
-#ifdef CONFIG_IA64_SGI_UV
-#define is_uv()		ia64_platform_is("uv")
-#else
-#define is_uv()		0
-#endif
-#endif
-#ifdef CONFIG_X86_64
-#include <asm/genapic.h>
-#define is_uv()		is_uv_system()
 #endif
 
 #ifndef is_shub1
@@ -42,10 +42,6 @@
 #define is_shub()	0
 #endif
 
-#ifndef is_uv
-#define is_uv()		0
-#endif
-
 #ifdef USE_DBUG_ON
 #define DBUG_ON(condition)	BUG_ON(condition)
 #else
diff --git a/drivers/misc/sgi-xp/xpc_main.c b/drivers/misc/sgi-xp/xpc_main.c
index 89218f7cfaa7..6576170de962 100644
--- a/drivers/misc/sgi-xp/xpc_main.c
+++ b/drivers/misc/sgi-xp/xpc_main.c
@@ -318,7 +318,7 @@ xpc_hb_checker(void *ignore)
 
 	/* this thread was marked active by xpc_hb_init() */
 
-	set_cpus_allowed_ptr(current, &cpumask_of_cpu(XPC_HB_CHECK_CPU));
+	set_cpus_allowed_ptr(current, cpumask_of(XPC_HB_CHECK_CPU));
 
 	/* set our heartbeating to other partitions into motion */
 	xpc_hb_check_timeout = jiffies + (xpc_hb_check_interval * HZ);
diff --git a/drivers/mtd/nand/Kconfig b/drivers/mtd/nand/Kconfig
index 8b12e6e109d3..2ff88791cebc 100644
--- a/drivers/mtd/nand/Kconfig
+++ b/drivers/mtd/nand/Kconfig
@@ -273,7 +273,7 @@ config MTD_NAND_CAFE
 
 config MTD_NAND_CS553X
 	tristate "NAND support for CS5535/CS5536 (AMD Geode companion chip)"
-	depends on X86_32 && (X86_PC || X86_GENERICARCH)
+	depends on X86_32
 	help
 	  The CS553x companion chips for the AMD Geode processor
 	  include NAND flash controllers with built-in hardware ECC
diff --git a/drivers/net/ne3210.c b/drivers/net/ne3210.c
index fac43fd6fc87..6a843f7350ab 100644
--- a/drivers/net/ne3210.c
+++ b/drivers/net/ne3210.c
@@ -150,7 +150,8 @@ static int __init ne3210_eisa_probe (struct device *device)
 		if (phys_mem < virt_to_phys(high_memory)) {
 			printk(KERN_CRIT "ne3210.c: Card RAM overlaps with normal memory!!!\n");
 			printk(KERN_CRIT "ne3210.c: Use EISA SCU to set card memory below 1MB,\n");
-			printk(KERN_CRIT "ne3210.c: or to an address above 0x%lx.\n", virt_to_phys(high_memory));
+			printk(KERN_CRIT "ne3210.c: or to an address above 0x%llx.\n",
+				(u64)virt_to_phys(high_memory));
 			printk(KERN_CRIT "ne3210.c: Driver NOT installed.\n");
 			retval = -EINVAL;
 			goto out3;
diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 6eff9ca6c6c8..00c23b1babca 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -894,20 +894,27 @@ static void efx_fini_io(struct efx_nic *efx)
  * interrupts across them. */
 static int efx_wanted_rx_queues(void)
 {
-	cpumask_t core_mask;
+	cpumask_var_t core_mask;
 	int count;
 	int cpu;
 
-	cpus_clear(core_mask);
+	if (!alloc_cpumask_var(&core_mask, GFP_KERNEL)) {
+		printk(KERN_WARNING
+		       "efx.c: allocation failure, irq balancing hobbled\n");
+		return 1;
+	}
+
+	cpumask_clear(core_mask);
 	count = 0;
 	for_each_online_cpu(cpu) {
-		if (!cpu_isset(cpu, core_mask)) {
+		if (!cpumask_test_cpu(cpu, core_mask)) {
 			++count;
-			cpus_or(core_mask, core_mask,
-				topology_core_siblings(cpu));
+			cpumask_or(core_mask, core_mask,
+				   topology_core_cpumask(cpu));
 		}
 	}
 
+	free_cpumask_var(core_mask);
 	return count;
 }
 
diff --git a/drivers/net/sfc/falcon.c b/drivers/net/sfc/falcon.c
index 23a1b148d5b2..d4629ab2c614 100644
--- a/drivers/net/sfc/falcon.c
+++ b/drivers/net/sfc/falcon.c
@@ -340,10 +340,10 @@ static int falcon_alloc_special_buffer(struct efx_nic *efx,
 	nic_data->next_buffer_table += buffer->entries;
 
 	EFX_LOG(efx, "allocating special buffers %d-%d at %llx+%x "
-		"(virt %p phys %lx)\n", buffer->index,
+		"(virt %p phys %llx)\n", buffer->index,
 		buffer->index + buffer->entries - 1,
-		(unsigned long long)buffer->dma_addr, len,
-		buffer->addr, virt_to_phys(buffer->addr));
+		(u64)buffer->dma_addr, len,
+		buffer->addr, (u64)virt_to_phys(buffer->addr));
 
 	return 0;
 }
@@ -355,10 +355,10 @@ static void falcon_free_special_buffer(struct efx_nic *efx,
 		return;
 
 	EFX_LOG(efx, "deallocating special buffers %d-%d at %llx+%x "
-		"(virt %p phys %lx)\n", buffer->index,
+		"(virt %p phys %llx)\n", buffer->index,
 		buffer->index + buffer->entries - 1,
-		(unsigned long long)buffer->dma_addr, buffer->len,
-		buffer->addr, virt_to_phys(buffer->addr));
+		(u64)buffer->dma_addr, buffer->len,
+		buffer->addr, (u64)virt_to_phys(buffer->addr));
 
 	pci_free_consistent(efx->pci_dev, buffer->len, buffer->addr,
 			    buffer->dma_addr);
@@ -2357,10 +2357,10 @@ int falcon_probe_port(struct efx_nic *efx)
 				 FALCON_MAC_STATS_SIZE);
 	if (rc)
 		return rc;
-	EFX_LOG(efx, "stats buffer at %llx (virt %p phys %lx)\n",
-		(unsigned long long)efx->stats_buffer.dma_addr,
+	EFX_LOG(efx, "stats buffer at %llx (virt %p phys %llx)\n",
+		(u64)efx->stats_buffer.dma_addr,
 		efx->stats_buffer.addr,
-		virt_to_phys(efx->stats_buffer.addr));
+		(u64)virt_to_phys(efx->stats_buffer.addr));
 
 	return 0;
 }
@@ -2935,9 +2935,9 @@ int falcon_probe_nic(struct efx_nic *efx)
 		goto fail4;
 	BUG_ON(efx->irq_status.dma_addr & 0x0f);
 
-	EFX_LOG(efx, "INT_KER at %llx (virt %p phys %lx)\n",
-		(unsigned long long)efx->irq_status.dma_addr,
-		efx->irq_status.addr, virt_to_phys(efx->irq_status.addr));
+	EFX_LOG(efx, "INT_KER at %llx (virt %p phys %llx)\n",
+		(u64)efx->irq_status.dma_addr,
+		efx->irq_status.addr, (u64)virt_to_phys(efx->irq_status.addr));
 
 	falcon_probe_spi_devices(efx);
 
diff --git a/drivers/net/wireless/arlan-main.c b/drivers/net/wireless/arlan-main.c
index 5b9f1e06ebf6..a54a67c425c8 100644
--- a/drivers/net/wireless/arlan-main.c
+++ b/drivers/net/wireless/arlan-main.c
@@ -1085,8 +1085,8 @@ static int __init arlan_probe_here(struct net_device *dev,
 	if (arlan_check_fingerprint(memaddr))
 		return -ENODEV;
 
-	printk(KERN_NOTICE "%s: Arlan found at %x, \n ", dev->name, 
-	       (int) virt_to_phys((void*)memaddr));
+	printk(KERN_NOTICE "%s: Arlan found at %llx, \n ", dev->name, 
+	       (u64) virt_to_phys((void*)memaddr));
 
 	ap->card = (void *) memaddr;
 	dev->mem_start = memaddr;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 9f102a6535c4..f67325387902 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1511,7 +1511,7 @@ static int xennet_set_tso(struct net_device *dev, u32 data)
 static void xennet_set_features(struct net_device *dev)
 {
 	/* Turn off all GSO bits except ROBUST. */
-	dev->features &= (1 << NETIF_F_GSO_SHIFT) - 1;
+	dev->features &= ~NETIF_F_GSO_MASK;
 	dev->features |= NETIF_F_GSO_ROBUST;
 	xennet_set_sg(dev, 0);
 
diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
index 9da5a4b81133..c3ea5fa7d05a 100644
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -38,7 +38,7 @@
 
 static LIST_HEAD(dying_tasks);
 static LIST_HEAD(dead_tasks);
-static cpumask_t marked_cpus = CPU_MASK_NONE;
+static cpumask_var_t marked_cpus;
 static DEFINE_SPINLOCK(task_mortuary);
 static void process_task_mortuary(void);
 
@@ -456,10 +456,10 @@ static void mark_done(int cpu)
 {
 	int i;
 
-	cpu_set(cpu, marked_cpus);
+	cpumask_set_cpu(cpu, marked_cpus);
 
 	for_each_online_cpu(i) {
-		if (!cpu_isset(i, marked_cpus))
+		if (!cpumask_test_cpu(i, marked_cpus))
 			return;
 	}
 
@@ -468,7 +468,7 @@ static void mark_done(int cpu)
 	 */
 	process_task_mortuary();
 
-	cpus_clear(marked_cpus);
+	cpumask_clear(marked_cpus);
 }
 
 
@@ -565,6 +565,20 @@ void sync_buffer(int cpu)
 	mutex_unlock(&buffer_mutex);
 }
 
+int __init buffer_sync_init(void)
+{
+	if (!alloc_cpumask_var(&marked_cpus, GFP_KERNEL))
+		return -ENOMEM;
+
+	cpumask_clear(marked_cpus);
+		return 0;
+}
+
+void __exit buffer_sync_cleanup(void)
+{
+	free_cpumask_var(marked_cpus);
+}
+
 /* The function can be used to add a buffer worth of data directly to
  * the kernel buffer. The buffer is assumed to be a circular buffer.
  * Take the entries from index start and end at index end, wrapping
diff --git a/drivers/oprofile/buffer_sync.h b/drivers/oprofile/buffer_sync.h
index 3110732c1835..0ebf5db62679 100644
--- a/drivers/oprofile/buffer_sync.h
+++ b/drivers/oprofile/buffer_sync.h
@@ -19,4 +19,8 @@ void sync_stop(void);
 /* sync the given CPU's buffer */
 void sync_buffer(int cpu);
 
+/* initialize/destroy the buffer system. */
+int buffer_sync_init(void);
+void buffer_sync_cleanup(void);
+
 #endif /* OPROFILE_BUFFER_SYNC_H */
diff --git a/drivers/oprofile/oprof.c b/drivers/oprofile/oprof.c
index 3cffce90f82a..ced39f602292 100644
--- a/drivers/oprofile/oprof.c
+++ b/drivers/oprofile/oprof.c
@@ -183,6 +183,10 @@ static int __init oprofile_init(void)
 {
 	int err;
 
+	err = buffer_sync_init();
+	if (err)
+		return err;
+
 	err = oprofile_arch_init(&oprofile_ops);
 
 	if (err < 0 || timer) {
@@ -191,8 +195,10 @@ static int __init oprofile_init(void)
 	}
 
 	err = oprofilefs_register();
-	if (err)
+	if (err) {
 		oprofile_arch_exit();
+		buffer_sync_cleanup();
+	}
 
 	return err;
 }
@@ -202,6 +208,7 @@ static void __exit oprofile_exit(void)
 {
 	oprofilefs_unregister();
 	oprofile_arch_exit();
+	buffer_sync_cleanup();
 }
 
 
diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index 26c536b51c5a..5f333403c2ea 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -42,6 +42,7 @@
 LIST_HEAD(dmar_drhd_units);
 
 static struct acpi_table_header * __initdata dmar_tbl;
+static acpi_size dmar_tbl_size;
 
 static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
 {
@@ -288,8 +289,9 @@ static int __init dmar_table_detect(void)
 	acpi_status status = AE_OK;
 
 	/* if we could find DMAR table, then there are DMAR devices */
-	status = acpi_get_table(ACPI_SIG_DMAR, 0,
-				(struct acpi_table_header **)&dmar_tbl);
+	status = acpi_get_table_with_size(ACPI_SIG_DMAR, 0,
+				(struct acpi_table_header **)&dmar_tbl,
+				&dmar_tbl_size);
 
 	if (ACPI_SUCCESS(status) && !dmar_tbl) {
 		printk (KERN_WARNING PREFIX "Unable to map DMAR\n");
@@ -489,6 +491,7 @@ void __init detect_intel_iommu(void)
 			iommu_detected = 1;
 #endif
 	}
+	early_acpi_os_unmap_memory(dmar_tbl, dmar_tbl_size);
 	dmar_tbl = NULL;
 }
 
diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
index b721c2fbe8f5..9d07a05d26f1 100644
--- a/drivers/pci/intr_remapping.c
+++ b/drivers/pci/intr_remapping.c
@@ -6,6 +6,7 @@
 #include <linux/irq.h>
 #include <asm/io_apic.h>
 #include <asm/smp.h>
+#include <asm/cpu.h>
 #include <linux/intel-iommu.h>
 #include "intr_remapping.h"
 
diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
index 8af7dfbe022c..616c60ffcf2c 100644
--- a/drivers/s390/scsi/zfcp_aux.c
+++ b/drivers/s390/scsi/zfcp_aux.c
@@ -3,7 +3,7 @@
  *
  * Module interface and handling of zfcp data structures.
  *
- * Copyright IBM Corporation 2002, 2008
+ * Copyright IBM Corporation 2002, 2009
  */
 
 /*
@@ -249,8 +249,8 @@ struct zfcp_port *zfcp_get_port_by_wwpn(struct zfcp_adapter *adapter,
 	struct zfcp_port *port;
 
 	list_for_each_entry(port, &adapter->port_list_head, list)
-		if ((port->wwpn == wwpn) && !(atomic_read(&port->status) &
-		      (ZFCP_STATUS_PORT_NO_WWPN | ZFCP_STATUS_COMMON_REMOVE)))
+		if ((port->wwpn == wwpn) &&
+		    !(atomic_read(&port->status) & ZFCP_STATUS_COMMON_REMOVE))
 			return port;
 	return NULL;
 }
@@ -421,7 +421,8 @@ int zfcp_status_read_refill(struct zfcp_adapter *adapter)
 	while (atomic_read(&adapter->stat_miss) > 0)
 		if (zfcp_fsf_status_read(adapter)) {
 			if (atomic_read(&adapter->stat_miss) >= 16) {
-				zfcp_erp_adapter_reopen(adapter, 0, 103, NULL);
+				zfcp_erp_adapter_reopen(adapter, 0, "axsref1",
+							NULL);
 				return 1;
 			}
 			break;
@@ -501,6 +502,7 @@ int zfcp_adapter_enqueue(struct ccw_device *ccw_device)
 	spin_lock_init(&adapter->scsi_dbf_lock);
 	spin_lock_init(&adapter->rec_dbf_lock);
 	spin_lock_init(&adapter->req_q_lock);
+	spin_lock_init(&adapter->qdio_stat_lock);
 
 	rwlock_init(&adapter->erp_lock);
 	rwlock_init(&adapter->abort_lock);
@@ -522,7 +524,6 @@ int zfcp_adapter_enqueue(struct ccw_device *ccw_device)
 		goto sysfs_failed;
 
 	atomic_clear_mask(ZFCP_STATUS_COMMON_REMOVE, &adapter->status);
-	zfcp_fc_nameserver_init(adapter);
 
 	if (!zfcp_adapter_scsi_register(adapter))
 		return 0;
@@ -552,6 +553,7 @@ void zfcp_adapter_dequeue(struct zfcp_adapter *adapter)
 
 	cancel_work_sync(&adapter->scan_work);
 	cancel_work_sync(&adapter->stat_work);
+	cancel_delayed_work_sync(&adapter->nsp.work);
 	zfcp_adapter_scsi_unregister(adapter);
 	sysfs_remove_group(&adapter->ccw_device->dev.kobj,
 			   &zfcp_sysfs_adapter_attrs);
@@ -603,10 +605,13 @@ struct zfcp_port *zfcp_port_enqueue(struct zfcp_adapter *adapter, u64 wwpn,
 	init_waitqueue_head(&port->remove_wq);
 	INIT_LIST_HEAD(&port->unit_list_head);
 	INIT_WORK(&port->gid_pn_work, zfcp_erp_port_strategy_open_lookup);
+	INIT_WORK(&port->test_link_work, zfcp_fc_link_test_work);
+	INIT_WORK(&port->rport_work, zfcp_scsi_rport_work);
 
 	port->adapter = adapter;
 	port->d_id = d_id;
 	port->wwpn = wwpn;
+	port->rport_task = RPORT_NONE;
 
 	/* mark port unusable as long as sysfs registration is not complete */
 	atomic_set_mask(status | ZFCP_STATUS_COMMON_REMOVE, &port->status);
@@ -620,11 +625,10 @@ struct zfcp_port *zfcp_port_enqueue(struct zfcp_adapter *adapter, u64 wwpn,
 	dev_set_drvdata(&port->sysfs_device, port);
 
 	read_lock_irq(&zfcp_data.config_lock);
-	if (!(status & ZFCP_STATUS_PORT_NO_WWPN))
-		if (zfcp_get_port_by_wwpn(adapter, wwpn)) {
-			read_unlock_irq(&zfcp_data.config_lock);
-			goto err_out_free;
-		}
+	if (zfcp_get_port_by_wwpn(adapter, wwpn)) {
+		read_unlock_irq(&zfcp_data.config_lock);
+		goto err_out_free;
+	}
 	read_unlock_irq(&zfcp_data.config_lock);
 
 	if (device_register(&port->sysfs_device))
diff --git a/drivers/s390/scsi/zfcp_ccw.c b/drivers/s390/scsi/zfcp_ccw.c
index 285881f07648..1fe1e2eda512 100644
--- a/drivers/s390/scsi/zfcp_ccw.c
+++ b/drivers/s390/scsi/zfcp_ccw.c
@@ -3,7 +3,7 @@
  *
  * Registration and callback for the s390 common I/O layer.
  *
- * Copyright IBM Corporation 2002, 2008
+ * Copyright IBM Corporation 2002, 2009
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -72,8 +72,7 @@ static void zfcp_ccw_remove(struct ccw_device *ccw_device)
 
 	list_for_each_entry_safe(port, p, &port_remove_lh, list) {
 		list_for_each_entry_safe(unit, u, &unit_remove_lh, list) {
-			if (atomic_read(&unit->status) &
-			    ZFCP_STATUS_UNIT_REGISTERED)
+			if (unit->device)
 				scsi_remove_device(unit->device);
 			zfcp_unit_dequeue(unit);
 		}
@@ -109,11 +108,12 @@ static int zfcp_ccw_set_online(struct ccw_device *ccw_device)
 	/* initialize request counter */
 	BUG_ON(!zfcp_reqlist_isempty(adapter));
 	adapter->req_no = 0;
+	zfcp_fc_nameserver_init(adapter);
 
-	zfcp_erp_modify_adapter_status(adapter, 10, NULL,
+	zfcp_erp_modify_adapter_status(adapter, "ccsonl1", NULL,
 				       ZFCP_STATUS_COMMON_RUNNING, ZFCP_SET);
-	zfcp_erp_adapter_reopen(adapter, ZFCP_STATUS_COMMON_ERP_FAILED, 85,
-				NULL);
+	zfcp_erp_adapter_reopen(adapter, ZFCP_STATUS_COMMON_ERP_FAILED,
+				"ccsonl2", NULL);
 	zfcp_erp_wait(adapter);
 	up(&zfcp_data.config_sema);
 	flush_work(&adapter->scan_work);
@@ -137,7 +137,7 @@ static int zfcp_ccw_set_offline(struct ccw_device *ccw_device)
 
 	down(&zfcp_data.config_sema);
 	adapter = dev_get_drvdata(&ccw_device->dev);
-	zfcp_erp_adapter_shutdown(adapter, 0, 86, NULL);
+	zfcp_erp_adapter_shutdown(adapter, 0, "ccsoff1", NULL);
 	zfcp_erp_wait(adapter);
 	zfcp_erp_thread_kill(adapter);
 	up(&zfcp_data.config_sema);
@@ -160,21 +160,21 @@ static int zfcp_ccw_notify(struct ccw_device *ccw_device, int event)
 	case CIO_GONE:
 		dev_warn(&adapter->ccw_device->dev,
 			 "The FCP device has been detached\n");
-		zfcp_erp_adapter_shutdown(adapter, 0, 87, NULL);
+		zfcp_erp_adapter_shutdown(adapter, 0, "ccnoti1", NULL);
 		break;
 	case CIO_NO_PATH:
 		dev_warn(&adapter->ccw_device->dev,
 			 "The CHPID for the FCP device is offline\n");
-		zfcp_erp_adapter_shutdown(adapter, 0, 88, NULL);
+		zfcp_erp_adapter_shutdown(adapter, 0, "ccnoti2", NULL);
 		break;
 	case CIO_OPER:
 		dev_info(&adapter->ccw_device->dev,
 			 "The FCP device is operational again\n");
-		zfcp_erp_modify_adapter_status(adapter, 11, NULL,
+		zfcp_erp_modify_adapter_status(adapter, "ccnoti3", NULL,
 					       ZFCP_STATUS_COMMON_RUNNING,
 					       ZFCP_SET);
 		zfcp_erp_adapter_reopen(adapter, ZFCP_STATUS_COMMON_ERP_FAILED,
-					89, NULL);
+					"ccnoti4", NULL);
 		break;
 	}
 	return 1;
@@ -190,7 +190,7 @@ static void zfcp_ccw_shutdown(struct ccw_device *cdev)
 
 	down(&zfcp_data.config_sema);
 	adapter = dev_get_drvdata(&cdev->dev);
-	zfcp_erp_adapter_shutdown(adapter, 0, 90, NULL);
+	zfcp_erp_adapter_shutdown(adapter, 0, "ccshut1", NULL);
 	zfcp_erp_wait(adapter);
 	up(&zfcp_data.config_sema);
 }
diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index cb6df609953e..0a1a5dd8d018 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -490,172 +490,17 @@ static const char *zfcp_rec_dbf_tags[] = {
 	[ZFCP_REC_DBF_ID_ACTION] = "action",
 };
 
-static const char *zfcp_rec_dbf_ids[] = {
-	[1]	= "new",
-	[2]	= "ready",
-	[3]	= "kill",
-	[4]	= "down sleep",
-	[5]	= "down wakeup",
-	[6]	= "down sleep ecd",
-	[7]	= "down wakeup ecd",
-	[8]	= "down sleep epd",
-	[9]	= "down wakeup epd",
-	[10]	= "online",
-	[11]	= "operational",
-	[12]	= "scsi slave destroy",
-	[13]	= "propagate failed adapter",
-	[14]	= "propagate failed port",
-	[15]	= "block adapter",
-	[16]	= "unblock adapter",
-	[17]	= "block port",
-	[18]	= "unblock port",
-	[19]	= "block unit",
-	[20]	= "unblock unit",
-	[21]	= "unit recovery failed",
-	[22]	= "port recovery failed",
-	[23]	= "adapter recovery failed",
-	[24]	= "qdio queues down",
-	[25]	= "p2p failed",
-	[26]	= "nameserver lookup failed",
-	[27]	= "nameserver port failed",
-	[28]	= "link up",
-	[29]	= "link down",
-	[30]	= "link up status read",
-	[31]	= "open port failed",
-	[32]	= "",
-	[33]	= "close port",
-	[34]	= "open unit failed",
-	[35]	= "exclusive open unit failed",
-	[36]	= "shared open unit failed",
-	[37]	= "link down",
-	[38]	= "link down status read no link",
-	[39]	= "link down status read fdisc login",
-	[40]	= "link down status read firmware update",
-	[41]	= "link down status read unknown reason",
-	[42]	= "link down ecd incomplete",
-	[43]	= "link down epd incomplete",
-	[44]	= "sysfs adapter recovery",
-	[45]	= "sysfs port recovery",
-	[46]	= "sysfs unit recovery",
-	[47]	= "port boxed abort",
-	[48]	= "unit boxed abort",
-	[49]	= "port boxed ct",
-	[50]	= "port boxed close physical",
-	[51]	= "port boxed open unit",
-	[52]	= "port boxed close unit",
-	[53]	= "port boxed fcp",
-	[54]	= "unit boxed fcp",
-	[55]	= "port access denied",
-	[56]	= "",
-	[57]	= "",
-	[58]	= "",
-	[59]	= "unit access denied",
-	[60]	= "shared unit access denied open unit",
-	[61]	= "",
-	[62]	= "request timeout",
-	[63]	= "adisc link test reject or timeout",
-	[64]	= "adisc link test d_id changed",
-	[65]	= "adisc link test failed",
-	[66]	= "recovery out of memory",
-	[67]	= "adapter recovery repeated after state change",
-	[68]	= "port recovery repeated after state change",
-	[69]	= "unit recovery repeated after state change",
-	[70]	= "port recovery follow-up after successful adapter recovery",
-	[71]	= "adapter recovery escalation after failed adapter recovery",
-	[72]	= "port recovery follow-up after successful physical port "
-		  "recovery",
-	[73]	= "adapter recovery escalation after failed physical port "
-		  "recovery",
-	[74]	= "unit recovery follow-up after successful port recovery",
-	[75]	= "physical port recovery escalation after failed port "
-		  "recovery",
-	[76]	= "port recovery escalation after failed unit recovery",
-	[77]	= "",
-	[78]	= "duplicate request id",
-	[79]	= "link down",
-	[80]	= "exclusive read-only unit access unsupported",
-	[81]	= "shared read-write unit access unsupported",
-	[82]	= "incoming rscn",
-	[83]	= "incoming wwpn",
-	[84]	= "wka port handle not valid close port",
-	[85]	= "online",
-	[86]	= "offline",
-	[87]	= "ccw device gone",
-	[88]	= "ccw device no path",
-	[89]	= "ccw device operational",
-	[90]	= "ccw device shutdown",
-	[91]	= "sysfs port addition",
-	[92]	= "sysfs port removal",
-	[93]	= "sysfs adapter recovery",
-	[94]	= "sysfs unit addition",
-	[95]	= "sysfs unit removal",
-	[96]	= "sysfs port recovery",
-	[97]	= "sysfs unit recovery",
-	[98]	= "sequence number mismatch",
-	[99]	= "link up",
-	[100]	= "error state",
-	[101]	= "status read physical port closed",
-	[102]	= "link up status read",
-	[103]	= "too many failed status read buffers",
-	[104]	= "port handle not valid abort",
-	[105]	= "lun handle not valid abort",
-	[106]	= "port handle not valid ct",
-	[107]	= "port handle not valid close port",
-	[108]	= "port handle not valid close physical port",
-	[109]	= "port handle not valid open unit",
-	[110]	= "port handle not valid close unit",
-	[111]	= "lun handle not valid close unit",
-	[112]	= "port handle not valid fcp",
-	[113]	= "lun handle not valid fcp",
-	[114]	= "handle mismatch fcp",
-	[115]	= "lun not valid fcp",
-	[116]	= "qdio send failed",
-	[117]	= "version mismatch",
-	[118]	= "incompatible qtcb type",
-	[119]	= "unknown protocol status",
-	[120]	= "unknown fsf command",
-	[121]	= "no recommendation for status qualifier",
-	[122]	= "status read physical port closed in error",
-	[123]	= "fc service class not supported",
-	[124]	= "",
-	[125]	= "need newer zfcp",
-	[126]	= "need newer microcode",
-	[127]	= "arbitrated loop not supported",
-	[128]	= "",
-	[129]	= "qtcb size mismatch",
-	[130]	= "unknown fsf status ecd",
-	[131]	= "fcp request too big",
-	[132]	= "",
-	[133]	= "data direction not valid fcp",
-	[134]	= "command length not valid fcp",
-	[135]	= "status read act update",
-	[136]	= "status read cfdc update",
-	[137]	= "hbaapi port open",
-	[138]	= "hbaapi unit open",
-	[139]	= "hbaapi unit shutdown",
-	[140]	= "qdio error outbound",
-	[141]	= "scsi host reset",
-	[142]	= "dismissing fsf request for recovery action",
-	[143]	= "recovery action timed out",
-	[144]	= "recovery action gone",
-	[145]	= "recovery action being processed",
-	[146]	= "recovery action ready for next step",
-	[147]	= "qdio error inbound",
-	[148]   = "nameserver needed for port scan",
-	[149]   = "port scan",
-	[150]	= "ptp attach",
-	[151]   = "port validation failed",
-};
-
 static int zfcp_rec_dbf_view_format(debug_info_t *id, struct debug_view *view,
 				    char *buf, const char *_rec)
 {
 	struct zfcp_rec_dbf_record *r = (struct zfcp_rec_dbf_record *)_rec;
 	char *p = buf;
+	char hint[ZFCP_DBF_ID_SIZE + 1];
 
+	memcpy(hint, r->id2, ZFCP_DBF_ID_SIZE);
+	hint[ZFCP_DBF_ID_SIZE] = 0;
 	zfcp_dbf_outs(&p, "tag", zfcp_rec_dbf_tags[r->id]);
-	zfcp_dbf_outs(&p, "hint", zfcp_rec_dbf_ids[r->id2]);
-	zfcp_dbf_out(&p, "id", "%d", r->id2);
+	zfcp_dbf_outs(&p, "hint", hint);
 	switch (r->id) {
 	case ZFCP_REC_DBF_ID_THREAD:
 		zfcp_dbf_out(&p, "total", "%d", r->u.thread.total);
@@ -707,7 +552,7 @@ static struct debug_view zfcp_rec_dbf_view = {
  * @adapter: adapter
  * This function assumes that the caller is holding erp_lock.
  */
-void zfcp_rec_dbf_event_thread(u8 id2, struct zfcp_adapter *adapter)
+void zfcp_rec_dbf_event_thread(char *id2, struct zfcp_adapter *adapter)
 {
 	struct zfcp_rec_dbf_record *r = &adapter->rec_dbf_buf;
 	unsigned long flags = 0;
@@ -723,7 +568,7 @@ void zfcp_rec_dbf_event_thread(u8 id2, struct zfcp_adapter *adapter)
 	spin_lock_irqsave(&adapter->rec_dbf_lock, flags);
 	memset(r, 0, sizeof(*r));
 	r->id = ZFCP_REC_DBF_ID_THREAD;
-	r->id2 = id2;
+	memcpy(r->id2, id2, ZFCP_DBF_ID_SIZE);
 	r->u.thread.total = total;
 	r->u.thread.ready = ready;
 	r->u.thread.running = running;
@@ -737,7 +582,7 @@ void zfcp_rec_dbf_event_thread(u8 id2, struct zfcp_adapter *adapter)
  * @adapter: adapter
  * This function assumes that the caller does not hold erp_lock.
  */
-void zfcp_rec_dbf_event_thread_lock(u8 id2, struct zfcp_adapter *adapter)
+void zfcp_rec_dbf_event_thread_lock(char *id2, struct zfcp_adapter *adapter)
 {
 	unsigned long flags;
 
@@ -746,7 +591,7 @@ void zfcp_rec_dbf_event_thread_lock(u8 id2, struct zfcp_adapter *adapter)
 	read_unlock_irqrestore(&adapter->erp_lock, flags);
 }
 
-static void zfcp_rec_dbf_event_target(u8 id2, void *ref,
+static void zfcp_rec_dbf_event_target(char *id2, void *ref,
 				      struct zfcp_adapter *adapter,
 				      atomic_t *status, atomic_t *erp_count,
 				      u64 wwpn, u32 d_id, u64 fcp_lun)
@@ -757,7 +602,7 @@ static void zfcp_rec_dbf_event_target(u8 id2, void *ref,
 	spin_lock_irqsave(&adapter->rec_dbf_lock, flags);
 	memset(r, 0, sizeof(*r));
 	r->id = ZFCP_REC_DBF_ID_TARGET;
-	r->id2 = id2;
+	memcpy(r->id2, id2, ZFCP_DBF_ID_SIZE);
 	r->u.target.ref = (unsigned long)ref;
 	r->u.target.status = atomic_read(status);
 	r->u.target.wwpn = wwpn;
@@ -774,7 +619,8 @@ static void zfcp_rec_dbf_event_target(u8 id2, void *ref,
  * @ref: additional reference (e.g. request)
  * @adapter: adapter
  */
-void zfcp_rec_dbf_event_adapter(u8 id, void *ref, struct zfcp_adapter *adapter)
+void zfcp_rec_dbf_event_adapter(char *id, void *ref,
+				struct zfcp_adapter *adapter)
 {
 	zfcp_rec_dbf_event_target(id, ref, adapter, &adapter->status,
 				  &adapter->erp_counter, 0, 0, 0);
@@ -786,7 +632,7 @@ void zfcp_rec_dbf_event_adapter(u8 id, void *ref, struct zfcp_adapter *adapter)
  * @ref: additional reference (e.g. request)
  * @port: port
  */
-void zfcp_rec_dbf_event_port(u8 id, void *ref, struct zfcp_port *port)
+void zfcp_rec_dbf_event_port(char *id, void *ref, struct zfcp_port *port)
 {
 	struct zfcp_adapter *adapter = port->adapter;
 
@@ -801,7 +647,7 @@ void zfcp_rec_dbf_event_port(u8 id, void *ref, struct zfcp_port *port)
  * @ref: additional reference (e.g. request)
  * @unit: unit
  */
-void zfcp_rec_dbf_event_unit(u8 id, void *ref, struct zfcp_unit *unit)
+void zfcp_rec_dbf_event_unit(char *id, void *ref, struct zfcp_unit *unit)
 {
 	struct zfcp_port *port = unit->port;
 	struct zfcp_adapter *adapter = port->adapter;
@@ -822,7 +668,7 @@ void zfcp_rec_dbf_event_unit(u8 id, void *ref, struct zfcp_unit *unit)
  * @port: port
  * @unit: unit
  */
-void zfcp_rec_dbf_event_trigger(u8 id2, void *ref, u8 want, u8 need,
+void zfcp_rec_dbf_event_trigger(char *id2, void *ref, u8 want, u8 need,
 				void *action, struct zfcp_adapter *adapter,
 				struct zfcp_port *port, struct zfcp_unit *unit)
 {
@@ -832,7 +678,7 @@ void zfcp_rec_dbf_event_trigger(u8 id2, void *ref, u8 want, u8 need,
 	spin_lock_irqsave(&adapter->rec_dbf_lock, flags);
 	memset(r, 0, sizeof(*r));
 	r->id = ZFCP_REC_DBF_ID_TRIGGER;
-	r->id2 = id2;
+	memcpy(r->id2, id2, ZFCP_DBF_ID_SIZE);
 	r->u.trigger.ref = (unsigned long)ref;
 	r->u.trigger.want = want;
 	r->u.trigger.need = need;
@@ -855,7 +701,7 @@ void zfcp_rec_dbf_event_trigger(u8 id2, void *ref, u8 want, u8 need,
  * @id2: identifier
  * @erp_action: error recovery action struct pointer
  */
-void zfcp_rec_dbf_event_action(u8 id2, struct zfcp_erp_action *erp_action)
+void zfcp_rec_dbf_event_action(char *id2, struct zfcp_erp_action *erp_action)
 {
 	struct zfcp_adapter *adapter = erp_action->adapter;
 	struct zfcp_rec_dbf_record *r = &adapter->rec_dbf_buf;
@@ -864,7 +710,7 @@ void zfcp_rec_dbf_event_action(u8 id2, struct zfcp_erp_action *erp_action)
 	spin_lock_irqsave(&adapter->rec_dbf_lock, flags);
 	memset(r, 0, sizeof(*r));
 	r->id = ZFCP_REC_DBF_ID_ACTION;
-	r->id2 = id2;
+	memcpy(r->id2, id2, ZFCP_DBF_ID_SIZE);
 	r->u.action.action = (unsigned long)erp_action;
 	r->u.action.status = erp_action->status;
 	r->u.action.step = erp_action->step;
diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index 74998ff88e57..a573f7344dd6 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -25,6 +25,7 @@
 #include "zfcp_fsf.h"
 
 #define ZFCP_DBF_TAG_SIZE      4
+#define ZFCP_DBF_ID_SIZE       7
 
 struct zfcp_dbf_dump {
 	u8 tag[ZFCP_DBF_TAG_SIZE];
@@ -70,7 +71,7 @@ struct zfcp_rec_dbf_record_action {
 
 struct zfcp_rec_dbf_record {
 	u8 id;
-	u8 id2;
+	char id2[7];
 	union {
 		struct zfcp_rec_dbf_record_action action;
 		struct zfcp_rec_dbf_record_thread thread;
diff --git a/drivers/s390/scsi/zfcp_def.h b/drivers/s390/scsi/zfcp_def.h
index 510662783a6f..a0318630f047 100644
--- a/drivers/s390/scsi/zfcp_def.h
+++ b/drivers/s390/scsi/zfcp_def.h
@@ -3,7 +3,7 @@
  *
  * Global definitions for the zfcp device driver.
  *
- * Copyright IBM Corporation 2002, 2008
+ * Copyright IBM Corporation 2002, 2009
  */
 
 #ifndef ZFCP_DEF_H
@@ -243,9 +243,6 @@ struct zfcp_ls_adisc {
 
 /* remote port status */
 #define ZFCP_STATUS_PORT_PHYS_OPEN		0x00000001
-#define ZFCP_STATUS_PORT_PHYS_CLOSING		0x00000004
-#define ZFCP_STATUS_PORT_NO_WWPN		0x00000008
-#define ZFCP_STATUS_PORT_INVALID_WWPN		0x00000020
 
 /* well known address (WKA) port status*/
 enum zfcp_wka_status {
@@ -258,7 +255,6 @@ enum zfcp_wka_status {
 /* logical unit status */
 #define ZFCP_STATUS_UNIT_SHARED			0x00000004
 #define ZFCP_STATUS_UNIT_READONLY		0x00000008
-#define ZFCP_STATUS_UNIT_REGISTERED		0x00000010
 #define ZFCP_STATUS_UNIT_SCSI_WORK_PENDING	0x00000020
 
 /* FSF request status (this does not have a common part) */
@@ -447,8 +443,9 @@ struct zfcp_adapter {
 	spinlock_t		req_list_lock;	   /* request list lock */
 	struct zfcp_qdio_queue	req_q;		   /* request queue */
 	spinlock_t		req_q_lock;	   /* for operations on queue */
-	int			req_q_pci_batch;   /* SBALs since PCI indication
-						      was last set */
+	ktime_t			req_q_time; /* time of last fill level change */
+	u64			req_q_util; /* for accounting */
+	spinlock_t		qdio_stat_lock;
 	u32			fsf_req_seq_no;	   /* FSF cmnd seq number */
 	wait_queue_head_t	request_wq;	   /* can be used to wait for
 						      more avaliable SBALs */
@@ -514,6 +511,9 @@ struct zfcp_port {
 	u32                    maxframe_size;
 	u32                    supported_classes;
 	struct work_struct     gid_pn_work;
+	struct work_struct     test_link_work;
+	struct work_struct     rport_work;
+	enum { RPORT_NONE, RPORT_ADD, RPORT_DEL }  rport_task;
 };
 
 struct zfcp_unit {
@@ -587,9 +587,6 @@ struct zfcp_fsf_req_qtcb {
 
 /********************** ZFCP SPECIFIC DEFINES ********************************/
 
-#define ZFCP_REQ_AUTO_CLEANUP	0x00000002
-#define ZFCP_REQ_NO_QTCB	0x00000008
-
 #define ZFCP_SET                0x00000100
 #define ZFCP_CLEAR              0x00000200
 
diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c
index 387a3af528ac..631bdb1dfd6c 100644
--- a/drivers/s390/scsi/zfcp_erp.c
+++ b/drivers/s390/scsi/zfcp_erp.c
@@ -3,7 +3,7 @@
  *
  * Error Recovery Procedures (ERP).
  *
- * Copyright IBM Corporation 2002, 2008
+ * Copyright IBM Corporation 2002, 2009
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -55,7 +55,7 @@ enum zfcp_erp_act_result {
 
 static void zfcp_erp_adapter_block(struct zfcp_adapter *adapter, int mask)
 {
-	zfcp_erp_modify_adapter_status(adapter, 15, NULL,
+	zfcp_erp_modify_adapter_status(adapter, "erablk1", NULL,
 				       ZFCP_STATUS_COMMON_UNBLOCKED | mask,
 				       ZFCP_CLEAR);
 }
@@ -75,9 +75,9 @@ static void zfcp_erp_action_ready(struct zfcp_erp_action *act)
 	struct zfcp_adapter *adapter = act->adapter;
 
 	list_move(&act->list, &act->adapter->erp_ready_head);
-	zfcp_rec_dbf_event_action(146, act);
+	zfcp_rec_dbf_event_action("erardy1", act);
 	up(&adapter->erp_ready_sem);
-	zfcp_rec_dbf_event_thread(2, adapter);
+	zfcp_rec_dbf_event_thread("erardy2", adapter);
 }
 
 static void zfcp_erp_action_dismiss(struct zfcp_erp_action *act)
@@ -208,7 +208,7 @@ static struct zfcp_erp_action *zfcp_erp_setup_act(int need,
 
 static int zfcp_erp_action_enqueue(int want, struct zfcp_adapter *adapter,
 				   struct zfcp_port *port,
-				   struct zfcp_unit *unit, u8 id, void *ref)
+				   struct zfcp_unit *unit, char *id, void *ref)
 {
 	int retval = 1, need;
 	struct zfcp_erp_action *act = NULL;
@@ -228,7 +228,7 @@ static int zfcp_erp_action_enqueue(int want, struct zfcp_adapter *adapter,
 	++adapter->erp_total_count;
 	list_add_tail(&act->list, &adapter->erp_ready_head);
 	up(&adapter->erp_ready_sem);
-	zfcp_rec_dbf_event_thread(1, adapter);
+	zfcp_rec_dbf_event_thread("eracte1", adapter);
 	retval = 0;
  out:
 	zfcp_rec_dbf_event_trigger(id, ref, want, need, act,
@@ -237,13 +237,14 @@ static int zfcp_erp_action_enqueue(int want, struct zfcp_adapter *adapter,
 }
 
 static int _zfcp_erp_adapter_reopen(struct zfcp_adapter *adapter,
-				    int clear_mask, u8 id, void *ref)
+				    int clear_mask, char *id, void *ref)
 {
 	zfcp_erp_adapter_block(adapter, clear_mask);
+	zfcp_scsi_schedule_rports_block(adapter);
 
 	/* ensure propagation of failed status to new devices */
 	if (atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_ERP_FAILED) {
-		zfcp_erp_adapter_failed(adapter, 13, NULL);
+		zfcp_erp_adapter_failed(adapter, "erareo1", NULL);
 		return -EIO;
 	}
 	return zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER,
@@ -258,7 +259,7 @@ static int _zfcp_erp_adapter_reopen(struct zfcp_adapter *adapter,
  * @ref: Reference for debug trace event.
  */
 void zfcp_erp_adapter_reopen(struct zfcp_adapter *adapter, int clear,
-			     u8 id, void *ref)
+			     char *id, void *ref)
 {
 	unsigned long flags;
 
@@ -277,7 +278,7 @@ void zfcp_erp_adapter_reopen(struct zfcp_adapter *adapter, int clear,
  * @ref: Reference for debug trace event.
  */
 void zfcp_erp_adapter_shutdown(struct zfcp_adapter *adapter, int clear,
-			       u8 id, void *ref)
+			       char *id, void *ref)
 {
 	int flags = ZFCP_STATUS_COMMON_RUNNING | ZFCP_STATUS_COMMON_ERP_FAILED;
 	zfcp_erp_adapter_reopen(adapter, clear | flags, id, ref);
@@ -290,7 +291,8 @@ void zfcp_erp_adapter_shutdown(struct zfcp_adapter *adapter, int clear,
  * @id: Id for debug trace event.
  * @ref: Reference for debug trace event.
  */
-void zfcp_erp_port_shutdown(struct zfcp_port *port, int clear, u8 id, void *ref)
+void zfcp_erp_port_shutdown(struct zfcp_port *port, int clear, char *id,
+			    void *ref)
 {
 	int flags = ZFCP_STATUS_COMMON_RUNNING | ZFCP_STATUS_COMMON_ERP_FAILED;
 	zfcp_erp_port_reopen(port, clear | flags, id, ref);
@@ -303,7 +305,8 @@ void zfcp_erp_port_shutdown(struct zfcp_port *port, int clear, u8 id, void *ref)
  * @id: Id for debug trace event.
  * @ref: Reference for debug trace event.
  */
-void zfcp_erp_unit_shutdown(struct zfcp_unit *unit, int clear, u8 id, void *ref)
+void zfcp_erp_unit_shutdown(struct zfcp_unit *unit, int clear, char *id,
+			    void *ref)
 {
 	int flags = ZFCP_STATUS_COMMON_RUNNING | ZFCP_STATUS_COMMON_ERP_FAILED;
 	zfcp_erp_unit_reopen(unit, clear | flags, id, ref);
@@ -311,15 +314,16 @@ void zfcp_erp_unit_shutdown(struct zfcp_unit *unit, int clear, u8 id, void *ref)
 
 static void zfcp_erp_port_block(struct zfcp_port *port, int clear)
 {
-	zfcp_erp_modify_port_status(port, 17, NULL,
+	zfcp_erp_modify_port_status(port, "erpblk1", NULL,
 				    ZFCP_STATUS_COMMON_UNBLOCKED | clear,
 				    ZFCP_CLEAR);
 }
 
 static void _zfcp_erp_port_forced_reopen(struct zfcp_port *port,
-					 int clear, u8 id, void *ref)
+					 int clear, char *id, void *ref)
 {
 	zfcp_erp_port_block(port, clear);
+	zfcp_scsi_schedule_rport_block(port);
 
 	if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED)
 		return;
@@ -334,7 +338,7 @@ static void _zfcp_erp_port_forced_reopen(struct zfcp_port *port,
  * @id: Id for debug trace event.
  * @ref: Reference for debug trace event.
  */
-void zfcp_erp_port_forced_reopen(struct zfcp_port *port, int clear, u8 id,
+void zfcp_erp_port_forced_reopen(struct zfcp_port *port, int clear, char *id,
 				 void *ref)
 {
 	unsigned long flags;
@@ -347,14 +351,15 @@ void zfcp_erp_port_forced_reopen(struct zfcp_port *port, int clear, u8 id,
 	read_unlock_irqrestore(&zfcp_data.config_lock, flags);
 }
 
-static int _zfcp_erp_port_reopen(struct zfcp_port *port, int clear, u8 id,
+static int _zfcp_erp_port_reopen(struct zfcp_port *port, int clear, char *id,
 				 void *ref)
 {
 	zfcp_erp_port_block(port, clear);
+	zfcp_scsi_schedule_rport_block(port);
 
 	if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED) {
 		/* ensure propagation of failed status to new devices */
-		zfcp_erp_port_failed(port, 14, NULL);
+		zfcp_erp_port_failed(port, "erpreo1", NULL);
 		return -EIO;
 	}
 
@@ -369,7 +374,7 @@ static int _zfcp_erp_port_reopen(struct zfcp_port *port, int clear, u8 id,
  *
  * Returns 0 if recovery has been triggered, < 0 if not.
  */
-int zfcp_erp_port_reopen(struct zfcp_port *port, int clear, u8 id, void *ref)
+int zfcp_erp_port_reopen(struct zfcp_port *port, int clear, char *id, void *ref)
 {
 	unsigned long flags;
 	int retval;
@@ -386,12 +391,12 @@ int zfcp_erp_port_reopen(struct zfcp_port *port, int clear, u8 id, void *ref)
 
 static void zfcp_erp_unit_block(struct zfcp_unit *unit, int clear_mask)
 {
-	zfcp_erp_modify_unit_status(unit, 19, NULL,
+	zfcp_erp_modify_unit_status(unit, "erublk1", NULL,
 				    ZFCP_STATUS_COMMON_UNBLOCKED | clear_mask,
 				    ZFCP_CLEAR);
 }
 
-static void _zfcp_erp_unit_reopen(struct zfcp_unit *unit, int clear, u8 id,
+static void _zfcp_erp_unit_reopen(struct zfcp_unit *unit, int clear, char *id,
 				  void *ref)
 {
 	struct zfcp_adapter *adapter = unit->port->adapter;
@@ -411,7 +416,8 @@ static void _zfcp_erp_unit_reopen(struct zfcp_unit *unit, int clear, u8 id,
  * @clear_mask: specifies flags in unit status to be cleared
  * Return: 0 on success, < 0 on error
  */
-void zfcp_erp_unit_reopen(struct zfcp_unit *unit, int clear, u8 id, void *ref)
+void zfcp_erp_unit_reopen(struct zfcp_unit *unit, int clear, char *id,
+			  void *ref)
 {
 	unsigned long flags;
 	struct zfcp_port *port = unit->port;
@@ -437,28 +443,28 @@ static int status_change_clear(unsigned long mask, atomic_t *status)
 static void zfcp_erp_adapter_unblock(struct zfcp_adapter *adapter)
 {
 	if (status_change_set(ZFCP_STATUS_COMMON_UNBLOCKED, &adapter->status))
-		zfcp_rec_dbf_event_adapter(16, NULL, adapter);
+		zfcp_rec_dbf_event_adapter("eraubl1", NULL, adapter);
 	atomic_set_mask(ZFCP_STATUS_COMMON_UNBLOCKED, &adapter->status);
 }
 
 static void zfcp_erp_port_unblock(struct zfcp_port *port)
 {
 	if (status_change_set(ZFCP_STATUS_COMMON_UNBLOCKED, &port->status))
-		zfcp_rec_dbf_event_port(18, NULL, port);
+		zfcp_rec_dbf_event_port("erpubl1", NULL, port);
 	atomic_set_mask(ZFCP_STATUS_COMMON_UNBLOCKED, &port->status);
 }
 
 static void zfcp_erp_unit_unblock(struct zfcp_unit *unit)
 {
 	if (status_change_set(ZFCP_STATUS_COMMON_UNBLOCKED, &unit->status))
-		zfcp_rec_dbf_event_unit(20, NULL, unit);
+		zfcp_rec_dbf_event_unit("eruubl1", NULL, unit);
 	atomic_set_mask(ZFCP_STATUS_COMMON_UNBLOCKED, &unit->status);
 }
 
 static void zfcp_erp_action_to_running(struct zfcp_erp_action *erp_action)
 {
 	list_move(&erp_action->list, &erp_action->adapter->erp_running_head);
-	zfcp_rec_dbf_event_action(145, erp_action);
+	zfcp_rec_dbf_event_action("erator1", erp_action);
 }
 
 static void zfcp_erp_strategy_check_fsfreq(struct zfcp_erp_action *act)
@@ -474,11 +480,11 @@ static void zfcp_erp_strategy_check_fsfreq(struct zfcp_erp_action *act)
 		if (act->status & (ZFCP_STATUS_ERP_DISMISSED |
 				   ZFCP_STATUS_ERP_TIMEDOUT)) {
 			act->fsf_req->status |= ZFCP_STATUS_FSFREQ_DISMISSED;
-			zfcp_rec_dbf_event_action(142, act);
+			zfcp_rec_dbf_event_action("erscf_1", act);
 			act->fsf_req->erp_action = NULL;
 		}
 		if (act->status & ZFCP_STATUS_ERP_TIMEDOUT)
-			zfcp_rec_dbf_event_action(143, act);
+			zfcp_rec_dbf_event_action("erscf_2", act);
 		if (act->fsf_req->status & (ZFCP_STATUS_FSFREQ_COMPLETED |
 					    ZFCP_STATUS_FSFREQ_DISMISSED))
 			act->fsf_req = NULL;
@@ -530,7 +536,7 @@ static void zfcp_erp_strategy_memwait(struct zfcp_erp_action *erp_action)
 }
 
 static void _zfcp_erp_port_reopen_all(struct zfcp_adapter *adapter,
-				      int clear, u8 id, void *ref)
+				      int clear, char *id, void *ref)
 {
 	struct zfcp_port *port;
 
@@ -538,8 +544,8 @@ static void _zfcp_erp_port_reopen_all(struct zfcp_adapter *adapter,
 		_zfcp_erp_port_reopen(port, clear, id, ref);
 }
 
-static void _zfcp_erp_unit_reopen_all(struct zfcp_port *port, int clear, u8 id,
-				      void *ref)
+static void _zfcp_erp_unit_reopen_all(struct zfcp_port *port, int clear,
+				      char *id, void *ref)
 {
 	struct zfcp_unit *unit;
 
@@ -559,28 +565,28 @@ static void zfcp_erp_strategy_followup_actions(struct zfcp_erp_action *act)
 
 	case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
 		if (status == ZFCP_ERP_SUCCEEDED)
-			_zfcp_erp_port_reopen_all(adapter, 0, 70, NULL);
+			_zfcp_erp_port_reopen_all(adapter, 0, "ersfa_1", NULL);
 		else
-			_zfcp_erp_adapter_reopen(adapter, 0, 71, NULL);
+			_zfcp_erp_adapter_reopen(adapter, 0, "ersfa_2", NULL);
 		break;
 
 	case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
 		if (status == ZFCP_ERP_SUCCEEDED)
-			_zfcp_erp_port_reopen(port, 0, 72, NULL);
+			_zfcp_erp_port_reopen(port, 0, "ersfa_3", NULL);
 		else
-			_zfcp_erp_adapter_reopen(adapter, 0, 73, NULL);
+			_zfcp_erp_adapter_reopen(adapter, 0, "ersfa_4", NULL);
 		break;
 
 	case ZFCP_ERP_ACTION_REOPEN_PORT:
 		if (status == ZFCP_ERP_SUCCEEDED)
-			_zfcp_erp_unit_reopen_all(port, 0, 74, NULL);
+			_zfcp_erp_unit_reopen_all(port, 0, "ersfa_5", NULL);
 		else
-			_zfcp_erp_port_forced_reopen(port, 0, 75, NULL);
+			_zfcp_erp_port_forced_reopen(port, 0, "ersfa_6", NULL);
 		break;
 
 	case ZFCP_ERP_ACTION_REOPEN_UNIT:
 		if (status != ZFCP_ERP_SUCCEEDED)
-			_zfcp_erp_port_reopen(unit->port, 0, 76, NULL);
+			_zfcp_erp_port_reopen(unit->port, 0, "ersfa_7", NULL);
 		break;
 	}
 }
@@ -617,7 +623,7 @@ static void zfcp_erp_enqueue_ptp_port(struct zfcp_adapter *adapter)
 				 adapter->peer_d_id);
 	if (IS_ERR(port)) /* error or port already attached */
 		return;
-	_zfcp_erp_port_reopen(port, 0, 150, NULL);
+	_zfcp_erp_port_reopen(port, 0, "ereptp1", NULL);
 }
 
 static int zfcp_erp_adapter_strat_fsf_xconf(struct zfcp_erp_action *erp_action)
@@ -640,9 +646,9 @@ static int zfcp_erp_adapter_strat_fsf_xconf(struct zfcp_erp_action *erp_action)
 			return ZFCP_ERP_FAILED;
 		}
 
-		zfcp_rec_dbf_event_thread_lock(6, adapter);
+		zfcp_rec_dbf_event_thread_lock("erasfx1", adapter);
 		down(&adapter->erp_ready_sem);
-		zfcp_rec_dbf_event_thread_lock(7, adapter);
+		zfcp_rec_dbf_event_thread_lock("erasfx2", adapter);
 		if (erp_action->status & ZFCP_STATUS_ERP_TIMEDOUT)
 			break;
 
@@ -681,9 +687,9 @@ static int zfcp_erp_adapter_strategy_open_fsf_xport(struct zfcp_erp_action *act)
 	if (ret)
 		return ZFCP_ERP_FAILED;
 
-	zfcp_rec_dbf_event_thread_lock(8, adapter);
+	zfcp_rec_dbf_event_thread_lock("erasox1", adapter);
 	down(&adapter->erp_ready_sem);
-	zfcp_rec_dbf_event_thread_lock(9, adapter);
+	zfcp_rec_dbf_event_thread_lock("erasox2", adapter);
 	if (act->status & ZFCP_STATUS_ERP_TIMEDOUT)
 		return ZFCP_ERP_FAILED;
 
@@ -705,60 +711,59 @@ static int zfcp_erp_adapter_strategy_open_fsf(struct zfcp_erp_action *act)
 	return ZFCP_ERP_SUCCEEDED;
 }
 
-static int zfcp_erp_adapter_strategy_generic(struct zfcp_erp_action *act,
-					     int close)
+static void zfcp_erp_adapter_strategy_close(struct zfcp_erp_action *act)
 {
-	int retval = ZFCP_ERP_SUCCEEDED;
 	struct zfcp_adapter *adapter = act->adapter;
 
-	if (close)
-		goto close_only;
-
-	retval = zfcp_erp_adapter_strategy_open_qdio(act);
-	if (retval != ZFCP_ERP_SUCCEEDED)
-		goto failed_qdio;
-
-	retval = zfcp_erp_adapter_strategy_open_fsf(act);
-	if (retval != ZFCP_ERP_SUCCEEDED)
-		goto failed_openfcp;
-
-	atomic_set_mask(ZFCP_STATUS_COMMON_OPEN, &act->adapter->status);
-
-	return ZFCP_ERP_SUCCEEDED;
-
- close_only:
-	atomic_clear_mask(ZFCP_STATUS_COMMON_OPEN,
-			  &act->adapter->status);
-
- failed_openfcp:
 	/* close queues to ensure that buffers are not accessed by adapter */
 	zfcp_qdio_close(adapter);
 	zfcp_fsf_req_dismiss_all(adapter);
 	adapter->fsf_req_seq_no = 0;
 	/* all ports and units are closed */
-	zfcp_erp_modify_adapter_status(adapter, 24, NULL,
+	zfcp_erp_modify_adapter_status(adapter, "erascl1", NULL,
 				       ZFCP_STATUS_COMMON_OPEN, ZFCP_CLEAR);
- failed_qdio:
+
 	atomic_clear_mask(ZFCP_STATUS_ADAPTER_XCONFIG_OK |
-			  ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED,
-			  &act->adapter->status);
-	return retval;
+			  ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED, &adapter->status);
 }
 
-static int zfcp_erp_adapter_strategy(struct zfcp_erp_action *act)
+static int zfcp_erp_adapter_strategy_open(struct zfcp_erp_action *act)
 {
-	int retval;
+	struct zfcp_adapter *adapter = act->adapter;
 
-	zfcp_erp_adapter_strategy_generic(act, 1); /* close */
-	if (act->status & ZFCP_STATUS_ERP_CLOSE_ONLY)
-		return ZFCP_ERP_EXIT;
+	if (zfcp_erp_adapter_strategy_open_qdio(act)) {
+		atomic_clear_mask(ZFCP_STATUS_ADAPTER_XCONFIG_OK |
+				  ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED,
+				  &adapter->status);
+		return ZFCP_ERP_FAILED;
+	}
+
+	if (zfcp_erp_adapter_strategy_open_fsf(act)) {
+		zfcp_erp_adapter_strategy_close(act);
+		return ZFCP_ERP_FAILED;
+	}
+
+	atomic_set_mask(ZFCP_STATUS_COMMON_OPEN, &adapter->status);
+
+	return ZFCP_ERP_SUCCEEDED;
+}
 
-	retval = zfcp_erp_adapter_strategy_generic(act, 0); /* open */
+static int zfcp_erp_adapter_strategy(struct zfcp_erp_action *act)
+{
+	struct zfcp_adapter *adapter = act->adapter;
 
-	if (retval == ZFCP_ERP_FAILED)
+	if (atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_OPEN) {
+		zfcp_erp_adapter_strategy_close(act);
+		if (act->status & ZFCP_STATUS_ERP_CLOSE_ONLY)
+			return ZFCP_ERP_EXIT;
+	}
+
+	if (zfcp_erp_adapter_strategy_open(act)) {
 		ssleep(8);
+		return ZFCP_ERP_FAILED;
+	}
 
-	return retval;
+	return ZFCP_ERP_SUCCEEDED;
 }
 
 static int zfcp_erp_port_forced_strategy_close(struct zfcp_erp_action *act)
@@ -777,10 +782,7 @@ static int zfcp_erp_port_forced_strategy_close(struct zfcp_erp_action *act)
 
 static void zfcp_erp_port_strategy_clearstati(struct zfcp_port *port)
 {
-	atomic_clear_mask(ZFCP_STATUS_COMMON_ACCESS_DENIED |
-			  ZFCP_STATUS_PORT_PHYS_CLOSING |
-			  ZFCP_STATUS_PORT_INVALID_WWPN,
-			  &port->status);
+	atomic_clear_mask(ZFCP_STATUS_COMMON_ACCESS_DENIED, &port->status);
 }
 
 static int zfcp_erp_port_forced_strategy(struct zfcp_erp_action *erp_action)
@@ -836,7 +838,7 @@ static int zfcp_erp_open_ptp_port(struct zfcp_erp_action *act)
 	struct zfcp_port *port = act->port;
 
 	if (port->wwpn != adapter->peer_wwpn) {
-		zfcp_erp_port_failed(port, 25, NULL);
+		zfcp_erp_port_failed(port, "eroptp1", NULL);
 		return ZFCP_ERP_FAILED;
 	}
 	port->d_id = adapter->peer_d_id;
@@ -855,7 +857,7 @@ void zfcp_erp_port_strategy_open_lookup(struct work_struct *work)
 	port->erp_action.step = ZFCP_ERP_STEP_NAMESERVER_LOOKUP;
 	if (retval)
 		zfcp_erp_notify(&port->erp_action, ZFCP_ERP_FAILED);
-
+	zfcp_port_put(port);
 }
 
 static int zfcp_erp_port_strategy_open_common(struct zfcp_erp_action *act)
@@ -871,17 +873,15 @@ static int zfcp_erp_port_strategy_open_common(struct zfcp_erp_action *act)
 		if (fc_host_port_type(adapter->scsi_host) == FC_PORTTYPE_PTP)
 			return zfcp_erp_open_ptp_port(act);
 		if (!port->d_id) {
-			queue_work(zfcp_data.work_queue, &port->gid_pn_work);
+			zfcp_port_get(port);
+			if (!queue_work(zfcp_data.work_queue,
+					&port->gid_pn_work))
+				zfcp_port_put(port);
 			return ZFCP_ERP_CONTINUES;
 		}
 	case ZFCP_ERP_STEP_NAMESERVER_LOOKUP:
-		if (!port->d_id) {
-			if (p_status & (ZFCP_STATUS_PORT_INVALID_WWPN)) {
-				zfcp_erp_port_failed(port, 26, NULL);
-				return ZFCP_ERP_EXIT;
-			}
+		if (!port->d_id)
 			return ZFCP_ERP_FAILED;
-		}
 		return zfcp_erp_port_strategy_open_port(act);
 
 	case ZFCP_ERP_STEP_PORT_OPENING:
@@ -995,7 +995,7 @@ static int zfcp_erp_strategy_check_unit(struct zfcp_unit *unit, int result)
 				"port 0x%016Lx\n",
 				(unsigned long long)unit->fcp_lun,
 				(unsigned long long)unit->port->wwpn);
-			zfcp_erp_unit_failed(unit, 21, NULL);
+			zfcp_erp_unit_failed(unit, "erusck1", NULL);
 		}
 		break;
 	}
@@ -1025,7 +1025,7 @@ static int zfcp_erp_strategy_check_port(struct zfcp_port *port, int result)
 			dev_err(&port->adapter->ccw_device->dev,
 				"ERP failed for remote port 0x%016Lx\n",
 				(unsigned long long)port->wwpn);
-			zfcp_erp_port_failed(port, 22, NULL);
+			zfcp_erp_port_failed(port, "erpsck1", NULL);
 		}
 		break;
 	}
@@ -1052,7 +1052,7 @@ static int zfcp_erp_strategy_check_adapter(struct zfcp_adapter *adapter,
 			dev_err(&adapter->ccw_device->dev,
 				"ERP cannot recover an error "
 				"on the FCP device\n");
-			zfcp_erp_adapter_failed(adapter, 23, NULL);
+			zfcp_erp_adapter_failed(adapter, "erasck1", NULL);
 		}
 		break;
 	}
@@ -1117,7 +1117,7 @@ static int zfcp_erp_strategy_statechange(struct zfcp_erp_action *act, int ret)
 		if (zfcp_erp_strat_change_det(&adapter->status, erp_status)) {
 			_zfcp_erp_adapter_reopen(adapter,
 						 ZFCP_STATUS_COMMON_ERP_FAILED,
-						 67, NULL);
+						 "ersscg1", NULL);
 			return ZFCP_ERP_EXIT;
 		}
 		break;
@@ -1127,7 +1127,7 @@ static int zfcp_erp_strategy_statechange(struct zfcp_erp_action *act, int ret)
 		if (zfcp_erp_strat_change_det(&port->status, erp_status)) {
 			_zfcp_erp_port_reopen(port,
 					      ZFCP_STATUS_COMMON_ERP_FAILED,
-					      68, NULL);
+					      "ersscg2", NULL);
 			return ZFCP_ERP_EXIT;
 		}
 		break;
@@ -1136,7 +1136,7 @@ static int zfcp_erp_strategy_statechange(struct zfcp_erp_action *act, int ret)
 		if (zfcp_erp_strat_change_det(&unit->status, erp_status)) {
 			_zfcp_erp_unit_reopen(unit,
 					      ZFCP_STATUS_COMMON_ERP_FAILED,
-					      69, NULL);
+					      "ersscg3", NULL);
 			return ZFCP_ERP_EXIT;
 		}
 		break;
@@ -1155,7 +1155,7 @@ static void zfcp_erp_action_dequeue(struct zfcp_erp_action *erp_action)
 	}
 
 	list_del(&erp_action->list);
-	zfcp_rec_dbf_event_action(144, erp_action);
+	zfcp_rec_dbf_event_action("eractd1", erp_action);
 
 	switch (erp_action->action) {
 	case ZFCP_ERP_ACTION_REOPEN_UNIT:
@@ -1214,38 +1214,8 @@ static void zfcp_erp_schedule_work(struct zfcp_unit *unit)
 	atomic_set_mask(ZFCP_STATUS_UNIT_SCSI_WORK_PENDING, &unit->status);
 	INIT_WORK(&p->work, zfcp_erp_scsi_scan);
 	p->unit = unit;
-	queue_work(zfcp_data.work_queue, &p->work);
-}
-
-static void zfcp_erp_rport_register(struct zfcp_port *port)
-{
-	struct fc_rport_identifiers ids;
-	ids.node_name = port->wwnn;
-	ids.port_name = port->wwpn;
-	ids.port_id = port->d_id;
-	ids.roles = FC_RPORT_ROLE_FCP_TARGET;
-	port->rport = fc_remote_port_add(port->adapter->scsi_host, 0, &ids);
-	if (!port->rport) {
-		dev_err(&port->adapter->ccw_device->dev,
-			"Registering port 0x%016Lx failed\n",
-			(unsigned long long)port->wwpn);
-		return;
-	}
-
-	scsi_target_unblock(&port->rport->dev);
-	port->rport->maxframe_size = port->maxframe_size;
-	port->rport->supported_classes = port->supported_classes;
-}
-
-static void zfcp_erp_rports_del(struct zfcp_adapter *adapter)
-{
-	struct zfcp_port *port;
-	list_for_each_entry(port, &adapter->port_list_head, list) {
-		if (!port->rport)
-			continue;
-		fc_remote_port_delete(port->rport);
-		port->rport = NULL;
-	}
+	if (!queue_work(zfcp_data.work_queue, &p->work))
+		zfcp_unit_put(unit);
 }
 
 static void zfcp_erp_action_cleanup(struct zfcp_erp_action *act, int result)
@@ -1256,10 +1226,8 @@ static void zfcp_erp_action_cleanup(struct zfcp_erp_action *act, int result)
 
 	switch (act->action) {
 	case ZFCP_ERP_ACTION_REOPEN_UNIT:
-		if ((result == ZFCP_ERP_SUCCEEDED) &&
-		    !unit->device && port->rport) {
-			atomic_set_mask(ZFCP_STATUS_UNIT_REGISTERED,
-					&unit->status);
+		flush_work(&port->rport_work);
+		if ((result == ZFCP_ERP_SUCCEEDED) && !unit->device) {
 			if (!(atomic_read(&unit->status) &
 			      ZFCP_STATUS_UNIT_SCSI_WORK_PENDING))
 				zfcp_erp_schedule_work(unit);
@@ -1269,27 +1237,17 @@ static void zfcp_erp_action_cleanup(struct zfcp_erp_action *act, int result)
 
 	case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED:
 	case ZFCP_ERP_ACTION_REOPEN_PORT:
-		if (atomic_read(&port->status) & ZFCP_STATUS_PORT_NO_WWPN) {
-			zfcp_port_put(port);
-			return;
-		}
-		if ((result == ZFCP_ERP_SUCCEEDED) && !port->rport)
-			zfcp_erp_rport_register(port);
-		if ((result != ZFCP_ERP_SUCCEEDED) && port->rport) {
-			fc_remote_port_delete(port->rport);
-			port->rport = NULL;
-		}
+		if (result == ZFCP_ERP_SUCCEEDED)
+			zfcp_scsi_schedule_rport_register(port);
 		zfcp_port_put(port);
 		break;
 
 	case ZFCP_ERP_ACTION_REOPEN_ADAPTER:
-		if (result != ZFCP_ERP_SUCCEEDED) {
-			unregister_service_level(&adapter->service_level);
-			zfcp_erp_rports_del(adapter);
-		} else {
+		if (result == ZFCP_ERP_SUCCEEDED) {
 			register_service_level(&adapter->service_level);
 			schedule_work(&adapter->scan_work);
-		}
+		} else
+			unregister_service_level(&adapter->service_level);
 		zfcp_adapter_put(adapter);
 		break;
 	}
@@ -1346,7 +1304,7 @@ static int zfcp_erp_strategy(struct zfcp_erp_action *erp_action)
 			erp_action->status |= ZFCP_STATUS_ERP_LOWMEM;
 		}
 		if (adapter->erp_total_count == adapter->erp_low_mem_count)
-			_zfcp_erp_adapter_reopen(adapter, 0, 66, NULL);
+			_zfcp_erp_adapter_reopen(adapter, 0, "erstgy1", NULL);
 		else {
 			zfcp_erp_strategy_memwait(erp_action);
 			retval = ZFCP_ERP_CONTINUES;
@@ -1406,9 +1364,9 @@ static int zfcp_erp_thread(void *data)
 				zfcp_erp_wakeup(adapter);
 		}
 
-		zfcp_rec_dbf_event_thread_lock(4, adapter);
+		zfcp_rec_dbf_event_thread_lock("erthrd1", adapter);
 		ignore = down_interruptible(&adapter->erp_ready_sem);
-		zfcp_rec_dbf_event_thread_lock(5, adapter);
+		zfcp_rec_dbf_event_thread_lock("erthrd2", adapter);
 	}
 
 	atomic_clear_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_UP, &adapter->status);
@@ -1453,7 +1411,7 @@ void zfcp_erp_thread_kill(struct zfcp_adapter *adapter)
 {
 	atomic_set_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_KILL, &adapter->status);
 	up(&adapter->erp_ready_sem);
-	zfcp_rec_dbf_event_thread_lock(3, adapter);
+	zfcp_rec_dbf_event_thread_lock("erthrk1", adapter);
 
 	wait_event(adapter->erp_thread_wqh,
 		   !(atomic_read(&adapter->status) &
@@ -1469,7 +1427,7 @@ void zfcp_erp_thread_kill(struct zfcp_adapter *adapter)
  * @id: Event id for debug trace.
  * @ref: Reference for debug trace.
  */
-void zfcp_erp_adapter_failed(struct zfcp_adapter *adapter, u8 id, void *ref)
+void zfcp_erp_adapter_failed(struct zfcp_adapter *adapter, char *id, void *ref)
 {
 	zfcp_erp_modify_adapter_status(adapter, id, ref,
 				       ZFCP_STATUS_COMMON_ERP_FAILED, ZFCP_SET);
@@ -1481,7 +1439,7 @@ void zfcp_erp_adapter_failed(struct zfcp_adapter *adapter, u8 id, void *ref)
  * @id: Event id for debug trace.
  * @ref: Reference for debug trace.
  */
-void zfcp_erp_port_failed(struct zfcp_port *port, u8 id, void *ref)
+void zfcp_erp_port_failed(struct zfcp_port *port, char *id, void *ref)
 {
 	zfcp_erp_modify_port_status(port, id, ref,
 				    ZFCP_STATUS_COMMON_ERP_FAILED, ZFCP_SET);
@@ -1493,7 +1451,7 @@ void zfcp_erp_port_failed(struct zfcp_port *port, u8 id, void *ref)
  * @id: Event id for debug trace.
  * @ref: Reference for debug trace.
  */
-void zfcp_erp_unit_failed(struct zfcp_unit *unit, u8 id, void *ref)
+void zfcp_erp_unit_failed(struct zfcp_unit *unit, char *id, void *ref)
 {
 	zfcp_erp_modify_unit_status(unit, id, ref,
 				    ZFCP_STATUS_COMMON_ERP_FAILED, ZFCP_SET);
@@ -1520,7 +1478,7 @@ void zfcp_erp_wait(struct zfcp_adapter *adapter)
  *
  * Changes in common status bits are propagated to attached ports and units.
  */
-void zfcp_erp_modify_adapter_status(struct zfcp_adapter *adapter, u8 id,
+void zfcp_erp_modify_adapter_status(struct zfcp_adapter *adapter, char *id,
 				    void *ref, u32 mask, int set_or_clear)
 {
 	struct zfcp_port *port;
@@ -1554,7 +1512,7 @@ void zfcp_erp_modify_adapter_status(struct zfcp_adapter *adapter, u8 id,
  *
  * Changes in common status bits are propagated to attached units.
  */
-void zfcp_erp_modify_port_status(struct zfcp_port *port, u8 id, void *ref,
+void zfcp_erp_modify_port_status(struct zfcp_port *port, char *id, void *ref,
 				 u32 mask, int set_or_clear)
 {
 	struct zfcp_unit *unit;
@@ -1586,7 +1544,7 @@ void zfcp_erp_modify_port_status(struct zfcp_port *port, u8 id, void *ref,
  * @mask: status bits to change
  * @set_or_clear: ZFCP_SET or ZFCP_CLEAR
  */
-void zfcp_erp_modify_unit_status(struct zfcp_unit *unit, u8 id, void *ref,
+void zfcp_erp_modify_unit_status(struct zfcp_unit *unit, char *id, void *ref,
 				 u32 mask, int set_or_clear)
 {
 	if (set_or_clear == ZFCP_SET) {
@@ -1609,7 +1567,7 @@ void zfcp_erp_modify_unit_status(struct zfcp_unit *unit, u8 id, void *ref,
  * @id: The debug trace id.
  * @id: Reference for the debug trace.
  */
-void zfcp_erp_port_boxed(struct zfcp_port *port, u8 id, void *ref)
+void zfcp_erp_port_boxed(struct zfcp_port *port, char *id, void *ref)
 {
 	unsigned long flags;
 
@@ -1626,7 +1584,7 @@ void zfcp_erp_port_boxed(struct zfcp_port *port, u8 id, void *ref)
  * @id: The debug trace id.
  * @id: Reference for the debug trace.
  */
-void zfcp_erp_unit_boxed(struct zfcp_unit *unit, u8 id, void *ref)
+void zfcp_erp_unit_boxed(struct zfcp_unit *unit, char *id, void *ref)
 {
 	zfcp_erp_modify_unit_status(unit, id, ref,
 				    ZFCP_STATUS_COMMON_ACCESS_BOXED, ZFCP_SET);
@@ -1642,7 +1600,7 @@ void zfcp_erp_unit_boxed(struct zfcp_unit *unit, u8 id, void *ref)
  * Since the adapter has denied access, stop using the port and the
  * attached units.
  */
-void zfcp_erp_port_access_denied(struct zfcp_port *port, u8 id, void *ref)
+void zfcp_erp_port_access_denied(struct zfcp_port *port, char *id, void *ref)
 {
 	unsigned long flags;
 
@@ -1661,14 +1619,14 @@ void zfcp_erp_port_access_denied(struct zfcp_port *port, u8 id, void *ref)
  *
  * Since the adapter has denied access, stop using the unit.
  */
-void zfcp_erp_unit_access_denied(struct zfcp_unit *unit, u8 id, void *ref)
+void zfcp_erp_unit_access_denied(struct zfcp_unit *unit, char *id, void *ref)
 {
 	zfcp_erp_modify_unit_status(unit, id, ref,
 				    ZFCP_STATUS_COMMON_ERP_FAILED |
 				    ZFCP_STATUS_COMMON_ACCESS_DENIED, ZFCP_SET);
 }
 
-static void zfcp_erp_unit_access_changed(struct zfcp_unit *unit, u8 id,
+static void zfcp_erp_unit_access_changed(struct zfcp_unit *unit, char *id,
 					 void *ref)
 {
 	int status = atomic_read(&unit->status);
@@ -1679,7 +1637,7 @@ static void zfcp_erp_unit_access_changed(struct zfcp_unit *unit, u8 id,
 	zfcp_erp_unit_reopen(unit, ZFCP_STATUS_COMMON_ERP_FAILED, id, ref);
 }
 
-static void zfcp_erp_port_access_changed(struct zfcp_port *port, u8 id,
+static void zfcp_erp_port_access_changed(struct zfcp_port *port, char *id,
 					 void *ref)
 {
 	struct zfcp_unit *unit;
@@ -1701,7 +1659,7 @@ static void zfcp_erp_port_access_changed(struct zfcp_port *port, u8 id,
  * @id: Id for debug trace
  * @ref: Reference for debug trace
  */
-void zfcp_erp_adapter_access_changed(struct zfcp_adapter *adapter, u8 id,
+void zfcp_erp_adapter_access_changed(struct zfcp_adapter *adapter, char *id,
 				     void *ref)
 {
 	struct zfcp_port *port;
diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h
index b5adeda93e1d..f6399ca97bcb 100644
--- a/drivers/s390/scsi/zfcp_ext.h
+++ b/drivers/s390/scsi/zfcp_ext.h
@@ -3,7 +3,7 @@
  *
  * External function declarations.
  *
- * Copyright IBM Corporation 2002, 2008
+ * Copyright IBM Corporation 2002, 2009
  */
 
 #ifndef ZFCP_EXT_H
@@ -35,15 +35,15 @@ extern struct miscdevice zfcp_cfdc_misc;
 /* zfcp_dbf.c */
 extern int zfcp_adapter_debug_register(struct zfcp_adapter *);
 extern void zfcp_adapter_debug_unregister(struct zfcp_adapter *);
-extern void zfcp_rec_dbf_event_thread(u8, struct zfcp_adapter *);
-extern void zfcp_rec_dbf_event_thread_lock(u8, struct zfcp_adapter *);
-extern void zfcp_rec_dbf_event_adapter(u8, void *, struct zfcp_adapter *);
-extern void zfcp_rec_dbf_event_port(u8, void *, struct zfcp_port *);
-extern void zfcp_rec_dbf_event_unit(u8, void *, struct zfcp_unit *);
-extern void zfcp_rec_dbf_event_trigger(u8, void *, u8, u8, void *,
+extern void zfcp_rec_dbf_event_thread(char *, struct zfcp_adapter *);
+extern void zfcp_rec_dbf_event_thread_lock(char *, struct zfcp_adapter *);
+extern void zfcp_rec_dbf_event_adapter(char *, void *, struct zfcp_adapter *);
+extern void zfcp_rec_dbf_event_port(char *, void *, struct zfcp_port *);
+extern void zfcp_rec_dbf_event_unit(char *, void *, struct zfcp_unit *);
+extern void zfcp_rec_dbf_event_trigger(char *, void *, u8, u8, void *,
 				       struct zfcp_adapter *,
 				       struct zfcp_port *, struct zfcp_unit *);
-extern void zfcp_rec_dbf_event_action(u8, struct zfcp_erp_action *);
+extern void zfcp_rec_dbf_event_action(char *, struct zfcp_erp_action *);
 extern void zfcp_hba_dbf_event_fsf_response(struct zfcp_fsf_req *);
 extern void zfcp_hba_dbf_event_fsf_unsol(const char *, struct zfcp_adapter *,
 					 struct fsf_status_read_buffer *);
@@ -66,31 +66,34 @@ extern void zfcp_scsi_dbf_event_devreset(const char *, u8, struct zfcp_unit *,
 					 struct scsi_cmnd *);
 
 /* zfcp_erp.c */
-extern void zfcp_erp_modify_adapter_status(struct zfcp_adapter *, u8, void *,
-					   u32, int);
-extern void zfcp_erp_adapter_reopen(struct zfcp_adapter *, int, u8, void *);
-extern void zfcp_erp_adapter_shutdown(struct zfcp_adapter *, int, u8, void *);
-extern void zfcp_erp_adapter_failed(struct zfcp_adapter *, u8, void *);
-extern void zfcp_erp_modify_port_status(struct zfcp_port *, u8, void *, u32,
+extern void zfcp_erp_modify_adapter_status(struct zfcp_adapter *, char *,
+					   void *, u32, int);
+extern void zfcp_erp_adapter_reopen(struct zfcp_adapter *, int, char *, void *);
+extern void zfcp_erp_adapter_shutdown(struct zfcp_adapter *, int, char *,
+				      void *);
+extern void zfcp_erp_adapter_failed(struct zfcp_adapter *, char *, void *);
+extern void zfcp_erp_modify_port_status(struct zfcp_port *, char *, void *, u32,
 					int);
-extern int  zfcp_erp_port_reopen(struct zfcp_port *, int, u8, void *);
-extern void zfcp_erp_port_shutdown(struct zfcp_port *, int, u8, void *);
-extern void zfcp_erp_port_forced_reopen(struct zfcp_port *, int, u8, void *);
-extern void zfcp_erp_port_failed(struct zfcp_port *, u8, void *);
-extern void zfcp_erp_modify_unit_status(struct zfcp_unit *, u8, void *, u32,
+extern int  zfcp_erp_port_reopen(struct zfcp_port *, int, char *, void *);
+extern void zfcp_erp_port_shutdown(struct zfcp_port *, int, char *, void *);
+extern void zfcp_erp_port_forced_reopen(struct zfcp_port *, int, char *,
+					void *);
+extern void zfcp_erp_port_failed(struct zfcp_port *, char *, void *);
+extern void zfcp_erp_modify_unit_status(struct zfcp_unit *, char *, void *, u32,
 					int);
-extern void zfcp_erp_unit_reopen(struct zfcp_unit *, int, u8, void *);
-extern void zfcp_erp_unit_shutdown(struct zfcp_unit *, int, u8, void *);
-extern void zfcp_erp_unit_failed(struct zfcp_unit *, u8, void *);
+extern void zfcp_erp_unit_reopen(struct zfcp_unit *, int, char *, void *);
+extern void zfcp_erp_unit_shutdown(struct zfcp_unit *, int, char *, void *);
+extern void zfcp_erp_unit_failed(struct zfcp_unit *, char *, void *);
 extern int  zfcp_erp_thread_setup(struct zfcp_adapter *);
 extern void zfcp_erp_thread_kill(struct zfcp_adapter *);
 extern void zfcp_erp_wait(struct zfcp_adapter *);
 extern void zfcp_erp_notify(struct zfcp_erp_action *, unsigned long);
-extern void zfcp_erp_port_boxed(struct zfcp_port *, u8, void *);
-extern void zfcp_erp_unit_boxed(struct zfcp_unit *, u8, void *);
-extern void zfcp_erp_port_access_denied(struct zfcp_port *, u8, void *);
-extern void zfcp_erp_unit_access_denied(struct zfcp_unit *, u8, void *);
-extern void zfcp_erp_adapter_access_changed(struct zfcp_adapter *, u8, void *);
+extern void zfcp_erp_port_boxed(struct zfcp_port *, char *, void *);
+extern void zfcp_erp_unit_boxed(struct zfcp_unit *, char *, void *);
+extern void zfcp_erp_port_access_denied(struct zfcp_port *, char *, void *);
+extern void zfcp_erp_unit_access_denied(struct zfcp_unit *, char *, void *);
+extern void zfcp_erp_adapter_access_changed(struct zfcp_adapter *, char *,
+					    void *);
 extern void zfcp_erp_timeout_handler(unsigned long);
 extern void zfcp_erp_port_strategy_open_lookup(struct work_struct *);
 
@@ -101,6 +104,7 @@ extern void zfcp_fc_incoming_els(struct zfcp_fsf_req *);
 extern int zfcp_fc_ns_gid_pn(struct zfcp_erp_action *);
 extern void zfcp_fc_plogi_evaluate(struct zfcp_port *, struct fsf_plogi *);
 extern void zfcp_test_link(struct zfcp_port *);
+extern void zfcp_fc_link_test_work(struct work_struct *);
 extern void zfcp_fc_nameserver_init(struct zfcp_adapter *);
 
 /* zfcp_fsf.c */
@@ -125,16 +129,13 @@ extern int zfcp_status_read_refill(struct zfcp_adapter *adapter);
 extern int zfcp_fsf_send_ct(struct zfcp_send_ct *, mempool_t *,
 			    struct zfcp_erp_action *);
 extern int zfcp_fsf_send_els(struct zfcp_send_els *);
-extern int zfcp_fsf_send_fcp_command_task(struct zfcp_adapter *,
-					  struct zfcp_unit *,
-					  struct scsi_cmnd *, int, int);
+extern int zfcp_fsf_send_fcp_command_task(struct zfcp_unit *,
+					  struct scsi_cmnd *);
 extern void zfcp_fsf_req_complete(struct zfcp_fsf_req *);
 extern void zfcp_fsf_req_free(struct zfcp_fsf_req *);
-extern struct zfcp_fsf_req *zfcp_fsf_send_fcp_ctm(struct zfcp_adapter *,
-						  struct zfcp_unit *, u8, int);
+extern struct zfcp_fsf_req *zfcp_fsf_send_fcp_ctm(struct zfcp_unit *, u8);
 extern struct zfcp_fsf_req *zfcp_fsf_abort_fcp_command(unsigned long,
-						       struct zfcp_adapter *,
-						       struct zfcp_unit *, int);
+						       struct zfcp_unit *);
 
 /* zfcp_qdio.c */
 extern int zfcp_qdio_allocate(struct zfcp_adapter *);
@@ -153,6 +154,10 @@ extern int zfcp_adapter_scsi_register(struct zfcp_adapter *);
 extern void zfcp_adapter_scsi_unregister(struct zfcp_adapter *);
 extern char *zfcp_get_fcp_sns_info_ptr(struct fcp_rsp_iu *);
 extern struct fc_function_template zfcp_transport_functions;
+extern void zfcp_scsi_rport_work(struct work_struct *);
+extern void zfcp_scsi_schedule_rport_register(struct zfcp_port *);
+extern void zfcp_scsi_schedule_rport_block(struct zfcp_port *);
+extern void zfcp_scsi_schedule_rports_block(struct zfcp_adapter *);
 
 /* zfcp_sysfs.c */
 extern struct attribute_group zfcp_sysfs_unit_attrs;
diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c
index eabdfe24456e..aab8123c5966 100644
--- a/drivers/s390/scsi/zfcp_fc.c
+++ b/drivers/s390/scsi/zfcp_fc.c
@@ -3,7 +3,7 @@
  *
  * Fibre Channel related functions for the zfcp device driver.
  *
- * Copyright IBM Corporation 2008
+ * Copyright IBM Corporation 2008, 2009
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -98,8 +98,12 @@ static void zfcp_wka_port_offline(struct work_struct *work)
 	struct zfcp_wka_port *wka_port =
 			container_of(dw, struct zfcp_wka_port, work);
 
-	wait_event(wka_port->completion_wq,
-			atomic_read(&wka_port->refcount) == 0);
+	/* Don't wait forvever. If the wka_port is too busy take it offline
+	   through a new call later */
+	if (!wait_event_timeout(wka_port->completion_wq,
+				atomic_read(&wka_port->refcount) == 0,
+				HZ >> 1))
+		return;
 
 	mutex_lock(&wka_port->mutex);
 	if ((atomic_read(&wka_port->refcount) != 0) ||
@@ -145,16 +149,10 @@ static void _zfcp_fc_incoming_rscn(struct zfcp_fsf_req *fsf_req, u32 range,
 	struct zfcp_port *port;
 
 	read_lock_irqsave(&zfcp_data.config_lock, flags);
-	list_for_each_entry(port, &fsf_req->adapter->port_list_head, list) {
-		if (!(atomic_read(&port->status) & ZFCP_STATUS_PORT_PHYS_OPEN))
-			/* Try to connect to unused ports anyway. */
-			zfcp_erp_port_reopen(port,
-					     ZFCP_STATUS_COMMON_ERP_FAILED,
-					     82, fsf_req);
-		else if ((port->d_id & range) == (elem->nport_did & range))
-			/* Check connection status for connected ports */
+	list_for_each_entry(port, &fsf_req->adapter->port_list_head, list)
+		if ((port->d_id & range) == (elem->nport_did & range))
 			zfcp_test_link(port);
-	}
+
 	read_unlock_irqrestore(&zfcp_data.config_lock, flags);
 }
 
@@ -196,7 +194,7 @@ static void zfcp_fc_incoming_wwpn(struct zfcp_fsf_req *req, u64 wwpn)
 	read_unlock_irqrestore(&zfcp_data.config_lock, flags);
 
 	if (port && (port->wwpn == wwpn))
-		zfcp_erp_port_forced_reopen(port, 0, 83, req);
+		zfcp_erp_port_forced_reopen(port, 0, "fciwwp1", req);
 }
 
 static void zfcp_fc_incoming_plogi(struct zfcp_fsf_req *req)
@@ -259,10 +257,9 @@ static void zfcp_fc_ns_gid_pn_eval(unsigned long data)
 
 	if (ct->status)
 		return;
-	if (ct_iu_resp->header.cmd_rsp_code != ZFCP_CT_ACCEPT) {
-		atomic_set_mask(ZFCP_STATUS_PORT_INVALID_WWPN, &port->status);
+	if (ct_iu_resp->header.cmd_rsp_code != ZFCP_CT_ACCEPT)
 		return;
-	}
+
 	/* paranoia */
 	if (ct_iu_req->wwpn != port->wwpn)
 		return;
@@ -375,16 +372,22 @@ static void zfcp_fc_adisc_handler(unsigned long data)
 
 	if (adisc->els.status) {
 		/* request rejected or timed out */
-		zfcp_erp_port_forced_reopen(port, 0, 63, NULL);
+		zfcp_erp_port_forced_reopen(port, 0, "fcadh_1", NULL);
 		goto out;
 	}
 
 	if (!port->wwnn)
 		port->wwnn = ls_adisc->wwnn;
 
-	if (port->wwpn != ls_adisc->wwpn)
-		zfcp_erp_port_reopen(port, 0, 64, NULL);
+	if ((port->wwpn != ls_adisc->wwpn) ||
+	    !(atomic_read(&port->status) & ZFCP_STATUS_COMMON_OPEN)) {
+		zfcp_erp_port_reopen(port, ZFCP_STATUS_COMMON_ERP_FAILED,
+				     "fcadh_2", NULL);
+		goto out;
+	}
 
+	/* port is good, unblock rport without going through erp */
+	zfcp_scsi_schedule_rport_register(port);
  out:
 	zfcp_port_put(port);
 	kfree(adisc);
@@ -422,6 +425,31 @@ static int zfcp_fc_adisc(struct zfcp_port *port)
 	return zfcp_fsf_send_els(&adisc->els);
 }
 
+void zfcp_fc_link_test_work(struct work_struct *work)
+{
+	struct zfcp_port *port =
+		container_of(work, struct zfcp_port, test_link_work);
+	int retval;
+
+	if (!(atomic_read(&port->status) & ZFCP_STATUS_COMMON_UNBLOCKED)) {
+		zfcp_port_put(port);
+		return; /* port erp is running and will update rport status */
+	}
+
+	zfcp_port_get(port);
+	port->rport_task = RPORT_DEL;
+	zfcp_scsi_rport_work(&port->rport_work);
+
+	retval = zfcp_fc_adisc(port);
+	if (retval == 0)
+		return;
+
+	/* send of ADISC was not possible */
+	zfcp_erp_port_forced_reopen(port, 0, "fcltwk1", NULL);
+
+	zfcp_port_put(port);
+}
+
 /**
  * zfcp_test_link - lightweight link test procedure
  * @port: port to be tested
@@ -432,17 +460,9 @@ static int zfcp_fc_adisc(struct zfcp_port *port)
  */
 void zfcp_test_link(struct zfcp_port *port)
 {
-	int retval;
-
 	zfcp_port_get(port);
-	retval = zfcp_fc_adisc(port);
-	if (retval == 0)
-		return;
-
-	/* send of ADISC was not possible */
-	zfcp_port_put(port);
-	if (retval != -EBUSY)
-		zfcp_erp_port_forced_reopen(port, 0, 65, NULL);
+	if (!queue_work(zfcp_data.work_queue, &port->test_link_work))
+		zfcp_port_put(port);
 }
 
 static void zfcp_free_sg_env(struct zfcp_gpn_ft *gpn_ft, int buf_num)
@@ -529,7 +549,7 @@ static void zfcp_validate_port(struct zfcp_port *port)
 		zfcp_port_put(port);
 		return;
 	}
-	zfcp_erp_port_shutdown(port, 0, 151, NULL);
+	zfcp_erp_port_shutdown(port, 0, "fcpval1", NULL);
 	zfcp_erp_wait(adapter);
 	zfcp_port_put(port);
 	zfcp_port_dequeue(port);
@@ -592,7 +612,7 @@ static int zfcp_scan_eval_gpn_ft(struct zfcp_gpn_ft *gpn_ft, int max_entries)
 		if (IS_ERR(port))
 			ret = PTR_ERR(port);
 		else
-			zfcp_erp_port_reopen(port, 0, 149, NULL);
+			zfcp_erp_port_reopen(port, 0, "fcegpf1", NULL);
 	}
 
 	zfcp_erp_wait(adapter);
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index e6416f8541b0..b29f3121b666 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -3,7 +3,7 @@
  *
  * Implementation of FSF commands.
  *
- * Copyright IBM Corporation 2002, 2008
+ * Copyright IBM Corporation 2002, 2009
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -12,11 +12,14 @@
 #include <linux/blktrace_api.h>
 #include "zfcp_ext.h"
 
+#define ZFCP_REQ_AUTO_CLEANUP	0x00000002
+#define ZFCP_REQ_NO_QTCB	0x00000008
+
 static void zfcp_fsf_request_timeout_handler(unsigned long data)
 {
 	struct zfcp_adapter *adapter = (struct zfcp_adapter *) data;
-	zfcp_erp_adapter_reopen(adapter, ZFCP_STATUS_COMMON_ERP_FAILED, 62,
-				NULL);
+	zfcp_erp_adapter_reopen(adapter, ZFCP_STATUS_COMMON_ERP_FAILED,
+				"fsrth_1", NULL);
 }
 
 static void zfcp_fsf_start_timer(struct zfcp_fsf_req *fsf_req,
@@ -75,7 +78,7 @@ static void zfcp_fsf_access_denied_port(struct zfcp_fsf_req *req,
 		 (unsigned long long)port->wwpn);
 	zfcp_act_eval_err(req->adapter, header->fsf_status_qual.halfword[0]);
 	zfcp_act_eval_err(req->adapter, header->fsf_status_qual.halfword[1]);
-	zfcp_erp_port_access_denied(port, 55, req);
+	zfcp_erp_port_access_denied(port, "fspad_1", req);
 	req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 }
 
@@ -89,7 +92,7 @@ static void zfcp_fsf_access_denied_unit(struct zfcp_fsf_req *req,
 		 (unsigned long long)unit->port->wwpn);
 	zfcp_act_eval_err(req->adapter, header->fsf_status_qual.halfword[0]);
 	zfcp_act_eval_err(req->adapter, header->fsf_status_qual.halfword[1]);
-	zfcp_erp_unit_access_denied(unit, 59, req);
+	zfcp_erp_unit_access_denied(unit, "fsuad_1", req);
 	req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 }
 
@@ -97,7 +100,7 @@ static void zfcp_fsf_class_not_supp(struct zfcp_fsf_req *req)
 {
 	dev_err(&req->adapter->ccw_device->dev, "FCP device not "
 		"operational because of an unsupported FC class\n");
-	zfcp_erp_adapter_shutdown(req->adapter, 0, 123, req);
+	zfcp_erp_adapter_shutdown(req->adapter, 0, "fscns_1", req);
 	req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 }
 
@@ -159,20 +162,13 @@ static void zfcp_fsf_status_read_port_closed(struct zfcp_fsf_req *req)
 	list_for_each_entry(port, &adapter->port_list_head, list)
 		if (port->d_id == d_id) {
 			read_unlock_irqrestore(&zfcp_data.config_lock, flags);
-			switch (sr_buf->status_subtype) {
-			case FSF_STATUS_READ_SUB_CLOSE_PHYS_PORT:
-				zfcp_erp_port_reopen(port, 0, 101, req);
-				break;
-			case FSF_STATUS_READ_SUB_ERROR_PORT:
-				zfcp_erp_port_shutdown(port, 0, 122, req);
-				break;
-			}
+			zfcp_erp_port_reopen(port, 0, "fssrpc1", req);
 			return;
 		}
 	read_unlock_irqrestore(&zfcp_data.config_lock, flags);
 }
 
-static void zfcp_fsf_link_down_info_eval(struct zfcp_fsf_req *req, u8 id,
+static void zfcp_fsf_link_down_info_eval(struct zfcp_fsf_req *req, char *id,
 					 struct fsf_link_down_info *link_down)
 {
 	struct zfcp_adapter *adapter = req->adapter;
@@ -181,6 +177,7 @@ static void zfcp_fsf_link_down_info_eval(struct zfcp_fsf_req *req, u8 id,
 		return;
 
 	atomic_set_mask(ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED, &adapter->status);
+	zfcp_scsi_schedule_rports_block(adapter);
 
 	if (!link_down)
 		goto out;
@@ -261,13 +258,13 @@ static void zfcp_fsf_status_read_link_down(struct zfcp_fsf_req *req)
 
 	switch (sr_buf->status_subtype) {
 	case FSF_STATUS_READ_SUB_NO_PHYSICAL_LINK:
-		zfcp_fsf_link_down_info_eval(req, 38, ldi);
+		zfcp_fsf_link_down_info_eval(req, "fssrld1", ldi);
 		break;
 	case FSF_STATUS_READ_SUB_FDISC_FAILED:
-		zfcp_fsf_link_down_info_eval(req, 39, ldi);
+		zfcp_fsf_link_down_info_eval(req, "fssrld2", ldi);
 		break;
 	case FSF_STATUS_READ_SUB_FIRMWARE_UPDATE:
-		zfcp_fsf_link_down_info_eval(req, 40, NULL);
+		zfcp_fsf_link_down_info_eval(req, "fssrld3", NULL);
 	};
 }
 
@@ -307,22 +304,23 @@ static void zfcp_fsf_status_read_handler(struct zfcp_fsf_req *req)
 		dev_info(&adapter->ccw_device->dev,
 			 "The local link has been restored\n");
 		/* All ports should be marked as ready to run again */
-		zfcp_erp_modify_adapter_status(adapter, 30, NULL,
+		zfcp_erp_modify_adapter_status(adapter, "fssrh_1", NULL,
 					       ZFCP_STATUS_COMMON_RUNNING,
 					       ZFCP_SET);
 		zfcp_erp_adapter_reopen(adapter,
 					ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED |
 					ZFCP_STATUS_COMMON_ERP_FAILED,
-					102, req);
+					"fssrh_2", req);
 		break;
 	case FSF_STATUS_READ_NOTIFICATION_LOST:
 		if (sr_buf->status_subtype & FSF_STATUS_READ_SUB_ACT_UPDATED)
-			zfcp_erp_adapter_access_changed(adapter, 135, req);
+			zfcp_erp_adapter_access_changed(adapter, "fssrh_3",
+							req);
 		if (sr_buf->status_subtype & FSF_STATUS_READ_SUB_INCOMING_ELS)
 			schedule_work(&adapter->scan_work);
 		break;
 	case FSF_STATUS_READ_CFDC_UPDATED:
-		zfcp_erp_adapter_access_changed(adapter, 136, req);
+		zfcp_erp_adapter_access_changed(adapter, "fssrh_4", req);
 		break;
 	case FSF_STATUS_READ_FEATURE_UPDATE_ALERT:
 		adapter->adapter_features = sr_buf->payload.word[0];
@@ -351,7 +349,7 @@ static void zfcp_fsf_fsfstatus_qual_eval(struct zfcp_fsf_req *req)
 		dev_err(&req->adapter->ccw_device->dev,
 			"The FCP adapter reported a problem "
 			"that cannot be recovered\n");
-		zfcp_erp_adapter_shutdown(req->adapter, 0, 121, req);
+		zfcp_erp_adapter_shutdown(req->adapter, 0, "fsfsqe1", req);
 		break;
 	}
 	/* all non-return stats set FSFREQ_ERROR*/
@@ -368,7 +366,7 @@ static void zfcp_fsf_fsfstatus_eval(struct zfcp_fsf_req *req)
 		dev_err(&req->adapter->ccw_device->dev,
 			"The FCP adapter does not recognize the command 0x%x\n",
 			req->qtcb->header.fsf_command);
-		zfcp_erp_adapter_shutdown(req->adapter, 0, 120, req);
+		zfcp_erp_adapter_shutdown(req->adapter, 0, "fsfse_1", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_ADAPTER_STATUS_AVAILABLE:
@@ -400,17 +398,17 @@ static void zfcp_fsf_protstatus_eval(struct zfcp_fsf_req *req)
 			"QTCB version 0x%x not supported by FCP adapter "
 			"(0x%x to 0x%x)\n", FSF_QTCB_CURRENT_VERSION,
 			psq->word[0], psq->word[1]);
-		zfcp_erp_adapter_shutdown(adapter, 0, 117, req);
+		zfcp_erp_adapter_shutdown(adapter, 0, "fspse_1", req);
 		break;
 	case FSF_PROT_ERROR_STATE:
 	case FSF_PROT_SEQ_NUMB_ERROR:
-		zfcp_erp_adapter_reopen(adapter, 0, 98, req);
+		zfcp_erp_adapter_reopen(adapter, 0, "fspse_2", req);
 		req->status |= ZFCP_STATUS_FSFREQ_RETRY;
 		break;
 	case FSF_PROT_UNSUPP_QTCB_TYPE:
 		dev_err(&adapter->ccw_device->dev,
 			"The QTCB type is not supported by the FCP adapter\n");
-		zfcp_erp_adapter_shutdown(adapter, 0, 118, req);
+		zfcp_erp_adapter_shutdown(adapter, 0, "fspse_3", req);
 		break;
 	case FSF_PROT_HOST_CONNECTION_INITIALIZING:
 		atomic_set_mask(ZFCP_STATUS_ADAPTER_HOST_CON_INIT,
@@ -420,27 +418,29 @@ static void zfcp_fsf_protstatus_eval(struct zfcp_fsf_req *req)
 		dev_err(&adapter->ccw_device->dev,
 			"0x%Lx is an ambiguous request identifier\n",
 			(unsigned long long)qtcb->bottom.support.req_handle);
-		zfcp_erp_adapter_shutdown(adapter, 0, 78, req);
+		zfcp_erp_adapter_shutdown(adapter, 0, "fspse_4", req);
 		break;
 	case FSF_PROT_LINK_DOWN:
-		zfcp_fsf_link_down_info_eval(req, 37, &psq->link_down_info);
+		zfcp_fsf_link_down_info_eval(req, "fspse_5",
+					     &psq->link_down_info);
 		/* FIXME: reopening adapter now? better wait for link up */
-		zfcp_erp_adapter_reopen(adapter, 0, 79, req);
+		zfcp_erp_adapter_reopen(adapter, 0, "fspse_6", req);
 		break;
 	case FSF_PROT_REEST_QUEUE:
 		/* All ports should be marked as ready to run again */
-		zfcp_erp_modify_adapter_status(adapter, 28, NULL,
+		zfcp_erp_modify_adapter_status(adapter, "fspse_7", NULL,
 					       ZFCP_STATUS_COMMON_RUNNING,
 					       ZFCP_SET);
 		zfcp_erp_adapter_reopen(adapter,
 					ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED |
-					ZFCP_STATUS_COMMON_ERP_FAILED, 99, req);
+					ZFCP_STATUS_COMMON_ERP_FAILED,
+					"fspse_8", req);
 		break;
 	default:
 		dev_err(&adapter->ccw_device->dev,
 			"0x%x is not a valid transfer protocol status\n",
 			qtcb->prefix.prot_status);
-		zfcp_erp_adapter_shutdown(adapter, 0, 119, req);
+		zfcp_erp_adapter_shutdown(adapter, 0, "fspse_9", req);
 	}
 	req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 }
@@ -526,7 +526,7 @@ static int zfcp_fsf_exchange_config_evaluate(struct zfcp_fsf_req *req)
 		dev_err(&adapter->ccw_device->dev,
 			"Unknown or unsupported arbitrated loop "
 			"fibre channel topology detected\n");
-		zfcp_erp_adapter_shutdown(adapter, 0, 127, req);
+		zfcp_erp_adapter_shutdown(adapter, 0, "fsece_1", req);
 		return -EIO;
 	}
 
@@ -560,7 +560,7 @@ static void zfcp_fsf_exchange_config_data_handler(struct zfcp_fsf_req *req)
 				"FCP adapter maximum QTCB size (%d bytes) "
 				"is too small\n",
 				bottom->max_qtcb_size);
-			zfcp_erp_adapter_shutdown(adapter, 0, 129, req);
+			zfcp_erp_adapter_shutdown(adapter, 0, "fsecdh1", req);
 			return;
 		}
 		atomic_set_mask(ZFCP_STATUS_ADAPTER_XCONFIG_OK,
@@ -577,11 +577,11 @@ static void zfcp_fsf_exchange_config_data_handler(struct zfcp_fsf_req *req)
 		atomic_set_mask(ZFCP_STATUS_ADAPTER_XCONFIG_OK,
 				&adapter->status);
 
-		zfcp_fsf_link_down_info_eval(req, 42,
+		zfcp_fsf_link_down_info_eval(req, "fsecdh2",
 			&qtcb->header.fsf_status_qual.link_down_info);
 		break;
 	default:
-		zfcp_erp_adapter_shutdown(adapter, 0, 130, req);
+		zfcp_erp_adapter_shutdown(adapter, 0, "fsecdh3", req);
 		return;
 	}
 
@@ -597,14 +597,14 @@ static void zfcp_fsf_exchange_config_data_handler(struct zfcp_fsf_req *req)
 		dev_err(&adapter->ccw_device->dev,
 			"The FCP adapter only supports newer "
 			"control block versions\n");
-		zfcp_erp_adapter_shutdown(adapter, 0, 125, req);
+		zfcp_erp_adapter_shutdown(adapter, 0, "fsecdh4", req);
 		return;
 	}
 	if (FSF_QTCB_CURRENT_VERSION > bottom->high_qtcb_version) {
 		dev_err(&adapter->ccw_device->dev,
 			"The FCP adapter only supports older "
 			"control block versions\n");
-		zfcp_erp_adapter_shutdown(adapter, 0, 126, req);
+		zfcp_erp_adapter_shutdown(adapter, 0, "fsecdh5", req);
 	}
 }
 
@@ -617,9 +617,10 @@ static void zfcp_fsf_exchange_port_evaluate(struct zfcp_fsf_req *req)
 	if (req->data)
 		memcpy(req->data, bottom, sizeof(*bottom));
 
-	if (adapter->connection_features & FSF_FEATURE_NPIV_MODE)
+	if (adapter->connection_features & FSF_FEATURE_NPIV_MODE) {
 		fc_host_permanent_port_name(shost) = bottom->wwpn;
-	else
+		fc_host_port_type(shost) = FC_PORTTYPE_NPIV;
+	} else
 		fc_host_permanent_port_name(shost) = fc_host_port_name(shost);
 	fc_host_maxframe_size(shost) = bottom->maximum_frame_size;
 	fc_host_supported_speeds(shost) = bottom->supported_speed;
@@ -638,20 +639,12 @@ static void zfcp_fsf_exchange_port_data_handler(struct zfcp_fsf_req *req)
 		break;
 	case FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE:
 		zfcp_fsf_exchange_port_evaluate(req);
-		zfcp_fsf_link_down_info_eval(req, 43,
+		zfcp_fsf_link_down_info_eval(req, "fsepdh1",
 			&qtcb->header.fsf_status_qual.link_down_info);
 		break;
 	}
 }
 
-static int zfcp_fsf_sbal_available(struct zfcp_adapter *adapter)
-{
-	if (atomic_read(&adapter->req_q.count) > 0)
-		return 1;
-	atomic_inc(&adapter->qdio_outb_full);
-	return 0;
-}
-
 static int zfcp_fsf_req_sbal_get(struct zfcp_adapter *adapter)
 	__releases(&adapter->req_q_lock)
 	__acquires(&adapter->req_q_lock)
@@ -735,7 +728,7 @@ static struct zfcp_fsf_req *zfcp_fsf_req_create(struct zfcp_adapter *adapter,
 
 	req->adapter = adapter;
 	req->fsf_command = fsf_cmd;
-	req->req_id = adapter->req_no++;
+	req->req_id = adapter->req_no;
 	req->sbal_number = 1;
 	req->sbal_first = req_q->first;
 	req->sbal_last = req_q->first;
@@ -791,13 +784,14 @@ static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
 		if (zfcp_reqlist_find_safe(adapter, req))
 			zfcp_reqlist_remove(adapter, req);
 		spin_unlock_irqrestore(&adapter->req_list_lock, flags);
-		zfcp_erp_adapter_reopen(adapter, 0, 116, req);
+		zfcp_erp_adapter_reopen(adapter, 0, "fsrs__1", req);
 		return -EIO;
 	}
 
 	/* Don't increase for unsolicited status */
 	if (req->qtcb)
 		adapter->fsf_req_seq_no++;
+	adapter->req_no++;
 
 	return 0;
 }
@@ -870,14 +864,14 @@ static void zfcp_fsf_abort_fcp_command_handler(struct zfcp_fsf_req *req)
 	switch (req->qtcb->header.fsf_status) {
 	case FSF_PORT_HANDLE_NOT_VALID:
 		if (fsq->word[0] == fsq->word[1]) {
-			zfcp_erp_adapter_reopen(unit->port->adapter, 0, 104,
-						req);
+			zfcp_erp_adapter_reopen(unit->port->adapter, 0,
+						"fsafch1", req);
 			req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		}
 		break;
 	case FSF_LUN_HANDLE_NOT_VALID:
 		if (fsq->word[0] == fsq->word[1]) {
-			zfcp_erp_port_reopen(unit->port, 0, 105, req);
+			zfcp_erp_port_reopen(unit->port, 0, "fsafch2", req);
 			req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		}
 		break;
@@ -885,12 +879,12 @@ static void zfcp_fsf_abort_fcp_command_handler(struct zfcp_fsf_req *req)
 		req->status |= ZFCP_STATUS_FSFREQ_ABORTNOTNEEDED;
 		break;
 	case FSF_PORT_BOXED:
-		zfcp_erp_port_boxed(unit->port, 47, req);
+		zfcp_erp_port_boxed(unit->port, "fsafch3", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR |
 			       ZFCP_STATUS_FSFREQ_RETRY;
 		break;
 	case FSF_LUN_BOXED:
-		zfcp_erp_unit_boxed(unit, 48, req);
+		zfcp_erp_unit_boxed(unit, "fsafch4", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR |
 			       ZFCP_STATUS_FSFREQ_RETRY;
                 break;
@@ -912,27 +906,22 @@ static void zfcp_fsf_abort_fcp_command_handler(struct zfcp_fsf_req *req)
 /**
  * zfcp_fsf_abort_fcp_command - abort running SCSI command
  * @old_req_id: unsigned long
- * @adapter: pointer to struct zfcp_adapter
  * @unit: pointer to struct zfcp_unit
- * @req_flags: integer specifying the request flags
  * Returns: pointer to struct zfcp_fsf_req
- *
- * FIXME(design): should be watched by a timeout !!!
  */
 
 struct zfcp_fsf_req *zfcp_fsf_abort_fcp_command(unsigned long old_req_id,
-						struct zfcp_adapter *adapter,
-						struct zfcp_unit *unit,
-						int req_flags)
+						struct zfcp_unit *unit)
 {
 	struct qdio_buffer_element *sbale;
 	struct zfcp_fsf_req *req = NULL;
+	struct zfcp_adapter *adapter = unit->port->adapter;
 
-	spin_lock(&adapter->req_q_lock);
-	if (!zfcp_fsf_sbal_available(adapter))
+	spin_lock_bh(&adapter->req_q_lock);
+	if (zfcp_fsf_req_sbal_get(adapter))
 		goto out;
 	req = zfcp_fsf_req_create(adapter, FSF_QTCB_ABORT_FCP_CMND,
-				  req_flags, adapter->pool.fsf_req_abort);
+				  0, adapter->pool.fsf_req_abort);
 	if (IS_ERR(req)) {
 		req = NULL;
 		goto out;
@@ -960,7 +949,7 @@ out_error_free:
 	zfcp_fsf_req_free(req);
 	req = NULL;
 out:
-	spin_unlock(&adapter->req_q_lock);
+	spin_unlock_bh(&adapter->req_q_lock);
 	return req;
 }
 
@@ -998,7 +987,7 @@ static void zfcp_fsf_send_ct_handler(struct zfcp_fsf_req *req)
 			       ZFCP_STATUS_FSFREQ_RETRY;
 		break;
 	case FSF_PORT_HANDLE_NOT_VALID:
-		zfcp_erp_adapter_reopen(adapter, 0, 106, req);
+		zfcp_erp_adapter_reopen(adapter, 0, "fsscth1", req);
 	case FSF_GENERIC_COMMAND_REJECTED:
 	case FSF_PAYLOAD_SIZE_MISMATCH:
 	case FSF_REQUEST_SIZE_TOO_LARGE:
@@ -1174,12 +1163,8 @@ int zfcp_fsf_send_els(struct zfcp_send_els *els)
 	struct fsf_qtcb_bottom_support *bottom;
 	int ret = -EIO;
 
-	if (unlikely(!(atomic_read(&els->port->status) &
-		       ZFCP_STATUS_COMMON_UNBLOCKED)))
-		return -EBUSY;
-
-	spin_lock(&adapter->req_q_lock);
-	if (!zfcp_fsf_sbal_available(adapter))
+	spin_lock_bh(&adapter->req_q_lock);
+	if (zfcp_fsf_req_sbal_get(adapter))
 		goto out;
 	req = zfcp_fsf_req_create(adapter, FSF_QTCB_SEND_ELS,
 				  ZFCP_REQ_AUTO_CLEANUP, NULL);
@@ -1212,7 +1197,7 @@ int zfcp_fsf_send_els(struct zfcp_send_els *els)
 failed_send:
 	zfcp_fsf_req_free(req);
 out:
-	spin_unlock(&adapter->req_q_lock);
+	spin_unlock_bh(&adapter->req_q_lock);
 	return ret;
 }
 
@@ -1224,7 +1209,7 @@ int zfcp_fsf_exchange_config_data(struct zfcp_erp_action *erp_action)
 	int retval = -EIO;
 
 	spin_lock_bh(&adapter->req_q_lock);
-	if (!zfcp_fsf_sbal_available(adapter))
+	if (zfcp_fsf_req_sbal_get(adapter))
 		goto out;
 	req = zfcp_fsf_req_create(adapter,
 				  FSF_QTCB_EXCHANGE_CONFIG_DATA,
@@ -1320,7 +1305,7 @@ int zfcp_fsf_exchange_port_data(struct zfcp_erp_action *erp_action)
 		return -EOPNOTSUPP;
 
 	spin_lock_bh(&adapter->req_q_lock);
-	if (!zfcp_fsf_sbal_available(adapter))
+	if (zfcp_fsf_req_sbal_get(adapter))
 		goto out;
 	req = zfcp_fsf_req_create(adapter, FSF_QTCB_EXCHANGE_PORT_DATA,
 				  ZFCP_REQ_AUTO_CLEANUP,
@@ -1366,7 +1351,7 @@ int zfcp_fsf_exchange_port_data_sync(struct zfcp_adapter *adapter,
 		return -EOPNOTSUPP;
 
 	spin_lock_bh(&adapter->req_q_lock);
-	if (!zfcp_fsf_sbal_available(adapter))
+	if (zfcp_fsf_req_sbal_get(adapter))
 		goto out;
 
 	req = zfcp_fsf_req_create(adapter, FSF_QTCB_EXCHANGE_PORT_DATA, 0,
@@ -1416,7 +1401,7 @@ static void zfcp_fsf_open_port_handler(struct zfcp_fsf_req *req)
 			 "Not enough FCP adapter resources to open "
 			 "remote port 0x%016Lx\n",
 			 (unsigned long long)port->wwpn);
-		zfcp_erp_port_failed(port, 31, req);
+		zfcp_erp_port_failed(port, "fsoph_1", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_ADAPTER_STATUS_AVAILABLE:
@@ -1522,13 +1507,13 @@ static void zfcp_fsf_close_port_handler(struct zfcp_fsf_req *req)
 
 	switch (req->qtcb->header.fsf_status) {
 	case FSF_PORT_HANDLE_NOT_VALID:
-		zfcp_erp_adapter_reopen(port->adapter, 0, 107, req);
+		zfcp_erp_adapter_reopen(port->adapter, 0, "fscph_1", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_ADAPTER_STATUS_AVAILABLE:
 		break;
 	case FSF_GOOD:
-		zfcp_erp_modify_port_status(port, 33, req,
+		zfcp_erp_modify_port_status(port, "fscph_2", req,
 					    ZFCP_STATUS_COMMON_OPEN,
 					    ZFCP_CLEAR);
 		break;
@@ -1657,7 +1642,7 @@ static void zfcp_fsf_close_wka_port_handler(struct zfcp_fsf_req *req)
 
 	if (req->qtcb->header.fsf_status == FSF_PORT_HANDLE_NOT_VALID) {
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
-		zfcp_erp_adapter_reopen(wka_port->adapter, 0, 84, req);
+		zfcp_erp_adapter_reopen(wka_port->adapter, 0, "fscwph1", req);
 	}
 
 	wka_port->status = ZFCP_WKA_PORT_OFFLINE;
@@ -1712,18 +1697,18 @@ static void zfcp_fsf_close_physical_port_handler(struct zfcp_fsf_req *req)
 	struct zfcp_unit *unit;
 
 	if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
-		goto skip_fsfstatus;
+		return;
 
 	switch (header->fsf_status) {
 	case FSF_PORT_HANDLE_NOT_VALID:
-		zfcp_erp_adapter_reopen(port->adapter, 0, 108, req);
+		zfcp_erp_adapter_reopen(port->adapter, 0, "fscpph1", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_ACCESS_DENIED:
 		zfcp_fsf_access_denied_port(req, port);
 		break;
 	case FSF_PORT_BOXED:
-		zfcp_erp_port_boxed(port, 50, req);
+		zfcp_erp_port_boxed(port, "fscpph2", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR |
 			       ZFCP_STATUS_FSFREQ_RETRY;
 		/* can't use generic zfcp_erp_modify_port_status because
@@ -1752,8 +1737,6 @@ static void zfcp_fsf_close_physical_port_handler(struct zfcp_fsf_req *req)
 					  &unit->status);
 		break;
 	}
-skip_fsfstatus:
-	atomic_clear_mask(ZFCP_STATUS_PORT_PHYS_CLOSING, &port->status);
 }
 
 /**
@@ -1789,8 +1772,6 @@ int zfcp_fsf_close_physical_port(struct zfcp_erp_action *erp_action)
 	req->erp_action = erp_action;
 	req->handler = zfcp_fsf_close_physical_port_handler;
 	erp_action->fsf_req = req;
-	atomic_set_mask(ZFCP_STATUS_PORT_PHYS_CLOSING,
-			&erp_action->port->status);
 
 	zfcp_fsf_start_erp_timer(req);
 	retval = zfcp_fsf_req_send(req);
@@ -1825,7 +1806,7 @@ static void zfcp_fsf_open_unit_handler(struct zfcp_fsf_req *req)
 	switch (header->fsf_status) {
 
 	case FSF_PORT_HANDLE_NOT_VALID:
-		zfcp_erp_adapter_reopen(unit->port->adapter, 0, 109, req);
+		zfcp_erp_adapter_reopen(unit->port->adapter, 0, "fsouh_1", req);
 		/* fall through */
 	case FSF_LUN_ALREADY_OPEN:
 		break;
@@ -1835,7 +1816,7 @@ static void zfcp_fsf_open_unit_handler(struct zfcp_fsf_req *req)
 		atomic_clear_mask(ZFCP_STATUS_UNIT_READONLY, &unit->status);
 		break;
 	case FSF_PORT_BOXED:
-		zfcp_erp_port_boxed(unit->port, 51, req);
+		zfcp_erp_port_boxed(unit->port, "fsouh_2", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR |
 			       ZFCP_STATUS_FSFREQ_RETRY;
 		break;
@@ -1851,7 +1832,7 @@ static void zfcp_fsf_open_unit_handler(struct zfcp_fsf_req *req)
 		else
 			zfcp_act_eval_err(adapter,
 					  header->fsf_status_qual.word[2]);
-		zfcp_erp_unit_access_denied(unit, 60, req);
+		zfcp_erp_unit_access_denied(unit, "fsouh_3", req);
 		atomic_clear_mask(ZFCP_STATUS_UNIT_SHARED, &unit->status);
 		atomic_clear_mask(ZFCP_STATUS_UNIT_READONLY, &unit->status);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
@@ -1862,7 +1843,7 @@ static void zfcp_fsf_open_unit_handler(struct zfcp_fsf_req *req)
 			 "0x%016Lx on port 0x%016Lx\n",
 			 (unsigned long long)unit->fcp_lun,
 			 (unsigned long long)unit->port->wwpn);
-		zfcp_erp_unit_failed(unit, 34, req);
+		zfcp_erp_unit_failed(unit, "fsouh_4", req);
 		/* fall through */
 	case FSF_INVALID_COMMAND_OPTION:
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
@@ -1911,9 +1892,9 @@ static void zfcp_fsf_open_unit_handler(struct zfcp_fsf_req *req)
 					"port 0x%016Lx)\n",
 					(unsigned long long)unit->fcp_lun,
 					(unsigned long long)unit->port->wwpn);
-				zfcp_erp_unit_failed(unit, 35, req);
+				zfcp_erp_unit_failed(unit, "fsouh_5", req);
 				req->status |= ZFCP_STATUS_FSFREQ_ERROR;
-				zfcp_erp_unit_shutdown(unit, 0, 80, req);
+				zfcp_erp_unit_shutdown(unit, 0, "fsouh_6", req);
         		} else if (!exclusive && readwrite) {
 				dev_err(&adapter->ccw_device->dev,
 					"Shared read-write access not "
@@ -1921,9 +1902,9 @@ static void zfcp_fsf_open_unit_handler(struct zfcp_fsf_req *req)
 					"0x%016Lx)\n",
 					(unsigned long long)unit->fcp_lun,
 					(unsigned long long)unit->port->wwpn);
-				zfcp_erp_unit_failed(unit, 36, req);
+				zfcp_erp_unit_failed(unit, "fsouh_7", req);
 				req->status |= ZFCP_STATUS_FSFREQ_ERROR;
-				zfcp_erp_unit_shutdown(unit, 0, 81, req);
+				zfcp_erp_unit_shutdown(unit, 0, "fsouh_8", req);
         		}
 		}
 		break;
@@ -1988,15 +1969,15 @@ static void zfcp_fsf_close_unit_handler(struct zfcp_fsf_req *req)
 
 	switch (req->qtcb->header.fsf_status) {
 	case FSF_PORT_HANDLE_NOT_VALID:
-		zfcp_erp_adapter_reopen(unit->port->adapter, 0, 110, req);
+		zfcp_erp_adapter_reopen(unit->port->adapter, 0, "fscuh_1", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_LUN_HANDLE_NOT_VALID:
-		zfcp_erp_port_reopen(unit->port, 0, 111, req);
+		zfcp_erp_port_reopen(unit->port, 0, "fscuh_2", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_PORT_BOXED:
-		zfcp_erp_port_boxed(unit->port, 52, req);
+		zfcp_erp_port_boxed(unit->port, "fscuh_3", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR |
 			       ZFCP_STATUS_FSFREQ_RETRY;
 		break;
@@ -2073,7 +2054,6 @@ static void zfcp_fsf_req_latency(struct zfcp_fsf_req *req)
 	struct fsf_qual_latency_info *lat_inf;
 	struct latency_cont *lat;
 	struct zfcp_unit *unit = req->unit;
-	unsigned long flags;
 
 	lat_inf = &req->qtcb->prefix.prot_status_qual.latency_info;
 
@@ -2091,11 +2071,11 @@ static void zfcp_fsf_req_latency(struct zfcp_fsf_req *req)
 		return;
 	}
 
-	spin_lock_irqsave(&unit->latencies.lock, flags);
+	spin_lock(&unit->latencies.lock);
 	zfcp_fsf_update_lat(&lat->channel, lat_inf->channel_lat);
 	zfcp_fsf_update_lat(&lat->fabric, lat_inf->fabric_lat);
 	lat->counter++;
-	spin_unlock_irqrestore(&unit->latencies.lock, flags);
+	spin_unlock(&unit->latencies.lock);
 }
 
 #ifdef CONFIG_BLK_DEV_IO_TRACE
@@ -2147,7 +2127,6 @@ static void zfcp_fsf_send_fcp_command_task_handler(struct zfcp_fsf_req *req)
 
 	if (unlikely(req->status & ZFCP_STATUS_FSFREQ_ABORTED)) {
 		set_host_byte(scpnt, DID_SOFT_ERROR);
-		set_driver_byte(scpnt, SUGGEST_RETRY);
 		goto skip_fsfstatus;
 	}
 
@@ -2237,12 +2216,12 @@ static void zfcp_fsf_send_fcp_command_handler(struct zfcp_fsf_req *req)
 	switch (header->fsf_status) {
 	case FSF_HANDLE_MISMATCH:
 	case FSF_PORT_HANDLE_NOT_VALID:
-		zfcp_erp_adapter_reopen(unit->port->adapter, 0, 112, req);
+		zfcp_erp_adapter_reopen(unit->port->adapter, 0, "fssfch1", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_FCPLUN_NOT_VALID:
 	case FSF_LUN_HANDLE_NOT_VALID:
-		zfcp_erp_port_reopen(unit->port, 0, 113, req);
+		zfcp_erp_port_reopen(unit->port, 0, "fssfch2", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_SERVICE_CLASS_NOT_SUPPORTED:
@@ -2258,7 +2237,8 @@ static void zfcp_fsf_send_fcp_command_handler(struct zfcp_fsf_req *req)
 			req->qtcb->bottom.io.data_direction,
 			(unsigned long long)unit->fcp_lun,
 			(unsigned long long)unit->port->wwpn);
-		zfcp_erp_adapter_shutdown(unit->port->adapter, 0, 133, req);
+		zfcp_erp_adapter_shutdown(unit->port->adapter, 0, "fssfch3",
+					  req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_CMND_LENGTH_NOT_VALID:
@@ -2268,16 +2248,17 @@ static void zfcp_fsf_send_fcp_command_handler(struct zfcp_fsf_req *req)
 			req->qtcb->bottom.io.fcp_cmnd_length,
 			(unsigned long long)unit->fcp_lun,
 			(unsigned long long)unit->port->wwpn);
-		zfcp_erp_adapter_shutdown(unit->port->adapter, 0, 134, req);
+		zfcp_erp_adapter_shutdown(unit->port->adapter, 0, "fssfch4",
+					  req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR;
 		break;
 	case FSF_PORT_BOXED:
-		zfcp_erp_port_boxed(unit->port, 53, req);
+		zfcp_erp_port_boxed(unit->port, "fssfch5", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR |
 			       ZFCP_STATUS_FSFREQ_RETRY;
 		break;
 	case FSF_LUN_BOXED:
-		zfcp_erp_unit_boxed(unit, 54, req);
+		zfcp_erp_unit_boxed(unit, "fssfch6", req);
 		req->status |= ZFCP_STATUS_FSFREQ_ERROR |
 			       ZFCP_STATUS_FSFREQ_RETRY;
 		break;
@@ -2314,30 +2295,29 @@ static void zfcp_set_fcp_dl(struct fcp_cmnd_iu *fcp_cmd, u32 fcp_dl)
 
 /**
  * zfcp_fsf_send_fcp_command_task - initiate an FCP command (for a SCSI command)
- * @adapter: adapter where scsi command is issued
  * @unit: unit where command is sent to
  * @scsi_cmnd: scsi command to be sent
- * @timer: timer to be started when request is initiated
- * @req_flags: flags for fsf_request
  */
-int zfcp_fsf_send_fcp_command_task(struct zfcp_adapter *adapter,
-				   struct zfcp_unit *unit,
-				   struct scsi_cmnd *scsi_cmnd,
-				   int use_timer, int req_flags)
+int zfcp_fsf_send_fcp_command_task(struct zfcp_unit *unit,
+				   struct scsi_cmnd *scsi_cmnd)
 {
 	struct zfcp_fsf_req *req;
 	struct fcp_cmnd_iu *fcp_cmnd_iu;
 	unsigned int sbtype;
 	int real_bytes, retval = -EIO;
+	struct zfcp_adapter *adapter = unit->port->adapter;
 
 	if (unlikely(!(atomic_read(&unit->status) &
 		       ZFCP_STATUS_COMMON_UNBLOCKED)))
 		return -EBUSY;
 
 	spin_lock(&adapter->req_q_lock);
-	if (!zfcp_fsf_sbal_available(adapter))
+	if (atomic_read(&adapter->req_q.count) <= 0) {
+		atomic_inc(&adapter->qdio_outb_full);
 		goto out;
-	req = zfcp_fsf_req_create(adapter, FSF_QTCB_FCP_CMND, req_flags,
+	}
+	req = zfcp_fsf_req_create(adapter, FSF_QTCB_FCP_CMND,
+				  ZFCP_REQ_AUTO_CLEANUP,
 				  adapter->pool.fsf_req_scsi);
 	if (IS_ERR(req)) {
 		retval = PTR_ERR(req);
@@ -2411,7 +2391,7 @@ int zfcp_fsf_send_fcp_command_task(struct zfcp_adapter *adapter,
 				"on port 0x%016Lx closed\n",
 				(unsigned long long)unit->fcp_lun,
 				(unsigned long long)unit->port->wwpn);
-			zfcp_erp_unit_shutdown(unit, 0, 131, req);
+			zfcp_erp_unit_shutdown(unit, 0, "fssfct1", req);
 			retval = -EINVAL;
 		}
 		goto failed_scsi_cmnd;
@@ -2419,9 +2399,6 @@ int zfcp_fsf_send_fcp_command_task(struct zfcp_adapter *adapter,
 
 	zfcp_set_fcp_dl(fcp_cmnd_iu, real_bytes);
 
-	if (use_timer)
-		zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
-
 	retval = zfcp_fsf_req_send(req);
 	if (unlikely(retval))
 		goto failed_scsi_cmnd;
@@ -2439,28 +2416,25 @@ out:
 
 /**
  * zfcp_fsf_send_fcp_ctm - send SCSI task management command
- * @adapter: pointer to struct zfcp-adapter
  * @unit: pointer to struct zfcp_unit
  * @tm_flags: unsigned byte for task management flags
- * @req_flags: int request flags
  * Returns: on success pointer to struct fsf_req, NULL otherwise
  */
-struct zfcp_fsf_req *zfcp_fsf_send_fcp_ctm(struct zfcp_adapter *adapter,
-					   struct zfcp_unit *unit,
-					   u8 tm_flags, int req_flags)
+struct zfcp_fsf_req *zfcp_fsf_send_fcp_ctm(struct zfcp_unit *unit, u8 tm_flags)
 {
 	struct qdio_buffer_element *sbale;
 	struct zfcp_fsf_req *req = NULL;
 	struct fcp_cmnd_iu *fcp_cmnd_iu;
+	struct zfcp_adapter *adapter = unit->port->adapter;
 
 	if (unlikely(!(atomic_read(&unit->status) &
 		       ZFCP_STATUS_COMMON_UNBLOCKED)))
 		return NULL;
 
-	spin_lock(&adapter->req_q_lock);
-	if (!zfcp_fsf_sbal_available(adapter))
+	spin_lock_bh(&adapter->req_q_lock);
+	if (zfcp_fsf_req_sbal_get(adapter))
 		goto out;
-	req = zfcp_fsf_req_create(adapter, FSF_QTCB_FCP_CMND, req_flags,
+	req = zfcp_fsf_req_create(adapter, FSF_QTCB_FCP_CMND, 0,
 				  adapter->pool.fsf_req_scsi);
 	if (IS_ERR(req)) {
 		req = NULL;
@@ -2492,7 +2466,7 @@ struct zfcp_fsf_req *zfcp_fsf_send_fcp_ctm(struct zfcp_adapter *adapter,
 	zfcp_fsf_req_free(req);
 	req = NULL;
 out:
-	spin_unlock(&adapter->req_q_lock);
+	spin_unlock_bh(&adapter->req_q_lock);
 	return req;
 }
 
diff --git a/drivers/s390/scsi/zfcp_fsf.h b/drivers/s390/scsi/zfcp_fsf.h
index 8bb200252347..df7f232faba8 100644
--- a/drivers/s390/scsi/zfcp_fsf.h
+++ b/drivers/s390/scsi/zfcp_fsf.h
@@ -127,10 +127,6 @@
 #define FSF_STATUS_READ_CFDC_UPDATED		0x0000000A
 #define FSF_STATUS_READ_FEATURE_UPDATE_ALERT	0x0000000C
 
-/* status subtypes in status read buffer */
-#define FSF_STATUS_READ_SUB_CLOSE_PHYS_PORT	0x00000001
-#define FSF_STATUS_READ_SUB_ERROR_PORT		0x00000002
-
 /* status subtypes for link down */
 #define FSF_STATUS_READ_SUB_NO_PHYSICAL_LINK	0x00000000
 #define FSF_STATUS_READ_SUB_FDISC_FAILED	0x00000001
diff --git a/drivers/s390/scsi/zfcp_qdio.c b/drivers/s390/scsi/zfcp_qdio.c
index 33e0a206a0a4..e0a215309df0 100644
--- a/drivers/s390/scsi/zfcp_qdio.c
+++ b/drivers/s390/scsi/zfcp_qdio.c
@@ -11,9 +11,6 @@
 
 #include "zfcp_ext.h"
 
-/* FIXME(tune): free space should be one max. SBAL chain plus what? */
-#define ZFCP_QDIO_PCI_INTERVAL	(QDIO_MAX_BUFFERS_PER_Q \
-				- (FSF_MAX_SBALS_PER_REQ + 4))
 #define QBUFF_PER_PAGE		(PAGE_SIZE / sizeof(struct qdio_buffer))
 
 static int zfcp_qdio_buffers_enqueue(struct qdio_buffer **sbal)
@@ -58,7 +55,7 @@ void zfcp_qdio_free(struct zfcp_adapter *adapter)
 	}
 }
 
-static void zfcp_qdio_handler_error(struct zfcp_adapter *adapter, u8 id)
+static void zfcp_qdio_handler_error(struct zfcp_adapter *adapter, char *id)
 {
 	dev_warn(&adapter->ccw_device->dev, "A QDIO problem occurred\n");
 
@@ -77,6 +74,23 @@ static void zfcp_qdio_zero_sbals(struct qdio_buffer *sbal[], int first, int cnt)
 	}
 }
 
+/* this needs to be called prior to updating the queue fill level */
+static void zfcp_qdio_account(struct zfcp_adapter *adapter)
+{
+	ktime_t now;
+	s64 span;
+	int free, used;
+
+	spin_lock(&adapter->qdio_stat_lock);
+	now = ktime_get();
+	span = ktime_us_delta(now, adapter->req_q_time);
+	free = max(0, atomic_read(&adapter->req_q.count));
+	used = QDIO_MAX_BUFFERS_PER_Q - free;
+	adapter->req_q_util += used * span;
+	adapter->req_q_time = now;
+	spin_unlock(&adapter->qdio_stat_lock);
+}
+
 static void zfcp_qdio_int_req(struct ccw_device *cdev, unsigned int qdio_err,
 			      int queue_no, int first, int count,
 			      unsigned long parm)
@@ -86,13 +100,14 @@ static void zfcp_qdio_int_req(struct ccw_device *cdev, unsigned int qdio_err,
 
 	if (unlikely(qdio_err)) {
 		zfcp_hba_dbf_event_qdio(adapter, qdio_err, first, count);
-		zfcp_qdio_handler_error(adapter, 140);
+		zfcp_qdio_handler_error(adapter, "qdireq1");
 		return;
 	}
 
 	/* cleanup all SBALs being program-owned now */
 	zfcp_qdio_zero_sbals(queue->sbal, first, count);
 
+	zfcp_qdio_account(adapter);
 	atomic_add(count, &queue->count);
 	wake_up(&adapter->request_wq);
 }
@@ -154,7 +169,7 @@ static void zfcp_qdio_int_resp(struct ccw_device *cdev, unsigned int qdio_err,
 
 	if (unlikely(qdio_err)) {
 		zfcp_hba_dbf_event_qdio(adapter, qdio_err, first, count);
-		zfcp_qdio_handler_error(adapter, 147);
+		zfcp_qdio_handler_error(adapter, "qdires1");
 		return;
 	}
 
@@ -346,21 +361,12 @@ int zfcp_qdio_send(struct zfcp_fsf_req *fsf_req)
 	struct zfcp_qdio_queue *req_q = &adapter->req_q;
 	int first = fsf_req->sbal_first;
 	int count = fsf_req->sbal_number;
-	int retval, pci, pci_batch;
-	struct qdio_buffer_element *sbale;
+	int retval;
+	unsigned int qdio_flags = QDIO_FLAG_SYNC_OUTPUT;
 
-	/* acknowledgements for transferred buffers */
-	pci_batch = adapter->req_q_pci_batch + count;
-	if (unlikely(pci_batch >= ZFCP_QDIO_PCI_INTERVAL)) {
-		pci_batch %= ZFCP_QDIO_PCI_INTERVAL;
-		pci = first + count - (pci_batch + 1);
-		pci %= QDIO_MAX_BUFFERS_PER_Q;
-		sbale = zfcp_qdio_sbale(req_q, pci, 0);
-		sbale->flags |= SBAL_FLAGS0_PCI;
-	}
+	zfcp_qdio_account(adapter);
 
-	retval = do_QDIO(adapter->ccw_device, QDIO_FLAG_SYNC_OUTPUT, 0, first,
-			 count);
+	retval = do_QDIO(adapter->ccw_device, qdio_flags, 0, first, count);
 	if (unlikely(retval)) {
 		zfcp_qdio_zero_sbals(req_q->sbal, first, count);
 		return retval;
@@ -370,7 +376,6 @@ int zfcp_qdio_send(struct zfcp_fsf_req *fsf_req)
 	atomic_sub(count, &req_q->count);
 	req_q->first += count;
 	req_q->first %= QDIO_MAX_BUFFERS_PER_Q;
-	adapter->req_q_pci_batch = pci_batch;
 	return 0;
 }
 
@@ -441,7 +446,6 @@ void zfcp_qdio_close(struct zfcp_adapter *adapter)
 	}
 	req_q->first = 0;
 	atomic_set(&req_q->count, 0);
-	adapter->req_q_pci_batch = 0;
 	adapter->resp_q.first = 0;
 	atomic_set(&adapter->resp_q.count, 0);
 }
@@ -479,7 +483,6 @@ int zfcp_qdio_open(struct zfcp_adapter *adapter)
 	/* set index of first avalable SBALS / number of available SBALS */
 	adapter->req_q.first = 0;
 	atomic_set(&adapter->req_q.count, QDIO_MAX_BUFFERS_PER_Q);
-	adapter->req_q_pci_batch = 0;
 
 	return 0;
 
diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index 9dc42a68fbdd..58201e1ae478 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -3,7 +3,7 @@
  *
  * Interface to Linux SCSI midlayer.
  *
- * Copyright IBM Corporation 2002, 2008
+ * Copyright IBM Corporation 2002, 2009
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -27,9 +27,7 @@ char *zfcp_get_fcp_sns_info_ptr(struct fcp_rsp_iu *fcp_rsp_iu)
 static void zfcp_scsi_slave_destroy(struct scsi_device *sdpnt)
 {
 	struct zfcp_unit *unit = (struct zfcp_unit *) sdpnt->hostdata;
-	atomic_clear_mask(ZFCP_STATUS_UNIT_REGISTERED, &unit->status);
 	unit->device = NULL;
-	zfcp_erp_unit_failed(unit, 12, NULL);
 	zfcp_unit_put(unit);
 }
 
@@ -58,8 +56,8 @@ static int zfcp_scsi_queuecommand(struct scsi_cmnd *scpnt,
 {
 	struct zfcp_unit *unit;
 	struct zfcp_adapter *adapter;
-	int    status;
-	int    ret;
+	int    status, scsi_result, ret;
+	struct fc_rport *rport = starget_to_rport(scsi_target(scpnt->device));
 
 	/* reset the status for this request */
 	scpnt->result = 0;
@@ -81,6 +79,14 @@ static int zfcp_scsi_queuecommand(struct scsi_cmnd *scpnt,
 		return 0;
 	}
 
+	scsi_result = fc_remote_port_chkready(rport);
+	if (unlikely(scsi_result)) {
+		scpnt->result = scsi_result;
+		zfcp_scsi_dbf_event_result("fail", 4, adapter, scpnt, NULL);
+		scpnt->scsi_done(scpnt);
+		return 0;
+	}
+
 	status = atomic_read(&unit->status);
 	if (unlikely((status & ZFCP_STATUS_COMMON_ERP_FAILED) ||
 		     !(status & ZFCP_STATUS_COMMON_RUNNING))) {
@@ -88,8 +94,7 @@ static int zfcp_scsi_queuecommand(struct scsi_cmnd *scpnt,
 		return 0;;
 	}
 
-	ret = zfcp_fsf_send_fcp_command_task(adapter, unit, scpnt, 0,
-					     ZFCP_REQ_AUTO_CLEANUP);
+	ret = zfcp_fsf_send_fcp_command_task(unit, scpnt);
 	if (unlikely(ret == -EBUSY))
 		return SCSI_MLQUEUE_DEVICE_BUSY;
 	else if (unlikely(ret < 0))
@@ -133,8 +138,7 @@ static int zfcp_scsi_slave_alloc(struct scsi_device *sdp)
 
 	read_lock_irqsave(&zfcp_data.config_lock, flags);
 	unit = zfcp_unit_lookup(adapter, sdp->channel, sdp->id, sdp->lun);
-	if (unit &&
-	    (atomic_read(&unit->status) & ZFCP_STATUS_UNIT_REGISTERED)) {
+	if (unit) {
 		sdp->hostdata = unit;
 		unit->device = sdp;
 		zfcp_unit_get(unit);
@@ -147,79 +151,91 @@ out:
 
 static int zfcp_scsi_eh_abort_handler(struct scsi_cmnd *scpnt)
 {
- 	struct Scsi_Host *scsi_host;
- 	struct zfcp_adapter *adapter;
-	struct zfcp_unit *unit;
-	struct zfcp_fsf_req *fsf_req;
+	struct Scsi_Host *scsi_host = scpnt->device->host;
+	struct zfcp_adapter *adapter =
+		(struct zfcp_adapter *) scsi_host->hostdata[0];
+	struct zfcp_unit *unit = scpnt->device->hostdata;
+	struct zfcp_fsf_req *old_req, *abrt_req;
 	unsigned long flags;
 	unsigned long old_req_id = (unsigned long) scpnt->host_scribble;
 	int retval = SUCCESS;
-
-	scsi_host = scpnt->device->host;
-	adapter = (struct zfcp_adapter *) scsi_host->hostdata[0];
-	unit = scpnt->device->hostdata;
+	int retry = 3;
 
 	/* avoid race condition between late normal completion and abort */
 	write_lock_irqsave(&adapter->abort_lock, flags);
 
-	/* Check whether corresponding fsf_req is still pending */
 	spin_lock(&adapter->req_list_lock);
-	fsf_req = zfcp_reqlist_find(adapter, old_req_id);
+	old_req = zfcp_reqlist_find(adapter, old_req_id);
 	spin_unlock(&adapter->req_list_lock);
-	if (!fsf_req) {
+	if (!old_req) {
 		write_unlock_irqrestore(&adapter->abort_lock, flags);
-		zfcp_scsi_dbf_event_abort("lte1", adapter, scpnt, NULL, 0);
-		return retval;
+		zfcp_scsi_dbf_event_abort("lte1", adapter, scpnt, NULL,
+					  old_req_id);
+		return SUCCESS;
 	}
-	fsf_req->data = NULL;
+	old_req->data = NULL;
 
 	/* don't access old fsf_req after releasing the abort_lock */
 	write_unlock_irqrestore(&adapter->abort_lock, flags);
 
-	fsf_req = zfcp_fsf_abort_fcp_command(old_req_id, adapter, unit, 0);
-	if (!fsf_req) {
-		zfcp_scsi_dbf_event_abort("nres", adapter, scpnt, NULL,
-					  old_req_id);
-		retval = FAILED;
-		return retval;
+	while (retry--) {
+		abrt_req = zfcp_fsf_abort_fcp_command(old_req_id, unit);
+		if (abrt_req)
+			break;
+
+		zfcp_erp_wait(adapter);
+		if (!(atomic_read(&adapter->status) &
+		      ZFCP_STATUS_COMMON_RUNNING)) {
+			zfcp_scsi_dbf_event_abort("nres", adapter, scpnt, NULL,
+						  old_req_id);
+			return SUCCESS;
+		}
 	}
+	if (!abrt_req)
+		return FAILED;
 
-	__wait_event(fsf_req->completion_wq,
-		     fsf_req->status & ZFCP_STATUS_FSFREQ_COMPLETED);
+	wait_event(abrt_req->completion_wq,
+		   abrt_req->status & ZFCP_STATUS_FSFREQ_COMPLETED);
 
-	if (fsf_req->status & ZFCP_STATUS_FSFREQ_ABORTSUCCEEDED) {
-		zfcp_scsi_dbf_event_abort("okay", adapter, scpnt, fsf_req, 0);
-	} else if (fsf_req->status & ZFCP_STATUS_FSFREQ_ABORTNOTNEEDED) {
-		zfcp_scsi_dbf_event_abort("lte2", adapter, scpnt, fsf_req, 0);
-	} else {
-		zfcp_scsi_dbf_event_abort("fail", adapter, scpnt, fsf_req, 0);
+	if (abrt_req->status & ZFCP_STATUS_FSFREQ_ABORTSUCCEEDED)
+		zfcp_scsi_dbf_event_abort("okay", adapter, scpnt, abrt_req, 0);
+	else if (abrt_req->status & ZFCP_STATUS_FSFREQ_ABORTNOTNEEDED)
+		zfcp_scsi_dbf_event_abort("lte2", adapter, scpnt, abrt_req, 0);
+	else {
+		zfcp_scsi_dbf_event_abort("fail", adapter, scpnt, abrt_req, 0);
 		retval = FAILED;
 	}
-	zfcp_fsf_req_free(fsf_req);
-
+	zfcp_fsf_req_free(abrt_req);
 	return retval;
 }
 
-static int zfcp_task_mgmt_function(struct zfcp_unit *unit, u8 tm_flags,
-					 struct scsi_cmnd *scpnt)
+static int zfcp_task_mgmt_function(struct scsi_cmnd *scpnt, u8 tm_flags)
 {
+	struct zfcp_unit *unit = scpnt->device->hostdata;
 	struct zfcp_adapter *adapter = unit->port->adapter;
 	struct zfcp_fsf_req *fsf_req;
 	int retval = SUCCESS;
-
-	/* issue task management function */
-	fsf_req = zfcp_fsf_send_fcp_ctm(adapter, unit, tm_flags, 0);
-	if (!fsf_req) {
-		zfcp_scsi_dbf_event_devreset("nres", tm_flags, unit, scpnt);
-		return FAILED;
+	int retry = 3;
+
+	while (retry--) {
+		fsf_req = zfcp_fsf_send_fcp_ctm(unit, tm_flags);
+		if (fsf_req)
+			break;
+
+		zfcp_erp_wait(adapter);
+		if (!(atomic_read(&adapter->status) &
+		      ZFCP_STATUS_COMMON_RUNNING)) {
+			zfcp_scsi_dbf_event_devreset("nres", tm_flags, unit,
+						     scpnt);
+			return SUCCESS;
+		}
 	}
+	if (!fsf_req)
+		return FAILED;
 
-	__wait_event(fsf_req->completion_wq,
-		     fsf_req->status & ZFCP_STATUS_FSFREQ_COMPLETED);
+	wait_event(fsf_req->completion_wq,
+		   fsf_req->status & ZFCP_STATUS_FSFREQ_COMPLETED);
 
-	/*
-	 * check completion status of task management function
-	 */
 	if (fsf_req->status & ZFCP_STATUS_FSFREQ_TMFUNCFAILED) {
 		zfcp_scsi_dbf_event_devreset("fail", tm_flags, unit, scpnt);
 		retval = FAILED;
@@ -230,40 +246,25 @@ static int zfcp_task_mgmt_function(struct zfcp_unit *unit, u8 tm_flags,
 		zfcp_scsi_dbf_event_devreset("okay", tm_flags, unit, scpnt);
 
 	zfcp_fsf_req_free(fsf_req);
-
 	return retval;
 }
 
 static int zfcp_scsi_eh_device_reset_handler(struct scsi_cmnd *scpnt)
 {
-	struct zfcp_unit *unit = scpnt->device->hostdata;
-
-	if (!unit) {
-		WARN_ON(1);
-		return SUCCESS;
-	}
-	return zfcp_task_mgmt_function(unit, FCP_LOGICAL_UNIT_RESET, scpnt);
+	return zfcp_task_mgmt_function(scpnt, FCP_LOGICAL_UNIT_RESET);
 }
 
 static int zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt)
 {
-	struct zfcp_unit *unit = scpnt->device->hostdata;
-
-	if (!unit) {
-		WARN_ON(1);
-		return SUCCESS;
-	}
-	return zfcp_task_mgmt_function(unit, FCP_TARGET_RESET, scpnt);
+	return zfcp_task_mgmt_function(scpnt, FCP_TARGET_RESET);
 }
 
 static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
 {
-	struct zfcp_unit *unit;
-	struct zfcp_adapter *adapter;
+	struct zfcp_unit *unit = scpnt->device->hostdata;
+	struct zfcp_adapter *adapter = unit->port->adapter;
 
-	unit = scpnt->device->hostdata;
-	adapter = unit->port->adapter;
-	zfcp_erp_adapter_reopen(adapter, 0, 141, scpnt);
+	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1", scpnt);
 	zfcp_erp_wait(adapter);
 
 	return SUCCESS;
@@ -479,6 +480,109 @@ static void zfcp_set_rport_dev_loss_tmo(struct fc_rport *rport, u32 timeout)
 	rport->dev_loss_tmo = timeout;
 }
 
+/**
+ * zfcp_scsi_dev_loss_tmo_callbk - Free any reference to rport
+ * @rport: The rport that is about to be deleted.
+ */
+static void zfcp_scsi_dev_loss_tmo_callbk(struct fc_rport *rport)
+{
+	struct zfcp_port *port = rport->dd_data;
+
+	write_lock_irq(&zfcp_data.config_lock);
+	port->rport = NULL;
+	write_unlock_irq(&zfcp_data.config_lock);
+}
+
+/**
+ * zfcp_scsi_terminate_rport_io - Terminate all I/O on a rport
+ * @rport: The FC rport where to teminate I/O
+ *
+ * Abort all pending SCSI commands for a port by closing the
+ * port. Using a reopen for avoids a conflict with a shutdown
+ * overwriting a reopen.
+ */
+static void zfcp_scsi_terminate_rport_io(struct fc_rport *rport)
+{
+	struct zfcp_port *port = rport->dd_data;
+
+	zfcp_erp_port_reopen(port, 0, "sctrpi1", NULL);
+}
+
+static void zfcp_scsi_rport_register(struct zfcp_port *port)
+{
+	struct fc_rport_identifiers ids;
+	struct fc_rport *rport;
+
+	ids.node_name = port->wwnn;
+	ids.port_name = port->wwpn;
+	ids.port_id = port->d_id;
+	ids.roles = FC_RPORT_ROLE_FCP_TARGET;
+
+	rport = fc_remote_port_add(port->adapter->scsi_host, 0, &ids);
+	if (!rport) {
+		dev_err(&port->adapter->ccw_device->dev,
+			"Registering port 0x%016Lx failed\n",
+			(unsigned long long)port->wwpn);
+		return;
+	}
+
+	rport->dd_data = port;
+	rport->maxframe_size = port->maxframe_size;
+	rport->supported_classes = port->supported_classes;
+	port->rport = rport;
+}
+
+static void zfcp_scsi_rport_block(struct zfcp_port *port)
+{
+	if (port->rport)
+		fc_remote_port_delete(port->rport);
+}
+
+void zfcp_scsi_schedule_rport_register(struct zfcp_port *port)
+{
+	zfcp_port_get(port);
+	port->rport_task = RPORT_ADD;
+
+	if (!queue_work(zfcp_data.work_queue, &port->rport_work))
+		zfcp_port_put(port);
+}
+
+void zfcp_scsi_schedule_rport_block(struct zfcp_port *port)
+{
+	zfcp_port_get(port);
+	port->rport_task = RPORT_DEL;
+
+	if (!queue_work(zfcp_data.work_queue, &port->rport_work))
+		zfcp_port_put(port);
+}
+
+void zfcp_scsi_schedule_rports_block(struct zfcp_adapter *adapter)
+{
+	struct zfcp_port *port;
+
+	list_for_each_entry(port, &adapter->port_list_head, list)
+		zfcp_scsi_schedule_rport_block(port);
+}
+
+void zfcp_scsi_rport_work(struct work_struct *work)
+{
+	struct zfcp_port *port = container_of(work, struct zfcp_port,
+					      rport_work);
+
+	while (port->rport_task) {
+		if (port->rport_task == RPORT_ADD) {
+			port->rport_task = RPORT_NONE;
+			zfcp_scsi_rport_register(port);
+		} else {
+			port->rport_task = RPORT_NONE;
+			zfcp_scsi_rport_block(port);
+		}
+	}
+
+	zfcp_port_put(port);
+}
+
+
 struct fc_function_template zfcp_transport_functions = {
 	.show_starget_port_id = 1,
 	.show_starget_port_name = 1,
@@ -497,6 +601,8 @@ struct fc_function_template zfcp_transport_functions = {
 	.reset_fc_host_stats = zfcp_reset_fc_host_stats,
 	.set_rport_dev_loss_tmo = zfcp_set_rport_dev_loss_tmo,
 	.get_host_port_state = zfcp_get_host_port_state,
+	.dev_loss_tmo_callbk = zfcp_scsi_dev_loss_tmo_callbk,
+	.terminate_rport_io = zfcp_scsi_terminate_rport_io,
 	.show_host_port_state = 1,
 	/* no functions registered for following dynamic attributes but
 	   directly set by LLDD */
diff --git a/drivers/s390/scsi/zfcp_sysfs.c b/drivers/s390/scsi/zfcp_sysfs.c
index 899af2b45b1e..9a3b8e261c0a 100644
--- a/drivers/s390/scsi/zfcp_sysfs.c
+++ b/drivers/s390/scsi/zfcp_sysfs.c
@@ -112,9 +112,9 @@ static ZFCP_DEV_ATTR(_feat, failed, S_IWUSR | S_IRUGO,			       \
 		     zfcp_sysfs_##_feat##_failed_show,			       \
 		     zfcp_sysfs_##_feat##_failed_store);
 
-ZFCP_SYSFS_FAILED(zfcp_adapter, adapter, adapter, 44, 93);
-ZFCP_SYSFS_FAILED(zfcp_port, port, port->adapter, 45, 96);
-ZFCP_SYSFS_FAILED(zfcp_unit, unit, unit->port->adapter, 46, 97);
+ZFCP_SYSFS_FAILED(zfcp_adapter, adapter, adapter, "syafai1", "syafai2");
+ZFCP_SYSFS_FAILED(zfcp_port, port, port->adapter, "sypfai1", "sypfai2");
+ZFCP_SYSFS_FAILED(zfcp_unit, unit, unit->port->adapter, "syufai1", "syufai2");
 
 static ssize_t zfcp_sysfs_port_rescan_store(struct device *dev,
 					    struct device_attribute *attr,
@@ -168,7 +168,7 @@ static ssize_t zfcp_sysfs_port_remove_store(struct device *dev,
 		goto out;
 	}
 
-	zfcp_erp_port_shutdown(port, 0, 92, NULL);
+	zfcp_erp_port_shutdown(port, 0, "syprs_1", NULL);
 	zfcp_erp_wait(adapter);
 	zfcp_port_put(port);
 	zfcp_port_dequeue(port);
@@ -222,7 +222,7 @@ static ssize_t zfcp_sysfs_unit_add_store(struct device *dev,
 
 	retval = 0;
 
-	zfcp_erp_unit_reopen(unit, 0, 94, NULL);
+	zfcp_erp_unit_reopen(unit, 0, "syuas_1", NULL);
 	zfcp_erp_wait(unit->port->adapter);
 	zfcp_unit_put(unit);
 out:
@@ -268,7 +268,7 @@ static ssize_t zfcp_sysfs_unit_remove_store(struct device *dev,
 		goto out;
 	}
 
-	zfcp_erp_unit_shutdown(unit, 0, 95, NULL);
+	zfcp_erp_unit_shutdown(unit, 0, "syurs_1", NULL);
 	zfcp_erp_wait(unit->port->adapter);
 	zfcp_unit_put(unit);
 	zfcp_unit_dequeue(unit);
@@ -318,10 +318,9 @@ zfcp_sysfs_unit_##_name##_latency_show(struct device *dev,		\
 	struct zfcp_unit *unit = sdev->hostdata;			\
 	struct zfcp_latencies *lat = &unit->latencies;			\
 	struct zfcp_adapter *adapter = unit->port->adapter;		\
-	unsigned long flags;						\
 	unsigned long long fsum, fmin, fmax, csum, cmin, cmax, cc;	\
 									\
-	spin_lock_irqsave(&lat->lock, flags);				\
+	spin_lock_bh(&lat->lock);					\
 	fsum = lat->_name.fabric.sum * adapter->timer_ticks;		\
 	fmin = lat->_name.fabric.min * adapter->timer_ticks;		\
 	fmax = lat->_name.fabric.max * adapter->timer_ticks;		\
@@ -329,7 +328,7 @@ zfcp_sysfs_unit_##_name##_latency_show(struct device *dev,		\
 	cmin = lat->_name.channel.min * adapter->timer_ticks;		\
 	cmax = lat->_name.channel.max * adapter->timer_ticks;		\
 	cc  = lat->_name.counter;					\
-	spin_unlock_irqrestore(&lat->lock, flags);			\
+	spin_unlock_bh(&lat->lock);					\
 									\
 	do_div(fsum, 1000);						\
 	do_div(fmin, 1000);						\
@@ -487,7 +486,8 @@ static ssize_t zfcp_sysfs_adapter_q_full_show(struct device *dev,
 	struct zfcp_adapter *adapter =
 		(struct zfcp_adapter *) scsi_host->hostdata[0];
 
-	return sprintf(buf, "%d\n", atomic_read(&adapter->qdio_outb_full));
+	return sprintf(buf, "%d %llu\n", atomic_read(&adapter->qdio_outb_full),
+		       (unsigned long long)adapter->req_q_util);
 }
 static DEVICE_ATTR(queue_full, S_IRUGO, zfcp_sysfs_adapter_q_full_show, NULL);
 
diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c
index 5311317c2e4c..a12783ebb42d 100644
--- a/drivers/scsi/3w-9xxx.c
+++ b/drivers/scsi/3w-9xxx.c
@@ -4,7 +4,7 @@
    Written By: Adam Radford <linuxraid@amcc.com>
    Modifications By: Tom Couch <linuxraid@amcc.com>
 
-   Copyright (C) 2004-2008 Applied Micro Circuits Corporation.
+   Copyright (C) 2004-2009 Applied Micro Circuits Corporation.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -75,6 +75,7 @@
                  Add MSI support and "use_msi" module parameter.
                  Fix bug in twa_get_param() on 4GB+.
                  Use pci_resource_len() for ioremap().
+   2.26.02.012 - Add power management support.
 */
 
 #include <linux/module.h>
@@ -99,7 +100,7 @@
 #include "3w-9xxx.h"
 
 /* Globals */
-#define TW_DRIVER_VERSION "2.26.02.011"
+#define TW_DRIVER_VERSION "2.26.02.012"
 static TW_Device_Extension *twa_device_extension_list[TW_MAX_SLOT];
 static unsigned int twa_device_extension_count;
 static int twa_major = -1;
@@ -2182,6 +2183,98 @@ static void twa_remove(struct pci_dev *pdev)
 	twa_device_extension_count--;
 } /* End twa_remove() */
 
+#ifdef CONFIG_PM
+/* This function is called on PCI suspend */
+static int twa_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+	struct Scsi_Host *host = pci_get_drvdata(pdev);
+	TW_Device_Extension *tw_dev = (TW_Device_Extension *)host->hostdata;
+
+	printk(KERN_WARNING "3w-9xxx: Suspending host %d.\n", tw_dev->host->host_no);
+
+	TW_DISABLE_INTERRUPTS(tw_dev);
+	free_irq(tw_dev->tw_pci_dev->irq, tw_dev);
+
+	if (test_bit(TW_USING_MSI, &tw_dev->flags))
+		pci_disable_msi(pdev);
+
+	/* Tell the card we are shutting down */
+	if (twa_initconnection(tw_dev, 1, 0, 0, 0, 0, 0, NULL, NULL, NULL, NULL, NULL)) {
+		TW_PRINTK(tw_dev->host, TW_DRIVER, 0x38, "Connection shutdown failed during suspend");
+	} else {
+		printk(KERN_WARNING "3w-9xxx: Suspend complete.\n");
+	}
+	TW_CLEAR_ALL_INTERRUPTS(tw_dev);
+
+	pci_save_state(pdev);
+	pci_disable_device(pdev);
+	pci_set_power_state(pdev, pci_choose_state(pdev, state));
+
+	return 0;
+} /* End twa_suspend() */
+
+/* This function is called on PCI resume */
+static int twa_resume(struct pci_dev *pdev)
+{
+	int retval = 0;
+	struct Scsi_Host *host = pci_get_drvdata(pdev);
+	TW_Device_Extension *tw_dev = (TW_Device_Extension *)host->hostdata;
+
+	printk(KERN_WARNING "3w-9xxx: Resuming host %d.\n", tw_dev->host->host_no);
+	pci_set_power_state(pdev, PCI_D0);
+	pci_enable_wake(pdev, PCI_D0, 0);
+	pci_restore_state(pdev);
+
+	retval = pci_enable_device(pdev);
+	if (retval) {
+		TW_PRINTK(tw_dev->host, TW_DRIVER, 0x39, "Enable device failed during resume");
+		return retval;
+	}
+
+	pci_set_master(pdev);
+	pci_try_set_mwi(pdev);
+
+	if (pci_set_dma_mask(pdev, DMA_64BIT_MASK)
+	    || pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK))
+		if (pci_set_dma_mask(pdev, DMA_32BIT_MASK)
+		    || pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK)) {
+			TW_PRINTK(host, TW_DRIVER, 0x40, "Failed to set dma mask during resume");
+			retval = -ENODEV;
+			goto out_disable_device;
+		}
+
+	/* Initialize the card */
+	if (twa_reset_sequence(tw_dev, 0)) {
+		retval = -ENODEV;
+		goto out_disable_device;
+	}
+
+	/* Now setup the interrupt handler */
+	retval = request_irq(pdev->irq, twa_interrupt, IRQF_SHARED, "3w-9xxx", tw_dev);
+	if (retval) {
+		TW_PRINTK(tw_dev->host, TW_DRIVER, 0x42, "Error requesting IRQ during resume");
+		retval = -ENODEV;
+		goto out_disable_device;
+	}
+
+	/* Now enable MSI if enabled */
+	if (test_bit(TW_USING_MSI, &tw_dev->flags))
+		pci_enable_msi(pdev);
+
+	/* Re-enable interrupts on the card */
+	TW_ENABLE_AND_CLEAR_INTERRUPTS(tw_dev);
+
+	printk(KERN_WARNING "3w-9xxx: Resume complete.\n");
+	return 0;
+
+out_disable_device:
+	scsi_remove_host(host);
+	pci_disable_device(pdev);
+
+	return retval;
+} /* End twa_resume() */
+#endif
+
 /* PCI Devices supported by this driver */
 static struct pci_device_id twa_pci_tbl[] __devinitdata = {
 	{ PCI_VENDOR_ID_3WARE, PCI_DEVICE_ID_3WARE_9000,
@@ -2202,6 +2295,10 @@ static struct pci_driver twa_driver = {
 	.id_table	= twa_pci_tbl,
 	.probe		= twa_probe,
 	.remove		= twa_remove,
+#ifdef CONFIG_PM
+	.suspend	= twa_suspend,
+	.resume		= twa_resume,
+#endif
 	.shutdown	= twa_shutdown
 };
 
diff --git a/drivers/scsi/3w-9xxx.h b/drivers/scsi/3w-9xxx.h
index 1729a8785fea..2893eec78ed2 100644
--- a/drivers/scsi/3w-9xxx.h
+++ b/drivers/scsi/3w-9xxx.h
@@ -4,7 +4,7 @@
    Written By: Adam Radford <linuxraid@amcc.com>
    Modifications By: Tom Couch <linuxraid@amcc.com>
 
-   Copyright (C) 2004-2008 Applied Micro Circuits Corporation.
+   Copyright (C) 2004-2009 Applied Micro Circuits Corporation.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 256c7bec7bd7..e2f44e6c0bcb 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -224,14 +224,15 @@ config SCSI_LOGGING
 	  can enable logging by saying Y to "/proc file system support" and
 	  "Sysctl support" below and executing the command
 
-	  echo "scsi log token [level]" > /proc/scsi/scsi
+	  echo <bitmask> > /proc/sys/dev/scsi/logging_level
 
-	  at boot time after the /proc file system has been mounted.
+	  where <bitmask> is a four byte value representing the logging type
+	  and logging level for each type of logging selected.
 
-	  There are a number of things that can be used for 'token' (you can
-	  find them in the source: <file:drivers/scsi/scsi.c>), and this
-	  allows you to select the types of information you want, and the
-	  level allows you to select the level of verbosity.
+	  There are a number of logging types and you can find them in the
+	  source at <file:drivers/scsi/scsi_logging.h>. The logging levels
+	  are also described in that file and they determine the verbosity of
+	  the logging for each logging type.
 
 	  If you say N here, it may be harder to track down some types of SCSI
 	  problems. If you say Y here your kernel will be somewhat larger, but
@@ -570,6 +571,7 @@ config SCSI_ARCMSR_AER
 	  To enable this function, choose Y here.
 
 source "drivers/scsi/megaraid/Kconfig.megaraid"
+source "drivers/scsi/mpt2sas/Kconfig"
 
 config SCSI_HPTIOP
 	tristate "HighPoint RocketRAID 3xxx/4xxx Controller support"
@@ -608,6 +610,7 @@ config SCSI_FLASHPOINT
 config LIBFC
 	tristate "LibFC module"
 	select SCSI_FC_ATTRS
+	select CRC32
 	---help---
 	  Fibre Channel library module
 
@@ -1535,6 +1538,7 @@ config SCSI_NSP32
 config SCSI_DEBUG
 	tristate "SCSI debugging host simulator"
 	depends on SCSI
+	select CRC_T10DIF
 	help
 	  This is a host adapter simulator that can simulate multiple hosts
 	  each with multiple dummy SCSI devices (disks). It defaults to one
@@ -1803,4 +1807,6 @@ source "drivers/scsi/pcmcia/Kconfig"
 
 source "drivers/scsi/device_handler/Kconfig"
 
+source "drivers/scsi/osd/Kconfig"
+
 endmenu
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 7461eb09a031..cf7929634668 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -99,6 +99,7 @@ obj-$(CONFIG_SCSI_DC390T)	+= tmscsim.o
 obj-$(CONFIG_MEGARAID_LEGACY)	+= megaraid.o
 obj-$(CONFIG_MEGARAID_NEWGEN)	+= megaraid/
 obj-$(CONFIG_MEGARAID_SAS)	+= megaraid/
+obj-$(CONFIG_SCSI_MPT2SAS)	+= mpt2sas/
 obj-$(CONFIG_SCSI_ACARD)	+= atp870u.o
 obj-$(CONFIG_SCSI_SUNESP)	+= esp_scsi.o	sun_esp.o
 obj-$(CONFIG_SCSI_GDTH)		+= gdth.o
@@ -137,6 +138,8 @@ obj-$(CONFIG_CHR_DEV_SG)	+= sg.o
 obj-$(CONFIG_CHR_DEV_SCH)	+= ch.o
 obj-$(CONFIG_SCSI_ENCLOSURE)	+= ses.o
 
+obj-$(CONFIG_SCSI_OSD_INITIATOR) += osd/
+
 # This goes last, so that "real" scsi devices probe earlier
 obj-$(CONFIG_SCSI_DEBUG)	+= scsi_debug.o
 
diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c
index af9725409f43..7b1633a8c15a 100644
--- a/drivers/scsi/ch.c
+++ b/drivers/scsi/ch.c
@@ -41,6 +41,7 @@ MODULE_DESCRIPTION("device driver for scsi media changer devices");
 MODULE_AUTHOR("Gerd Knorr <kraxel@bytesex.org>");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CHARDEV_MAJOR(SCSI_CHANGER_MAJOR);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_MEDIUM_CHANGER);
 
 static int init = 1;
 module_param(init, int, 0444);
diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
index 4003deefb7d8..e79e18101f87 100644
--- a/drivers/scsi/constants.c
+++ b/drivers/scsi/constants.c
@@ -1373,21 +1373,14 @@ static const char * const driverbyte_table[]={
 "DRIVER_INVALID", "DRIVER_TIMEOUT", "DRIVER_HARD", "DRIVER_SENSE"};
 #define NUM_DRIVERBYTE_STRS ARRAY_SIZE(driverbyte_table)
 
-static const char * const driversuggest_table[]={"SUGGEST_OK",
-"SUGGEST_RETRY", "SUGGEST_ABORT", "SUGGEST_REMAP", "SUGGEST_DIE",
-"SUGGEST_5", "SUGGEST_6", "SUGGEST_7", "SUGGEST_SENSE"};
-#define NUM_SUGGEST_STRS ARRAY_SIZE(driversuggest_table)
-
 void scsi_show_result(int result)
 {
 	int hb = host_byte(result);
-	int db = (driver_byte(result) & DRIVER_MASK);
-	int su = ((driver_byte(result) & SUGGEST_MASK) >> 4);
+	int db = driver_byte(result);
 
-	printk("Result: hostbyte=%s driverbyte=%s,%s\n",
+	printk("Result: hostbyte=%s driverbyte=%s\n",
 	       (hb < NUM_HOSTBYTE_STRS ? hostbyte_table[hb]     : "invalid"),
-	       (db < NUM_DRIVERBYTE_STRS ? driverbyte_table[db] : "invalid"),
-	       (su < NUM_SUGGEST_STRS ? driversuggest_table[su] : "invalid"));
+	       (db < NUM_DRIVERBYTE_STRS ? driverbyte_table[db] : "invalid"));
 }
 
 #else
diff --git a/drivers/scsi/cxgb3i/cxgb3i_ddp.c b/drivers/scsi/cxgb3i/cxgb3i_ddp.c
index a83d36e4926f..4eb6f5593b3e 100644
--- a/drivers/scsi/cxgb3i/cxgb3i_ddp.c
+++ b/drivers/scsi/cxgb3i/cxgb3i_ddp.c
@@ -196,7 +196,7 @@ static inline int ddp_alloc_gl_skb(struct cxgb3i_ddp_info *ddp, int idx,
 }
 
 /**
- * cxgb3i_ddp_find_page_index - return ddp page index for a given page size.
+ * cxgb3i_ddp_find_page_index - return ddp page index for a given page size
  * @pgsz: page size
  * return the ddp page index, if no match is found return DDP_PGIDX_MAX.
  */
@@ -355,8 +355,7 @@ EXPORT_SYMBOL_GPL(cxgb3i_ddp_release_gl);
  * @tdev: t3cdev adapter
  * @tid: connection id
  * @tformat: tag format
- * @tagp: the s/w tag, if ddp setup is successful, it will be updated with
- *	  ddp/hw tag
+ * @tagp: contains s/w tag initially, will be updated with ddp/hw tag
  * @gl: the page momory list
  * @gfp: allocation mode
  *
diff --git a/drivers/scsi/cxgb3i/cxgb3i_ddp.h b/drivers/scsi/cxgb3i/cxgb3i_ddp.h
index 3faae7831c83..75a63a81e873 100644
--- a/drivers/scsi/cxgb3i/cxgb3i_ddp.h
+++ b/drivers/scsi/cxgb3i/cxgb3i_ddp.h
@@ -185,12 +185,11 @@ static inline int cxgb3i_is_ddp_tag(struct cxgb3i_tag_format *tformat, u32 tag)
 }
 
 /**
- * cxgb3i_sw_tag_usable - check if a given s/w tag has enough bits left for
- *			  the reserved/hw bits
+ * cxgb3i_sw_tag_usable - check if s/w tag has enough bits left for hw bits
  * @tformat: tag format information
  * @sw_tag: s/w tag to be checked
  *
- * return true if the tag is a ddp tag, false otherwise.
+ * return true if the tag can be used for hw ddp tag, false otherwise.
  */
 static inline int cxgb3i_sw_tag_usable(struct cxgb3i_tag_format *tformat,
 					u32 sw_tag)
@@ -222,8 +221,7 @@ static inline u32 cxgb3i_set_non_ddp_tag(struct cxgb3i_tag_format *tformat,
 }
 
 /**
- * cxgb3i_ddp_tag_base - shift the s/w tag bits so that reserved bits are not
- *			 used.
+ * cxgb3i_ddp_tag_base - shift s/w tag bits so that reserved bits are not used
  * @tformat: tag format information
  * @sw_tag: s/w tag to be checked
  */
diff --git a/drivers/scsi/cxgb3i/cxgb3i_iscsi.c b/drivers/scsi/cxgb3i/cxgb3i_iscsi.c
index fa2a44f37b36..e185dedc4c1f 100644
--- a/drivers/scsi/cxgb3i/cxgb3i_iscsi.c
+++ b/drivers/scsi/cxgb3i/cxgb3i_iscsi.c
@@ -101,8 +101,7 @@ free_snic:
 }
 
 /**
- * cxgb3i_adapter_remove - release all the resources held and cleanup any
- *	h/w settings
+ * cxgb3i_adapter_remove - release the resources held and cleanup h/w settings
  * @t3dev: t3cdev adapter
  */
 void cxgb3i_adapter_remove(struct t3cdev *t3dev)
@@ -135,8 +134,7 @@ void cxgb3i_adapter_remove(struct t3cdev *t3dev)
 }
 
 /**
- * cxgb3i_hba_find_by_netdev - find the cxgb3i_hba structure with a given
- *	net_device
+ * cxgb3i_hba_find_by_netdev - find the cxgb3i_hba structure via net_device
  * @t3dev: t3cdev adapter
  */
 struct cxgb3i_hba *cxgb3i_hba_find_by_netdev(struct net_device *ndev)
@@ -170,8 +168,7 @@ struct cxgb3i_hba *cxgb3i_hba_host_add(struct cxgb3i_adapter *snic,
 	int err;
 
 	shost = iscsi_host_alloc(&cxgb3i_host_template,
-				 sizeof(struct cxgb3i_hba),
-				 CXGB3I_SCSI_QDEPTH_DFLT);
+				 sizeof(struct cxgb3i_hba), 1);
 	if (!shost) {
 		cxgb3i_log_info("iscsi_host_alloc failed.\n");
 		return NULL;
@@ -335,13 +332,12 @@ static void cxgb3i_ep_disconnect(struct iscsi_endpoint *ep)
  * @cmds_max:		max # of commands
  * @qdepth:		scsi queue depth
  * @initial_cmdsn:	initial iscsi CMDSN for this session
- * @host_no:		pointer to return host no
  *
  * Creates a new iSCSI session
  */
 static struct iscsi_cls_session *
 cxgb3i_session_create(struct iscsi_endpoint *ep, u16 cmds_max, u16 qdepth,
-		      u32 initial_cmdsn, u32 *host_no)
+		      u32 initial_cmdsn)
 {
 	struct cxgb3i_endpoint *cep;
 	struct cxgb3i_hba *hba;
@@ -360,8 +356,6 @@ cxgb3i_session_create(struct iscsi_endpoint *ep, u16 cmds_max, u16 qdepth,
 	cxgb3i_api_debug("ep 0x%p, cep 0x%p, hba 0x%p.\n", ep, cep, hba);
 	BUG_ON(hba != iscsi_host_priv(shost));
 
-	*host_no = shost->host_no;
-
 	cls_session = iscsi_session_setup(&cxgb3i_iscsi_transport, shost,
 					  cmds_max,
 					  sizeof(struct iscsi_tcp_task) +
@@ -394,9 +388,9 @@ static void cxgb3i_session_destroy(struct iscsi_cls_session *cls_session)
 }
 
 /**
- * cxgb3i_conn_max_xmit_dlength -- check the max. xmit pdu segment size,
- * reduce it to be within the hardware limit if needed
+ * cxgb3i_conn_max_xmit_dlength -- calc the max. xmit pdu segment size
  * @conn: iscsi connection
+ * check the max. xmit pdu payload, reduce it if needed
  */
 static inline int cxgb3i_conn_max_xmit_dlength(struct iscsi_conn *conn)
 
@@ -417,8 +411,7 @@ static inline int cxgb3i_conn_max_xmit_dlength(struct iscsi_conn *conn)
 }
 
 /**
- * cxgb3i_conn_max_recv_dlength -- check the max. recv pdu segment size against
- * the hardware limit
+ * cxgb3i_conn_max_recv_dlength -- check the max. recv pdu segment size
  * @conn: iscsi connection
  * return 0 if the value is valid, < 0 otherwise.
  */
@@ -759,9 +752,9 @@ static void cxgb3i_parse_itt(struct iscsi_conn *conn, itt_t itt,
 
 /**
  * cxgb3i_reserve_itt - generate tag for a give task
- * Try to set up ddp for a scsi read task.
  * @task: iscsi task
  * @hdr_itt: tag, filled in by this function
+ * Set up ddp for scsi read tasks if possible.
  */
 int cxgb3i_reserve_itt(struct iscsi_task *task, itt_t *hdr_itt)
 {
@@ -809,9 +802,9 @@ int cxgb3i_reserve_itt(struct iscsi_task *task, itt_t *hdr_itt)
 
 /**
  * cxgb3i_release_itt - release the tag for a given task
- * if the tag is a ddp tag, release the ddp setup
  * @task:	iscsi task
  * @hdr_itt:	tag
+ * If the tag is a ddp tag, release the ddp setup
  */
 void cxgb3i_release_itt(struct iscsi_task *task, itt_t hdr_itt)
 {
@@ -843,7 +836,7 @@ static struct scsi_host_template cxgb3i_host_template = {
 	.can_queue		= CXGB3I_SCSI_QDEPTH_DFLT - 1,
 	.sg_tablesize		= SG_ALL,
 	.max_sectors		= 0xFFFF,
-	.cmd_per_lun		= ISCSI_DEF_CMD_PER_LUN,
+	.cmd_per_lun		= CXGB3I_SCSI_QDEPTH_DFLT,
 	.eh_abort_handler	= iscsi_eh_abort,
 	.eh_device_reset_handler = iscsi_eh_device_reset,
 	.eh_target_reset_handler = iscsi_eh_target_reset,
diff --git a/drivers/scsi/cxgb3i/cxgb3i_offload.c b/drivers/scsi/cxgb3i/cxgb3i_offload.c
index de3b3b614cca..c2e434e54e28 100644
--- a/drivers/scsi/cxgb3i/cxgb3i_offload.c
+++ b/drivers/scsi/cxgb3i/cxgb3i_offload.c
@@ -1417,8 +1417,7 @@ static void c3cn_active_close(struct s3_conn *c3cn)
 }
 
 /**
- * cxgb3i_c3cn_release - close and release an iscsi tcp connection and any
- * 	resource held
+ * cxgb3i_c3cn_release - close and release an iscsi tcp connection
  * @c3cn: the iscsi tcp connection
  */
 void cxgb3i_c3cn_release(struct s3_conn *c3cn)
diff --git a/drivers/scsi/cxgb3i/cxgb3i_offload.h b/drivers/scsi/cxgb3i/cxgb3i_offload.h
index 6344b9eb2589..275f23f16eb7 100644
--- a/drivers/scsi/cxgb3i/cxgb3i_offload.h
+++ b/drivers/scsi/cxgb3i/cxgb3i_offload.h
@@ -139,6 +139,7 @@ enum c3cn_flags {
 
 /**
  * cxgb3i_sdev_data - Per adapter data.
+ *
  * Linked off of each Ethernet device port on the adapter.
  * Also available via the t3cdev structure since we have pointers to our port
  * net_device's there ...
diff --git a/drivers/scsi/cxgb3i/cxgb3i_pdu.c b/drivers/scsi/cxgb3i/cxgb3i_pdu.c
index 17115c230d65..7eebc9a7cb35 100644
--- a/drivers/scsi/cxgb3i/cxgb3i_pdu.c
+++ b/drivers/scsi/cxgb3i/cxgb3i_pdu.c
@@ -479,7 +479,7 @@ void cxgb3i_conn_tx_open(struct s3_conn *c3cn)
 	cxgb3i_tx_debug("cn 0x%p.\n", c3cn);
 	if (conn) {
 		cxgb3i_tx_debug("cn 0x%p, cid %d.\n", c3cn, conn->id);
-		scsi_queue_work(conn->session->host, &conn->xmitwork);
+		iscsi_conn_queue_work(conn);
 	}
 }
 
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index e356b43753ff..dba154c8ff64 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -247,8 +247,8 @@ static unsigned submit_stpg(struct scsi_device *sdev, struct alua_dh_data *h)
 	/* Prepare the data buffer */
 	memset(h->buff, 0, stpg_len);
 	h->buff[4] = TPGS_STATE_OPTIMIZED & 0x0f;
-	h->buff[6] = (h->group_id >> 8) & 0x0f;
-	h->buff[7] = h->group_id & 0x0f;
+	h->buff[6] = (h->group_id >> 8) & 0xff;
+	h->buff[7] = h->group_id & 0xff;
 
 	rq = get_alua_req(sdev, h->buff, stpg_len, WRITE);
 	if (!rq)
@@ -461,6 +461,15 @@ static int alua_check_sense(struct scsi_device *sdev,
 			 */
 			return ADD_TO_MLQUEUE;
 		}
+		if (sense_hdr->asc == 0x3f && sense_hdr->ascq == 0x0e) {
+			/*
+			 * REPORTED_LUNS_DATA_HAS_CHANGED is reported
+			 * when switching controllers on targets like
+			 * Intel Multi-Flex. We can just retry.
+			 */
+			return ADD_TO_MLQUEUE;
+		}
+
 		break;
 	}
 
@@ -691,6 +700,7 @@ static const struct scsi_dh_devlist alua_dev_list[] = {
 	{"IBM", "2107900" },
 	{"IBM", "2145" },
 	{"Pillar", "Axiom" },
+	{"Intel", "Multi-Flex"},
 	{NULL, NULL}
 };
 
diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 53664765570a..43b8c51e98d0 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -449,28 +449,40 @@ static int mode_select_handle_sense(struct scsi_device *sdev,
 				    unsigned char *sensebuf)
 {
 	struct scsi_sense_hdr sense_hdr;
-	int sense, err = SCSI_DH_IO, ret;
+	int err = SCSI_DH_IO, ret;
 
 	ret = scsi_normalize_sense(sensebuf, SCSI_SENSE_BUFFERSIZE, &sense_hdr);
 	if (!ret)
 		goto done;
 
 	err = SCSI_DH_OK;
-	sense = (sense_hdr.sense_key << 16) | (sense_hdr.asc << 8) |
-			sense_hdr.ascq;
-	/* If it is retryable failure, submit the c9 inquiry again */
-	if (sense == 0x59136 || sense == 0x68b02 || sense == 0xb8b02 ||
-			    sense == 0x62900) {
-		/* 0x59136    - Command lock contention
-		 * 0x[6b]8b02 - Quiesense in progress or achieved
-		 * 0x62900    - Power On, Reset, or Bus Device Reset
-		 */
+
+	switch (sense_hdr.sense_key) {
+	case NO_SENSE:
+	case ABORTED_COMMAND:
+	case UNIT_ATTENTION:
 		err = SCSI_DH_RETRY;
+		break;
+	case NOT_READY:
+		if (sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x01)
+			/* LUN Not Ready and is in the Process of Becoming
+			 * Ready
+			 */
+			err = SCSI_DH_RETRY;
+		break;
+	case ILLEGAL_REQUEST:
+		if (sense_hdr.asc == 0x91 && sense_hdr.ascq == 0x36)
+			/*
+			 * Command Lock contention
+			 */
+			err = SCSI_DH_RETRY;
+		break;
+	default:
+		sdev_printk(KERN_INFO, sdev,
+			    "MODE_SELECT failed with sense %02x/%02x/%02x.\n",
+			    sense_hdr.sense_key, sense_hdr.asc, sense_hdr.ascq);
 	}
 
-	if (sense)
-		sdev_printk(KERN_INFO, sdev,
-			"MODE_SELECT failed with sense 0x%x.\n", sense);
 done:
 	return err;
 }
@@ -562,6 +574,12 @@ static int rdac_check_sense(struct scsi_device *sdev,
 			 * Just retry and wait.
 			 */
 			return ADD_TO_MLQUEUE;
+		if (sense_hdr->asc == 0xA1  && sense_hdr->ascq == 0x02)
+			/* LUN Not Ready - Quiescense in progress
+			 * or has been achieved
+			 * Just retry.
+			 */
+			return ADD_TO_MLQUEUE;
 		break;
 	case ILLEGAL_REQUEST:
 		if (sense_hdr->asc == 0x94 && sense_hdr->ascq == 0x01) {
@@ -579,6 +597,11 @@ static int rdac_check_sense(struct scsi_device *sdev,
 			 * Power On, Reset, or Bus Device Reset, just retry.
 			 */
 			return ADD_TO_MLQUEUE;
+		if (sense_hdr->asc == 0x8b && sense_hdr->ascq == 0x02)
+			/*
+			 * Quiescence in progress , just retry.
+			 */
+			return ADD_TO_MLQUEUE;
 		break;
 	}
 	/* success just means we do not care what scsi-ml does */
diff --git a/drivers/scsi/fcoe/fcoe_sw.c b/drivers/scsi/fcoe/fcoe_sw.c
index da210eba1941..2bbbe3c0cc7b 100644
--- a/drivers/scsi/fcoe/fcoe_sw.c
+++ b/drivers/scsi/fcoe/fcoe_sw.c
@@ -133,6 +133,13 @@ static int fcoe_sw_lport_config(struct fc_lport *lp)
 	/* lport fc_lport related configuration */
 	fc_lport_config(lp);
 
+	/* offload related configuration */
+	lp->crc_offload = 0;
+	lp->seq_offload = 0;
+	lp->lro_enabled = 0;
+	lp->lro_xid = 0;
+	lp->lso_max = 0;
+
 	return 0;
 }
 
@@ -186,7 +193,27 @@ static int fcoe_sw_netdev_config(struct fc_lport *lp, struct net_device *netdev)
 	if (fc->real_dev->features & NETIF_F_SG)
 		lp->sg_supp = 1;
 
-
+#ifdef NETIF_F_FCOE_CRC
+	if (netdev->features & NETIF_F_FCOE_CRC) {
+		lp->crc_offload = 1;
+		printk(KERN_DEBUG "fcoe:%s supports FCCRC offload\n",
+		       netdev->name);
+	}
+#endif
+#ifdef NETIF_F_FSO
+	if (netdev->features & NETIF_F_FSO) {
+		lp->seq_offload = 1;
+		lp->lso_max = netdev->gso_max_size;
+		printk(KERN_DEBUG "fcoe:%s supports LSO for max len 0x%x\n",
+		       netdev->name, lp->lso_max);
+	}
+#endif
+	if (netdev->fcoe_ddp_xid) {
+		lp->lro_enabled = 1;
+		lp->lro_xid = netdev->fcoe_ddp_xid;
+		printk(KERN_DEBUG "fcoe:%s supports LRO for max xid 0x%x\n",
+		       netdev->name, lp->lro_xid);
+	}
 	skb_queue_head_init(&fc->fcoe_pending_queue);
 	fc->fcoe_pending_queue_active = 0;
 
@@ -346,8 +373,46 @@ static int fcoe_sw_destroy(struct net_device *netdev)
 	return 0;
 }
 
+/*
+ * fcoe_sw_ddp_setup - calls LLD's ddp_setup through net_device
+ * @lp:	the corresponding fc_lport
+ * @xid: the exchange id for this ddp transfer
+ * @sgl: the scatterlist describing this transfer
+ * @sgc: number of sg items
+ *
+ * Returns : 0 no ddp
+ */
+static int fcoe_sw_ddp_setup(struct fc_lport *lp, u16 xid,
+			     struct scatterlist *sgl, unsigned int sgc)
+{
+	struct net_device *n = fcoe_netdev(lp);
+
+	if (n->netdev_ops && n->netdev_ops->ndo_fcoe_ddp_setup)
+		return n->netdev_ops->ndo_fcoe_ddp_setup(n, xid, sgl, sgc);
+
+	return 0;
+}
+
+/*
+ * fcoe_sw_ddp_done - calls LLD's ddp_done through net_device
+ * @lp:	the corresponding fc_lport
+ * @xid: the exchange id for this ddp transfer
+ *
+ * Returns : the length of data that have been completed by ddp
+ */
+static int fcoe_sw_ddp_done(struct fc_lport *lp, u16 xid)
+{
+	struct net_device *n = fcoe_netdev(lp);
+
+	if (n->netdev_ops && n->netdev_ops->ndo_fcoe_ddp_done)
+		return n->netdev_ops->ndo_fcoe_ddp_done(n, xid);
+	return 0;
+}
+
 static struct libfc_function_template fcoe_sw_libfc_fcn_templ = {
 	.frame_send = fcoe_xmit,
+	.ddp_setup = fcoe_sw_ddp_setup,
+	.ddp_done = fcoe_sw_ddp_done,
 };
 
 /**
diff --git a/drivers/scsi/fcoe/libfcoe.c b/drivers/scsi/fcoe/libfcoe.c
index 5548bf3bb58b..0d6f5beb7f9e 100644
--- a/drivers/scsi/fcoe/libfcoe.c
+++ b/drivers/scsi/fcoe/libfcoe.c
@@ -423,7 +423,7 @@ int fcoe_xmit(struct fc_lport *lp, struct fc_frame *fp)
 
 	/* crc offload */
 	if (likely(lp->crc_offload)) {
-		skb->ip_summed = CHECKSUM_COMPLETE;
+		skb->ip_summed = CHECKSUM_PARTIAL;
 		skb->csum_start = skb_headroom(skb);
 		skb->csum_offset = skb->len;
 		crc = 0;
@@ -460,7 +460,7 @@ int fcoe_xmit(struct fc_lport *lp, struct fc_frame *fp)
 	skb_reset_mac_header(skb);
 	skb_reset_network_header(skb);
 	skb->mac_len = elen;
-	skb->protocol = htons(ETH_P_802_3);
+	skb->protocol = htons(ETH_P_FCOE);
 	skb->dev = fc->real_dev;
 
 	/* fill up mac and fcoe headers */
@@ -483,6 +483,16 @@ int fcoe_xmit(struct fc_lport *lp, struct fc_frame *fp)
 		FC_FCOE_ENCAPS_VER(hp, FC_FCOE_VER);
 	hp->fcoe_sof = sof;
 
+#ifdef NETIF_F_FSO
+	/* fcoe lso, mss is in max_payload which is non-zero for FCP data */
+	if (lp->seq_offload && fr_max_payload(fp)) {
+		skb_shinfo(skb)->gso_type = SKB_GSO_FCOE;
+		skb_shinfo(skb)->gso_size = fr_max_payload(fp);
+	} else {
+		skb_shinfo(skb)->gso_type = 0;
+		skb_shinfo(skb)->gso_size = 0;
+	}
+#endif
 	/* update tx stats: regardless if LLD fails */
 	stats = lp->dev_stats[smp_processor_id()];
 	if (stats) {
@@ -623,7 +633,7 @@ int fcoe_percpu_receive_thread(void *arg)
 		 * it's solicited data, in which case, the FCP layer would
 		 * check it during the copy.
 		 */
-		if (lp->crc_offload)
+		if (lp->crc_offload && skb->ip_summed == CHECKSUM_UNNECESSARY)
 			fr_flags(fp) &= ~FCPHF_CRC_UNCHECKED;
 		else
 			fr_flags(fp) |= FCPHF_CRC_UNCHECKED;
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index aa670a1d1513..89d41a424b33 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -176,7 +176,6 @@ void scsi_remove_host(struct Scsi_Host *shost)
 	transport_unregister_device(&shost->shost_gendev);
 	device_unregister(&shost->shost_dev);
 	device_del(&shost->shost_gendev);
-	scsi_proc_hostdir_rm(shost->hostt);
 }
 EXPORT_SYMBOL(scsi_remove_host);
 
@@ -270,6 +269,8 @@ static void scsi_host_dev_release(struct device *dev)
 	struct Scsi_Host *shost = dev_to_shost(dev);
 	struct device *parent = dev->parent;
 
+	scsi_proc_hostdir_rm(shost->hostt);
+
 	if (shost->ehandler)
 		kthread_stop(shost->ehandler);
 	if (shost->work_q)
diff --git a/drivers/scsi/hptiop.c b/drivers/scsi/hptiop.c
index 34be88d7afa5..af1f0af0c5ac 100644
--- a/drivers/scsi/hptiop.c
+++ b/drivers/scsi/hptiop.c
@@ -580,8 +580,7 @@ static void hptiop_finish_scsi_req(struct hptiop_hba *hba, u32 tag,
 		break;
 
 	default:
-		scp->result = ((DRIVER_INVALID|SUGGEST_ABORT)<<24) |
-					(DID_ABORT<<16);
+		scp->result = DRIVER_INVALID << 24 | DID_ABORT << 16;
 		break;
 	}
 
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index ed1e728763a2..93d1fbe4ee5d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2767,6 +2767,40 @@ static void ibmvfc_retry_tgt_init(struct ibmvfc_target *tgt,
 		ibmvfc_init_tgt(tgt, job_step);
 }
 
+/* Defined in FC-LS */
+static const struct {
+	int code;
+	int retry;
+	int logged_in;
+} prli_rsp [] = {
+	{ 0, 1, 0 },
+	{ 1, 0, 1 },
+	{ 2, 1, 0 },
+	{ 3, 1, 0 },
+	{ 4, 0, 0 },
+	{ 5, 0, 0 },
+	{ 6, 0, 1 },
+	{ 7, 0, 0 },
+	{ 8, 1, 0 },
+};
+
+/**
+ * ibmvfc_get_prli_rsp - Find PRLI response index
+ * @flags:	PRLI response flags
+ *
+ **/
+static int ibmvfc_get_prli_rsp(u16 flags)
+{
+	int i;
+	int code = (flags & 0x0f00) >> 8;
+
+	for (i = 0; i < ARRAY_SIZE(prli_rsp); i++)
+		if (prli_rsp[i].code == code)
+			return i;
+
+	return 0;
+}
+
 /**
  * ibmvfc_tgt_prli_done - Completion handler for Process Login
  * @evt:	ibmvfc event struct
@@ -2777,15 +2811,36 @@ static void ibmvfc_tgt_prli_done(struct ibmvfc_event *evt)
 	struct ibmvfc_target *tgt = evt->tgt;
 	struct ibmvfc_host *vhost = evt->vhost;
 	struct ibmvfc_process_login *rsp = &evt->xfer_iu->prli;
+	struct ibmvfc_prli_svc_parms *parms = &rsp->parms;
 	u32 status = rsp->common.status;
+	int index;
 
 	vhost->discovery_threads--;
 	ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_NONE);
 	switch (status) {
 	case IBMVFC_MAD_SUCCESS:
-		tgt_dbg(tgt, "Process Login succeeded\n");
-		tgt->need_login = 0;
-		ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_ADD_RPORT);
+		tgt_dbg(tgt, "Process Login succeeded: %X %02X %04X\n",
+			parms->type, parms->flags, parms->service_parms);
+
+		if (parms->type == IBMVFC_SCSI_FCP_TYPE) {
+			index = ibmvfc_get_prli_rsp(parms->flags);
+			if (prli_rsp[index].logged_in) {
+				if (parms->flags & IBMVFC_PRLI_EST_IMG_PAIR) {
+					tgt->need_login = 0;
+					tgt->ids.roles = 0;
+					if (parms->service_parms & IBMVFC_PRLI_TARGET_FUNC)
+						tgt->ids.roles |= FC_PORT_ROLE_FCP_TARGET;
+					if (parms->service_parms & IBMVFC_PRLI_INITIATOR_FUNC)
+						tgt->ids.roles |= FC_PORT_ROLE_FCP_INITIATOR;
+					ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_ADD_RPORT);
+				} else
+					ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_DEL_RPORT);
+			} else if (prli_rsp[index].retry)
+				ibmvfc_retry_tgt_init(tgt, ibmvfc_tgt_send_prli);
+			else
+				ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_DEL_RPORT);
+		} else
+			ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_DEL_RPORT);
 		break;
 	case IBMVFC_MAD_DRIVER_FAILED:
 		break;
@@ -2874,7 +2929,6 @@ static void ibmvfc_tgt_plogi_done(struct ibmvfc_event *evt)
 		tgt->ids.node_name = wwn_to_u64(rsp->service_parms.node_name);
 		tgt->ids.port_name = wwn_to_u64(rsp->service_parms.port_name);
 		tgt->ids.port_id = tgt->scsi_id;
-		tgt->ids.roles = FC_PORT_ROLE_FCP_TARGET;
 		memcpy(&tgt->service_parms, &rsp->service_parms,
 		       sizeof(tgt->service_parms));
 		memcpy(&tgt->service_parms_change, &rsp->service_parms_change,
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 07829009a8be..def473f0a98f 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -152,13 +152,13 @@ module_param_named(log_level, ipr_log_level, uint, 0);
 MODULE_PARM_DESC(log_level, "Set to 0 - 4 for increasing verbosity of device driver");
 module_param_named(testmode, ipr_testmode, int, 0);
 MODULE_PARM_DESC(testmode, "DANGEROUS!!! Allows unsupported configurations");
-module_param_named(fastfail, ipr_fastfail, int, 0);
+module_param_named(fastfail, ipr_fastfail, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(fastfail, "Reduce timeouts and retries");
 module_param_named(transop_timeout, ipr_transop_timeout, int, 0);
 MODULE_PARM_DESC(transop_timeout, "Time in seconds to wait for adapter to come operational (default: 300)");
 module_param_named(enable_cache, ipr_enable_cache, int, 0);
 MODULE_PARM_DESC(enable_cache, "Enable adapter's non-volatile write cache (default: 1)");
-module_param_named(debug, ipr_debug, int, 0);
+module_param_named(debug, ipr_debug, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(debug, "Enable device driver debugging logging. Set to 1 to enable. (default: 0)");
 module_param_named(dual_ioa_raid, ipr_dual_ioa_raid, int, 0);
 MODULE_PARM_DESC(dual_ioa_raid, "Enable dual adapter RAID support. Set to 1 to enable. (default: 1)");
@@ -354,6 +354,8 @@ struct ipr_error_table_t ipr_error_table[] = {
 	"9076: Configuration error, missing remote IOA"},
 	{0x06679100, 0, IPR_DEFAULT_LOG_LEVEL,
 	"4050: Enclosure does not support a required multipath function"},
+	{0x06690000, 0, IPR_DEFAULT_LOG_LEVEL,
+	"4070: Logically bad block written on device"},
 	{0x06690200, 0, IPR_DEFAULT_LOG_LEVEL,
 	"9041: Array protection temporarily suspended"},
 	{0x06698200, 0, IPR_DEFAULT_LOG_LEVEL,
@@ -7147,6 +7149,7 @@ static void ipr_free_all_resources(struct ipr_ioa_cfg *ioa_cfg)
 
 	ENTER;
 	free_irq(pdev->irq, ioa_cfg);
+	pci_disable_msi(pdev);
 	iounmap(ioa_cfg->hdw_dma_regs);
 	pci_release_regions(pdev);
 	ipr_free_mem(ioa_cfg);
@@ -7432,6 +7435,11 @@ static int __devinit ipr_probe_ioa(struct pci_dev *pdev,
 		goto out;
 	}
 
+	if (!(rc = pci_enable_msi(pdev)))
+		dev_info(&pdev->dev, "MSI enabled\n");
+	else if (ipr_debug)
+		dev_info(&pdev->dev, "Cannot enable MSI\n");
+
 	dev_info(&pdev->dev, "Found IOA with IRQ: %d\n", pdev->irq);
 
 	host = scsi_host_alloc(&driver_template, sizeof(*ioa_cfg));
@@ -7574,6 +7582,7 @@ out_release_regions:
 out_scsi_host_put:
 	scsi_host_put(host);
 out_disable:
+	pci_disable_msi(pdev);
 	pci_disable_device(pdev);
 	goto out;
 }
diff --git a/drivers/scsi/ipr.h b/drivers/scsi/ipr.h
index 8f872f816fe4..79a3ae4fb2c7 100644
--- a/drivers/scsi/ipr.h
+++ b/drivers/scsi/ipr.h
@@ -37,8 +37,8 @@
 /*
  * Literals
  */
-#define IPR_DRIVER_VERSION "2.4.1"
-#define IPR_DRIVER_DATE "(April 24, 2007)"
+#define IPR_DRIVER_VERSION "2.4.2"
+#define IPR_DRIVER_DATE "(January 21, 2009)"
 
 /*
  * IPR_MAX_CMD_PER_LUN: This defines the maximum number of outstanding
diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index ef683f0d2b5a..457d76a4cfe5 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -1004,8 +1004,7 @@ static int __ips_eh_reset(struct scsi_cmnd *SC)
 	DEBUG_VAR(1, "(%s%d) Failing active commands", ips_name, ha->host_num);
 
 	while ((scb = ips_removeq_scb_head(&ha->scb_activelist))) {
-		scb->scsi_cmd->result =
-		    (DID_RESET << 16) | (SUGGEST_RETRY << 24);
+		scb->scsi_cmd->result = DID_RESET << 16;
 		scb->scsi_cmd->scsi_done(scb->scsi_cmd);
 		ips_freescb(ha, scb);
 	}
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 23808dfe22ba..b3e5e08e44ab 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -48,13 +48,6 @@ MODULE_AUTHOR("Mike Christie <michaelc@cs.wisc.edu>, "
 	      "Alex Aizman <itn780@yahoo.com>");
 MODULE_DESCRIPTION("iSCSI/TCP data-path");
 MODULE_LICENSE("GPL");
-#undef DEBUG_TCP
-
-#ifdef DEBUG_TCP
-#define debug_tcp(fmt...) printk(KERN_INFO "tcp: " fmt)
-#else
-#define debug_tcp(fmt...)
-#endif
 
 static struct scsi_transport_template *iscsi_sw_tcp_scsi_transport;
 static struct scsi_host_template iscsi_sw_tcp_sht;
@@ -63,6 +56,21 @@ static struct iscsi_transport iscsi_sw_tcp_transport;
 static unsigned int iscsi_max_lun = 512;
 module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO);
 
+static int iscsi_sw_tcp_dbg;
+module_param_named(debug_iscsi_tcp, iscsi_sw_tcp_dbg, int,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(debug_iscsi_tcp, "Turn on debugging for iscsi_tcp module "
+		 "Set to 1 to turn on, and zero to turn off. Default is off.");
+
+#define ISCSI_SW_TCP_DBG(_conn, dbg_fmt, arg...)		\
+	do {							\
+		if (iscsi_sw_tcp_dbg)				\
+			iscsi_conn_printk(KERN_INFO, _conn,	\
+					     "%s " dbg_fmt,	\
+					     __func__, ##arg);	\
+	} while (0);
+
+
 /**
  * iscsi_sw_tcp_recv - TCP receive in sendfile fashion
  * @rd_desc: read descriptor
@@ -77,7 +85,7 @@ static int iscsi_sw_tcp_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
 	unsigned int consumed, total_consumed = 0;
 	int status;
 
-	debug_tcp("in %d bytes\n", skb->len - offset);
+	ISCSI_SW_TCP_DBG(conn, "in %d bytes\n", skb->len - offset);
 
 	do {
 		status = 0;
@@ -86,7 +94,8 @@ static int iscsi_sw_tcp_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
 		total_consumed += consumed;
 	} while (consumed != 0 && status != ISCSI_TCP_SKB_DONE);
 
-	debug_tcp("read %d bytes status %d\n", skb->len - offset, status);
+	ISCSI_SW_TCP_DBG(conn, "read %d bytes status %d\n",
+			 skb->len - offset, status);
 	return total_consumed;
 }
 
@@ -131,7 +140,8 @@ static void iscsi_sw_tcp_state_change(struct sock *sk)
 	if ((sk->sk_state == TCP_CLOSE_WAIT ||
 	     sk->sk_state == TCP_CLOSE) &&
 	    !atomic_read(&sk->sk_rmem_alloc)) {
-		debug_tcp("iscsi_tcp_state_change: TCP_CLOSE|TCP_CLOSE_WAIT\n");
+		ISCSI_SW_TCP_DBG(conn, "iscsi_tcp_state_change: "
+				 "TCP_CLOSE|TCP_CLOSE_WAIT\n");
 		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
 	}
 
@@ -155,8 +165,8 @@ static void iscsi_sw_tcp_write_space(struct sock *sk)
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
 
 	tcp_sw_conn->old_write_space(sk);
-	debug_tcp("iscsi_write_space: cid %d\n", conn->id);
-	scsi_queue_work(conn->session->host, &conn->xmitwork);
+	ISCSI_SW_TCP_DBG(conn, "iscsi_write_space\n");
+	iscsi_conn_queue_work(conn);
 }
 
 static void iscsi_sw_tcp_conn_set_callbacks(struct iscsi_conn *conn)
@@ -283,7 +293,7 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 		}
 	}
 
-	debug_tcp("xmit %d bytes\n", consumed);
+	ISCSI_SW_TCP_DBG(conn, "xmit %d bytes\n", consumed);
 
 	conn->txdata_octets += consumed;
 	return consumed;
@@ -291,7 +301,7 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 error:
 	/* Transmit error. We could initiate error recovery
 	 * here. */
-	debug_tcp("Error sending PDU, errno=%d\n", rc);
+	ISCSI_SW_TCP_DBG(conn, "Error sending PDU, errno=%d\n", rc);
 	iscsi_conn_failure(conn, rc);
 	return -EIO;
 }
@@ -334,9 +344,10 @@ static int iscsi_sw_tcp_send_hdr_done(struct iscsi_tcp_conn *tcp_conn,
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
 
 	tcp_sw_conn->out.segment = tcp_sw_conn->out.data_segment;
-	debug_tcp("Header done. Next segment size %u total_size %u\n",
-		  tcp_sw_conn->out.segment.size,
-		  tcp_sw_conn->out.segment.total_size);
+	ISCSI_SW_TCP_DBG(tcp_conn->iscsi_conn,
+			 "Header done. Next segment size %u total_size %u\n",
+			 tcp_sw_conn->out.segment.size,
+			 tcp_sw_conn->out.segment.total_size);
 	return 0;
 }
 
@@ -346,8 +357,8 @@ static void iscsi_sw_tcp_send_hdr_prep(struct iscsi_conn *conn, void *hdr,
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
 
-	debug_tcp("%s(%p%s)\n", __func__, tcp_conn,
-			conn->hdrdgst_en? ", digest enabled" : "");
+	ISCSI_SW_TCP_DBG(conn, "%s\n", conn->hdrdgst_en ?
+			 "digest enabled" : "digest disabled");
 
 	/* Clear the data segment - needs to be filled in by the
 	 * caller using iscsi_tcp_send_data_prep() */
@@ -389,9 +400,9 @@ iscsi_sw_tcp_send_data_prep(struct iscsi_conn *conn, struct scatterlist *sg,
 	struct hash_desc *tx_hash = NULL;
 	unsigned int hdr_spec_len;
 
-	debug_tcp("%s(%p, offset=%d, datalen=%d%s)\n", __func__,
-			tcp_conn, offset, len,
-			conn->datadgst_en? ", digest enabled" : "");
+	ISCSI_SW_TCP_DBG(conn, "offset=%d, datalen=%d %s\n", offset, len,
+			 conn->datadgst_en ?
+			 "digest enabled" : "digest disabled");
 
 	/* Make sure the datalen matches what the caller
 	   said he would send. */
@@ -415,8 +426,8 @@ iscsi_sw_tcp_send_linear_data_prep(struct iscsi_conn *conn, void *data,
 	struct hash_desc *tx_hash = NULL;
 	unsigned int hdr_spec_len;
 
-	debug_tcp("%s(%p, datalen=%d%s)\n", __func__, tcp_conn, len,
-		  conn->datadgst_en? ", digest enabled" : "");
+	ISCSI_SW_TCP_DBG(conn, "datalen=%zd %s\n", len, conn->datadgst_en ?
+			 "digest enabled" : "digest disabled");
 
 	/* Make sure the datalen matches what the caller
 	   said he would send. */
@@ -754,8 +765,7 @@ iscsi_sw_tcp_conn_get_stats(struct iscsi_cls_conn *cls_conn,
 
 static struct iscsi_cls_session *
 iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
-			    uint16_t qdepth, uint32_t initial_cmdsn,
-			    uint32_t *hostno)
+			    uint16_t qdepth, uint32_t initial_cmdsn)
 {
 	struct iscsi_cls_session *cls_session;
 	struct iscsi_session *session;
@@ -766,10 +776,11 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
 		return NULL;
 	}
 
-	shost = iscsi_host_alloc(&iscsi_sw_tcp_sht, 0, qdepth);
+	shost = iscsi_host_alloc(&iscsi_sw_tcp_sht, 0, 1);
 	if (!shost)
 		return NULL;
 	shost->transportt = iscsi_sw_tcp_scsi_transport;
+	shost->cmd_per_lun = qdepth;
 	shost->max_lun = iscsi_max_lun;
 	shost->max_id = 0;
 	shost->max_channel = 0;
@@ -777,7 +788,6 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
 
 	if (iscsi_host_add(shost, NULL))
 		goto free_host;
-	*hostno = shost->host_no;
 
 	cls_session = iscsi_session_setup(&iscsi_sw_tcp_transport, shost,
 					  cmds_max,
@@ -813,6 +823,12 @@ static void iscsi_sw_tcp_session_destroy(struct iscsi_cls_session *cls_session)
 	iscsi_host_free(shost);
 }
 
+static int iscsi_sw_tcp_slave_alloc(struct scsi_device *sdev)
+{
+	set_bit(QUEUE_FLAG_BIDI, &sdev->request_queue->queue_flags);
+	return 0;
+}
+
 static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
 {
 	blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY);
@@ -833,6 +849,7 @@ static struct scsi_host_template iscsi_sw_tcp_sht = {
 	.eh_device_reset_handler= iscsi_eh_device_reset,
 	.eh_target_reset_handler= iscsi_eh_target_reset,
 	.use_clustering         = DISABLE_CLUSTERING,
+	.slave_alloc            = iscsi_sw_tcp_slave_alloc,
 	.slave_configure        = iscsi_sw_tcp_slave_configure,
 	.proc_name		= "iscsi_tcp",
 	.this_id		= -1,
diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
index 505825b6124d..992af05aacf1 100644
--- a/drivers/scsi/libfc/fc_exch.c
+++ b/drivers/scsi/libfc/fc_exch.c
@@ -281,7 +281,7 @@ static void fc_exch_release(struct fc_exch *ep)
 			ep->destructor(&ep->seq, ep->arg);
 		if (ep->lp->tt.exch_put)
 			ep->lp->tt.exch_put(ep->lp, mp, ep->xid);
-		WARN_ON(!ep->esb_stat & ESB_ST_COMPLETE);
+		WARN_ON(!(ep->esb_stat & ESB_ST_COMPLETE));
 		mempool_free(ep, mp->ep_pool);
 	}
 }
@@ -489,7 +489,7 @@ static u16 fc_em_alloc_xid(struct fc_exch_mgr *mp, const struct fc_frame *fp)
 	struct fc_exch *ep = NULL;
 
 	if (mp->max_read) {
-		if (fc_frame_is_read(fp)) {
+		if (fc_fcp_is_read(fr_fsp(fp))) {
 			min = mp->min_xid;
 			max = mp->max_read;
 			plast = &mp->last_read;
@@ -1841,6 +1841,8 @@ struct fc_seq *fc_exch_seq_send(struct fc_lport *lp,
 	fc_exch_setup_hdr(ep, fp, ep->f_ctl);
 	sp->cnt++;
 
+	fc_fcp_ddp_setup(fr_fsp(fp), ep->xid);
+
 	if (unlikely(lp->tt.frame_send(lp, fp)))
 		goto err;
 
diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index 2a631d7dbcec..a5725f3b7ce1 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -259,12 +259,62 @@ static void fc_fcp_retry_cmd(struct fc_fcp_pkt *fsp)
 	}
 
 	fsp->state &= ~FC_SRB_ABORT_PENDING;
-	fsp->io_status = SUGGEST_RETRY << 24;
+	fsp->io_status = 0;
 	fsp->status_code = FC_ERROR;
 	fc_fcp_complete_locked(fsp);
 }
 
 /*
+ * fc_fcp_ddp_setup - calls to LLD's ddp_setup to set up DDP
+ * transfer for a read I/O indicated by the fc_fcp_pkt.
+ * @fsp: ptr to the fc_fcp_pkt
+ *
+ * This is called in exch_seq_send() when we have a newly allocated
+ * exchange with a valid exchange id to setup ddp.
+ *
+ * returns: none
+ */
+void fc_fcp_ddp_setup(struct fc_fcp_pkt *fsp, u16 xid)
+{
+	struct fc_lport *lp;
+
+	if (!fsp)
+		return;
+
+	lp = fsp->lp;
+	if ((fsp->req_flags & FC_SRB_READ) &&
+	    (lp->lro_enabled) && (lp->tt.ddp_setup)) {
+		if (lp->tt.ddp_setup(lp, xid, scsi_sglist(fsp->cmd),
+				     scsi_sg_count(fsp->cmd)))
+			fsp->xfer_ddp = xid;
+	}
+}
+EXPORT_SYMBOL(fc_fcp_ddp_setup);
+
+/*
+ * fc_fcp_ddp_done - calls to LLD's ddp_done to release any
+ * DDP related resources for this I/O if it is initialized
+ * as a ddp transfer
+ * @fsp: ptr to the fc_fcp_pkt
+ *
+ * returns: none
+ */
+static void fc_fcp_ddp_done(struct fc_fcp_pkt *fsp)
+{
+	struct fc_lport *lp;
+
+	if (!fsp)
+		return;
+
+	lp = fsp->lp;
+	if (fsp->xfer_ddp && lp->tt.ddp_done) {
+		fsp->xfer_len = lp->tt.ddp_done(lp, fsp->xfer_ddp);
+		fsp->xfer_ddp = 0;
+	}
+}
+
+
+/*
  * Receive SCSI data from target.
  * Called after receiving solicited data.
  */
@@ -289,6 +339,9 @@ static void fc_fcp_recv_data(struct fc_fcp_pkt *fsp, struct fc_frame *fp)
 	len = fr_len(fp) - sizeof(*fh);
 	buf = fc_frame_payload_get(fp, 0);
 
+	/* if this I/O is ddped, update xfer len */
+	fc_fcp_ddp_done(fsp);
+
 	if (offset + len > fsp->data_len) {
 		/* this should never happen */
 		if ((fr_flags(fp) & FCPHF_CRC_UNCHECKED) &&
@@ -435,7 +488,13 @@ static int fc_fcp_send_data(struct fc_fcp_pkt *fsp, struct fc_seq *seq,
 	 * burst length (t_blen) to seq_blen, otherwise set t_blen
 	 * to max FC frame payload previously set in fsp->max_payload.
 	 */
-	t_blen = lp->seq_offload ? seq_blen : fsp->max_payload;
+	t_blen = fsp->max_payload;
+	if (lp->seq_offload) {
+		t_blen = min(seq_blen, (size_t)lp->lso_max);
+		FC_DEBUG_FCP("fsp=%p:lso:blen=%zx lso_max=0x%x t_blen=%zx\n",
+			   fsp, seq_blen, lp->lso_max, t_blen);
+	}
+
 	WARN_ON(t_blen < FC_MIN_MAX_PAYLOAD);
 	if (t_blen > 512)
 		t_blen &= ~(512 - 1);	/* round down to block size */
@@ -744,6 +803,9 @@ static void fc_fcp_resp(struct fc_fcp_pkt *fsp, struct fc_frame *fp)
 	fsp->scsi_comp_flags = flags;
 	expected_len = fsp->data_len;
 
+	/* if ddp, update xfer len */
+	fc_fcp_ddp_done(fsp);
+
 	if (unlikely((flags & ~FCP_CONF_REQ) || fc_rp->fr_status)) {
 		rp_ex = (void *)(fc_rp + 1);
 		if (flags & (FCP_RSP_LEN_VAL | FCP_SNS_LEN_VAL)) {
@@ -859,7 +921,7 @@ static void fc_fcp_complete_locked(struct fc_fcp_pkt *fsp)
 		    (!(fsp->scsi_comp_flags & FCP_RESID_UNDER) ||
 		     fsp->xfer_len < fsp->data_len - fsp->scsi_resid)) {
 			fsp->status_code = FC_DATA_UNDRUN;
-			fsp->io_status = SUGGEST_RETRY << 24;
+			fsp->io_status = 0;
 		}
 	}
 
@@ -1006,7 +1068,7 @@ static int fc_fcp_cmd_send(struct fc_lport *lp, struct fc_fcp_pkt *fsp,
 	}
 
 	memcpy(fc_frame_payload_get(fp, len), &fsp->cdb_cmd, len);
-	fr_cmd(fp) = fsp->cmd;
+	fr_fsp(fp) = fsp;
 	rport = fsp->rport;
 	fsp->max_payload = rport->maxframe_size;
 	rp = rport->dd_data;
@@ -1267,7 +1329,7 @@ static void fc_fcp_rec(struct fc_fcp_pkt *fsp)
 	rp = rport->dd_data;
 	if (!fsp->seq_ptr || rp->rp_state != RPORT_ST_READY) {
 		fsp->status_code = FC_HRD_ERROR;
-		fsp->io_status = SUGGEST_RETRY << 24;
+		fsp->io_status = 0;
 		fc_fcp_complete_locked(fsp);
 		return;
 	}
@@ -1740,6 +1802,9 @@ static void fc_io_compl(struct fc_fcp_pkt *fsp)
 	struct fc_lport *lp;
 	unsigned long flags;
 
+	/* release outstanding ddp context */
+	fc_fcp_ddp_done(fsp);
+
 	fsp->state |= FC_SRB_COMPL;
 	if (!(fsp->state & FC_SRB_FCP_PROCESSING_TMO)) {
 		spin_unlock_bh(&fsp->scsi_pkt_lock);
diff --git a/drivers/scsi/libfc/fc_lport.c b/drivers/scsi/libfc/fc_lport.c
index 2ae50a1188e6..7ef44501ecc6 100644
--- a/drivers/scsi/libfc/fc_lport.c
+++ b/drivers/scsi/libfc/fc_lport.c
@@ -762,10 +762,11 @@ static void fc_lport_recv_flogi_req(struct fc_seq *sp_in,
 	remote_wwpn = get_unaligned_be64(&flp->fl_wwpn);
 	if (remote_wwpn == lport->wwpn) {
 		FC_DBG("FLOGI from port with same WWPN %llx "
-		       "possible configuration error\n", remote_wwpn);
+		       "possible configuration error\n",
+		       (unsigned long long)remote_wwpn);
 		goto out;
 	}
-	FC_DBG("FLOGI from port WWPN %llx\n", remote_wwpn);
+	FC_DBG("FLOGI from port WWPN %llx\n", (unsigned long long)remote_wwpn);
 
 	/*
 	 * XXX what is the right thing to do for FIDs?
diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c
index dae65133a833..0472bb73221e 100644
--- a/drivers/scsi/libfc/fc_rport.c
+++ b/drivers/scsi/libfc/fc_rport.c
@@ -988,7 +988,7 @@ static void fc_rport_recv_plogi_req(struct fc_rport *rport,
 	switch (rdata->rp_state) {
 	case RPORT_ST_INIT:
 		FC_DEBUG_RPORT("incoming PLOGI from %6x wwpn %llx state INIT "
-			       "- reject\n", sid, wwpn);
+			       "- reject\n", sid, (unsigned long long)wwpn);
 		reject = ELS_RJT_UNSUP;
 		break;
 	case RPORT_ST_PLOGI:
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 809d32d95c76..dfaa8adf099e 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -38,6 +38,28 @@
 #include <scsi/scsi_transport_iscsi.h>
 #include <scsi/libiscsi.h>
 
+static int iscsi_dbg_lib;
+module_param_named(debug_libiscsi, iscsi_dbg_lib, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(debug_libiscsi, "Turn on debugging for libiscsi module. "
+		 "Set to 1 to turn on, and zero to turn off. Default "
+		 "is off.");
+
+#define ISCSI_DBG_CONN(_conn, dbg_fmt, arg...)			\
+	do {							\
+		if (iscsi_dbg_lib)				\
+			iscsi_conn_printk(KERN_INFO, _conn,	\
+					     "%s " dbg_fmt,	\
+					     __func__, ##arg);	\
+	} while (0);
+
+#define ISCSI_DBG_SESSION(_session, dbg_fmt, arg...)			\
+	do {								\
+		if (iscsi_dbg_lib)					\
+			iscsi_session_printk(KERN_INFO, _session,	\
+					     "%s " dbg_fmt,		\
+					     __func__, ##arg);		\
+	} while (0);
+
 /* Serial Number Arithmetic, 32 bits, less than, RFC1982 */
 #define SNA32_CHECK 2147483648UL
 
@@ -54,6 +76,15 @@ static int iscsi_sna_lte(u32 n1, u32 n2)
 			    (n1 > n2 && (n2 - n1 < SNA32_CHECK)));
 }
 
+inline void iscsi_conn_queue_work(struct iscsi_conn *conn)
+{
+	struct Scsi_Host *shost = conn->session->host;
+	struct iscsi_host *ihost = shost_priv(shost);
+
+	queue_work(ihost->workq, &conn->xmitwork);
+}
+EXPORT_SYMBOL_GPL(iscsi_conn_queue_work);
+
 void
 iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_nopin *hdr)
 {
@@ -81,8 +112,7 @@ iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_nopin *hdr)
 		if (!list_empty(&session->leadconn->xmitqueue) ||
 		    !list_empty(&session->leadconn->mgmtqueue)) {
 			if (!(session->tt->caps & CAP_DATA_PATH_OFFLOAD))
-				scsi_queue_work(session->host,
-						&session->leadconn->xmitwork);
+				iscsi_conn_queue_work(session->leadconn);
 		}
 	}
 }
@@ -176,10 +206,11 @@ static int iscsi_prep_ecdb_ahs(struct iscsi_task *task)
 	ecdb_ahdr->reserved = 0;
 	memcpy(ecdb_ahdr->ecdb, cmd->cmnd + ISCSI_CDB_SIZE, rlen);
 
-	debug_scsi("iscsi_prep_ecdb_ahs: varlen_cdb_len %d "
-		   "rlen %d pad_len %d ahs_length %d iscsi_headers_size %u\n",
-		   cmd->cmd_len, rlen, pad_len, ahslength, task->hdr_len);
-
+	ISCSI_DBG_SESSION(task->conn->session,
+			  "iscsi_prep_ecdb_ahs: varlen_cdb_len %d "
+		          "rlen %d pad_len %d ahs_length %d iscsi_headers_size "
+		          "%u\n", cmd->cmd_len, rlen, pad_len, ahslength,
+		          task->hdr_len);
 	return 0;
 }
 
@@ -201,10 +232,11 @@ static int iscsi_prep_bidi_ahs(struct iscsi_task *task)
 	rlen_ahdr->reserved = 0;
 	rlen_ahdr->read_length = cpu_to_be32(scsi_in(sc)->length);
 
-	debug_scsi("bidi-in rlen_ahdr->read_length(%d) "
-		   "rlen_ahdr->ahslength(%d)\n",
-		   be32_to_cpu(rlen_ahdr->read_length),
-		   be16_to_cpu(rlen_ahdr->ahslength));
+	ISCSI_DBG_SESSION(task->conn->session,
+			  "bidi-in rlen_ahdr->read_length(%d) "
+		          "rlen_ahdr->ahslength(%d)\n",
+		          be32_to_cpu(rlen_ahdr->read_length),
+		          be16_to_cpu(rlen_ahdr->ahslength));
 	return 0;
 }
 
@@ -335,13 +367,15 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
 	list_move_tail(&task->running, &conn->run_list);
 
 	conn->scsicmd_pdus_cnt++;
-	debug_scsi("iscsi prep [%s cid %d sc %p cdb 0x%x itt 0x%x len %d "
-		   "bidi_len %d cmdsn %d win %d]\n", scsi_bidi_cmnd(sc) ?
-		   "bidirectional" : sc->sc_data_direction == DMA_TO_DEVICE ?
-		   "write" : "read", conn->id, sc, sc->cmnd[0], task->itt,
-		   scsi_bufflen(sc),
-		   scsi_bidi_cmnd(sc) ? scsi_in(sc)->length : 0,
-		   session->cmdsn, session->max_cmdsn - session->exp_cmdsn + 1);
+	ISCSI_DBG_SESSION(session, "iscsi prep [%s cid %d sc %p cdb 0x%x "
+			  "itt 0x%x len %d bidi_len %d cmdsn %d win %d]\n",
+			  scsi_bidi_cmnd(sc) ? "bidirectional" :
+			  sc->sc_data_direction == DMA_TO_DEVICE ?
+			  "write" : "read", conn->id, sc, sc->cmnd[0],
+			  task->itt, scsi_bufflen(sc),
+			  scsi_bidi_cmnd(sc) ? scsi_in(sc)->length : 0,
+			  session->cmdsn,
+			  session->max_cmdsn - session->exp_cmdsn + 1);
 	return 0;
 }
 
@@ -483,9 +517,9 @@ static int iscsi_prep_mgmt_task(struct iscsi_conn *conn,
 
 	task->state = ISCSI_TASK_RUNNING;
 	list_move_tail(&task->running, &conn->mgmt_run_list);
-	debug_scsi("mgmtpdu [op 0x%x hdr->itt 0x%x datalen %d]\n",
-		   hdr->opcode & ISCSI_OPCODE_MASK, hdr->itt,
-		   task->data_count);
+	ISCSI_DBG_SESSION(session, "mgmtpdu [op 0x%x hdr->itt 0x%x "
+			  "datalen %d]\n", hdr->opcode & ISCSI_OPCODE_MASK,
+			  hdr->itt, task->data_count);
 	return 0;
 }
 
@@ -560,7 +594,7 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 			goto free_task;
 
 	} else
-		scsi_queue_work(conn->session->host, &conn->xmitwork);
+		iscsi_conn_queue_work(conn);
 
 	return task;
 
@@ -637,8 +671,9 @@ invalid_datalen:
 
 		memcpy(sc->sense_buffer, data + 2,
 		       min_t(uint16_t, senselen, SCSI_SENSE_BUFFERSIZE));
-		debug_scsi("copied %d bytes of sense\n",
-			   min_t(uint16_t, senselen, SCSI_SENSE_BUFFERSIZE));
+		ISCSI_DBG_SESSION(session, "copied %d bytes of sense\n",
+				  min_t(uint16_t, senselen,
+				  SCSI_SENSE_BUFFERSIZE));
 	}
 
 	if (rhdr->flags & (ISCSI_FLAG_CMD_BIDI_UNDERFLOW |
@@ -666,8 +701,8 @@ invalid_datalen:
 			sc->result = (DID_BAD_TARGET << 16) | rhdr->cmd_status;
 	}
 out:
-	debug_scsi("done [sc %lx res %d itt 0x%x]\n",
-		   (long)sc, sc->result, task->itt);
+	ISCSI_DBG_SESSION(session, "done [sc %p res %d itt 0x%x]\n",
+			  sc, sc->result, task->itt);
 	conn->scsirsp_pdus_cnt++;
 
 	__iscsi_put_task(task);
@@ -835,8 +870,8 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 	else
 		itt = ~0U;
 
-	debug_scsi("[op 0x%x cid %d itt 0x%x len %d]\n",
-		   opcode, conn->id, itt, datalen);
+	ISCSI_DBG_SESSION(session, "[op 0x%x cid %d itt 0x%x len %d]\n",
+			  opcode, conn->id, itt, datalen);
 
 	if (itt == ~0U) {
 		iscsi_update_cmdsn(session, (struct iscsi_nopin*)hdr);
@@ -1034,10 +1069,9 @@ struct iscsi_task *iscsi_itt_to_ctask(struct iscsi_conn *conn, itt_t itt)
 }
 EXPORT_SYMBOL_GPL(iscsi_itt_to_ctask);
 
-void iscsi_session_failure(struct iscsi_cls_session *cls_session,
+void iscsi_session_failure(struct iscsi_session *session,
 			   enum iscsi_err err)
 {
-	struct iscsi_session *session = cls_session->dd_data;
 	struct iscsi_conn *conn;
 	struct device *dev;
 	unsigned long flags;
@@ -1095,10 +1129,10 @@ static int iscsi_check_cmdsn_window_closed(struct iscsi_conn *conn)
 	 * Check for iSCSI window and take care of CmdSN wrap-around
 	 */
 	if (!iscsi_sna_lte(session->queued_cmdsn, session->max_cmdsn)) {
-		debug_scsi("iSCSI CmdSN closed. ExpCmdSn %u MaxCmdSN %u "
-			   "CmdSN %u/%u\n", session->exp_cmdsn,
-			   session->max_cmdsn, session->cmdsn,
-			   session->queued_cmdsn);
+		ISCSI_DBG_SESSION(session, "iSCSI CmdSN closed. ExpCmdSn "
+				  "%u MaxCmdSN %u CmdSN %u/%u\n",
+				  session->exp_cmdsn, session->max_cmdsn,
+				  session->cmdsn, session->queued_cmdsn);
 		return -ENOSPC;
 	}
 	return 0;
@@ -1133,7 +1167,7 @@ void iscsi_requeue_task(struct iscsi_task *task)
 	struct iscsi_conn *conn = task->conn;
 
 	list_move_tail(&task->running, &conn->requeue);
-	scsi_queue_work(conn->session->host, &conn->xmitwork);
+	iscsi_conn_queue_work(conn);
 }
 EXPORT_SYMBOL_GPL(iscsi_requeue_task);
 
@@ -1152,7 +1186,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 
 	spin_lock_bh(&conn->session->lock);
 	if (unlikely(conn->suspend_tx)) {
-		debug_scsi("conn %d Tx suspended!\n", conn->id);
+		ISCSI_DBG_SESSION(conn->session, "Tx suspended!\n");
 		spin_unlock_bh(&conn->session->lock);
 		return -ENODATA;
 	}
@@ -1386,7 +1420,7 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 			goto prepd_reject;
 		}
 	} else
-		scsi_queue_work(session->host, &conn->xmitwork);
+		iscsi_conn_queue_work(conn);
 
 	session->queued_cmdsn++;
 	spin_unlock(&session->lock);
@@ -1398,7 +1432,8 @@ prepd_reject:
 	iscsi_complete_command(task);
 reject:
 	spin_unlock(&session->lock);
-	debug_scsi("cmd 0x%x rejected (%d)\n", sc->cmnd[0], reason);
+	ISCSI_DBG_SESSION(session, "cmd 0x%x rejected (%d)\n",
+			  sc->cmnd[0], reason);
 	spin_lock(host->host_lock);
 	return SCSI_MLQUEUE_TARGET_BUSY;
 
@@ -1407,7 +1442,8 @@ prepd_fault:
 	iscsi_complete_command(task);
 fault:
 	spin_unlock(&session->lock);
-	debug_scsi("iscsi: cmd 0x%x is not queued (%d)\n", sc->cmnd[0], reason);
+	ISCSI_DBG_SESSION(session, "iscsi: cmd 0x%x is not queued (%d)\n",
+			  sc->cmnd[0], reason);
 	if (!scsi_bidi_cmnd(sc))
 		scsi_set_resid(sc, scsi_bufflen(sc));
 	else {
@@ -1422,8 +1458,6 @@ EXPORT_SYMBOL_GPL(iscsi_queuecommand);
 
 int iscsi_change_queue_depth(struct scsi_device *sdev, int depth)
 {
-	if (depth > ISCSI_MAX_CMD_PER_LUN)
-		depth = ISCSI_MAX_CMD_PER_LUN;
 	scsi_adjust_queue_depth(sdev, scsi_get_tag_type(sdev), depth);
 	return sdev->queue_depth;
 }
@@ -1457,8 +1491,10 @@ int iscsi_eh_target_reset(struct scsi_cmnd *sc)
 	spin_lock_bh(&session->lock);
 	if (session->state == ISCSI_STATE_TERMINATE) {
 failed:
-		debug_scsi("failing target reset: session terminated "
-			   "[CID %d age %d]\n", conn->id, session->age);
+		iscsi_session_printk(KERN_INFO, session,
+				     "failing target reset: Could not log "
+				     "back into target [age %d]\n",
+				     session->age);
 		spin_unlock_bh(&session->lock);
 		mutex_unlock(&session->eh_mutex);
 		return FAILED;
@@ -1472,7 +1508,7 @@ failed:
 	 */
 	iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
 
-	debug_scsi("iscsi_eh_target_reset wait for relogin\n");
+	ISCSI_DBG_SESSION(session, "wait for relogin\n");
 	wait_event_interruptible(conn->ehwait,
 				 session->state == ISCSI_STATE_TERMINATE ||
 				 session->state == ISCSI_STATE_LOGGED_IN ||
@@ -1501,7 +1537,7 @@ static void iscsi_tmf_timedout(unsigned long data)
 	spin_lock(&session->lock);
 	if (conn->tmf_state == TMF_QUEUED) {
 		conn->tmf_state = TMF_TIMEDOUT;
-		debug_scsi("tmf timedout\n");
+		ISCSI_DBG_SESSION(session, "tmf timedout\n");
 		/* unblock eh_abort() */
 		wake_up(&conn->ehwait);
 	}
@@ -1521,7 +1557,7 @@ static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn,
 		spin_unlock_bh(&session->lock);
 		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
 		spin_lock_bh(&session->lock);
-		debug_scsi("tmf exec failure\n");
+		ISCSI_DBG_SESSION(session, "tmf exec failure\n");
 		return -EPERM;
 	}
 	conn->tmfcmd_pdus_cnt++;
@@ -1529,7 +1565,7 @@ static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn,
 	conn->tmf_timer.function = iscsi_tmf_timedout;
 	conn->tmf_timer.data = (unsigned long)conn;
 	add_timer(&conn->tmf_timer);
-	debug_scsi("tmf set timeout\n");
+	ISCSI_DBG_SESSION(session, "tmf set timeout\n");
 
 	spin_unlock_bh(&session->lock);
 	mutex_unlock(&session->eh_mutex);
@@ -1567,22 +1603,27 @@ static void fail_all_commands(struct iscsi_conn *conn, unsigned lun,
 {
 	struct iscsi_task *task, *tmp;
 
-	if (conn->task && (conn->task->sc->device->lun == lun || lun == -1))
-		conn->task = NULL;
+	if (conn->task) {
+		if (lun == -1 ||
+		    (conn->task->sc && conn->task->sc->device->lun == lun))
+			conn->task = NULL;
+	}
 
 	/* flush pending */
 	list_for_each_entry_safe(task, tmp, &conn->xmitqueue, running) {
 		if (lun == task->sc->device->lun || lun == -1) {
-			debug_scsi("failing pending sc %p itt 0x%x\n",
-				   task->sc, task->itt);
+			ISCSI_DBG_SESSION(conn->session,
+					  "failing pending sc %p itt 0x%x\n",
+					  task->sc, task->itt);
 			fail_command(conn, task, error << 16);
 		}
 	}
 
 	list_for_each_entry_safe(task, tmp, &conn->requeue, running) {
 		if (lun == task->sc->device->lun || lun == -1) {
-			debug_scsi("failing requeued sc %p itt 0x%x\n",
-				   task->sc, task->itt);
+			ISCSI_DBG_SESSION(conn->session,
+					  "failing requeued sc %p itt 0x%x\n",
+					  task->sc, task->itt);
 			fail_command(conn, task, error << 16);
 		}
 	}
@@ -1590,8 +1631,9 @@ static void fail_all_commands(struct iscsi_conn *conn, unsigned lun,
 	/* fail all other running */
 	list_for_each_entry_safe(task, tmp, &conn->run_list, running) {
 		if (lun == task->sc->device->lun || lun == -1) {
-			debug_scsi("failing in progress sc %p itt 0x%x\n",
-				   task->sc, task->itt);
+			ISCSI_DBG_SESSION(conn->session,
+					 "failing in progress sc %p itt 0x%x\n",
+					 task->sc, task->itt);
 			fail_command(conn, task, error << 16);
 		}
 	}
@@ -1599,9 +1641,12 @@ static void fail_all_commands(struct iscsi_conn *conn, unsigned lun,
 
 void iscsi_suspend_tx(struct iscsi_conn *conn)
 {
+	struct Scsi_Host *shost = conn->session->host;
+	struct iscsi_host *ihost = shost_priv(shost);
+
 	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 	if (!(conn->session->tt->caps & CAP_DATA_PATH_OFFLOAD))
-		scsi_flush_work(conn->session->host);
+		flush_workqueue(ihost->workq);
 }
 EXPORT_SYMBOL_GPL(iscsi_suspend_tx);
 
@@ -1609,7 +1654,7 @@ static void iscsi_start_tx(struct iscsi_conn *conn)
 {
 	clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 	if (!(conn->session->tt->caps & CAP_DATA_PATH_OFFLOAD))
-		scsi_queue_work(conn->session->host, &conn->xmitwork);
+		iscsi_conn_queue_work(conn);
 }
 
 static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *scmd)
@@ -1622,7 +1667,7 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *scmd)
 	cls_session = starget_to_session(scsi_target(scmd->device));
 	session = cls_session->dd_data;
 
-	debug_scsi("scsi cmd %p timedout\n", scmd);
+	ISCSI_DBG_SESSION(session, "scsi cmd %p timedout\n", scmd);
 
 	spin_lock(&session->lock);
 	if (session->state != ISCSI_STATE_LOGGED_IN) {
@@ -1662,8 +1707,8 @@ static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *scmd)
 		rc = BLK_EH_RESET_TIMER;
 done:
 	spin_unlock(&session->lock);
-	debug_scsi("return %s\n", rc == BLK_EH_RESET_TIMER ?
-					"timer reset" : "nh");
+	ISCSI_DBG_SESSION(session, "return %s\n", rc == BLK_EH_RESET_TIMER ?
+			  "timer reset" : "nh");
 	return rc;
 }
 
@@ -1697,13 +1742,13 @@ static void iscsi_check_transport_timeouts(unsigned long data)
 
 	if (time_before_eq(last_recv + recv_timeout, jiffies)) {
 		/* send a ping to try to provoke some traffic */
-		debug_scsi("Sending nopout as ping on conn %p\n", conn);
+		ISCSI_DBG_CONN(conn, "Sending nopout as ping\n");
 		iscsi_send_nopout(conn, NULL);
 		next_timeout = conn->last_ping + (conn->ping_timeout * HZ);
 	} else
 		next_timeout = last_recv + recv_timeout;
 
-	debug_scsi("Setting next tmo %lu\n", next_timeout);
+	ISCSI_DBG_CONN(conn, "Setting next tmo %lu\n", next_timeout);
 	mod_timer(&conn->transport_timer, next_timeout);
 done:
 	spin_unlock(&session->lock);
@@ -1740,7 +1785,8 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 	 * got the command.
 	 */
 	if (!sc->SCp.ptr) {
-		debug_scsi("sc never reached iscsi layer or it completed.\n");
+		ISCSI_DBG_SESSION(session, "sc never reached iscsi layer or "
+				  "it completed.\n");
 		spin_unlock_bh(&session->lock);
 		mutex_unlock(&session->eh_mutex);
 		return SUCCESS;
@@ -1762,11 +1808,13 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 	age = session->age;
 
 	task = (struct iscsi_task *)sc->SCp.ptr;
-	debug_scsi("aborting [sc %p itt 0x%x]\n", sc, task->itt);
+	ISCSI_DBG_SESSION(session, "aborting [sc %p itt 0x%x]\n",
+			  sc, task->itt);
 
 	/* task completed before time out */
 	if (!task->sc) {
-		debug_scsi("sc completed while abort in progress\n");
+		ISCSI_DBG_SESSION(session, "sc completed while abort in "
+				  "progress\n");
 		goto success;
 	}
 
@@ -1815,7 +1863,8 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 		if (!sc->SCp.ptr) {
 			conn->tmf_state = TMF_INITIAL;
 			/* task completed before tmf abort response */
-			debug_scsi("sc completed while abort in progress\n");
+			ISCSI_DBG_SESSION(session, "sc completed while abort "
+					  "in progress\n");
 			goto success;
 		}
 		/* fall through */
@@ -1827,15 +1876,16 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 success:
 	spin_unlock_bh(&session->lock);
 success_unlocked:
-	debug_scsi("abort success [sc %lx itt 0x%x]\n", (long)sc, task->itt);
+	ISCSI_DBG_SESSION(session, "abort success [sc %p itt 0x%x]\n",
+			  sc, task->itt);
 	mutex_unlock(&session->eh_mutex);
 	return SUCCESS;
 
 failed:
 	spin_unlock_bh(&session->lock);
 failed_unlocked:
-	debug_scsi("abort failed [sc %p itt 0x%x]\n", sc,
-		    task ? task->itt : 0);
+	ISCSI_DBG_SESSION(session, "abort failed [sc %p itt 0x%x]\n", sc,
+			  task ? task->itt : 0);
 	mutex_unlock(&session->eh_mutex);
 	return FAILED;
 }
@@ -1862,7 +1912,8 @@ int iscsi_eh_device_reset(struct scsi_cmnd *sc)
 	cls_session = starget_to_session(scsi_target(sc->device));
 	session = cls_session->dd_data;
 
-	debug_scsi("LU Reset [sc %p lun %u]\n", sc, sc->device->lun);
+	ISCSI_DBG_SESSION(session, "LU Reset [sc %p lun %u]\n",
+			  sc, sc->device->lun);
 
 	mutex_lock(&session->eh_mutex);
 	spin_lock_bh(&session->lock);
@@ -1916,8 +1967,8 @@ int iscsi_eh_device_reset(struct scsi_cmnd *sc)
 unlock:
 	spin_unlock_bh(&session->lock);
 done:
-	debug_scsi("iscsi_eh_device_reset %s\n",
-		  rc == SUCCESS ? "SUCCESS" : "FAILED");
+	ISCSI_DBG_SESSION(session, "dev reset result = %s\n",
+			 rc == SUCCESS ? "SUCCESS" : "FAILED");
 	mutex_unlock(&session->eh_mutex);
 	return rc;
 }
@@ -1944,7 +1995,7 @@ iscsi_pool_init(struct iscsi_pool *q, int max, void ***items, int item_size)
 		num_arrays++;
 	q->pool = kzalloc(num_arrays * max * sizeof(void*), GFP_KERNEL);
 	if (q->pool == NULL)
-		goto enomem;
+		return -ENOMEM;
 
 	q->queue = kfifo_init((void*)q->pool, max * sizeof(void*),
 			      GFP_KERNEL, NULL);
@@ -1979,8 +2030,7 @@ void iscsi_pool_free(struct iscsi_pool *q)
 
 	for (i = 0; i < q->max; i++)
 		kfree(q->pool[i]);
-	if (q->pool)
-		kfree(q->pool);
+	kfree(q->pool);
 	kfree(q->queue);
 }
 EXPORT_SYMBOL_GPL(iscsi_pool_free);
@@ -1998,6 +2048,9 @@ int iscsi_host_add(struct Scsi_Host *shost, struct device *pdev)
 	if (!shost->can_queue)
 		shost->can_queue = ISCSI_DEF_XMIT_CMDS_MAX;
 
+	if (!shost->cmd_per_lun)
+		shost->cmd_per_lun = ISCSI_DEF_CMD_PER_LUN;
+
 	if (!shost->transportt->eh_timed_out)
 		shost->transportt->eh_timed_out = iscsi_eh_cmd_timed_out;
 	return scsi_add_host(shost, pdev);
@@ -2008,13 +2061,13 @@ EXPORT_SYMBOL_GPL(iscsi_host_add);
  * iscsi_host_alloc - allocate a host and driver data
  * @sht: scsi host template
  * @dd_data_size: driver host data size
- * @qdepth: default device queue depth
+ * @xmit_can_sleep: bool indicating if LLD will queue IO from a work queue
  *
  * This should be called by partial offload and software iscsi drivers.
  * To access the driver specific memory use the iscsi_host_priv() macro.
  */
 struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
-				   int dd_data_size, uint16_t qdepth)
+				   int dd_data_size, bool xmit_can_sleep)
 {
 	struct Scsi_Host *shost;
 	struct iscsi_host *ihost;
@@ -2022,28 +2075,31 @@ struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
 	shost = scsi_host_alloc(sht, sizeof(struct iscsi_host) + dd_data_size);
 	if (!shost)
 		return NULL;
+	ihost = shost_priv(shost);
 
-	if (qdepth > ISCSI_MAX_CMD_PER_LUN || qdepth < 1) {
-		if (qdepth != 0)
-			printk(KERN_ERR "iscsi: invalid queue depth of %d. "
-			       "Queue depth must be between 1 and %d.\n",
-			       qdepth, ISCSI_MAX_CMD_PER_LUN);
-		qdepth = ISCSI_DEF_CMD_PER_LUN;
+	if (xmit_can_sleep) {
+		snprintf(ihost->workq_name, sizeof(ihost->workq_name),
+			"iscsi_q_%d", shost->host_no);
+		ihost->workq = create_singlethread_workqueue(ihost->workq_name);
+		if (!ihost->workq)
+			goto free_host;
 	}
-	shost->cmd_per_lun = qdepth;
 
-	ihost = shost_priv(shost);
 	spin_lock_init(&ihost->lock);
 	ihost->state = ISCSI_HOST_SETUP;
 	ihost->num_sessions = 0;
 	init_waitqueue_head(&ihost->session_removal_wq);
 	return shost;
+
+free_host:
+	scsi_host_put(shost);
+	return NULL;
 }
 EXPORT_SYMBOL_GPL(iscsi_host_alloc);
 
 static void iscsi_notify_host_removed(struct iscsi_cls_session *cls_session)
 {
-	iscsi_session_failure(cls_session, ISCSI_ERR_INVALID_HOST);
+	iscsi_session_failure(cls_session->dd_data, ISCSI_ERR_INVALID_HOST);
 }
 
 /**
@@ -2069,6 +2125,8 @@ void iscsi_host_remove(struct Scsi_Host *shost)
 		flush_signals(current);
 
 	scsi_remove_host(shost);
+	if (ihost->workq)
+		destroy_workqueue(ihost->workq);
 }
 EXPORT_SYMBOL_GPL(iscsi_host_remove);
 
@@ -2467,14 +2525,16 @@ flush_control_queues(struct iscsi_session *session, struct iscsi_conn *conn)
 
 	/* handle pending */
 	list_for_each_entry_safe(task, tmp, &conn->mgmtqueue, running) {
-		debug_scsi("flushing pending mgmt task itt 0x%x\n", task->itt);
+		ISCSI_DBG_SESSION(session, "flushing pending mgmt task "
+				  "itt 0x%x\n", task->itt);
 		/* release ref from prep task */
 		__iscsi_put_task(task);
 	}
 
 	/* handle running */
 	list_for_each_entry_safe(task, tmp, &conn->mgmt_run_list, running) {
-		debug_scsi("flushing running mgmt task itt 0x%x\n", task->itt);
+		ISCSI_DBG_SESSION(session, "flushing running mgmt task "
+				  "itt 0x%x\n", task->itt);
 		/* release ref from prep task */
 		__iscsi_put_task(task);
 	}
@@ -2524,7 +2584,7 @@ static void iscsi_start_session_recovery(struct iscsi_session *session,
 		conn->datadgst_en = 0;
 		if (session->state == ISCSI_STATE_IN_RECOVERY &&
 		    old_stop_stage != STOP_CONN_RECOVER) {
-			debug_scsi("blocking session\n");
+			ISCSI_DBG_SESSION(session, "blocking session\n");
 			iscsi_block_session(session->cls_session);
 		}
 	}
diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c
index e7705d3532c9..91f8ce4d8d08 100644
--- a/drivers/scsi/libiscsi_tcp.c
+++ b/drivers/scsi/libiscsi_tcp.c
@@ -49,13 +49,21 @@ MODULE_AUTHOR("Mike Christie <michaelc@cs.wisc.edu>, "
 	      "Alex Aizman <itn780@yahoo.com>");
 MODULE_DESCRIPTION("iSCSI/TCP data-path");
 MODULE_LICENSE("GPL");
-#undef DEBUG_TCP
 
-#ifdef DEBUG_TCP
-#define debug_tcp(fmt...) printk(KERN_INFO "tcp: " fmt)
-#else
-#define debug_tcp(fmt...)
-#endif
+static int iscsi_dbg_libtcp;
+module_param_named(debug_libiscsi_tcp, iscsi_dbg_libtcp, int,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(debug_libiscsi_tcp, "Turn on debugging for libiscsi_tcp "
+		 "module. Set to 1 to turn on, and zero to turn off. Default "
+		 "is off.");
+
+#define ISCSI_DBG_TCP(_conn, dbg_fmt, arg...)			\
+	do {							\
+		if (iscsi_dbg_libtcp)				\
+			iscsi_conn_printk(KERN_INFO, _conn,	\
+					     "%s " dbg_fmt,	\
+					     __func__, ##arg);	\
+	} while (0);
 
 static int iscsi_tcp_hdr_recv_done(struct iscsi_tcp_conn *tcp_conn,
 				   struct iscsi_segment *segment);
@@ -123,18 +131,13 @@ static void iscsi_tcp_segment_map(struct iscsi_segment *segment, int recv)
 	if (page_count(sg_page(sg)) >= 1 && !recv)
 		return;
 
-	debug_tcp("iscsi_tcp_segment_map %s %p\n", recv ? "recv" : "xmit",
-		  segment);
 	segment->sg_mapped = kmap_atomic(sg_page(sg), KM_SOFTIRQ0);
 	segment->data = segment->sg_mapped + sg->offset + segment->sg_offset;
 }
 
 void iscsi_tcp_segment_unmap(struct iscsi_segment *segment)
 {
-	debug_tcp("iscsi_tcp_segment_unmap %p\n", segment);
-
 	if (segment->sg_mapped) {
-		debug_tcp("iscsi_tcp_segment_unmap valid\n");
 		kunmap_atomic(segment->sg_mapped, KM_SOFTIRQ0);
 		segment->sg_mapped = NULL;
 		segment->data = NULL;
@@ -180,8 +183,9 @@ int iscsi_tcp_segment_done(struct iscsi_tcp_conn *tcp_conn,
 	struct scatterlist sg;
 	unsigned int pad;
 
-	debug_tcp("copied %u %u size %u %s\n", segment->copied, copied,
-		  segment->size, recv ? "recv" : "xmit");
+	ISCSI_DBG_TCP(tcp_conn->iscsi_conn, "copied %u %u size %u %s\n",
+		      segment->copied, copied, segment->size,
+		      recv ? "recv" : "xmit");
 	if (segment->hash && copied) {
 		/*
 		 * If a segment is kmapd we must unmap it before sending
@@ -214,8 +218,8 @@ int iscsi_tcp_segment_done(struct iscsi_tcp_conn *tcp_conn,
 	iscsi_tcp_segment_unmap(segment);
 
 	/* Do we have more scatterlist entries? */
-	debug_tcp("total copied %u total size %u\n", segment->total_copied,
-		   segment->total_size);
+	ISCSI_DBG_TCP(tcp_conn->iscsi_conn, "total copied %u total size %u\n",
+		      segment->total_copied, segment->total_size);
 	if (segment->total_copied < segment->total_size) {
 		/* Proceed to the next entry in the scatterlist. */
 		iscsi_tcp_segment_init_sg(segment, sg_next(segment->sg),
@@ -229,7 +233,8 @@ int iscsi_tcp_segment_done(struct iscsi_tcp_conn *tcp_conn,
 	if (!(tcp_conn->iscsi_conn->session->tt->caps & CAP_PADDING_OFFLOAD)) {
 		pad = iscsi_padding(segment->total_copied);
 		if (pad != 0) {
-			debug_tcp("consume %d pad bytes\n", pad);
+			ISCSI_DBG_TCP(tcp_conn->iscsi_conn,
+				      "consume %d pad bytes\n", pad);
 			segment->total_size += pad;
 			segment->size = pad;
 			segment->data = segment->padbuf;
@@ -278,13 +283,13 @@ iscsi_tcp_segment_recv(struct iscsi_tcp_conn *tcp_conn,
 
 	while (!iscsi_tcp_segment_done(tcp_conn, segment, 1, copy)) {
 		if (copied == len) {
-			debug_tcp("iscsi_tcp_segment_recv copied %d bytes\n",
-				  len);
+			ISCSI_DBG_TCP(tcp_conn->iscsi_conn,
+				      "copied %d bytes\n", len);
 			break;
 		}
 
 		copy = min(len - copied, segment->size - segment->copied);
-		debug_tcp("iscsi_tcp_segment_recv copying %d\n", copy);
+		ISCSI_DBG_TCP(tcp_conn->iscsi_conn, "copying %d\n", copy);
 		memcpy(segment->data + segment->copied, ptr + copied, copy);
 		copied += copy;
 	}
@@ -311,7 +316,7 @@ iscsi_tcp_dgst_verify(struct iscsi_tcp_conn *tcp_conn,
 
 	if (memcmp(segment->recv_digest, segment->digest,
 		   segment->digest_len)) {
-		debug_scsi("digest mismatch\n");
+		ISCSI_DBG_TCP(tcp_conn->iscsi_conn, "digest mismatch\n");
 		return 0;
 	}
 
@@ -355,12 +360,8 @@ iscsi_segment_seek_sg(struct iscsi_segment *segment,
 	struct scatterlist *sg;
 	unsigned int i;
 
-	debug_scsi("iscsi_segment_seek_sg offset %u size %llu\n",
-		  offset, size);
 	__iscsi_segment_init(segment, size, done, hash);
 	for_each_sg(sg_list, sg, sg_count, i) {
-		debug_scsi("sg %d, len %u offset %u\n", i, sg->length,
-			   sg->offset);
 		if (offset < sg->length) {
 			iscsi_tcp_segment_init_sg(segment, sg, offset);
 			return 0;
@@ -382,8 +383,9 @@ EXPORT_SYMBOL_GPL(iscsi_segment_seek_sg);
  */
 void iscsi_tcp_hdr_recv_prep(struct iscsi_tcp_conn *tcp_conn)
 {
-	debug_tcp("iscsi_tcp_hdr_recv_prep(%p%s)\n", tcp_conn,
-		  tcp_conn->iscsi_conn->hdrdgst_en ? ", digest enabled" : "");
+	ISCSI_DBG_TCP(tcp_conn->iscsi_conn,
+		      "(%s)\n", tcp_conn->iscsi_conn->hdrdgst_en ?
+		      "digest enabled" : "digest disabled");
 	iscsi_segment_init_linear(&tcp_conn->in.segment,
 				tcp_conn->in.hdr_buf, sizeof(struct iscsi_hdr),
 				iscsi_tcp_hdr_recv_done, NULL);
@@ -446,7 +448,7 @@ void iscsi_tcp_cleanup_task(struct iscsi_task *task)
 	while (__kfifo_get(tcp_task->r2tqueue, (void*)&r2t, sizeof(void*))) {
 		__kfifo_put(tcp_task->r2tpool.queue, (void*)&r2t,
 			    sizeof(void*));
-		debug_scsi("iscsi_tcp_cleanup_task pending r2t dropped\n");
+		ISCSI_DBG_TCP(task->conn, "pending r2t dropped\n");
 	}
 
 	r2t = tcp_task->r2t;
@@ -476,8 +478,8 @@ static int iscsi_tcp_data_in(struct iscsi_conn *conn, struct iscsi_task *task)
 		return 0;
 
 	if (tcp_task->exp_datasn != datasn) {
-		debug_tcp("%s: task->exp_datasn(%d) != rhdr->datasn(%d)\n",
-		          __func__, tcp_task->exp_datasn, datasn);
+		ISCSI_DBG_TCP(conn, "task->exp_datasn(%d) != rhdr->datasn(%d)"
+			      "\n", tcp_task->exp_datasn, datasn);
 		return ISCSI_ERR_DATASN;
 	}
 
@@ -485,9 +487,9 @@ static int iscsi_tcp_data_in(struct iscsi_conn *conn, struct iscsi_task *task)
 
 	tcp_task->data_offset = be32_to_cpu(rhdr->offset);
 	if (tcp_task->data_offset + tcp_conn->in.datalen > total_in_length) {
-		debug_tcp("%s: data_offset(%d) + data_len(%d) > total_length_in(%d)\n",
-		          __func__, tcp_task->data_offset,
-		          tcp_conn->in.datalen, total_in_length);
+		ISCSI_DBG_TCP(conn, "data_offset(%d) + data_len(%d) > "
+			      "total_length_in(%d)\n", tcp_task->data_offset,
+			      tcp_conn->in.datalen, total_in_length);
 		return ISCSI_ERR_DATA_OFFSET;
 	}
 
@@ -518,8 +520,8 @@ static int iscsi_tcp_r2t_rsp(struct iscsi_conn *conn, struct iscsi_task *task)
 	}
 
 	if (tcp_task->exp_datasn != r2tsn){
-		debug_tcp("%s: task->exp_datasn(%d) != rhdr->r2tsn(%d)\n",
-		          __func__, tcp_task->exp_datasn, r2tsn);
+		ISCSI_DBG_TCP(conn, "task->exp_datasn(%d) != rhdr->r2tsn(%d)\n",
+			      tcp_task->exp_datasn, r2tsn);
 		return ISCSI_ERR_R2TSN;
 	}
 
@@ -552,9 +554,9 @@ static int iscsi_tcp_r2t_rsp(struct iscsi_conn *conn, struct iscsi_task *task)
 	}
 
 	if (r2t->data_length > session->max_burst)
-		debug_scsi("invalid R2T with data len %u and max burst %u."
-			   "Attempting to execute request.\n",
-			    r2t->data_length, session->max_burst);
+		ISCSI_DBG_TCP(conn, "invalid R2T with data len %u and max "
+			      "burst %u. Attempting to execute request.\n",
+			      r2t->data_length, session->max_burst);
 
 	r2t->data_offset = be32_to_cpu(rhdr->data_offset);
 	if (r2t->data_offset + r2t->data_length > scsi_out(task->sc)->length) {
@@ -641,8 +643,8 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 	if (rc)
 		return rc;
 
-	debug_tcp("opcode 0x%x ahslen %d datalen %d\n",
-		  opcode, ahslen, tcp_conn->in.datalen);
+	ISCSI_DBG_TCP(conn, "opcode 0x%x ahslen %d datalen %d\n",
+		      opcode, ahslen, tcp_conn->in.datalen);
 
 	switch(opcode) {
 	case ISCSI_OP_SCSI_DATA_IN:
@@ -674,10 +676,10 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 			    !(conn->session->tt->caps & CAP_DIGEST_OFFLOAD))
 				rx_hash = tcp_conn->rx_hash;
 
-			debug_tcp("iscsi_tcp_begin_data_in(%p, offset=%d, "
-				  "datalen=%d)\n", tcp_conn,
-				  tcp_task->data_offset,
-				  tcp_conn->in.datalen);
+			ISCSI_DBG_TCP(conn, "iscsi_tcp_begin_data_in( "
+				     "offset=%d, datalen=%d)\n",
+				      tcp_task->data_offset,
+				      tcp_conn->in.datalen);
 			rc = iscsi_segment_seek_sg(&tcp_conn->in.segment,
 						   sdb->table.sgl,
 						   sdb->table.nents,
@@ -854,10 +856,10 @@ int iscsi_tcp_recv_skb(struct iscsi_conn *conn, struct sk_buff *skb,
 	unsigned int consumed = 0;
 	int rc = 0;
 
-	debug_tcp("in %d bytes\n", skb->len - offset);
+	ISCSI_DBG_TCP(conn, "in %d bytes\n", skb->len - offset);
 
 	if (unlikely(conn->suspend_rx)) {
-		debug_tcp("conn %d Rx suspended!\n", conn->id);
+		ISCSI_DBG_TCP(conn, "Rx suspended!\n");
 		*status = ISCSI_TCP_SUSPENDED;
 		return 0;
 	}
@@ -874,15 +876,16 @@ int iscsi_tcp_recv_skb(struct iscsi_conn *conn, struct sk_buff *skb,
 
 		avail = skb_seq_read(consumed, &ptr, &seq);
 		if (avail == 0) {
-			debug_tcp("no more data avail. Consumed %d\n",
-				  consumed);
+			ISCSI_DBG_TCP(conn, "no more data avail. Consumed %d\n",
+				      consumed);
 			*status = ISCSI_TCP_SKB_DONE;
 			skb_abort_seq_read(&seq);
 			goto skb_done;
 		}
 		BUG_ON(segment->copied >= segment->size);
 
-		debug_tcp("skb %p ptr=%p avail=%u\n", skb, ptr, avail);
+		ISCSI_DBG_TCP(conn, "skb %p ptr=%p avail=%u\n", skb, ptr,
+			      avail);
 		rc = iscsi_tcp_segment_recv(tcp_conn, segment, ptr, avail);
 		BUG_ON(rc == 0);
 		consumed += rc;
@@ -895,11 +898,11 @@ int iscsi_tcp_recv_skb(struct iscsi_conn *conn, struct sk_buff *skb,
 
 segment_done:
 	*status = ISCSI_TCP_SEGMENT_DONE;
-	debug_tcp("segment done\n");
+	ISCSI_DBG_TCP(conn, "segment done\n");
 	rc = segment->done(tcp_conn, segment);
 	if (rc != 0) {
 		*status = ISCSI_TCP_CONN_ERR;
-		debug_tcp("Error receiving PDU, errno=%d\n", rc);
+		ISCSI_DBG_TCP(conn, "Error receiving PDU, errno=%d\n", rc);
 		iscsi_conn_failure(conn, rc);
 		return 0;
 	}
@@ -929,8 +932,7 @@ int iscsi_tcp_task_init(struct iscsi_task *task)
 		 * mgmt tasks do not have a scatterlist since they come
 		 * in from the iscsi interface.
 		 */
-		debug_scsi("mtask deq [cid %d itt 0x%x]\n", conn->id,
-			   task->itt);
+		ISCSI_DBG_TCP(conn, "mtask deq [itt 0x%x]\n", task->itt);
 
 		return conn->session->tt->init_pdu(task, 0, task->data_count);
 	}
@@ -939,9 +941,8 @@ int iscsi_tcp_task_init(struct iscsi_task *task)
 	tcp_task->exp_datasn = 0;
 
 	/* Prepare PDU, optionally w/ immediate data */
-	debug_scsi("task deq [cid %d itt 0x%x imm %d unsol %d]\n",
-		    conn->id, task->itt, task->imm_count,
-		    task->unsol_r2t.data_length);
+	ISCSI_DBG_TCP(conn, "task deq [itt 0x%x imm %d unsol %d]\n",
+		      task->itt, task->imm_count, task->unsol_r2t.data_length);
 
 	err = conn->session->tt->init_pdu(task, 0, task->imm_count);
 	if (err)
@@ -965,7 +966,8 @@ static struct iscsi_r2t_info *iscsi_tcp_get_curr_r2t(struct iscsi_task *task)
 			r2t = tcp_task->r2t;
 			/* Continue with this R2T? */
 			if (r2t->data_length <= r2t->sent) {
-				debug_scsi("  done with r2t %p\n", r2t);
+				ISCSI_DBG_TCP(task->conn,
+					      "  done with r2t %p\n", r2t);
 				__kfifo_put(tcp_task->r2tpool.queue,
 					    (void *)&tcp_task->r2t,
 					    sizeof(void *));
@@ -1019,7 +1021,7 @@ flush:
 	r2t = iscsi_tcp_get_curr_r2t(task);
 	if (r2t == NULL) {
 		/* Waiting for more R2Ts to arrive. */
-		debug_tcp("no R2Ts yet\n");
+		ISCSI_DBG_TCP(conn, "no R2Ts yet\n");
 		return 0;
 	}
 
@@ -1028,9 +1030,9 @@ flush:
 		return rc;
 	iscsi_prep_data_out_pdu(task, r2t, (struct iscsi_data *) task->hdr);
 
-	debug_scsi("sol dout %p [dsn %d itt 0x%x doff %d dlen %d]\n",
-		   r2t, r2t->datasn - 1, task->hdr->itt,
-		   r2t->data_offset + r2t->sent, r2t->data_count);
+	ISCSI_DBG_TCP(conn, "sol dout %p [dsn %d itt 0x%x doff %d dlen %d]\n",
+		      r2t, r2t->datasn - 1, task->hdr->itt,
+		      r2t->data_offset + r2t->sent, r2t->data_count);
 
 	rc = conn->session->tt->init_pdu(task, r2t->data_offset + r2t->sent,
 					 r2t->data_count);
diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c
index b615eda361d5..81cdcf46c471 100644
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -1132,7 +1132,7 @@ lpfc_debugfs_dumpDataDif_release(struct inode *inode, struct file *file)
 }
 
 #undef lpfc_debugfs_op_disc_trc
-static struct file_operations lpfc_debugfs_op_disc_trc = {
+static const struct file_operations lpfc_debugfs_op_disc_trc = {
 	.owner =        THIS_MODULE,
 	.open =         lpfc_debugfs_disc_trc_open,
 	.llseek =       lpfc_debugfs_lseek,
@@ -1141,7 +1141,7 @@ static struct file_operations lpfc_debugfs_op_disc_trc = {
 };
 
 #undef lpfc_debugfs_op_nodelist
-static struct file_operations lpfc_debugfs_op_nodelist = {
+static const struct file_operations lpfc_debugfs_op_nodelist = {
 	.owner =        THIS_MODULE,
 	.open =         lpfc_debugfs_nodelist_open,
 	.llseek =       lpfc_debugfs_lseek,
@@ -1150,7 +1150,7 @@ static struct file_operations lpfc_debugfs_op_nodelist = {
 };
 
 #undef lpfc_debugfs_op_hbqinfo
-static struct file_operations lpfc_debugfs_op_hbqinfo = {
+static const struct file_operations lpfc_debugfs_op_hbqinfo = {
 	.owner =        THIS_MODULE,
 	.open =         lpfc_debugfs_hbqinfo_open,
 	.llseek =       lpfc_debugfs_lseek,
@@ -1159,7 +1159,7 @@ static struct file_operations lpfc_debugfs_op_hbqinfo = {
 };
 
 #undef lpfc_debugfs_op_dumpHBASlim
-static struct file_operations lpfc_debugfs_op_dumpHBASlim = {
+static const struct file_operations lpfc_debugfs_op_dumpHBASlim = {
 	.owner =        THIS_MODULE,
 	.open =         lpfc_debugfs_dumpHBASlim_open,
 	.llseek =       lpfc_debugfs_lseek,
@@ -1168,7 +1168,7 @@ static struct file_operations lpfc_debugfs_op_dumpHBASlim = {
 };
 
 #undef lpfc_debugfs_op_dumpHostSlim
-static struct file_operations lpfc_debugfs_op_dumpHostSlim = {
+static const struct file_operations lpfc_debugfs_op_dumpHostSlim = {
 	.owner =        THIS_MODULE,
 	.open =         lpfc_debugfs_dumpHostSlim_open,
 	.llseek =       lpfc_debugfs_lseek,
@@ -1177,7 +1177,7 @@ static struct file_operations lpfc_debugfs_op_dumpHostSlim = {
 };
 
 #undef lpfc_debugfs_op_dumpData
-static struct file_operations lpfc_debugfs_op_dumpData = {
+static const struct file_operations lpfc_debugfs_op_dumpData = {
 	.owner =        THIS_MODULE,
 	.open =         lpfc_debugfs_dumpData_open,
 	.llseek =       lpfc_debugfs_lseek,
@@ -1187,7 +1187,7 @@ static struct file_operations lpfc_debugfs_op_dumpData = {
 };
 
 #undef lpfc_debugfs_op_dumpDif
-static struct file_operations lpfc_debugfs_op_dumpDif = {
+static const struct file_operations lpfc_debugfs_op_dumpDif = {
 	.owner =        THIS_MODULE,
 	.open =         lpfc_debugfs_dumpDif_open,
 	.llseek =       lpfc_debugfs_lseek,
@@ -1197,7 +1197,7 @@ static struct file_operations lpfc_debugfs_op_dumpDif = {
 };
 
 #undef lpfc_debugfs_op_slow_ring_trc
-static struct file_operations lpfc_debugfs_op_slow_ring_trc = {
+static const struct file_operations lpfc_debugfs_op_slow_ring_trc = {
 	.owner =        THIS_MODULE,
 	.open =         lpfc_debugfs_slow_ring_trc_open,
 	.llseek =       lpfc_debugfs_lseek,
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index b103b6ed4970..b1bd3fc7bae8 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -1357,7 +1357,7 @@ lpfc_parse_bg_err(struct lpfc_hba *phba, struct lpfc_scsi_buf *lpfc_cmd,
 
 		scsi_build_sense_buffer(1, cmd->sense_buffer, ILLEGAL_REQUEST,
 				0x10, 0x1);
-		cmd->result = (DRIVER_SENSE|SUGGEST_DIE) << 24
+		cmd->result = DRIVER_SENSE << 24
 			| ScsiResult(DID_ABORT, SAM_STAT_CHECK_CONDITION);
 		phba->bg_guard_err_cnt++;
 		printk(KERN_ERR "BLKGRD: guard_tag error\n");
@@ -1368,7 +1368,7 @@ lpfc_parse_bg_err(struct lpfc_hba *phba, struct lpfc_scsi_buf *lpfc_cmd,
 
 		scsi_build_sense_buffer(1, cmd->sense_buffer, ILLEGAL_REQUEST,
 				0x10, 0x3);
-		cmd->result = (DRIVER_SENSE|SUGGEST_DIE) << 24
+		cmd->result = DRIVER_SENSE << 24
 			| ScsiResult(DID_ABORT, SAM_STAT_CHECK_CONDITION);
 
 		phba->bg_reftag_err_cnt++;
@@ -1380,7 +1380,7 @@ lpfc_parse_bg_err(struct lpfc_hba *phba, struct lpfc_scsi_buf *lpfc_cmd,
 
 		scsi_build_sense_buffer(1, cmd->sense_buffer, ILLEGAL_REQUEST,
 				0x10, 0x2);
-		cmd->result = (DRIVER_SENSE|SUGGEST_DIE) << 24
+		cmd->result = DRIVER_SENSE << 24
 			| ScsiResult(DID_ABORT, SAM_STAT_CHECK_CONDITION);
 
 		phba->bg_apptag_err_cnt++;
diff --git a/drivers/scsi/mpt2sas/Kconfig b/drivers/scsi/mpt2sas/Kconfig
new file mode 100644
index 000000000000..4a86855c23b3
--- /dev/null
+++ b/drivers/scsi/mpt2sas/Kconfig
@@ -0,0 +1,66 @@
+#
+# Kernel configuration file for the MPT2SAS
+#
+# This code is based on drivers/scsi/mpt2sas/Kconfig
+# Copyright (C) 2007-2008  LSI Corporation
+#  (mailto:DL-MPTFusionLinux@lsi.com)
+
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# NO WARRANTY
+# THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR
+# CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT
+# LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
+# MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is
+# solely responsible for determining the appropriateness of using and
+# distributing the Program and assumes all risks associated with its
+# exercise of rights under this Agreement, including but not limited to
+# the risks and costs of program errors, damage to or loss of data,
+# programs or equipment, and unavailability or interruption of operations.
+
+# DISCLAIMER OF LIABILITY
+# NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
+# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+# TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+# USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
+# HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
+# USA.
+
+config SCSI_MPT2SAS
+	tristate "LSI MPT Fusion SAS 2.0 Device Driver"
+	depends on PCI && SCSI
+	select SCSI_SAS_ATTRS
+	---help---
+	This driver supports PCI-Express SAS 6Gb/s Host Adapters.
+
+config SCSI_MPT2SAS_MAX_SGE
+	int "LSI MPT Fusion Max number of SG Entries (16 - 128)"
+	depends on PCI && SCSI && SCSI_MPT2SAS
+	default "128"
+	range 16 128
+	---help---
+	This option allows you to specify the maximum number of scatter-
+	gather entries per I/O. The driver default is 128, which matches
+	SAFE_PHYS_SEGMENTS.  However, it may decreased down to 16.
+	Decreasing this parameter will reduce memory requirements
+	on a per controller instance.
+
+config SCSI_MPT2SAS_LOGGING
+	bool "LSI MPT Fusion logging facility"
+	depends on PCI && SCSI && SCSI_MPT2SAS
+	---help---
+	This turns on a logging facility.
diff --git a/drivers/scsi/mpt2sas/Makefile b/drivers/scsi/mpt2sas/Makefile
new file mode 100644
index 000000000000..728f0475711d
--- /dev/null
+++ b/drivers/scsi/mpt2sas/Makefile
@@ -0,0 +1,7 @@
+# mpt2sas makefile
+obj-$(CONFIG_SCSI_MPT2SAS) += mpt2sas.o
+mpt2sas-y +=  mpt2sas_base.o        \
+		mpt2sas_config.o    \
+		mpt2sas_scsih.o     \
+		mpt2sas_transport.o \
+		mpt2sas_ctl.o
diff --git a/drivers/scsi/mpt2sas/mpi/mpi2.h b/drivers/scsi/mpt2sas/mpi/mpi2.h
new file mode 100644
index 000000000000..7bb2ece8b2e4
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpi/mpi2.h
@@ -0,0 +1,1067 @@
+/*
+ *  Copyright (c) 2000-2009 LSI Corporation.
+ *
+ *
+ *           Name:  mpi2.h
+ *          Title:  MPI Message independent structures and definitions
+ *                  including System Interface Register Set and
+ *                  scatter/gather formats.
+ *  Creation Date:  June 21, 2006
+ *
+ *  mpi2.h Version:  02.00.11
+ *
+ *  Version History
+ *  ---------------
+ *
+ *  Date      Version   Description
+ *  --------  --------  ------------------------------------------------------
+ *  04-30-07  02.00.00  Corresponds to Fusion-MPT MPI Specification Rev A.
+ *  06-04-07  02.00.01  Bumped MPI2_HEADER_VERSION_UNIT.
+ *  06-26-07  02.00.02  Bumped MPI2_HEADER_VERSION_UNIT.
+ *  08-31-07  02.00.03  Bumped MPI2_HEADER_VERSION_UNIT.
+ *                      Moved ReplyPostHostIndex register to offset 0x6C of the
+ *                      MPI2_SYSTEM_INTERFACE_REGS and modified the define for
+ *                      MPI2_REPLY_POST_HOST_INDEX_OFFSET.
+ *                      Added union of request descriptors.
+ *                      Added union of reply descriptors.
+ *  10-31-07  02.00.04  Bumped MPI2_HEADER_VERSION_UNIT.
+ *                      Added define for MPI2_VERSION_02_00.
+ *                      Fixed the size of the FunctionDependent5 field in the
+ *                      MPI2_DEFAULT_REPLY structure.
+ *  12-18-07  02.00.05  Bumped MPI2_HEADER_VERSION_UNIT.
+ *                      Removed the MPI-defined Fault Codes and extended the
+ *                      product specific codes up to 0xEFFF.
+ *                      Added a sixth key value for the WriteSequence register
+ *                      and changed the flush value to 0x0.
+ *                      Added message function codes for Diagnostic Buffer Post
+ *                      and Diagnsotic Release.
+ *                      New IOCStatus define: MPI2_IOCSTATUS_DIAGNOSTIC_RELEASED
+ *                      Moved MPI2_VERSION_UNION from mpi2_ioc.h.
+ *  02-29-08  02.00.06  Bumped MPI2_HEADER_VERSION_UNIT.
+ *  03-03-08  02.00.07  Bumped MPI2_HEADER_VERSION_UNIT.
+ *  05-21-08  02.00.08  Bumped MPI2_HEADER_VERSION_UNIT.
+ *                      Added #defines for marking a reply descriptor as unused.
+ *  06-27-08  02.00.09  Bumped MPI2_HEADER_VERSION_UNIT.
+ *  10-02-08  02.00.10  Bumped MPI2_HEADER_VERSION_UNIT.
+ *                      Moved LUN field defines from mpi2_init.h.
+ *  01-19-09  02.00.11  Bumped MPI2_HEADER_VERSION_UNIT.
+ *  --------------------------------------------------------------------------
+ */
+
+#ifndef MPI2_H
+#define MPI2_H
+
+
+/*****************************************************************************
+*
+*        MPI Version Definitions
+*
+*****************************************************************************/
+
+#define MPI2_VERSION_MAJOR                  (0x02)
+#define MPI2_VERSION_MINOR                  (0x00)
+#define MPI2_VERSION_MAJOR_MASK             (0xFF00)
+#define MPI2_VERSION_MAJOR_SHIFT            (8)
+#define MPI2_VERSION_MINOR_MASK             (0x00FF)
+#define MPI2_VERSION_MINOR_SHIFT            (0)
+#define MPI2_VERSION ((MPI2_VERSION_MAJOR << MPI2_VERSION_MAJOR_SHIFT) |   \
+                                      MPI2_VERSION_MINOR)
+
+#define MPI2_VERSION_02_00                  (0x0200)
+
+/* versioning for this MPI header set */
+#define MPI2_HEADER_VERSION_UNIT            (0x0B)
+#define MPI2_HEADER_VERSION_DEV             (0x00)
+#define MPI2_HEADER_VERSION_UNIT_MASK       (0xFF00)
+#define MPI2_HEADER_VERSION_UNIT_SHIFT      (8)
+#define MPI2_HEADER_VERSION_DEV_MASK        (0x00FF)
+#define MPI2_HEADER_VERSION_DEV_SHIFT       (0)
+#define MPI2_HEADER_VERSION ((MPI2_HEADER_VERSION_UNIT << 8) | MPI2_HEADER_VERSION_DEV)
+
+
+/*****************************************************************************
+*
+*        IOC State Definitions
+*
+*****************************************************************************/
+
+#define MPI2_IOC_STATE_RESET               (0x00000000)
+#define MPI2_IOC_STATE_READY               (0x10000000)
+#define MPI2_IOC_STATE_OPERATIONAL         (0x20000000)
+#define MPI2_IOC_STATE_FAULT               (0x40000000)
+
+#define MPI2_IOC_STATE_MASK                (0xF0000000)
+#define MPI2_IOC_STATE_SHIFT               (28)
+
+/* Fault state range for prodcut specific codes */
+#define MPI2_FAULT_PRODUCT_SPECIFIC_MIN                 (0x0000)
+#define MPI2_FAULT_PRODUCT_SPECIFIC_MAX                 (0xEFFF)
+
+
+/*****************************************************************************
+*
+*        System Interface Register Definitions
+*
+*****************************************************************************/
+
+typedef volatile struct _MPI2_SYSTEM_INTERFACE_REGS
+{
+    U32         Doorbell;                   /* 0x00 */
+    U32         WriteSequence;              /* 0x04 */
+    U32         HostDiagnostic;             /* 0x08 */
+    U32         Reserved1;                  /* 0x0C */
+    U32         DiagRWData;                 /* 0x10 */
+    U32         DiagRWAddressLow;           /* 0x14 */
+    U32         DiagRWAddressHigh;          /* 0x18 */
+    U32         Reserved2[5];               /* 0x1C */
+    U32         HostInterruptStatus;        /* 0x30 */
+    U32         HostInterruptMask;          /* 0x34 */
+    U32         DCRData;                    /* 0x38 */
+    U32         DCRAddress;                 /* 0x3C */
+    U32         Reserved3[2];               /* 0x40 */
+    U32         ReplyFreeHostIndex;         /* 0x48 */
+    U32         Reserved4[8];               /* 0x4C */
+    U32         ReplyPostHostIndex;         /* 0x6C */
+    U32         Reserved5;                  /* 0x70 */
+    U32         HCBSize;                    /* 0x74 */
+    U32         HCBAddressLow;              /* 0x78 */
+    U32         HCBAddressHigh;             /* 0x7C */
+    U32         Reserved6[16];              /* 0x80 */
+    U32         RequestDescriptorPostLow;   /* 0xC0 */
+    U32         RequestDescriptorPostHigh;  /* 0xC4 */
+    U32         Reserved7[14];              /* 0xC8 */
+} MPI2_SYSTEM_INTERFACE_REGS, MPI2_POINTER PTR_MPI2_SYSTEM_INTERFACE_REGS,
+  Mpi2SystemInterfaceRegs_t, MPI2_POINTER pMpi2SystemInterfaceRegs_t;
+
+/*
+ * Defines for working with the Doorbell register.
+ */
+#define MPI2_DOORBELL_OFFSET                    (0x00000000)
+
+/* IOC --> System values */
+#define MPI2_DOORBELL_USED                      (0x08000000)
+#define MPI2_DOORBELL_WHO_INIT_MASK             (0x07000000)
+#define MPI2_DOORBELL_WHO_INIT_SHIFT            (24)
+#define MPI2_DOORBELL_FAULT_CODE_MASK           (0x0000FFFF)
+#define MPI2_DOORBELL_DATA_MASK                 (0x0000FFFF)
+
+/* System --> IOC values */
+#define MPI2_DOORBELL_FUNCTION_MASK             (0xFF000000)
+#define MPI2_DOORBELL_FUNCTION_SHIFT            (24)
+#define MPI2_DOORBELL_ADD_DWORDS_MASK           (0x00FF0000)
+#define MPI2_DOORBELL_ADD_DWORDS_SHIFT          (16)
+
+
+/*
+ * Defines for the WriteSequence register
+ */
+#define MPI2_WRITE_SEQUENCE_OFFSET              (0x00000004)
+#define MPI2_WRSEQ_KEY_VALUE_MASK               (0x0000000F)
+#define MPI2_WRSEQ_FLUSH_KEY_VALUE              (0x0)
+#define MPI2_WRSEQ_1ST_KEY_VALUE                (0xF)
+#define MPI2_WRSEQ_2ND_KEY_VALUE                (0x4)
+#define MPI2_WRSEQ_3RD_KEY_VALUE                (0xB)
+#define MPI2_WRSEQ_4TH_KEY_VALUE                (0x2)
+#define MPI2_WRSEQ_5TH_KEY_VALUE                (0x7)
+#define MPI2_WRSEQ_6TH_KEY_VALUE                (0xD)
+
+/*
+ * Defines for the HostDiagnostic register
+ */
+#define MPI2_HOST_DIAGNOSTIC_OFFSET             (0x00000008)
+
+#define MPI2_DIAG_BOOT_DEVICE_SELECT_MASK       (0x00001800)
+#define MPI2_DIAG_BOOT_DEVICE_SELECT_DEFAULT    (0x00000000)
+#define MPI2_DIAG_BOOT_DEVICE_SELECT_HCDW       (0x00000800)
+
+#define MPI2_DIAG_CLEAR_FLASH_BAD_SIG           (0x00000400)
+#define MPI2_DIAG_FORCE_HCB_ON_RESET            (0x00000200)
+#define MPI2_DIAG_HCB_MODE                      (0x00000100)
+#define MPI2_DIAG_DIAG_WRITE_ENABLE             (0x00000080)
+#define MPI2_DIAG_FLASH_BAD_SIG                 (0x00000040)
+#define MPI2_DIAG_RESET_HISTORY                 (0x00000020)
+#define MPI2_DIAG_DIAG_RW_ENABLE                (0x00000010)
+#define MPI2_DIAG_RESET_ADAPTER                 (0x00000004)
+#define MPI2_DIAG_HOLD_IOC_RESET                (0x00000002)
+
+/*
+ * Offsets for DiagRWData and address
+ */
+#define MPI2_DIAG_RW_DATA_OFFSET                (0x00000010)
+#define MPI2_DIAG_RW_ADDRESS_LOW_OFFSET         (0x00000014)
+#define MPI2_DIAG_RW_ADDRESS_HIGH_OFFSET        (0x00000018)
+
+/*
+ * Defines for the HostInterruptStatus register
+ */
+#define MPI2_HOST_INTERRUPT_STATUS_OFFSET       (0x00000030)
+#define MPI2_HIS_SYS2IOC_DB_STATUS              (0x80000000)
+#define MPI2_HIS_IOP_DOORBELL_STATUS            MPI2_HIS_SYS2IOC_DB_STATUS
+#define MPI2_HIS_RESET_IRQ_STATUS               (0x40000000)
+#define MPI2_HIS_REPLY_DESCRIPTOR_INTERRUPT     (0x00000008)
+#define MPI2_HIS_IOC2SYS_DB_STATUS              (0x00000001)
+#define MPI2_HIS_DOORBELL_INTERRUPT             MPI2_HIS_IOC2SYS_DB_STATUS
+
+/*
+ * Defines for the HostInterruptMask register
+ */
+#define MPI2_HOST_INTERRUPT_MASK_OFFSET         (0x00000034)
+#define MPI2_HIM_RESET_IRQ_MASK                 (0x40000000)
+#define MPI2_HIM_REPLY_INT_MASK                 (0x00000008)
+#define MPI2_HIM_RIM                            MPI2_HIM_REPLY_INT_MASK
+#define MPI2_HIM_IOC2SYS_DB_MASK                (0x00000001)
+#define MPI2_HIM_DIM                            MPI2_HIM_IOC2SYS_DB_MASK
+
+/*
+ * Offsets for DCRData and address
+ */
+#define MPI2_DCR_DATA_OFFSET                    (0x00000038)
+#define MPI2_DCR_ADDRESS_OFFSET                 (0x0000003C)
+
+/*
+ * Offset for the Reply Free Queue
+ */
+#define MPI2_REPLY_FREE_HOST_INDEX_OFFSET       (0x00000048)
+
+/*
+ * Offset for the Reply Descriptor Post Queue
+ */
+#define MPI2_REPLY_POST_HOST_INDEX_OFFSET       (0x0000006C)
+
+/*
+ * Defines for the HCBSize and address
+ */
+#define MPI2_HCB_SIZE_OFFSET                    (0x00000074)
+#define MPI2_HCB_SIZE_SIZE_MASK                 (0xFFFFF000)
+#define MPI2_HCB_SIZE_HCB_ENABLE                (0x00000001)
+
+#define MPI2_HCB_ADDRESS_LOW_OFFSET             (0x00000078)
+#define MPI2_HCB_ADDRESS_HIGH_OFFSET            (0x0000007C)
+
+/*
+ * Offsets for the Request Queue
+ */
+#define MPI2_REQUEST_DESCRIPTOR_POST_LOW_OFFSET     (0x000000C0)
+#define MPI2_REQUEST_DESCRIPTOR_POST_HIGH_OFFSET    (0x000000C4)
+
+
+/*****************************************************************************
+*
+*        Message Descriptors
+*
+*****************************************************************************/
+
+/* Request Descriptors */
+
+/* Default Request Descriptor */
+typedef struct _MPI2_DEFAULT_REQUEST_DESCRIPTOR
+{
+    U8              RequestFlags;               /* 0x00 */
+    U8              VF_ID;                      /* 0x01 */
+    U16             SMID;                       /* 0x02 */
+    U16             LMID;                       /* 0x04 */
+    U16             DescriptorTypeDependent;    /* 0x06 */
+} MPI2_DEFAULT_REQUEST_DESCRIPTOR,
+  MPI2_POINTER PTR_MPI2_DEFAULT_REQUEST_DESCRIPTOR,
+  Mpi2DefaultRequestDescriptor_t, MPI2_POINTER pMpi2DefaultRequestDescriptor_t;
+
+/* defines for the RequestFlags field */
+#define MPI2_REQ_DESCRIPT_FLAGS_TYPE_MASK               (0x0E)
+#define MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO                 (0x00)
+#define MPI2_REQ_DESCRIPT_FLAGS_SCSI_TARGET             (0x02)
+#define MPI2_REQ_DESCRIPT_FLAGS_HIGH_PRIORITY           (0x06)
+#define MPI2_REQ_DESCRIPT_FLAGS_DEFAULT_TYPE            (0x08)
+
+#define MPI2_REQ_DESCRIPT_FLAGS_IOC_FIFO_MARKER (0x01)
+
+
+/* High Priority Request Descriptor */
+typedef struct _MPI2_HIGH_PRIORITY_REQUEST_DESCRIPTOR
+{
+    U8              RequestFlags;               /* 0x00 */
+    U8              VF_ID;                      /* 0x01 */
+    U16             SMID;                       /* 0x02 */
+    U16             LMID;                       /* 0x04 */
+    U16             Reserved1;                  /* 0x06 */
+} MPI2_HIGH_PRIORITY_REQUEST_DESCRIPTOR,
+  MPI2_POINTER PTR_MPI2_HIGH_PRIORITY_REQUEST_DESCRIPTOR,
+  Mpi2HighPriorityRequestDescriptor_t,
+  MPI2_POINTER pMpi2HighPriorityRequestDescriptor_t;
+
+
+/* SCSI IO Request Descriptor */
+typedef struct _MPI2_SCSI_IO_REQUEST_DESCRIPTOR
+{
+    U8              RequestFlags;               /* 0x00 */
+    U8              VF_ID;                      /* 0x01 */
+    U16             SMID;                       /* 0x02 */
+    U16             LMID;                       /* 0x04 */
+    U16             DevHandle;                  /* 0x06 */
+} MPI2_SCSI_IO_REQUEST_DESCRIPTOR,
+  MPI2_POINTER PTR_MPI2_SCSI_IO_REQUEST_DESCRIPTOR,
+  Mpi2SCSIIORequestDescriptor_t, MPI2_POINTER pMpi2SCSIIORequestDescriptor_t;
+
+
+/* SCSI Target Request Descriptor */
+typedef struct _MPI2_SCSI_TARGET_REQUEST_DESCRIPTOR
+{
+    U8              RequestFlags;               /* 0x00 */
+    U8              VF_ID;                      /* 0x01 */
+    U16             SMID;                       /* 0x02 */
+    U16             LMID;                       /* 0x04 */
+    U16             IoIndex;                    /* 0x06 */
+} MPI2_SCSI_TARGET_REQUEST_DESCRIPTOR,
+  MPI2_POINTER PTR_MPI2_SCSI_TARGET_REQUEST_DESCRIPTOR,
+  Mpi2SCSITargetRequestDescriptor_t,
+  MPI2_POINTER pMpi2SCSITargetRequestDescriptor_t;
+
+/* union of Request Descriptors */
+typedef union _MPI2_REQUEST_DESCRIPTOR_UNION
+{
+    MPI2_DEFAULT_REQUEST_DESCRIPTOR         Default;
+    MPI2_HIGH_PRIORITY_REQUEST_DESCRIPTOR   HighPriority;
+    MPI2_SCSI_IO_REQUEST_DESCRIPTOR         SCSIIO;
+    MPI2_SCSI_TARGET_REQUEST_DESCRIPTOR     SCSITarget;
+    U64                                     Words;
+} MPI2_REQUEST_DESCRIPTOR_UNION, MPI2_POINTER PTR_MPI2_REQUEST_DESCRIPTOR_UNION,
+  Mpi2RequestDescriptorUnion_t, MPI2_POINTER pMpi2RequestDescriptorUnion_t;
+
+
+/* Reply Descriptors */
+
+/* Default Reply Descriptor */
+typedef struct _MPI2_DEFAULT_REPLY_DESCRIPTOR
+{
+    U8              ReplyFlags;                 /* 0x00 */
+    U8              VF_ID;                      /* 0x01 */
+    U16             DescriptorTypeDependent1;   /* 0x02 */
+    U32             DescriptorTypeDependent2;   /* 0x04 */
+} MPI2_DEFAULT_REPLY_DESCRIPTOR, MPI2_POINTER PTR_MPI2_DEFAULT_REPLY_DESCRIPTOR,
+  Mpi2DefaultReplyDescriptor_t, MPI2_POINTER pMpi2DefaultReplyDescriptor_t;
+
+/* defines for the ReplyFlags field */
+#define MPI2_RPY_DESCRIPT_FLAGS_TYPE_MASK               (0x0F)
+#define MPI2_RPY_DESCRIPT_FLAGS_SCSI_IO_SUCCESS         (0x00)
+#define MPI2_RPY_DESCRIPT_FLAGS_ADDRESS_REPLY           (0x01)
+#define MPI2_RPY_DESCRIPT_FLAGS_TARGETASSIST_SUCCESS    (0x02)
+#define MPI2_RPY_DESCRIPT_FLAGS_TARGET_COMMAND_BUFFER   (0x03)
+#define MPI2_RPY_DESCRIPT_FLAGS_UNUSED                  (0x0F)
+
+/* values for marking a reply descriptor as unused */
+#define MPI2_RPY_DESCRIPT_UNUSED_WORD0_MARK             (0xFFFFFFFF)
+#define MPI2_RPY_DESCRIPT_UNUSED_WORD1_MARK             (0xFFFFFFFF)
+
+/* Address Reply Descriptor */
+typedef struct _MPI2_ADDRESS_REPLY_DESCRIPTOR
+{
+    U8              ReplyFlags;                 /* 0x00 */
+    U8              VF_ID;                      /* 0x01 */
+    U16             SMID;                       /* 0x02 */
+    U32             ReplyFrameAddress;          /* 0x04 */
+} MPI2_ADDRESS_REPLY_DESCRIPTOR, MPI2_POINTER PTR_MPI2_ADDRESS_REPLY_DESCRIPTOR,
+  Mpi2AddressReplyDescriptor_t, MPI2_POINTER pMpi2AddressReplyDescriptor_t;
+
+#define MPI2_ADDRESS_REPLY_SMID_INVALID                 (0x00)
+
+
+/* SCSI IO Success Reply Descriptor */
+typedef struct _MPI2_SCSI_IO_SUCCESS_REPLY_DESCRIPTOR
+{
+    U8              ReplyFlags;                 /* 0x00 */
+    U8              VF_ID;                      /* 0x01 */
+    U16             SMID;                       /* 0x02 */
+    U16             TaskTag;                    /* 0x04 */
+    U16             DevHandle;                  /* 0x06 */
+} MPI2_SCSI_IO_SUCCESS_REPLY_DESCRIPTOR,
+  MPI2_POINTER PTR_MPI2_SCSI_IO_SUCCESS_REPLY_DESCRIPTOR,
+  Mpi2SCSIIOSuccessReplyDescriptor_t,
+  MPI2_POINTER pMpi2SCSIIOSuccessReplyDescriptor_t;
+
+
+/* TargetAssist Success Reply Descriptor */
+typedef struct _MPI2_TARGETASSIST_SUCCESS_REPLY_DESCRIPTOR
+{
+    U8              ReplyFlags;                 /* 0x00 */
+    U8              VF_ID;                      /* 0x01 */
+    U16             SMID;                       /* 0x02 */
+    U8              SequenceNumber;             /* 0x04 */
+    U8              Reserved1;                  /* 0x05 */
+    U16             IoIndex;                    /* 0x06 */
+} MPI2_TARGETASSIST_SUCCESS_REPLY_DESCRIPTOR,
+  MPI2_POINTER PTR_MPI2_TARGETASSIST_SUCCESS_REPLY_DESCRIPTOR,
+  Mpi2TargetAssistSuccessReplyDescriptor_t,
+  MPI2_POINTER pMpi2TargetAssistSuccessReplyDescriptor_t;
+
+
+/* Target Command Buffer Reply Descriptor */
+typedef struct _MPI2_TARGET_COMMAND_BUFFER_REPLY_DESCRIPTOR
+{
+    U8              ReplyFlags;                 /* 0x00 */
+    U8              VF_ID;                      /* 0x01 */
+    U8              VP_ID;                      /* 0x02 */
+    U8              Flags;                      /* 0x03 */
+    U16             InitiatorDevHandle;         /* 0x04 */
+    U16             IoIndex;                    /* 0x06 */
+} MPI2_TARGET_COMMAND_BUFFER_REPLY_DESCRIPTOR,
+  MPI2_POINTER PTR_MPI2_TARGET_COMMAND_BUFFER_REPLY_DESCRIPTOR,
+  Mpi2TargetCommandBufferReplyDescriptor_t,
+  MPI2_POINTER pMpi2TargetCommandBufferReplyDescriptor_t;
+
+/* defines for Flags field */
+#define MPI2_RPY_DESCRIPT_TCB_FLAGS_PHYNUM_MASK     (0x3F)
+
+
+/* union of Reply Descriptors */
+typedef union _MPI2_REPLY_DESCRIPTORS_UNION
+{
+    MPI2_DEFAULT_REPLY_DESCRIPTOR               Default;
+    MPI2_ADDRESS_REPLY_DESCRIPTOR               AddressReply;
+    MPI2_SCSI_IO_SUCCESS_REPLY_DESCRIPTOR       SCSIIOSuccess;
+    MPI2_TARGETASSIST_SUCCESS_REPLY_DESCRIPTOR  TargetAssistSuccess;
+    MPI2_TARGET_COMMAND_BUFFER_REPLY_DESCRIPTOR TargetCommandBuffer;
+    U64                                         Words;
+} MPI2_REPLY_DESCRIPTORS_UNION, MPI2_POINTER PTR_MPI2_REPLY_DESCRIPTORS_UNION,
+  Mpi2ReplyDescriptorsUnion_t, MPI2_POINTER pMpi2ReplyDescriptorsUnion_t;
+
+
+
+/*****************************************************************************
+*
+*        Message Functions
+*              0x80 -> 0x8F reserved for private message use per product
+*
+*
+*****************************************************************************/
+
+#define MPI2_FUNCTION_SCSI_IO_REQUEST               (0x00) /* SCSI IO */
+#define MPI2_FUNCTION_SCSI_TASK_MGMT                (0x01) /* SCSI Task Management */
+#define MPI2_FUNCTION_IOC_INIT                      (0x02) /* IOC Init */
+#define MPI2_FUNCTION_IOC_FACTS                     (0x03) /* IOC Facts */
+#define MPI2_FUNCTION_CONFIG                        (0x04) /* Configuration */
+#define MPI2_FUNCTION_PORT_FACTS                    (0x05) /* Port Facts */
+#define MPI2_FUNCTION_PORT_ENABLE                   (0x06) /* Port Enable */
+#define MPI2_FUNCTION_EVENT_NOTIFICATION            (0x07) /* Event Notification */
+#define MPI2_FUNCTION_EVENT_ACK                     (0x08) /* Event Acknowledge */
+#define MPI2_FUNCTION_FW_DOWNLOAD                   (0x09) /* FW Download */
+#define MPI2_FUNCTION_TARGET_ASSIST                 (0x0B) /* Target Assist */
+#define MPI2_FUNCTION_TARGET_STATUS_SEND            (0x0C) /* Target Status Send */
+#define MPI2_FUNCTION_TARGET_MODE_ABORT             (0x0D) /* Target Mode Abort */
+#define MPI2_FUNCTION_FW_UPLOAD                     (0x12) /* FW Upload */
+#define MPI2_FUNCTION_RAID_ACTION                   (0x15) /* RAID Action */
+#define MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH      (0x16) /* SCSI IO RAID Passthrough */
+#define MPI2_FUNCTION_TOOLBOX                       (0x17) /* Toolbox */
+#define MPI2_FUNCTION_SCSI_ENCLOSURE_PROCESSOR      (0x18) /* SCSI Enclosure Processor */
+#define MPI2_FUNCTION_SMP_PASSTHROUGH               (0x1A) /* SMP Passthrough */
+#define MPI2_FUNCTION_SAS_IO_UNIT_CONTROL           (0x1B) /* SAS IO Unit Control */
+#define MPI2_FUNCTION_SATA_PASSTHROUGH              (0x1C) /* SATA Passthrough */
+#define MPI2_FUNCTION_DIAG_BUFFER_POST              (0x1D) /* Diagnostic Buffer Post */
+#define MPI2_FUNCTION_DIAG_RELEASE                  (0x1E) /* Diagnostic Release */
+#define MPI2_FUNCTION_TARGET_CMD_BUF_BASE_POST      (0x24) /* Target Command Buffer Post Base */
+#define MPI2_FUNCTION_TARGET_CMD_BUF_LIST_POST      (0x25) /* Target Command Buffer Post List */
+
+
+
+/* Doorbell functions */
+#define MPI2_FUNCTION_IOC_MESSAGE_UNIT_RESET        (0x40)
+/* #define MPI2_FUNCTION_IO_UNIT_RESET                 (0x41) */
+#define MPI2_FUNCTION_HANDSHAKE                     (0x42)
+
+
+/*****************************************************************************
+*
+*        IOC Status Values
+*
+*****************************************************************************/
+
+/* mask for IOCStatus status value */
+#define MPI2_IOCSTATUS_MASK                     (0x7FFF)
+
+/****************************************************************************
+*  Common IOCStatus values for all replies
+****************************************************************************/
+
+#define MPI2_IOCSTATUS_SUCCESS                      (0x0000)
+#define MPI2_IOCSTATUS_INVALID_FUNCTION             (0x0001)
+#define MPI2_IOCSTATUS_BUSY                         (0x0002)
+#define MPI2_IOCSTATUS_INVALID_SGL                  (0x0003)
+#define MPI2_IOCSTATUS_INTERNAL_ERROR               (0x0004)
+#define MPI2_IOCSTATUS_INVALID_VPID                 (0x0005)
+#define MPI2_IOCSTATUS_INSUFFICIENT_RESOURCES       (0x0006)
+#define MPI2_IOCSTATUS_INVALID_FIELD                (0x0007)
+#define MPI2_IOCSTATUS_INVALID_STATE                (0x0008)
+#define MPI2_IOCSTATUS_OP_STATE_NOT_SUPPORTED       (0x0009)
+
+/****************************************************************************
+*  Config IOCStatus values
+****************************************************************************/
+
+#define MPI2_IOCSTATUS_CONFIG_INVALID_ACTION        (0x0020)
+#define MPI2_IOCSTATUS_CONFIG_INVALID_TYPE          (0x0021)
+#define MPI2_IOCSTATUS_CONFIG_INVALID_PAGE          (0x0022)
+#define MPI2_IOCSTATUS_CONFIG_INVALID_DATA          (0x0023)
+#define MPI2_IOCSTATUS_CONFIG_NO_DEFAULTS           (0x0024)
+#define MPI2_IOCSTATUS_CONFIG_CANT_COMMIT           (0x0025)
+
+/****************************************************************************
+*  SCSI IO Reply
+****************************************************************************/
+
+#define MPI2_IOCSTATUS_SCSI_RECOVERED_ERROR         (0x0040)
+#define MPI2_IOCSTATUS_SCSI_INVALID_DEVHANDLE       (0x0042)
+#define MPI2_IOCSTATUS_SCSI_DEVICE_NOT_THERE        (0x0043)
+#define MPI2_IOCSTATUS_SCSI_DATA_OVERRUN            (0x0044)
+#define MPI2_IOCSTATUS_SCSI_DATA_UNDERRUN           (0x0045)
+#define MPI2_IOCSTATUS_SCSI_IO_DATA_ERROR           (0x0046)
+#define MPI2_IOCSTATUS_SCSI_PROTOCOL_ERROR          (0x0047)
+#define MPI2_IOCSTATUS_SCSI_TASK_TERMINATED         (0x0048)
+#define MPI2_IOCSTATUS_SCSI_RESIDUAL_MISMATCH       (0x0049)
+#define MPI2_IOCSTATUS_SCSI_TASK_MGMT_FAILED        (0x004A)
+#define MPI2_IOCSTATUS_SCSI_IOC_TERMINATED          (0x004B)
+#define MPI2_IOCSTATUS_SCSI_EXT_TERMINATED          (0x004C)
+
+/****************************************************************************
+*  For use by SCSI Initiator and SCSI Target end-to-end data protection
+****************************************************************************/
+
+#define MPI2_IOCSTATUS_EEDP_GUARD_ERROR             (0x004D)
+#define MPI2_IOCSTATUS_EEDP_REF_TAG_ERROR           (0x004E)
+#define MPI2_IOCSTATUS_EEDP_APP_TAG_ERROR           (0x004F)
+
+/****************************************************************************
+*  SCSI Target values
+****************************************************************************/
+
+#define MPI2_IOCSTATUS_TARGET_INVALID_IO_INDEX      (0x0062)
+#define MPI2_IOCSTATUS_TARGET_ABORTED               (0x0063)
+#define MPI2_IOCSTATUS_TARGET_NO_CONN_RETRYABLE     (0x0064)
+#define MPI2_IOCSTATUS_TARGET_NO_CONNECTION         (0x0065)
+#define MPI2_IOCSTATUS_TARGET_XFER_COUNT_MISMATCH   (0x006A)
+#define MPI2_IOCSTATUS_TARGET_DATA_OFFSET_ERROR     (0x006D)
+#define MPI2_IOCSTATUS_TARGET_TOO_MUCH_WRITE_DATA   (0x006E)
+#define MPI2_IOCSTATUS_TARGET_IU_TOO_SHORT          (0x006F)
+#define MPI2_IOCSTATUS_TARGET_ACK_NAK_TIMEOUT       (0x0070)
+#define MPI2_IOCSTATUS_TARGET_NAK_RECEIVED          (0x0071)
+
+/****************************************************************************
+*  Serial Attached SCSI values
+****************************************************************************/
+
+#define MPI2_IOCSTATUS_SAS_SMP_REQUEST_FAILED       (0x0090)
+#define MPI2_IOCSTATUS_SAS_SMP_DATA_OVERRUN         (0x0091)
+
+/****************************************************************************
+*  Diagnostic Buffer Post / Diagnostic Release values
+****************************************************************************/
+
+#define MPI2_IOCSTATUS_DIAGNOSTIC_RELEASED          (0x00A0)
+
+
+/****************************************************************************
+*  IOCStatus flag to indicate that log info is available
+****************************************************************************/
+
+#define MPI2_IOCSTATUS_FLAG_LOG_INFO_AVAILABLE  (0x8000)
+
+/****************************************************************************
+*  IOCLogInfo Types
+****************************************************************************/
+
+#define MPI2_IOCLOGINFO_TYPE_MASK               (0xF0000000)
+#define MPI2_IOCLOGINFO_TYPE_SHIFT              (28)
+#define MPI2_IOCLOGINFO_TYPE_NONE               (0x0)
+#define MPI2_IOCLOGINFO_TYPE_SCSI               (0x1)
+#define MPI2_IOCLOGINFO_TYPE_FC                 (0x2)
+#define MPI2_IOCLOGINFO_TYPE_SAS                (0x3)
+#define MPI2_IOCLOGINFO_TYPE_ISCSI              (0x4)
+#define MPI2_IOCLOGINFO_LOG_DATA_MASK           (0x0FFFFFFF)
+
+
+/*****************************************************************************
+*
+*        Standard Message Structures
+*
+*****************************************************************************/
+
+/****************************************************************************
+* Request Message Header for all request messages
+****************************************************************************/
+
+typedef struct _MPI2_REQUEST_HEADER
+{
+    U16             FunctionDependent1;         /* 0x00 */
+    U8              ChainOffset;                /* 0x02 */
+    U8              Function;                   /* 0x03 */
+    U16             FunctionDependent2;         /* 0x04 */
+    U8              FunctionDependent3;         /* 0x06 */
+    U8              MsgFlags;                   /* 0x07 */
+    U8              VP_ID;                      /* 0x08 */
+    U8              VF_ID;                      /* 0x09 */
+    U16             Reserved1;                  /* 0x0A */
+} MPI2_REQUEST_HEADER, MPI2_POINTER PTR_MPI2_REQUEST_HEADER,
+  MPI2RequestHeader_t, MPI2_POINTER pMPI2RequestHeader_t;
+
+
+/****************************************************************************
+*  Default Reply
+****************************************************************************/
+
+typedef struct _MPI2_DEFAULT_REPLY
+{
+    U16             FunctionDependent1;         /* 0x00 */
+    U8              MsgLength;                  /* 0x02 */
+    U8              Function;                   /* 0x03 */
+    U16             FunctionDependent2;         /* 0x04 */
+    U8              FunctionDependent3;         /* 0x06 */
+    U8              MsgFlags;                   /* 0x07 */
+    U8              VP_ID;                      /* 0x08 */
+    U8              VF_ID;                      /* 0x09 */
+    U16             Reserved1;                  /* 0x0A */
+    U16             FunctionDependent5;         /* 0x0C */
+    U16             IOCStatus;                  /* 0x0E */
+    U32             IOCLogInfo;                 /* 0x10 */
+} MPI2_DEFAULT_REPLY, MPI2_POINTER PTR_MPI2_DEFAULT_REPLY,
+  MPI2DefaultReply_t, MPI2_POINTER pMPI2DefaultReply_t;
+
+
+/* common version structure/union used in messages and configuration pages */
+
+typedef struct _MPI2_VERSION_STRUCT
+{
+    U8                      Dev;                        /* 0x00 */
+    U8                      Unit;                       /* 0x01 */
+    U8                      Minor;                      /* 0x02 */
+    U8                      Major;                      /* 0x03 */
+} MPI2_VERSION_STRUCT;
+
+typedef union _MPI2_VERSION_UNION
+{
+    MPI2_VERSION_STRUCT     Struct;
+    U32                     Word;
+} MPI2_VERSION_UNION;
+
+
+/* LUN field defines, common to many structures */
+#define MPI2_LUN_FIRST_LEVEL_ADDRESSING             (0x0000FFFF)
+#define MPI2_LUN_SECOND_LEVEL_ADDRESSING            (0xFFFF0000)
+#define MPI2_LUN_THIRD_LEVEL_ADDRESSING             (0x0000FFFF)
+#define MPI2_LUN_FOURTH_LEVEL_ADDRESSING            (0xFFFF0000)
+#define MPI2_LUN_LEVEL_1_WORD                       (0xFF00)
+#define MPI2_LUN_LEVEL_1_DWORD                      (0x0000FF00)
+
+
+/*****************************************************************************
+*
+*        Fusion-MPT MPI Scatter Gather Elements
+*
+*****************************************************************************/
+
+/****************************************************************************
+*  MPI Simple Element structures
+****************************************************************************/
+
+typedef struct _MPI2_SGE_SIMPLE32
+{
+    U32                     FlagsLength;
+    U32                     Address;
+} MPI2_SGE_SIMPLE32, MPI2_POINTER PTR_MPI2_SGE_SIMPLE32,
+  Mpi2SGESimple32_t, MPI2_POINTER pMpi2SGESimple32_t;
+
+typedef struct _MPI2_SGE_SIMPLE64
+{
+    U32                     FlagsLength;
+    U64                     Address;
+} MPI2_SGE_SIMPLE64, MPI2_POINTER PTR_MPI2_SGE_SIMPLE64,
+  Mpi2SGESimple64_t, MPI2_POINTER pMpi2SGESimple64_t;
+
+typedef struct _MPI2_SGE_SIMPLE_UNION
+{
+    U32                     FlagsLength;
+    union
+    {
+        U32                 Address32;
+        U64                 Address64;
+    } u;
+} MPI2_SGE_SIMPLE_UNION, MPI2_POINTER PTR_MPI2_SGE_SIMPLE_UNION,
+  Mpi2SGESimpleUnion_t, MPI2_POINTER pMpi2SGESimpleUnion_t;
+
+
+/****************************************************************************
+*  MPI Chain Element structures
+****************************************************************************/
+
+typedef struct _MPI2_SGE_CHAIN32
+{
+    U16                     Length;
+    U8                      NextChainOffset;
+    U8                      Flags;
+    U32                     Address;
+} MPI2_SGE_CHAIN32, MPI2_POINTER PTR_MPI2_SGE_CHAIN32,
+  Mpi2SGEChain32_t, MPI2_POINTER pMpi2SGEChain32_t;
+
+typedef struct _MPI2_SGE_CHAIN64
+{
+    U16                     Length;
+    U8                      NextChainOffset;
+    U8                      Flags;
+    U64                     Address;
+} MPI2_SGE_CHAIN64, MPI2_POINTER PTR_MPI2_SGE_CHAIN64,
+  Mpi2SGEChain64_t, MPI2_POINTER pMpi2SGEChain64_t;
+
+typedef struct _MPI2_SGE_CHAIN_UNION
+{
+    U16                     Length;
+    U8                      NextChainOffset;
+    U8                      Flags;
+    union
+    {
+        U32                 Address32;
+        U64                 Address64;
+    } u;
+} MPI2_SGE_CHAIN_UNION, MPI2_POINTER PTR_MPI2_SGE_CHAIN_UNION,
+  Mpi2SGEChainUnion_t, MPI2_POINTER pMpi2SGEChainUnion_t;
+
+
+/****************************************************************************
+*  MPI Transaction Context Element structures
+****************************************************************************/
+
+typedef struct _MPI2_SGE_TRANSACTION32
+{
+    U8                      Reserved;
+    U8                      ContextSize;
+    U8                      DetailsLength;
+    U8                      Flags;
+    U32                     TransactionContext[1];
+    U32                     TransactionDetails[1];
+} MPI2_SGE_TRANSACTION32, MPI2_POINTER PTR_MPI2_SGE_TRANSACTION32,
+  Mpi2SGETransaction32_t, MPI2_POINTER pMpi2SGETransaction32_t;
+
+typedef struct _MPI2_SGE_TRANSACTION64
+{
+    U8                      Reserved;
+    U8                      ContextSize;
+    U8                      DetailsLength;
+    U8                      Flags;
+    U32                     TransactionContext[2];
+    U32                     TransactionDetails[1];
+} MPI2_SGE_TRANSACTION64, MPI2_POINTER PTR_MPI2_SGE_TRANSACTION64,
+  Mpi2SGETransaction64_t, MPI2_POINTER pMpi2SGETransaction64_t;
+
+typedef struct _MPI2_SGE_TRANSACTION96
+{
+    U8                      Reserved;
+    U8                      ContextSize;
+    U8                      DetailsLength;
+    U8                      Flags;
+    U32                     TransactionContext[3];
+    U32                     TransactionDetails[1];
+} MPI2_SGE_TRANSACTION96, MPI2_POINTER PTR_MPI2_SGE_TRANSACTION96,
+  Mpi2SGETransaction96_t, MPI2_POINTER pMpi2SGETransaction96_t;
+
+typedef struct _MPI2_SGE_TRANSACTION128
+{
+    U8                      Reserved;
+    U8                      ContextSize;
+    U8                      DetailsLength;
+    U8                      Flags;
+    U32                     TransactionContext[4];
+    U32                     TransactionDetails[1];
+} MPI2_SGE_TRANSACTION128, MPI2_POINTER PTR_MPI2_SGE_TRANSACTION128,
+  Mpi2SGETransaction_t128, MPI2_POINTER pMpi2SGETransaction_t128;
+
+typedef struct _MPI2_SGE_TRANSACTION_UNION
+{
+    U8                      Reserved;
+    U8                      ContextSize;
+    U8                      DetailsLength;
+    U8                      Flags;
+    union
+    {
+        U32                 TransactionContext32[1];
+        U32                 TransactionContext64[2];
+        U32                 TransactionContext96[3];
+        U32                 TransactionContext128[4];
+    } u;
+    U32                     TransactionDetails[1];
+} MPI2_SGE_TRANSACTION_UNION, MPI2_POINTER PTR_MPI2_SGE_TRANSACTION_UNION,
+  Mpi2SGETransactionUnion_t, MPI2_POINTER pMpi2SGETransactionUnion_t;
+
+
+/****************************************************************************
+*  MPI SGE union for IO SGL's
+****************************************************************************/
+
+typedef struct _MPI2_MPI_SGE_IO_UNION
+{
+    union
+    {
+        MPI2_SGE_SIMPLE_UNION   Simple;
+        MPI2_SGE_CHAIN_UNION    Chain;
+    } u;
+} MPI2_MPI_SGE_IO_UNION, MPI2_POINTER PTR_MPI2_MPI_SGE_IO_UNION,
+  Mpi2MpiSGEIOUnion_t, MPI2_POINTER pMpi2MpiSGEIOUnion_t;
+
+
+/****************************************************************************
+*  MPI SGE union for SGL's with Simple and Transaction elements
+****************************************************************************/
+
+typedef struct _MPI2_SGE_TRANS_SIMPLE_UNION
+{
+    union
+    {
+        MPI2_SGE_SIMPLE_UNION       Simple;
+        MPI2_SGE_TRANSACTION_UNION  Transaction;
+    } u;
+} MPI2_SGE_TRANS_SIMPLE_UNION, MPI2_POINTER PTR_MPI2_SGE_TRANS_SIMPLE_UNION,
+  Mpi2SGETransSimpleUnion_t, MPI2_POINTER pMpi2SGETransSimpleUnion_t;
+
+
+/****************************************************************************
+*  All MPI SGE types union
+****************************************************************************/
+
+typedef struct _MPI2_MPI_SGE_UNION
+{
+    union
+    {
+        MPI2_SGE_SIMPLE_UNION       Simple;
+        MPI2_SGE_CHAIN_UNION        Chain;
+        MPI2_SGE_TRANSACTION_UNION  Transaction;
+    } u;
+} MPI2_MPI_SGE_UNION, MPI2_POINTER PTR_MPI2_MPI_SGE_UNION,
+  Mpi2MpiSgeUnion_t, MPI2_POINTER pMpi2MpiSgeUnion_t;
+
+
+/****************************************************************************
+*  MPI SGE field definition and masks
+****************************************************************************/
+
+/* Flags field bit definitions */
+
+#define MPI2_SGE_FLAGS_LAST_ELEMENT             (0x80)
+#define MPI2_SGE_FLAGS_END_OF_BUFFER            (0x40)
+#define MPI2_SGE_FLAGS_ELEMENT_TYPE_MASK        (0x30)
+#define MPI2_SGE_FLAGS_LOCAL_ADDRESS            (0x08)
+#define MPI2_SGE_FLAGS_DIRECTION                (0x04)
+#define MPI2_SGE_FLAGS_ADDRESS_SIZE             (0x02)
+#define MPI2_SGE_FLAGS_END_OF_LIST              (0x01)
+
+#define MPI2_SGE_FLAGS_SHIFT                    (24)
+
+#define MPI2_SGE_LENGTH_MASK                    (0x00FFFFFF)
+#define MPI2_SGE_CHAIN_LENGTH_MASK              (0x0000FFFF)
+
+/* Element Type */
+
+#define MPI2_SGE_FLAGS_TRANSACTION_ELEMENT      (0x00)
+#define MPI2_SGE_FLAGS_SIMPLE_ELEMENT           (0x10)
+#define MPI2_SGE_FLAGS_CHAIN_ELEMENT            (0x30)
+#define MPI2_SGE_FLAGS_ELEMENT_MASK             (0x30)
+
+/* Address location */
+
+#define MPI2_SGE_FLAGS_SYSTEM_ADDRESS           (0x00)
+
+/* Direction */
+
+#define MPI2_SGE_FLAGS_IOC_TO_HOST              (0x00)
+#define MPI2_SGE_FLAGS_HOST_TO_IOC              (0x04)
+
+/* Address Size */
+
+#define MPI2_SGE_FLAGS_32_BIT_ADDRESSING        (0x00)
+#define MPI2_SGE_FLAGS_64_BIT_ADDRESSING        (0x02)
+
+/* Context Size */
+
+#define MPI2_SGE_FLAGS_32_BIT_CONTEXT           (0x00)
+#define MPI2_SGE_FLAGS_64_BIT_CONTEXT           (0x02)
+#define MPI2_SGE_FLAGS_96_BIT_CONTEXT           (0x04)
+#define MPI2_SGE_FLAGS_128_BIT_CONTEXT          (0x06)
+
+#define MPI2_SGE_CHAIN_OFFSET_MASK              (0x00FF0000)
+#define MPI2_SGE_CHAIN_OFFSET_SHIFT             (16)
+
+/****************************************************************************
+*  MPI SGE operation Macros
+****************************************************************************/
+
+/* SIMPLE FlagsLength manipulations... */
+#define MPI2_SGE_SET_FLAGS(f)          ((U32)(f) << MPI2_SGE_FLAGS_SHIFT)
+#define MPI2_SGE_GET_FLAGS(f)          (((f) & ~MPI2_SGE_LENGTH_MASK) >> MPI2_SGE_FLAGS_SHIFT)
+#define MPI2_SGE_LENGTH(f)             ((f) & MPI2_SGE_LENGTH_MASK)
+#define MPI2_SGE_CHAIN_LENGTH(f)       ((f) & MPI2_SGE_CHAIN_LENGTH_MASK)
+
+#define MPI2_SGE_SET_FLAGS_LENGTH(f,l) (MPI2_SGE_SET_FLAGS(f) | MPI2_SGE_LENGTH(l))
+
+#define MPI2_pSGE_GET_FLAGS(psg)            MPI2_SGE_GET_FLAGS((psg)->FlagsLength)
+#define MPI2_pSGE_GET_LENGTH(psg)           MPI2_SGE_LENGTH((psg)->FlagsLength)
+#define MPI2_pSGE_SET_FLAGS_LENGTH(psg,f,l) (psg)->FlagsLength = MPI2_SGE_SET_FLAGS_LENGTH(f,l)
+
+/* CAUTION - The following are READ-MODIFY-WRITE! */
+#define MPI2_pSGE_SET_FLAGS(psg,f)      (psg)->FlagsLength |= MPI2_SGE_SET_FLAGS(f)
+#define MPI2_pSGE_SET_LENGTH(psg,l)     (psg)->FlagsLength |= MPI2_SGE_LENGTH(l)
+
+#define MPI2_GET_CHAIN_OFFSET(x)    ((x & MPI2_SGE_CHAIN_OFFSET_MASK) >> MPI2_SGE_CHAIN_OFFSET_SHIFT)
+
+
+/*****************************************************************************
+*
+*        Fusion-MPT IEEE Scatter Gather Elements
+*
+*****************************************************************************/
+
+/****************************************************************************
+*  IEEE Simple Element structures
+****************************************************************************/
+
+typedef struct _MPI2_IEEE_SGE_SIMPLE32
+{
+    U32                     Address;
+    U32                     FlagsLength;
+} MPI2_IEEE_SGE_SIMPLE32, MPI2_POINTER PTR_MPI2_IEEE_SGE_SIMPLE32,
+  Mpi2IeeeSgeSimple32_t, MPI2_POINTER pMpi2IeeeSgeSimple32_t;
+
+typedef struct _MPI2_IEEE_SGE_SIMPLE64
+{
+    U64                     Address;
+    U32                     Length;
+    U16                     Reserved1;
+    U8                      Reserved2;
+    U8                      Flags;
+} MPI2_IEEE_SGE_SIMPLE64, MPI2_POINTER PTR_MPI2_IEEE_SGE_SIMPLE64,
+  Mpi2IeeeSgeSimple64_t, MPI2_POINTER pMpi2IeeeSgeSimple64_t;
+
+typedef union _MPI2_IEEE_SGE_SIMPLE_UNION
+{
+    MPI2_IEEE_SGE_SIMPLE32  Simple32;
+    MPI2_IEEE_SGE_SIMPLE64  Simple64;
+} MPI2_IEEE_SGE_SIMPLE_UNION, MPI2_POINTER PTR_MPI2_IEEE_SGE_SIMPLE_UNION,
+  Mpi2IeeeSgeSimpleUnion_t, MPI2_POINTER pMpi2IeeeSgeSimpleUnion_t;
+
+
+/****************************************************************************
+*  IEEE Chain Element structures
+****************************************************************************/
+
+typedef MPI2_IEEE_SGE_SIMPLE32  MPI2_IEEE_SGE_CHAIN32;
+
+typedef MPI2_IEEE_SGE_SIMPLE64  MPI2_IEEE_SGE_CHAIN64;
+
+typedef union _MPI2_IEEE_SGE_CHAIN_UNION
+{
+    MPI2_IEEE_SGE_CHAIN32   Chain32;
+    MPI2_IEEE_SGE_CHAIN64   Chain64;
+} MPI2_IEEE_SGE_CHAIN_UNION, MPI2_POINTER PTR_MPI2_IEEE_SGE_CHAIN_UNION,
+  Mpi2IeeeSgeChainUnion_t, MPI2_POINTER pMpi2IeeeSgeChainUnion_t;
+
+
+/****************************************************************************
+*  All IEEE SGE types union
+****************************************************************************/
+
+typedef struct _MPI2_IEEE_SGE_UNION
+{
+    union
+    {
+        MPI2_IEEE_SGE_SIMPLE_UNION  Simple;
+        MPI2_IEEE_SGE_CHAIN_UNION   Chain;
+    } u;
+} MPI2_IEEE_SGE_UNION, MPI2_POINTER PTR_MPI2_IEEE_SGE_UNION,
+  Mpi2IeeeSgeUnion_t, MPI2_POINTER pMpi2IeeeSgeUnion_t;
+
+
+/****************************************************************************
+*  IEEE SGE field definitions and masks
+****************************************************************************/
+
+/* Flags field bit definitions */
+
+#define MPI2_IEEE_SGE_FLAGS_ELEMENT_TYPE_MASK   (0x80)
+
+#define MPI2_IEEE32_SGE_FLAGS_SHIFT             (24)
+
+#define MPI2_IEEE32_SGE_LENGTH_MASK             (0x00FFFFFF)
+
+/* Element Type */
+
+#define MPI2_IEEE_SGE_FLAGS_SIMPLE_ELEMENT      (0x00)
+#define MPI2_IEEE_SGE_FLAGS_CHAIN_ELEMENT       (0x80)
+
+/* Data Location Address Space */
+
+#define MPI2_IEEE_SGE_FLAGS_ADDR_MASK           (0x03)
+#define MPI2_IEEE_SGE_FLAGS_SYSTEM_ADDR         (0x00)
+#define MPI2_IEEE_SGE_FLAGS_IOCDDR_ADDR         (0x01)
+#define MPI2_IEEE_SGE_FLAGS_IOCPLB_ADDR         (0x02)
+#define MPI2_IEEE_SGE_FLAGS_IOCPLBNTA_ADDR      (0x03)
+
+
+/****************************************************************************
+*  IEEE SGE operation Macros
+****************************************************************************/
+
+/* SIMPLE FlagsLength manipulations... */
+#define MPI2_IEEE32_SGE_SET_FLAGS(f)     ((U32)(f) << MPI2_IEEE32_SGE_FLAGS_SHIFT)
+#define MPI2_IEEE32_SGE_GET_FLAGS(f)     (((f) & ~MPI2_IEEE32_SGE_LENGTH_MASK) >> MPI2_IEEE32_SGE_FLAGS_SHIFT)
+#define MPI2_IEEE32_SGE_LENGTH(f)        ((f) & MPI2_IEEE32_SGE_LENGTH_MASK)
+
+#define MPI2_IEEE32_SGE_SET_FLAGS_LENGTH(f, l)      (MPI2_IEEE32_SGE_SET_FLAGS(f) | MPI2_IEEE32_SGE_LENGTH(l))
+
+#define MPI2_IEEE32_pSGE_GET_FLAGS(psg)             MPI2_IEEE32_SGE_GET_FLAGS((psg)->FlagsLength)
+#define MPI2_IEEE32_pSGE_GET_LENGTH(psg)            MPI2_IEEE32_SGE_LENGTH((psg)->FlagsLength)
+#define MPI2_IEEE32_pSGE_SET_FLAGS_LENGTH(psg,f,l)  (psg)->FlagsLength = MPI2_IEEE32_SGE_SET_FLAGS_LENGTH(f,l)
+
+/* CAUTION - The following are READ-MODIFY-WRITE! */
+#define MPI2_IEEE32_pSGE_SET_FLAGS(psg,f)    (psg)->FlagsLength |= MPI2_IEEE32_SGE_SET_FLAGS(f)
+#define MPI2_IEEE32_pSGE_SET_LENGTH(psg,l)   (psg)->FlagsLength |= MPI2_IEEE32_SGE_LENGTH(l)
+
+
+
+
+/*****************************************************************************
+*
+*        Fusion-MPT MPI/IEEE Scatter Gather Unions
+*
+*****************************************************************************/
+
+typedef union _MPI2_SIMPLE_SGE_UNION
+{
+    MPI2_SGE_SIMPLE_UNION       MpiSimple;
+    MPI2_IEEE_SGE_SIMPLE_UNION  IeeeSimple;
+} MPI2_SIMPLE_SGE_UNION, MPI2_POINTER PTR_MPI2_SIMPLE_SGE_UNION,
+  Mpi2SimpleSgeUntion_t, MPI2_POINTER pMpi2SimpleSgeUntion_t;
+
+
+typedef union _MPI2_SGE_IO_UNION
+{
+    MPI2_SGE_SIMPLE_UNION       MpiSimple;
+    MPI2_SGE_CHAIN_UNION        MpiChain;
+    MPI2_IEEE_SGE_SIMPLE_UNION  IeeeSimple;
+    MPI2_IEEE_SGE_CHAIN_UNION   IeeeChain;
+} MPI2_SGE_IO_UNION, MPI2_POINTER PTR_MPI2_SGE_IO_UNION,
+  Mpi2SGEIOUnion_t, MPI2_POINTER pMpi2SGEIOUnion_t;
+
+
+/****************************************************************************
+*
+*  Values for SGLFlags field, used in many request messages with an SGL
+*
+****************************************************************************/
+
+/* values for MPI SGL Data Location Address Space subfield */
+#define MPI2_SGLFLAGS_ADDRESS_SPACE_MASK            (0x0C)
+#define MPI2_SGLFLAGS_SYSTEM_ADDRESS_SPACE          (0x00)
+#define MPI2_SGLFLAGS_IOCDDR_ADDRESS_SPACE          (0x04)
+#define MPI2_SGLFLAGS_IOCPLB_ADDRESS_SPACE          (0x08)
+#define MPI2_SGLFLAGS_IOCPLBNTA_ADDRESS_SPACE       (0x0C)
+/* values for SGL Type subfield */
+#define MPI2_SGLFLAGS_SGL_TYPE_MASK                 (0x03)
+#define MPI2_SGLFLAGS_SGL_TYPE_MPI                  (0x00)
+#define MPI2_SGLFLAGS_SGL_TYPE_IEEE32               (0x01)
+#define MPI2_SGLFLAGS_SGL_TYPE_IEEE64               (0x02)
+
+
+#endif
+
diff --git a/drivers/scsi/mpt2sas/mpi/mpi2_cnfg.h b/drivers/scsi/mpt2sas/mpi/mpi2_cnfg.h
new file mode 100644
index 000000000000..2f27cf6d6c65
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpi/mpi2_cnfg.h
@@ -0,0 +1,2151 @@
+/*
+ *  Copyright (c) 2000-2009 LSI Corporation.
+ *
+ *
+ *           Name:  mpi2_cnfg.h
+ *          Title:  MPI Configuration messages and pages
+ *  Creation Date:  November 10, 2006
+ *
+ *    mpi2_cnfg.h Version:  02.00.10
+ *
+ *  Version History
+ *  ---------------
+ *
+ *  Date      Version   Description
+ *  --------  --------  ------------------------------------------------------
+ *  04-30-07  02.00.00  Corresponds to Fusion-MPT MPI Specification Rev A.
+ *  06-04-07  02.00.01  Added defines for SAS IO Unit Page 2 PhyFlags.
+ *                      Added Manufacturing Page 11.
+ *                      Added MPI2_SAS_EXPANDER0_FLAGS_CONNECTOR_END_DEVICE
+ *                      define.
+ *  06-26-07  02.00.02  Adding generic structure for product-specific
+ *                      Manufacturing pages: MPI2_CONFIG_PAGE_MANUFACTURING_PS.
+ *                      Rework of BIOS Page 2 configuration page.
+ *                      Fixed MPI2_BIOSPAGE2_BOOT_DEVICE to be a union of the
+ *                      forms.
+ *                      Added configuration pages IOC Page 8 and Driver
+ *                      Persistent Mapping Page 0.
+ *  08-31-07  02.00.03  Modified configuration pages dealing with Integrated
+ *                      RAID (Manufacturing Page 4, RAID Volume Pages 0 and 1,
+ *                      RAID Physical Disk Pages 0 and 1, RAID Configuration
+ *                      Page 0).
+ *                      Added new value for AccessStatus field of SAS Device
+ *                      Page 0 (_SATA_NEEDS_INITIALIZATION).
+ *  10-31-07  02.00.04  Added missing SEPDevHandle field to
+ *                      MPI2_CONFIG_PAGE_SAS_ENCLOSURE_0.
+ *  12-18-07  02.00.05  Modified IO Unit Page 0 to use 32-bit version fields for
+ *                      NVDATA.
+ *                      Modified IOC Page 7 to use masks and added field for
+ *                      SASBroadcastPrimitiveMasks.
+ *                      Added MPI2_CONFIG_PAGE_BIOS_4.
+ *                      Added MPI2_CONFIG_PAGE_LOG_0.
+ *  02-29-08  02.00.06  Modified various names to make them 32-character unique.
+ *                      Added SAS Device IDs.
+ *                      Updated Integrated RAID configuration pages including
+ *                      Manufacturing Page 4, IOC Page 6, and RAID Configuration
+ *                      Page 0.
+ *  05-21-08  02.00.07  Added define MPI2_MANPAGE4_MIX_SSD_SAS_SATA.
+ *                      Added define MPI2_MANPAGE4_PHYSDISK_128MB_COERCION.
+ *                      Fixed define MPI2_IOCPAGE8_FLAGS_ENCLOSURE_SLOT_MAPPING.
+ *                      Added missing MaxNumRoutedSasAddresses field to
+ *                      MPI2_CONFIG_PAGE_EXPANDER_0.
+ *                      Added SAS Port Page 0.
+ *                      Modified structure layout for
+ *                      MPI2_CONFIG_PAGE_DRIVER_MAPPING_0.
+ *  06-27-08  02.00.08  Changed MPI2_CONFIG_PAGE_RD_PDISK_1 to use
+ *                      MPI2_RAID_PHYS_DISK1_PATH_MAX to size the array.
+ *  10-02-08  02.00.09  Changed MPI2_RAID_PGAD_CONFIGNUM_MASK from 0x0000FFFF
+ *                      to 0x000000FF.
+ *                      Added two new values for the Physical Disk Coercion Size
+ *                      bits in the Flags field of Manufacturing Page 4.
+ *                      Added product-specific Manufacturing pages 16 to 31.
+ *                      Modified Flags bits for controlling write cache on SATA
+ *                      drives in IO Unit Page 1.
+ *                      Added new bit to AdditionalControlFlags of SAS IO Unit
+ *                      Page 1 to control Invalid Topology Correction.
+ *                      Added additional defines for RAID Volume Page 0
+ *                      VolumeStatusFlags field.
+ *                      Modified meaning of RAID Volume Page 0 VolumeSettings
+ *                      define for auto-configure of hot-swap drives.
+ *                      Added SupportedPhysDisks field to RAID Volume Page 1 and
+ *                      added related defines.
+ *                      Added PhysDiskAttributes field (and related defines) to
+ *                      RAID Physical Disk Page 0.
+ *                      Added MPI2_SAS_PHYINFO_PHY_VACANT define.
+ *                      Added three new DiscoveryStatus bits for SAS IO Unit
+ *                      Page 0 and SAS Expander Page 0.
+ *                      Removed multiplexing information from SAS IO Unit pages.
+ *                      Added BootDeviceWaitTime field to SAS IO Unit Page 4.
+ *                      Removed Zone Address Resolved bit from PhyInfo and from
+ *                      Expander Page 0 Flags field.
+ *                      Added two new AccessStatus values to SAS Device Page 0
+ *                      for indicating routing problems. Added 3 reserved words
+ *                      to this page.
+ *  01-19-09  02.00.10  Fixed defines for GPIOVal field of IO Unit Page 3.
+ *                      Inserted missing reserved field into structure for IOC
+ *                      Page 6.
+ *                      Added more pending task bits to RAID Volume Page 0
+ *                      VolumeStatusFlags defines.
+ *                      Added MPI2_PHYSDISK0_STATUS_FLAG_NOT_CERTIFIED define.
+ *                      Added a new DiscoveryStatus bit for SAS IO Unit Page 0
+ *                      and SAS Expander Page 0 to flag a downstream initiator
+ *                      when in simplified routing mode.
+ *                      Removed SATA Init Failure defines for DiscoveryStatus
+ *                      fields of SAS IO Unit Page 0 and SAS Expander Page 0.
+ *                      Added MPI2_SAS_DEVICE0_ASTATUS_DEVICE_BLOCKED define.
+ *                      Added PortGroups, DmaGroup, and ControlGroup fields to
+ *                      SAS Device Page 0.
+ *  --------------------------------------------------------------------------
+ */
+
+#ifndef MPI2_CNFG_H
+#define MPI2_CNFG_H
+
+/*****************************************************************************
+*   Configuration Page Header and defines
+*****************************************************************************/
+
+/* Config Page Header */
+typedef struct _MPI2_CONFIG_PAGE_HEADER
+{
+    U8                 PageVersion;                /* 0x00 */
+    U8                 PageLength;                 /* 0x01 */
+    U8                 PageNumber;                 /* 0x02 */
+    U8                 PageType;                   /* 0x03 */
+} MPI2_CONFIG_PAGE_HEADER, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_HEADER,
+  Mpi2ConfigPageHeader_t, MPI2_POINTER pMpi2ConfigPageHeader_t;
+
+typedef union _MPI2_CONFIG_PAGE_HEADER_UNION
+{
+   MPI2_CONFIG_PAGE_HEADER  Struct;
+   U8                       Bytes[4];
+   U16                      Word16[2];
+   U32                      Word32;
+} MPI2_CONFIG_PAGE_HEADER_UNION, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_HEADER_UNION,
+  Mpi2ConfigPageHeaderUnion, MPI2_POINTER pMpi2ConfigPageHeaderUnion;
+
+/* Extended Config Page Header */
+typedef struct _MPI2_CONFIG_EXTENDED_PAGE_HEADER
+{
+    U8                  PageVersion;                /* 0x00 */
+    U8                  Reserved1;                  /* 0x01 */
+    U8                  PageNumber;                 /* 0x02 */
+    U8                  PageType;                   /* 0x03 */
+    U16                 ExtPageLength;              /* 0x04 */
+    U8                  ExtPageType;                /* 0x06 */
+    U8                  Reserved2;                  /* 0x07 */
+} MPI2_CONFIG_EXTENDED_PAGE_HEADER,
+  MPI2_POINTER PTR_MPI2_CONFIG_EXTENDED_PAGE_HEADER,
+  Mpi2ConfigExtendedPageHeader_t, MPI2_POINTER pMpi2ConfigExtendedPageHeader_t;
+
+typedef union _MPI2_CONFIG_EXT_PAGE_HEADER_UNION
+{
+   MPI2_CONFIG_PAGE_HEADER          Struct;
+   MPI2_CONFIG_EXTENDED_PAGE_HEADER Ext;
+   U8                               Bytes[8];
+   U16                              Word16[4];
+   U32                              Word32[2];
+} MPI2_CONFIG_EXT_PAGE_HEADER_UNION, MPI2_POINTER PTR_MPI2_CONFIG_EXT_PAGE_HEADER_UNION,
+  Mpi2ConfigPageExtendedHeaderUnion, MPI2_POINTER pMpi2ConfigPageExtendedHeaderUnion;
+
+
+/* PageType field values */
+#define MPI2_CONFIG_PAGEATTR_READ_ONLY              (0x00)
+#define MPI2_CONFIG_PAGEATTR_CHANGEABLE             (0x10)
+#define MPI2_CONFIG_PAGEATTR_PERSISTENT             (0x20)
+#define MPI2_CONFIG_PAGEATTR_MASK                   (0xF0)
+
+#define MPI2_CONFIG_PAGETYPE_IO_UNIT                (0x00)
+#define MPI2_CONFIG_PAGETYPE_IOC                    (0x01)
+#define MPI2_CONFIG_PAGETYPE_BIOS                   (0x02)
+#define MPI2_CONFIG_PAGETYPE_RAID_VOLUME            (0x08)
+#define MPI2_CONFIG_PAGETYPE_MANUFACTURING          (0x09)
+#define MPI2_CONFIG_PAGETYPE_RAID_PHYSDISK          (0x0A)
+#define MPI2_CONFIG_PAGETYPE_EXTENDED               (0x0F)
+#define MPI2_CONFIG_PAGETYPE_MASK                   (0x0F)
+
+#define MPI2_CONFIG_TYPENUM_MASK                    (0x0FFF)
+
+
+/* ExtPageType field values */
+#define MPI2_CONFIG_EXTPAGETYPE_SAS_IO_UNIT         (0x10)
+#define MPI2_CONFIG_EXTPAGETYPE_SAS_EXPANDER        (0x11)
+#define MPI2_CONFIG_EXTPAGETYPE_SAS_DEVICE          (0x12)
+#define MPI2_CONFIG_EXTPAGETYPE_SAS_PHY             (0x13)
+#define MPI2_CONFIG_EXTPAGETYPE_LOG                 (0x14)
+#define MPI2_CONFIG_EXTPAGETYPE_ENCLOSURE           (0x15)
+#define MPI2_CONFIG_EXTPAGETYPE_RAID_CONFIG         (0x16)
+#define MPI2_CONFIG_EXTPAGETYPE_DRIVER_MAPPING      (0x17)
+#define MPI2_CONFIG_EXTPAGETYPE_SAS_PORT            (0x18)
+
+
+/*****************************************************************************
+*   PageAddress defines
+*****************************************************************************/
+
+/* RAID Volume PageAddress format */
+#define MPI2_RAID_VOLUME_PGAD_FORM_MASK             (0xF0000000)
+#define MPI2_RAID_VOLUME_PGAD_FORM_GET_NEXT_HANDLE  (0x00000000)
+#define MPI2_RAID_VOLUME_PGAD_FORM_HANDLE           (0x10000000)
+
+#define MPI2_RAID_VOLUME_PGAD_HANDLE_MASK           (0x0000FFFF)
+
+
+/* RAID Physical Disk PageAddress format */
+#define MPI2_PHYSDISK_PGAD_FORM_MASK                    (0xF0000000)
+#define MPI2_PHYSDISK_PGAD_FORM_GET_NEXT_PHYSDISKNUM    (0x00000000)
+#define MPI2_PHYSDISK_PGAD_FORM_PHYSDISKNUM             (0x10000000)
+#define MPI2_PHYSDISK_PGAD_FORM_DEVHANDLE               (0x20000000)
+
+#define MPI2_PHYSDISK_PGAD_PHYSDISKNUM_MASK             (0x000000FF)
+#define MPI2_PHYSDISK_PGAD_DEVHANDLE_MASK               (0x0000FFFF)
+
+
+/* SAS Expander PageAddress format */
+#define MPI2_SAS_EXPAND_PGAD_FORM_MASK              (0xF0000000)
+#define MPI2_SAS_EXPAND_PGAD_FORM_GET_NEXT_HNDL     (0x00000000)
+#define MPI2_SAS_EXPAND_PGAD_FORM_HNDL_PHY_NUM      (0x10000000)
+#define MPI2_SAS_EXPAND_PGAD_FORM_HNDL              (0x20000000)
+
+#define MPI2_SAS_EXPAND_PGAD_HANDLE_MASK            (0x0000FFFF)
+#define MPI2_SAS_EXPAND_PGAD_PHYNUM_MASK            (0x00FF0000)
+#define MPI2_SAS_EXPAND_PGAD_PHYNUM_SHIFT           (16)
+
+
+/* SAS Device PageAddress format */
+#define MPI2_SAS_DEVICE_PGAD_FORM_MASK              (0xF0000000)
+#define MPI2_SAS_DEVICE_PGAD_FORM_GET_NEXT_HANDLE   (0x00000000)
+#define MPI2_SAS_DEVICE_PGAD_FORM_HANDLE            (0x20000000)
+
+#define MPI2_SAS_DEVICE_PGAD_HANDLE_MASK            (0x0000FFFF)
+
+
+/* SAS PHY PageAddress format */
+#define MPI2_SAS_PHY_PGAD_FORM_MASK                 (0xF0000000)
+#define MPI2_SAS_PHY_PGAD_FORM_PHY_NUMBER           (0x00000000)
+#define MPI2_SAS_PHY_PGAD_FORM_PHY_TBL_INDEX        (0x10000000)
+
+#define MPI2_SAS_PHY_PGAD_PHY_NUMBER_MASK           (0x000000FF)
+#define MPI2_SAS_PHY_PGAD_PHY_TBL_INDEX_MASK        (0x0000FFFF)
+
+
+/* SAS Port PageAddress format */
+#define MPI2_SASPORT_PGAD_FORM_MASK                 (0xF0000000)
+#define MPI2_SASPORT_PGAD_FORM_GET_NEXT_PORT        (0x00000000)
+#define MPI2_SASPORT_PGAD_FORM_PORT_NUM             (0x10000000)
+
+#define MPI2_SASPORT_PGAD_PORTNUMBER_MASK           (0x00000FFF)
+
+
+/* SAS Enclosure PageAddress format */
+#define MPI2_SAS_ENCLOS_PGAD_FORM_MASK              (0xF0000000)
+#define MPI2_SAS_ENCLOS_PGAD_FORM_GET_NEXT_HANDLE   (0x00000000)
+#define MPI2_SAS_ENCLOS_PGAD_FORM_HANDLE            (0x10000000)
+
+#define MPI2_SAS_ENCLOS_PGAD_HANDLE_MASK            (0x0000FFFF)
+
+
+/* RAID Configuration PageAddress format */
+#define MPI2_RAID_PGAD_FORM_MASK                    (0xF0000000)
+#define MPI2_RAID_PGAD_FORM_GET_NEXT_CONFIGNUM      (0x00000000)
+#define MPI2_RAID_PGAD_FORM_CONFIGNUM               (0x10000000)
+#define MPI2_RAID_PGAD_FORM_ACTIVE_CONFIG           (0x20000000)
+
+#define MPI2_RAID_PGAD_CONFIGNUM_MASK               (0x000000FF)
+
+
+/* Driver Persistent Mapping PageAddress format */
+#define MPI2_DPM_PGAD_FORM_MASK                     (0xF0000000)
+#define MPI2_DPM_PGAD_FORM_ENTRY_RANGE              (0x00000000)
+
+#define MPI2_DPM_PGAD_ENTRY_COUNT_MASK              (0x0FFF0000)
+#define MPI2_DPM_PGAD_ENTRY_COUNT_SHIFT             (16)
+#define MPI2_DPM_PGAD_START_ENTRY_MASK              (0x0000FFFF)
+
+
+/****************************************************************************
+*   Configuration messages
+****************************************************************************/
+
+/* Configuration Request Message */
+typedef struct _MPI2_CONFIG_REQUEST
+{
+    U8                      Action;                     /* 0x00 */
+    U8                      SGLFlags;                   /* 0x01 */
+    U8                      ChainOffset;                /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     ExtPageLength;              /* 0x04 */
+    U8                      ExtPageType;                /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved1;                  /* 0x0A */
+    U32                     Reserved2;                  /* 0x0C */
+    U32                     Reserved3;                  /* 0x10 */
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x14 */
+    U32                     PageAddress;                /* 0x18 */
+    MPI2_SGE_IO_UNION       PageBufferSGE;              /* 0x1C */
+} MPI2_CONFIG_REQUEST, MPI2_POINTER PTR_MPI2_CONFIG_REQUEST,
+  Mpi2ConfigRequest_t, MPI2_POINTER pMpi2ConfigRequest_t;
+
+/* values for the Action field */
+#define MPI2_CONFIG_ACTION_PAGE_HEADER              (0x00)
+#define MPI2_CONFIG_ACTION_PAGE_READ_CURRENT        (0x01)
+#define MPI2_CONFIG_ACTION_PAGE_WRITE_CURRENT       (0x02)
+#define MPI2_CONFIG_ACTION_PAGE_DEFAULT             (0x03)
+#define MPI2_CONFIG_ACTION_PAGE_WRITE_NVRAM         (0x04)
+#define MPI2_CONFIG_ACTION_PAGE_READ_DEFAULT        (0x05)
+#define MPI2_CONFIG_ACTION_PAGE_READ_NVRAM          (0x06)
+#define MPI2_CONFIG_ACTION_PAGE_GET_CHANGEABLE      (0x07)
+
+/* values for SGLFlags field are in the SGL section of mpi2.h */
+
+
+/* Config Reply Message */
+typedef struct _MPI2_CONFIG_REPLY
+{
+    U8                      Action;                     /* 0x00 */
+    U8                      SGLFlags;                   /* 0x01 */
+    U8                      MsgLength;                  /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     ExtPageLength;              /* 0x04 */
+    U8                      ExtPageType;                /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved1;                  /* 0x0A */
+    U16                     Reserved2;                  /* 0x0C */
+    U16                     IOCStatus;                  /* 0x0E */
+    U32                     IOCLogInfo;                 /* 0x10 */
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x14 */
+} MPI2_CONFIG_REPLY, MPI2_POINTER PTR_MPI2_CONFIG_REPLY,
+  Mpi2ConfigReply_t, MPI2_POINTER pMpi2ConfigReply_t;
+
+
+
+/*****************************************************************************
+*
+*               C o n f i g u r a t i o n    P a g e s
+*
+*****************************************************************************/
+
+/****************************************************************************
+*   Manufacturing Config pages
+****************************************************************************/
+
+#define MPI2_MFGPAGE_VENDORID_LSI                   (0x1000)
+
+/* SAS */
+#define MPI2_MFGPAGE_DEVID_SAS2004                  (0x0070)
+#define MPI2_MFGPAGE_DEVID_SAS2008                  (0x0072)
+#define MPI2_MFGPAGE_DEVID_SAS2108_1                (0x0074)
+#define MPI2_MFGPAGE_DEVID_SAS2108_2                (0x0076)
+#define MPI2_MFGPAGE_DEVID_SAS2108_3                (0x0077)
+#define MPI2_MFGPAGE_DEVID_SAS2116_1                (0x0064)
+#define MPI2_MFGPAGE_DEVID_SAS2116_2                (0x0065)
+
+
+/* Manufacturing Page 0 */
+
+typedef struct _MPI2_CONFIG_PAGE_MAN_0
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U8                      ChipName[16];               /* 0x04 */
+    U8                      ChipRevision[8];            /* 0x14 */
+    U8                      BoardName[16];              /* 0x1C */
+    U8                      BoardAssembly[16];          /* 0x2C */
+    U8                      BoardTracerNumber[16];      /* 0x3C */
+} MPI2_CONFIG_PAGE_MAN_0,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_MAN_0,
+  Mpi2ManufacturingPage0_t, MPI2_POINTER pMpi2ManufacturingPage0_t;
+
+#define MPI2_MANUFACTURING0_PAGEVERSION                (0x00)
+
+
+/* Manufacturing Page 1 */
+
+typedef struct _MPI2_CONFIG_PAGE_MAN_1
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U8                      VPD[256];                   /* 0x04 */
+} MPI2_CONFIG_PAGE_MAN_1,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_MAN_1,
+  Mpi2ManufacturingPage1_t, MPI2_POINTER pMpi2ManufacturingPage1_t;
+
+#define MPI2_MANUFACTURING1_PAGEVERSION                (0x00)
+
+
+typedef struct _MPI2_CHIP_REVISION_ID
+{
+    U16 DeviceID;                                       /* 0x00 */
+    U8  PCIRevisionID;                                  /* 0x02 */
+    U8  Reserved;                                       /* 0x03 */
+} MPI2_CHIP_REVISION_ID, MPI2_POINTER PTR_MPI2_CHIP_REVISION_ID,
+  Mpi2ChipRevisionId_t, MPI2_POINTER pMpi2ChipRevisionId_t;
+
+
+/* Manufacturing Page 2 */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.PageLength at runtime.
+ */
+#ifndef MPI2_MAN_PAGE_2_HW_SETTINGS_WORDS
+#define MPI2_MAN_PAGE_2_HW_SETTINGS_WORDS   (1)
+#endif
+
+typedef struct _MPI2_CONFIG_PAGE_MAN_2
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    MPI2_CHIP_REVISION_ID   ChipId;                     /* 0x04 */
+    U32                     HwSettings[MPI2_MAN_PAGE_2_HW_SETTINGS_WORDS];/* 0x08 */
+} MPI2_CONFIG_PAGE_MAN_2,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_MAN_2,
+  Mpi2ManufacturingPage2_t, MPI2_POINTER pMpi2ManufacturingPage2_t;
+
+#define MPI2_MANUFACTURING2_PAGEVERSION                 (0x00)
+
+
+/* Manufacturing Page 3 */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.PageLength at runtime.
+ */
+#ifndef MPI2_MAN_PAGE_3_INFO_WORDS
+#define MPI2_MAN_PAGE_3_INFO_WORDS          (1)
+#endif
+
+typedef struct _MPI2_CONFIG_PAGE_MAN_3
+{
+    MPI2_CONFIG_PAGE_HEADER             Header;         /* 0x00 */
+    MPI2_CHIP_REVISION_ID               ChipId;         /* 0x04 */
+    U32                                 Info[MPI2_MAN_PAGE_3_INFO_WORDS];/* 0x08 */
+} MPI2_CONFIG_PAGE_MAN_3,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_MAN_3,
+  Mpi2ManufacturingPage3_t, MPI2_POINTER pMpi2ManufacturingPage3_t;
+
+#define MPI2_MANUFACTURING3_PAGEVERSION                 (0x00)
+
+
+/* Manufacturing Page 4 */
+
+typedef struct _MPI2_MANPAGE4_PWR_SAVE_SETTINGS
+{
+    U8                          PowerSaveFlags;                 /* 0x00 */
+    U8                          InternalOperationsSleepTime;    /* 0x01 */
+    U8                          InternalOperationsRunTime;      /* 0x02 */
+    U8                          HostIdleTime;                   /* 0x03 */
+} MPI2_MANPAGE4_PWR_SAVE_SETTINGS,
+  MPI2_POINTER PTR_MPI2_MANPAGE4_PWR_SAVE_SETTINGS,
+  Mpi2ManPage4PwrSaveSettings_t, MPI2_POINTER pMpi2ManPage4PwrSaveSettings_t;
+
+/* defines for the PowerSaveFlags field */
+#define MPI2_MANPAGE4_MASK_POWERSAVE_MODE               (0x03)
+#define MPI2_MANPAGE4_POWERSAVE_MODE_DISABLED           (0x00)
+#define MPI2_MANPAGE4_CUSTOM_POWERSAVE_MODE             (0x01)
+#define MPI2_MANPAGE4_FULL_POWERSAVE_MODE               (0x02)
+
+typedef struct _MPI2_CONFIG_PAGE_MAN_4
+{
+    MPI2_CONFIG_PAGE_HEADER             Header;                 /* 0x00 */
+    U32                                 Reserved1;              /* 0x04 */
+    U32                                 Flags;                  /* 0x08 */
+    U8                                  InquirySize;            /* 0x0C */
+    U8                                  Reserved2;              /* 0x0D */
+    U16                                 Reserved3;              /* 0x0E */
+    U8                                  InquiryData[56];        /* 0x10 */
+    U32                                 RAID0VolumeSettings;    /* 0x48 */
+    U32                                 RAID1EVolumeSettings;   /* 0x4C */
+    U32                                 RAID1VolumeSettings;    /* 0x50 */
+    U32                                 RAID10VolumeSettings;   /* 0x54 */
+    U32                                 Reserved4;              /* 0x58 */
+    U32                                 Reserved5;              /* 0x5C */
+    MPI2_MANPAGE4_PWR_SAVE_SETTINGS     PowerSaveSettings;      /* 0x60 */
+    U8                                  MaxOCEDisks;            /* 0x64 */
+    U8                                  ResyncRate;             /* 0x65 */
+    U16                                 DataScrubDuration;      /* 0x66 */
+    U8                                  MaxHotSpares;           /* 0x68 */
+    U8                                  MaxPhysDisksPerVol;     /* 0x69 */
+    U8                                  MaxPhysDisks;           /* 0x6A */
+    U8                                  MaxVolumes;             /* 0x6B */
+} MPI2_CONFIG_PAGE_MAN_4,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_MAN_4,
+  Mpi2ManufacturingPage4_t, MPI2_POINTER pMpi2ManufacturingPage4_t;
+
+#define MPI2_MANUFACTURING4_PAGEVERSION                 (0x0A)
+
+/* Manufacturing Page 4 Flags field */
+#define MPI2_MANPAGE4_METADATA_SIZE_MASK                (0x00030000)
+#define MPI2_MANPAGE4_METADATA_512MB                    (0x00000000)
+
+#define MPI2_MANPAGE4_MIX_SSD_SAS_SATA                  (0x00008000)
+#define MPI2_MANPAGE4_MIX_SSD_AND_NON_SSD               (0x00004000)
+#define MPI2_MANPAGE4_HIDE_PHYSDISK_NON_IR              (0x00002000)
+
+#define MPI2_MANPAGE4_MASK_PHYSDISK_COERCION            (0x00001C00)
+#define MPI2_MANPAGE4_PHYSDISK_COERCION_1GB             (0x00000000)
+#define MPI2_MANPAGE4_PHYSDISK_128MB_COERCION           (0x00000400)
+#define MPI2_MANPAGE4_PHYSDISK_ADAPTIVE_COERCION        (0x00000800)
+#define MPI2_MANPAGE4_PHYSDISK_ZERO_COERCION            (0x00000C00)
+
+#define MPI2_MANPAGE4_MASK_BAD_BLOCK_MARKING            (0x00000300)
+#define MPI2_MANPAGE4_DEFAULT_BAD_BLOCK_MARKING         (0x00000000)
+#define MPI2_MANPAGE4_TABLE_BAD_BLOCK_MARKING           (0x00000100)
+#define MPI2_MANPAGE4_WRITE_LONG_BAD_BLOCK_MARKING      (0x00000200)
+
+#define MPI2_MANPAGE4_FORCE_OFFLINE_FAILOVER            (0x00000080)
+#define MPI2_MANPAGE4_RAID10_DISABLE                    (0x00000040)
+#define MPI2_MANPAGE4_RAID1E_DISABLE                    (0x00000020)
+#define MPI2_MANPAGE4_RAID1_DISABLE                     (0x00000010)
+#define MPI2_MANPAGE4_RAID0_DISABLE                     (0x00000008)
+#define MPI2_MANPAGE4_IR_MODEPAGE8_DISABLE              (0x00000004)
+#define MPI2_MANPAGE4_IM_RESYNC_CACHE_ENABLE            (0x00000002)
+#define MPI2_MANPAGE4_IR_NO_MIX_SAS_SATA                (0x00000001)
+
+
+/* Manufacturing Page 5 */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.PageLength or NumPhys at runtime.
+ */
+#ifndef MPI2_MAN_PAGE_5_PHY_ENTRIES
+#define MPI2_MAN_PAGE_5_PHY_ENTRIES         (1)
+#endif
+
+typedef struct _MPI2_MANUFACTURING5_ENTRY
+{
+    U64                                 WWID;           /* 0x00 */
+    U64                                 DeviceName;     /* 0x08 */
+} MPI2_MANUFACTURING5_ENTRY, MPI2_POINTER PTR_MPI2_MANUFACTURING5_ENTRY,
+  Mpi2Manufacturing5Entry_t, MPI2_POINTER pMpi2Manufacturing5Entry_t;
+
+typedef struct _MPI2_CONFIG_PAGE_MAN_5
+{
+    MPI2_CONFIG_PAGE_HEADER             Header;         /* 0x00 */
+    U8                                  NumPhys;        /* 0x04 */
+    U8                                  Reserved1;      /* 0x05 */
+    U16                                 Reserved2;      /* 0x06 */
+    U32                                 Reserved3;      /* 0x08 */
+    U32                                 Reserved4;      /* 0x0C */
+    MPI2_MANUFACTURING5_ENTRY           Phy[MPI2_MAN_PAGE_5_PHY_ENTRIES];/* 0x08 */
+} MPI2_CONFIG_PAGE_MAN_5,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_MAN_5,
+  Mpi2ManufacturingPage5_t, MPI2_POINTER pMpi2ManufacturingPage5_t;
+
+#define MPI2_MANUFACTURING5_PAGEVERSION                 (0x03)
+
+
+/* Manufacturing Page 6 */
+
+typedef struct _MPI2_CONFIG_PAGE_MAN_6
+{
+    MPI2_CONFIG_PAGE_HEADER         Header;             /* 0x00 */
+    U32                             ProductSpecificInfo;/* 0x04 */
+} MPI2_CONFIG_PAGE_MAN_6,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_MAN_6,
+  Mpi2ManufacturingPage6_t, MPI2_POINTER pMpi2ManufacturingPage6_t;
+
+#define MPI2_MANUFACTURING6_PAGEVERSION                 (0x00)
+
+
+/* Manufacturing Page 7 */
+
+typedef struct _MPI2_MANPAGE7_CONNECTOR_INFO
+{
+    U32                         Pinout;                 /* 0x00 */
+    U8                          Connector[16];          /* 0x04 */
+    U8                          Location;               /* 0x14 */
+    U8                          Reserved1;              /* 0x15 */
+    U16                         Slot;                   /* 0x16 */
+    U32                         Reserved2;              /* 0x18 */
+} MPI2_MANPAGE7_CONNECTOR_INFO, MPI2_POINTER PTR_MPI2_MANPAGE7_CONNECTOR_INFO,
+  Mpi2ManPage7ConnectorInfo_t, MPI2_POINTER pMpi2ManPage7ConnectorInfo_t;
+
+/* defines for the Pinout field */
+#define MPI2_MANPAGE7_PINOUT_SFF_8484_L4                (0x00080000)
+#define MPI2_MANPAGE7_PINOUT_SFF_8484_L3                (0x00040000)
+#define MPI2_MANPAGE7_PINOUT_SFF_8484_L2                (0x00020000)
+#define MPI2_MANPAGE7_PINOUT_SFF_8484_L1                (0x00010000)
+#define MPI2_MANPAGE7_PINOUT_SFF_8470_L4                (0x00000800)
+#define MPI2_MANPAGE7_PINOUT_SFF_8470_L3                (0x00000400)
+#define MPI2_MANPAGE7_PINOUT_SFF_8470_L2                (0x00000200)
+#define MPI2_MANPAGE7_PINOUT_SFF_8470_L1                (0x00000100)
+#define MPI2_MANPAGE7_PINOUT_SFF_8482                   (0x00000002)
+#define MPI2_MANPAGE7_PINOUT_CONNECTION_UNKNOWN         (0x00000001)
+
+/* defines for the Location field */
+#define MPI2_MANPAGE7_LOCATION_UNKNOWN                  (0x01)
+#define MPI2_MANPAGE7_LOCATION_INTERNAL                 (0x02)
+#define MPI2_MANPAGE7_LOCATION_EXTERNAL                 (0x04)
+#define MPI2_MANPAGE7_LOCATION_SWITCHABLE               (0x08)
+#define MPI2_MANPAGE7_LOCATION_AUTO                     (0x10)
+#define MPI2_MANPAGE7_LOCATION_NOT_PRESENT              (0x20)
+#define MPI2_MANPAGE7_LOCATION_NOT_CONNECTED            (0x80)
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check NumPhys at runtime.
+ */
+#ifndef MPI2_MANPAGE7_CONNECTOR_INFO_MAX
+#define MPI2_MANPAGE7_CONNECTOR_INFO_MAX  (1)
+#endif
+
+typedef struct _MPI2_CONFIG_PAGE_MAN_7
+{
+    MPI2_CONFIG_PAGE_HEADER         Header;             /* 0x00 */
+    U32                             Reserved1;          /* 0x04 */
+    U32                             Reserved2;          /* 0x08 */
+    U32                             Flags;              /* 0x0C */
+    U8                              EnclosureName[16];  /* 0x10 */
+    U8                              NumPhys;            /* 0x20 */
+    U8                              Reserved3;          /* 0x21 */
+    U16                             Reserved4;          /* 0x22 */
+    MPI2_MANPAGE7_CONNECTOR_INFO    ConnectorInfo[MPI2_MANPAGE7_CONNECTOR_INFO_MAX]; /* 0x24 */
+} MPI2_CONFIG_PAGE_MAN_7,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_MAN_7,
+  Mpi2ManufacturingPage7_t, MPI2_POINTER pMpi2ManufacturingPage7_t;
+
+#define MPI2_MANUFACTURING7_PAGEVERSION                 (0x00)
+
+/* defines for the Flags field */
+#define MPI2_MANPAGE7_FLAG_USE_SLOT_INFO                (0x00000001)
+
+
+/*
+ * Generic structure to use for product-specific manufacturing pages
+ * (currently Manufacturing Page 8 through Manufacturing Page 31).
+ */
+
+typedef struct _MPI2_CONFIG_PAGE_MAN_PS
+{
+    MPI2_CONFIG_PAGE_HEADER         Header;             /* 0x00 */
+    U32                             ProductSpecificInfo;/* 0x04 */
+} MPI2_CONFIG_PAGE_MAN_PS,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_MAN_PS,
+  Mpi2ManufacturingPagePS_t, MPI2_POINTER pMpi2ManufacturingPagePS_t;
+
+#define MPI2_MANUFACTURING8_PAGEVERSION                 (0x00)
+#define MPI2_MANUFACTURING9_PAGEVERSION                 (0x00)
+#define MPI2_MANUFACTURING10_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING11_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING12_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING13_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING14_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING15_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING16_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING17_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING18_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING19_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING20_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING21_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING22_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING23_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING24_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING25_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING26_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING27_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING28_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING29_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING30_PAGEVERSION                (0x00)
+#define MPI2_MANUFACTURING31_PAGEVERSION                (0x00)
+
+
+/****************************************************************************
+*   IO Unit Config Pages
+****************************************************************************/
+
+/* IO Unit Page 0 */
+
+typedef struct _MPI2_CONFIG_PAGE_IO_UNIT_0
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U64                     UniqueValue;                /* 0x04 */
+    MPI2_VERSION_UNION      NvdataVersionDefault;       /* 0x08 */
+    MPI2_VERSION_UNION      NvdataVersionPersistent;    /* 0x0A */
+} MPI2_CONFIG_PAGE_IO_UNIT_0, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_IO_UNIT_0,
+  Mpi2IOUnitPage0_t, MPI2_POINTER pMpi2IOUnitPage0_t;
+
+#define MPI2_IOUNITPAGE0_PAGEVERSION                    (0x02)
+
+
+/* IO Unit Page 1 */
+
+typedef struct _MPI2_CONFIG_PAGE_IO_UNIT_1
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U32                     Flags;                      /* 0x04 */
+} MPI2_CONFIG_PAGE_IO_UNIT_1, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_IO_UNIT_1,
+  Mpi2IOUnitPage1_t, MPI2_POINTER pMpi2IOUnitPage1_t;
+
+#define MPI2_IOUNITPAGE1_PAGEVERSION                    (0x04)
+
+/* IO Unit Page 1 Flags defines */
+#define MPI2_IOUNITPAGE1_MASK_SATA_WRITE_CACHE          (0x00000600)
+#define MPI2_IOUNITPAGE1_ENABLE_SATA_WRITE_CACHE        (0x00000000)
+#define MPI2_IOUNITPAGE1_DISABLE_SATA_WRITE_CACHE       (0x00000200)
+#define MPI2_IOUNITPAGE1_UNCHANGED_SATA_WRITE_CACHE     (0x00000400)
+#define MPI2_IOUNITPAGE1_NATIVE_COMMAND_Q_DISABLE       (0x00000100)
+#define MPI2_IOUNITPAGE1_DISABLE_IR                     (0x00000040)
+#define MPI2_IOUNITPAGE1_DISABLE_TASK_SET_FULL_HANDLING (0x00000020)
+#define MPI2_IOUNITPAGE1_IR_USE_STATIC_VOLUME_ID        (0x00000004)
+#define MPI2_IOUNITPAGE1_MULTI_PATHING                  (0x00000002)
+#define MPI2_IOUNITPAGE1_SINGLE_PATHING                 (0x00000000)
+
+
+/* IO Unit Page 3 */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.PageLength at runtime.
+ */
+#ifndef MPI2_IO_UNIT_PAGE_3_GPIO_VAL_MAX
+#define MPI2_IO_UNIT_PAGE_3_GPIO_VAL_MAX    (1)
+#endif
+
+typedef struct _MPI2_CONFIG_PAGE_IO_UNIT_3
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                                   /* 0x00 */
+    U8                      GPIOCount;                                /* 0x04 */
+    U8                      Reserved1;                                /* 0x05 */
+    U16                     Reserved2;                                /* 0x06 */
+    U16                     GPIOVal[MPI2_IO_UNIT_PAGE_3_GPIO_VAL_MAX];/* 0x08 */
+} MPI2_CONFIG_PAGE_IO_UNIT_3, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_IO_UNIT_3,
+  Mpi2IOUnitPage3_t, MPI2_POINTER pMpi2IOUnitPage3_t;
+
+#define MPI2_IOUNITPAGE3_PAGEVERSION                    (0x01)
+
+/* defines for IO Unit Page 3 GPIOVal field */
+#define MPI2_IOUNITPAGE3_GPIO_FUNCTION_MASK             (0xFFFC)
+#define MPI2_IOUNITPAGE3_GPIO_FUNCTION_SHIFT            (2)
+#define MPI2_IOUNITPAGE3_GPIO_SETTING_OFF               (0x0000)
+#define MPI2_IOUNITPAGE3_GPIO_SETTING_ON                (0x0001)
+
+
+/****************************************************************************
+*   IOC Config Pages
+****************************************************************************/
+
+/* IOC Page 0 */
+
+typedef struct _MPI2_CONFIG_PAGE_IOC_0
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U32                     Reserved1;                  /* 0x04 */
+    U32                     Reserved2;                  /* 0x08 */
+    U16                     VendorID;                   /* 0x0C */
+    U16                     DeviceID;                   /* 0x0E */
+    U8                      RevisionID;                 /* 0x10 */
+    U8                      Reserved3;                  /* 0x11 */
+    U16                     Reserved4;                  /* 0x12 */
+    U32                     ClassCode;                  /* 0x14 */
+    U16                     SubsystemVendorID;          /* 0x18 */
+    U16                     SubsystemID;                /* 0x1A */
+} MPI2_CONFIG_PAGE_IOC_0, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_IOC_0,
+  Mpi2IOCPage0_t, MPI2_POINTER pMpi2IOCPage0_t;
+
+#define MPI2_IOCPAGE0_PAGEVERSION                       (0x02)
+
+
+/* IOC Page 1 */
+
+typedef struct _MPI2_CONFIG_PAGE_IOC_1
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U32                     Flags;                      /* 0x04 */
+    U32                     CoalescingTimeout;          /* 0x08 */
+    U8                      CoalescingDepth;            /* 0x0C */
+    U8                      PCISlotNum;                 /* 0x0D */
+    U8                      PCIBusNum;                  /* 0x0E */
+    U8                      PCIDomainSegment;           /* 0x0F */
+    U32                     Reserved1;                  /* 0x10 */
+    U32                     Reserved2;                  /* 0x14 */
+} MPI2_CONFIG_PAGE_IOC_1, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_IOC_1,
+  Mpi2IOCPage1_t, MPI2_POINTER pMpi2IOCPage1_t;
+
+#define MPI2_IOCPAGE1_PAGEVERSION                       (0x05)
+
+/* defines for IOC Page 1 Flags field */
+#define MPI2_IOCPAGE1_REPLY_COALESCING                  (0x00000001)
+
+#define MPI2_IOCPAGE1_PCISLOTNUM_UNKNOWN                (0xFF)
+#define MPI2_IOCPAGE1_PCIBUSNUM_UNKNOWN                 (0xFF)
+#define MPI2_IOCPAGE1_PCIDOMAIN_UNKNOWN                 (0xFF)
+
+/* IOC Page 6 */
+
+typedef struct _MPI2_CONFIG_PAGE_IOC_6
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                         /* 0x00 */
+    U32                     CapabilitiesFlags;              /* 0x04 */
+    U8                      MaxDrivesRAID0;                 /* 0x08 */
+    U8                      MaxDrivesRAID1;                 /* 0x09 */
+    U8                      MaxDrivesRAID1E;                /* 0x0A */
+    U8                      MaxDrivesRAID10;                /* 0x0B */
+    U8                      MinDrivesRAID0;                 /* 0x0C */
+    U8                      MinDrivesRAID1;                 /* 0x0D */
+    U8                      MinDrivesRAID1E;                /* 0x0E */
+    U8                      MinDrivesRAID10;                /* 0x0F */
+    U32                     Reserved1;                      /* 0x10 */
+    U8                      MaxGlobalHotSpares;             /* 0x14 */
+    U8                      MaxPhysDisks;                   /* 0x15 */
+    U8                      MaxVolumes;                     /* 0x16 */
+    U8                      MaxConfigs;                     /* 0x17 */
+    U8                      MaxOCEDisks;                    /* 0x18 */
+    U8                      Reserved2;                      /* 0x19 */
+    U16                     Reserved3;                      /* 0x1A */
+    U32                     SupportedStripeSizeMapRAID0;    /* 0x1C */
+    U32                     SupportedStripeSizeMapRAID1E;   /* 0x20 */
+    U32                     SupportedStripeSizeMapRAID10;   /* 0x24 */
+    U32                     Reserved4;                      /* 0x28 */
+    U32                     Reserved5;                      /* 0x2C */
+    U16                     DefaultMetadataSize;            /* 0x30 */
+    U16                     Reserved6;                      /* 0x32 */
+    U16                     MaxBadBlockTableEntries;        /* 0x34 */
+    U16                     Reserved7;                      /* 0x36 */
+    U32                     IRNvsramVersion;                /* 0x38 */
+} MPI2_CONFIG_PAGE_IOC_6, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_IOC_6,
+  Mpi2IOCPage6_t, MPI2_POINTER pMpi2IOCPage6_t;
+
+#define MPI2_IOCPAGE6_PAGEVERSION                       (0x04)
+
+/* defines for IOC Page 6 CapabilitiesFlags */
+#define MPI2_IOCPAGE6_CAP_FLAGS_RAID10_SUPPORT          (0x00000010)
+#define MPI2_IOCPAGE6_CAP_FLAGS_RAID1_SUPPORT           (0x00000008)
+#define MPI2_IOCPAGE6_CAP_FLAGS_RAID1E_SUPPORT          (0x00000004)
+#define MPI2_IOCPAGE6_CAP_FLAGS_RAID0_SUPPORT           (0x00000002)
+#define MPI2_IOCPAGE6_CAP_FLAGS_GLOBAL_HOT_SPARE        (0x00000001)
+
+
+/* IOC Page 7 */
+
+#define MPI2_IOCPAGE7_EVENTMASK_WORDS       (4)
+
+typedef struct _MPI2_CONFIG_PAGE_IOC_7
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U32                     Reserved1;                  /* 0x04 */
+    U32                     EventMasks[MPI2_IOCPAGE7_EVENTMASK_WORDS];/* 0x08 */
+    U16                     SASBroadcastPrimitiveMasks; /* 0x18 */
+    U16                     Reserved2;                  /* 0x1A */
+    U32                     Reserved3;                  /* 0x1C */
+} MPI2_CONFIG_PAGE_IOC_7, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_IOC_7,
+  Mpi2IOCPage7_t, MPI2_POINTER pMpi2IOCPage7_t;
+
+#define MPI2_IOCPAGE7_PAGEVERSION                       (0x01)
+
+
+/* IOC Page 8 */
+
+typedef struct _MPI2_CONFIG_PAGE_IOC_8
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U8                      NumDevsPerEnclosure;        /* 0x04 */
+    U8                      Reserved1;                  /* 0x05 */
+    U16                     Reserved2;                  /* 0x06 */
+    U16                     MaxPersistentEntries;       /* 0x08 */
+    U16                     MaxNumPhysicalMappedIDs;    /* 0x0A */
+    U16                     Flags;                      /* 0x0C */
+    U16                     Reserved3;                  /* 0x0E */
+    U16                     IRVolumeMappingFlags;       /* 0x10 */
+    U16                     Reserved4;                  /* 0x12 */
+    U32                     Reserved5;                  /* 0x14 */
+} MPI2_CONFIG_PAGE_IOC_8, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_IOC_8,
+  Mpi2IOCPage8_t, MPI2_POINTER pMpi2IOCPage8_t;
+
+#define MPI2_IOCPAGE8_PAGEVERSION                       (0x00)
+
+/* defines for IOC Page 8 Flags field */
+#define MPI2_IOCPAGE8_FLAGS_DA_START_SLOT_1             (0x00000020)
+#define MPI2_IOCPAGE8_FLAGS_RESERVED_TARGETID_0         (0x00000010)
+
+#define MPI2_IOCPAGE8_FLAGS_MASK_MAPPING_MODE           (0x0000000E)
+#define MPI2_IOCPAGE8_FLAGS_DEVICE_PERSISTENCE_MAPPING  (0x00000000)
+#define MPI2_IOCPAGE8_FLAGS_ENCLOSURE_SLOT_MAPPING      (0x00000002)
+
+#define MPI2_IOCPAGE8_FLAGS_DISABLE_PERSISTENT_MAPPING  (0x00000001)
+#define MPI2_IOCPAGE8_FLAGS_ENABLE_PERSISTENT_MAPPING   (0x00000000)
+
+/* defines for IOC Page 8 IRVolumeMappingFlags */
+#define MPI2_IOCPAGE8_IRFLAGS_MASK_VOLUME_MAPPING_MODE  (0x00000003)
+#define MPI2_IOCPAGE8_IRFLAGS_LOW_VOLUME_MAPPING        (0x00000000)
+#define MPI2_IOCPAGE8_IRFLAGS_HIGH_VOLUME_MAPPING       (0x00000001)
+
+
+/****************************************************************************
+*   BIOS Config Pages
+****************************************************************************/
+
+/* BIOS Page 1 */
+
+typedef struct _MPI2_CONFIG_PAGE_BIOS_1
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U32                     BiosOptions;                /* 0x04 */
+    U32                     IOCSettings;                /* 0x08 */
+    U32                     Reserved1;                  /* 0x0C */
+    U32                     DeviceSettings;             /* 0x10 */
+    U16                     NumberOfDevices;            /* 0x14 */
+    U16                     Reserved2;                  /* 0x16 */
+    U16                     IOTimeoutBlockDevicesNonRM; /* 0x18 */
+    U16                     IOTimeoutSequential;        /* 0x1A */
+    U16                     IOTimeoutOther;             /* 0x1C */
+    U16                     IOTimeoutBlockDevicesRM;    /* 0x1E */
+} MPI2_CONFIG_PAGE_BIOS_1, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_BIOS_1,
+  Mpi2BiosPage1_t, MPI2_POINTER pMpi2BiosPage1_t;
+
+#define MPI2_BIOSPAGE1_PAGEVERSION                      (0x04)
+
+/* values for BIOS Page 1 BiosOptions field */
+#define MPI2_BIOSPAGE1_OPTIONS_DISABLE_BIOS             (0x00000001)
+
+/* values for BIOS Page 1 IOCSettings field */
+#define MPI2_BIOSPAGE1_IOCSET_MASK_BOOT_PREFERENCE      (0x00030000)
+#define MPI2_BIOSPAGE1_IOCSET_ENCLOSURE_SLOT_BOOT       (0x00000000)
+#define MPI2_BIOSPAGE1_IOCSET_SAS_ADDRESS_BOOT          (0x00010000)
+
+#define MPI2_BIOSPAGE1_IOCSET_MASK_RM_SETTING           (0x000000C0)
+#define MPI2_BIOSPAGE1_IOCSET_NONE_RM_SETTING           (0x00000000)
+#define MPI2_BIOSPAGE1_IOCSET_BOOT_RM_SETTING           (0x00000040)
+#define MPI2_BIOSPAGE1_IOCSET_MEDIA_RM_SETTING          (0x00000080)
+
+#define MPI2_BIOSPAGE1_IOCSET_MASK_ADAPTER_SUPPORT      (0x00000030)
+#define MPI2_BIOSPAGE1_IOCSET_NO_SUPPORT                (0x00000000)
+#define MPI2_BIOSPAGE1_IOCSET_BIOS_SUPPORT              (0x00000010)
+#define MPI2_BIOSPAGE1_IOCSET_OS_SUPPORT                (0x00000020)
+#define MPI2_BIOSPAGE1_IOCSET_ALL_SUPPORT               (0x00000030)
+
+#define MPI2_BIOSPAGE1_IOCSET_ALTERNATE_CHS             (0x00000008)
+
+/* values for BIOS Page 1 DeviceSettings field */
+#define MPI2_BIOSPAGE1_DEVSET_DISABLE_SMART_POLLING     (0x00000010)
+#define MPI2_BIOSPAGE1_DEVSET_DISABLE_SEQ_LUN           (0x00000008)
+#define MPI2_BIOSPAGE1_DEVSET_DISABLE_RM_LUN            (0x00000004)
+#define MPI2_BIOSPAGE1_DEVSET_DISABLE_NON_RM_LUN        (0x00000002)
+#define MPI2_BIOSPAGE1_DEVSET_DISABLE_OTHER_LUN         (0x00000001)
+
+
+/* BIOS Page 2 */
+
+typedef struct _MPI2_BOOT_DEVICE_ADAPTER_ORDER
+{
+    U32         Reserved1;                              /* 0x00 */
+    U32         Reserved2;                              /* 0x04 */
+    U32         Reserved3;                              /* 0x08 */
+    U32         Reserved4;                              /* 0x0C */
+    U32         Reserved5;                              /* 0x10 */
+    U32         Reserved6;                              /* 0x14 */
+} MPI2_BOOT_DEVICE_ADAPTER_ORDER,
+  MPI2_POINTER PTR_MPI2_BOOT_DEVICE_ADAPTER_ORDER,
+  Mpi2BootDeviceAdapterOrder_t, MPI2_POINTER pMpi2BootDeviceAdapterOrder_t;
+
+typedef struct _MPI2_BOOT_DEVICE_SAS_WWID
+{
+    U64         SASAddress;                             /* 0x00 */
+    U8          LUN[8];                                 /* 0x08 */
+    U32         Reserved1;                              /* 0x10 */
+    U32         Reserved2;                              /* 0x14 */
+} MPI2_BOOT_DEVICE_SAS_WWID, MPI2_POINTER PTR_MPI2_BOOT_DEVICE_SAS_WWID,
+  Mpi2BootDeviceSasWwid_t, MPI2_POINTER pMpi2BootDeviceSasWwid_t;
+
+typedef struct _MPI2_BOOT_DEVICE_ENCLOSURE_SLOT
+{
+    U64         EnclosureLogicalID;                     /* 0x00 */
+    U32         Reserved1;                              /* 0x08 */
+    U32         Reserved2;                              /* 0x0C */
+    U16         SlotNumber;                             /* 0x10 */
+    U16         Reserved3;                              /* 0x12 */
+    U32         Reserved4;                              /* 0x14 */
+} MPI2_BOOT_DEVICE_ENCLOSURE_SLOT,
+  MPI2_POINTER PTR_MPI2_BOOT_DEVICE_ENCLOSURE_SLOT,
+  Mpi2BootDeviceEnclosureSlot_t, MPI2_POINTER pMpi2BootDeviceEnclosureSlot_t;
+
+typedef struct _MPI2_BOOT_DEVICE_DEVICE_NAME
+{
+    U64         DeviceName;                             /* 0x00 */
+    U8          LUN[8];                                 /* 0x08 */
+    U32         Reserved1;                              /* 0x10 */
+    U32         Reserved2;                              /* 0x14 */
+} MPI2_BOOT_DEVICE_DEVICE_NAME, MPI2_POINTER PTR_MPI2_BOOT_DEVICE_DEVICE_NAME,
+  Mpi2BootDeviceDeviceName_t, MPI2_POINTER pMpi2BootDeviceDeviceName_t;
+
+typedef union _MPI2_MPI2_BIOSPAGE2_BOOT_DEVICE
+{
+    MPI2_BOOT_DEVICE_ADAPTER_ORDER  AdapterOrder;
+    MPI2_BOOT_DEVICE_SAS_WWID       SasWwid;
+    MPI2_BOOT_DEVICE_ENCLOSURE_SLOT EnclosureSlot;
+    MPI2_BOOT_DEVICE_DEVICE_NAME    DeviceName;
+} MPI2_BIOSPAGE2_BOOT_DEVICE, MPI2_POINTER PTR_MPI2_BIOSPAGE2_BOOT_DEVICE,
+  Mpi2BiosPage2BootDevice_t, MPI2_POINTER pMpi2BiosPage2BootDevice_t;
+
+typedef struct _MPI2_CONFIG_PAGE_BIOS_2
+{
+    MPI2_CONFIG_PAGE_HEADER     Header;                 /* 0x00 */
+    U32                         Reserved1;              /* 0x04 */
+    U32                         Reserved2;              /* 0x08 */
+    U32                         Reserved3;              /* 0x0C */
+    U32                         Reserved4;              /* 0x10 */
+    U32                         Reserved5;              /* 0x14 */
+    U32                         Reserved6;              /* 0x18 */
+    U8                          ReqBootDeviceForm;      /* 0x1C */
+    U8                          Reserved7;              /* 0x1D */
+    U16                         Reserved8;              /* 0x1E */
+    MPI2_BIOSPAGE2_BOOT_DEVICE  RequestedBootDevice;    /* 0x20 */
+    U8                          ReqAltBootDeviceForm;   /* 0x38 */
+    U8                          Reserved9;              /* 0x39 */
+    U16                         Reserved10;             /* 0x3A */
+    MPI2_BIOSPAGE2_BOOT_DEVICE  RequestedAltBootDevice; /* 0x3C */
+    U8                          CurrentBootDeviceForm;  /* 0x58 */
+    U8                          Reserved11;             /* 0x59 */
+    U16                         Reserved12;             /* 0x5A */
+    MPI2_BIOSPAGE2_BOOT_DEVICE  CurrentBootDevice;      /* 0x58 */
+} MPI2_CONFIG_PAGE_BIOS_2, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_BIOS_2,
+  Mpi2BiosPage2_t, MPI2_POINTER pMpi2BiosPage2_t;
+
+#define MPI2_BIOSPAGE2_PAGEVERSION                      (0x04)
+
+/* values for BIOS Page 2 BootDeviceForm fields */
+#define MPI2_BIOSPAGE2_FORM_MASK                        (0x0F)
+#define MPI2_BIOSPAGE2_FORM_NO_DEVICE_SPECIFIED         (0x00)
+#define MPI2_BIOSPAGE2_FORM_SAS_WWID                    (0x05)
+#define MPI2_BIOSPAGE2_FORM_ENCLOSURE_SLOT              (0x06)
+#define MPI2_BIOSPAGE2_FORM_DEVICE_NAME                 (0x07)
+
+
+/* BIOS Page 3 */
+
+typedef struct _MPI2_ADAPTER_INFO
+{
+    U8      PciBusNumber;                               /* 0x00 */
+    U8      PciDeviceAndFunctionNumber;                 /* 0x01 */
+    U16     AdapterFlags;                               /* 0x02 */
+} MPI2_ADAPTER_INFO, MPI2_POINTER PTR_MPI2_ADAPTER_INFO,
+  Mpi2AdapterInfo_t, MPI2_POINTER pMpi2AdapterInfo_t;
+
+#define MPI2_ADAPTER_INFO_FLAGS_EMBEDDED                (0x0001)
+#define MPI2_ADAPTER_INFO_FLAGS_INIT_STATUS             (0x0002)
+
+typedef struct _MPI2_CONFIG_PAGE_BIOS_3
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U32                     GlobalFlags;                /* 0x04 */
+    U32                     BiosVersion;                /* 0x08 */
+    MPI2_ADAPTER_INFO       AdapterOrder[4];            /* 0x0C */
+    U32                     Reserved1;                  /* 0x1C */
+} MPI2_CONFIG_PAGE_BIOS_3, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_BIOS_3,
+  Mpi2BiosPage3_t, MPI2_POINTER pMpi2BiosPage3_t;
+
+#define MPI2_BIOSPAGE3_PAGEVERSION                      (0x00)
+
+/* values for BIOS Page 3 GlobalFlags */
+#define MPI2_BIOSPAGE3_FLAGS_PAUSE_ON_ERROR             (0x00000002)
+#define MPI2_BIOSPAGE3_FLAGS_VERBOSE_ENABLE             (0x00000004)
+#define MPI2_BIOSPAGE3_FLAGS_HOOK_INT_40_DISABLE        (0x00000010)
+
+#define MPI2_BIOSPAGE3_FLAGS_DEV_LIST_DISPLAY_MASK      (0x000000E0)
+#define MPI2_BIOSPAGE3_FLAGS_INSTALLED_DEV_DISPLAY      (0x00000000)
+#define MPI2_BIOSPAGE3_FLAGS_ADAPTER_DISPLAY            (0x00000020)
+#define MPI2_BIOSPAGE3_FLAGS_ADAPTER_DEV_DISPLAY        (0x00000040)
+
+
+/* BIOS Page 4 */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.PageLength or NumPhys at runtime.
+ */
+#ifndef MPI2_BIOS_PAGE_4_PHY_ENTRIES
+#define MPI2_BIOS_PAGE_4_PHY_ENTRIES        (1)
+#endif
+
+typedef struct _MPI2_BIOS4_ENTRY
+{
+    U64                     ReassignmentWWID;       /* 0x00 */
+    U64                     ReassignmentDeviceName; /* 0x08 */
+} MPI2_BIOS4_ENTRY, MPI2_POINTER PTR_MPI2_BIOS4_ENTRY,
+  Mpi2MBios4Entry_t, MPI2_POINTER pMpi2Bios4Entry_t;
+
+typedef struct _MPI2_CONFIG_PAGE_BIOS_4
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                             /* 0x00 */
+    U8                      NumPhys;                            /* 0x04 */
+    U8                      Reserved1;                          /* 0x05 */
+    U16                     Reserved2;                          /* 0x06 */
+    MPI2_BIOS4_ENTRY        Phy[MPI2_BIOS_PAGE_4_PHY_ENTRIES];  /* 0x08 */
+} MPI2_CONFIG_PAGE_BIOS_4, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_BIOS_4,
+  Mpi2BiosPage4_t, MPI2_POINTER pMpi2BiosPage4_t;
+
+#define MPI2_BIOSPAGE4_PAGEVERSION                      (0x01)
+
+
+/****************************************************************************
+*   RAID Volume Config Pages
+****************************************************************************/
+
+/* RAID Volume Page 0 */
+
+typedef struct _MPI2_RAIDVOL0_PHYS_DISK
+{
+    U8                      RAIDSetNum;                 /* 0x00 */
+    U8                      PhysDiskMap;                /* 0x01 */
+    U8                      PhysDiskNum;                /* 0x02 */
+    U8                      Reserved;                   /* 0x03 */
+} MPI2_RAIDVOL0_PHYS_DISK, MPI2_POINTER PTR_MPI2_RAIDVOL0_PHYS_DISK,
+  Mpi2RaidVol0PhysDisk_t, MPI2_POINTER pMpi2RaidVol0PhysDisk_t;
+
+/* defines for the PhysDiskMap field */
+#define MPI2_RAIDVOL0_PHYSDISK_PRIMARY                  (0x01)
+#define MPI2_RAIDVOL0_PHYSDISK_SECONDARY                (0x02)
+
+typedef struct _MPI2_RAIDVOL0_SETTINGS
+{
+    U16                     Settings;                   /* 0x00 */
+    U8                      HotSparePool;               /* 0x01 */
+    U8                      Reserved;                   /* 0x02 */
+} MPI2_RAIDVOL0_SETTINGS, MPI2_POINTER PTR_MPI2_RAIDVOL0_SETTINGS,
+  Mpi2RaidVol0Settings_t, MPI2_POINTER pMpi2RaidVol0Settings_t;
+
+/* RAID Volume Page 0 HotSparePool defines, also used in RAID Physical Disk */
+#define MPI2_RAID_HOT_SPARE_POOL_0                      (0x01)
+#define MPI2_RAID_HOT_SPARE_POOL_1                      (0x02)
+#define MPI2_RAID_HOT_SPARE_POOL_2                      (0x04)
+#define MPI2_RAID_HOT_SPARE_POOL_3                      (0x08)
+#define MPI2_RAID_HOT_SPARE_POOL_4                      (0x10)
+#define MPI2_RAID_HOT_SPARE_POOL_5                      (0x20)
+#define MPI2_RAID_HOT_SPARE_POOL_6                      (0x40)
+#define MPI2_RAID_HOT_SPARE_POOL_7                      (0x80)
+
+/* RAID Volume Page 0 VolumeSettings defines */
+#define MPI2_RAIDVOL0_SETTING_USE_PRODUCT_ID_SUFFIX     (0x0008)
+#define MPI2_RAIDVOL0_SETTING_AUTO_CONFIG_HSWAP_DISABLE (0x0004)
+
+#define MPI2_RAIDVOL0_SETTING_MASK_WRITE_CACHING        (0x0003)
+#define MPI2_RAIDVOL0_SETTING_UNCHANGED                 (0x0000)
+#define MPI2_RAIDVOL0_SETTING_DISABLE_WRITE_CACHING     (0x0001)
+#define MPI2_RAIDVOL0_SETTING_ENABLE_WRITE_CACHING      (0x0002)
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.PageLength at runtime.
+ */
+#ifndef MPI2_RAID_VOL_PAGE_0_PHYSDISK_MAX
+#define MPI2_RAID_VOL_PAGE_0_PHYSDISK_MAX       (1)
+#endif
+
+typedef struct _MPI2_CONFIG_PAGE_RAID_VOL_0
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U16                     DevHandle;                  /* 0x04 */
+    U8                      VolumeState;                /* 0x06 */
+    U8                      VolumeType;                 /* 0x07 */
+    U32                     VolumeStatusFlags;          /* 0x08 */
+    MPI2_RAIDVOL0_SETTINGS  VolumeSettings;             /* 0x0C */
+    U64                     MaxLBA;                     /* 0x10 */
+    U32                     StripeSize;                 /* 0x18 */
+    U16                     BlockSize;                  /* 0x1C */
+    U16                     Reserved1;                  /* 0x1E */
+    U8                      SupportedPhysDisks;         /* 0x20 */
+    U8                      ResyncRate;                 /* 0x21 */
+    U16                     DataScrubDuration;          /* 0x22 */
+    U8                      NumPhysDisks;               /* 0x24 */
+    U8                      Reserved2;                  /* 0x25 */
+    U8                      Reserved3;                  /* 0x26 */
+    U8                      InactiveStatus;             /* 0x27 */
+    MPI2_RAIDVOL0_PHYS_DISK PhysDisk[MPI2_RAID_VOL_PAGE_0_PHYSDISK_MAX]; /* 0x28 */
+} MPI2_CONFIG_PAGE_RAID_VOL_0, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_RAID_VOL_0,
+  Mpi2RaidVolPage0_t, MPI2_POINTER pMpi2RaidVolPage0_t;
+
+#define MPI2_RAIDVOLPAGE0_PAGEVERSION           (0x0A)
+
+/* values for RAID VolumeState */
+#define MPI2_RAID_VOL_STATE_MISSING                         (0x00)
+#define MPI2_RAID_VOL_STATE_FAILED                          (0x01)
+#define MPI2_RAID_VOL_STATE_INITIALIZING                    (0x02)
+#define MPI2_RAID_VOL_STATE_ONLINE                          (0x03)
+#define MPI2_RAID_VOL_STATE_DEGRADED                        (0x04)
+#define MPI2_RAID_VOL_STATE_OPTIMAL                         (0x05)
+
+/* values for RAID VolumeType */
+#define MPI2_RAID_VOL_TYPE_RAID0                            (0x00)
+#define MPI2_RAID_VOL_TYPE_RAID1E                           (0x01)
+#define MPI2_RAID_VOL_TYPE_RAID1                            (0x02)
+#define MPI2_RAID_VOL_TYPE_RAID10                           (0x05)
+#define MPI2_RAID_VOL_TYPE_UNKNOWN                          (0xFF)
+
+/* values for RAID Volume Page 0 VolumeStatusFlags field */
+#define MPI2_RAIDVOL0_STATUS_FLAG_PENDING_RESYNC            (0x02000000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_BACKG_INIT_PENDING        (0x01000000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_MDC_PENDING               (0x00800000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_USER_CONSIST_PENDING      (0x00400000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_MAKE_DATA_CONSISTENT      (0x00200000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_DATA_SCRUB                (0x00100000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_CONSISTENCY_CHECK         (0x00080000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_CAPACITY_EXPANSION        (0x00040000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_BACKGROUND_INIT           (0x00020000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_RESYNC_IN_PROGRESS        (0x00010000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_OCE_ALLOWED               (0x00000040)
+#define MPI2_RAIDVOL0_STATUS_FLAG_BGI_COMPLETE              (0x00000020)
+#define MPI2_RAIDVOL0_STATUS_FLAG_1E_OFFSET_MIRROR          (0x00000000)
+#define MPI2_RAIDVOL0_STATUS_FLAG_1E_ADJACENT_MIRROR        (0x00000010)
+#define MPI2_RAIDVOL0_STATUS_FLAG_BAD_BLOCK_TABLE_FULL      (0x00000008)
+#define MPI2_RAIDVOL0_STATUS_FLAG_VOLUME_INACTIVE           (0x00000004)
+#define MPI2_RAIDVOL0_STATUS_FLAG_QUIESCED                  (0x00000002)
+#define MPI2_RAIDVOL0_STATUS_FLAG_ENABLED                   (0x00000001)
+
+/* values for RAID Volume Page 0 SupportedPhysDisks field */
+#define MPI2_RAIDVOL0_SUPPORT_SOLID_STATE_DISKS             (0x08)
+#define MPI2_RAIDVOL0_SUPPORT_HARD_DISKS                    (0x04)
+#define MPI2_RAIDVOL0_SUPPORT_SAS_PROTOCOL                  (0x02)
+#define MPI2_RAIDVOL0_SUPPORT_SATA_PROTOCOL                 (0x01)
+
+/* values for RAID Volume Page 0 InactiveStatus field */
+#define MPI2_RAIDVOLPAGE0_UNKNOWN_INACTIVE                  (0x00)
+#define MPI2_RAIDVOLPAGE0_STALE_METADATA_INACTIVE           (0x01)
+#define MPI2_RAIDVOLPAGE0_FOREIGN_VOLUME_INACTIVE           (0x02)
+#define MPI2_RAIDVOLPAGE0_INSUFFICIENT_RESOURCE_INACTIVE    (0x03)
+#define MPI2_RAIDVOLPAGE0_CLONE_VOLUME_INACTIVE             (0x04)
+#define MPI2_RAIDVOLPAGE0_INSUFFICIENT_METADATA_INACTIVE    (0x05)
+#define MPI2_RAIDVOLPAGE0_PREVIOUSLY_DELETED                (0x06)
+
+
+/* RAID Volume Page 1 */
+
+typedef struct _MPI2_CONFIG_PAGE_RAID_VOL_1
+{
+    MPI2_CONFIG_PAGE_HEADER Header;                     /* 0x00 */
+    U16                     DevHandle;                  /* 0x04 */
+    U16                     Reserved0;                  /* 0x06 */
+    U8                      GUID[24];                   /* 0x08 */
+    U8                      Name[16];                   /* 0x20 */
+    U64                     WWID;                       /* 0x30 */
+    U32                     Reserved1;                  /* 0x38 */
+    U32                     Reserved2;                  /* 0x3C */
+} MPI2_CONFIG_PAGE_RAID_VOL_1, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_RAID_VOL_1,
+  Mpi2RaidVolPage1_t, MPI2_POINTER pMpi2RaidVolPage1_t;
+
+#define MPI2_RAIDVOLPAGE1_PAGEVERSION           (0x03)
+
+
+/****************************************************************************
+*   RAID Physical Disk Config Pages
+****************************************************************************/
+
+/* RAID Physical Disk Page 0 */
+
+typedef struct _MPI2_RAIDPHYSDISK0_SETTINGS
+{
+    U16                     Reserved1;                  /* 0x00 */
+    U8                      HotSparePool;               /* 0x02 */
+    U8                      Reserved2;                  /* 0x03 */
+} MPI2_RAIDPHYSDISK0_SETTINGS, MPI2_POINTER PTR_MPI2_RAIDPHYSDISK0_SETTINGS,
+  Mpi2RaidPhysDisk0Settings_t, MPI2_POINTER pMpi2RaidPhysDisk0Settings_t;
+
+/* use MPI2_RAID_HOT_SPARE_POOL_ defines for the HotSparePool field */
+
+typedef struct _MPI2_RAIDPHYSDISK0_INQUIRY_DATA
+{
+    U8                      VendorID[8];                /* 0x00 */
+    U8                      ProductID[16];              /* 0x08 */
+    U8                      ProductRevLevel[4];         /* 0x18 */
+    U8                      SerialNum[32];              /* 0x1C */
+} MPI2_RAIDPHYSDISK0_INQUIRY_DATA,
+  MPI2_POINTER PTR_MPI2_RAIDPHYSDISK0_INQUIRY_DATA,
+  Mpi2RaidPhysDisk0InquiryData_t, MPI2_POINTER pMpi2RaidPhysDisk0InquiryData_t;
+
+typedef struct _MPI2_CONFIG_PAGE_RD_PDISK_0
+{
+    MPI2_CONFIG_PAGE_HEADER         Header;                     /* 0x00 */
+    U16                             DevHandle;                  /* 0x04 */
+    U8                              Reserved1;                  /* 0x06 */
+    U8                              PhysDiskNum;                /* 0x07 */
+    MPI2_RAIDPHYSDISK0_SETTINGS     PhysDiskSettings;           /* 0x08 */
+    U32                             Reserved2;                  /* 0x0C */
+    MPI2_RAIDPHYSDISK0_INQUIRY_DATA InquiryData;                /* 0x10 */
+    U32                             Reserved3;                  /* 0x4C */
+    U8                              PhysDiskState;              /* 0x50 */
+    U8                              OfflineReason;              /* 0x51 */
+    U8                              IncompatibleReason;         /* 0x52 */
+    U8                              PhysDiskAttributes;         /* 0x53 */
+    U32                             PhysDiskStatusFlags;        /* 0x54 */
+    U64                             DeviceMaxLBA;               /* 0x58 */
+    U64                             HostMaxLBA;                 /* 0x60 */
+    U64                             CoercedMaxLBA;              /* 0x68 */
+    U16                             BlockSize;                  /* 0x70 */
+    U16                             Reserved5;                  /* 0x72 */
+    U32                             Reserved6;                  /* 0x74 */
+} MPI2_CONFIG_PAGE_RD_PDISK_0,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_RD_PDISK_0,
+  Mpi2RaidPhysDiskPage0_t, MPI2_POINTER pMpi2RaidPhysDiskPage0_t;
+
+#define MPI2_RAIDPHYSDISKPAGE0_PAGEVERSION          (0x05)
+
+/* PhysDiskState defines */
+#define MPI2_RAID_PD_STATE_NOT_CONFIGURED               (0x00)
+#define MPI2_RAID_PD_STATE_NOT_COMPATIBLE               (0x01)
+#define MPI2_RAID_PD_STATE_OFFLINE                      (0x02)
+#define MPI2_RAID_PD_STATE_ONLINE                       (0x03)
+#define MPI2_RAID_PD_STATE_HOT_SPARE                    (0x04)
+#define MPI2_RAID_PD_STATE_DEGRADED                     (0x05)
+#define MPI2_RAID_PD_STATE_REBUILDING                   (0x06)
+#define MPI2_RAID_PD_STATE_OPTIMAL                      (0x07)
+
+/* OfflineReason defines */
+#define MPI2_PHYSDISK0_ONLINE                           (0x00)
+#define MPI2_PHYSDISK0_OFFLINE_MISSING                  (0x01)
+#define MPI2_PHYSDISK0_OFFLINE_FAILED                   (0x03)
+#define MPI2_PHYSDISK0_OFFLINE_INITIALIZING             (0x04)
+#define MPI2_PHYSDISK0_OFFLINE_REQUESTED                (0x05)
+#define MPI2_PHYSDISK0_OFFLINE_FAILED_REQUESTED         (0x06)
+#define MPI2_PHYSDISK0_OFFLINE_OTHER                    (0xFF)
+
+/* IncompatibleReason defines */
+#define MPI2_PHYSDISK0_COMPATIBLE                       (0x00)
+#define MPI2_PHYSDISK0_INCOMPATIBLE_PROTOCOL            (0x01)
+#define MPI2_PHYSDISK0_INCOMPATIBLE_BLOCKSIZE           (0x02)
+#define MPI2_PHYSDISK0_INCOMPATIBLE_MAX_LBA             (0x03)
+#define MPI2_PHYSDISK0_INCOMPATIBLE_SATA_EXTENDED_CMD   (0x04)
+#define MPI2_PHYSDISK0_INCOMPATIBLE_REMOVEABLE_MEDIA    (0x05)
+#define MPI2_PHYSDISK0_INCOMPATIBLE_UNKNOWN             (0xFF)
+
+/* PhysDiskAttributes defines */
+#define MPI2_PHYSDISK0_ATTRIB_SOLID_STATE_DRIVE         (0x08)
+#define MPI2_PHYSDISK0_ATTRIB_HARD_DISK_DRIVE           (0x04)
+#define MPI2_PHYSDISK0_ATTRIB_SAS_PROTOCOL              (0x02)
+#define MPI2_PHYSDISK0_ATTRIB_SATA_PROTOCOL             (0x01)
+
+/* PhysDiskStatusFlags defines */
+#define MPI2_PHYSDISK0_STATUS_FLAG_NOT_CERTIFIED        (0x00000040)
+#define MPI2_PHYSDISK0_STATUS_FLAG_OCE_TARGET           (0x00000020)
+#define MPI2_PHYSDISK0_STATUS_FLAG_WRITE_CACHE_ENABLED  (0x00000010)
+#define MPI2_PHYSDISK0_STATUS_FLAG_OPTIMAL_PREVIOUS     (0x00000000)
+#define MPI2_PHYSDISK0_STATUS_FLAG_NOT_OPTIMAL_PREVIOUS (0x00000008)
+#define MPI2_PHYSDISK0_STATUS_FLAG_INACTIVE_VOLUME      (0x00000004)
+#define MPI2_PHYSDISK0_STATUS_FLAG_QUIESCED             (0x00000002)
+#define MPI2_PHYSDISK0_STATUS_FLAG_OUT_OF_SYNC          (0x00000001)
+
+
+/* RAID Physical Disk Page 1 */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.PageLength or NumPhysDiskPaths at runtime.
+ */
+#ifndef MPI2_RAID_PHYS_DISK1_PATH_MAX
+#define MPI2_RAID_PHYS_DISK1_PATH_MAX   (1)
+#endif
+
+typedef struct _MPI2_RAIDPHYSDISK1_PATH
+{
+    U16             DevHandle;          /* 0x00 */
+    U16             Reserved1;          /* 0x02 */
+    U64             WWID;               /* 0x04 */
+    U64             OwnerWWID;          /* 0x0C */
+    U8              OwnerIdentifier;    /* 0x14 */
+    U8              Reserved2;          /* 0x15 */
+    U16             Flags;              /* 0x16 */
+} MPI2_RAIDPHYSDISK1_PATH, MPI2_POINTER PTR_MPI2_RAIDPHYSDISK1_PATH,
+  Mpi2RaidPhysDisk1Path_t, MPI2_POINTER pMpi2RaidPhysDisk1Path_t;
+
+/* RAID Physical Disk Page 1 Physical Disk Path Flags field defines */
+#define MPI2_RAID_PHYSDISK1_FLAG_PRIMARY        (0x0004)
+#define MPI2_RAID_PHYSDISK1_FLAG_BROKEN         (0x0002)
+#define MPI2_RAID_PHYSDISK1_FLAG_INVALID        (0x0001)
+
+typedef struct _MPI2_CONFIG_PAGE_RD_PDISK_1
+{
+    MPI2_CONFIG_PAGE_HEADER         Header;                     /* 0x00 */
+    U8                              NumPhysDiskPaths;           /* 0x04 */
+    U8                              PhysDiskNum;                /* 0x05 */
+    U16                             Reserved1;                  /* 0x06 */
+    U32                             Reserved2;                  /* 0x08 */
+    MPI2_RAIDPHYSDISK1_PATH         PhysicalDiskPath[MPI2_RAID_PHYS_DISK1_PATH_MAX];/* 0x0C */
+} MPI2_CONFIG_PAGE_RD_PDISK_1,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_RD_PDISK_1,
+  Mpi2RaidPhysDiskPage1_t, MPI2_POINTER pMpi2RaidPhysDiskPage1_t;
+
+#define MPI2_RAIDPHYSDISKPAGE1_PAGEVERSION          (0x02)
+
+
+/****************************************************************************
+*   values for fields used by several types of SAS Config Pages
+****************************************************************************/
+
+/* values for NegotiatedLinkRates fields */
+#define MPI2_SAS_NEG_LINK_RATE_MASK_LOGICAL             (0xF0)
+#define MPI2_SAS_NEG_LINK_RATE_SHIFT_LOGICAL            (4)
+#define MPI2_SAS_NEG_LINK_RATE_MASK_PHYSICAL            (0x0F)
+/* link rates used for Negotiated Physical and Logical Link Rate */
+#define MPI2_SAS_NEG_LINK_RATE_UNKNOWN_LINK_RATE        (0x00)
+#define MPI2_SAS_NEG_LINK_RATE_PHY_DISABLED             (0x01)
+#define MPI2_SAS_NEG_LINK_RATE_NEGOTIATION_FAILED       (0x02)
+#define MPI2_SAS_NEG_LINK_RATE_SATA_OOB_COMPLETE        (0x03)
+#define MPI2_SAS_NEG_LINK_RATE_PORT_SELECTOR            (0x04)
+#define MPI2_SAS_NEG_LINK_RATE_SMP_RESET_IN_PROGRESS    (0x05)
+#define MPI2_SAS_NEG_LINK_RATE_1_5                      (0x08)
+#define MPI2_SAS_NEG_LINK_RATE_3_0                      (0x09)
+#define MPI2_SAS_NEG_LINK_RATE_6_0                      (0x0A)
+
+
+/* values for AttachedPhyInfo fields */
+#define MPI2_SAS_APHYINFO_INSIDE_ZPSDS_PERSISTENT       (0x00000040)
+#define MPI2_SAS_APHYINFO_REQUESTED_INSIDE_ZPSDS        (0x00000020)
+#define MPI2_SAS_APHYINFO_BREAK_REPLY_CAPABLE           (0x00000010)
+
+#define MPI2_SAS_APHYINFO_REASON_MASK                   (0x0000000F)
+#define MPI2_SAS_APHYINFO_REASON_UNKNOWN                (0x00000000)
+#define MPI2_SAS_APHYINFO_REASON_POWER_ON               (0x00000001)
+#define MPI2_SAS_APHYINFO_REASON_HARD_RESET             (0x00000002)
+#define MPI2_SAS_APHYINFO_REASON_SMP_PHY_CONTROL        (0x00000003)
+#define MPI2_SAS_APHYINFO_REASON_LOSS_OF_SYNC           (0x00000004)
+#define MPI2_SAS_APHYINFO_REASON_MULTIPLEXING_SEQ       (0x00000005)
+#define MPI2_SAS_APHYINFO_REASON_IT_NEXUS_LOSS_TIMER    (0x00000006)
+#define MPI2_SAS_APHYINFO_REASON_BREAK_TIMEOUT          (0x00000007)
+#define MPI2_SAS_APHYINFO_REASON_PHY_TEST_STOPPED       (0x00000008)
+
+
+/* values for PhyInfo fields */
+#define MPI2_SAS_PHYINFO_PHY_VACANT                     (0x80000000)
+#define MPI2_SAS_PHYINFO_CHANGED_REQ_INSIDE_ZPSDS       (0x04000000)
+#define MPI2_SAS_PHYINFO_INSIDE_ZPSDS_PERSISTENT        (0x02000000)
+#define MPI2_SAS_PHYINFO_REQ_INSIDE_ZPSDS               (0x01000000)
+#define MPI2_SAS_PHYINFO_ZONE_GROUP_PERSISTENT          (0x00400000)
+#define MPI2_SAS_PHYINFO_INSIDE_ZPSDS                   (0x00200000)
+#define MPI2_SAS_PHYINFO_ZONING_ENABLED                 (0x00100000)
+
+#define MPI2_SAS_PHYINFO_REASON_MASK                    (0x000F0000)
+#define MPI2_SAS_PHYINFO_REASON_UNKNOWN                 (0x00000000)
+#define MPI2_SAS_PHYINFO_REASON_POWER_ON                (0x00010000)
+#define MPI2_SAS_PHYINFO_REASON_HARD_RESET              (0x00020000)
+#define MPI2_SAS_PHYINFO_REASON_SMP_PHY_CONTROL         (0x00030000)
+#define MPI2_SAS_PHYINFO_REASON_LOSS_OF_SYNC            (0x00040000)
+#define MPI2_SAS_PHYINFO_REASON_MULTIPLEXING_SEQ        (0x00050000)
+#define MPI2_SAS_PHYINFO_REASON_IT_NEXUS_LOSS_TIMER     (0x00060000)
+#define MPI2_SAS_PHYINFO_REASON_BREAK_TIMEOUT           (0x00070000)
+#define MPI2_SAS_PHYINFO_REASON_PHY_TEST_STOPPED        (0x00080000)
+
+#define MPI2_SAS_PHYINFO_MULTIPLEXING_SUPPORTED         (0x00008000)
+#define MPI2_SAS_PHYINFO_SATA_PORT_ACTIVE               (0x00004000)
+#define MPI2_SAS_PHYINFO_SATA_PORT_SELECTOR_PRESENT     (0x00002000)
+#define MPI2_SAS_PHYINFO_VIRTUAL_PHY                    (0x00001000)
+
+#define MPI2_SAS_PHYINFO_MASK_PARTIAL_PATHWAY_TIME      (0x00000F00)
+#define MPI2_SAS_PHYINFO_SHIFT_PARTIAL_PATHWAY_TIME     (8)
+
+#define MPI2_SAS_PHYINFO_MASK_ROUTING_ATTRIBUTE         (0x000000F0)
+#define MPI2_SAS_PHYINFO_DIRECT_ROUTING                 (0x00000000)
+#define MPI2_SAS_PHYINFO_SUBTRACTIVE_ROUTING            (0x00000010)
+#define MPI2_SAS_PHYINFO_TABLE_ROUTING                  (0x00000020)
+
+
+/* values for SAS ProgrammedLinkRate fields */
+#define MPI2_SAS_PRATE_MAX_RATE_MASK                    (0xF0)
+#define MPI2_SAS_PRATE_MAX_RATE_NOT_PROGRAMMABLE        (0x00)
+#define MPI2_SAS_PRATE_MAX_RATE_1_5                     (0x80)
+#define MPI2_SAS_PRATE_MAX_RATE_3_0                     (0x90)
+#define MPI2_SAS_PRATE_MAX_RATE_6_0                     (0xA0)
+#define MPI2_SAS_PRATE_MIN_RATE_MASK                    (0x0F)
+#define MPI2_SAS_PRATE_MIN_RATE_NOT_PROGRAMMABLE        (0x00)
+#define MPI2_SAS_PRATE_MIN_RATE_1_5                     (0x08)
+#define MPI2_SAS_PRATE_MIN_RATE_3_0                     (0x09)
+#define MPI2_SAS_PRATE_MIN_RATE_6_0                     (0x0A)
+
+
+/* values for SAS HwLinkRate fields */
+#define MPI2_SAS_HWRATE_MAX_RATE_MASK                   (0xF0)
+#define MPI2_SAS_HWRATE_MAX_RATE_1_5                    (0x80)
+#define MPI2_SAS_HWRATE_MAX_RATE_3_0                    (0x90)
+#define MPI2_SAS_HWRATE_MAX_RATE_6_0                    (0xA0)
+#define MPI2_SAS_HWRATE_MIN_RATE_MASK                   (0x0F)
+#define MPI2_SAS_HWRATE_MIN_RATE_1_5                    (0x08)
+#define MPI2_SAS_HWRATE_MIN_RATE_3_0                    (0x09)
+#define MPI2_SAS_HWRATE_MIN_RATE_6_0                    (0x0A)
+
+
+
+/****************************************************************************
+*   SAS IO Unit Config Pages
+****************************************************************************/
+
+/* SAS IO Unit Page 0 */
+
+typedef struct _MPI2_SAS_IO_UNIT0_PHY_DATA
+{
+    U8          Port;                   /* 0x00 */
+    U8          PortFlags;              /* 0x01 */
+    U8          PhyFlags;               /* 0x02 */
+    U8          NegotiatedLinkRate;     /* 0x03 */
+    U32         ControllerPhyDeviceInfo;/* 0x04 */
+    U16         AttachedDevHandle;      /* 0x08 */
+    U16         ControllerDevHandle;    /* 0x0A */
+    U32         DiscoveryStatus;        /* 0x0C */
+    U32         Reserved;               /* 0x10 */
+} MPI2_SAS_IO_UNIT0_PHY_DATA, MPI2_POINTER PTR_MPI2_SAS_IO_UNIT0_PHY_DATA,
+  Mpi2SasIOUnit0PhyData_t, MPI2_POINTER pMpi2SasIOUnit0PhyData_t;
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.ExtPageLength or NumPhys at runtime.
+ */
+#ifndef MPI2_SAS_IOUNIT0_PHY_MAX
+#define MPI2_SAS_IOUNIT0_PHY_MAX        (1)
+#endif
+
+typedef struct _MPI2_CONFIG_PAGE_SASIOUNIT_0
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                             /* 0x00 */
+    U32                                 Reserved1;                          /* 0x08 */
+    U8                                  NumPhys;                            /* 0x0C */
+    U8                                  Reserved2;                          /* 0x0D */
+    U16                                 Reserved3;                          /* 0x0E */
+    MPI2_SAS_IO_UNIT0_PHY_DATA          PhyData[MPI2_SAS_IOUNIT0_PHY_MAX];  /* 0x10 */
+} MPI2_CONFIG_PAGE_SASIOUNIT_0,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_SASIOUNIT_0,
+  Mpi2SasIOUnitPage0_t, MPI2_POINTER pMpi2SasIOUnitPage0_t;
+
+#define MPI2_SASIOUNITPAGE0_PAGEVERSION                     (0x05)
+
+/* values for SAS IO Unit Page 0 PortFlags */
+#define MPI2_SASIOUNIT0_PORTFLAGS_DISCOVERY_IN_PROGRESS     (0x08)
+#define MPI2_SASIOUNIT0_PORTFLAGS_AUTO_PORT_CONFIG          (0x01)
+
+/* values for SAS IO Unit Page 0 PhyFlags */
+#define MPI2_SASIOUNIT0_PHYFLAGS_ZONING_ENABLED             (0x10)
+#define MPI2_SASIOUNIT0_PHYFLAGS_PHY_DISABLED               (0x08)
+
+/* use MPI2_SAS_NEG_LINK_RATE_ defines for the NegotiatedLinkRate field */
+
+/* see mpi2_sas.h for values for SAS IO Unit Page 0 ControllerPhyDeviceInfo values */
+
+/* values for SAS IO Unit Page 0 DiscoveryStatus */
+#define MPI2_SASIOUNIT0_DS_MAX_ENCLOSURES_EXCEED            (0x80000000)
+#define MPI2_SASIOUNIT0_DS_MAX_EXPANDERS_EXCEED             (0x40000000)
+#define MPI2_SASIOUNIT0_DS_MAX_DEVICES_EXCEED               (0x20000000)
+#define MPI2_SASIOUNIT0_DS_MAX_TOPO_PHYS_EXCEED             (0x10000000)
+#define MPI2_SASIOUNIT0_DS_DOWNSTREAM_INITIATOR             (0x08000000)
+#define MPI2_SASIOUNIT0_DS_MULTI_SUBTRACTIVE_SUBTRACTIVE    (0x00008000)
+#define MPI2_SASIOUNIT0_DS_EXP_MULTI_SUBTRACTIVE            (0x00004000)
+#define MPI2_SASIOUNIT0_DS_MULTI_PORT_DOMAIN                (0x00002000)
+#define MPI2_SASIOUNIT0_DS_TABLE_TO_SUBTRACTIVE_LINK        (0x00001000)
+#define MPI2_SASIOUNIT0_DS_UNSUPPORTED_DEVICE               (0x00000800)
+#define MPI2_SASIOUNIT0_DS_TABLE_LINK                       (0x00000400)
+#define MPI2_SASIOUNIT0_DS_SUBTRACTIVE_LINK                 (0x00000200)
+#define MPI2_SASIOUNIT0_DS_SMP_CRC_ERROR                    (0x00000100)
+#define MPI2_SASIOUNIT0_DS_SMP_FUNCTION_FAILED              (0x00000080)
+#define MPI2_SASIOUNIT0_DS_INDEX_NOT_EXIST                  (0x00000040)
+#define MPI2_SASIOUNIT0_DS_OUT_ROUTE_ENTRIES                (0x00000020)
+#define MPI2_SASIOUNIT0_DS_SMP_TIMEOUT                      (0x00000010)
+#define MPI2_SASIOUNIT0_DS_MULTIPLE_PORTS                   (0x00000004)
+#define MPI2_SASIOUNIT0_DS_UNADDRESSABLE_DEVICE             (0x00000002)
+#define MPI2_SASIOUNIT0_DS_LOOP_DETECTED                    (0x00000001)
+
+
+/* SAS IO Unit Page 1 */
+
+typedef struct _MPI2_SAS_IO_UNIT1_PHY_DATA
+{
+    U8          Port;                       /* 0x00 */
+    U8          PortFlags;                  /* 0x01 */
+    U8          PhyFlags;                   /* 0x02 */
+    U8          MaxMinLinkRate;             /* 0x03 */
+    U32         ControllerPhyDeviceInfo;    /* 0x04 */
+    U16         MaxTargetPortConnectTime;   /* 0x08 */
+    U16         Reserved1;                  /* 0x0A */
+} MPI2_SAS_IO_UNIT1_PHY_DATA, MPI2_POINTER PTR_MPI2_SAS_IO_UNIT1_PHY_DATA,
+  Mpi2SasIOUnit1PhyData_t, MPI2_POINTER pMpi2SasIOUnit1PhyData_t;
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.ExtPageLength or NumPhys at runtime.
+ */
+#ifndef MPI2_SAS_IOUNIT1_PHY_MAX
+#define MPI2_SAS_IOUNIT1_PHY_MAX        (1)
+#endif
+
+typedef struct _MPI2_CONFIG_PAGE_SASIOUNIT_1
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                             /* 0x00 */
+    U16                                 ControlFlags;                       /* 0x08 */
+    U16                                 SASNarrowMaxQueueDepth;             /* 0x0A */
+    U16                                 AdditionalControlFlags;             /* 0x0C */
+    U16                                 SASWideMaxQueueDepth;               /* 0x0E */
+    U8                                  NumPhys;                            /* 0x10 */
+    U8                                  SATAMaxQDepth;                      /* 0x11 */
+    U8                                  ReportDeviceMissingDelay;           /* 0x12 */
+    U8                                  IODeviceMissingDelay;               /* 0x13 */
+    MPI2_SAS_IO_UNIT1_PHY_DATA          PhyData[MPI2_SAS_IOUNIT1_PHY_MAX];  /* 0x14 */
+} MPI2_CONFIG_PAGE_SASIOUNIT_1,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_SASIOUNIT_1,
+  Mpi2SasIOUnitPage1_t, MPI2_POINTER pMpi2SasIOUnitPage1_t;
+
+#define MPI2_SASIOUNITPAGE1_PAGEVERSION     (0x09)
+
+/* values for SAS IO Unit Page 1 ControlFlags */
+#define MPI2_SASIOUNIT1_CONTROL_DEVICE_SELF_TEST                    (0x8000)
+#define MPI2_SASIOUNIT1_CONTROL_SATA_3_0_MAX                        (0x4000)
+#define MPI2_SASIOUNIT1_CONTROL_SATA_1_5_MAX                        (0x2000)
+#define MPI2_SASIOUNIT1_CONTROL_SATA_SW_PRESERVE                    (0x1000)
+
+#define MPI2_SASIOUNIT1_CONTROL_MASK_DEV_SUPPORT                    (0x0600)
+#define MPI2_SASIOUNIT1_CONTROL_SHIFT_DEV_SUPPORT                   (9)
+#define MPI2_SASIOUNIT1_CONTROL_DEV_SUPPORT_BOTH                    (0x0)
+#define MPI2_SASIOUNIT1_CONTROL_DEV_SAS_SUPPORT                     (0x1)
+#define MPI2_SASIOUNIT1_CONTROL_DEV_SATA_SUPPORT                    (0x2)
+
+#define MPI2_SASIOUNIT1_CONTROL_SATA_48BIT_LBA_REQUIRED             (0x0080)
+#define MPI2_SASIOUNIT1_CONTROL_SATA_SMART_REQUIRED                 (0x0040)
+#define MPI2_SASIOUNIT1_CONTROL_SATA_NCQ_REQUIRED                   (0x0020)
+#define MPI2_SASIOUNIT1_CONTROL_SATA_FUA_REQUIRED                   (0x0010)
+#define MPI2_SASIOUNIT1_CONTROL_TABLE_SUBTRACTIVE_ILLEGAL           (0x0008)
+#define MPI2_SASIOUNIT1_CONTROL_SUBTRACTIVE_ILLEGAL                 (0x0004)
+#define MPI2_SASIOUNIT1_CONTROL_FIRST_LVL_DISC_ONLY                 (0x0002)
+#define MPI2_SASIOUNIT1_CONTROL_CLEAR_AFFILIATION                   (0x0001)
+
+/* values for SAS IO Unit Page 1 AdditionalControlFlags */
+#define MPI2_SASIOUNIT1_ACONTROL_MULTI_PORT_DOMAIN_ILLEGAL          (0x0080)
+#define MPI2_SASIOUNIT1_ACONTROL_SATA_ASYNCHROUNOUS_NOTIFICATION    (0x0040)
+#define MPI2_SASIOUNIT1_ACONTROL_INVALID_TOPOLOGY_CORRECTION        (0x0020)
+#define MPI2_SASIOUNIT1_ACONTROL_PORT_ENABLE_ONLY_SATA_LINK_RESET   (0x0010)
+#define MPI2_SASIOUNIT1_ACONTROL_OTHER_AFFILIATION_SATA_LINK_RESET  (0x0008)
+#define MPI2_SASIOUNIT1_ACONTROL_SELF_AFFILIATION_SATA_LINK_RESET   (0x0004)
+#define MPI2_SASIOUNIT1_ACONTROL_NO_AFFILIATION_SATA_LINK_RESET     (0x0002)
+#define MPI2_SASIOUNIT1_ACONTROL_ALLOW_TABLE_TO_TABLE               (0x0001)
+
+/* defines for SAS IO Unit Page 1 ReportDeviceMissingDelay */
+#define MPI2_SASIOUNIT1_REPORT_MISSING_TIMEOUT_MASK                 (0x7F)
+#define MPI2_SASIOUNIT1_REPORT_MISSING_UNIT_16                      (0x80)
+
+/* values for SAS IO Unit Page 1 PortFlags */
+#define MPI2_SASIOUNIT1_PORT_FLAGS_AUTO_PORT_CONFIG                 (0x01)
+
+/* values for SAS IO Unit Page 2 PhyFlags */
+#define MPI2_SASIOUNIT1_PHYFLAGS_ZONING_ENABLE                      (0x10)
+#define MPI2_SASIOUNIT1_PHYFLAGS_PHY_DISABLE                        (0x08)
+
+/* values for SAS IO Unit Page 0 MaxMinLinkRate */
+#define MPI2_SASIOUNIT1_MAX_RATE_MASK                               (0xF0)
+#define MPI2_SASIOUNIT1_MAX_RATE_1_5                                (0x80)
+#define MPI2_SASIOUNIT1_MAX_RATE_3_0                                (0x90)
+#define MPI2_SASIOUNIT1_MAX_RATE_6_0                                (0xA0)
+#define MPI2_SASIOUNIT1_MIN_RATE_MASK                               (0x0F)
+#define MPI2_SASIOUNIT1_MIN_RATE_1_5                                (0x08)
+#define MPI2_SASIOUNIT1_MIN_RATE_3_0                                (0x09)
+#define MPI2_SASIOUNIT1_MIN_RATE_6_0                                (0x0A)
+
+/* see mpi2_sas.h for values for SAS IO Unit Page 1 ControllerPhyDeviceInfo values */
+
+
+/* SAS IO Unit Page 4 */
+
+typedef struct _MPI2_SAS_IOUNIT4_SPINUP_GROUP
+{
+    U8          MaxTargetSpinup;            /* 0x00 */
+    U8          SpinupDelay;                /* 0x01 */
+    U16         Reserved1;                  /* 0x02 */
+} MPI2_SAS_IOUNIT4_SPINUP_GROUP, MPI2_POINTER PTR_MPI2_SAS_IOUNIT4_SPINUP_GROUP,
+  Mpi2SasIOUnit4SpinupGroup_t, MPI2_POINTER pMpi2SasIOUnit4SpinupGroup_t;
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * four and check Header.ExtPageLength or NumPhys at runtime.
+ */
+#ifndef MPI2_SAS_IOUNIT4_PHY_MAX
+#define MPI2_SAS_IOUNIT4_PHY_MAX        (4)
+#endif
+
+typedef struct _MPI2_CONFIG_PAGE_SASIOUNIT_4
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                         /* 0x00 */
+    MPI2_SAS_IOUNIT4_SPINUP_GROUP       SpinupGroupParameters[4];       /* 0x08 */
+    U32                                 Reserved1;                      /* 0x18 */
+    U32                                 Reserved2;                      /* 0x1C */
+    U32                                 Reserved3;                      /* 0x20 */
+    U8                                  BootDeviceWaitTime;             /* 0x24 */
+    U8                                  Reserved4;                      /* 0x25 */
+    U16                                 Reserved5;                      /* 0x26 */
+    U8                                  NumPhys;                        /* 0x28 */
+    U8                                  PEInitialSpinupDelay;           /* 0x29 */
+    U8                                  PEReplyDelay;                   /* 0x2A */
+    U8                                  Flags;                          /* 0x2B */
+    U8                                  PHY[MPI2_SAS_IOUNIT4_PHY_MAX];  /* 0x2C */
+} MPI2_CONFIG_PAGE_SASIOUNIT_4,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_SASIOUNIT_4,
+  Mpi2SasIOUnitPage4_t, MPI2_POINTER pMpi2SasIOUnitPage4_t;
+
+#define MPI2_SASIOUNITPAGE4_PAGEVERSION     (0x02)
+
+/* defines for Flags field */
+#define MPI2_SASIOUNIT4_FLAGS_AUTO_PORTENABLE               (0x01)
+
+/* defines for PHY field */
+#define MPI2_SASIOUNIT4_PHY_SPINUP_GROUP_MASK               (0x03)
+
+
+/****************************************************************************
+*   SAS Expander Config Pages
+****************************************************************************/
+
+/* SAS Expander Page 0 */
+
+typedef struct _MPI2_CONFIG_PAGE_EXPANDER_0
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                     /* 0x00 */
+    U8                                  PhysicalPort;               /* 0x08 */
+    U8                                  ReportGenLength;            /* 0x09 */
+    U16                                 EnclosureHandle;            /* 0x0A */
+    U64                                 SASAddress;                 /* 0x0C */
+    U32                                 DiscoveryStatus;            /* 0x14 */
+    U16                                 DevHandle;                  /* 0x18 */
+    U16                                 ParentDevHandle;            /* 0x1A */
+    U16                                 ExpanderChangeCount;        /* 0x1C */
+    U16                                 ExpanderRouteIndexes;       /* 0x1E */
+    U8                                  NumPhys;                    /* 0x20 */
+    U8                                  SASLevel;                   /* 0x21 */
+    U16                                 Flags;                      /* 0x22 */
+    U16                                 STPBusInactivityTimeLimit;  /* 0x24 */
+    U16                                 STPMaxConnectTimeLimit;     /* 0x26 */
+    U16                                 STP_SMP_NexusLossTime;      /* 0x28 */
+    U16                                 MaxNumRoutedSasAddresses;   /* 0x2A */
+    U64                                 ActiveZoneManagerSASAddress;/* 0x2C */
+    U16                                 ZoneLockInactivityLimit;    /* 0x34 */
+    U16                                 Reserved1;                  /* 0x36 */
+} MPI2_CONFIG_PAGE_EXPANDER_0, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_EXPANDER_0,
+  Mpi2ExpanderPage0_t, MPI2_POINTER pMpi2ExpanderPage0_t;
+
+#define MPI2_SASEXPANDER0_PAGEVERSION       (0x05)
+
+/* values for SAS Expander Page 0 DiscoveryStatus field */
+#define MPI2_SAS_EXPANDER0_DS_MAX_ENCLOSURES_EXCEED         (0x80000000)
+#define MPI2_SAS_EXPANDER0_DS_MAX_EXPANDERS_EXCEED          (0x40000000)
+#define MPI2_SAS_EXPANDER0_DS_MAX_DEVICES_EXCEED            (0x20000000)
+#define MPI2_SAS_EXPANDER0_DS_MAX_TOPO_PHYS_EXCEED          (0x10000000)
+#define MPI2_SAS_EXPANDER0_DS_DOWNSTREAM_INITIATOR          (0x08000000)
+#define MPI2_SAS_EXPANDER0_DS_MULTI_SUBTRACTIVE_SUBTRACTIVE (0x00008000)
+#define MPI2_SAS_EXPANDER0_DS_EXP_MULTI_SUBTRACTIVE         (0x00004000)
+#define MPI2_SAS_EXPANDER0_DS_MULTI_PORT_DOMAIN             (0x00002000)
+#define MPI2_SAS_EXPANDER0_DS_TABLE_TO_SUBTRACTIVE_LINK     (0x00001000)
+#define MPI2_SAS_EXPANDER0_DS_UNSUPPORTED_DEVICE            (0x00000800)
+#define MPI2_SAS_EXPANDER0_DS_TABLE_LINK                    (0x00000400)
+#define MPI2_SAS_EXPANDER0_DS_SUBTRACTIVE_LINK              (0x00000200)
+#define MPI2_SAS_EXPANDER0_DS_SMP_CRC_ERROR                 (0x00000100)
+#define MPI2_SAS_EXPANDER0_DS_SMP_FUNCTION_FAILED           (0x00000080)
+#define MPI2_SAS_EXPANDER0_DS_INDEX_NOT_EXIST               (0x00000040)
+#define MPI2_SAS_EXPANDER0_DS_OUT_ROUTE_ENTRIES             (0x00000020)
+#define MPI2_SAS_EXPANDER0_DS_SMP_TIMEOUT                   (0x00000010)
+#define MPI2_SAS_EXPANDER0_DS_MULTIPLE_PORTS                (0x00000004)
+#define MPI2_SAS_EXPANDER0_DS_UNADDRESSABLE_DEVICE          (0x00000002)
+#define MPI2_SAS_EXPANDER0_DS_LOOP_DETECTED                 (0x00000001)
+
+/* values for SAS Expander Page 0 Flags field */
+#define MPI2_SAS_EXPANDER0_FLAGS_ZONE_LOCKED                (0x1000)
+#define MPI2_SAS_EXPANDER0_FLAGS_SUPPORTED_PHYSICAL_PRES    (0x0800)
+#define MPI2_SAS_EXPANDER0_FLAGS_ASSERTED_PHYSICAL_PRES     (0x0400)
+#define MPI2_SAS_EXPANDER0_FLAGS_ZONING_SUPPORT             (0x0200)
+#define MPI2_SAS_EXPANDER0_FLAGS_ENABLED_ZONING             (0x0100)
+#define MPI2_SAS_EXPANDER0_FLAGS_TABLE_TO_TABLE_SUPPORT     (0x0080)
+#define MPI2_SAS_EXPANDER0_FLAGS_CONNECTOR_END_DEVICE       (0x0010)
+#define MPI2_SAS_EXPANDER0_FLAGS_OTHERS_CONFIG              (0x0004)
+#define MPI2_SAS_EXPANDER0_FLAGS_CONFIG_IN_PROGRESS         (0x0002)
+#define MPI2_SAS_EXPANDER0_FLAGS_ROUTE_TABLE_CONFIG         (0x0001)
+
+
+/* SAS Expander Page 1 */
+
+typedef struct _MPI2_CONFIG_PAGE_EXPANDER_1
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                     /* 0x00 */
+    U8                                  PhysicalPort;               /* 0x08 */
+    U8                                  Reserved1;                  /* 0x09 */
+    U16                                 Reserved2;                  /* 0x0A */
+    U8                                  NumPhys;                    /* 0x0C */
+    U8                                  Phy;                        /* 0x0D */
+    U16                                 NumTableEntriesProgrammed;  /* 0x0E */
+    U8                                  ProgrammedLinkRate;         /* 0x10 */
+    U8                                  HwLinkRate;                 /* 0x11 */
+    U16                                 AttachedDevHandle;          /* 0x12 */
+    U32                                 PhyInfo;                    /* 0x14 */
+    U32                                 AttachedDeviceInfo;         /* 0x18 */
+    U16                                 ExpanderDevHandle;          /* 0x1C */
+    U8                                  ChangeCount;                /* 0x1E */
+    U8                                  NegotiatedLinkRate;         /* 0x1F */
+    U8                                  PhyIdentifier;              /* 0x20 */
+    U8                                  AttachedPhyIdentifier;      /* 0x21 */
+    U8                                  Reserved3;                  /* 0x22 */
+    U8                                  DiscoveryInfo;              /* 0x23 */
+    U32                                 AttachedPhyInfo;            /* 0x24 */
+    U8                                  ZoneGroup;                  /* 0x28 */
+    U8                                  SelfConfigStatus;           /* 0x29 */
+    U16                                 Reserved4;                  /* 0x2A */
+} MPI2_CONFIG_PAGE_EXPANDER_1, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_EXPANDER_1,
+  Mpi2ExpanderPage1_t, MPI2_POINTER pMpi2ExpanderPage1_t;
+
+#define MPI2_SASEXPANDER1_PAGEVERSION       (0x02)
+
+/* use MPI2_SAS_PRATE_ defines for the ProgrammedLinkRate field */
+
+/* use MPI2_SAS_HWRATE_ defines for the HwLinkRate field */
+
+/* use MPI2_SAS_PHYINFO_ for the PhyInfo field */
+
+/* see mpi2_sas.h for the MPI2_SAS_DEVICE_INFO_ defines used for the AttachedDeviceInfo field */
+
+/* use MPI2_SAS_NEG_LINK_RATE_ defines for the NegotiatedLinkRate field */
+
+/* use MPI2_SAS_APHYINFO_ defines for AttachedPhyInfo field */
+
+/* values for SAS Expander Page 1 DiscoveryInfo field */
+#define MPI2_SAS_EXPANDER1_DISCINFO_BAD_PHY_DISABLED    (0x04)
+#define MPI2_SAS_EXPANDER1_DISCINFO_LINK_STATUS_CHANGE  (0x02)
+#define MPI2_SAS_EXPANDER1_DISCINFO_NO_ROUTING_ENTRIES  (0x01)
+
+
+/****************************************************************************
+*   SAS Device Config Pages
+****************************************************************************/
+
+/* SAS Device Page 0 */
+
+typedef struct _MPI2_CONFIG_PAGE_SAS_DEV_0
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                 /* 0x00 */
+    U16                                 Slot;                   /* 0x08 */
+    U16                                 EnclosureHandle;        /* 0x0A */
+    U64                                 SASAddress;             /* 0x0C */
+    U16                                 ParentDevHandle;        /* 0x14 */
+    U8                                  PhyNum;                 /* 0x16 */
+    U8                                  AccessStatus;           /* 0x17 */
+    U16                                 DevHandle;              /* 0x18 */
+    U8                                  AttachedPhyIdentifier;  /* 0x1A */
+    U8                                  ZoneGroup;              /* 0x1B */
+    U32                                 DeviceInfo;             /* 0x1C */
+    U16                                 Flags;                  /* 0x20 */
+    U8                                  PhysicalPort;           /* 0x22 */
+    U8                                  MaxPortConnections;     /* 0x23 */
+    U64                                 DeviceName;             /* 0x24 */
+    U8                                  PortGroups;             /* 0x2C */
+    U8                                  DmaGroup;               /* 0x2D */
+    U8                                  ControlGroup;           /* 0x2E */
+    U8                                  Reserved1;              /* 0x2F */
+    U32                                 Reserved2;              /* 0x30 */
+    U32                                 Reserved3;              /* 0x34 */
+} MPI2_CONFIG_PAGE_SAS_DEV_0, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_SAS_DEV_0,
+  Mpi2SasDevicePage0_t, MPI2_POINTER pMpi2SasDevicePage0_t;
+
+#define MPI2_SASDEVICE0_PAGEVERSION         (0x08)
+
+/* values for SAS Device Page 0 AccessStatus field */
+#define MPI2_SAS_DEVICE0_ASTATUS_NO_ERRORS                  (0x00)
+#define MPI2_SAS_DEVICE0_ASTATUS_SATA_INIT_FAILED           (0x01)
+#define MPI2_SAS_DEVICE0_ASTATUS_SATA_CAPABILITY_FAILED     (0x02)
+#define MPI2_SAS_DEVICE0_ASTATUS_SATA_AFFILIATION_CONFLICT  (0x03)
+#define MPI2_SAS_DEVICE0_ASTATUS_SATA_NEEDS_INITIALIZATION  (0x04)
+#define MPI2_SAS_DEVICE0_ASTATUS_ROUTE_NOT_ADDRESSABLE      (0x05)
+#define MPI2_SAS_DEVICE0_ASTATUS_SMP_ERROR_NOT_ADDRESSABLE  (0x06)
+#define MPI2_SAS_DEVICE0_ASTATUS_DEVICE_BLOCKED             (0x07)
+/* specific values for SATA Init failures */
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_UNKNOWN                (0x10)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_AFFILIATION_CONFLICT   (0x11)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_DIAG                   (0x12)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_IDENTIFICATION         (0x13)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_CHECK_POWER            (0x14)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_PIO_SN                 (0x15)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_MDMA_SN                (0x16)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_UDMA_SN                (0x17)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_ZONING_VIOLATION       (0x18)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_NOT_ADDRESSABLE        (0x19)
+#define MPI2_SAS_DEVICE0_ASTATUS_SIF_MAX                    (0x1F)
+
+/* see mpi2_sas.h for values for SAS Device Page 0 DeviceInfo values */
+
+/* values for SAS Device Page 0 Flags field */
+#define MPI2_SAS_DEVICE0_FLAGS_SATA_ASYNCHRONOUS_NOTIFY     (0x0400)
+#define MPI2_SAS_DEVICE0_FLAGS_SATA_SW_PRESERVE             (0x0200)
+#define MPI2_SAS_DEVICE0_FLAGS_UNSUPPORTED_DEVICE           (0x0100)
+#define MPI2_SAS_DEVICE0_FLAGS_SATA_48BIT_LBA_SUPPORTED     (0x0080)
+#define MPI2_SAS_DEVICE0_FLAGS_SATA_SMART_SUPPORTED         (0x0040)
+#define MPI2_SAS_DEVICE0_FLAGS_SATA_NCQ_SUPPORTED           (0x0020)
+#define MPI2_SAS_DEVICE0_FLAGS_SATA_FUA_SUPPORTED           (0x0010)
+#define MPI2_SAS_DEVICE0_FLAGS_PORT_SELECTOR_ATTACH         (0x0008)
+#define MPI2_SAS_DEVICE0_FLAGS_DEVICE_PRESENT               (0x0001)
+
+
+/* SAS Device Page 1 */
+
+typedef struct _MPI2_CONFIG_PAGE_SAS_DEV_1
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                 /* 0x00 */
+    U32                                 Reserved1;              /* 0x08 */
+    U64                                 SASAddress;             /* 0x0C */
+    U32                                 Reserved2;              /* 0x14 */
+    U16                                 DevHandle;              /* 0x18 */
+    U16                                 Reserved3;              /* 0x1A */
+    U8                                  InitialRegDeviceFIS[20];/* 0x1C */
+} MPI2_CONFIG_PAGE_SAS_DEV_1, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_SAS_DEV_1,
+  Mpi2SasDevicePage1_t, MPI2_POINTER pMpi2SasDevicePage1_t;
+
+#define MPI2_SASDEVICE1_PAGEVERSION         (0x01)
+
+
+/****************************************************************************
+*   SAS PHY Config Pages
+****************************************************************************/
+
+/* SAS PHY Page 0 */
+
+typedef struct _MPI2_CONFIG_PAGE_SAS_PHY_0
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                 /* 0x00 */
+    U16                                 OwnerDevHandle;         /* 0x08 */
+    U16                                 Reserved1;              /* 0x0A */
+    U16                                 AttachedDevHandle;      /* 0x0C */
+    U8                                  AttachedPhyIdentifier;  /* 0x0E */
+    U8                                  Reserved2;              /* 0x0F */
+    U32                                 AttachedPhyInfo;        /* 0x10 */
+    U8                                  ProgrammedLinkRate;     /* 0x14 */
+    U8                                  HwLinkRate;             /* 0x15 */
+    U8                                  ChangeCount;            /* 0x16 */
+    U8                                  Flags;                  /* 0x17 */
+    U32                                 PhyInfo;                /* 0x18 */
+    U8                                  NegotiatedLinkRate;     /* 0x1C */
+    U8                                  Reserved3;              /* 0x1D */
+    U16                                 Reserved4;              /* 0x1E */
+} MPI2_CONFIG_PAGE_SAS_PHY_0, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_SAS_PHY_0,
+  Mpi2SasPhyPage0_t, MPI2_POINTER pMpi2SasPhyPage0_t;
+
+#define MPI2_SASPHY0_PAGEVERSION            (0x03)
+
+/* use MPI2_SAS_PRATE_ defines for the ProgrammedLinkRate field */
+
+/* use MPI2_SAS_HWRATE_ defines for the HwLinkRate field */
+
+/* values for SAS PHY Page 0 Flags field */
+#define MPI2_SAS_PHY0_FLAGS_SGPIO_DIRECT_ATTACH_ENC             (0x01)
+
+/* use MPI2_SAS_APHYINFO_ defines for AttachedPhyInfo field */
+
+/* use MPI2_SAS_NEG_LINK_RATE_ defines for the NegotiatedLinkRate field */
+
+/* use MPI2_SAS_PHYINFO_ for the PhyInfo field */
+
+
+/* SAS PHY Page 1 */
+
+typedef struct _MPI2_CONFIG_PAGE_SAS_PHY_1
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                     /* 0x00 */
+    U32                                 Reserved1;                  /* 0x08 */
+    U32                                 InvalidDwordCount;          /* 0x0C */
+    U32                                 RunningDisparityErrorCount; /* 0x10 */
+    U32                                 LossDwordSynchCount;        /* 0x14 */
+    U32                                 PhyResetProblemCount;       /* 0x18 */
+} MPI2_CONFIG_PAGE_SAS_PHY_1, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_SAS_PHY_1,
+  Mpi2SasPhyPage1_t, MPI2_POINTER pMpi2SasPhyPage1_t;
+
+#define MPI2_SASPHY1_PAGEVERSION            (0x01)
+
+
+/****************************************************************************
+*   SAS Port Config Pages
+****************************************************************************/
+
+/* SAS Port Page 0 */
+
+typedef struct _MPI2_CONFIG_PAGE_SAS_PORT_0
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                     /* 0x00 */
+    U8                                  PortNumber;                 /* 0x08 */
+    U8                                  PhysicalPort;               /* 0x09 */
+    U8                                  PortWidth;                  /* 0x0A */
+    U8                                  PhysicalPortWidth;          /* 0x0B */
+    U8                                  ZoneGroup;                  /* 0x0C */
+    U8                                  Reserved1;                  /* 0x0D */
+    U16                                 Reserved2;                  /* 0x0E */
+    U64                                 SASAddress;                 /* 0x10 */
+    U32                                 DeviceInfo;                 /* 0x18 */
+    U32                                 Reserved3;                  /* 0x1C */
+    U32                                 Reserved4;                  /* 0x20 */
+} MPI2_CONFIG_PAGE_SAS_PORT_0, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_SAS_PORT_0,
+  Mpi2SasPortPage0_t, MPI2_POINTER pMpi2SasPortPage0_t;
+
+#define MPI2_SASPORT0_PAGEVERSION           (0x00)
+
+/* see mpi2_sas.h for values for SAS Port Page 0 DeviceInfo values */
+
+
+/****************************************************************************
+*   SAS Enclosure Config Pages
+****************************************************************************/
+
+/* SAS Enclosure Page 0 */
+
+typedef struct _MPI2_CONFIG_PAGE_SAS_ENCLOSURE_0
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                     /* 0x00 */
+    U32                                 Reserved1;                  /* 0x08 */
+    U64                                 EnclosureLogicalID;         /* 0x0C */
+    U16                                 Flags;                      /* 0x14 */
+    U16                                 EnclosureHandle;            /* 0x16 */
+    U16                                 NumSlots;                   /* 0x18 */
+    U16                                 StartSlot;                  /* 0x1A */
+    U16                                 Reserved2;                  /* 0x1C */
+    U16                                 SEPDevHandle;               /* 0x1E */
+    U32                                 Reserved3;                  /* 0x20 */
+    U32                                 Reserved4;                  /* 0x24 */
+} MPI2_CONFIG_PAGE_SAS_ENCLOSURE_0,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_SAS_ENCLOSURE_0,
+  Mpi2SasEnclosurePage0_t, MPI2_POINTER pMpi2SasEnclosurePage0_t;
+
+#define MPI2_SASENCLOSURE0_PAGEVERSION      (0x03)
+
+/* values for SAS Enclosure Page 0 Flags field */
+#define MPI2_SAS_ENCLS0_FLAGS_MNG_MASK              (0x000F)
+#define MPI2_SAS_ENCLS0_FLAGS_MNG_UNKNOWN           (0x0000)
+#define MPI2_SAS_ENCLS0_FLAGS_MNG_IOC_SES           (0x0001)
+#define MPI2_SAS_ENCLS0_FLAGS_MNG_IOC_SGPIO         (0x0002)
+#define MPI2_SAS_ENCLS0_FLAGS_MNG_EXP_SGPIO         (0x0003)
+#define MPI2_SAS_ENCLS0_FLAGS_MNG_SES_ENCLOSURE     (0x0004)
+#define MPI2_SAS_ENCLS0_FLAGS_MNG_IOC_GPIO          (0x0005)
+
+
+/****************************************************************************
+*   Log Config Page
+****************************************************************************/
+
+/* Log Page 0 */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.ExtPageLength or NumPhys at runtime.
+ */
+#ifndef MPI2_LOG_0_NUM_LOG_ENTRIES
+#define MPI2_LOG_0_NUM_LOG_ENTRIES          (1)
+#endif
+
+#define MPI2_LOG_0_LOG_DATA_LENGTH          (0x1C)
+
+typedef struct _MPI2_LOG_0_ENTRY
+{
+    U64         TimeStamp;                          /* 0x00 */
+    U32         Reserved1;                          /* 0x08 */
+    U16         LogSequence;                        /* 0x0C */
+    U16         LogEntryQualifier;                  /* 0x0E */
+    U8          VP_ID;                              /* 0x10 */
+    U8          VF_ID;                              /* 0x11 */
+    U16         Reserved2;                          /* 0x12 */
+    U8          LogData[MPI2_LOG_0_LOG_DATA_LENGTH];/* 0x14 */
+} MPI2_LOG_0_ENTRY, MPI2_POINTER PTR_MPI2_LOG_0_ENTRY,
+  Mpi2Log0Entry_t, MPI2_POINTER pMpi2Log0Entry_t;
+
+/* values for Log Page 0 LogEntry LogEntryQualifier field */
+#define MPI2_LOG_0_ENTRY_QUAL_ENTRY_UNUSED          (0x0000)
+#define MPI2_LOG_0_ENTRY_QUAL_POWER_ON_RESET        (0x0001)
+#define MPI2_LOG_0_ENTRY_QUAL_TIMESTAMP_UPDATE      (0x0002)
+#define MPI2_LOG_0_ENTRY_QUAL_MIN_IMPLEMENT_SPEC    (0x8000)
+#define MPI2_LOG_0_ENTRY_QUAL_MAX_IMPLEMENT_SPEC    (0xFFFF)
+
+typedef struct _MPI2_CONFIG_PAGE_LOG_0
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                     /* 0x00 */
+    U32                                 Reserved1;                  /* 0x08 */
+    U32                                 Reserved2;                  /* 0x0C */
+    U16                                 NumLogEntries;              /* 0x10 */
+    U16                                 Reserved3;                  /* 0x12 */
+    MPI2_LOG_0_ENTRY                    LogEntry[MPI2_LOG_0_NUM_LOG_ENTRIES]; /* 0x14 */
+} MPI2_CONFIG_PAGE_LOG_0, MPI2_POINTER PTR_MPI2_CONFIG_PAGE_LOG_0,
+  Mpi2LogPage0_t, MPI2_POINTER pMpi2LogPage0_t;
+
+#define MPI2_LOG_0_PAGEVERSION              (0x02)
+
+
+/****************************************************************************
+*   RAID Config Page
+****************************************************************************/
+
+/* RAID Page 0 */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check Header.ExtPageLength or NumPhys at runtime.
+ */
+#ifndef MPI2_RAIDCONFIG0_MAX_ELEMENTS
+#define MPI2_RAIDCONFIG0_MAX_ELEMENTS       (1)
+#endif
+
+typedef struct _MPI2_RAIDCONFIG0_CONFIG_ELEMENT
+{
+    U16                     ElementFlags;               /* 0x00 */
+    U16                     VolDevHandle;               /* 0x02 */
+    U8                      HotSparePool;               /* 0x04 */
+    U8                      PhysDiskNum;                /* 0x05 */
+    U16                     PhysDiskDevHandle;          /* 0x06 */
+} MPI2_RAIDCONFIG0_CONFIG_ELEMENT,
+  MPI2_POINTER PTR_MPI2_RAIDCONFIG0_CONFIG_ELEMENT,
+  Mpi2RaidConfig0ConfigElement_t, MPI2_POINTER pMpi2RaidConfig0ConfigElement_t;
+
+/* values for the ElementFlags field */
+#define MPI2_RAIDCONFIG0_EFLAGS_MASK_ELEMENT_TYPE       (0x000F)
+#define MPI2_RAIDCONFIG0_EFLAGS_VOLUME_ELEMENT          (0x0000)
+#define MPI2_RAIDCONFIG0_EFLAGS_VOL_PHYS_DISK_ELEMENT   (0x0001)
+#define MPI2_RAIDCONFIG0_EFLAGS_HOT_SPARE_ELEMENT       (0x0002)
+#define MPI2_RAIDCONFIG0_EFLAGS_OCE_ELEMENT             (0x0003)
+
+
+typedef struct _MPI2_CONFIG_PAGE_RAID_CONFIGURATION_0
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                     /* 0x00 */
+    U8                                  NumHotSpares;               /* 0x08 */
+    U8                                  NumPhysDisks;               /* 0x09 */
+    U8                                  NumVolumes;                 /* 0x0A */
+    U8                                  ConfigNum;                  /* 0x0B */
+    U32                                 Flags;                      /* 0x0C */
+    U8                                  ConfigGUID[24];             /* 0x10 */
+    U32                                 Reserved1;                  /* 0x28 */
+    U8                                  NumElements;                /* 0x2C */
+    U8                                  Reserved2;                  /* 0x2D */
+    U16                                 Reserved3;                  /* 0x2E */
+    MPI2_RAIDCONFIG0_CONFIG_ELEMENT     ConfigElement[MPI2_RAIDCONFIG0_MAX_ELEMENTS]; /* 0x30 */
+} MPI2_CONFIG_PAGE_RAID_CONFIGURATION_0,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_RAID_CONFIGURATION_0,
+  Mpi2RaidConfigurationPage0_t, MPI2_POINTER pMpi2RaidConfigurationPage0_t;
+
+#define MPI2_RAIDCONFIG0_PAGEVERSION            (0x00)
+
+/* values for RAID Configuration Page 0 Flags field */
+#define MPI2_RAIDCONFIG0_FLAG_FOREIGN_CONFIG        (0x00000001)
+
+
+/****************************************************************************
+*   Driver Persistent Mapping Config Pages
+****************************************************************************/
+
+/* Driver Persistent Mapping Page 0 */
+
+typedef struct _MPI2_CONFIG_PAGE_DRIVER_MAP0_ENTRY
+{
+    U64                                 PhysicalIdentifier;         /* 0x00 */
+    U16                                 MappingInformation;         /* 0x08 */
+    U16                                 DeviceIndex;                /* 0x0A */
+    U32                                 PhysicalBitsMapping;        /* 0x0C */
+    U32                                 Reserved1;                  /* 0x10 */
+} MPI2_CONFIG_PAGE_DRIVER_MAP0_ENTRY,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_DRIVER_MAP0_ENTRY,
+  Mpi2DriverMap0Entry_t, MPI2_POINTER pMpi2DriverMap0Entry_t;
+
+typedef struct _MPI2_CONFIG_PAGE_DRIVER_MAPPING_0
+{
+    MPI2_CONFIG_EXTENDED_PAGE_HEADER    Header;                     /* 0x00 */
+    MPI2_CONFIG_PAGE_DRIVER_MAP0_ENTRY  Entry;                      /* 0x08 */
+} MPI2_CONFIG_PAGE_DRIVER_MAPPING_0,
+  MPI2_POINTER PTR_MPI2_CONFIG_PAGE_DRIVER_MAPPING_0,
+  Mpi2DriverMappingPage0_t, MPI2_POINTER pMpi2DriverMappingPage0_t;
+
+#define MPI2_DRIVERMAPPING0_PAGEVERSION         (0x00)
+
+/* values for Driver Persistent Mapping Page 0 MappingInformation field */
+#define MPI2_DRVMAP0_MAPINFO_SLOT_MASK              (0x07F0)
+#define MPI2_DRVMAP0_MAPINFO_SLOT_SHIFT             (4)
+#define MPI2_DRVMAP0_MAPINFO_MISSING_MASK           (0x000F)
+
+
+#endif
+
diff --git a/drivers/scsi/mpt2sas/mpi/mpi2_init.h b/drivers/scsi/mpt2sas/mpi/mpi2_init.h
new file mode 100644
index 000000000000..f1115f0f0eb2
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpi/mpi2_init.h
@@ -0,0 +1,420 @@
+/*
+ *  Copyright (c) 2000-2008 LSI Corporation.
+ *
+ *
+ *           Name:  mpi2_init.h
+ *          Title:  MPI SCSI initiator mode messages and structures
+ *  Creation Date:  June 23, 2006
+ *
+ *    mpi2_init.h Version:  02.00.06
+ *
+ *  Version History
+ *  ---------------
+ *
+ *  Date      Version   Description
+ *  --------  --------  ------------------------------------------------------
+ *  04-30-07  02.00.00  Corresponds to Fusion-MPT MPI Specification Rev A.
+ *  10-31-07  02.00.01  Fixed name for pMpi2SCSITaskManagementRequest_t.
+ *  12-18-07  02.00.02  Modified Task Management Target Reset Method defines.
+ *  02-29-08  02.00.03  Added Query Task Set and Query Unit Attention.
+ *  03-03-08  02.00.04  Fixed name of struct _MPI2_SCSI_TASK_MANAGE_REPLY.
+ *  05-21-08  02.00.05  Fixed typo in name of Mpi2SepRequest_t.
+ *  10-02-08  02.00.06  Removed Untagged and No Disconnect values from SCSI IO
+ *                      Control field Task Attribute flags.
+ *                      Moved LUN field defines to mpi2.h becasue they are
+ *                      common to many structures.
+ *  --------------------------------------------------------------------------
+ */
+
+#ifndef MPI2_INIT_H
+#define MPI2_INIT_H
+
+/*****************************************************************************
+*
+*               SCSI Initiator Messages
+*
+*****************************************************************************/
+
+/****************************************************************************
+*  SCSI IO messages and associated structures
+****************************************************************************/
+
+typedef struct
+{
+    U8                      CDB[20];                    /* 0x00 */
+    U32                     PrimaryReferenceTag;        /* 0x14 */
+    U16                     PrimaryApplicationTag;      /* 0x18 */
+    U16                     PrimaryApplicationTagMask;  /* 0x1A */
+    U32                     TransferLength;             /* 0x1C */
+} MPI2_SCSI_IO_CDB_EEDP32, MPI2_POINTER PTR_MPI2_SCSI_IO_CDB_EEDP32,
+  Mpi2ScsiIoCdbEedp32_t, MPI2_POINTER pMpi2ScsiIoCdbEedp32_t;
+
+/* TBD: I don't think this is needed for MPI2/Gen2 */
+#if 0
+typedef struct
+{
+    U8                      CDB[16];                    /* 0x00 */
+    U32                     DataLength;                 /* 0x10 */
+    U32                     PrimaryReferenceTag;        /* 0x14 */
+    U16                     PrimaryApplicationTag;      /* 0x18 */
+    U16                     PrimaryApplicationTagMask;  /* 0x1A */
+    U32                     TransferLength;             /* 0x1C */
+} MPI2_SCSI_IO32_CDB_EEDP16, MPI2_POINTER PTR_MPI2_SCSI_IO32_CDB_EEDP16,
+  Mpi2ScsiIo32CdbEedp16_t, MPI2_POINTER pMpi2ScsiIo32CdbEedp16_t;
+#endif
+
+typedef union
+{
+    U8                      CDB32[32];
+    MPI2_SCSI_IO_CDB_EEDP32 EEDP32;
+    MPI2_SGE_SIMPLE_UNION   SGE;
+} MPI2_SCSI_IO_CDB_UNION, MPI2_POINTER PTR_MPI2_SCSI_IO_CDB_UNION,
+  Mpi2ScsiIoCdb_t, MPI2_POINTER pMpi2ScsiIoCdb_t;
+
+/* SCSI IO Request Message */
+typedef struct _MPI2_SCSI_IO_REQUEST
+{
+    U16                     DevHandle;                      /* 0x00 */
+    U8                      ChainOffset;                    /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved1;                      /* 0x04 */
+    U8                      Reserved2;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved3;                      /* 0x0A */
+    U32                     SenseBufferLowAddress;          /* 0x0C */
+    U16                     SGLFlags;                       /* 0x10 */
+    U8                      SenseBufferLength;              /* 0x12 */
+    U8                      Reserved4;                      /* 0x13 */
+    U8                      SGLOffset0;                     /* 0x14 */
+    U8                      SGLOffset1;                     /* 0x15 */
+    U8                      SGLOffset2;                     /* 0x16 */
+    U8                      SGLOffset3;                     /* 0x17 */
+    U32                     SkipCount;                      /* 0x18 */
+    U32                     DataLength;                     /* 0x1C */
+    U32                     BidirectionalDataLength;        /* 0x20 */
+    U16                     IoFlags;                        /* 0x24 */
+    U16                     EEDPFlags;                      /* 0x26 */
+    U32                     EEDPBlockSize;                  /* 0x28 */
+    U32                     SecondaryReferenceTag;          /* 0x2C */
+    U16                     SecondaryApplicationTag;        /* 0x30 */
+    U16                     ApplicationTagTranslationMask;  /* 0x32 */
+    U8                      LUN[8];                         /* 0x34 */
+    U32                     Control;                        /* 0x3C */
+    MPI2_SCSI_IO_CDB_UNION  CDB;                            /* 0x40 */
+    MPI2_SGE_IO_UNION       SGL;                            /* 0x60 */
+} MPI2_SCSI_IO_REQUEST, MPI2_POINTER PTR_MPI2_SCSI_IO_REQUEST,
+  Mpi2SCSIIORequest_t, MPI2_POINTER pMpi2SCSIIORequest_t;
+
+/* SCSI IO MsgFlags bits */
+
+/* MsgFlags for SenseBufferAddressSpace */
+#define MPI2_SCSIIO_MSGFLAGS_MASK_SENSE_ADDR        (0x0C)
+#define MPI2_SCSIIO_MSGFLAGS_SYSTEM_SENSE_ADDR      (0x00)
+#define MPI2_SCSIIO_MSGFLAGS_IOCDDR_SENSE_ADDR      (0x04)
+#define MPI2_SCSIIO_MSGFLAGS_IOCPLB_SENSE_ADDR      (0x08)
+#define MPI2_SCSIIO_MSGFLAGS_IOCPLBNTA_SENSE_ADDR   (0x0C)
+
+/* SCSI IO SGLFlags bits */
+
+/* base values for Data Location Address Space */
+#define MPI2_SCSIIO_SGLFLAGS_ADDR_MASK              (0x0C)
+#define MPI2_SCSIIO_SGLFLAGS_SYSTEM_ADDR            (0x00)
+#define MPI2_SCSIIO_SGLFLAGS_IOCDDR_ADDR            (0x04)
+#define MPI2_SCSIIO_SGLFLAGS_IOCPLB_ADDR            (0x08)
+#define MPI2_SCSIIO_SGLFLAGS_IOCPLBNTA_ADDR         (0x0C)
+
+/* base values for Type */
+#define MPI2_SCSIIO_SGLFLAGS_TYPE_MASK              (0x03)
+#define MPI2_SCSIIO_SGLFLAGS_TYPE_MPI               (0x00)
+#define MPI2_SCSIIO_SGLFLAGS_TYPE_IEEE32            (0x01)
+#define MPI2_SCSIIO_SGLFLAGS_TYPE_IEEE64            (0x02)
+
+/* shift values for each sub-field */
+#define MPI2_SCSIIO_SGLFLAGS_SGL3_SHIFT             (12)
+#define MPI2_SCSIIO_SGLFLAGS_SGL2_SHIFT             (8)
+#define MPI2_SCSIIO_SGLFLAGS_SGL1_SHIFT             (4)
+#define MPI2_SCSIIO_SGLFLAGS_SGL0_SHIFT             (0)
+
+/* SCSI IO IoFlags bits */
+
+/* Large CDB Address Space */
+#define MPI2_SCSIIO_CDB_ADDR_MASK                   (0x6000)
+#define MPI2_SCSIIO_CDB_ADDR_SYSTEM                 (0x0000)
+#define MPI2_SCSIIO_CDB_ADDR_IOCDDR                 (0x2000)
+#define MPI2_SCSIIO_CDB_ADDR_IOCPLB                 (0x4000)
+#define MPI2_SCSIIO_CDB_ADDR_IOCPLBNTA              (0x6000)
+
+#define MPI2_SCSIIO_IOFLAGS_LARGE_CDB               (0x1000)
+#define MPI2_SCSIIO_IOFLAGS_BIDIRECTIONAL           (0x0800)
+#define MPI2_SCSIIO_IOFLAGS_MULTICAST               (0x0400)
+#define MPI2_SCSIIO_IOFLAGS_CMD_DETERMINES_DATA_DIR (0x0200)
+#define MPI2_SCSIIO_IOFLAGS_CDBLENGTH_MASK          (0x01FF)
+
+/* SCSI IO EEDPFlags bits */
+
+#define MPI2_SCSIIO_EEDPFLAGS_INC_PRI_REFTAG        (0x8000)
+#define MPI2_SCSIIO_EEDPFLAGS_INC_SEC_REFTAG        (0x4000)
+#define MPI2_SCSIIO_EEDPFLAGS_INC_PRI_APPTAG        (0x2000)
+#define MPI2_SCSIIO_EEDPFLAGS_INC_SEC_APPTAG        (0x1000)
+
+#define MPI2_SCSIIO_EEDPFLAGS_CHECK_REFTAG          (0x0400)
+#define MPI2_SCSIIO_EEDPFLAGS_CHECK_APPTAG          (0x0200)
+#define MPI2_SCSIIO_EEDPFLAGS_CHECK_GUARD           (0x0100)
+
+#define MPI2_SCSIIO_EEDPFLAGS_PASSTHRU_REFTAG       (0x0008)
+
+#define MPI2_SCSIIO_EEDPFLAGS_MASK_OP               (0x0007)
+#define MPI2_SCSIIO_EEDPFLAGS_NOOP_OP               (0x0000)
+#define MPI2_SCSIIO_EEDPFLAGS_CHECK_OP              (0x0001)
+#define MPI2_SCSIIO_EEDPFLAGS_STRIP_OP              (0x0002)
+#define MPI2_SCSIIO_EEDPFLAGS_CHECK_REMOVE_OP       (0x0003)
+#define MPI2_SCSIIO_EEDPFLAGS_INSERT_OP             (0x0004)
+#define MPI2_SCSIIO_EEDPFLAGS_REPLACE_OP            (0x0006)
+#define MPI2_SCSIIO_EEDPFLAGS_CHECK_REGEN_OP        (0x0007)
+
+/* SCSI IO LUN fields: use MPI2_LUN_ from mpi2.h */
+
+/* SCSI IO Control bits */
+#define MPI2_SCSIIO_CONTROL_ADDCDBLEN_MASK      (0xFC000000)
+#define MPI2_SCSIIO_CONTROL_ADDCDBLEN_SHIFT     (26)
+
+#define MPI2_SCSIIO_CONTROL_DATADIRECTION_MASK  (0x03000000)
+#define MPI2_SCSIIO_CONTROL_NODATATRANSFER      (0x00000000)
+#define MPI2_SCSIIO_CONTROL_WRITE               (0x01000000)
+#define MPI2_SCSIIO_CONTROL_READ                (0x02000000)
+#define MPI2_SCSIIO_CONTROL_BIDIRECTIONAL       (0x03000000)
+
+#define MPI2_SCSIIO_CONTROL_TASKPRI_MASK        (0x00007800)
+#define MPI2_SCSIIO_CONTROL_TASKPRI_SHIFT       (11)
+
+#define MPI2_SCSIIO_CONTROL_TASKATTRIBUTE_MASK  (0x00000700)
+#define MPI2_SCSIIO_CONTROL_SIMPLEQ             (0x00000000)
+#define MPI2_SCSIIO_CONTROL_HEADOFQ             (0x00000100)
+#define MPI2_SCSIIO_CONTROL_ORDEREDQ            (0x00000200)
+#define MPI2_SCSIIO_CONTROL_ACAQ                (0x00000400)
+
+#define MPI2_SCSIIO_CONTROL_TLR_MASK            (0x000000C0)
+#define MPI2_SCSIIO_CONTROL_NO_TLR              (0x00000000)
+#define MPI2_SCSIIO_CONTROL_TLR_ON              (0x00000040)
+#define MPI2_SCSIIO_CONTROL_TLR_OFF             (0x00000080)
+
+
+/* SCSI IO Error Reply Message */
+typedef struct _MPI2_SCSI_IO_REPLY
+{
+    U16                     DevHandle;                      /* 0x00 */
+    U8                      MsgLength;                      /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved1;                      /* 0x04 */
+    U8                      Reserved2;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved3;                      /* 0x0A */
+    U8                      SCSIStatus;                     /* 0x0C */
+    U8                      SCSIState;                      /* 0x0D */
+    U16                     IOCStatus;                      /* 0x0E */
+    U32                     IOCLogInfo;                     /* 0x10 */
+    U32                     TransferCount;                  /* 0x14 */
+    U32                     SenseCount;                     /* 0x18 */
+    U32                     ResponseInfo;                   /* 0x1C */
+    U16                     TaskTag;                        /* 0x20 */
+    U16                     Reserved4;                      /* 0x22 */
+    U32                     BidirectionalTransferCount;     /* 0x24 */
+    U32                     Reserved5;                      /* 0x28 */
+    U32                     Reserved6;                      /* 0x2C */
+} MPI2_SCSI_IO_REPLY, MPI2_POINTER PTR_MPI2_SCSI_IO_REPLY,
+  Mpi2SCSIIOReply_t, MPI2_POINTER pMpi2SCSIIOReply_t;
+
+/* SCSI IO Reply SCSIStatus values (SAM-4 status codes) */
+
+#define MPI2_SCSI_STATUS_GOOD                   (0x00)
+#define MPI2_SCSI_STATUS_CHECK_CONDITION        (0x02)
+#define MPI2_SCSI_STATUS_CONDITION_MET          (0x04)
+#define MPI2_SCSI_STATUS_BUSY                   (0x08)
+#define MPI2_SCSI_STATUS_INTERMEDIATE           (0x10)
+#define MPI2_SCSI_STATUS_INTERMEDIATE_CONDMET   (0x14)
+#define MPI2_SCSI_STATUS_RESERVATION_CONFLICT   (0x18)
+#define MPI2_SCSI_STATUS_COMMAND_TERMINATED     (0x22) /* obsolete */
+#define MPI2_SCSI_STATUS_TASK_SET_FULL          (0x28)
+#define MPI2_SCSI_STATUS_ACA_ACTIVE             (0x30)
+#define MPI2_SCSI_STATUS_TASK_ABORTED           (0x40)
+
+/* SCSI IO Reply SCSIState flags */
+
+#define MPI2_SCSI_STATE_RESPONSE_INFO_VALID     (0x10)
+#define MPI2_SCSI_STATE_TERMINATED              (0x08)
+#define MPI2_SCSI_STATE_NO_SCSI_STATUS          (0x04)
+#define MPI2_SCSI_STATE_AUTOSENSE_FAILED        (0x02)
+#define MPI2_SCSI_STATE_AUTOSENSE_VALID         (0x01)
+
+#define MPI2_SCSI_TASKTAG_UNKNOWN               (0xFFFF)
+
+
+/****************************************************************************
+*  SCSI Task Management messages
+****************************************************************************/
+
+/* SCSI Task Management Request Message */
+typedef struct _MPI2_SCSI_TASK_MANAGE_REQUEST
+{
+    U16                     DevHandle;                      /* 0x00 */
+    U8                      ChainOffset;                    /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U8                      Reserved1;                      /* 0x04 */
+    U8                      TaskType;                       /* 0x05 */
+    U8                      Reserved2;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved3;                      /* 0x0A */
+    U8                      LUN[8];                         /* 0x0C */
+    U32                     Reserved4[7];                   /* 0x14 */
+    U16                     TaskMID;                        /* 0x30 */
+    U16                     Reserved5;                      /* 0x32 */
+} MPI2_SCSI_TASK_MANAGE_REQUEST,
+  MPI2_POINTER PTR_MPI2_SCSI_TASK_MANAGE_REQUEST,
+  Mpi2SCSITaskManagementRequest_t,
+  MPI2_POINTER pMpi2SCSITaskManagementRequest_t;
+
+/* TaskType values */
+
+#define MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK           (0x01)
+#define MPI2_SCSITASKMGMT_TASKTYPE_ABRT_TASK_SET        (0x02)
+#define MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET         (0x03)
+#define MPI2_SCSITASKMGMT_TASKTYPE_LOGICAL_UNIT_RESET   (0x05)
+#define MPI2_SCSITASKMGMT_TASKTYPE_CLEAR_TASK_SET       (0x06)
+#define MPI2_SCSITASKMGMT_TASKTYPE_QUERY_TASK           (0x07)
+#define MPI2_SCSITASKMGMT_TASKTYPE_CLR_ACA              (0x08)
+#define MPI2_SCSITASKMGMT_TASKTYPE_QRY_TASK_SET         (0x09)
+#define MPI2_SCSITASKMGMT_TASKTYPE_QRY_UNIT_ATTENTION   (0x0A)
+
+/* MsgFlags bits */
+
+#define MPI2_SCSITASKMGMT_MSGFLAGS_MASK_TARGET_RESET    (0x18)
+#define MPI2_SCSITASKMGMT_MSGFLAGS_LINK_RESET           (0x00)
+#define MPI2_SCSITASKMGMT_MSGFLAGS_NEXUS_RESET_SRST     (0x08)
+#define MPI2_SCSITASKMGMT_MSGFLAGS_SAS_HARD_LINK_RESET  (0x10)
+
+#define MPI2_SCSITASKMGMT_MSGFLAGS_DO_NOT_SEND_TASK_IU  (0x01)
+
+
+
+/* SCSI Task Management Reply Message */
+typedef struct _MPI2_SCSI_TASK_MANAGE_REPLY
+{
+    U16                     DevHandle;                      /* 0x00 */
+    U8                      MsgLength;                      /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U8                      ResponseCode;                   /* 0x04 */
+    U8                      TaskType;                       /* 0x05 */
+    U8                      Reserved1;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved2;                      /* 0x0A */
+    U16                     Reserved3;                      /* 0x0C */
+    U16                     IOCStatus;                      /* 0x0E */
+    U32                     IOCLogInfo;                     /* 0x10 */
+    U32                     TerminationCount;               /* 0x14 */
+} MPI2_SCSI_TASK_MANAGE_REPLY,
+  MPI2_POINTER PTR_MPI2_SCSI_TASK_MANAGE_REPLY,
+  Mpi2SCSITaskManagementReply_t, MPI2_POINTER pMpi2SCSIManagementReply_t;
+
+/* ResponseCode values */
+
+#define MPI2_SCSITASKMGMT_RSP_TM_COMPLETE               (0x00)
+#define MPI2_SCSITASKMGMT_RSP_INVALID_FRAME             (0x02)
+#define MPI2_SCSITASKMGMT_RSP_TM_NOT_SUPPORTED          (0x04)
+#define MPI2_SCSITASKMGMT_RSP_TM_FAILED                 (0x05)
+#define MPI2_SCSITASKMGMT_RSP_TM_SUCCEEDED              (0x08)
+#define MPI2_SCSITASKMGMT_RSP_TM_INVALID_LUN            (0x09)
+#define MPI2_SCSITASKMGMT_RSP_IO_QUEUED_ON_IOC          (0x80)
+
+
+/****************************************************************************
+*  SCSI Enclosure Processor messages
+****************************************************************************/
+
+/* SCSI Enclosure Processor Request Message */
+typedef struct _MPI2_SEP_REQUEST
+{
+    U16                     DevHandle;          /* 0x00 */
+    U8                      ChainOffset;        /* 0x02 */
+    U8                      Function;           /* 0x03 */
+    U8                      Action;             /* 0x04 */
+    U8                      Flags;              /* 0x05 */
+    U8                      Reserved1;          /* 0x06 */
+    U8                      MsgFlags;           /* 0x07 */
+    U8                      VP_ID;              /* 0x08 */
+    U8                      VF_ID;              /* 0x09 */
+    U16                     Reserved2;          /* 0x0A */
+    U32                     SlotStatus;         /* 0x0C */
+    U32                     Reserved3;          /* 0x10 */
+    U32                     Reserved4;          /* 0x14 */
+    U32                     Reserved5;          /* 0x18 */
+    U16                     Slot;               /* 0x1C */
+    U16                     EnclosureHandle;    /* 0x1E */
+} MPI2_SEP_REQUEST, MPI2_POINTER PTR_MPI2_SEP_REQUEST,
+  Mpi2SepRequest_t, MPI2_POINTER pMpi2SepRequest_t;
+
+/* Action defines */
+#define MPI2_SEP_REQ_ACTION_WRITE_STATUS                (0x00)
+#define MPI2_SEP_REQ_ACTION_READ_STATUS                 (0x01)
+
+/* Flags defines */
+#define MPI2_SEP_REQ_FLAGS_DEVHANDLE_ADDRESS            (0x00)
+#define MPI2_SEP_REQ_FLAGS_ENCLOSURE_SLOT_ADDRESS       (0x01)
+
+/* SlotStatus defines */
+#define MPI2_SEP_REQ_SLOTSTATUS_REQUEST_REMOVE          (0x00040000)
+#define MPI2_SEP_REQ_SLOTSTATUS_IDENTIFY_REQUEST        (0x00020000)
+#define MPI2_SEP_REQ_SLOTSTATUS_REBUILD_STOPPED         (0x00000200)
+#define MPI2_SEP_REQ_SLOTSTATUS_HOT_SPARE               (0x00000100)
+#define MPI2_SEP_REQ_SLOTSTATUS_UNCONFIGURED            (0x00000080)
+#define MPI2_SEP_REQ_SLOTSTATUS_PREDICTED_FAULT         (0x00000040)
+#define MPI2_SEP_REQ_SLOTSTATUS_DEV_REBUILDING          (0x00000004)
+#define MPI2_SEP_REQ_SLOTSTATUS_DEV_FAULTY              (0x00000002)
+#define MPI2_SEP_REQ_SLOTSTATUS_NO_ERROR                (0x00000001)
+
+
+/* SCSI Enclosure Processor Reply Message */
+typedef struct _MPI2_SEP_REPLY
+{
+    U16                     DevHandle;          /* 0x00 */
+    U8                      MsgLength;          /* 0x02 */
+    U8                      Function;           /* 0x03 */
+    U8                      Action;             /* 0x04 */
+    U8                      Flags;              /* 0x05 */
+    U8                      Reserved1;          /* 0x06 */
+    U8                      MsgFlags;           /* 0x07 */
+    U8                      VP_ID;              /* 0x08 */
+    U8                      VF_ID;              /* 0x09 */
+    U16                     Reserved2;          /* 0x0A */
+    U16                     Reserved3;          /* 0x0C */
+    U16                     IOCStatus;          /* 0x0E */
+    U32                     IOCLogInfo;         /* 0x10 */
+    U32                     SlotStatus;         /* 0x14 */
+    U32                     Reserved4;          /* 0x18 */
+    U16                     Slot;               /* 0x1C */
+    U16                     EnclosureHandle;    /* 0x1E */
+} MPI2_SEP_REPLY, MPI2_POINTER PTR_MPI2_SEP_REPLY,
+  Mpi2SepReply_t, MPI2_POINTER pMpi2SepReply_t;
+
+/* SlotStatus defines */
+#define MPI2_SEP_REPLY_SLOTSTATUS_REMOVE_READY          (0x00040000)
+#define MPI2_SEP_REPLY_SLOTSTATUS_IDENTIFY_REQUEST      (0x00020000)
+#define MPI2_SEP_REPLY_SLOTSTATUS_REBUILD_STOPPED       (0x00000200)
+#define MPI2_SEP_REPLY_SLOTSTATUS_HOT_SPARE             (0x00000100)
+#define MPI2_SEP_REPLY_SLOTSTATUS_UNCONFIGURED          (0x00000080)
+#define MPI2_SEP_REPLY_SLOTSTATUS_PREDICTED_FAULT       (0x00000040)
+#define MPI2_SEP_REPLY_SLOTSTATUS_DEV_REBUILDING        (0x00000004)
+#define MPI2_SEP_REPLY_SLOTSTATUS_DEV_FAULTY            (0x00000002)
+#define MPI2_SEP_REPLY_SLOTSTATUS_NO_ERROR              (0x00000001)
+
+
+#endif
+
+
diff --git a/drivers/scsi/mpt2sas/mpi/mpi2_ioc.h b/drivers/scsi/mpt2sas/mpi/mpi2_ioc.h
new file mode 100644
index 000000000000..8c5d81870c03
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpi/mpi2_ioc.h
@@ -0,0 +1,1295 @@
+/*
+ *  Copyright (c) 2000-2009 LSI Corporation.
+ *
+ *
+ *           Name:  mpi2_ioc.h
+ *          Title:  MPI IOC, Port, Event, FW Download, and FW Upload messages
+ *  Creation Date:  October 11, 2006
+ *
+ *  mpi2_ioc.h Version:  02.00.10
+ *
+ *  Version History
+ *  ---------------
+ *
+ *  Date      Version   Description
+ *  --------  --------  ------------------------------------------------------
+ *  04-30-07  02.00.00  Corresponds to Fusion-MPT MPI Specification Rev A.
+ *  06-04-07  02.00.01  In IOCFacts Reply structure, renamed MaxDevices to
+ *                      MaxTargets.
+ *                      Added TotalImageSize field to FWDownload Request.
+ *                      Added reserved words to FWUpload Request.
+ *  06-26-07  02.00.02  Added IR Configuration Change List Event.
+ *  08-31-07  02.00.03  Removed SystemReplyQueueDepth field from the IOCInit
+ *                      request and replaced it with
+ *                      ReplyDescriptorPostQueueDepth and ReplyFreeQueueDepth.
+ *                      Replaced the MinReplyQueueDepth field of the IOCFacts
+ *                      reply with MaxReplyDescriptorPostQueueDepth.
+ *                      Added MPI2_RDPQ_DEPTH_MIN define to specify the minimum
+ *                      depth for the Reply Descriptor Post Queue.
+ *                      Added SASAddress field to Initiator Device Table
+ *                      Overflow Event data.
+ *  10-31-07  02.00.04  Added ReasonCode MPI2_EVENT_SAS_INIT_RC_NOT_RESPONDING
+ *                      for SAS Initiator Device Status Change Event data.
+ *                      Modified Reason Code defines for SAS Topology Change
+ *                      List Event data, including adding a bit for PHY Vacant
+ *                      status, and adding a mask for the Reason Code.
+ *                      Added define for
+ *                      MPI2_EVENT_SAS_TOPO_ES_DELAY_NOT_RESPONDING.
+ *                      Added define for MPI2_EXT_IMAGE_TYPE_MEGARAID.
+ *  12-18-07  02.00.05  Added Boot Status defines for the IOCExceptions field of
+ *                      the IOCFacts Reply.
+ *                      Removed MPI2_IOCFACTS_CAPABILITY_EXTENDED_BUFFER define.
+ *                      Moved MPI2_VERSION_UNION to mpi2.h.
+ *                      Changed MPI2_EVENT_NOTIFICATION_REQUEST to use masks
+ *                      instead of enables, and added SASBroadcastPrimitiveMasks
+ *                      field.
+ *                      Added Log Entry Added Event and related structure.
+ *  02-29-08  02.00.06  Added define MPI2_IOCFACTS_CAPABILITY_INTEGRATED_RAID.
+ *                      Removed define MPI2_IOCFACTS_PROTOCOL_SMP_TARGET.
+ *                      Added MaxVolumes and MaxPersistentEntries fields to
+ *                      IOCFacts reply.
+ *                      Added ProtocalFlags and IOCCapabilities fields to
+ *                      MPI2_FW_IMAGE_HEADER.
+ *                      Removed MPI2_PORTENABLE_FLAGS_ENABLE_SINGLE_PORT.
+ *  03-03-08  02.00.07  Fixed MPI2_FW_IMAGE_HEADER by changing Reserved26 to
+ *                      a U16 (from a U32).
+ *                      Removed extra 's' from EventMasks name.
+ *  06-27-08  02.00.08  Fixed an offset in a comment.
+ *  10-02-08  02.00.09  Removed SystemReplyFrameSize from MPI2_IOC_INIT_REQUEST.
+ *                      Removed CurReplyFrameSize from MPI2_IOC_FACTS_REPLY and
+ *                      renamed MinReplyFrameSize to ReplyFrameSize.
+ *                      Added MPI2_IOCFACTS_EXCEPT_IR_FOREIGN_CONFIG_MAX.
+ *                      Added two new RAIDOperation values for Integrated RAID
+ *                      Operations Status Event data.
+ *                      Added four new IR Configuration Change List Event data
+ *                      ReasonCode values.
+ *                      Added two new ReasonCode defines for SAS Device Status
+ *                      Change Event data.
+ *                      Added three new DiscoveryStatus bits for the SAS
+ *                      Discovery event data.
+ *                      Added Multiplexing Status Change bit to the PhyStatus
+ *                      field of the SAS Topology Change List event data.
+ *                      Removed define for MPI2_INIT_IMAGE_BOOTFLAGS_XMEMCOPY.
+ *                      BootFlags are now product-specific.
+ *                      Added defines for the indivdual signature bytes
+ *                      for MPI2_INIT_IMAGE_FOOTER.
+ *  01-19-09  02.00.10  Added MPI2_IOCFACTS_CAPABILITY_EVENT_REPLAY define.
+ *                      Added MPI2_EVENT_SAS_DISC_DS_DOWNSTREAM_INITIATOR
+ *                      define.
+ *                      Added MPI2_EVENT_SAS_DEV_STAT_RC_SATA_INIT_FAILURE
+ *                      define.
+ *                      Removed MPI2_EVENT_SAS_DISC_DS_SATA_INIT_FAILURE define.
+ *  --------------------------------------------------------------------------
+ */
+
+#ifndef MPI2_IOC_H
+#define MPI2_IOC_H
+
+/*****************************************************************************
+*
+*               IOC Messages
+*
+*****************************************************************************/
+
+/****************************************************************************
+*  IOCInit message
+****************************************************************************/
+
+/* IOCInit Request message */
+typedef struct _MPI2_IOC_INIT_REQUEST
+{
+    U8                      WhoInit;                        /* 0x00 */
+    U8                      Reserved1;                      /* 0x01 */
+    U8                      ChainOffset;                    /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved2;                      /* 0x04 */
+    U8                      Reserved3;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved4;                      /* 0x0A */
+    U16                     MsgVersion;                     /* 0x0C */
+    U16                     HeaderVersion;                  /* 0x0E */
+    U32                     Reserved5;                      /* 0x10 */
+    U32                     Reserved6;                      /* 0x14 */
+    U16                     Reserved7;                      /* 0x18 */
+    U16                     SystemRequestFrameSize;         /* 0x1A */
+    U16                     ReplyDescriptorPostQueueDepth;  /* 0x1C */
+    U16                     ReplyFreeQueueDepth;            /* 0x1E */
+    U32                     SenseBufferAddressHigh;         /* 0x20 */
+    U32                     SystemReplyAddressHigh;         /* 0x24 */
+    U64                     SystemRequestFrameBaseAddress;  /* 0x28 */
+    U64                     ReplyDescriptorPostQueueAddress;/* 0x30 */
+    U64                     ReplyFreeQueueAddress;          /* 0x38 */
+    U64                     TimeStamp;                      /* 0x40 */
+} MPI2_IOC_INIT_REQUEST, MPI2_POINTER PTR_MPI2_IOC_INIT_REQUEST,
+  Mpi2IOCInitRequest_t, MPI2_POINTER pMpi2IOCInitRequest_t;
+
+/* WhoInit values */
+#define MPI2_WHOINIT_NOT_INITIALIZED            (0x00)
+#define MPI2_WHOINIT_SYSTEM_BIOS                (0x01)
+#define MPI2_WHOINIT_ROM_BIOS                   (0x02)
+#define MPI2_WHOINIT_PCI_PEER                   (0x03)
+#define MPI2_WHOINIT_HOST_DRIVER                (0x04)
+#define MPI2_WHOINIT_MANUFACTURER               (0x05)
+
+/* MsgVersion */
+#define MPI2_IOCINIT_MSGVERSION_MAJOR_MASK      (0xFF00)
+#define MPI2_IOCINIT_MSGVERSION_MAJOR_SHIFT     (8)
+#define MPI2_IOCINIT_MSGVERSION_MINOR_MASK      (0x00FF)
+#define MPI2_IOCINIT_MSGVERSION_MINOR_SHIFT     (0)
+
+/* HeaderVersion */
+#define MPI2_IOCINIT_HDRVERSION_UNIT_MASK       (0xFF00)
+#define MPI2_IOCINIT_HDRVERSION_UNIT_SHIFT      (8)
+#define MPI2_IOCINIT_HDRVERSION_DEV_MASK        (0x00FF)
+#define MPI2_IOCINIT_HDRVERSION_DEV_SHIFT       (0)
+
+/* minimum depth for the Reply Descriptor Post Queue */
+#define MPI2_RDPQ_DEPTH_MIN                     (16)
+
+
+/* IOCInit Reply message */
+typedef struct _MPI2_IOC_INIT_REPLY
+{
+    U8                      WhoInit;                        /* 0x00 */
+    U8                      Reserved1;                      /* 0x01 */
+    U8                      MsgLength;                      /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved2;                      /* 0x04 */
+    U8                      Reserved3;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved4;                      /* 0x0A */
+    U16                     Reserved5;                      /* 0x0C */
+    U16                     IOCStatus;                      /* 0x0E */
+    U32                     IOCLogInfo;                     /* 0x10 */
+} MPI2_IOC_INIT_REPLY, MPI2_POINTER PTR_MPI2_IOC_INIT_REPLY,
+  Mpi2IOCInitReply_t, MPI2_POINTER pMpi2IOCInitReply_t;
+
+
+/****************************************************************************
+*  IOCFacts message
+****************************************************************************/
+
+/* IOCFacts Request message */
+typedef struct _MPI2_IOC_FACTS_REQUEST
+{
+    U16                     Reserved1;                      /* 0x00 */
+    U8                      ChainOffset;                    /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved2;                      /* 0x04 */
+    U8                      Reserved3;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved4;                      /* 0x0A */
+} MPI2_IOC_FACTS_REQUEST, MPI2_POINTER PTR_MPI2_IOC_FACTS_REQUEST,
+  Mpi2IOCFactsRequest_t, MPI2_POINTER pMpi2IOCFactsRequest_t;
+
+
+/* IOCFacts Reply message */
+typedef struct _MPI2_IOC_FACTS_REPLY
+{
+    U16                     MsgVersion;                     /* 0x00 */
+    U8                      MsgLength;                      /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     HeaderVersion;                  /* 0x04 */
+    U8                      IOCNumber;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved1;                      /* 0x0A */
+    U16                     IOCExceptions;                  /* 0x0C */
+    U16                     IOCStatus;                      /* 0x0E */
+    U32                     IOCLogInfo;                     /* 0x10 */
+    U8                      MaxChainDepth;                  /* 0x14 */
+    U8                      WhoInit;                        /* 0x15 */
+    U8                      NumberOfPorts;                  /* 0x16 */
+    U8                      Reserved2;                      /* 0x17 */
+    U16                     RequestCredit;                  /* 0x18 */
+    U16                     ProductID;                      /* 0x1A */
+    U32                     IOCCapabilities;                /* 0x1C */
+    MPI2_VERSION_UNION      FWVersion;                      /* 0x20 */
+    U16                     IOCRequestFrameSize;            /* 0x24 */
+    U16                     Reserved3;                      /* 0x26 */
+    U16                     MaxInitiators;                  /* 0x28 */
+    U16                     MaxTargets;                     /* 0x2A */
+    U16                     MaxSasExpanders;                /* 0x2C */
+    U16                     MaxEnclosures;                  /* 0x2E */
+    U16                     ProtocolFlags;                  /* 0x30 */
+    U16                     HighPriorityCredit;             /* 0x32 */
+    U16                     MaxReplyDescriptorPostQueueDepth; /* 0x34 */
+    U8                      ReplyFrameSize;                 /* 0x36 */
+    U8                      MaxVolumes;                     /* 0x37 */
+    U16                     MaxDevHandle;                   /* 0x38 */
+    U16                     MaxPersistentEntries;           /* 0x3A */
+    U32                     Reserved4;                      /* 0x3C */
+} MPI2_IOC_FACTS_REPLY, MPI2_POINTER PTR_MPI2_IOC_FACTS_REPLY,
+  Mpi2IOCFactsReply_t, MPI2_POINTER pMpi2IOCFactsReply_t;
+
+/* MsgVersion */
+#define MPI2_IOCFACTS_MSGVERSION_MAJOR_MASK             (0xFF00)
+#define MPI2_IOCFACTS_MSGVERSION_MAJOR_SHIFT            (8)
+#define MPI2_IOCFACTS_MSGVERSION_MINOR_MASK             (0x00FF)
+#define MPI2_IOCFACTS_MSGVERSION_MINOR_SHIFT            (0)
+
+/* HeaderVersion */
+#define MPI2_IOCFACTS_HDRVERSION_UNIT_MASK              (0xFF00)
+#define MPI2_IOCFACTS_HDRVERSION_UNIT_SHIFT             (8)
+#define MPI2_IOCFACTS_HDRVERSION_DEV_MASK               (0x00FF)
+#define MPI2_IOCFACTS_HDRVERSION_DEV_SHIFT              (0)
+
+/* IOCExceptions */
+#define MPI2_IOCFACTS_EXCEPT_IR_FOREIGN_CONFIG_MAX      (0x0100)
+
+#define MPI2_IOCFACTS_EXCEPT_BOOTSTAT_MASK              (0x00E0)
+#define MPI2_IOCFACTS_EXCEPT_BOOTSTAT_GOOD              (0x0000)
+#define MPI2_IOCFACTS_EXCEPT_BOOTSTAT_BACKUP            (0x0020)
+#define MPI2_IOCFACTS_EXCEPT_BOOTSTAT_RESTORED          (0x0040)
+#define MPI2_IOCFACTS_EXCEPT_BOOTSTAT_CORRUPT_BACKUP    (0x0060)
+
+#define MPI2_IOCFACTS_EXCEPT_METADATA_UNSUPPORTED       (0x0010)
+#define MPI2_IOCFACTS_EXCEPT_MANUFACT_CHECKSUM_FAIL     (0x0008)
+#define MPI2_IOCFACTS_EXCEPT_FW_CHECKSUM_FAIL           (0x0004)
+#define MPI2_IOCFACTS_EXCEPT_RAID_CONFIG_INVALID        (0x0002)
+#define MPI2_IOCFACTS_EXCEPT_CONFIG_CHECKSUM_FAIL       (0x0001)
+
+/* defines for WhoInit field are after the IOCInit Request */
+
+/* ProductID field uses MPI2_FW_HEADER_PID_ */
+
+/* IOCCapabilities */
+#define MPI2_IOCFACTS_CAPABILITY_EVENT_REPLAY           (0x00002000)
+#define MPI2_IOCFACTS_CAPABILITY_INTEGRATED_RAID        (0x00001000)
+#define MPI2_IOCFACTS_CAPABILITY_TLR                    (0x00000800)
+#define MPI2_IOCFACTS_CAPABILITY_MULTICAST              (0x00000100)
+#define MPI2_IOCFACTS_CAPABILITY_BIDIRECTIONAL_TARGET   (0x00000080)
+#define MPI2_IOCFACTS_CAPABILITY_EEDP                   (0x00000040)
+#define MPI2_IOCFACTS_CAPABILITY_SNAPSHOT_BUFFER        (0x00000010)
+#define MPI2_IOCFACTS_CAPABILITY_DIAG_TRACE_BUFFER      (0x00000008)
+#define MPI2_IOCFACTS_CAPABILITY_TASK_SET_FULL_HANDLING (0x00000004)
+
+/* ProtocolFlags */
+#define MPI2_IOCFACTS_PROTOCOL_SCSI_TARGET              (0x0001)
+#define MPI2_IOCFACTS_PROTOCOL_SCSI_INITIATOR           (0x0002)
+
+
+/****************************************************************************
+*  PortFacts message
+****************************************************************************/
+
+/* PortFacts Request message */
+typedef struct _MPI2_PORT_FACTS_REQUEST
+{
+    U16                     Reserved1;                      /* 0x00 */
+    U8                      ChainOffset;                    /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved2;                      /* 0x04 */
+    U8                      PortNumber;                     /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved3;                      /* 0x0A */
+} MPI2_PORT_FACTS_REQUEST, MPI2_POINTER PTR_MPI2_PORT_FACTS_REQUEST,
+  Mpi2PortFactsRequest_t, MPI2_POINTER pMpi2PortFactsRequest_t;
+
+/* PortFacts Reply message */
+typedef struct _MPI2_PORT_FACTS_REPLY
+{
+    U16                     Reserved1;                      /* 0x00 */
+    U8                      MsgLength;                      /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved2;                      /* 0x04 */
+    U8                      PortNumber;                     /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved3;                      /* 0x0A */
+    U16                     Reserved4;                      /* 0x0C */
+    U16                     IOCStatus;                      /* 0x0E */
+    U32                     IOCLogInfo;                     /* 0x10 */
+    U8                      Reserved5;                      /* 0x14 */
+    U8                      PortType;                       /* 0x15 */
+    U16                     Reserved6;                      /* 0x16 */
+    U16                     MaxPostedCmdBuffers;            /* 0x18 */
+    U16                     Reserved7;                      /* 0x1A */
+} MPI2_PORT_FACTS_REPLY, MPI2_POINTER PTR_MPI2_PORT_FACTS_REPLY,
+  Mpi2PortFactsReply_t, MPI2_POINTER pMpi2PortFactsReply_t;
+
+/* PortType values */
+#define MPI2_PORTFACTS_PORTTYPE_INACTIVE            (0x00)
+#define MPI2_PORTFACTS_PORTTYPE_FC                  (0x10)
+#define MPI2_PORTFACTS_PORTTYPE_ISCSI               (0x20)
+#define MPI2_PORTFACTS_PORTTYPE_SAS_PHYSICAL        (0x30)
+#define MPI2_PORTFACTS_PORTTYPE_SAS_VIRTUAL         (0x31)
+
+
+/****************************************************************************
+*  PortEnable message
+****************************************************************************/
+
+/* PortEnable Request message */
+typedef struct _MPI2_PORT_ENABLE_REQUEST
+{
+    U16                     Reserved1;                      /* 0x00 */
+    U8                      ChainOffset;                    /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U8                      Reserved2;                      /* 0x04 */
+    U8                      PortFlags;                      /* 0x05 */
+    U8                      Reserved3;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved4;                      /* 0x0A */
+} MPI2_PORT_ENABLE_REQUEST, MPI2_POINTER PTR_MPI2_PORT_ENABLE_REQUEST,
+  Mpi2PortEnableRequest_t, MPI2_POINTER pMpi2PortEnableRequest_t;
+
+
+/* PortEnable Reply message */
+typedef struct _MPI2_PORT_ENABLE_REPLY
+{
+    U16                     Reserved1;                      /* 0x00 */
+    U8                      MsgLength;                      /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U8                      Reserved2;                      /* 0x04 */
+    U8                      PortFlags;                      /* 0x05 */
+    U8                      Reserved3;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved4;                      /* 0x0A */
+    U16                     Reserved5;                      /* 0x0C */
+    U16                     IOCStatus;                      /* 0x0E */
+    U32                     IOCLogInfo;                     /* 0x10 */
+} MPI2_PORT_ENABLE_REPLY, MPI2_POINTER PTR_MPI2_PORT_ENABLE_REPLY,
+  Mpi2PortEnableReply_t, MPI2_POINTER pMpi2PortEnableReply_t;
+
+
+/****************************************************************************
+*  EventNotification message
+****************************************************************************/
+
+/* EventNotification Request message */
+#define MPI2_EVENT_NOTIFY_EVENTMASK_WORDS           (4)
+
+typedef struct _MPI2_EVENT_NOTIFICATION_REQUEST
+{
+    U16                     Reserved1;                      /* 0x00 */
+    U8                      ChainOffset;                    /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved2;                      /* 0x04 */
+    U8                      Reserved3;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved4;                      /* 0x0A */
+    U32                     Reserved5;                      /* 0x0C */
+    U32                     Reserved6;                      /* 0x10 */
+    U32                     EventMasks[MPI2_EVENT_NOTIFY_EVENTMASK_WORDS];/* 0x14 */
+    U16                     SASBroadcastPrimitiveMasks;     /* 0x24 */
+    U16                     Reserved7;                      /* 0x26 */
+    U32                     Reserved8;                      /* 0x28 */
+} MPI2_EVENT_NOTIFICATION_REQUEST,
+  MPI2_POINTER PTR_MPI2_EVENT_NOTIFICATION_REQUEST,
+  Mpi2EventNotificationRequest_t, MPI2_POINTER pMpi2EventNotificationRequest_t;
+
+
+/* EventNotification Reply message */
+typedef struct _MPI2_EVENT_NOTIFICATION_REPLY
+{
+    U16                     EventDataLength;                /* 0x00 */
+    U8                      MsgLength;                      /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved1;                      /* 0x04 */
+    U8                      AckRequired;                    /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved2;                      /* 0x0A */
+    U16                     Reserved3;                      /* 0x0C */
+    U16                     IOCStatus;                      /* 0x0E */
+    U32                     IOCLogInfo;                     /* 0x10 */
+    U16                     Event;                          /* 0x14 */
+    U16                     Reserved4;                      /* 0x16 */
+    U32                     EventContext;                   /* 0x18 */
+    U32                     EventData[1];                   /* 0x1C */
+} MPI2_EVENT_NOTIFICATION_REPLY, MPI2_POINTER PTR_MPI2_EVENT_NOTIFICATION_REPLY,
+  Mpi2EventNotificationReply_t, MPI2_POINTER pMpi2EventNotificationReply_t;
+
+/* AckRequired */
+#define MPI2_EVENT_NOTIFICATION_ACK_NOT_REQUIRED    (0x00)
+#define MPI2_EVENT_NOTIFICATION_ACK_REQUIRED        (0x01)
+
+/* Event */
+#define MPI2_EVENT_LOG_DATA                         (0x0001)
+#define MPI2_EVENT_STATE_CHANGE                     (0x0002)
+#define MPI2_EVENT_HARD_RESET_RECEIVED              (0x0005)
+#define MPI2_EVENT_EVENT_CHANGE                     (0x000A)
+#define MPI2_EVENT_TASK_SET_FULL                    (0x000E)
+#define MPI2_EVENT_SAS_DEVICE_STATUS_CHANGE         (0x000F)
+#define MPI2_EVENT_IR_OPERATION_STATUS              (0x0014)
+#define MPI2_EVENT_SAS_DISCOVERY                    (0x0016)
+#define MPI2_EVENT_SAS_BROADCAST_PRIMITIVE          (0x0017)
+#define MPI2_EVENT_SAS_INIT_DEVICE_STATUS_CHANGE    (0x0018)
+#define MPI2_EVENT_SAS_INIT_TABLE_OVERFLOW          (0x0019)
+#define MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST         (0x001C)
+#define MPI2_EVENT_SAS_ENCL_DEVICE_STATUS_CHANGE    (0x001D)
+#define MPI2_EVENT_IR_VOLUME                        (0x001E)
+#define MPI2_EVENT_IR_PHYSICAL_DISK                 (0x001F)
+#define MPI2_EVENT_IR_CONFIGURATION_CHANGE_LIST     (0x0020)
+#define MPI2_EVENT_LOG_ENTRY_ADDED                  (0x0021)
+
+
+/* Log Entry Added Event data */
+
+/* the following structure matches MPI2_LOG_0_ENTRY in mpi2_cnfg.h */
+#define MPI2_EVENT_DATA_LOG_DATA_LENGTH             (0x1C)
+
+typedef struct _MPI2_EVENT_DATA_LOG_ENTRY_ADDED
+{
+    U64         TimeStamp;                          /* 0x00 */
+    U32         Reserved1;                          /* 0x08 */
+    U16         LogSequence;                        /* 0x0C */
+    U16         LogEntryQualifier;                  /* 0x0E */
+    U8          VP_ID;                              /* 0x10 */
+    U8          VF_ID;                              /* 0x11 */
+    U16         Reserved2;                          /* 0x12 */
+    U8          LogData[MPI2_EVENT_DATA_LOG_DATA_LENGTH];/* 0x14 */
+} MPI2_EVENT_DATA_LOG_ENTRY_ADDED,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_LOG_ENTRY_ADDED,
+  Mpi2EventDataLogEntryAdded_t, MPI2_POINTER pMpi2EventDataLogEntryAdded_t;
+
+/* Hard Reset Received Event data */
+
+typedef struct _MPI2_EVENT_DATA_HARD_RESET_RECEIVED
+{
+    U8                      Reserved1;                      /* 0x00 */
+    U8                      Port;                           /* 0x01 */
+    U16                     Reserved2;                      /* 0x02 */
+} MPI2_EVENT_DATA_HARD_RESET_RECEIVED,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_HARD_RESET_RECEIVED,
+  Mpi2EventDataHardResetReceived_t,
+  MPI2_POINTER pMpi2EventDataHardResetReceived_t;
+
+/* Task Set Full Event data */
+
+typedef struct _MPI2_EVENT_DATA_TASK_SET_FULL
+{
+    U16                     DevHandle;                      /* 0x00 */
+    U16                     CurrentDepth;                   /* 0x02 */
+} MPI2_EVENT_DATA_TASK_SET_FULL, MPI2_POINTER PTR_MPI2_EVENT_DATA_TASK_SET_FULL,
+  Mpi2EventDataTaskSetFull_t, MPI2_POINTER pMpi2EventDataTaskSetFull_t;
+
+
+/* SAS Device Status Change Event data */
+
+typedef struct _MPI2_EVENT_DATA_SAS_DEVICE_STATUS_CHANGE
+{
+    U16                     TaskTag;                        /* 0x00 */
+    U8                      ReasonCode;                     /* 0x02 */
+    U8                      Reserved1;                      /* 0x03 */
+    U8                      ASC;                            /* 0x04 */
+    U8                      ASCQ;                           /* 0x05 */
+    U16                     DevHandle;                      /* 0x06 */
+    U32                     Reserved2;                      /* 0x08 */
+    U64                     SASAddress;                     /* 0x0C */
+    U8                      LUN[8];                         /* 0x14 */
+} MPI2_EVENT_DATA_SAS_DEVICE_STATUS_CHANGE,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_SAS_DEVICE_STATUS_CHANGE,
+  Mpi2EventDataSasDeviceStatusChange_t,
+  MPI2_POINTER pMpi2EventDataSasDeviceStatusChange_t;
+
+/* SAS Device Status Change Event data ReasonCode values */
+#define MPI2_EVENT_SAS_DEV_STAT_RC_SMART_DATA               (0x05)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_UNSUPPORTED              (0x07)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET    (0x08)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_TASK_ABORT_INTERNAL      (0x09)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_ABORT_TASK_SET_INTERNAL  (0x0A)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_CLEAR_TASK_SET_INTERNAL  (0x0B)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_QUERY_TASK_INTERNAL      (0x0C)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_ASYNC_NOTIFICATION       (0x0D)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_CMP_INTERNAL_DEV_RESET   (0x0E)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_CMP_TASK_ABORT_INTERNAL  (0x0F)
+#define MPI2_EVENT_SAS_DEV_STAT_RC_SATA_INIT_FAILURE        (0x10)
+
+
+/* Integrated RAID Operation Status Event data */
+
+typedef struct _MPI2_EVENT_DATA_IR_OPERATION_STATUS
+{
+    U16                     VolDevHandle;               /* 0x00 */
+    U16                     Reserved1;                  /* 0x02 */
+    U8                      RAIDOperation;              /* 0x04 */
+    U8                      PercentComplete;            /* 0x05 */
+    U16                     Reserved2;                  /* 0x06 */
+    U32                     Resereved3;                 /* 0x08 */
+} MPI2_EVENT_DATA_IR_OPERATION_STATUS,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_IR_OPERATION_STATUS,
+  Mpi2EventDataIrOperationStatus_t,
+  MPI2_POINTER pMpi2EventDataIrOperationStatus_t;
+
+/* Integrated RAID Operation Status Event data RAIDOperation values */
+#define MPI2_EVENT_IR_RAIDOP_RESYNC                     (0x00)
+#define MPI2_EVENT_IR_RAIDOP_ONLINE_CAP_EXPANSION       (0x01)
+#define MPI2_EVENT_IR_RAIDOP_CONSISTENCY_CHECK          (0x02)
+#define MPI2_EVENT_IR_RAIDOP_BACKGROUND_INIT            (0x03)
+#define MPI2_EVENT_IR_RAIDOP_MAKE_DATA_CONSISTENT       (0x04)
+
+
+/* Integrated RAID Volume Event data */
+
+typedef struct _MPI2_EVENT_DATA_IR_VOLUME
+{
+    U16                     VolDevHandle;               /* 0x00 */
+    U8                      ReasonCode;                 /* 0x02 */
+    U8                      Reserved1;                  /* 0x03 */
+    U32                     NewValue;                   /* 0x04 */
+    U32                     PreviousValue;              /* 0x08 */
+} MPI2_EVENT_DATA_IR_VOLUME, MPI2_POINTER PTR_MPI2_EVENT_DATA_IR_VOLUME,
+  Mpi2EventDataIrVolume_t, MPI2_POINTER pMpi2EventDataIrVolume_t;
+
+/* Integrated RAID Volume Event data ReasonCode values */
+#define MPI2_EVENT_IR_VOLUME_RC_SETTINGS_CHANGED        (0x01)
+#define MPI2_EVENT_IR_VOLUME_RC_STATUS_FLAGS_CHANGED    (0x02)
+#define MPI2_EVENT_IR_VOLUME_RC_STATE_CHANGED           (0x03)
+
+
+/* Integrated RAID Physical Disk Event data */
+
+typedef struct _MPI2_EVENT_DATA_IR_PHYSICAL_DISK
+{
+    U16                     Reserved1;                  /* 0x00 */
+    U8                      ReasonCode;                 /* 0x02 */
+    U8                      PhysDiskNum;                /* 0x03 */
+    U16                     PhysDiskDevHandle;          /* 0x04 */
+    U16                     Reserved2;                  /* 0x06 */
+    U16                     Slot;                       /* 0x08 */
+    U16                     EnclosureHandle;            /* 0x0A */
+    U32                     NewValue;                   /* 0x0C */
+    U32                     PreviousValue;              /* 0x10 */
+} MPI2_EVENT_DATA_IR_PHYSICAL_DISK,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_IR_PHYSICAL_DISK,
+  Mpi2EventDataIrPhysicalDisk_t, MPI2_POINTER pMpi2EventDataIrPhysicalDisk_t;
+
+/* Integrated RAID Physical Disk Event data ReasonCode values */
+#define MPI2_EVENT_IR_PHYSDISK_RC_SETTINGS_CHANGED      (0x01)
+#define MPI2_EVENT_IR_PHYSDISK_RC_STATUS_FLAGS_CHANGED  (0x02)
+#define MPI2_EVENT_IR_PHYSDISK_RC_STATE_CHANGED         (0x03)
+
+
+/* Integrated RAID Configuration Change List Event data */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check NumElements at runtime.
+ */
+#ifndef MPI2_EVENT_IR_CONFIG_ELEMENT_COUNT
+#define MPI2_EVENT_IR_CONFIG_ELEMENT_COUNT          (1)
+#endif
+
+typedef struct _MPI2_EVENT_IR_CONFIG_ELEMENT
+{
+    U16                     ElementFlags;               /* 0x00 */
+    U16                     VolDevHandle;               /* 0x02 */
+    U8                      ReasonCode;                 /* 0x04 */
+    U8                      PhysDiskNum;                /* 0x05 */
+    U16                     PhysDiskDevHandle;          /* 0x06 */
+} MPI2_EVENT_IR_CONFIG_ELEMENT, MPI2_POINTER PTR_MPI2_EVENT_IR_CONFIG_ELEMENT,
+  Mpi2EventIrConfigElement_t, MPI2_POINTER pMpi2EventIrConfigElement_t;
+
+/* IR Configuration Change List Event data ElementFlags values */
+#define MPI2_EVENT_IR_CHANGE_EFLAGS_ELEMENT_TYPE_MASK   (0x000F)
+#define MPI2_EVENT_IR_CHANGE_EFLAGS_VOLUME_ELEMENT      (0x0000)
+#define MPI2_EVENT_IR_CHANGE_EFLAGS_VOLPHYSDISK_ELEMENT (0x0001)
+#define MPI2_EVENT_IR_CHANGE_EFLAGS_HOTSPARE_ELEMENT    (0x0002)
+
+/* IR Configuration Change List Event data ReasonCode values */
+#define MPI2_EVENT_IR_CHANGE_RC_ADDED                   (0x01)
+#define MPI2_EVENT_IR_CHANGE_RC_REMOVED                 (0x02)
+#define MPI2_EVENT_IR_CHANGE_RC_NO_CHANGE               (0x03)
+#define MPI2_EVENT_IR_CHANGE_RC_HIDE                    (0x04)
+#define MPI2_EVENT_IR_CHANGE_RC_UNHIDE                  (0x05)
+#define MPI2_EVENT_IR_CHANGE_RC_VOLUME_CREATED          (0x06)
+#define MPI2_EVENT_IR_CHANGE_RC_VOLUME_DELETED          (0x07)
+#define MPI2_EVENT_IR_CHANGE_RC_PD_CREATED              (0x08)
+#define MPI2_EVENT_IR_CHANGE_RC_PD_DELETED              (0x09)
+
+typedef struct _MPI2_EVENT_DATA_IR_CONFIG_CHANGE_LIST
+{
+    U8                              NumElements;        /* 0x00 */
+    U8                              Reserved1;          /* 0x01 */
+    U8                              Reserved2;          /* 0x02 */
+    U8                              ConfigNum;          /* 0x03 */
+    U32                             Flags;              /* 0x04 */
+    MPI2_EVENT_IR_CONFIG_ELEMENT    ConfigElement[MPI2_EVENT_IR_CONFIG_ELEMENT_COUNT];    /* 0x08 */
+} MPI2_EVENT_DATA_IR_CONFIG_CHANGE_LIST,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_IR_CONFIG_CHANGE_LIST,
+  Mpi2EventDataIrConfigChangeList_t,
+  MPI2_POINTER pMpi2EventDataIrConfigChangeList_t;
+
+/* IR Configuration Change List Event data Flags values */
+#define MPI2_EVENT_IR_CHANGE_FLAGS_FOREIGN_CONFIG   (0x00000001)
+
+
+/* SAS Discovery Event data */
+
+typedef struct _MPI2_EVENT_DATA_SAS_DISCOVERY
+{
+    U8                      Flags;                      /* 0x00 */
+    U8                      ReasonCode;                 /* 0x01 */
+    U8                      PhysicalPort;               /* 0x02 */
+    U8                      Reserved1;                  /* 0x03 */
+    U32                     DiscoveryStatus;            /* 0x04 */
+} MPI2_EVENT_DATA_SAS_DISCOVERY,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_SAS_DISCOVERY,
+  Mpi2EventDataSasDiscovery_t, MPI2_POINTER pMpi2EventDataSasDiscovery_t;
+
+/* SAS Discovery Event data Flags values */
+#define MPI2_EVENT_SAS_DISC_DEVICE_CHANGE                   (0x02)
+#define MPI2_EVENT_SAS_DISC_IN_PROGRESS                     (0x01)
+
+/* SAS Discovery Event data ReasonCode values */
+#define MPI2_EVENT_SAS_DISC_RC_STARTED                      (0x01)
+#define MPI2_EVENT_SAS_DISC_RC_COMPLETED                    (0x02)
+
+/* SAS Discovery Event data DiscoveryStatus values */
+#define MPI2_EVENT_SAS_DISC_DS_MAX_ENCLOSURES_EXCEED            (0x80000000)
+#define MPI2_EVENT_SAS_DISC_DS_MAX_EXPANDERS_EXCEED             (0x40000000)
+#define MPI2_EVENT_SAS_DISC_DS_MAX_DEVICES_EXCEED               (0x20000000)
+#define MPI2_EVENT_SAS_DISC_DS_MAX_TOPO_PHYS_EXCEED             (0x10000000)
+#define MPI2_EVENT_SAS_DISC_DS_DOWNSTREAM_INITIATOR             (0x08000000)
+#define MPI2_EVENT_SAS_DISC_DS_MULTI_SUBTRACTIVE_SUBTRACTIVE    (0x00008000)
+#define MPI2_EVENT_SAS_DISC_DS_EXP_MULTI_SUBTRACTIVE            (0x00004000)
+#define MPI2_EVENT_SAS_DISC_DS_MULTI_PORT_DOMAIN                (0x00002000)
+#define MPI2_EVENT_SAS_DISC_DS_TABLE_TO_SUBTRACTIVE_LINK        (0x00001000)
+#define MPI2_EVENT_SAS_DISC_DS_UNSUPPORTED_DEVICE               (0x00000800)
+#define MPI2_EVENT_SAS_DISC_DS_TABLE_LINK                       (0x00000400)
+#define MPI2_EVENT_SAS_DISC_DS_SUBTRACTIVE_LINK                 (0x00000200)
+#define MPI2_EVENT_SAS_DISC_DS_SMP_CRC_ERROR                    (0x00000100)
+#define MPI2_EVENT_SAS_DISC_DS_SMP_FUNCTION_FAILED              (0x00000080)
+#define MPI2_EVENT_SAS_DISC_DS_INDEX_NOT_EXIST                  (0x00000040)
+#define MPI2_EVENT_SAS_DISC_DS_OUT_ROUTE_ENTRIES                (0x00000020)
+#define MPI2_EVENT_SAS_DISC_DS_SMP_TIMEOUT                      (0x00000010)
+#define MPI2_EVENT_SAS_DISC_DS_MULTIPLE_PORTS                   (0x00000004)
+#define MPI2_EVENT_SAS_DISC_DS_UNADDRESSABLE_DEVICE             (0x00000002)
+#define MPI2_EVENT_SAS_DISC_DS_LOOP_DETECTED                    (0x00000001)
+
+
+/* SAS Broadcast Primitive Event data */
+
+typedef struct _MPI2_EVENT_DATA_SAS_BROADCAST_PRIMITIVE
+{
+    U8                      PhyNum;                     /* 0x00 */
+    U8                      Port;                       /* 0x01 */
+    U8                      PortWidth;                  /* 0x02 */
+    U8                      Primitive;                  /* 0x03 */
+} MPI2_EVENT_DATA_SAS_BROADCAST_PRIMITIVE,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_SAS_BROADCAST_PRIMITIVE,
+  Mpi2EventDataSasBroadcastPrimitive_t,
+  MPI2_POINTER pMpi2EventDataSasBroadcastPrimitive_t;
+
+/* defines for the Primitive field */
+#define MPI2_EVENT_PRIMITIVE_CHANGE                         (0x01)
+#define MPI2_EVENT_PRIMITIVE_SES                            (0x02)
+#define MPI2_EVENT_PRIMITIVE_EXPANDER                       (0x03)
+#define MPI2_EVENT_PRIMITIVE_ASYNCHRONOUS_EVENT             (0x04)
+#define MPI2_EVENT_PRIMITIVE_RESERVED3                      (0x05)
+#define MPI2_EVENT_PRIMITIVE_RESERVED4                      (0x06)
+#define MPI2_EVENT_PRIMITIVE_CHANGE0_RESERVED               (0x07)
+#define MPI2_EVENT_PRIMITIVE_CHANGE1_RESERVED               (0x08)
+
+
+/* SAS Initiator Device Status Change Event data */
+
+typedef struct _MPI2_EVENT_DATA_SAS_INIT_DEV_STATUS_CHANGE
+{
+    U8                      ReasonCode;                 /* 0x00 */
+    U8                      PhysicalPort;               /* 0x01 */
+    U16                     DevHandle;                  /* 0x02 */
+    U64                     SASAddress;                 /* 0x04 */
+} MPI2_EVENT_DATA_SAS_INIT_DEV_STATUS_CHANGE,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_SAS_INIT_DEV_STATUS_CHANGE,
+  Mpi2EventDataSasInitDevStatusChange_t,
+  MPI2_POINTER pMpi2EventDataSasInitDevStatusChange_t;
+
+/* SAS Initiator Device Status Change event ReasonCode values */
+#define MPI2_EVENT_SAS_INIT_RC_ADDED                (0x01)
+#define MPI2_EVENT_SAS_INIT_RC_NOT_RESPONDING       (0x02)
+
+
+/* SAS Initiator Device Table Overflow Event data */
+
+typedef struct _MPI2_EVENT_DATA_SAS_INIT_TABLE_OVERFLOW
+{
+    U16                     MaxInit;                    /* 0x00 */
+    U16                     CurrentInit;                /* 0x02 */
+    U64                     SASAddress;                 /* 0x04 */
+} MPI2_EVENT_DATA_SAS_INIT_TABLE_OVERFLOW,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_SAS_INIT_TABLE_OVERFLOW,
+  Mpi2EventDataSasInitTableOverflow_t,
+  MPI2_POINTER pMpi2EventDataSasInitTableOverflow_t;
+
+
+/* SAS Topology Change List Event data */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check NumEntries at runtime.
+ */
+#ifndef MPI2_EVENT_SAS_TOPO_PHY_COUNT
+#define MPI2_EVENT_SAS_TOPO_PHY_COUNT           (1)
+#endif
+
+typedef struct _MPI2_EVENT_SAS_TOPO_PHY_ENTRY
+{
+    U16                     AttachedDevHandle;          /* 0x00 */
+    U8                      LinkRate;                   /* 0x02 */
+    U8                      PhyStatus;                  /* 0x03 */
+} MPI2_EVENT_SAS_TOPO_PHY_ENTRY, MPI2_POINTER PTR_MPI2_EVENT_SAS_TOPO_PHY_ENTRY,
+  Mpi2EventSasTopoPhyEntry_t, MPI2_POINTER pMpi2EventSasTopoPhyEntry_t;
+
+typedef struct _MPI2_EVENT_DATA_SAS_TOPOLOGY_CHANGE_LIST
+{
+    U16                             EnclosureHandle;            /* 0x00 */
+    U16                             ExpanderDevHandle;          /* 0x02 */
+    U8                              NumPhys;                    /* 0x04 */
+    U8                              Reserved1;                  /* 0x05 */
+    U16                             Reserved2;                  /* 0x06 */
+    U8                              NumEntries;                 /* 0x08 */
+    U8                              StartPhyNum;                /* 0x09 */
+    U8                              ExpStatus;                  /* 0x0A */
+    U8                              PhysicalPort;               /* 0x0B */
+    MPI2_EVENT_SAS_TOPO_PHY_ENTRY   PHY[MPI2_EVENT_SAS_TOPO_PHY_COUNT]; /* 0x0C*/
+} MPI2_EVENT_DATA_SAS_TOPOLOGY_CHANGE_LIST,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_SAS_TOPOLOGY_CHANGE_LIST,
+  Mpi2EventDataSasTopologyChangeList_t,
+  MPI2_POINTER pMpi2EventDataSasTopologyChangeList_t;
+
+/* values for the ExpStatus field */
+#define MPI2_EVENT_SAS_TOPO_ES_ADDED                        (0x01)
+#define MPI2_EVENT_SAS_TOPO_ES_NOT_RESPONDING               (0x02)
+#define MPI2_EVENT_SAS_TOPO_ES_RESPONDING                   (0x03)
+#define MPI2_EVENT_SAS_TOPO_ES_DELAY_NOT_RESPONDING         (0x04)
+
+/* defines for the LinkRate field */
+#define MPI2_EVENT_SAS_TOPO_LR_CURRENT_MASK                 (0xF0)
+#define MPI2_EVENT_SAS_TOPO_LR_CURRENT_SHIFT                (4)
+#define MPI2_EVENT_SAS_TOPO_LR_PREV_MASK                    (0x0F)
+#define MPI2_EVENT_SAS_TOPO_LR_PREV_SHIFT                   (0)
+
+#define MPI2_EVENT_SAS_TOPO_LR_UNKNOWN_LINK_RATE            (0x00)
+#define MPI2_EVENT_SAS_TOPO_LR_PHY_DISABLED                 (0x01)
+#define MPI2_EVENT_SAS_TOPO_LR_NEGOTIATION_FAILED           (0x02)
+#define MPI2_EVENT_SAS_TOPO_LR_SATA_OOB_COMPLETE            (0x03)
+#define MPI2_EVENT_SAS_TOPO_LR_PORT_SELECTOR                (0x04)
+#define MPI2_EVENT_SAS_TOPO_LR_SMP_RESET_IN_PROGRESS        (0x05)
+#define MPI2_EVENT_SAS_TOPO_LR_RATE_1_5                     (0x08)
+#define MPI2_EVENT_SAS_TOPO_LR_RATE_3_0                     (0x09)
+#define MPI2_EVENT_SAS_TOPO_LR_RATE_6_0                     (0x0A)
+
+/* values for the PhyStatus field */
+#define MPI2_EVENT_SAS_TOPO_PHYSTATUS_VACANT                (0x80)
+#define MPI2_EVENT_SAS_TOPO_PS_MULTIPLEX_CHANGE             (0x10)
+/* values for the PhyStatus ReasonCode sub-field */
+#define MPI2_EVENT_SAS_TOPO_RC_MASK                         (0x0F)
+#define MPI2_EVENT_SAS_TOPO_RC_TARG_ADDED                   (0x01)
+#define MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING          (0x02)
+#define MPI2_EVENT_SAS_TOPO_RC_PHY_CHANGED                  (0x03)
+#define MPI2_EVENT_SAS_TOPO_RC_NO_CHANGE                    (0x04)
+#define MPI2_EVENT_SAS_TOPO_RC_DELAY_NOT_RESPONDING         (0x05)
+
+
+/* SAS Enclosure Device Status Change Event data */
+
+typedef struct _MPI2_EVENT_DATA_SAS_ENCL_DEV_STATUS_CHANGE
+{
+    U16                     EnclosureHandle;            /* 0x00 */
+    U8                      ReasonCode;                 /* 0x02 */
+    U8                      PhysicalPort;               /* 0x03 */
+    U64                     EnclosureLogicalID;         /* 0x04 */
+    U16                     NumSlots;                   /* 0x0C */
+    U16                     StartSlot;                  /* 0x0E */
+    U32                     PhyBits;                    /* 0x10 */
+} MPI2_EVENT_DATA_SAS_ENCL_DEV_STATUS_CHANGE,
+  MPI2_POINTER PTR_MPI2_EVENT_DATA_SAS_ENCL_DEV_STATUS_CHANGE,
+  Mpi2EventDataSasEnclDevStatusChange_t,
+  MPI2_POINTER pMpi2EventDataSasEnclDevStatusChange_t;
+
+/* SAS Enclosure Device Status Change event ReasonCode values */
+#define MPI2_EVENT_SAS_ENCL_RC_ADDED                (0x01)
+#define MPI2_EVENT_SAS_ENCL_RC_NOT_RESPONDING       (0x02)
+
+
+/****************************************************************************
+*  EventAck message
+****************************************************************************/
+
+/* EventAck Request message */
+typedef struct _MPI2_EVENT_ACK_REQUEST
+{
+    U16                     Reserved1;                      /* 0x00 */
+    U8                      ChainOffset;                    /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved2;                      /* 0x04 */
+    U8                      Reserved3;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved4;                      /* 0x0A */
+    U16                     Event;                          /* 0x0C */
+    U16                     Reserved5;                      /* 0x0E */
+    U32                     EventContext;                   /* 0x10 */
+} MPI2_EVENT_ACK_REQUEST, MPI2_POINTER PTR_MPI2_EVENT_ACK_REQUEST,
+  Mpi2EventAckRequest_t, MPI2_POINTER pMpi2EventAckRequest_t;
+
+
+/* EventAck Reply message */
+typedef struct _MPI2_EVENT_ACK_REPLY
+{
+    U16                     Reserved1;                      /* 0x00 */
+    U8                      MsgLength;                      /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     Reserved2;                      /* 0x04 */
+    U8                      Reserved3;                      /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved4;                      /* 0x0A */
+    U16                     Reserved5;                      /* 0x0C */
+    U16                     IOCStatus;                      /* 0x0E */
+    U32                     IOCLogInfo;                     /* 0x10 */
+} MPI2_EVENT_ACK_REPLY, MPI2_POINTER PTR_MPI2_EVENT_ACK_REPLY,
+  Mpi2EventAckReply_t, MPI2_POINTER pMpi2EventAckReply_t;
+
+
+/****************************************************************************
+*  FWDownload message
+****************************************************************************/
+
+/* FWDownload Request message */
+typedef struct _MPI2_FW_DOWNLOAD_REQUEST
+{
+    U8                      ImageType;                  /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      ChainOffset;                /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U32                     TotalImageSize;             /* 0x0C */
+    U32                     Reserved5;                  /* 0x10 */
+    MPI2_MPI_SGE_UNION      SGL;                        /* 0x14 */
+} MPI2_FW_DOWNLOAD_REQUEST, MPI2_POINTER PTR_MPI2_FW_DOWNLOAD_REQUEST,
+  Mpi2FWDownloadRequest, MPI2_POINTER pMpi2FWDownloadRequest;
+
+#define MPI2_FW_DOWNLOAD_MSGFLGS_LAST_SEGMENT   (0x01)
+
+#define MPI2_FW_DOWNLOAD_ITYPE_FW                   (0x01)
+#define MPI2_FW_DOWNLOAD_ITYPE_BIOS                 (0x02)
+#define MPI2_FW_DOWNLOAD_ITYPE_MANUFACTURING        (0x06)
+#define MPI2_FW_DOWNLOAD_ITYPE_CONFIG_1             (0x07)
+#define MPI2_FW_DOWNLOAD_ITYPE_CONFIG_2             (0x08)
+#define MPI2_FW_DOWNLOAD_ITYPE_MEGARAID             (0x09)
+#define MPI2_FW_DOWNLOAD_ITYPE_COMMON_BOOT_BLOCK    (0x0B)
+
+/* FWDownload TransactionContext Element */
+typedef struct _MPI2_FW_DOWNLOAD_TCSGE
+{
+    U8                      Reserved1;                  /* 0x00 */
+    U8                      ContextSize;                /* 0x01 */
+    U8                      DetailsLength;              /* 0x02 */
+    U8                      Flags;                      /* 0x03 */
+    U32                     Reserved2;                  /* 0x04 */
+    U32                     ImageOffset;                /* 0x08 */
+    U32                     ImageSize;                  /* 0x0C */
+} MPI2_FW_DOWNLOAD_TCSGE, MPI2_POINTER PTR_MPI2_FW_DOWNLOAD_TCSGE,
+  Mpi2FWDownloadTCSGE_t, MPI2_POINTER pMpi2FWDownloadTCSGE_t;
+
+/* FWDownload Reply message */
+typedef struct _MPI2_FW_DOWNLOAD_REPLY
+{
+    U8                      ImageType;                  /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      MsgLength;                  /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U16                     Reserved5;                  /* 0x0C */
+    U16                     IOCStatus;                  /* 0x0E */
+    U32                     IOCLogInfo;                 /* 0x10 */
+} MPI2_FW_DOWNLOAD_REPLY, MPI2_POINTER PTR_MPI2_FW_DOWNLOAD_REPLY,
+  Mpi2FWDownloadReply_t, MPI2_POINTER pMpi2FWDownloadReply_t;
+
+
+/****************************************************************************
+*  FWUpload message
+****************************************************************************/
+
+/* FWUpload Request message */
+typedef struct _MPI2_FW_UPLOAD_REQUEST
+{
+    U8                      ImageType;                  /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      ChainOffset;                /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U32                     Reserved5;                  /* 0x0C */
+    U32                     Reserved6;                  /* 0x10 */
+    MPI2_MPI_SGE_UNION      SGL;                        /* 0x14 */
+} MPI2_FW_UPLOAD_REQUEST, MPI2_POINTER PTR_MPI2_FW_UPLOAD_REQUEST,
+  Mpi2FWUploadRequest_t, MPI2_POINTER pMpi2FWUploadRequest_t;
+
+#define MPI2_FW_UPLOAD_ITYPE_FW_CURRENT         (0x00)
+#define MPI2_FW_UPLOAD_ITYPE_FW_FLASH           (0x01)
+#define MPI2_FW_UPLOAD_ITYPE_BIOS_FLASH         (0x02)
+#define MPI2_FW_UPLOAD_ITYPE_FW_BACKUP          (0x05)
+#define MPI2_FW_UPLOAD_ITYPE_MANUFACTURING      (0x06)
+#define MPI2_FW_UPLOAD_ITYPE_CONFIG_1           (0x07)
+#define MPI2_FW_UPLOAD_ITYPE_CONFIG_2           (0x08)
+#define MPI2_FW_UPLOAD_ITYPE_MEGARAID           (0x09)
+#define MPI2_FW_UPLOAD_ITYPE_COMPLETE           (0x0A)
+#define MPI2_FW_UPLOAD_ITYPE_COMMON_BOOT_BLOCK  (0x0B)
+
+typedef struct _MPI2_FW_UPLOAD_TCSGE
+{
+    U8                      Reserved1;                  /* 0x00 */
+    U8                      ContextSize;                /* 0x01 */
+    U8                      DetailsLength;              /* 0x02 */
+    U8                      Flags;                      /* 0x03 */
+    U32                     Reserved2;                  /* 0x04 */
+    U32                     ImageOffset;                /* 0x08 */
+    U32                     ImageSize;                  /* 0x0C */
+} MPI2_FW_UPLOAD_TCSGE, MPI2_POINTER PTR_MPI2_FW_UPLOAD_TCSGE,
+  Mpi2FWUploadTCSGE_t, MPI2_POINTER pMpi2FWUploadTCSGE_t;
+
+/* FWUpload Reply message */
+typedef struct _MPI2_FW_UPLOAD_REPLY
+{
+    U8                      ImageType;                  /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      MsgLength;                  /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U16                     Reserved5;                  /* 0x0C */
+    U16                     IOCStatus;                  /* 0x0E */
+    U32                     IOCLogInfo;                 /* 0x10 */
+    U32                     ActualImageSize;            /* 0x14 */
+} MPI2_FW_UPLOAD_REPLY, MPI2_POINTER PTR_MPI2_FW_UPLOAD_REPLY,
+  Mpi2FWUploadReply_t, MPI2_POINTER pMPi2FWUploadReply_t;
+
+
+/* FW Image Header */
+typedef struct _MPI2_FW_IMAGE_HEADER
+{
+    U32                     Signature;                  /* 0x00 */
+    U32                     Signature0;                 /* 0x04 */
+    U32                     Signature1;                 /* 0x08 */
+    U32                     Signature2;                 /* 0x0C */
+    MPI2_VERSION_UNION      MPIVersion;                 /* 0x10 */
+    MPI2_VERSION_UNION      FWVersion;                  /* 0x14 */
+    MPI2_VERSION_UNION      NVDATAVersion;              /* 0x18 */
+    MPI2_VERSION_UNION      PackageVersion;             /* 0x1C */
+    U16                     VendorID;                   /* 0x20 */
+    U16                     ProductID;                  /* 0x22 */
+    U16                     ProtocolFlags;              /* 0x24 */
+    U16                     Reserved26;                 /* 0x26 */
+    U32                     IOCCapabilities;            /* 0x28 */
+    U32                     ImageSize;                  /* 0x2C */
+    U32                     NextImageHeaderOffset;      /* 0x30 */
+    U32                     Checksum;                   /* 0x34 */
+    U32                     Reserved38;                 /* 0x38 */
+    U32                     Reserved3C;                 /* 0x3C */
+    U32                     Reserved40;                 /* 0x40 */
+    U32                     Reserved44;                 /* 0x44 */
+    U32                     Reserved48;                 /* 0x48 */
+    U32                     Reserved4C;                 /* 0x4C */
+    U32                     Reserved50;                 /* 0x50 */
+    U32                     Reserved54;                 /* 0x54 */
+    U32                     Reserved58;                 /* 0x58 */
+    U32                     Reserved5C;                 /* 0x5C */
+    U32                     Reserved60;                 /* 0x60 */
+    U32                     FirmwareVersionNameWhat;    /* 0x64 */
+    U8                      FirmwareVersionName[32];    /* 0x68 */
+    U32                     VendorNameWhat;             /* 0x88 */
+    U8                      VendorName[32];             /* 0x8C */
+    U32                     PackageNameWhat;            /* 0x88 */
+    U8                      PackageName[32];            /* 0x8C */
+    U32                     ReservedD0;                 /* 0xD0 */
+    U32                     ReservedD4;                 /* 0xD4 */
+    U32                     ReservedD8;                 /* 0xD8 */
+    U32                     ReservedDC;                 /* 0xDC */
+    U32                     ReservedE0;                 /* 0xE0 */
+    U32                     ReservedE4;                 /* 0xE4 */
+    U32                     ReservedE8;                 /* 0xE8 */
+    U32                     ReservedEC;                 /* 0xEC */
+    U32                     ReservedF0;                 /* 0xF0 */
+    U32                     ReservedF4;                 /* 0xF4 */
+    U32                     ReservedF8;                 /* 0xF8 */
+    U32                     ReservedFC;                 /* 0xFC */
+} MPI2_FW_IMAGE_HEADER, MPI2_POINTER PTR_MPI2_FW_IMAGE_HEADER,
+  Mpi2FWImageHeader_t, MPI2_POINTER pMpi2FWImageHeader_t;
+
+/* Signature field */
+#define MPI2_FW_HEADER_SIGNATURE_OFFSET         (0x00)
+#define MPI2_FW_HEADER_SIGNATURE_MASK           (0xFF000000)
+#define MPI2_FW_HEADER_SIGNATURE                (0xEA000000)
+
+/* Signature0 field */
+#define MPI2_FW_HEADER_SIGNATURE0_OFFSET        (0x04)
+#define MPI2_FW_HEADER_SIGNATURE0               (0x5AFAA55A)
+
+/* Signature1 field */
+#define MPI2_FW_HEADER_SIGNATURE1_OFFSET        (0x08)
+#define MPI2_FW_HEADER_SIGNATURE1               (0xA55AFAA5)
+
+/* Signature2 field */
+#define MPI2_FW_HEADER_SIGNATURE2_OFFSET        (0x0C)
+#define MPI2_FW_HEADER_SIGNATURE2               (0x5AA55AFA)
+
+
+/* defines for using the ProductID field */
+#define MPI2_FW_HEADER_PID_TYPE_MASK            (0xF000)
+#define MPI2_FW_HEADER_PID_TYPE_SAS             (0x2000)
+
+#define MPI2_FW_HEADER_PID_PROD_MASK            (0x0F00)
+#define MPI2_FW_HEADER_PID_PROD_A               (0x0000)
+
+#define MPI2_FW_HEADER_PID_FAMILY_MASK          (0x00FF)
+/* SAS */
+#define MPI2_FW_HEADER_PID_FAMILY_2108_SAS      (0x0010)
+
+/* use MPI2_IOCFACTS_PROTOCOL_ defines for ProtocolFlags field */
+
+/* use MPI2_IOCFACTS_CAPABILITY_ defines for IOCCapabilities field */
+
+
+#define MPI2_FW_HEADER_IMAGESIZE_OFFSET         (0x2C)
+#define MPI2_FW_HEADER_NEXTIMAGE_OFFSET         (0x30)
+#define MPI2_FW_HEADER_VERNMHWAT_OFFSET         (0x64)
+
+#define MPI2_FW_HEADER_WHAT_SIGNATURE           (0x29232840)
+
+#define MPI2_FW_HEADER_SIZE                     (0x100)
+
+
+/* Extended Image Header */
+typedef struct _MPI2_EXT_IMAGE_HEADER
+
+{
+    U8                      ImageType;                  /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U16                     Reserved2;                  /* 0x02 */
+    U32                     Checksum;                   /* 0x04 */
+    U32                     ImageSize;                  /* 0x08 */
+    U32                     NextImageHeaderOffset;      /* 0x0C */
+    U32                     PackageVersion;             /* 0x10 */
+    U32                     Reserved3;                  /* 0x14 */
+    U32                     Reserved4;                  /* 0x18 */
+    U32                     Reserved5;                  /* 0x1C */
+    U8                      IdentifyString[32];         /* 0x20 */
+} MPI2_EXT_IMAGE_HEADER, MPI2_POINTER PTR_MPI2_EXT_IMAGE_HEADER,
+  Mpi2ExtImageHeader_t, MPI2_POINTER pMpi2ExtImageHeader_t;
+
+/* useful offsets */
+#define MPI2_EXT_IMAGE_IMAGETYPE_OFFSET         (0x00)
+#define MPI2_EXT_IMAGE_IMAGESIZE_OFFSET         (0x08)
+#define MPI2_EXT_IMAGE_NEXTIMAGE_OFFSET         (0x0C)
+
+#define MPI2_EXT_IMAGE_HEADER_SIZE              (0x40)
+
+/* defines for the ImageType field */
+#define MPI2_EXT_IMAGE_TYPE_UNSPECIFIED         (0x00)
+#define MPI2_EXT_IMAGE_TYPE_FW                  (0x01)
+#define MPI2_EXT_IMAGE_TYPE_NVDATA              (0x03)
+#define MPI2_EXT_IMAGE_TYPE_BOOTLOADER          (0x04)
+#define MPI2_EXT_IMAGE_TYPE_INITIALIZATION      (0x05)
+#define MPI2_EXT_IMAGE_TYPE_FLASH_LAYOUT        (0x06)
+#define MPI2_EXT_IMAGE_TYPE_SUPPORTED_DEVICES   (0x07)
+#define MPI2_EXT_IMAGE_TYPE_MEGARAID            (0x08)
+
+#define MPI2_EXT_IMAGE_TYPE_MAX                 (MPI2_EXT_IMAGE_TYPE_MEGARAID)
+
+
+
+/* FLASH Layout Extended Image Data */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check RegionsPerLayout at runtime.
+ */
+#ifndef MPI2_FLASH_NUMBER_OF_REGIONS
+#define MPI2_FLASH_NUMBER_OF_REGIONS        (1)
+#endif
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check NumberOfLayouts at runtime.
+ */
+#ifndef MPI2_FLASH_NUMBER_OF_LAYOUTS
+#define MPI2_FLASH_NUMBER_OF_LAYOUTS        (1)
+#endif
+
+typedef struct _MPI2_FLASH_REGION
+{
+    U8                      RegionType;                 /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U16                     Reserved2;                  /* 0x02 */
+    U32                     RegionOffset;               /* 0x04 */
+    U32                     RegionSize;                 /* 0x08 */
+    U32                     Reserved3;                  /* 0x0C */
+} MPI2_FLASH_REGION, MPI2_POINTER PTR_MPI2_FLASH_REGION,
+  Mpi2FlashRegion_t, MPI2_POINTER pMpi2FlashRegion_t;
+
+typedef struct _MPI2_FLASH_LAYOUT
+{
+    U32                     FlashSize;                  /* 0x00 */
+    U32                     Reserved1;                  /* 0x04 */
+    U32                     Reserved2;                  /* 0x08 */
+    U32                     Reserved3;                  /* 0x0C */
+    MPI2_FLASH_REGION       Region[MPI2_FLASH_NUMBER_OF_REGIONS];/* 0x10 */
+} MPI2_FLASH_LAYOUT, MPI2_POINTER PTR_MPI2_FLASH_LAYOUT,
+  Mpi2FlashLayout_t, MPI2_POINTER pMpi2FlashLayout_t;
+
+typedef struct _MPI2_FLASH_LAYOUT_DATA
+{
+    U8                      ImageRevision;              /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      SizeOfRegion;               /* 0x02 */
+    U8                      Reserved2;                  /* 0x03 */
+    U16                     NumberOfLayouts;            /* 0x04 */
+    U16                     RegionsPerLayout;           /* 0x06 */
+    U16                     MinimumSectorAlignment;     /* 0x08 */
+    U16                     Reserved3;                  /* 0x0A */
+    U32                     Reserved4;                  /* 0x0C */
+    MPI2_FLASH_LAYOUT       Layout[MPI2_FLASH_NUMBER_OF_LAYOUTS];/* 0x10 */
+} MPI2_FLASH_LAYOUT_DATA, MPI2_POINTER PTR_MPI2_FLASH_LAYOUT_DATA,
+  Mpi2FlashLayoutData_t, MPI2_POINTER pMpi2FlashLayoutData_t;
+
+/* defines for the RegionType field */
+#define MPI2_FLASH_REGION_UNUSED                (0x00)
+#define MPI2_FLASH_REGION_FIRMWARE              (0x01)
+#define MPI2_FLASH_REGION_BIOS                  (0x02)
+#define MPI2_FLASH_REGION_NVDATA                (0x03)
+#define MPI2_FLASH_REGION_FIRMWARE_BACKUP       (0x05)
+#define MPI2_FLASH_REGION_MFG_INFORMATION       (0x06)
+#define MPI2_FLASH_REGION_CONFIG_1              (0x07)
+#define MPI2_FLASH_REGION_CONFIG_2              (0x08)
+#define MPI2_FLASH_REGION_MEGARAID              (0x09)
+#define MPI2_FLASH_REGION_INIT                  (0x0A)
+
+/* ImageRevision */
+#define MPI2_FLASH_LAYOUT_IMAGE_REVISION        (0x00)
+
+
+
+/* Supported Devices Extended Image Data */
+
+/*
+ * Host code (drivers, BIOS, utilities, etc.) should leave this define set to
+ * one and check NumberOfDevices at runtime.
+ */
+#ifndef MPI2_SUPPORTED_DEVICES_IMAGE_NUM_DEVICES
+#define MPI2_SUPPORTED_DEVICES_IMAGE_NUM_DEVICES    (1)
+#endif
+
+typedef struct _MPI2_SUPPORTED_DEVICE
+{
+    U16                     DeviceID;                   /* 0x00 */
+    U16                     VendorID;                   /* 0x02 */
+    U16                     DeviceIDMask;               /* 0x04 */
+    U16                     Reserved1;                  /* 0x06 */
+    U8                      LowPCIRev;                  /* 0x08 */
+    U8                      HighPCIRev;                 /* 0x09 */
+    U16                     Reserved2;                  /* 0x0A */
+    U32                     Reserved3;                  /* 0x0C */
+} MPI2_SUPPORTED_DEVICE, MPI2_POINTER PTR_MPI2_SUPPORTED_DEVICE,
+  Mpi2SupportedDevice_t, MPI2_POINTER pMpi2SupportedDevice_t;
+
+typedef struct _MPI2_SUPPORTED_DEVICES_DATA
+{
+    U8                      ImageRevision;              /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      NumberOfDevices;            /* 0x02 */
+    U8                      Reserved2;                  /* 0x03 */
+    U32                     Reserved3;                  /* 0x04 */
+    MPI2_SUPPORTED_DEVICE   SupportedDevice[MPI2_SUPPORTED_DEVICES_IMAGE_NUM_DEVICES]; /* 0x08 */
+} MPI2_SUPPORTED_DEVICES_DATA, MPI2_POINTER PTR_MPI2_SUPPORTED_DEVICES_DATA,
+  Mpi2SupportedDevicesData_t, MPI2_POINTER pMpi2SupportedDevicesData_t;
+
+/* ImageRevision */
+#define MPI2_SUPPORTED_DEVICES_IMAGE_REVISION   (0x00)
+
+
+/* Init Extended Image Data */
+
+typedef struct _MPI2_INIT_IMAGE_FOOTER
+
+{
+    U32                     BootFlags;                  /* 0x00 */
+    U32                     ImageSize;                  /* 0x04 */
+    U32                     Signature0;                 /* 0x08 */
+    U32                     Signature1;                 /* 0x0C */
+    U32                     Signature2;                 /* 0x10 */
+    U32                     ResetVector;                /* 0x14 */
+} MPI2_INIT_IMAGE_FOOTER, MPI2_POINTER PTR_MPI2_INIT_IMAGE_FOOTER,
+  Mpi2InitImageFooter_t, MPI2_POINTER pMpi2InitImageFooter_t;
+
+/* defines for the BootFlags field */
+#define MPI2_INIT_IMAGE_BOOTFLAGS_OFFSET        (0x00)
+
+/* defines for the ImageSize field */
+#define MPI2_INIT_IMAGE_IMAGESIZE_OFFSET        (0x04)
+
+/* defines for the Signature0 field */
+#define MPI2_INIT_IMAGE_SIGNATURE0_OFFSET       (0x08)
+#define MPI2_INIT_IMAGE_SIGNATURE0              (0x5AA55AEA)
+
+/* defines for the Signature1 field */
+#define MPI2_INIT_IMAGE_SIGNATURE1_OFFSET       (0x0C)
+#define MPI2_INIT_IMAGE_SIGNATURE1              (0xA55AEAA5)
+
+/* defines for the Signature2 field */
+#define MPI2_INIT_IMAGE_SIGNATURE2_OFFSET       (0x10)
+#define MPI2_INIT_IMAGE_SIGNATURE2              (0x5AEAA55A)
+
+/* Signature fields as individual bytes */
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_0        (0xEA)
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_1        (0x5A)
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_2        (0xA5)
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_3        (0x5A)
+
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_4        (0xA5)
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_5        (0xEA)
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_6        (0x5A)
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_7        (0xA5)
+
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_8        (0x5A)
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_9        (0xA5)
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_A        (0xEA)
+#define MPI2_INIT_IMAGE_SIGNATURE_BYTE_B        (0x5A)
+
+/* defines for the ResetVector field */
+#define MPI2_INIT_IMAGE_RESETVECTOR_OFFSET      (0x14)
+
+
+#endif
+
diff --git a/drivers/scsi/mpt2sas/mpi/mpi2_raid.h b/drivers/scsi/mpt2sas/mpi/mpi2_raid.h
new file mode 100644
index 000000000000..7134816d9046
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpi/mpi2_raid.h
@@ -0,0 +1,295 @@
+/*
+ *  Copyright (c) 2000-2008 LSI Corporation.
+ *
+ *
+ *           Name:  mpi2_raid.h
+ *          Title:  MPI Integrated RAID messages and structures
+ *  Creation Date:  April 26, 2007
+ *
+ *    mpi2_raid.h Version:  02.00.03
+ *
+ *  Version History
+ *  ---------------
+ *
+ *  Date      Version   Description
+ *  --------  --------  ------------------------------------------------------
+ *  04-30-07  02.00.00  Corresponds to Fusion-MPT MPI Specification Rev A.
+ *  08-31-07  02.00.01  Modifications to RAID Action request and reply,
+ *                      including the Actions and ActionData.
+ *  02-29-08  02.00.02  Added MPI2_RAID_ACTION_ADATA_DISABL_FULL_REBUILD.
+ *  05-21-08  02.00.03  Added MPI2_RAID_VOL_CREATION_NUM_PHYSDISKS so that
+ *                      the PhysDisk array in MPI2_RAID_VOLUME_CREATION_STRUCT
+ *                      can be sized by the build environment.
+ *  --------------------------------------------------------------------------
+ */
+
+#ifndef MPI2_RAID_H
+#define MPI2_RAID_H
+
+/*****************************************************************************
+*
+*               Integrated RAID Messages
+*
+*****************************************************************************/
+
+/****************************************************************************
+*  RAID Action messages
+****************************************************************************/
+
+/* ActionDataWord defines for use with MPI2_RAID_ACTION_DELETE_VOLUME action */
+#define MPI2_RAID_ACTION_ADATA_KEEP_LBA0            (0x00000000)
+#define MPI2_RAID_ACTION_ADATA_ZERO_LBA0            (0x00000001)
+
+/* use MPI2_RAIDVOL0_SETTING_ defines from mpi2_cnfg.h for MPI2_RAID_ACTION_CHANGE_VOL_WRITE_CACHE action */
+
+/* ActionDataWord defines for use with MPI2_RAID_ACTION_DISABLE_ALL_VOLUMES action */
+#define MPI2_RAID_ACTION_ADATA_DISABL_FULL_REBUILD  (0x00000001)
+
+/* ActionDataWord for MPI2_RAID_ACTION_SET_RAID_FUNCTION_RATE Action */
+typedef struct _MPI2_RAID_ACTION_RATE_DATA
+{
+    U8              RateToChange;               /* 0x00 */
+    U8              RateOrMode;                 /* 0x01 */
+    U16             DataScrubDuration;          /* 0x02 */
+} MPI2_RAID_ACTION_RATE_DATA, MPI2_POINTER PTR_MPI2_RAID_ACTION_RATE_DATA,
+  Mpi2RaidActionRateData_t, MPI2_POINTER pMpi2RaidActionRateData_t;
+
+#define MPI2_RAID_ACTION_SET_RATE_RESYNC            (0x00)
+#define MPI2_RAID_ACTION_SET_RATE_DATA_SCRUB        (0x01)
+#define MPI2_RAID_ACTION_SET_RATE_POWERSAVE_MODE    (0x02)
+
+/* ActionDataWord for MPI2_RAID_ACTION_START_RAID_FUNCTION Action */
+typedef struct _MPI2_RAID_ACTION_START_RAID_FUNCTION
+{
+    U8              RAIDFunction;                       /* 0x00 */
+    U8              Flags;                              /* 0x01 */
+    U16             Reserved1;                          /* 0x02 */
+} MPI2_RAID_ACTION_START_RAID_FUNCTION,
+  MPI2_POINTER PTR_MPI2_RAID_ACTION_START_RAID_FUNCTION,
+  Mpi2RaidActionStartRaidFunction_t,
+  MPI2_POINTER pMpi2RaidActionStartRaidFunction_t;
+
+/* defines for the RAIDFunction field */
+#define MPI2_RAID_ACTION_START_BACKGROUND_INIT      (0x00)
+#define MPI2_RAID_ACTION_START_ONLINE_CAP_EXPANSION (0x01)
+#define MPI2_RAID_ACTION_START_CONSISTENCY_CHECK    (0x02)
+
+/* defines for the Flags field */
+#define MPI2_RAID_ACTION_START_NEW                  (0x00)
+#define MPI2_RAID_ACTION_START_RESUME               (0x01)
+
+/* ActionDataWord for MPI2_RAID_ACTION_STOP_RAID_FUNCTION Action */
+typedef struct _MPI2_RAID_ACTION_STOP_RAID_FUNCTION
+{
+    U8              RAIDFunction;                       /* 0x00 */
+    U8              Flags;                              /* 0x01 */
+    U16             Reserved1;                          /* 0x02 */
+} MPI2_RAID_ACTION_STOP_RAID_FUNCTION,
+  MPI2_POINTER PTR_MPI2_RAID_ACTION_STOP_RAID_FUNCTION,
+  Mpi2RaidActionStopRaidFunction_t,
+  MPI2_POINTER pMpi2RaidActionStopRaidFunction_t;
+
+/* defines for the RAIDFunction field */
+#define MPI2_RAID_ACTION_STOP_BACKGROUND_INIT       (0x00)
+#define MPI2_RAID_ACTION_STOP_ONLINE_CAP_EXPANSION  (0x01)
+#define MPI2_RAID_ACTION_STOP_CONSISTENCY_CHECK     (0x02)
+
+/* defines for the Flags field */
+#define MPI2_RAID_ACTION_STOP_ABORT                 (0x00)
+#define MPI2_RAID_ACTION_STOP_PAUSE                 (0x01)
+
+/* ActionDataWord for MPI2_RAID_ACTION_CREATE_HOT_SPARE Action */
+typedef struct _MPI2_RAID_ACTION_HOT_SPARE
+{
+    U8              HotSparePool;               /* 0x00 */
+    U8              Reserved1;                  /* 0x01 */
+    U16             DevHandle;                  /* 0x02 */
+} MPI2_RAID_ACTION_HOT_SPARE, MPI2_POINTER PTR_MPI2_RAID_ACTION_HOT_SPARE,
+  Mpi2RaidActionHotSpare_t, MPI2_POINTER pMpi2RaidActionHotSpare_t;
+
+/* ActionDataWord for MPI2_RAID_ACTION_DEVICE_FW_UPDATE_MODE Action */
+typedef struct _MPI2_RAID_ACTION_FW_UPDATE_MODE
+{
+    U8              Flags;                              /* 0x00 */
+    U8              DeviceFirmwareUpdateModeTimeout;    /* 0x01 */
+    U16             Reserved1;                          /* 0x02 */
+} MPI2_RAID_ACTION_FW_UPDATE_MODE,
+  MPI2_POINTER PTR_MPI2_RAID_ACTION_FW_UPDATE_MODE,
+  Mpi2RaidActionFwUpdateMode_t, MPI2_POINTER pMpi2RaidActionFwUpdateMode_t;
+
+/* ActionDataWord defines for use with MPI2_RAID_ACTION_DEVICE_FW_UPDATE_MODE action */
+#define MPI2_RAID_ACTION_ADATA_DISABLE_FW_UPDATE        (0x00)
+#define MPI2_RAID_ACTION_ADATA_ENABLE_FW_UPDATE         (0x01)
+
+typedef union _MPI2_RAID_ACTION_DATA
+{
+    U32                                     Word;
+    MPI2_RAID_ACTION_RATE_DATA              Rates;
+    MPI2_RAID_ACTION_START_RAID_FUNCTION    StartRaidFunction;
+    MPI2_RAID_ACTION_STOP_RAID_FUNCTION     StopRaidFunction;
+    MPI2_RAID_ACTION_HOT_SPARE              HotSpare;
+    MPI2_RAID_ACTION_FW_UPDATE_MODE         FwUpdateMode;
+} MPI2_RAID_ACTION_DATA, MPI2_POINTER PTR_MPI2_RAID_ACTION_DATA,
+  Mpi2RaidActionData_t, MPI2_POINTER pMpi2RaidActionData_t;
+
+
+/* RAID Action Request Message */
+typedef struct _MPI2_RAID_ACTION_REQUEST
+{
+    U8                      Action;                         /* 0x00 */
+    U8                      Reserved1;                      /* 0x01 */
+    U8                      ChainOffset;                    /* 0x02 */
+    U8                      Function;                       /* 0x03 */
+    U16                     VolDevHandle;                   /* 0x04 */
+    U8                      PhysDiskNum;                    /* 0x06 */
+    U8                      MsgFlags;                       /* 0x07 */
+    U8                      VP_ID;                          /* 0x08 */
+    U8                      VF_ID;                          /* 0x09 */
+    U16                     Reserved2;                      /* 0x0A */
+    U32                     Reserved3;                      /* 0x0C */
+    MPI2_RAID_ACTION_DATA   ActionDataWord;                 /* 0x10 */
+    MPI2_SGE_SIMPLE_UNION   ActionDataSGE;                  /* 0x14 */
+} MPI2_RAID_ACTION_REQUEST, MPI2_POINTER PTR_MPI2_RAID_ACTION_REQUEST,
+  Mpi2RaidActionRequest_t, MPI2_POINTER pMpi2RaidActionRequest_t;
+
+/* RAID Action request Action values */
+
+#define MPI2_RAID_ACTION_INDICATOR_STRUCT           (0x01)
+#define MPI2_RAID_ACTION_CREATE_VOLUME              (0x02)
+#define MPI2_RAID_ACTION_DELETE_VOLUME              (0x03)
+#define MPI2_RAID_ACTION_DISABLE_ALL_VOLUMES        (0x04)
+#define MPI2_RAID_ACTION_ENABLE_ALL_VOLUMES         (0x05)
+#define MPI2_RAID_ACTION_PHYSDISK_OFFLINE           (0x0A)
+#define MPI2_RAID_ACTION_PHYSDISK_ONLINE            (0x0B)
+#define MPI2_RAID_ACTION_FAIL_PHYSDISK              (0x0F)
+#define MPI2_RAID_ACTION_ACTIVATE_VOLUME            (0x11)
+#define MPI2_RAID_ACTION_DEVICE_FW_UPDATE_MODE      (0x15)
+#define MPI2_RAID_ACTION_CHANGE_VOL_WRITE_CACHE     (0x17)
+#define MPI2_RAID_ACTION_SET_VOLUME_NAME            (0x18)
+#define MPI2_RAID_ACTION_SET_RAID_FUNCTION_RATE     (0x19)
+#define MPI2_RAID_ACTION_ENABLE_FAILED_VOLUME       (0x1C)
+#define MPI2_RAID_ACTION_CREATE_HOT_SPARE           (0x1D)
+#define MPI2_RAID_ACTION_DELETE_HOT_SPARE           (0x1E)
+#define MPI2_RAID_ACTION_SYSTEM_SHUTDOWN_INITIATED  (0x20)
+#define MPI2_RAID_ACTION_START_RAID_FUNCTION        (0x21)
+#define MPI2_RAID_ACTION_STOP_RAID_FUNCTION         (0x22)
+
+
+/* RAID Volume Creation Structure */
+
+/*
+ * The following define can be customized for the targeted product.
+ */
+#ifndef MPI2_RAID_VOL_CREATION_NUM_PHYSDISKS
+#define MPI2_RAID_VOL_CREATION_NUM_PHYSDISKS        (1)
+#endif
+
+typedef struct _MPI2_RAID_VOLUME_PHYSDISK
+{
+    U8                      RAIDSetNum;                     /* 0x00 */
+    U8                      PhysDiskMap;                    /* 0x01 */
+    U16                     PhysDiskDevHandle;              /* 0x02 */
+} MPI2_RAID_VOLUME_PHYSDISK, MPI2_POINTER PTR_MPI2_RAID_VOLUME_PHYSDISK,
+  Mpi2RaidVolumePhysDisk_t, MPI2_POINTER pMpi2RaidVolumePhysDisk_t;
+
+/* defines for the PhysDiskMap field */
+#define MPI2_RAIDACTION_PHYSDISK_PRIMARY            (0x01)
+#define MPI2_RAIDACTION_PHYSDISK_SECONDARY          (0x02)
+
+typedef struct _MPI2_RAID_VOLUME_CREATION_STRUCT
+{
+    U8                          NumPhysDisks;               /* 0x00 */
+    U8                          VolumeType;                 /* 0x01 */
+    U16                         Reserved1;                  /* 0x02 */
+    U32                         VolumeCreationFlags;        /* 0x04 */
+    U32                         VolumeSettings;             /* 0x08 */
+    U8                          Reserved2;                  /* 0x0C */
+    U8                          ResyncRate;                 /* 0x0D */
+    U16                         DataScrubDuration;          /* 0x0E */
+    U64                         VolumeMaxLBA;               /* 0x10 */
+    U32                         StripeSize;                 /* 0x18 */
+    U8                          Name[16];                   /* 0x1C */
+    MPI2_RAID_VOLUME_PHYSDISK   PhysDisk[MPI2_RAID_VOL_CREATION_NUM_PHYSDISKS];/* 0x2C */
+} MPI2_RAID_VOLUME_CREATION_STRUCT,
+  MPI2_POINTER PTR_MPI2_RAID_VOLUME_CREATION_STRUCT,
+  Mpi2RaidVolumeCreationStruct_t, MPI2_POINTER pMpi2RaidVolumeCreationStruct_t;
+
+/* use MPI2_RAID_VOL_TYPE_ defines from mpi2_cnfg.h for VolumeType */
+
+/* defines for the VolumeCreationFlags field */
+#define MPI2_RAID_VOL_CREATION_USE_DEFAULT_SETTINGS (0x80)
+#define MPI2_RAID_VOL_CREATION_BACKGROUND_INIT      (0x04)
+#define MPI2_RAID_VOL_CREATION_LOW_LEVEL_INIT       (0x02)
+#define MPI2_RAID_VOL_CREATION_MIGRATE_DATA         (0x01)
+
+
+/* RAID Online Capacity Expansion Structure */
+
+typedef struct _MPI2_RAID_ONLINE_CAPACITY_EXPANSION
+{
+    U32                     Flags;                          /* 0x00 */
+    U16                     DevHandle0;                     /* 0x04 */
+    U16                     Reserved1;                      /* 0x06 */
+    U16                     DevHandle1;                     /* 0x08 */
+    U16                     Reserved2;                      /* 0x0A */
+} MPI2_RAID_ONLINE_CAPACITY_EXPANSION,
+  MPI2_POINTER PTR_MPI2_RAID_ONLINE_CAPACITY_EXPANSION,
+  Mpi2RaidOnlineCapacityExpansion_t,
+  MPI2_POINTER pMpi2RaidOnlineCapacityExpansion_t;
+
+
+/* RAID Volume Indicator Structure */
+
+typedef struct _MPI2_RAID_VOL_INDICATOR
+{
+    U64                     TotalBlocks;                    /* 0x00 */
+    U64                     BlocksRemaining;                /* 0x08 */
+    U32                     Flags;                          /* 0x10 */
+} MPI2_RAID_VOL_INDICATOR, MPI2_POINTER PTR_MPI2_RAID_VOL_INDICATOR,
+  Mpi2RaidVolIndicator_t, MPI2_POINTER pMpi2RaidVolIndicator_t;
+
+/* defines for RAID Volume Indicator Flags field */
+#define MPI2_RAID_VOL_FLAGS_OP_MASK                 (0x0000000F)
+#define MPI2_RAID_VOL_FLAGS_OP_BACKGROUND_INIT      (0x00000000)
+#define MPI2_RAID_VOL_FLAGS_OP_ONLINE_CAP_EXPANSION (0x00000001)
+#define MPI2_RAID_VOL_FLAGS_OP_CONSISTENCY_CHECK    (0x00000002)
+#define MPI2_RAID_VOL_FLAGS_OP_RESYNC               (0x00000003)
+
+
+/* RAID Action Reply ActionData union */
+typedef union _MPI2_RAID_ACTION_REPLY_DATA
+{
+    U32                     Word[5];
+    MPI2_RAID_VOL_INDICATOR RaidVolumeIndicator;
+    U16                     VolDevHandle;
+    U8                      VolumeState;
+    U8                      PhysDiskNum;
+} MPI2_RAID_ACTION_REPLY_DATA, MPI2_POINTER PTR_MPI2_RAID_ACTION_REPLY_DATA,
+  Mpi2RaidActionReplyData_t, MPI2_POINTER pMpi2RaidActionReplyData_t;
+
+/* use MPI2_RAIDVOL0_SETTING_ defines from mpi2_cnfg.h for MPI2_RAID_ACTION_CHANGE_VOL_WRITE_CACHE action */
+
+
+/* RAID Action Reply Message */
+typedef struct _MPI2_RAID_ACTION_REPLY
+{
+    U8                          Action;                     /* 0x00 */
+    U8                          Reserved1;                  /* 0x01 */
+    U8                          MsgLength;                  /* 0x02 */
+    U8                          Function;                   /* 0x03 */
+    U16                         VolDevHandle;               /* 0x04 */
+    U8                          PhysDiskNum;                /* 0x06 */
+    U8                          MsgFlags;                   /* 0x07 */
+    U8                          VP_ID;                      /* 0x08 */
+    U8                          VF_ID;                      /* 0x09 */
+    U16                         Reserved2;                  /* 0x0A */
+    U16                         Reserved3;                  /* 0x0C */
+    U16                         IOCStatus;                  /* 0x0E */
+    U32                         IOCLogInfo;                 /* 0x10 */
+    MPI2_RAID_ACTION_REPLY_DATA ActionData;                 /* 0x14 */
+} MPI2_RAID_ACTION_REPLY, MPI2_POINTER PTR_MPI2_RAID_ACTION_REPLY,
+  Mpi2RaidActionReply_t, MPI2_POINTER pMpi2RaidActionReply_t;
+
+
+#endif
+
diff --git a/drivers/scsi/mpt2sas/mpi/mpi2_sas.h b/drivers/scsi/mpt2sas/mpi/mpi2_sas.h
new file mode 100644
index 000000000000..8a42b136cf53
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpi/mpi2_sas.h
@@ -0,0 +1,282 @@
+/*
+ *  Copyright (c) 2000-2007 LSI Corporation.
+ *
+ *
+ *           Name:  mpi2_sas.h
+ *          Title:  MPI Serial Attached SCSI structures and definitions
+ *  Creation Date:  February 9, 2007
+ *
+ *  mpi2.h Version:  02.00.02
+ *
+ *  Version History
+ *  ---------------
+ *
+ *  Date      Version   Description
+ *  --------  --------  ------------------------------------------------------
+ *  04-30-07  02.00.00  Corresponds to Fusion-MPT MPI Specification Rev A.
+ *  06-26-07  02.00.01  Added Clear All Persistent Operation to SAS IO Unit
+ *                      Control Request.
+ *  10-02-08  02.00.02  Added Set IOC Parameter Operation to SAS IO Unit Control
+ *                      Request.
+ *  --------------------------------------------------------------------------
+ */
+
+#ifndef MPI2_SAS_H
+#define MPI2_SAS_H
+
+/*
+ * Values for SASStatus.
+ */
+#define MPI2_SASSTATUS_SUCCESS                          (0x00)
+#define MPI2_SASSTATUS_UNKNOWN_ERROR                    (0x01)
+#define MPI2_SASSTATUS_INVALID_FRAME                    (0x02)
+#define MPI2_SASSTATUS_UTC_BAD_DEST                     (0x03)
+#define MPI2_SASSTATUS_UTC_BREAK_RECEIVED               (0x04)
+#define MPI2_SASSTATUS_UTC_CONNECT_RATE_NOT_SUPPORTED   (0x05)
+#define MPI2_SASSTATUS_UTC_PORT_LAYER_REQUEST           (0x06)
+#define MPI2_SASSTATUS_UTC_PROTOCOL_NOT_SUPPORTED       (0x07)
+#define MPI2_SASSTATUS_UTC_STP_RESOURCES_BUSY           (0x08)
+#define MPI2_SASSTATUS_UTC_WRONG_DESTINATION            (0x09)
+#define MPI2_SASSTATUS_SHORT_INFORMATION_UNIT           (0x0A)
+#define MPI2_SASSTATUS_LONG_INFORMATION_UNIT            (0x0B)
+#define MPI2_SASSTATUS_XFER_RDY_INCORRECT_WRITE_DATA    (0x0C)
+#define MPI2_SASSTATUS_XFER_RDY_REQUEST_OFFSET_ERROR    (0x0D)
+#define MPI2_SASSTATUS_XFER_RDY_NOT_EXPECTED            (0x0E)
+#define MPI2_SASSTATUS_DATA_INCORRECT_DATA_LENGTH       (0x0F)
+#define MPI2_SASSTATUS_DATA_TOO_MUCH_READ_DATA          (0x10)
+#define MPI2_SASSTATUS_DATA_OFFSET_ERROR                (0x11)
+#define MPI2_SASSTATUS_SDSF_NAK_RECEIVED                (0x12)
+#define MPI2_SASSTATUS_SDSF_CONNECTION_FAILED           (0x13)
+#define MPI2_SASSTATUS_INITIATOR_RESPONSE_TIMEOUT       (0x14)
+
+
+/*
+ * Values for the SAS DeviceInfo field used in SAS Device Status Change Event
+ * data and SAS Configuration pages.
+ */
+#define MPI2_SAS_DEVICE_INFO_SEP                (0x00004000)
+#define MPI2_SAS_DEVICE_INFO_ATAPI_DEVICE       (0x00002000)
+#define MPI2_SAS_DEVICE_INFO_LSI_DEVICE         (0x00001000)
+#define MPI2_SAS_DEVICE_INFO_DIRECT_ATTACH      (0x00000800)
+#define MPI2_SAS_DEVICE_INFO_SSP_TARGET         (0x00000400)
+#define MPI2_SAS_DEVICE_INFO_STP_TARGET         (0x00000200)
+#define MPI2_SAS_DEVICE_INFO_SMP_TARGET         (0x00000100)
+#define MPI2_SAS_DEVICE_INFO_SATA_DEVICE        (0x00000080)
+#define MPI2_SAS_DEVICE_INFO_SSP_INITIATOR      (0x00000040)
+#define MPI2_SAS_DEVICE_INFO_STP_INITIATOR      (0x00000020)
+#define MPI2_SAS_DEVICE_INFO_SMP_INITIATOR      (0x00000010)
+#define MPI2_SAS_DEVICE_INFO_SATA_HOST          (0x00000008)
+
+#define MPI2_SAS_DEVICE_INFO_MASK_DEVICE_TYPE   (0x00000007)
+#define MPI2_SAS_DEVICE_INFO_NO_DEVICE          (0x00000000)
+#define MPI2_SAS_DEVICE_INFO_END_DEVICE         (0x00000001)
+#define MPI2_SAS_DEVICE_INFO_EDGE_EXPANDER      (0x00000002)
+#define MPI2_SAS_DEVICE_INFO_FANOUT_EXPANDER    (0x00000003)
+
+
+/*****************************************************************************
+*
+*        SAS Messages
+*
+*****************************************************************************/
+
+/****************************************************************************
+*  SMP Passthrough messages
+****************************************************************************/
+
+/* SMP Passthrough Request Message */
+typedef struct _MPI2_SMP_PASSTHROUGH_REQUEST
+{
+    U8                      PassthroughFlags;   /* 0x00 */
+    U8                      PhysicalPort;       /* 0x01 */
+    U8                      ChainOffset;        /* 0x02 */
+    U8                      Function;           /* 0x03 */
+    U16                     RequestDataLength;  /* 0x04 */
+    U8                      SGLFlags;           /* 0x06 */
+    U8                      MsgFlags;           /* 0x07 */
+    U8                      VP_ID;              /* 0x08 */
+    U8                      VF_ID;              /* 0x09 */
+    U16                     Reserved1;          /* 0x0A */
+    U32                     Reserved2;          /* 0x0C */
+    U64                     SASAddress;         /* 0x10 */
+    U32                     Reserved3;          /* 0x18 */
+    U32                     Reserved4;          /* 0x1C */
+    MPI2_SIMPLE_SGE_UNION   SGL;                /* 0x20 */
+} MPI2_SMP_PASSTHROUGH_REQUEST, MPI2_POINTER PTR_MPI2_SMP_PASSTHROUGH_REQUEST,
+  Mpi2SmpPassthroughRequest_t, MPI2_POINTER pMpi2SmpPassthroughRequest_t;
+
+/* values for PassthroughFlags field */
+#define MPI2_SMP_PT_REQ_PT_FLAGS_IMMEDIATE      (0x80)
+
+/* values for SGLFlags field are in the SGL section of mpi2.h */
+
+
+/* SMP Passthrough Reply Message */
+typedef struct _MPI2_SMP_PASSTHROUGH_REPLY
+{
+    U8                      PassthroughFlags;   /* 0x00 */
+    U8                      PhysicalPort;       /* 0x01 */
+    U8                      MsgLength;          /* 0x02 */
+    U8                      Function;           /* 0x03 */
+    U16                     ResponseDataLength; /* 0x04 */
+    U8                      SGLFlags;           /* 0x06 */
+    U8                      MsgFlags;           /* 0x07 */
+    U8                      VP_ID;              /* 0x08 */
+    U8                      VF_ID;              /* 0x09 */
+    U16                     Reserved1;          /* 0x0A */
+    U8                      Reserved2;          /* 0x0C */
+    U8                      SASStatus;          /* 0x0D */
+    U16                     IOCStatus;          /* 0x0E */
+    U32                     IOCLogInfo;         /* 0x10 */
+    U32                     Reserved3;          /* 0x14 */
+    U8                      ResponseData[4];    /* 0x18 */
+} MPI2_SMP_PASSTHROUGH_REPLY, MPI2_POINTER PTR_MPI2_SMP_PASSTHROUGH_REPLY,
+  Mpi2SmpPassthroughReply_t, MPI2_POINTER pMpi2SmpPassthroughReply_t;
+
+/* values for PassthroughFlags field */
+#define MPI2_SMP_PT_REPLY_PT_FLAGS_IMMEDIATE    (0x80)
+
+/* values for SASStatus field are at the top of this file */
+
+
+/****************************************************************************
+*  SATA Passthrough messages
+****************************************************************************/
+
+/* SATA Passthrough Request Message */
+typedef struct _MPI2_SATA_PASSTHROUGH_REQUEST
+{
+    U16                     DevHandle;          /* 0x00 */
+    U8                      ChainOffset;        /* 0x02 */
+    U8                      Function;           /* 0x03 */
+    U16                     PassthroughFlags;   /* 0x04 */
+    U8                      SGLFlags;           /* 0x06 */
+    U8                      MsgFlags;           /* 0x07 */
+    U8                      VP_ID;              /* 0x08 */
+    U8                      VF_ID;              /* 0x09 */
+    U16                     Reserved1;          /* 0x0A */
+    U32                     Reserved2;          /* 0x0C */
+    U32                     Reserved3;          /* 0x10 */
+    U32                     Reserved4;          /* 0x14 */
+    U32                     DataLength;         /* 0x18 */
+    U8                      CommandFIS[20];     /* 0x1C */
+    MPI2_SIMPLE_SGE_UNION   SGL;                /* 0x20 */
+} MPI2_SATA_PASSTHROUGH_REQUEST, MPI2_POINTER PTR_MPI2_SATA_PASSTHROUGH_REQUEST,
+  Mpi2SataPassthroughRequest_t, MPI2_POINTER pMpi2SataPassthroughRequest_t;
+
+/* values for PassthroughFlags field */
+#define MPI2_SATA_PT_REQ_PT_FLAGS_EXECUTE_DIAG      (0x0100)
+#define MPI2_SATA_PT_REQ_PT_FLAGS_DMA               (0x0020)
+#define MPI2_SATA_PT_REQ_PT_FLAGS_PIO               (0x0010)
+#define MPI2_SATA_PT_REQ_PT_FLAGS_UNSPECIFIED_VU    (0x0004)
+#define MPI2_SATA_PT_REQ_PT_FLAGS_WRITE             (0x0002)
+#define MPI2_SATA_PT_REQ_PT_FLAGS_READ              (0x0001)
+
+/* values for SGLFlags field are in the SGL section of mpi2.h */
+
+
+/* SATA Passthrough Reply Message */
+typedef struct _MPI2_SATA_PASSTHROUGH_REPLY
+{
+    U16                     DevHandle;          /* 0x00 */
+    U8                      MsgLength;          /* 0x02 */
+    U8                      Function;           /* 0x03 */
+    U16                     PassthroughFlags;   /* 0x04 */
+    U8                      SGLFlags;           /* 0x06 */
+    U8                      MsgFlags;           /* 0x07 */
+    U8                      VP_ID;              /* 0x08 */
+    U8                      VF_ID;              /* 0x09 */
+    U16                     Reserved1;          /* 0x0A */
+    U8                      Reserved2;          /* 0x0C */
+    U8                      SASStatus;          /* 0x0D */
+    U16                     IOCStatus;          /* 0x0E */
+    U32                     IOCLogInfo;         /* 0x10 */
+    U8                      StatusFIS[20];      /* 0x14 */
+    U32                     StatusControlRegisters; /* 0x28 */
+    U32                     TransferCount;      /* 0x2C */
+} MPI2_SATA_PASSTHROUGH_REPLY, MPI2_POINTER PTR_MPI2_SATA_PASSTHROUGH_REPLY,
+  Mpi2SataPassthroughReply_t, MPI2_POINTER pMpi2SataPassthroughReply_t;
+
+/* values for SASStatus field are at the top of this file */
+
+
+/****************************************************************************
+*  SAS IO Unit Control messages
+****************************************************************************/
+
+/* SAS IO Unit Control Request Message */
+typedef struct _MPI2_SAS_IOUNIT_CONTROL_REQUEST
+{
+    U8                      Operation;          /* 0x00 */
+    U8                      Reserved1;          /* 0x01 */
+    U8                      ChainOffset;        /* 0x02 */
+    U8                      Function;           /* 0x03 */
+    U16                     DevHandle;          /* 0x04 */
+    U8                      IOCParameter;       /* 0x06 */
+    U8                      MsgFlags;           /* 0x07 */
+    U8                      VP_ID;              /* 0x08 */
+    U8                      VF_ID;              /* 0x09 */
+    U16                     Reserved3;          /* 0x0A */
+    U16                     Reserved4;          /* 0x0C */
+    U8                      PhyNum;             /* 0x0E */
+    U8                      PrimFlags;          /* 0x0F */
+    U32                     Primitive;          /* 0x10 */
+    U8                      LookupMethod;       /* 0x14 */
+    U8                      Reserved5;          /* 0x15 */
+    U16                     SlotNumber;         /* 0x16 */
+    U64                     LookupAddress;      /* 0x18 */
+    U32                     IOCParameterValue;  /* 0x20 */
+    U32                     Reserved7;          /* 0x24 */
+    U32                     Reserved8;          /* 0x28 */
+} MPI2_SAS_IOUNIT_CONTROL_REQUEST,
+  MPI2_POINTER PTR_MPI2_SAS_IOUNIT_CONTROL_REQUEST,
+  Mpi2SasIoUnitControlRequest_t, MPI2_POINTER pMpi2SasIoUnitControlRequest_t;
+
+/* values for the Operation field */
+#define MPI2_SAS_OP_CLEAR_ALL_PERSISTENT        (0x02)
+#define MPI2_SAS_OP_PHY_LINK_RESET              (0x06)
+#define MPI2_SAS_OP_PHY_HARD_RESET              (0x07)
+#define MPI2_SAS_OP_PHY_CLEAR_ERROR_LOG         (0x08)
+#define MPI2_SAS_OP_SEND_PRIMITIVE              (0x0A)
+#define MPI2_SAS_OP_FORCE_FULL_DISCOVERY        (0x0B)
+#define MPI2_SAS_OP_TRANSMIT_PORT_SELECT_SIGNAL (0x0C)
+#define MPI2_SAS_OP_REMOVE_DEVICE               (0x0D)
+#define MPI2_SAS_OP_LOOKUP_MAPPING              (0x0E)
+#define MPI2_SAS_OP_SET_IOC_PARAMETER           (0x0F)
+#define MPI2_SAS_OP_PRODUCT_SPECIFIC_MIN        (0x80)
+
+/* values for the PrimFlags field */
+#define MPI2_SAS_PRIMFLAGS_SINGLE               (0x08)
+#define MPI2_SAS_PRIMFLAGS_TRIPLE               (0x02)
+#define MPI2_SAS_PRIMFLAGS_REDUNDANT            (0x01)
+
+/* values for the LookupMethod field */
+#define MPI2_SAS_LOOKUP_METHOD_SAS_ADDRESS          (0x01)
+#define MPI2_SAS_LOOKUP_METHOD_SAS_ENCLOSURE_SLOT   (0x02)
+#define MPI2_SAS_LOOKUP_METHOD_SAS_DEVICE_NAME      (0x03)
+
+
+/* SAS IO Unit Control Reply Message */
+typedef struct _MPI2_SAS_IOUNIT_CONTROL_REPLY
+{
+    U8                      Operation;          /* 0x00 */
+    U8                      Reserved1;          /* 0x01 */
+    U8                      MsgLength;          /* 0x02 */
+    U8                      Function;           /* 0x03 */
+    U16                     DevHandle;          /* 0x04 */
+    U8                      IOCParameter;       /* 0x06 */
+    U8                      MsgFlags;           /* 0x07 */
+    U8                      VP_ID;              /* 0x08 */
+    U8                      VF_ID;              /* 0x09 */
+    U16                     Reserved3;          /* 0x0A */
+    U16                     Reserved4;          /* 0x0C */
+    U16                     IOCStatus;          /* 0x0E */
+    U32                     IOCLogInfo;         /* 0x10 */
+} MPI2_SAS_IOUNIT_CONTROL_REPLY,
+  MPI2_POINTER PTR_MPI2_SAS_IOUNIT_CONTROL_REPLY,
+  Mpi2SasIoUnitControlReply_t, MPI2_POINTER pMpi2SasIoUnitControlReply_t;
+
+
+#endif
+
+
diff --git a/drivers/scsi/mpt2sas/mpi/mpi2_tool.h b/drivers/scsi/mpt2sas/mpi/mpi2_tool.h
new file mode 100644
index 000000000000..2ff4e936bd39
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpi/mpi2_tool.h
@@ -0,0 +1,249 @@
+/*
+ *  Copyright (c) 2000-2008 LSI Corporation.
+ *
+ *
+ *           Name:  mpi2_tool.h
+ *          Title:  MPI diagnostic tool structures and definitions
+ *  Creation Date:  March 26, 2007
+ *
+ *    mpi2_tool.h Version:  02.00.02
+ *
+ *  Version History
+ *  ---------------
+ *
+ *  Date      Version   Description
+ *  --------  --------  ------------------------------------------------------
+ *  04-30-07  02.00.00  Corresponds to Fusion-MPT MPI Specification Rev A.
+ *  12-18-07  02.00.01  Added Diagnostic Buffer Post and Diagnostic Release
+ *                      structures and defines.
+ *  02-29-08  02.00.02  Modified various names to make them 32-character unique.
+ *  --------------------------------------------------------------------------
+ */
+
+#ifndef MPI2_TOOL_H
+#define MPI2_TOOL_H
+
+/*****************************************************************************
+*
+*               Toolbox Messages
+*
+*****************************************************************************/
+
+/* defines for the Tools */
+#define MPI2_TOOLBOX_CLEAN_TOOL                     (0x00)
+#define MPI2_TOOLBOX_MEMORY_MOVE_TOOL               (0x01)
+#define MPI2_TOOLBOX_BEACON_TOOL                    (0x05)
+
+/****************************************************************************
+*  Toolbox reply
+****************************************************************************/
+
+typedef struct _MPI2_TOOLBOX_REPLY
+{
+    U8                      Tool;                       /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      MsgLength;                  /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U16                     Reserved5;                  /* 0x0C */
+    U16                     IOCStatus;                  /* 0x0E */
+    U32                     IOCLogInfo;                 /* 0x10 */
+} MPI2_TOOLBOX_REPLY, MPI2_POINTER PTR_MPI2_TOOLBOX_REPLY,
+  Mpi2ToolboxReply_t, MPI2_POINTER pMpi2ToolboxReply_t;
+
+
+/****************************************************************************
+*  Toolbox Clean Tool request
+****************************************************************************/
+
+typedef struct _MPI2_TOOLBOX_CLEAN_REQUEST
+{
+    U8                      Tool;                       /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      ChainOffset;                /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U32                     Flags;                      /* 0x0C */
+   } MPI2_TOOLBOX_CLEAN_REQUEST, MPI2_POINTER PTR_MPI2_TOOLBOX_CLEAN_REQUEST,
+  Mpi2ToolboxCleanRequest_t, MPI2_POINTER pMpi2ToolboxCleanRequest_t;
+
+/* values for the Flags field */
+#define MPI2_TOOLBOX_CLEAN_BOOT_SERVICES            (0x80000000)
+#define MPI2_TOOLBOX_CLEAN_PERSIST_MANUFACT_PAGES   (0x40000000)
+#define MPI2_TOOLBOX_CLEAN_OTHER_PERSIST_PAGES      (0x20000000)
+#define MPI2_TOOLBOX_CLEAN_FW_CURRENT               (0x10000000)
+#define MPI2_TOOLBOX_CLEAN_FW_BACKUP                (0x08000000)
+#define MPI2_TOOLBOX_CLEAN_MEGARAID                 (0x02000000)
+#define MPI2_TOOLBOX_CLEAN_INITIALIZATION           (0x01000000)
+#define MPI2_TOOLBOX_CLEAN_FLASH                    (0x00000004)
+#define MPI2_TOOLBOX_CLEAN_SEEPROM                  (0x00000002)
+#define MPI2_TOOLBOX_CLEAN_NVSRAM                   (0x00000001)
+
+
+/****************************************************************************
+*  Toolbox Memory Move request
+****************************************************************************/
+
+typedef struct _MPI2_TOOLBOX_MEM_MOVE_REQUEST
+{
+    U8                      Tool;                       /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      ChainOffset;                /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    MPI2_SGE_SIMPLE_UNION   SGL;                        /* 0x0C */
+} MPI2_TOOLBOX_MEM_MOVE_REQUEST, MPI2_POINTER PTR_MPI2_TOOLBOX_MEM_MOVE_REQUEST,
+  Mpi2ToolboxMemMoveRequest_t, MPI2_POINTER pMpi2ToolboxMemMoveRequest_t;
+
+
+/****************************************************************************
+*  Toolbox Beacon Tool request
+****************************************************************************/
+
+typedef struct _MPI2_TOOLBOX_BEACON_REQUEST
+{
+    U8                      Tool;                       /* 0x00 */
+    U8                      Reserved1;                  /* 0x01 */
+    U8                      ChainOffset;                /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U8                      Reserved5;                  /* 0x0C */
+    U8                      PhysicalPort;               /* 0x0D */
+    U8                      Reserved6;                  /* 0x0E */
+    U8                      Flags;                      /* 0x0F */
+} MPI2_TOOLBOX_BEACON_REQUEST, MPI2_POINTER PTR_MPI2_TOOLBOX_BEACON_REQUEST,
+  Mpi2ToolboxBeaconRequest_t, MPI2_POINTER pMpi2ToolboxBeaconRequest_t;
+
+/* values for the Flags field */
+#define MPI2_TOOLBOX_FLAGS_BEACONMODE_OFF       (0x00)
+#define MPI2_TOOLBOX_FLAGS_BEACONMODE_ON        (0x01)
+
+
+/*****************************************************************************
+*
+*       Diagnostic Buffer Messages
+*
+*****************************************************************************/
+
+
+/****************************************************************************
+*  Diagnostic Buffer Post request
+****************************************************************************/
+
+typedef struct _MPI2_DIAG_BUFFER_POST_REQUEST
+{
+    U8                      Reserved1;                  /* 0x00 */
+    U8                      BufferType;                 /* 0x01 */
+    U8                      ChainOffset;                /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U64                     BufferAddress;              /* 0x0C */
+    U32                     BufferLength;               /* 0x14 */
+    U32                     Reserved5;                  /* 0x18 */
+    U32                     Reserved6;                  /* 0x1C */
+    U32                     Flags;                      /* 0x20 */
+    U32                     ProductSpecific[23];        /* 0x24 */
+} MPI2_DIAG_BUFFER_POST_REQUEST, MPI2_POINTER PTR_MPI2_DIAG_BUFFER_POST_REQUEST,
+  Mpi2DiagBufferPostRequest_t, MPI2_POINTER pMpi2DiagBufferPostRequest_t;
+
+/* values for the BufferType field */
+#define MPI2_DIAG_BUF_TYPE_TRACE                    (0x00)
+#define MPI2_DIAG_BUF_TYPE_SNAPSHOT                 (0x01)
+/* count of the number of buffer types */
+#define MPI2_DIAG_BUF_TYPE_COUNT                    (0x02)
+
+
+/****************************************************************************
+*  Diagnostic Buffer Post reply
+****************************************************************************/
+
+typedef struct _MPI2_DIAG_BUFFER_POST_REPLY
+{
+    U8                      Reserved1;                  /* 0x00 */
+    U8                      BufferType;                 /* 0x01 */
+    U8                      MsgLength;                  /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U16                     Reserved5;                  /* 0x0C */
+    U16                     IOCStatus;                  /* 0x0E */
+    U32                     IOCLogInfo;                 /* 0x10 */
+    U32                     TransferLength;             /* 0x14 */
+} MPI2_DIAG_BUFFER_POST_REPLY, MPI2_POINTER PTR_MPI2_DIAG_BUFFER_POST_REPLY,
+  Mpi2DiagBufferPostReply_t, MPI2_POINTER pMpi2DiagBufferPostReply_t;
+
+
+/****************************************************************************
+*  Diagnostic Release request
+****************************************************************************/
+
+typedef struct _MPI2_DIAG_RELEASE_REQUEST
+{
+    U8                      Reserved1;                  /* 0x00 */
+    U8                      BufferType;                 /* 0x01 */
+    U8                      ChainOffset;                /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+} MPI2_DIAG_RELEASE_REQUEST, MPI2_POINTER PTR_MPI2_DIAG_RELEASE_REQUEST,
+  Mpi2DiagReleaseRequest_t, MPI2_POINTER pMpi2DiagReleaseRequest_t;
+
+
+/****************************************************************************
+*  Diagnostic Buffer Post reply
+****************************************************************************/
+
+typedef struct _MPI2_DIAG_RELEASE_REPLY
+{
+    U8                      Reserved1;                  /* 0x00 */
+    U8                      BufferType;                 /* 0x01 */
+    U8                      MsgLength;                  /* 0x02 */
+    U8                      Function;                   /* 0x03 */
+    U16                     Reserved2;                  /* 0x04 */
+    U8                      Reserved3;                  /* 0x06 */
+    U8                      MsgFlags;                   /* 0x07 */
+    U8                      VP_ID;                      /* 0x08 */
+    U8                      VF_ID;                      /* 0x09 */
+    U16                     Reserved4;                  /* 0x0A */
+    U16                     Reserved5;                  /* 0x0C */
+    U16                     IOCStatus;                  /* 0x0E */
+    U32                     IOCLogInfo;                 /* 0x10 */
+} MPI2_DIAG_RELEASE_REPLY, MPI2_POINTER PTR_MPI2_DIAG_RELEASE_REPLY,
+  Mpi2DiagReleaseReply_t, MPI2_POINTER pMpi2DiagReleaseReply_t;
+
+
+#endif
+
diff --git a/drivers/scsi/mpt2sas/mpi/mpi2_type.h b/drivers/scsi/mpt2sas/mpi/mpi2_type.h
new file mode 100644
index 000000000000..cfde017bf16e
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpi/mpi2_type.h
@@ -0,0 +1,61 @@
+/*
+ *  Copyright (c) 2000-2007 LSI Corporation.
+ *
+ *
+ *           Name:  mpi2_type.h
+ *          Title:  MPI basic type definitions
+ *  Creation Date:  August 16, 2006
+ *
+ *    mpi2_type.h Version:  02.00.00
+ *
+ *  Version History
+ *  ---------------
+ *
+ *  Date      Version   Description
+ *  --------  --------  ------------------------------------------------------
+ *  04-30-07  02.00.00  Corresponds to Fusion-MPT MPI Specification Rev A.
+ *  --------------------------------------------------------------------------
+ */
+
+#ifndef MPI2_TYPE_H
+#define MPI2_TYPE_H
+
+
+/*******************************************************************************
+ * Define MPI2_POINTER if it hasn't already been defined. By default
+ * MPI2_POINTER is defined to be a near pointer. MPI2_POINTER can be defined as
+ * a far pointer by defining MPI2_POINTER as "far *" before this header file is
+ * included.
+ */
+#ifndef MPI2_POINTER
+#define MPI2_POINTER     *
+#endif
+
+/* the basic types may have already been included by mpi_type.h */
+#ifndef MPI_TYPE_H
+/*****************************************************************************
+*
+*               Basic Types
+*
+*****************************************************************************/
+
+typedef u8 U8;
+typedef __le16 U16;
+typedef __le32 U32;
+typedef __le64 U64 __attribute__((aligned(4)));
+
+/*****************************************************************************
+*
+*               Pointer Types
+*
+*****************************************************************************/
+
+typedef U8      *PU8;
+typedef U16     *PU16;
+typedef U32     *PU32;
+typedef U64     *PU64;
+
+#endif
+
+#endif
+
diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
new file mode 100644
index 000000000000..52427a8324f5
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -0,0 +1,3435 @@
+/*
+ * This is the Fusion MPT base driver providing common API layer interface
+ * for access to MPT (Message Passing Technology) firmware.
+ *
+ * This code is based on drivers/scsi/mpt2sas/mpt2_base.c
+ * Copyright (C) 2007-2008  LSI Corporation
+ *  (mailto:DL-MPTFusionLinux@lsi.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * NO WARRANTY
+ * THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT
+ * LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is
+ * solely responsible for determining the appropriateness of using and
+ * distributing the Program and assumes all risks associated with its
+ * exercise of rights under this Agreement, including but not limited to
+ * the risks and costs of program errors, damage to or loss of data,
+ * programs or equipment, and unavailability or interruption of operations.
+
+ * DISCLAIMER OF LIABILITY
+ * NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+ * TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
+ * HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
+ * USA.
+ */
+
+#include <linux/version.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/pci.h>
+#include <linux/kdev_t.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/dma-mapping.h>
+#include <linux/sort.h>
+#include <linux/io.h>
+
+#include "mpt2sas_base.h"
+
+static MPT_CALLBACK	mpt_callbacks[MPT_MAX_CALLBACKS];
+
+#define FAULT_POLLING_INTERVAL 1000 /* in milliseconds */
+#define MPT2SAS_MAX_REQUEST_QUEUE 500 /* maximum controller queue depth */
+
+static int max_queue_depth = -1;
+module_param(max_queue_depth, int, 0);
+MODULE_PARM_DESC(max_queue_depth, " max controller queue depth ");
+
+static int max_sgl_entries = -1;
+module_param(max_sgl_entries, int, 0);
+MODULE_PARM_DESC(max_sgl_entries, " max sg entries ");
+
+static int msix_disable = -1;
+module_param(msix_disable, int, 0);
+MODULE_PARM_DESC(msix_disable, " disable msix routed interrupts (default=0)");
+
+/**
+ * _base_fault_reset_work - workq handling ioc fault conditions
+ * @work: input argument, used to derive ioc
+ * Context: sleep.
+ *
+ * Return nothing.
+ */
+static void
+_base_fault_reset_work(struct work_struct *work)
+{
+	struct MPT2SAS_ADAPTER *ioc =
+	    container_of(work, struct MPT2SAS_ADAPTER, fault_reset_work.work);
+	unsigned long	 flags;
+	u32 doorbell;
+	int rc;
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	if (ioc->ioc_reset_in_progress)
+		goto rearm_timer;
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+	doorbell = mpt2sas_base_get_iocstate(ioc, 0);
+	if ((doorbell & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) {
+		rc = mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+		    FORCE_BIG_HAMMER);
+		printk(MPT2SAS_WARN_FMT "%s: hard reset: %s\n", ioc->name,
+		    __func__, (rc == 0) ? "success" : "failed");
+		doorbell = mpt2sas_base_get_iocstate(ioc, 0);
+		if ((doorbell & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT)
+			mpt2sas_base_fault_info(ioc, doorbell &
+			    MPI2_DOORBELL_DATA_MASK);
+	}
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+ rearm_timer:
+	if (ioc->fault_reset_work_q)
+		queue_delayed_work(ioc->fault_reset_work_q,
+		    &ioc->fault_reset_work,
+		    msecs_to_jiffies(FAULT_POLLING_INTERVAL));
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+}
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _base_sas_ioc_info - verbose translation of the ioc status
+ * @ioc: pointer to scsi command object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @request_hdr: request mf
+ *
+ * Return nothing.
+ */
+static void
+_base_sas_ioc_info(struct MPT2SAS_ADAPTER *ioc, MPI2DefaultReply_t *mpi_reply,
+     MPI2RequestHeader_t *request_hdr)
+{
+	u16 ioc_status = le16_to_cpu(mpi_reply->IOCStatus) &
+	    MPI2_IOCSTATUS_MASK;
+	char *desc = NULL;
+	u16 frame_sz;
+	char *func_str = NULL;
+
+	/* SCSI_IO, RAID_PASS are handled from _scsih_scsi_ioc_info */
+	if (request_hdr->Function == MPI2_FUNCTION_SCSI_IO_REQUEST ||
+	    request_hdr->Function == MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH ||
+	    request_hdr->Function == MPI2_FUNCTION_EVENT_NOTIFICATION)
+		return;
+
+	switch (ioc_status) {
+
+/****************************************************************************
+*  Common IOCStatus values for all replies
+****************************************************************************/
+
+	case MPI2_IOCSTATUS_INVALID_FUNCTION:
+		desc = "invalid function";
+		break;
+	case MPI2_IOCSTATUS_BUSY:
+		desc = "busy";
+		break;
+	case MPI2_IOCSTATUS_INVALID_SGL:
+		desc = "invalid sgl";
+		break;
+	case MPI2_IOCSTATUS_INTERNAL_ERROR:
+		desc = "internal error";
+		break;
+	case MPI2_IOCSTATUS_INVALID_VPID:
+		desc = "invalid vpid";
+		break;
+	case MPI2_IOCSTATUS_INSUFFICIENT_RESOURCES:
+		desc = "insufficient resources";
+		break;
+	case MPI2_IOCSTATUS_INVALID_FIELD:
+		desc = "invalid field";
+		break;
+	case MPI2_IOCSTATUS_INVALID_STATE:
+		desc = "invalid state";
+		break;
+	case MPI2_IOCSTATUS_OP_STATE_NOT_SUPPORTED:
+		desc = "op state not supported";
+		break;
+
+/****************************************************************************
+*  Config IOCStatus values
+****************************************************************************/
+
+	case MPI2_IOCSTATUS_CONFIG_INVALID_ACTION:
+		desc = "config invalid action";
+		break;
+	case MPI2_IOCSTATUS_CONFIG_INVALID_TYPE:
+		desc = "config invalid type";
+		break;
+	case MPI2_IOCSTATUS_CONFIG_INVALID_PAGE:
+		desc = "config invalid page";
+		break;
+	case MPI2_IOCSTATUS_CONFIG_INVALID_DATA:
+		desc = "config invalid data";
+		break;
+	case MPI2_IOCSTATUS_CONFIG_NO_DEFAULTS:
+		desc = "config no defaults";
+		break;
+	case MPI2_IOCSTATUS_CONFIG_CANT_COMMIT:
+		desc = "config cant commit";
+		break;
+
+/****************************************************************************
+*  SCSI IO Reply
+****************************************************************************/
+
+	case MPI2_IOCSTATUS_SCSI_RECOVERED_ERROR:
+	case MPI2_IOCSTATUS_SCSI_INVALID_DEVHANDLE:
+	case MPI2_IOCSTATUS_SCSI_DEVICE_NOT_THERE:
+	case MPI2_IOCSTATUS_SCSI_DATA_OVERRUN:
+	case MPI2_IOCSTATUS_SCSI_DATA_UNDERRUN:
+	case MPI2_IOCSTATUS_SCSI_IO_DATA_ERROR:
+	case MPI2_IOCSTATUS_SCSI_PROTOCOL_ERROR:
+	case MPI2_IOCSTATUS_SCSI_TASK_TERMINATED:
+	case MPI2_IOCSTATUS_SCSI_RESIDUAL_MISMATCH:
+	case MPI2_IOCSTATUS_SCSI_TASK_MGMT_FAILED:
+	case MPI2_IOCSTATUS_SCSI_IOC_TERMINATED:
+	case MPI2_IOCSTATUS_SCSI_EXT_TERMINATED:
+		break;
+
+/****************************************************************************
+*  For use by SCSI Initiator and SCSI Target end-to-end data protection
+****************************************************************************/
+
+	case MPI2_IOCSTATUS_EEDP_GUARD_ERROR:
+		desc = "eedp guard error";
+		break;
+	case MPI2_IOCSTATUS_EEDP_REF_TAG_ERROR:
+		desc = "eedp ref tag error";
+		break;
+	case MPI2_IOCSTATUS_EEDP_APP_TAG_ERROR:
+		desc = "eedp app tag error";
+		break;
+
+/****************************************************************************
+*  SCSI Target values
+****************************************************************************/
+
+	case MPI2_IOCSTATUS_TARGET_INVALID_IO_INDEX:
+		desc = "target invalid io index";
+		break;
+	case MPI2_IOCSTATUS_TARGET_ABORTED:
+		desc = "target aborted";
+		break;
+	case MPI2_IOCSTATUS_TARGET_NO_CONN_RETRYABLE:
+		desc = "target no conn retryable";
+		break;
+	case MPI2_IOCSTATUS_TARGET_NO_CONNECTION:
+		desc = "target no connection";
+		break;
+	case MPI2_IOCSTATUS_TARGET_XFER_COUNT_MISMATCH:
+		desc = "target xfer count mismatch";
+		break;
+	case MPI2_IOCSTATUS_TARGET_DATA_OFFSET_ERROR:
+		desc = "target data offset error";
+		break;
+	case MPI2_IOCSTATUS_TARGET_TOO_MUCH_WRITE_DATA:
+		desc = "target too much write data";
+		break;
+	case MPI2_IOCSTATUS_TARGET_IU_TOO_SHORT:
+		desc = "target iu too short";
+		break;
+	case MPI2_IOCSTATUS_TARGET_ACK_NAK_TIMEOUT:
+		desc = "target ack nak timeout";
+		break;
+	case MPI2_IOCSTATUS_TARGET_NAK_RECEIVED:
+		desc = "target nak received";
+		break;
+
+/****************************************************************************
+*  Serial Attached SCSI values
+****************************************************************************/
+
+	case MPI2_IOCSTATUS_SAS_SMP_REQUEST_FAILED:
+		desc = "smp request failed";
+		break;
+	case MPI2_IOCSTATUS_SAS_SMP_DATA_OVERRUN:
+		desc = "smp data overrun";
+		break;
+
+/****************************************************************************
+*  Diagnostic Buffer Post / Diagnostic Release values
+****************************************************************************/
+
+	case MPI2_IOCSTATUS_DIAGNOSTIC_RELEASED:
+		desc = "diagnostic released";
+		break;
+	default:
+		break;
+	}
+
+	if (!desc)
+		return;
+
+	switch (request_hdr->Function) {
+	case MPI2_FUNCTION_CONFIG:
+		frame_sz = sizeof(Mpi2ConfigRequest_t) + ioc->sge_size;
+		func_str = "config_page";
+		break;
+	case MPI2_FUNCTION_SCSI_TASK_MGMT:
+		frame_sz = sizeof(Mpi2SCSITaskManagementRequest_t);
+		func_str = "task_mgmt";
+		break;
+	case MPI2_FUNCTION_SAS_IO_UNIT_CONTROL:
+		frame_sz = sizeof(Mpi2SasIoUnitControlRequest_t);
+		func_str = "sas_iounit_ctl";
+		break;
+	case MPI2_FUNCTION_SCSI_ENCLOSURE_PROCESSOR:
+		frame_sz = sizeof(Mpi2SepRequest_t);
+		func_str = "enclosure";
+		break;
+	case MPI2_FUNCTION_IOC_INIT:
+		frame_sz = sizeof(Mpi2IOCInitRequest_t);
+		func_str = "ioc_init";
+		break;
+	case MPI2_FUNCTION_PORT_ENABLE:
+		frame_sz = sizeof(Mpi2PortEnableRequest_t);
+		func_str = "port_enable";
+		break;
+	case MPI2_FUNCTION_SMP_PASSTHROUGH:
+		frame_sz = sizeof(Mpi2SmpPassthroughRequest_t) + ioc->sge_size;
+		func_str = "smp_passthru";
+		break;
+	default:
+		frame_sz = 32;
+		func_str = "unknown";
+		break;
+	}
+
+	printk(MPT2SAS_WARN_FMT "ioc_status: %s(0x%04x), request(0x%p),"
+	    " (%s)\n", ioc->name, desc, ioc_status, request_hdr, func_str);
+
+	_debug_dump_mf(request_hdr, frame_sz/4);
+}
+
+/**
+ * _base_display_event_data - verbose translation of firmware asyn events
+ * @ioc: pointer to scsi command object
+ * @mpi_reply: reply mf payload returned from firmware
+ *
+ * Return nothing.
+ */
+static void
+_base_display_event_data(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventNotificationReply_t *mpi_reply)
+{
+	char *desc = NULL;
+	u16 event;
+
+	if (!(ioc->logging_level & MPT_DEBUG_EVENTS))
+		return;
+
+	event = le16_to_cpu(mpi_reply->Event);
+
+	switch (event) {
+	case MPI2_EVENT_LOG_DATA:
+		desc = "Log Data";
+		break;
+	case MPI2_EVENT_STATE_CHANGE:
+		desc = "Status Change";
+		break;
+	case MPI2_EVENT_HARD_RESET_RECEIVED:
+		desc = "Hard Reset Received";
+		break;
+	case MPI2_EVENT_EVENT_CHANGE:
+		desc = "Event Change";
+		break;
+	case MPI2_EVENT_TASK_SET_FULL:
+		desc = "Task Set Full";
+		break;
+	case MPI2_EVENT_SAS_DEVICE_STATUS_CHANGE:
+		desc = "Device Status Change";
+		break;
+	case MPI2_EVENT_IR_OPERATION_STATUS:
+		desc = "IR Operation Status";
+		break;
+	case MPI2_EVENT_SAS_DISCOVERY:
+		desc =  "Discovery";
+		break;
+	case MPI2_EVENT_SAS_BROADCAST_PRIMITIVE:
+		desc = "SAS Broadcast Primitive";
+		break;
+	case MPI2_EVENT_SAS_INIT_DEVICE_STATUS_CHANGE:
+		desc = "SAS Init Device Status Change";
+		break;
+	case MPI2_EVENT_SAS_INIT_TABLE_OVERFLOW:
+		desc = "SAS Init Table Overflow";
+		break;
+	case MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST:
+		desc = "SAS Topology Change List";
+		break;
+	case MPI2_EVENT_SAS_ENCL_DEVICE_STATUS_CHANGE:
+		desc = "SAS Enclosure Device Status Change";
+		break;
+	case MPI2_EVENT_IR_VOLUME:
+		desc = "IR Volume";
+		break;
+	case MPI2_EVENT_IR_PHYSICAL_DISK:
+		desc = "IR Physical Disk";
+		break;
+	case MPI2_EVENT_IR_CONFIGURATION_CHANGE_LIST:
+		desc = "IR Configuration Change List";
+		break;
+	case MPI2_EVENT_LOG_ENTRY_ADDED:
+		desc = "Log Entry Added";
+		break;
+	}
+
+	if (!desc)
+		return;
+
+	printk(MPT2SAS_INFO_FMT "%s\n", ioc->name, desc);
+}
+#endif
+
+/**
+ * _base_sas_log_info - verbose translation of firmware log info
+ * @ioc: pointer to scsi command object
+ * @log_info: log info
+ *
+ * Return nothing.
+ */
+static void
+_base_sas_log_info(struct MPT2SAS_ADAPTER *ioc , u32 log_info)
+{
+	union loginfo_type {
+		u32	loginfo;
+		struct {
+			u32	subcode:16;
+			u32	code:8;
+			u32	originator:4;
+			u32	bus_type:4;
+		} dw;
+	};
+	union loginfo_type sas_loginfo;
+	char *originator_str = NULL;
+
+	sas_loginfo.loginfo = log_info;
+	if (sas_loginfo.dw.bus_type != 3 /*SAS*/)
+		return;
+
+	/* eat the loginfos associated with task aborts */
+	if (ioc->ignore_loginfos && (log_info == 30050000 || log_info ==
+	    0x31140000 || log_info == 0x31130000))
+		return;
+
+	switch (sas_loginfo.dw.originator) {
+	case 0:
+		originator_str = "IOP";
+		break;
+	case 1:
+		originator_str = "PL";
+		break;
+	case 2:
+		originator_str = "IR";
+		break;
+	}
+
+	printk(MPT2SAS_WARN_FMT "log_info(0x%08x): originator(%s), "
+	    "code(0x%02x), sub_code(0x%04x)\n", ioc->name, log_info,
+	     originator_str, sas_loginfo.dw.code,
+	     sas_loginfo.dw.subcode);
+}
+
+/**
+ * mpt2sas_base_fault_info - verbose translation of firmware FAULT code
+ * @ioc: pointer to scsi command object
+ * @fault_code: fault code
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_fault_info(struct MPT2SAS_ADAPTER *ioc , u16 fault_code)
+{
+	printk(MPT2SAS_ERR_FMT "fault_state(0x%04x)!\n",
+	    ioc->name, fault_code);
+}
+
+/**
+ * _base_display_reply_info -
+ * @ioc: pointer to scsi command object
+ * @smid: system request message index
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ *
+ * Return nothing.
+ */
+static void
+_base_display_reply_info(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID,
+    u32 reply)
+{
+	MPI2DefaultReply_t *mpi_reply;
+	u16 ioc_status;
+
+	mpi_reply = mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	ioc_status = le16_to_cpu(mpi_reply->IOCStatus);
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	if ((ioc_status & MPI2_IOCSTATUS_MASK) &&
+	    (ioc->logging_level & MPT_DEBUG_REPLY)) {
+		_base_sas_ioc_info(ioc , mpi_reply,
+		   mpt2sas_base_get_msg_frame(ioc, smid));
+	}
+#endif
+	if (ioc_status & MPI2_IOCSTATUS_FLAG_LOG_INFO_AVAILABLE)
+		_base_sas_log_info(ioc, le32_to_cpu(mpi_reply->IOCLogInfo));
+}
+
+/**
+ * mpt2sas_base_done - base internal command completion routine
+ * @ioc: pointer to scsi command object
+ * @smid: system request message index
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID, u32 reply)
+{
+	MPI2DefaultReply_t *mpi_reply;
+
+	mpi_reply = mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	if (mpi_reply && mpi_reply->Function == MPI2_FUNCTION_EVENT_ACK)
+		return;
+
+	if (ioc->base_cmds.status == MPT2_CMD_NOT_USED)
+		return;
+
+	ioc->base_cmds.status |= MPT2_CMD_COMPLETE;
+	if (mpi_reply) {
+		ioc->base_cmds.status |= MPT2_CMD_REPLY_VALID;
+		memcpy(ioc->base_cmds.reply, mpi_reply, mpi_reply->MsgLength*4);
+	}
+	ioc->base_cmds.status &= ~MPT2_CMD_PENDING;
+	complete(&ioc->base_cmds.done);
+}
+
+/**
+ * _base_async_event - main callback handler for firmware asyn events
+ * @ioc: pointer to scsi command object
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ *
+ * Return nothing.
+ */
+static void
+_base_async_event(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID, u32 reply)
+{
+	Mpi2EventNotificationReply_t *mpi_reply;
+	Mpi2EventAckRequest_t *ack_request;
+	u16 smid;
+
+	mpi_reply = mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	if (!mpi_reply)
+		return;
+	if (mpi_reply->Function != MPI2_FUNCTION_EVENT_NOTIFICATION)
+		return;
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	_base_display_event_data(ioc, mpi_reply);
+#endif
+	if (!(mpi_reply->AckRequired & MPI2_EVENT_NOTIFICATION_ACK_REQUIRED))
+		goto out;
+	smid = mpt2sas_base_get_smid(ioc, ioc->base_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		goto out;
+	}
+
+	ack_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	memset(ack_request, 0, sizeof(Mpi2EventAckRequest_t));
+	ack_request->Function = MPI2_FUNCTION_EVENT_ACK;
+	ack_request->Event = mpi_reply->Event;
+	ack_request->EventContext = mpi_reply->EventContext;
+	ack_request->VF_ID = VF_ID;
+	mpt2sas_base_put_smid_default(ioc, smid, VF_ID);
+
+ out:
+
+	/* scsih callback handler */
+	mpt2sas_scsih_event_callback(ioc, VF_ID, reply);
+
+	/* ctl callback handler */
+	mpt2sas_ctl_event_callback(ioc, VF_ID, reply);
+}
+
+/**
+ * _base_mask_interrupts - disable interrupts
+ * @ioc: pointer to scsi command object
+ *
+ * Disabling ResetIRQ, Reply and Doorbell Interrupts
+ *
+ * Return nothing.
+ */
+static void
+_base_mask_interrupts(struct MPT2SAS_ADAPTER *ioc)
+{
+	u32 him_register;
+
+	ioc->mask_interrupts = 1;
+	him_register = readl(&ioc->chip->HostInterruptMask);
+	him_register |= MPI2_HIM_DIM + MPI2_HIM_RIM + MPI2_HIM_RESET_IRQ_MASK;
+	writel(him_register, &ioc->chip->HostInterruptMask);
+	readl(&ioc->chip->HostInterruptMask);
+}
+
+/**
+ * _base_unmask_interrupts - enable interrupts
+ * @ioc: pointer to scsi command object
+ *
+ * Enabling only Reply Interrupts
+ *
+ * Return nothing.
+ */
+static void
+_base_unmask_interrupts(struct MPT2SAS_ADAPTER *ioc)
+{
+	u32 him_register;
+
+	writel(0, &ioc->chip->HostInterruptStatus);
+	him_register = readl(&ioc->chip->HostInterruptMask);
+	him_register &= ~MPI2_HIM_RIM;
+	writel(him_register, &ioc->chip->HostInterruptMask);
+	ioc->mask_interrupts = 0;
+}
+
+/**
+ * _base_interrupt - MPT adapter (IOC) specific interrupt handler.
+ * @irq: irq number (not used)
+ * @bus_id: bus identifier cookie == pointer to MPT_ADAPTER structure
+ * @r: pt_regs pointer (not used)
+ *
+ * Return IRQ_HANDLE if processed, else IRQ_NONE.
+ */
+static irqreturn_t
+_base_interrupt(int irq, void *bus_id)
+{
+	u32 post_index, post_index_next, completed_cmds;
+	u8 request_desript_type;
+	u16 smid;
+	u8 cb_idx;
+	u32 reply;
+	u8 VF_ID;
+	int i;
+	struct MPT2SAS_ADAPTER *ioc = bus_id;
+
+	if (ioc->mask_interrupts)
+		return IRQ_NONE;
+
+	post_index = ioc->reply_post_host_index;
+	request_desript_type = ioc->reply_post_free[post_index].
+	    Default.ReplyFlags & MPI2_RPY_DESCRIPT_FLAGS_TYPE_MASK;
+	if (request_desript_type == MPI2_RPY_DESCRIPT_FLAGS_UNUSED)
+		return IRQ_NONE;
+
+	completed_cmds = 0;
+	do {
+		if (ioc->reply_post_free[post_index].Words == ~0ULL)
+			goto out;
+		reply = 0;
+		cb_idx = 0xFF;
+		smid = le16_to_cpu(ioc->reply_post_free[post_index].
+		    Default.DescriptorTypeDependent1);
+		VF_ID = ioc->reply_post_free[post_index].
+		    Default.VF_ID;
+		if (request_desript_type ==
+		    MPI2_RPY_DESCRIPT_FLAGS_ADDRESS_REPLY) {
+			reply = le32_to_cpu(ioc->reply_post_free[post_index].
+			    AddressReply.ReplyFrameAddress);
+		} else if (request_desript_type ==
+		    MPI2_RPY_DESCRIPT_FLAGS_TARGET_COMMAND_BUFFER)
+			goto next;
+		else if (request_desript_type ==
+		    MPI2_RPY_DESCRIPT_FLAGS_TARGETASSIST_SUCCESS)
+			goto next;
+		if (smid)
+			cb_idx = ioc->scsi_lookup[smid - 1].cb_idx;
+		if (smid && cb_idx != 0xFF) {
+			mpt_callbacks[cb_idx](ioc, smid, VF_ID, reply);
+			if (reply)
+				_base_display_reply_info(ioc, smid, VF_ID,
+				    reply);
+			mpt2sas_base_free_smid(ioc, smid);
+		}
+		if (!smid)
+			_base_async_event(ioc, VF_ID, reply);
+
+		/* reply free queue handling */
+		if (reply) {
+			ioc->reply_free_host_index =
+			    (ioc->reply_free_host_index ==
+			    (ioc->reply_free_queue_depth - 1)) ?
+			    0 : ioc->reply_free_host_index + 1;
+			ioc->reply_free[ioc->reply_free_host_index] =
+			    cpu_to_le32(reply);
+			writel(ioc->reply_free_host_index,
+			    &ioc->chip->ReplyFreeHostIndex);
+			wmb();
+		}
+
+ next:
+		post_index_next = (post_index == (ioc->reply_post_queue_depth -
+		    1)) ? 0 : post_index + 1;
+		request_desript_type =
+		    ioc->reply_post_free[post_index_next].Default.ReplyFlags
+		    & MPI2_RPY_DESCRIPT_FLAGS_TYPE_MASK;
+		completed_cmds++;
+		if (request_desript_type == MPI2_RPY_DESCRIPT_FLAGS_UNUSED)
+			goto out;
+		post_index = post_index_next;
+	} while (1);
+
+ out:
+
+	if (!completed_cmds)
+		return IRQ_NONE;
+
+	/* reply post descriptor handling */
+	post_index_next = ioc->reply_post_host_index;
+	for (i = 0 ; i < completed_cmds; i++) {
+		post_index = post_index_next;
+		/* poison the reply post descriptor */
+		ioc->reply_post_free[post_index_next].Words = ~0ULL;
+		post_index_next = (post_index ==
+		    (ioc->reply_post_queue_depth - 1))
+		    ? 0 : post_index + 1;
+	}
+	ioc->reply_post_host_index = post_index_next;
+	writel(post_index_next, &ioc->chip->ReplyPostHostIndex);
+	wmb();
+	return IRQ_HANDLED;
+}
+
+/**
+ * mpt2sas_base_release_callback_handler - clear interupt callback handler
+ * @cb_idx: callback index
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_release_callback_handler(u8 cb_idx)
+{
+	mpt_callbacks[cb_idx] = NULL;
+}
+
+/**
+ * mpt2sas_base_register_callback_handler - obtain index for the interrupt callback handler
+ * @cb_func: callback function
+ *
+ * Returns cb_func.
+ */
+u8
+mpt2sas_base_register_callback_handler(MPT_CALLBACK cb_func)
+{
+	u8 cb_idx;
+
+	for (cb_idx = MPT_MAX_CALLBACKS-1; cb_idx; cb_idx--)
+		if (mpt_callbacks[cb_idx] == NULL)
+			break;
+
+	mpt_callbacks[cb_idx] = cb_func;
+	return cb_idx;
+}
+
+/**
+ * mpt2sas_base_initialize_callback_handler - initialize the interrupt callback handler
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_initialize_callback_handler(void)
+{
+	u8 cb_idx;
+
+	for (cb_idx = 0; cb_idx < MPT_MAX_CALLBACKS; cb_idx++)
+		mpt2sas_base_release_callback_handler(cb_idx);
+}
+
+/**
+ * mpt2sas_base_build_zero_len_sge - build zero length sg entry
+ * @ioc: per adapter object
+ * @paddr: virtual address for SGE
+ *
+ * Create a zero length scatter gather entry to insure the IOCs hardware has
+ * something to use if the target device goes brain dead and tries
+ * to send data even when none is asked for.
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_build_zero_len_sge(struct MPT2SAS_ADAPTER *ioc, void *paddr)
+{
+	u32 flags_length = (u32)((MPI2_SGE_FLAGS_LAST_ELEMENT |
+	    MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_END_OF_LIST |
+	    MPI2_SGE_FLAGS_SIMPLE_ELEMENT) <<
+	    MPI2_SGE_FLAGS_SHIFT);
+	ioc->base_add_sg_single(paddr, flags_length, -1);
+}
+
+/**
+ * _base_add_sg_single_32 - Place a simple 32 bit SGE at address pAddr.
+ * @paddr: virtual address for SGE
+ * @flags_length: SGE flags and data transfer length
+ * @dma_addr: Physical address
+ *
+ * Return nothing.
+ */
+static void
+_base_add_sg_single_32(void *paddr, u32 flags_length, dma_addr_t dma_addr)
+{
+	Mpi2SGESimple32_t *sgel = paddr;
+
+	flags_length |= (MPI2_SGE_FLAGS_32_BIT_ADDRESSING |
+	    MPI2_SGE_FLAGS_SYSTEM_ADDRESS) << MPI2_SGE_FLAGS_SHIFT;
+	sgel->FlagsLength = cpu_to_le32(flags_length);
+	sgel->Address = cpu_to_le32(dma_addr);
+}
+
+
+/**
+ * _base_add_sg_single_64 - Place a simple 64 bit SGE at address pAddr.
+ * @paddr: virtual address for SGE
+ * @flags_length: SGE flags and data transfer length
+ * @dma_addr: Physical address
+ *
+ * Return nothing.
+ */
+static void
+_base_add_sg_single_64(void *paddr, u32 flags_length, dma_addr_t dma_addr)
+{
+	Mpi2SGESimple64_t *sgel = paddr;
+
+	flags_length |= (MPI2_SGE_FLAGS_64_BIT_ADDRESSING |
+	    MPI2_SGE_FLAGS_SYSTEM_ADDRESS) << MPI2_SGE_FLAGS_SHIFT;
+	sgel->FlagsLength = cpu_to_le32(flags_length);
+	sgel->Address = cpu_to_le64(dma_addr);
+}
+
+#define convert_to_kb(x) ((x) << (PAGE_SHIFT - 10))
+
+/**
+ * _base_config_dma_addressing - set dma addressing
+ * @ioc: per adapter object
+ * @pdev: PCI device struct
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_config_dma_addressing(struct MPT2SAS_ADAPTER *ioc, struct pci_dev *pdev)
+{
+	struct sysinfo s;
+	char *desc = NULL;
+
+	if (sizeof(dma_addr_t) > 4) {
+		const uint64_t required_mask =
+		    dma_get_required_mask(&pdev->dev);
+		if ((required_mask > DMA_32BIT_MASK) && !pci_set_dma_mask(pdev,
+		    DMA_64BIT_MASK) && !pci_set_consistent_dma_mask(pdev,
+		    DMA_64BIT_MASK)) {
+			ioc->base_add_sg_single = &_base_add_sg_single_64;
+			ioc->sge_size = sizeof(Mpi2SGESimple64_t);
+			desc = "64";
+			goto out;
+		}
+	}
+
+	if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)
+	    && !pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK)) {
+		ioc->base_add_sg_single = &_base_add_sg_single_32;
+		ioc->sge_size = sizeof(Mpi2SGESimple32_t);
+		desc = "32";
+	} else
+		return -ENODEV;
+
+ out:
+	si_meminfo(&s);
+	printk(MPT2SAS_INFO_FMT "%s BIT PCI BUS DMA ADDRESSING SUPPORTED, "
+	    "total mem (%ld kB)\n", ioc->name, desc, convert_to_kb(s.totalram));
+
+	return 0;
+}
+
+/**
+ * _base_save_msix_table - backup msix vector table
+ * @ioc: per adapter object
+ *
+ * This address an errata where diag reset clears out the table
+ */
+static void
+_base_save_msix_table(struct MPT2SAS_ADAPTER *ioc)
+{
+	int i;
+
+	if (!ioc->msix_enable || ioc->msix_table_backup == NULL)
+		return;
+
+	for (i = 0; i < ioc->msix_vector_count; i++)
+		ioc->msix_table_backup[i] = ioc->msix_table[i];
+}
+
+/**
+ * _base_restore_msix_table - this restores the msix vector table
+ * @ioc: per adapter object
+ *
+ */
+static void
+_base_restore_msix_table(struct MPT2SAS_ADAPTER *ioc)
+{
+	int i;
+
+	if (!ioc->msix_enable || ioc->msix_table_backup == NULL)
+		return;
+
+	for (i = 0; i < ioc->msix_vector_count; i++)
+		ioc->msix_table[i] = ioc->msix_table_backup[i];
+}
+
+/**
+ * _base_check_enable_msix - checks MSIX capabable.
+ * @ioc: per adapter object
+ *
+ * Check to see if card is capable of MSIX, and set number
+ * of avaliable msix vectors
+ */
+static int
+_base_check_enable_msix(struct MPT2SAS_ADAPTER *ioc)
+{
+	int base;
+	u16 message_control;
+	u32 msix_table_offset;
+
+	base = pci_find_capability(ioc->pdev, PCI_CAP_ID_MSIX);
+	if (!base) {
+		dfailprintk(ioc, printk(MPT2SAS_INFO_FMT "msix not "
+		    "supported\n", ioc->name));
+		return -EINVAL;
+	}
+
+	/* get msix vector count */
+	pci_read_config_word(ioc->pdev, base + 2, &message_control);
+	ioc->msix_vector_count = (message_control & 0x3FF) + 1;
+
+	/* get msix table  */
+	pci_read_config_dword(ioc->pdev, base + 4, &msix_table_offset);
+	msix_table_offset &= 0xFFFFFFF8;
+	ioc->msix_table = (u32 *)((void *)ioc->chip + msix_table_offset);
+
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "msix is supported, "
+	    "vector_count(%d), table_offset(0x%08x), table(%p)\n", ioc->name,
+	    ioc->msix_vector_count, msix_table_offset, ioc->msix_table));
+	return 0;
+}
+
+/**
+ * _base_disable_msix - disables msix
+ * @ioc: per adapter object
+ *
+ */
+static void
+_base_disable_msix(struct MPT2SAS_ADAPTER *ioc)
+{
+	if (ioc->msix_enable) {
+		pci_disable_msix(ioc->pdev);
+		kfree(ioc->msix_table_backup);
+		ioc->msix_table_backup = NULL;
+		ioc->msix_enable = 0;
+	}
+}
+
+/**
+ * _base_enable_msix - enables msix, failback to io_apic
+ * @ioc: per adapter object
+ *
+ */
+static int
+_base_enable_msix(struct MPT2SAS_ADAPTER *ioc)
+{
+	struct msix_entry entries;
+	int r;
+	u8 try_msix = 0;
+
+	if (msix_disable == -1 || msix_disable == 0)
+		try_msix = 1;
+
+	if (!try_msix)
+		goto try_ioapic;
+
+	if (_base_check_enable_msix(ioc) != 0)
+		goto try_ioapic;
+
+	ioc->msix_table_backup = kcalloc(ioc->msix_vector_count,
+	    sizeof(u32), GFP_KERNEL);
+	if (!ioc->msix_table_backup) {
+		dfailprintk(ioc, printk(MPT2SAS_INFO_FMT "allocation for "
+		    "msix_table_backup failed!!!\n", ioc->name));
+		goto try_ioapic;
+	}
+
+	memset(&entries, 0, sizeof(struct msix_entry));
+	r = pci_enable_msix(ioc->pdev, &entries, 1);
+	if (r) {
+		dfailprintk(ioc, printk(MPT2SAS_INFO_FMT "pci_enable_msix "
+		    "failed (r=%d) !!!\n", ioc->name, r));
+		goto try_ioapic;
+	}
+
+	r = request_irq(entries.vector, _base_interrupt, IRQF_SHARED,
+	    ioc->name, ioc);
+	if (r) {
+		dfailprintk(ioc, printk(MPT2SAS_INFO_FMT "unable to allocate "
+		    "interrupt %d !!!\n", ioc->name, entries.vector));
+		pci_disable_msix(ioc->pdev);
+		goto try_ioapic;
+	}
+
+	ioc->pci_irq = entries.vector;
+	ioc->msix_enable = 1;
+	return 0;
+
+/* failback to io_apic interrupt routing */
+ try_ioapic:
+
+	r = request_irq(ioc->pdev->irq, _base_interrupt, IRQF_SHARED,
+	    ioc->name, ioc);
+	if (r) {
+		printk(MPT2SAS_ERR_FMT "unable to allocate interrupt %d!\n",
+		    ioc->name, ioc->pdev->irq);
+		r = -EBUSY;
+		goto out_fail;
+	}
+
+	ioc->pci_irq = ioc->pdev->irq;
+	return 0;
+
+ out_fail:
+	return r;
+}
+
+/**
+ * mpt2sas_base_map_resources - map in controller resources (io/irq/memap)
+ * @ioc: per adapter object
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_base_map_resources(struct MPT2SAS_ADAPTER *ioc)
+{
+	struct pci_dev *pdev = ioc->pdev;
+	u32 memap_sz;
+	u32 pio_sz;
+	int i, r = 0;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n",
+	    ioc->name, __func__));
+
+	ioc->bars = pci_select_bars(pdev, IORESOURCE_MEM);
+	if (pci_enable_device_mem(pdev)) {
+		printk(MPT2SAS_WARN_FMT "pci_enable_device_mem: "
+		    "failed\n", ioc->name);
+		return -ENODEV;
+	}
+
+
+	if (pci_request_selected_regions(pdev, ioc->bars,
+	    MPT2SAS_DRIVER_NAME)) {
+		printk(MPT2SAS_WARN_FMT "pci_request_selected_regions: "
+		    "failed\n", ioc->name);
+		r = -ENODEV;
+		goto out_fail;
+	}
+
+	pci_set_master(pdev);
+
+	if (_base_config_dma_addressing(ioc, pdev) != 0) {
+		printk(MPT2SAS_WARN_FMT "no suitable DMA mask for %s\n",
+		    ioc->name, pci_name(pdev));
+		r = -ENODEV;
+		goto out_fail;
+	}
+
+	for (i = 0, memap_sz = 0, pio_sz = 0 ; i < DEVICE_COUNT_RESOURCE; i++) {
+		if (pci_resource_flags(pdev, i) & PCI_BASE_ADDRESS_SPACE_IO) {
+			if (pio_sz)
+				continue;
+			ioc->pio_chip = pci_resource_start(pdev, i);
+			pio_sz = pci_resource_len(pdev, i);
+		} else {
+			if (memap_sz)
+				continue;
+			ioc->chip_phys = pci_resource_start(pdev, i);
+			memap_sz = pci_resource_len(pdev, i);
+			ioc->chip = ioremap(ioc->chip_phys, memap_sz);
+			if (ioc->chip == NULL) {
+				printk(MPT2SAS_ERR_FMT "unable to map adapter "
+				    "memory!\n", ioc->name);
+				r = -EINVAL;
+				goto out_fail;
+			}
+		}
+	}
+
+	pci_set_drvdata(pdev, ioc->shost);
+	_base_mask_interrupts(ioc);
+	r = _base_enable_msix(ioc);
+	if (r)
+		goto out_fail;
+
+	printk(MPT2SAS_INFO_FMT "%s: IRQ %d\n",
+	    ioc->name,  ((ioc->msix_enable) ? "PCI-MSI-X enabled" :
+	    "IO-APIC enabled"), ioc->pci_irq);
+	printk(MPT2SAS_INFO_FMT "iomem(0x%lx), mapped(0x%p), size(%d)\n",
+	    ioc->name, ioc->chip_phys, ioc->chip, memap_sz);
+	printk(MPT2SAS_INFO_FMT "ioport(0x%lx), size(%d)\n",
+	    ioc->name, ioc->pio_chip, pio_sz);
+
+	return 0;
+
+ out_fail:
+	if (ioc->chip_phys)
+		iounmap(ioc->chip);
+	ioc->chip_phys = 0;
+	ioc->pci_irq = -1;
+	pci_release_selected_regions(ioc->pdev, ioc->bars);
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+	return r;
+}
+
+/**
+ * mpt2sas_base_get_msg_frame_dma - obtain request mf pointer phys addr
+ * @ioc: per adapter object
+ * @smid: system request message index(smid zero is invalid)
+ *
+ * Returns phys pointer to message frame.
+ */
+dma_addr_t
+mpt2sas_base_get_msg_frame_dma(struct MPT2SAS_ADAPTER *ioc, u16 smid)
+{
+	return ioc->request_dma + (smid * ioc->request_sz);
+}
+
+/**
+ * mpt2sas_base_get_msg_frame - obtain request mf pointer
+ * @ioc: per adapter object
+ * @smid: system request message index(smid zero is invalid)
+ *
+ * Returns virt pointer to message frame.
+ */
+void *
+mpt2sas_base_get_msg_frame(struct MPT2SAS_ADAPTER *ioc, u16 smid)
+{
+	return (void *)(ioc->request + (smid * ioc->request_sz));
+}
+
+/**
+ * mpt2sas_base_get_sense_buffer - obtain a sense buffer assigned to a mf request
+ * @ioc: per adapter object
+ * @smid: system request message index
+ *
+ * Returns virt pointer to sense buffer.
+ */
+void *
+mpt2sas_base_get_sense_buffer(struct MPT2SAS_ADAPTER *ioc, u16 smid)
+{
+	return (void *)(ioc->sense + ((smid - 1) * SCSI_SENSE_BUFFERSIZE));
+}
+
+/**
+ * mpt2sas_base_get_sense_buffer_dma - obtain a sense buffer assigned to a mf request
+ * @ioc: per adapter object
+ * @smid: system request message index
+ *
+ * Returns phys pointer to sense buffer.
+ */
+dma_addr_t
+mpt2sas_base_get_sense_buffer_dma(struct MPT2SAS_ADAPTER *ioc, u16 smid)
+{
+	return ioc->sense_dma + ((smid - 1) * SCSI_SENSE_BUFFERSIZE);
+}
+
+/**
+ * mpt2sas_base_get_reply_virt_addr - obtain reply frames virt address
+ * @ioc: per adapter object
+ * @phys_addr: lower 32 physical addr of the reply
+ *
+ * Converts 32bit lower physical addr into a virt address.
+ */
+void *
+mpt2sas_base_get_reply_virt_addr(struct MPT2SAS_ADAPTER *ioc, u32 phys_addr)
+{
+	if (!phys_addr)
+		return NULL;
+	return ioc->reply + (phys_addr - (u32)ioc->reply_dma);
+}
+
+/**
+ * mpt2sas_base_get_smid - obtain a free smid
+ * @ioc: per adapter object
+ * @cb_idx: callback index
+ *
+ * Returns smid (zero is invalid)
+ */
+u16
+mpt2sas_base_get_smid(struct MPT2SAS_ADAPTER *ioc, u8 cb_idx)
+{
+	unsigned long flags;
+	struct request_tracker *request;
+	u16 smid;
+
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	if (list_empty(&ioc->free_list)) {
+		spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+		printk(MPT2SAS_ERR_FMT "%s: smid not available\n",
+		    ioc->name, __func__);
+		return 0;
+	}
+
+	request = list_entry(ioc->free_list.next,
+	    struct request_tracker, tracker_list);
+	request->cb_idx = cb_idx;
+	smid = request->smid;
+	list_del(&request->tracker_list);
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+	return smid;
+}
+
+
+/**
+ * mpt2sas_base_free_smid - put smid back on free_list
+ * @ioc: per adapter object
+ * @smid: system request message index
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_free_smid(struct MPT2SAS_ADAPTER *ioc, u16 smid)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	ioc->scsi_lookup[smid - 1].cb_idx = 0xFF;
+	list_add_tail(&ioc->scsi_lookup[smid - 1].tracker_list,
+	    &ioc->free_list);
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+
+	/*
+	 * See _wait_for_commands_to_complete() call with regards to this code.
+	 */
+	if (ioc->shost_recovery && ioc->pending_io_count) {
+		if (ioc->pending_io_count == 1)
+			wake_up(&ioc->reset_wq);
+		ioc->pending_io_count--;
+	}
+}
+
+/**
+ * _base_writeq - 64 bit write to MMIO
+ * @ioc: per adapter object
+ * @b: data payload
+ * @addr: address in MMIO space
+ * @writeq_lock: spin lock
+ *
+ * Glue for handling an atomic 64 bit word to MMIO. This special handling takes
+ * care of 32 bit environment where its not quarenteed to send the entire word
+ * in one transfer.
+ */
+#ifndef writeq
+static inline void _base_writeq(__u64 b, volatile void __iomem *addr,
+    spinlock_t *writeq_lock)
+{
+	unsigned long flags;
+	__u64 data_out = cpu_to_le64(b);
+
+	spin_lock_irqsave(writeq_lock, flags);
+	writel((u32)(data_out), addr);
+	writel((u32)(data_out >> 32), (addr + 4));
+	spin_unlock_irqrestore(writeq_lock, flags);
+}
+#else
+static inline void _base_writeq(__u64 b, volatile void __iomem *addr,
+    spinlock_t *writeq_lock)
+{
+	writeq(cpu_to_le64(b), addr);
+}
+#endif
+
+/**
+ * mpt2sas_base_put_smid_scsi_io - send SCSI_IO request to firmware
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @vf_id: virtual function id
+ * @handle: device handle
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_put_smid_scsi_io(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 vf_id,
+    u16 handle)
+{
+	Mpi2RequestDescriptorUnion_t descriptor;
+	u64 *request = (u64 *)&descriptor;
+
+
+	descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
+	descriptor.SCSIIO.VF_ID = vf_id;
+	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
+	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
+	descriptor.SCSIIO.LMID = 0;
+	_base_writeq(*request, &ioc->chip->RequestDescriptorPostLow,
+	    &ioc->scsi_lookup_lock);
+}
+
+
+/**
+ * mpt2sas_base_put_smid_hi_priority - send Task Managment request to firmware
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @vf_id: virtual function id
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_put_smid_hi_priority(struct MPT2SAS_ADAPTER *ioc, u16 smid,
+    u8 vf_id)
+{
+	Mpi2RequestDescriptorUnion_t descriptor;
+	u64 *request = (u64 *)&descriptor;
+
+	descriptor.HighPriority.RequestFlags =
+	    MPI2_REQ_DESCRIPT_FLAGS_HIGH_PRIORITY;
+	descriptor.HighPriority.VF_ID = vf_id;
+	descriptor.HighPriority.SMID = cpu_to_le16(smid);
+	descriptor.HighPriority.LMID = 0;
+	descriptor.HighPriority.Reserved1 = 0;
+	_base_writeq(*request, &ioc->chip->RequestDescriptorPostLow,
+	    &ioc->scsi_lookup_lock);
+}
+
+/**
+ * mpt2sas_base_put_smid_default - Default, primarily used for config pages
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @vf_id: virtual function id
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_put_smid_default(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 vf_id)
+{
+	Mpi2RequestDescriptorUnion_t descriptor;
+	u64 *request = (u64 *)&descriptor;
+
+	descriptor.Default.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_DEFAULT_TYPE;
+	descriptor.Default.VF_ID = vf_id;
+	descriptor.Default.SMID = cpu_to_le16(smid);
+	descriptor.Default.LMID = 0;
+	descriptor.Default.DescriptorTypeDependent = 0;
+	_base_writeq(*request, &ioc->chip->RequestDescriptorPostLow,
+	    &ioc->scsi_lookup_lock);
+}
+
+/**
+ * mpt2sas_base_put_smid_target_assist - send Target Assist/Status to firmware
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @vf_id: virtual function id
+ * @io_index: value used to track the IO
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_put_smid_target_assist(struct MPT2SAS_ADAPTER *ioc, u16 smid,
+    u8 vf_id, u16 io_index)
+{
+	Mpi2RequestDescriptorUnion_t descriptor;
+	u64 *request = (u64 *)&descriptor;
+
+	descriptor.SCSITarget.RequestFlags =
+	    MPI2_REQ_DESCRIPT_FLAGS_SCSI_TARGET;
+	descriptor.SCSITarget.VF_ID = vf_id;
+	descriptor.SCSITarget.SMID = cpu_to_le16(smid);
+	descriptor.SCSITarget.LMID = 0;
+	descriptor.SCSITarget.IoIndex = cpu_to_le16(io_index);
+	_base_writeq(*request, &ioc->chip->RequestDescriptorPostLow,
+	    &ioc->scsi_lookup_lock);
+}
+
+/**
+ * _base_display_ioc_capabilities - Disply IOC's capabilities.
+ * @ioc: per adapter object
+ *
+ * Return nothing.
+ */
+static void
+_base_display_ioc_capabilities(struct MPT2SAS_ADAPTER *ioc)
+{
+	int i = 0;
+	char desc[16];
+	u8 revision;
+	u32 iounit_pg1_flags;
+
+	pci_read_config_byte(ioc->pdev, PCI_CLASS_REVISION, &revision);
+	strncpy(desc, ioc->manu_pg0.ChipName, 16);
+	printk(MPT2SAS_INFO_FMT "%s: FWVersion(%02d.%02d.%02d.%02d), "
+	   "ChipRevision(0x%02x), BiosVersion(%02d.%02d.%02d.%02d)\n",
+	    ioc->name, desc,
+	   (ioc->facts.FWVersion.Word & 0xFF000000) >> 24,
+	   (ioc->facts.FWVersion.Word & 0x00FF0000) >> 16,
+	   (ioc->facts.FWVersion.Word & 0x0000FF00) >> 8,
+	   ioc->facts.FWVersion.Word & 0x000000FF,
+	   revision,
+	   (ioc->bios_pg3.BiosVersion & 0xFF000000) >> 24,
+	   (ioc->bios_pg3.BiosVersion & 0x00FF0000) >> 16,
+	   (ioc->bios_pg3.BiosVersion & 0x0000FF00) >> 8,
+	    ioc->bios_pg3.BiosVersion & 0x000000FF);
+
+	printk(MPT2SAS_INFO_FMT "Protocol=(", ioc->name);
+
+	if (ioc->facts.ProtocolFlags & MPI2_IOCFACTS_PROTOCOL_SCSI_INITIATOR) {
+		printk("Initiator");
+		i++;
+	}
+
+	if (ioc->facts.ProtocolFlags & MPI2_IOCFACTS_PROTOCOL_SCSI_TARGET) {
+		printk("%sTarget", i ? "," : "");
+		i++;
+	}
+
+	i = 0;
+	printk("), ");
+	printk("Capabilities=(");
+
+	if (ioc->facts.IOCCapabilities &
+	    MPI2_IOCFACTS_CAPABILITY_INTEGRATED_RAID) {
+		printk("Raid");
+		i++;
+	}
+
+	if (ioc->facts.IOCCapabilities & MPI2_IOCFACTS_CAPABILITY_TLR) {
+		printk("%sTLR", i ? "," : "");
+		i++;
+	}
+
+	if (ioc->facts.IOCCapabilities & MPI2_IOCFACTS_CAPABILITY_MULTICAST) {
+		printk("%sMulticast", i ? "," : "");
+		i++;
+	}
+
+	if (ioc->facts.IOCCapabilities &
+	    MPI2_IOCFACTS_CAPABILITY_BIDIRECTIONAL_TARGET) {
+		printk("%sBIDI Target", i ? "," : "");
+		i++;
+	}
+
+	if (ioc->facts.IOCCapabilities & MPI2_IOCFACTS_CAPABILITY_EEDP) {
+		printk("%sEEDP", i ? "," : "");
+		i++;
+	}
+
+	if (ioc->facts.IOCCapabilities &
+	    MPI2_IOCFACTS_CAPABILITY_SNAPSHOT_BUFFER) {
+		printk("%sSnapshot Buffer", i ? "," : "");
+		i++;
+	}
+
+	if (ioc->facts.IOCCapabilities &
+	    MPI2_IOCFACTS_CAPABILITY_DIAG_TRACE_BUFFER) {
+		printk("%sDiag Trace Buffer", i ? "," : "");
+		i++;
+	}
+
+	if (ioc->facts.IOCCapabilities &
+	    MPI2_IOCFACTS_CAPABILITY_TASK_SET_FULL_HANDLING) {
+		printk("%sTask Set Full", i ? "," : "");
+		i++;
+	}
+
+	iounit_pg1_flags = le32_to_cpu(ioc->iounit_pg1.Flags);
+	if (!(iounit_pg1_flags & MPI2_IOUNITPAGE1_NATIVE_COMMAND_Q_DISABLE)) {
+		printk("%sNCQ", i ? "," : "");
+		i++;
+	}
+
+	printk(")\n");
+}
+
+/**
+ * _base_static_config_pages - static start of day config pages
+ * @ioc: per adapter object
+ *
+ * Return nothing.
+ */
+static void
+_base_static_config_pages(struct MPT2SAS_ADAPTER *ioc)
+{
+	Mpi2ConfigReply_t mpi_reply;
+	u32 iounit_pg1_flags;
+
+	mpt2sas_config_get_manufacturing_pg0(ioc, &mpi_reply, &ioc->manu_pg0);
+	mpt2sas_config_get_bios_pg2(ioc, &mpi_reply, &ioc->bios_pg2);
+	mpt2sas_config_get_bios_pg3(ioc, &mpi_reply, &ioc->bios_pg3);
+	mpt2sas_config_get_ioc_pg8(ioc, &mpi_reply, &ioc->ioc_pg8);
+	mpt2sas_config_get_iounit_pg0(ioc, &mpi_reply, &ioc->iounit_pg0);
+	mpt2sas_config_get_iounit_pg1(ioc, &mpi_reply, &ioc->iounit_pg1);
+	_base_display_ioc_capabilities(ioc);
+
+	/*
+	 * Enable task_set_full handling in iounit_pg1 when the
+	 * facts capabilities indicate that its supported.
+	 */
+	iounit_pg1_flags = le32_to_cpu(ioc->iounit_pg1.Flags);
+	if ((ioc->facts.IOCCapabilities &
+	    MPI2_IOCFACTS_CAPABILITY_TASK_SET_FULL_HANDLING))
+		iounit_pg1_flags &=
+		    ~MPI2_IOUNITPAGE1_DISABLE_TASK_SET_FULL_HANDLING;
+	else
+		iounit_pg1_flags |=
+		    MPI2_IOUNITPAGE1_DISABLE_TASK_SET_FULL_HANDLING;
+	ioc->iounit_pg1.Flags = cpu_to_le32(iounit_pg1_flags);
+	mpt2sas_config_set_iounit_pg1(ioc, &mpi_reply, ioc->iounit_pg1);
+}
+
+/**
+ * _base_release_memory_pools - release memory
+ * @ioc: per adapter object
+ *
+ * Free memory allocated from _base_allocate_memory_pools.
+ *
+ * Return nothing.
+ */
+static void
+_base_release_memory_pools(struct MPT2SAS_ADAPTER *ioc)
+{
+	dexitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	if (ioc->request) {
+		pci_free_consistent(ioc->pdev, ioc->request_dma_sz,
+		    ioc->request,  ioc->request_dma);
+		dexitprintk(ioc, printk(MPT2SAS_INFO_FMT "request_pool(0x%p)"
+		    ": free\n", ioc->name, ioc->request));
+		ioc->request = NULL;
+	}
+
+	if (ioc->sense) {
+		pci_pool_free(ioc->sense_dma_pool, ioc->sense, ioc->sense_dma);
+		if (ioc->sense_dma_pool)
+			pci_pool_destroy(ioc->sense_dma_pool);
+		dexitprintk(ioc, printk(MPT2SAS_INFO_FMT "sense_pool(0x%p)"
+		    ": free\n", ioc->name, ioc->sense));
+		ioc->sense = NULL;
+	}
+
+	if (ioc->reply) {
+		pci_pool_free(ioc->reply_dma_pool, ioc->reply, ioc->reply_dma);
+		if (ioc->reply_dma_pool)
+			pci_pool_destroy(ioc->reply_dma_pool);
+		dexitprintk(ioc, printk(MPT2SAS_INFO_FMT "reply_pool(0x%p)"
+		     ": free\n", ioc->name, ioc->reply));
+		ioc->reply = NULL;
+	}
+
+	if (ioc->reply_free) {
+		pci_pool_free(ioc->reply_free_dma_pool, ioc->reply_free,
+		    ioc->reply_free_dma);
+		if (ioc->reply_free_dma_pool)
+			pci_pool_destroy(ioc->reply_free_dma_pool);
+		dexitprintk(ioc, printk(MPT2SAS_INFO_FMT "reply_free_pool"
+		    "(0x%p): free\n", ioc->name, ioc->reply_free));
+		ioc->reply_free = NULL;
+	}
+
+	if (ioc->reply_post_free) {
+		pci_pool_free(ioc->reply_post_free_dma_pool,
+		    ioc->reply_post_free, ioc->reply_post_free_dma);
+		if (ioc->reply_post_free_dma_pool)
+			pci_pool_destroy(ioc->reply_post_free_dma_pool);
+		dexitprintk(ioc, printk(MPT2SAS_INFO_FMT
+		    "reply_post_free_pool(0x%p): free\n", ioc->name,
+		    ioc->reply_post_free));
+		ioc->reply_post_free = NULL;
+	}
+
+	if (ioc->config_page) {
+		dexitprintk(ioc, printk(MPT2SAS_INFO_FMT
+		    "config_page(0x%p): free\n", ioc->name,
+		    ioc->config_page));
+		pci_free_consistent(ioc->pdev, ioc->config_page_sz,
+		    ioc->config_page, ioc->config_page_dma);
+	}
+
+	kfree(ioc->scsi_lookup);
+}
+
+
+/**
+ * _base_allocate_memory_pools - allocate start of day memory pools
+ * @ioc: per adapter object
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 success, anything else error
+ */
+static int
+_base_allocate_memory_pools(struct MPT2SAS_ADAPTER *ioc,  int sleep_flag)
+{
+	Mpi2IOCFactsReply_t *facts;
+	u32 queue_size, queue_diff;
+	u16 max_sge_elements;
+	u16 num_of_reply_frames;
+	u16 chains_needed_per_io;
+	u32 sz, total_sz;
+	u16 i;
+	u32 retry_sz;
+	u16 max_request_credit;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	retry_sz = 0;
+	facts = &ioc->facts;
+
+	/* command line tunables  for max sgl entries */
+	if (max_sgl_entries != -1) {
+		ioc->shost->sg_tablesize = (max_sgl_entries <
+		    MPT2SAS_SG_DEPTH) ? max_sgl_entries :
+		    MPT2SAS_SG_DEPTH;
+	} else {
+		ioc->shost->sg_tablesize = MPT2SAS_SG_DEPTH;
+	}
+
+	/* command line tunables  for max controller queue depth */
+	if (max_queue_depth != -1) {
+		max_request_credit = (max_queue_depth < facts->RequestCredit)
+		    ? max_queue_depth : facts->RequestCredit;
+	} else {
+		max_request_credit = (facts->RequestCredit >
+		    MPT2SAS_MAX_REQUEST_QUEUE) ? MPT2SAS_MAX_REQUEST_QUEUE :
+		    facts->RequestCredit;
+	}
+	ioc->request_depth = max_request_credit;
+
+	/* request frame size */
+	ioc->request_sz = facts->IOCRequestFrameSize * 4;
+
+	/* reply frame size */
+	ioc->reply_sz = facts->ReplyFrameSize * 4;
+
+ retry_allocation:
+	total_sz = 0;
+	/* calculate number of sg elements left over in the 1st frame */
+	max_sge_elements = ioc->request_sz - ((sizeof(Mpi2SCSIIORequest_t) -
+	    sizeof(Mpi2SGEIOUnion_t)) + ioc->sge_size);
+	ioc->max_sges_in_main_message = max_sge_elements/ioc->sge_size;
+
+	/* now do the same for a chain buffer */
+	max_sge_elements = ioc->request_sz - ioc->sge_size;
+	ioc->max_sges_in_chain_message = max_sge_elements/ioc->sge_size;
+
+	ioc->chain_offset_value_for_main_message =
+	    ((sizeof(Mpi2SCSIIORequest_t) - sizeof(Mpi2SGEIOUnion_t)) +
+	     (ioc->max_sges_in_chain_message * ioc->sge_size)) / 4;
+
+	/*
+	 *  MPT2SAS_SG_DEPTH = CONFIG_FUSION_MAX_SGE
+	 */
+	chains_needed_per_io = ((ioc->shost->sg_tablesize -
+	   ioc->max_sges_in_main_message)/ioc->max_sges_in_chain_message)
+	    + 1;
+	if (chains_needed_per_io > facts->MaxChainDepth) {
+		chains_needed_per_io = facts->MaxChainDepth;
+		ioc->shost->sg_tablesize = min_t(u16,
+		ioc->max_sges_in_main_message + (ioc->max_sges_in_chain_message
+		* chains_needed_per_io), ioc->shost->sg_tablesize);
+	}
+	ioc->chains_needed_per_io = chains_needed_per_io;
+
+	/* reply free queue sizing - taking into account for events */
+	num_of_reply_frames = ioc->request_depth + 32;
+
+	/* number of replies frames can't be a multiple of 16 */
+	/* decrease number of reply frames by 1 */
+	if (!(num_of_reply_frames % 16))
+		num_of_reply_frames--;
+
+	/* calculate number of reply free queue entries
+	 *  (must be multiple of 16)
+	 */
+
+	/* (we know reply_free_queue_depth is not a multiple of 16) */
+	queue_size = num_of_reply_frames;
+	queue_size += 16 - (queue_size % 16);
+	ioc->reply_free_queue_depth = queue_size;
+
+	/* reply descriptor post queue sizing */
+	/* this size should be the number of request frames + number of reply
+	 * frames
+	 */
+
+	queue_size = ioc->request_depth + num_of_reply_frames + 1;
+	/* round up to 16 byte boundary */
+	if (queue_size % 16)
+		queue_size += 16 - (queue_size % 16);
+
+	/* check against IOC maximum reply post queue depth */
+	if (queue_size > facts->MaxReplyDescriptorPostQueueDepth) {
+		queue_diff = queue_size -
+		    facts->MaxReplyDescriptorPostQueueDepth;
+
+		/* round queue_diff up to multiple of 16 */
+		if (queue_diff % 16)
+			queue_diff += 16 - (queue_diff % 16);
+
+		/* adjust request_depth, reply_free_queue_depth,
+		 * and queue_size
+		 */
+		ioc->request_depth -= queue_diff;
+		ioc->reply_free_queue_depth -= queue_diff;
+		queue_size -= queue_diff;
+	}
+	ioc->reply_post_queue_depth = queue_size;
+
+	/* max scsi host queue depth */
+	ioc->shost->can_queue = ioc->request_depth - INTERNAL_CMDS_COUNT;
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "scsi host queue: depth"
+	    "(%d)\n", ioc->name, ioc->shost->can_queue));
+
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "scatter gather: "
+	    "sge_in_main_msg(%d), sge_per_chain(%d), sge_per_io(%d), "
+	    "chains_per_io(%d)\n", ioc->name, ioc->max_sges_in_main_message,
+	    ioc->max_sges_in_chain_message, ioc->shost->sg_tablesize,
+	    ioc->chains_needed_per_io));
+
+	/* contiguous pool for request and chains, 16 byte align, one extra "
+	 * "frame for smid=0
+	 */
+	ioc->chain_depth = ioc->chains_needed_per_io * ioc->request_depth;
+	sz = ((ioc->request_depth + 1 + ioc->chain_depth) * ioc->request_sz);
+
+	ioc->request_dma_sz = sz;
+	ioc->request = pci_alloc_consistent(ioc->pdev, sz, &ioc->request_dma);
+	if (!ioc->request) {
+		printk(MPT2SAS_ERR_FMT "request pool: pci_alloc_consistent "
+		    "failed: req_depth(%d), chains_per_io(%d), frame_sz(%d), "
+		    "total(%d kB)\n", ioc->name, ioc->request_depth,
+		    ioc->chains_needed_per_io, ioc->request_sz, sz/1024);
+		if (ioc->request_depth < MPT2SAS_SAS_QUEUE_DEPTH)
+			goto out;
+		retry_sz += 64;
+		ioc->request_depth = max_request_credit - retry_sz;
+		goto retry_allocation;
+	}
+
+	if (retry_sz)
+		printk(MPT2SAS_ERR_FMT "request pool: pci_alloc_consistent "
+		    "succeed: req_depth(%d), chains_per_io(%d), frame_sz(%d), "
+		    "total(%d kb)\n", ioc->name, ioc->request_depth,
+		    ioc->chains_needed_per_io, ioc->request_sz, sz/1024);
+
+	ioc->chain = ioc->request + ((ioc->request_depth + 1) *
+	    ioc->request_sz);
+	ioc->chain_dma = ioc->request_dma + ((ioc->request_depth + 1) *
+	    ioc->request_sz);
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "request pool(0x%p): "
+	    "depth(%d), frame_size(%d), pool_size(%d kB)\n", ioc->name,
+	    ioc->request, ioc->request_depth, ioc->request_sz,
+	    ((ioc->request_depth + 1) * ioc->request_sz)/1024));
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "chain pool(0x%p): depth"
+	    "(%d), frame_size(%d), pool_size(%d kB)\n", ioc->name, ioc->chain,
+	    ioc->chain_depth, ioc->request_sz, ((ioc->chain_depth *
+	    ioc->request_sz))/1024));
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "request pool: dma(0x%llx)\n",
+	    ioc->name, (unsigned long long) ioc->request_dma));
+	total_sz += sz;
+
+	ioc->scsi_lookup = kcalloc(ioc->request_depth,
+	    sizeof(struct request_tracker), GFP_KERNEL);
+	if (!ioc->scsi_lookup) {
+		printk(MPT2SAS_ERR_FMT "scsi_lookup: kcalloc failed\n",
+		    ioc->name);
+		goto out;
+	}
+
+	 /* initialize some bits */
+	for (i = 0; i < ioc->request_depth; i++)
+		ioc->scsi_lookup[i].smid = i + 1;
+
+	/* sense buffers, 4 byte align */
+	sz = ioc->request_depth * SCSI_SENSE_BUFFERSIZE;
+	ioc->sense_dma_pool = pci_pool_create("sense pool", ioc->pdev, sz, 4,
+	    0);
+	if (!ioc->sense_dma_pool) {
+		printk(MPT2SAS_ERR_FMT "sense pool: pci_pool_create failed\n",
+		    ioc->name);
+		goto out;
+	}
+	ioc->sense = pci_pool_alloc(ioc->sense_dma_pool , GFP_KERNEL,
+	    &ioc->sense_dma);
+	if (!ioc->sense) {
+		printk(MPT2SAS_ERR_FMT "sense pool: pci_pool_alloc failed\n",
+		    ioc->name);
+		goto out;
+	}
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT
+	    "sense pool(0x%p): depth(%d), element_size(%d), pool_size"
+	    "(%d kB)\n", ioc->name, ioc->sense, ioc->request_depth,
+	    SCSI_SENSE_BUFFERSIZE, sz/1024));
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "sense_dma(0x%llx)\n",
+	    ioc->name, (unsigned long long)ioc->sense_dma));
+	total_sz += sz;
+
+	/* reply pool, 4 byte align */
+	sz = ioc->reply_free_queue_depth * ioc->reply_sz;
+	ioc->reply_dma_pool = pci_pool_create("reply pool", ioc->pdev, sz, 4,
+	    0);
+	if (!ioc->reply_dma_pool) {
+		printk(MPT2SAS_ERR_FMT "reply pool: pci_pool_create failed\n",
+		    ioc->name);
+		goto out;
+	}
+	ioc->reply = pci_pool_alloc(ioc->reply_dma_pool , GFP_KERNEL,
+	    &ioc->reply_dma);
+	if (!ioc->reply) {
+		printk(MPT2SAS_ERR_FMT "reply pool: pci_pool_alloc failed\n",
+		    ioc->name);
+		goto out;
+	}
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "reply pool(0x%p): depth"
+	    "(%d), frame_size(%d), pool_size(%d kB)\n", ioc->name, ioc->reply,
+	    ioc->reply_free_queue_depth, ioc->reply_sz, sz/1024));
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "reply_dma(0x%llx)\n",
+	    ioc->name, (unsigned long long)ioc->reply_dma));
+	total_sz += sz;
+
+	/* reply free queue, 16 byte align */
+	sz = ioc->reply_free_queue_depth * 4;
+	ioc->reply_free_dma_pool = pci_pool_create("reply_free pool",
+	    ioc->pdev, sz, 16, 0);
+	if (!ioc->reply_free_dma_pool) {
+		printk(MPT2SAS_ERR_FMT "reply_free pool: pci_pool_create "
+		    "failed\n", ioc->name);
+		goto out;
+	}
+	ioc->reply_free = pci_pool_alloc(ioc->reply_free_dma_pool , GFP_KERNEL,
+	    &ioc->reply_free_dma);
+	if (!ioc->reply_free) {
+		printk(MPT2SAS_ERR_FMT "reply_free pool: pci_pool_alloc "
+		    "failed\n", ioc->name);
+		goto out;
+	}
+	memset(ioc->reply_free, 0, sz);
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "reply_free pool(0x%p): "
+	    "depth(%d), element_size(%d), pool_size(%d kB)\n", ioc->name,
+	    ioc->reply_free, ioc->reply_free_queue_depth, 4, sz/1024));
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "reply_free_dma"
+	    "(0x%llx)\n", ioc->name, (unsigned long long)ioc->reply_free_dma));
+	total_sz += sz;
+
+	/* reply post queue, 16 byte align */
+	sz = ioc->reply_post_queue_depth * sizeof(Mpi2DefaultReplyDescriptor_t);
+	ioc->reply_post_free_dma_pool = pci_pool_create("reply_post_free pool",
+	    ioc->pdev, sz, 16, 0);
+	if (!ioc->reply_post_free_dma_pool) {
+		printk(MPT2SAS_ERR_FMT "reply_post_free pool: pci_pool_create "
+		    "failed\n", ioc->name);
+		goto out;
+	}
+	ioc->reply_post_free = pci_pool_alloc(ioc->reply_post_free_dma_pool ,
+	    GFP_KERNEL, &ioc->reply_post_free_dma);
+	if (!ioc->reply_post_free) {
+		printk(MPT2SAS_ERR_FMT "reply_post_free pool: pci_pool_alloc "
+		    "failed\n", ioc->name);
+		goto out;
+	}
+	memset(ioc->reply_post_free, 0, sz);
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "reply post free pool"
+	    "(0x%p): depth(%d), element_size(%d), pool_size(%d kB)\n",
+	    ioc->name, ioc->reply_post_free, ioc->reply_post_queue_depth, 8,
+	    sz/1024));
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "reply_post_free_dma = "
+	    "(0x%llx)\n", ioc->name, (unsigned long long)
+	    ioc->reply_post_free_dma));
+	total_sz += sz;
+
+	ioc->config_page_sz = 512;
+	ioc->config_page = pci_alloc_consistent(ioc->pdev,
+	    ioc->config_page_sz, &ioc->config_page_dma);
+	if (!ioc->config_page) {
+		printk(MPT2SAS_ERR_FMT "config page: pci_pool_alloc "
+		    "failed\n", ioc->name);
+		goto out;
+	}
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "config page(0x%p): size"
+	    "(%d)\n", ioc->name, ioc->config_page, ioc->config_page_sz));
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "config_page_dma"
+	    "(0x%llx)\n", ioc->name, (unsigned long long)ioc->config_page_dma));
+	total_sz += ioc->config_page_sz;
+
+	printk(MPT2SAS_INFO_FMT "Allocated physical memory: size(%d kB)\n",
+	    ioc->name, total_sz/1024);
+	printk(MPT2SAS_INFO_FMT "Current Controller Queue Depth(%d), "
+	    "Max Controller Queue Depth(%d)\n",
+	    ioc->name, ioc->shost->can_queue, facts->RequestCredit);
+	printk(MPT2SAS_INFO_FMT "Scatter Gather Elements per IO(%d)\n",
+	    ioc->name, ioc->shost->sg_tablesize);
+	return 0;
+
+ out:
+	_base_release_memory_pools(ioc);
+	return -ENOMEM;
+}
+
+
+/**
+ * mpt2sas_base_get_iocstate - Get the current state of a MPT adapter.
+ * @ioc: Pointer to MPT_ADAPTER structure
+ * @cooked: Request raw or cooked IOC state
+ *
+ * Returns all IOC Doorbell register bits if cooked==0, else just the
+ * Doorbell bits in MPI_IOC_STATE_MASK.
+ */
+u32
+mpt2sas_base_get_iocstate(struct MPT2SAS_ADAPTER *ioc, int cooked)
+{
+	u32 s, sc;
+
+	s = readl(&ioc->chip->Doorbell);
+	sc = s & MPI2_IOC_STATE_MASK;
+	return cooked ? sc : s;
+}
+
+/**
+ * _base_wait_on_iocstate - waiting on a particular ioc state
+ * @ioc_state: controller state { READY, OPERATIONAL, or RESET }
+ * @timeout: timeout in second
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_wait_on_iocstate(struct MPT2SAS_ADAPTER *ioc, u32 ioc_state, int timeout,
+    int sleep_flag)
+{
+	u32 count, cntdn;
+	u32 current_state;
+
+	count = 0;
+	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
+	do {
+		current_state = mpt2sas_base_get_iocstate(ioc, 1);
+		if (current_state == ioc_state)
+			return 0;
+		if (count && current_state == MPI2_IOC_STATE_FAULT)
+			break;
+		if (sleep_flag == CAN_SLEEP)
+			msleep(1);
+		else
+			udelay(500);
+		count++;
+	} while (--cntdn);
+
+	return current_state;
+}
+
+/**
+ * _base_wait_for_doorbell_int - waiting for controller interrupt(generated by
+ * a write to the doorbell)
+ * @ioc: per adapter object
+ * @timeout: timeout in second
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ *
+ * Notes: MPI2_HIS_IOC2SYS_DB_STATUS - set to one when IOC writes to doorbell.
+ */
+static int
+_base_wait_for_doorbell_int(struct MPT2SAS_ADAPTER *ioc, int timeout,
+    int sleep_flag)
+{
+	u32 cntdn, count;
+	u32 int_status;
+
+	count = 0;
+	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
+	do {
+		int_status = readl(&ioc->chip->HostInterruptStatus);
+		if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) {
+			dhsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+			    "successfull count(%d), timeout(%d)\n", ioc->name,
+			    __func__, count, timeout));
+			return 0;
+		}
+		if (sleep_flag == CAN_SLEEP)
+			msleep(1);
+		else
+			udelay(500);
+		count++;
+	} while (--cntdn);
+
+	printk(MPT2SAS_ERR_FMT "%s: failed due to timeout count(%d), "
+	    "int_status(%x)!\n", ioc->name, __func__, count, int_status);
+	return -EFAULT;
+}
+
+/**
+ * _base_wait_for_doorbell_ack - waiting for controller to read the doorbell.
+ * @ioc: per adapter object
+ * @timeout: timeout in second
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ *
+ * Notes: MPI2_HIS_SYS2IOC_DB_STATUS - set to one when host writes to
+ * doorbell.
+ */
+static int
+_base_wait_for_doorbell_ack(struct MPT2SAS_ADAPTER *ioc, int timeout,
+    int sleep_flag)
+{
+	u32 cntdn, count;
+	u32 int_status;
+	u32 doorbell;
+
+	count = 0;
+	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
+	do {
+		int_status = readl(&ioc->chip->HostInterruptStatus);
+		if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) {
+			dhsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+			    "successfull count(%d), timeout(%d)\n", ioc->name,
+			    __func__, count, timeout));
+			return 0;
+		} else if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) {
+			doorbell = readl(&ioc->chip->Doorbell);
+			if ((doorbell & MPI2_IOC_STATE_MASK) ==
+			    MPI2_IOC_STATE_FAULT) {
+				mpt2sas_base_fault_info(ioc , doorbell);
+				return -EFAULT;
+			}
+		} else if (int_status == 0xFFFFFFFF)
+			goto out;
+
+		if (sleep_flag == CAN_SLEEP)
+			msleep(1);
+		else
+			udelay(500);
+		count++;
+	} while (--cntdn);
+
+ out:
+	printk(MPT2SAS_ERR_FMT "%s: failed due to timeout count(%d), "
+	    "int_status(%x)!\n", ioc->name, __func__, count, int_status);
+	return -EFAULT;
+}
+
+/**
+ * _base_wait_for_doorbell_not_used - waiting for doorbell to not be in use
+ * @ioc: per adapter object
+ * @timeout: timeout in second
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ *
+ */
+static int
+_base_wait_for_doorbell_not_used(struct MPT2SAS_ADAPTER *ioc, int timeout,
+    int sleep_flag)
+{
+	u32 cntdn, count;
+	u32 doorbell_reg;
+
+	count = 0;
+	cntdn = (sleep_flag == CAN_SLEEP) ? 1000*timeout : 2000*timeout;
+	do {
+		doorbell_reg = readl(&ioc->chip->Doorbell);
+		if (!(doorbell_reg & MPI2_DOORBELL_USED)) {
+			dhsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+			    "successfull count(%d), timeout(%d)\n", ioc->name,
+			    __func__, count, timeout));
+			return 0;
+		}
+		if (sleep_flag == CAN_SLEEP)
+			msleep(1);
+		else
+			udelay(500);
+		count++;
+	} while (--cntdn);
+
+	printk(MPT2SAS_ERR_FMT "%s: failed due to timeout count(%d), "
+	    "doorbell_reg(%x)!\n", ioc->name, __func__, count, doorbell_reg);
+	return -EFAULT;
+}
+
+/**
+ * _base_send_ioc_reset - send doorbell reset
+ * @ioc: per adapter object
+ * @reset_type: currently only supports: MPI2_FUNCTION_IOC_MESSAGE_UNIT_RESET
+ * @timeout: timeout in second
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_send_ioc_reset(struct MPT2SAS_ADAPTER *ioc, u8 reset_type, int timeout,
+    int sleep_flag)
+{
+	u32 ioc_state;
+	int r = 0;
+
+	if (reset_type != MPI2_FUNCTION_IOC_MESSAGE_UNIT_RESET) {
+		printk(MPT2SAS_ERR_FMT "%s: unknown reset_type\n",
+		    ioc->name, __func__);
+		return -EFAULT;
+	}
+
+	if (!(ioc->facts.IOCCapabilities &
+	   MPI2_IOCFACTS_CAPABILITY_EVENT_REPLAY))
+		return -EFAULT;
+
+	printk(MPT2SAS_INFO_FMT "sending message unit reset !!\n", ioc->name);
+
+	writel(reset_type << MPI2_DOORBELL_FUNCTION_SHIFT,
+	    &ioc->chip->Doorbell);
+	if ((_base_wait_for_doorbell_ack(ioc, 15, sleep_flag))) {
+		r = -EFAULT;
+		goto out;
+	}
+	ioc_state = _base_wait_on_iocstate(ioc, MPI2_IOC_STATE_READY,
+	    timeout, sleep_flag);
+	if (ioc_state) {
+		printk(MPT2SAS_ERR_FMT "%s: failed going to ready state "
+		    " (ioc_state=0x%x)\n", ioc->name, __func__, ioc_state);
+		r = -EFAULT;
+		goto out;
+	}
+ out:
+	printk(MPT2SAS_INFO_FMT "message unit reset: %s\n",
+	    ioc->name, ((r == 0) ? "SUCCESS" : "FAILED"));
+	return r;
+}
+
+/**
+ * _base_handshake_req_reply_wait - send request thru doorbell interface
+ * @ioc: per adapter object
+ * @request_bytes: request length
+ * @request: pointer having request payload
+ * @reply_bytes: reply length
+ * @reply: pointer to reply payload
+ * @timeout: timeout in second
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_handshake_req_reply_wait(struct MPT2SAS_ADAPTER *ioc, int request_bytes,
+    u32 *request, int reply_bytes, u16 *reply, int timeout, int sleep_flag)
+{
+	MPI2DefaultReply_t *default_reply = (MPI2DefaultReply_t *)reply;
+	int i;
+	u8 failed;
+	u16 dummy;
+	u32 *mfp;
+
+	/* make sure doorbell is not in use */
+	if ((readl(&ioc->chip->Doorbell) & MPI2_DOORBELL_USED)) {
+		printk(MPT2SAS_ERR_FMT "doorbell is in use "
+		    " (line=%d)\n", ioc->name, __LINE__);
+		return -EFAULT;
+	}
+
+	/* clear pending doorbell interrupts from previous state changes */
+	if (readl(&ioc->chip->HostInterruptStatus) &
+	    MPI2_HIS_IOC2SYS_DB_STATUS)
+		writel(0, &ioc->chip->HostInterruptStatus);
+
+	/* send message to ioc */
+	writel(((MPI2_FUNCTION_HANDSHAKE<<MPI2_DOORBELL_FUNCTION_SHIFT) |
+	    ((request_bytes/4)<<MPI2_DOORBELL_ADD_DWORDS_SHIFT)),
+	    &ioc->chip->Doorbell);
+
+	if ((_base_wait_for_doorbell_int(ioc, 5, sleep_flag))) {
+		printk(MPT2SAS_ERR_FMT "doorbell handshake "
+		   "int failed (line=%d)\n", ioc->name, __LINE__);
+		return -EFAULT;
+	}
+	writel(0, &ioc->chip->HostInterruptStatus);
+
+	if ((_base_wait_for_doorbell_ack(ioc, 5, sleep_flag))) {
+		printk(MPT2SAS_ERR_FMT "doorbell handshake "
+		    "ack failed (line=%d)\n", ioc->name, __LINE__);
+		return -EFAULT;
+	}
+
+	/* send message 32-bits at a time */
+	for (i = 0, failed = 0; i < request_bytes/4 && !failed; i++) {
+		writel(cpu_to_le32(request[i]), &ioc->chip->Doorbell);
+		if ((_base_wait_for_doorbell_ack(ioc, 5, sleep_flag)))
+			failed = 1;
+	}
+
+	if (failed) {
+		printk(MPT2SAS_ERR_FMT "doorbell handshake "
+		    "sending request failed (line=%d)\n", ioc->name, __LINE__);
+		return -EFAULT;
+	}
+
+	/* now wait for the reply */
+	if ((_base_wait_for_doorbell_int(ioc, timeout, sleep_flag))) {
+		printk(MPT2SAS_ERR_FMT "doorbell handshake "
+		   "int failed (line=%d)\n", ioc->name, __LINE__);
+		return -EFAULT;
+	}
+
+	/* read the first two 16-bits, it gives the total length of the reply */
+	reply[0] = le16_to_cpu(readl(&ioc->chip->Doorbell)
+	    & MPI2_DOORBELL_DATA_MASK);
+	writel(0, &ioc->chip->HostInterruptStatus);
+	if ((_base_wait_for_doorbell_int(ioc, 5, sleep_flag))) {
+		printk(MPT2SAS_ERR_FMT "doorbell handshake "
+		   "int failed (line=%d)\n", ioc->name, __LINE__);
+		return -EFAULT;
+	}
+	reply[1] = le16_to_cpu(readl(&ioc->chip->Doorbell)
+	    & MPI2_DOORBELL_DATA_MASK);
+	writel(0, &ioc->chip->HostInterruptStatus);
+
+	for (i = 2; i < default_reply->MsgLength * 2; i++)  {
+		if ((_base_wait_for_doorbell_int(ioc, 5, sleep_flag))) {
+			printk(MPT2SAS_ERR_FMT "doorbell "
+			    "handshake int failed (line=%d)\n", ioc->name,
+			    __LINE__);
+			return -EFAULT;
+		}
+		if (i >=  reply_bytes/2) /* overflow case */
+			dummy = readl(&ioc->chip->Doorbell);
+		else
+			reply[i] = le16_to_cpu(readl(&ioc->chip->Doorbell)
+			    & MPI2_DOORBELL_DATA_MASK);
+		writel(0, &ioc->chip->HostInterruptStatus);
+	}
+
+	_base_wait_for_doorbell_int(ioc, 5, sleep_flag);
+	if (_base_wait_for_doorbell_not_used(ioc, 5, sleep_flag) != 0) {
+		dhsprintk(ioc, printk(MPT2SAS_INFO_FMT "doorbell is in use "
+		    " (line=%d)\n", ioc->name, __LINE__));
+	}
+	writel(0, &ioc->chip->HostInterruptStatus);
+
+	if (ioc->logging_level & MPT_DEBUG_INIT) {
+		mfp = (u32 *)reply;
+		printk(KERN_DEBUG "\toffset:data\n");
+		for (i = 0; i < reply_bytes/4; i++)
+			printk(KERN_DEBUG "\t[0x%02x]:%08x\n", i*4,
+			    le32_to_cpu(mfp[i]));
+	}
+	return 0;
+}
+
+/**
+ * mpt2sas_base_sas_iounit_control - send sas iounit control to FW
+ * @ioc: per adapter object
+ * @mpi_reply: the reply payload from FW
+ * @mpi_request: the request payload sent to FW
+ *
+ * The SAS IO Unit Control Request message allows the host to perform low-level
+ * operations, such as resets on the PHYs of the IO Unit, also allows the host
+ * to obtain the IOC assigned device handles for a device if it has other
+ * identifying information about the device, in addition allows the host to
+ * remove IOC resources associated with the device.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_base_sas_iounit_control(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2SasIoUnitControlReply_t *mpi_reply,
+    Mpi2SasIoUnitControlRequest_t *mpi_request)
+{
+	u16 smid;
+	u32 ioc_state;
+	unsigned long timeleft;
+	u8 issue_reset;
+	int rc;
+	void *request;
+	u16 wait_state_count;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	mutex_lock(&ioc->base_cmds.mutex);
+
+	if (ioc->base_cmds.status != MPT2_CMD_NOT_USED) {
+		printk(MPT2SAS_ERR_FMT "%s: base_cmd in use\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	wait_state_count = 0;
+	ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
+		if (wait_state_count++ == 10) {
+			printk(MPT2SAS_ERR_FMT
+			    "%s: failed due to ioc not operational\n",
+			    ioc->name, __func__);
+			rc = -EFAULT;
+			goto out;
+		}
+		ssleep(1);
+		ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+		printk(MPT2SAS_INFO_FMT "%s: waiting for "
+		    "operational state(count=%d)\n", ioc->name,
+		    __func__, wait_state_count);
+	}
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->base_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	rc = 0;
+	ioc->base_cmds.status = MPT2_CMD_PENDING;
+	request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->base_cmds.smid = smid;
+	memcpy(request, mpi_request, sizeof(Mpi2SasIoUnitControlRequest_t));
+	if (mpi_request->Operation == MPI2_SAS_OP_PHY_HARD_RESET ||
+	    mpi_request->Operation == MPI2_SAS_OP_PHY_LINK_RESET)
+		ioc->ioc_link_reset_in_progress = 1;
+	mpt2sas_base_put_smid_default(ioc, smid, mpi_request->VF_ID);
+	timeleft = wait_for_completion_timeout(&ioc->base_cmds.done,
+	    msecs_to_jiffies(10000));
+	if ((mpi_request->Operation == MPI2_SAS_OP_PHY_HARD_RESET ||
+	    mpi_request->Operation == MPI2_SAS_OP_PHY_LINK_RESET) &&
+	    ioc->ioc_link_reset_in_progress)
+		ioc->ioc_link_reset_in_progress = 0;
+	if (!(ioc->base_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n",
+		    ioc->name, __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2SasIoUnitControlRequest_t)/4);
+		if (!(ioc->base_cmds.status & MPT2_CMD_RESET))
+			issue_reset = 1;
+		goto issue_host_reset;
+	}
+	if (ioc->base_cmds.status & MPT2_CMD_REPLY_VALID)
+		memcpy(mpi_reply, ioc->base_cmds.reply,
+		    sizeof(Mpi2SasIoUnitControlReply_t));
+	else
+		memset(mpi_reply, 0, sizeof(Mpi2SasIoUnitControlReply_t));
+	ioc->base_cmds.status = MPT2_CMD_NOT_USED;
+	goto out;
+
+ issue_host_reset:
+	if (issue_reset)
+		mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+		    FORCE_BIG_HAMMER);
+	ioc->base_cmds.status = MPT2_CMD_NOT_USED;
+	rc = -EFAULT;
+ out:
+	mutex_unlock(&ioc->base_cmds.mutex);
+	return rc;
+}
+
+
+/**
+ * mpt2sas_base_scsi_enclosure_processor - sending request to sep device
+ * @ioc: per adapter object
+ * @mpi_reply: the reply payload from FW
+ * @mpi_request: the request payload sent to FW
+ *
+ * The SCSI Enclosure Processor request message causes the IOC to
+ * communicate with SES devices to control LED status signals.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_base_scsi_enclosure_processor(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2SepReply_t *mpi_reply, Mpi2SepRequest_t *mpi_request)
+{
+	u16 smid;
+	u32 ioc_state;
+	unsigned long timeleft;
+	u8 issue_reset;
+	int rc;
+	void *request;
+	u16 wait_state_count;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	mutex_lock(&ioc->base_cmds.mutex);
+
+	if (ioc->base_cmds.status != MPT2_CMD_NOT_USED) {
+		printk(MPT2SAS_ERR_FMT "%s: base_cmd in use\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	wait_state_count = 0;
+	ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
+		if (wait_state_count++ == 10) {
+			printk(MPT2SAS_ERR_FMT
+			    "%s: failed due to ioc not operational\n",
+			    ioc->name, __func__);
+			rc = -EFAULT;
+			goto out;
+		}
+		ssleep(1);
+		ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+		printk(MPT2SAS_INFO_FMT "%s: waiting for "
+		    "operational state(count=%d)\n", ioc->name,
+		    __func__, wait_state_count);
+	}
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->base_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	rc = 0;
+	ioc->base_cmds.status = MPT2_CMD_PENDING;
+	request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->base_cmds.smid = smid;
+	memcpy(request, mpi_request, sizeof(Mpi2SepReply_t));
+	mpt2sas_base_put_smid_default(ioc, smid, mpi_request->VF_ID);
+	timeleft = wait_for_completion_timeout(&ioc->base_cmds.done,
+	    msecs_to_jiffies(10000));
+	if (!(ioc->base_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n",
+		    ioc->name, __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2SepRequest_t)/4);
+		if (!(ioc->base_cmds.status & MPT2_CMD_RESET))
+			issue_reset = 1;
+		goto issue_host_reset;
+	}
+	if (ioc->base_cmds.status & MPT2_CMD_REPLY_VALID)
+		memcpy(mpi_reply, ioc->base_cmds.reply,
+		    sizeof(Mpi2SepReply_t));
+	else
+		memset(mpi_reply, 0, sizeof(Mpi2SepReply_t));
+	ioc->base_cmds.status = MPT2_CMD_NOT_USED;
+	goto out;
+
+ issue_host_reset:
+	if (issue_reset)
+		mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+		    FORCE_BIG_HAMMER);
+	ioc->base_cmds.status = MPT2_CMD_NOT_USED;
+	rc = -EFAULT;
+ out:
+	mutex_unlock(&ioc->base_cmds.mutex);
+	return rc;
+}
+
+/**
+ * _base_get_port_facts - obtain port facts reply and save in ioc
+ * @ioc: per adapter object
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_get_port_facts(struct MPT2SAS_ADAPTER *ioc, int port, int sleep_flag)
+{
+	Mpi2PortFactsRequest_t mpi_request;
+	Mpi2PortFactsReply_t mpi_reply, *pfacts;
+	int mpi_reply_sz, mpi_request_sz, r;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	mpi_reply_sz = sizeof(Mpi2PortFactsReply_t);
+	mpi_request_sz = sizeof(Mpi2PortFactsRequest_t);
+	memset(&mpi_request, 0, mpi_request_sz);
+	mpi_request.Function = MPI2_FUNCTION_PORT_FACTS;
+	mpi_request.PortNumber = port;
+	r = _base_handshake_req_reply_wait(ioc, mpi_request_sz,
+	    (u32 *)&mpi_request, mpi_reply_sz, (u16 *)&mpi_reply, 5, CAN_SLEEP);
+
+	if (r != 0) {
+		printk(MPT2SAS_ERR_FMT "%s: handshake failed (r=%d)\n",
+		    ioc->name, __func__, r);
+		return r;
+	}
+
+	pfacts = &ioc->pfacts[port];
+	memset(pfacts, 0, sizeof(Mpi2PortFactsReply_t));
+	pfacts->PortNumber = mpi_reply.PortNumber;
+	pfacts->VP_ID = mpi_reply.VP_ID;
+	pfacts->VF_ID = mpi_reply.VF_ID;
+	pfacts->MaxPostedCmdBuffers =
+	    le16_to_cpu(mpi_reply.MaxPostedCmdBuffers);
+
+	return 0;
+}
+
+/**
+ * _base_get_ioc_facts - obtain ioc facts reply and save in ioc
+ * @ioc: per adapter object
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_get_ioc_facts(struct MPT2SAS_ADAPTER *ioc, int sleep_flag)
+{
+	Mpi2IOCFactsRequest_t mpi_request;
+	Mpi2IOCFactsReply_t mpi_reply, *facts;
+	int mpi_reply_sz, mpi_request_sz, r;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	mpi_reply_sz = sizeof(Mpi2IOCFactsReply_t);
+	mpi_request_sz = sizeof(Mpi2IOCFactsRequest_t);
+	memset(&mpi_request, 0, mpi_request_sz);
+	mpi_request.Function = MPI2_FUNCTION_IOC_FACTS;
+	r = _base_handshake_req_reply_wait(ioc, mpi_request_sz,
+	    (u32 *)&mpi_request, mpi_reply_sz, (u16 *)&mpi_reply, 5, CAN_SLEEP);
+
+	if (r != 0) {
+		printk(MPT2SAS_ERR_FMT "%s: handshake failed (r=%d)\n",
+		    ioc->name, __func__, r);
+		return r;
+	}
+
+	facts = &ioc->facts;
+	memset(facts, 0, sizeof(Mpi2IOCFactsReply_t));
+	facts->MsgVersion = le16_to_cpu(mpi_reply.MsgVersion);
+	facts->HeaderVersion = le16_to_cpu(mpi_reply.HeaderVersion);
+	facts->VP_ID = mpi_reply.VP_ID;
+	facts->VF_ID = mpi_reply.VF_ID;
+	facts->IOCExceptions = le16_to_cpu(mpi_reply.IOCExceptions);
+	facts->MaxChainDepth = mpi_reply.MaxChainDepth;
+	facts->WhoInit = mpi_reply.WhoInit;
+	facts->NumberOfPorts = mpi_reply.NumberOfPorts;
+	facts->RequestCredit = le16_to_cpu(mpi_reply.RequestCredit);
+	facts->MaxReplyDescriptorPostQueueDepth =
+	    le16_to_cpu(mpi_reply.MaxReplyDescriptorPostQueueDepth);
+	facts->ProductID = le16_to_cpu(mpi_reply.ProductID);
+	facts->IOCCapabilities = le32_to_cpu(mpi_reply.IOCCapabilities);
+	if ((facts->IOCCapabilities & MPI2_IOCFACTS_CAPABILITY_INTEGRATED_RAID))
+		ioc->ir_firmware = 1;
+	facts->FWVersion.Word = le32_to_cpu(mpi_reply.FWVersion.Word);
+	facts->IOCRequestFrameSize =
+	    le16_to_cpu(mpi_reply.IOCRequestFrameSize);
+	facts->MaxInitiators = le16_to_cpu(mpi_reply.MaxInitiators);
+	facts->MaxTargets = le16_to_cpu(mpi_reply.MaxTargets);
+	ioc->shost->max_id = -1;
+	facts->MaxSasExpanders = le16_to_cpu(mpi_reply.MaxSasExpanders);
+	facts->MaxEnclosures = le16_to_cpu(mpi_reply.MaxEnclosures);
+	facts->ProtocolFlags = le16_to_cpu(mpi_reply.ProtocolFlags);
+	facts->HighPriorityCredit =
+	    le16_to_cpu(mpi_reply.HighPriorityCredit);
+	facts->ReplyFrameSize = mpi_reply.ReplyFrameSize;
+	facts->MaxDevHandle = le16_to_cpu(mpi_reply.MaxDevHandle);
+
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "hba queue depth(%d), "
+	    "max chains per io(%d)\n", ioc->name, facts->RequestCredit,
+	    facts->MaxChainDepth));
+	dinitprintk(ioc, printk(MPT2SAS_INFO_FMT "request frame size(%d), "
+	    "reply frame size(%d)\n", ioc->name,
+	    facts->IOCRequestFrameSize * 4, facts->ReplyFrameSize * 4));
+	return 0;
+}
+
+/**
+ * _base_send_ioc_init - send ioc_init to firmware
+ * @ioc: per adapter object
+ * @VF_ID: virtual function id
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_send_ioc_init(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID, int sleep_flag)
+{
+	Mpi2IOCInitRequest_t mpi_request;
+	Mpi2IOCInitReply_t mpi_reply;
+	int r;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	memset(&mpi_request, 0, sizeof(Mpi2IOCInitRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_IOC_INIT;
+	mpi_request.WhoInit = MPI2_WHOINIT_HOST_DRIVER;
+	mpi_request.VF_ID = VF_ID;
+	mpi_request.MsgVersion = cpu_to_le16(MPI2_VERSION);
+	mpi_request.HeaderVersion = cpu_to_le16(MPI2_HEADER_VERSION);
+
+	/* In MPI Revision I (0xA), the SystemReplyFrameSize(offset 0x18) was
+	 * removed and made reserved.  For those with older firmware will need
+	 * this fix. It was decided that the Reply and Request frame sizes are
+	 * the same.
+	 */
+	if ((ioc->facts.HeaderVersion >> 8) < 0xA) {
+		mpi_request.Reserved7 = cpu_to_le16(ioc->reply_sz);
+/*		mpi_request.SystemReplyFrameSize =
+ *		 cpu_to_le16(ioc->reply_sz);
+ */
+	}
+
+	mpi_request.SystemRequestFrameSize = cpu_to_le16(ioc->request_sz/4);
+	mpi_request.ReplyDescriptorPostQueueDepth =
+	    cpu_to_le16(ioc->reply_post_queue_depth);
+	mpi_request.ReplyFreeQueueDepth =
+	    cpu_to_le16(ioc->reply_free_queue_depth);
+
+#if BITS_PER_LONG > 32
+	mpi_request.SenseBufferAddressHigh =
+	    cpu_to_le32(ioc->sense_dma >> 32);
+	mpi_request.SystemReplyAddressHigh =
+	    cpu_to_le32(ioc->reply_dma >> 32);
+	mpi_request.SystemRequestFrameBaseAddress =
+	    cpu_to_le64(ioc->request_dma);
+	mpi_request.ReplyFreeQueueAddress =
+	    cpu_to_le64(ioc->reply_free_dma);
+	mpi_request.ReplyDescriptorPostQueueAddress =
+	    cpu_to_le64(ioc->reply_post_free_dma);
+#else
+	mpi_request.SystemRequestFrameBaseAddress =
+	    cpu_to_le32(ioc->request_dma);
+	mpi_request.ReplyFreeQueueAddress =
+	    cpu_to_le32(ioc->reply_free_dma);
+	mpi_request.ReplyDescriptorPostQueueAddress =
+	    cpu_to_le32(ioc->reply_post_free_dma);
+#endif
+
+	if (ioc->logging_level & MPT_DEBUG_INIT) {
+		u32 *mfp;
+		int i;
+
+		mfp = (u32 *)&mpi_request;
+		printk(KERN_DEBUG "\toffset:data\n");
+		for (i = 0; i < sizeof(Mpi2IOCInitRequest_t)/4; i++)
+			printk(KERN_DEBUG "\t[0x%02x]:%08x\n", i*4,
+			    le32_to_cpu(mfp[i]));
+	}
+
+	r = _base_handshake_req_reply_wait(ioc,
+	    sizeof(Mpi2IOCInitRequest_t), (u32 *)&mpi_request,
+	    sizeof(Mpi2IOCInitReply_t), (u16 *)&mpi_reply, 10,
+	    sleep_flag);
+
+	if (r != 0) {
+		printk(MPT2SAS_ERR_FMT "%s: handshake failed (r=%d)\n",
+		    ioc->name, __func__, r);
+		return r;
+	}
+
+	if (mpi_reply.IOCStatus != MPI2_IOCSTATUS_SUCCESS ||
+	    mpi_reply.IOCLogInfo) {
+		printk(MPT2SAS_ERR_FMT "%s: failed\n", ioc->name, __func__);
+		r = -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * _base_send_port_enable - send port_enable(discovery stuff) to firmware
+ * @ioc: per adapter object
+ * @VF_ID: virtual function id
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_send_port_enable(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID, int sleep_flag)
+{
+	Mpi2PortEnableRequest_t *mpi_request;
+	u32 ioc_state;
+	unsigned long timeleft;
+	int r = 0;
+	u16 smid;
+
+	printk(MPT2SAS_INFO_FMT "sending port enable !!\n", ioc->name);
+
+	if (ioc->base_cmds.status & MPT2_CMD_PENDING) {
+		printk(MPT2SAS_ERR_FMT "%s: internal command already in use\n",
+		    ioc->name, __func__);
+		return -EAGAIN;
+	}
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->base_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		return -EAGAIN;
+	}
+
+	ioc->base_cmds.status = MPT2_CMD_PENDING;
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->base_cmds.smid = smid;
+	memset(mpi_request, 0, sizeof(Mpi2PortEnableRequest_t));
+	mpi_request->Function = MPI2_FUNCTION_PORT_ENABLE;
+	mpi_request->VF_ID = VF_ID;
+
+	mpt2sas_base_put_smid_default(ioc, smid, VF_ID);
+	timeleft = wait_for_completion_timeout(&ioc->base_cmds.done,
+	    300*HZ);
+	if (!(ioc->base_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n",
+		    ioc->name, __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2PortEnableRequest_t)/4);
+		if (ioc->base_cmds.status & MPT2_CMD_RESET)
+			r = -EFAULT;
+		else
+			r = -ETIME;
+		goto out;
+	} else
+		dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: complete\n",
+		    ioc->name, __func__));
+
+	ioc_state = _base_wait_on_iocstate(ioc, MPI2_IOC_STATE_OPERATIONAL,
+	    60, sleep_flag);
+	if (ioc_state) {
+		printk(MPT2SAS_ERR_FMT "%s: failed going to operational state "
+		    " (ioc_state=0x%x)\n", ioc->name, __func__, ioc_state);
+		r = -EFAULT;
+	}
+ out:
+	ioc->base_cmds.status = MPT2_CMD_NOT_USED;
+	printk(MPT2SAS_INFO_FMT "port enable: %s\n",
+	    ioc->name, ((r == 0) ? "SUCCESS" : "FAILED"));
+	return r;
+}
+
+/**
+ * _base_unmask_events - turn on notification for this event
+ * @ioc: per adapter object
+ * @event: firmware event
+ *
+ * The mask is stored in ioc->event_masks.
+ */
+static void
+_base_unmask_events(struct MPT2SAS_ADAPTER *ioc, u16 event)
+{
+	u32 desired_event;
+
+	if (event >= 128)
+		return;
+
+	desired_event = (1 << (event % 32));
+
+	if (event < 32)
+		ioc->event_masks[0] &= ~desired_event;
+	else if (event < 64)
+		ioc->event_masks[1] &= ~desired_event;
+	else if (event < 96)
+		ioc->event_masks[2] &= ~desired_event;
+	else if (event < 128)
+		ioc->event_masks[3] &= ~desired_event;
+}
+
+/**
+ * _base_event_notification - send event notification
+ * @ioc: per adapter object
+ * @VF_ID: virtual function id
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_event_notification(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID, int sleep_flag)
+{
+	Mpi2EventNotificationRequest_t *mpi_request;
+	unsigned long timeleft;
+	u16 smid;
+	int r = 0;
+	int i;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	if (ioc->base_cmds.status & MPT2_CMD_PENDING) {
+		printk(MPT2SAS_ERR_FMT "%s: internal command already in use\n",
+		    ioc->name, __func__);
+		return -EAGAIN;
+	}
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->base_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		return -EAGAIN;
+	}
+	ioc->base_cmds.status = MPT2_CMD_PENDING;
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->base_cmds.smid = smid;
+	memset(mpi_request, 0, sizeof(Mpi2EventNotificationRequest_t));
+	mpi_request->Function = MPI2_FUNCTION_EVENT_NOTIFICATION;
+	mpi_request->VF_ID = VF_ID;
+	for (i = 0; i < MPI2_EVENT_NOTIFY_EVENTMASK_WORDS; i++)
+		mpi_request->EventMasks[i] =
+		    le32_to_cpu(ioc->event_masks[i]);
+	mpt2sas_base_put_smid_default(ioc, smid, VF_ID);
+	timeleft = wait_for_completion_timeout(&ioc->base_cmds.done, 30*HZ);
+	if (!(ioc->base_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n",
+		    ioc->name, __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2EventNotificationRequest_t)/4);
+		if (ioc->base_cmds.status & MPT2_CMD_RESET)
+			r = -EFAULT;
+		else
+			r = -ETIME;
+	} else
+		dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: complete\n",
+		    ioc->name, __func__));
+	ioc->base_cmds.status = MPT2_CMD_NOT_USED;
+	return r;
+}
+
+/**
+ * mpt2sas_base_validate_event_type - validating event types
+ * @ioc: per adapter object
+ * @event: firmware event
+ *
+ * This will turn on firmware event notification when application
+ * ask for that event. We don't mask events that are already enabled.
+ */
+void
+mpt2sas_base_validate_event_type(struct MPT2SAS_ADAPTER *ioc, u32 *event_type)
+{
+	int i, j;
+	u32 event_mask, desired_event;
+	u8 send_update_to_fw;
+
+	for (i = 0, send_update_to_fw = 0; i <
+	    MPI2_EVENT_NOTIFY_EVENTMASK_WORDS; i++) {
+		event_mask = ~event_type[i];
+		desired_event = 1;
+		for (j = 0; j < 32; j++) {
+			if (!(event_mask & desired_event) &&
+			    (ioc->event_masks[i] & desired_event)) {
+				ioc->event_masks[i] &= ~desired_event;
+				send_update_to_fw = 1;
+			}
+			desired_event = (desired_event << 1);
+		}
+	}
+
+	if (!send_update_to_fw)
+		return;
+
+	mutex_lock(&ioc->base_cmds.mutex);
+	_base_event_notification(ioc, 0, CAN_SLEEP);
+	mutex_unlock(&ioc->base_cmds.mutex);
+}
+
+/**
+ * _base_diag_reset - the "big hammer" start of day reset
+ * @ioc: per adapter object
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_diag_reset(struct MPT2SAS_ADAPTER *ioc, int sleep_flag)
+{
+	u32 host_diagnostic;
+	u32 ioc_state;
+	u32 count;
+	u32 hcb_size;
+
+	printk(MPT2SAS_INFO_FMT "sending diag reset !!\n", ioc->name);
+
+	_base_save_msix_table(ioc);
+
+	drsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "clear interrupts\n",
+	    ioc->name));
+	writel(0, &ioc->chip->HostInterruptStatus);
+
+	count = 0;
+	do {
+		/* Write magic sequence to WriteSequence register
+		 * Loop until in diagnostic mode
+		 */
+		drsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "write magic "
+		    "sequence\n", ioc->name));
+		writel(MPI2_WRSEQ_FLUSH_KEY_VALUE, &ioc->chip->WriteSequence);
+		writel(MPI2_WRSEQ_1ST_KEY_VALUE, &ioc->chip->WriteSequence);
+		writel(MPI2_WRSEQ_2ND_KEY_VALUE, &ioc->chip->WriteSequence);
+		writel(MPI2_WRSEQ_3RD_KEY_VALUE, &ioc->chip->WriteSequence);
+		writel(MPI2_WRSEQ_4TH_KEY_VALUE, &ioc->chip->WriteSequence);
+		writel(MPI2_WRSEQ_5TH_KEY_VALUE, &ioc->chip->WriteSequence);
+		writel(MPI2_WRSEQ_6TH_KEY_VALUE, &ioc->chip->WriteSequence);
+
+		/* wait 100 msec */
+		if (sleep_flag == CAN_SLEEP)
+			msleep(100);
+		else
+			mdelay(100);
+
+		if (count++ > 20)
+			goto out;
+
+		host_diagnostic = readl(&ioc->chip->HostDiagnostic);
+		drsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "wrote magic "
+		    "sequence: count(%d), host_diagnostic(0x%08x)\n",
+		    ioc->name, count, host_diagnostic));
+
+	} while ((host_diagnostic & MPI2_DIAG_DIAG_WRITE_ENABLE) == 0);
+
+	hcb_size = readl(&ioc->chip->HCBSize);
+
+	drsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "diag reset: issued\n",
+	    ioc->name));
+	writel(host_diagnostic | MPI2_DIAG_RESET_ADAPTER,
+	     &ioc->chip->HostDiagnostic);
+
+	/* don't access any registers for 50 milliseconds */
+	msleep(50);
+
+	/* 300 second max wait */
+	for (count = 0; count < 3000000 ; count++) {
+
+		host_diagnostic = readl(&ioc->chip->HostDiagnostic);
+
+		if (host_diagnostic == 0xFFFFFFFF)
+			goto out;
+		if (!(host_diagnostic & MPI2_DIAG_RESET_ADAPTER))
+			break;
+
+		/* wait 100 msec */
+		if (sleep_flag == CAN_SLEEP)
+			msleep(1);
+		else
+			mdelay(1);
+	}
+
+	if (host_diagnostic & MPI2_DIAG_HCB_MODE) {
+
+		drsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "restart the adapter "
+		    "assuming the HCB Address points to good F/W\n",
+		    ioc->name));
+		host_diagnostic &= ~MPI2_DIAG_BOOT_DEVICE_SELECT_MASK;
+		host_diagnostic |= MPI2_DIAG_BOOT_DEVICE_SELECT_HCDW;
+		writel(host_diagnostic, &ioc->chip->HostDiagnostic);
+
+		drsprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+		    "re-enable the HCDW\n", ioc->name));
+		writel(hcb_size | MPI2_HCB_SIZE_HCB_ENABLE,
+		    &ioc->chip->HCBSize);
+	}
+
+	drsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "restart the adapter\n",
+	    ioc->name));
+	writel(host_diagnostic & ~MPI2_DIAG_HOLD_IOC_RESET,
+	    &ioc->chip->HostDiagnostic);
+
+	drsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "disable writes to the "
+	    "diagnostic register\n", ioc->name));
+	writel(MPI2_WRSEQ_FLUSH_KEY_VALUE, &ioc->chip->WriteSequence);
+
+	drsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "Wait for FW to go to the "
+	    "READY state\n", ioc->name));
+	ioc_state = _base_wait_on_iocstate(ioc, MPI2_IOC_STATE_READY, 20,
+	    sleep_flag);
+	if (ioc_state) {
+		printk(MPT2SAS_ERR_FMT "%s: failed going to ready state "
+		    " (ioc_state=0x%x)\n", ioc->name, __func__, ioc_state);
+		goto out;
+	}
+
+	_base_restore_msix_table(ioc);
+	printk(MPT2SAS_INFO_FMT "diag reset: SUCCESS\n", ioc->name);
+	return 0;
+
+ out:
+	printk(MPT2SAS_ERR_FMT "diag reset: FAILED\n", ioc->name);
+	return -EFAULT;
+}
+
+/**
+ * _base_make_ioc_ready - put controller in READY state
+ * @ioc: per adapter object
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ * @type: FORCE_BIG_HAMMER or SOFT_RESET
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_make_ioc_ready(struct MPT2SAS_ADAPTER *ioc, int sleep_flag,
+    enum reset_type type)
+{
+	u32 ioc_state;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	ioc_state = mpt2sas_base_get_iocstate(ioc, 0);
+	dhsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: ioc_state(0x%08x)\n",
+	    ioc->name, __func__, ioc_state));
+
+	if ((ioc_state & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_READY)
+		return 0;
+
+	if (ioc_state & MPI2_DOORBELL_USED) {
+		dhsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "unexpected doorbell "
+		    "active!\n", ioc->name));
+		goto issue_diag_reset;
+	}
+
+	if ((ioc_state & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) {
+		mpt2sas_base_fault_info(ioc, ioc_state &
+		    MPI2_DOORBELL_DATA_MASK);
+		goto issue_diag_reset;
+	}
+
+	if (type == FORCE_BIG_HAMMER)
+		goto issue_diag_reset;
+
+	if ((ioc_state & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_OPERATIONAL)
+		if (!(_base_send_ioc_reset(ioc,
+		    MPI2_FUNCTION_IOC_MESSAGE_UNIT_RESET, 15, CAN_SLEEP)))
+			return 0;
+
+ issue_diag_reset:
+	return _base_diag_reset(ioc, CAN_SLEEP);
+}
+
+/**
+ * _base_make_ioc_operational - put controller in OPERATIONAL state
+ * @ioc: per adapter object
+ * @VF_ID: virtual function id
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_base_make_ioc_operational(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+    int sleep_flag)
+{
+	int r, i;
+	unsigned long	flags;
+	u32 reply_address;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	/* initialize the scsi lookup free list */
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	INIT_LIST_HEAD(&ioc->free_list);
+	for (i = 0; i < ioc->request_depth; i++) {
+		ioc->scsi_lookup[i].cb_idx = 0xFF;
+		list_add_tail(&ioc->scsi_lookup[i].tracker_list,
+		    &ioc->free_list);
+	}
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+
+	/* initialize Reply Free Queue */
+	for (i = 0, reply_address = (u32)ioc->reply_dma ;
+	    i < ioc->reply_free_queue_depth ; i++, reply_address +=
+	    ioc->reply_sz)
+		ioc->reply_free[i] = cpu_to_le32(reply_address);
+
+	/* initialize Reply Post Free Queue */
+	for (i = 0; i < ioc->reply_post_queue_depth; i++)
+		ioc->reply_post_free[i].Words = ~0ULL;
+
+	r = _base_send_ioc_init(ioc, VF_ID, sleep_flag);
+	if (r)
+		return r;
+
+	/* initialize the index's */
+	ioc->reply_free_host_index = ioc->reply_free_queue_depth - 1;
+	ioc->reply_post_host_index = 0;
+	writel(ioc->reply_free_host_index, &ioc->chip->ReplyFreeHostIndex);
+	writel(0, &ioc->chip->ReplyPostHostIndex);
+
+	_base_unmask_interrupts(ioc);
+	r = _base_event_notification(ioc, VF_ID, sleep_flag);
+	if (r)
+		return r;
+
+	if (sleep_flag == CAN_SLEEP)
+		_base_static_config_pages(ioc);
+
+	r = _base_send_port_enable(ioc, VF_ID, sleep_flag);
+	if (r)
+		return r;
+
+	return r;
+}
+
+/**
+ * mpt2sas_base_free_resources - free resources controller resources (io/irq/memap)
+ * @ioc: per adapter object
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_free_resources(struct MPT2SAS_ADAPTER *ioc)
+{
+	struct pci_dev *pdev = ioc->pdev;
+
+	dexitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	_base_mask_interrupts(ioc);
+	_base_make_ioc_ready(ioc, CAN_SLEEP, SOFT_RESET);
+	if (ioc->pci_irq) {
+		synchronize_irq(pdev->irq);
+		free_irq(ioc->pci_irq, ioc);
+	}
+	_base_disable_msix(ioc);
+	if (ioc->chip_phys)
+		iounmap(ioc->chip);
+	ioc->pci_irq = -1;
+	ioc->chip_phys = 0;
+	pci_release_selected_regions(ioc->pdev, ioc->bars);
+	pci_disable_device(pdev);
+	pci_set_drvdata(pdev, NULL);
+	return;
+}
+
+/**
+ * mpt2sas_base_attach - attach controller instance
+ * @ioc: per adapter object
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_base_attach(struct MPT2SAS_ADAPTER *ioc)
+{
+	int r, i;
+	unsigned long	 flags;
+
+	dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	r = mpt2sas_base_map_resources(ioc);
+	if (r)
+		return r;
+
+	r = _base_make_ioc_ready(ioc, CAN_SLEEP, SOFT_RESET);
+	if (r)
+		goto out_free_resources;
+
+	r = _base_get_ioc_facts(ioc, CAN_SLEEP);
+	if (r)
+		goto out_free_resources;
+
+	r = _base_allocate_memory_pools(ioc, CAN_SLEEP);
+	if (r)
+		goto out_free_resources;
+
+	init_waitqueue_head(&ioc->reset_wq);
+
+	/* base internal command bits */
+	mutex_init(&ioc->base_cmds.mutex);
+	init_completion(&ioc->base_cmds.done);
+	ioc->base_cmds.reply = kzalloc(ioc->reply_sz, GFP_KERNEL);
+	ioc->base_cmds.status = MPT2_CMD_NOT_USED;
+
+	/* transport internal command bits */
+	ioc->transport_cmds.reply = kzalloc(ioc->reply_sz, GFP_KERNEL);
+	ioc->transport_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_init(&ioc->transport_cmds.mutex);
+	init_completion(&ioc->transport_cmds.done);
+
+	/* task management internal command bits */
+	ioc->tm_cmds.reply = kzalloc(ioc->reply_sz, GFP_KERNEL);
+	ioc->tm_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_init(&ioc->tm_cmds.mutex);
+	init_completion(&ioc->tm_cmds.done);
+
+	/* config page internal command bits */
+	ioc->config_cmds.reply = kzalloc(ioc->reply_sz, GFP_KERNEL);
+	ioc->config_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_init(&ioc->config_cmds.mutex);
+	init_completion(&ioc->config_cmds.done);
+
+	/* ctl module internal command bits */
+	ioc->ctl_cmds.reply = kzalloc(ioc->reply_sz, GFP_KERNEL);
+	ioc->ctl_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_init(&ioc->ctl_cmds.mutex);
+	init_completion(&ioc->ctl_cmds.done);
+
+	for (i = 0; i < MPI2_EVENT_NOTIFY_EVENTMASK_WORDS; i++)
+		ioc->event_masks[i] = -1;
+
+	/* here we enable the events we care about */
+	_base_unmask_events(ioc, MPI2_EVENT_SAS_DISCOVERY);
+	_base_unmask_events(ioc, MPI2_EVENT_SAS_BROADCAST_PRIMITIVE);
+	_base_unmask_events(ioc, MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST);
+	_base_unmask_events(ioc, MPI2_EVENT_SAS_DEVICE_STATUS_CHANGE);
+	_base_unmask_events(ioc, MPI2_EVENT_SAS_ENCL_DEVICE_STATUS_CHANGE);
+	_base_unmask_events(ioc, MPI2_EVENT_IR_CONFIGURATION_CHANGE_LIST);
+	_base_unmask_events(ioc, MPI2_EVENT_IR_VOLUME);
+	_base_unmask_events(ioc, MPI2_EVENT_IR_PHYSICAL_DISK);
+	_base_unmask_events(ioc, MPI2_EVENT_IR_OPERATION_STATUS);
+	_base_unmask_events(ioc, MPI2_EVENT_TASK_SET_FULL);
+	_base_unmask_events(ioc, MPI2_EVENT_LOG_ENTRY_ADDED);
+
+	ioc->pfacts = kcalloc(ioc->facts.NumberOfPorts,
+	    sizeof(Mpi2PortFactsReply_t), GFP_KERNEL);
+	if (!ioc->pfacts)
+		goto out_free_resources;
+
+	for (i = 0 ; i < ioc->facts.NumberOfPorts; i++) {
+		r = _base_get_port_facts(ioc, i, CAN_SLEEP);
+		if (r)
+			goto out_free_resources;
+	}
+	r = _base_make_ioc_operational(ioc, 0, CAN_SLEEP);
+	if (r)
+		goto out_free_resources;
+
+	/* initialize fault polling */
+	INIT_DELAYED_WORK(&ioc->fault_reset_work, _base_fault_reset_work);
+	snprintf(ioc->fault_reset_work_q_name,
+	    sizeof(ioc->fault_reset_work_q_name), "poll_%d_status", ioc->id);
+	ioc->fault_reset_work_q =
+		create_singlethread_workqueue(ioc->fault_reset_work_q_name);
+	if (!ioc->fault_reset_work_q) {
+		printk(MPT2SAS_ERR_FMT "%s: failed (line=%d)\n",
+		    ioc->name, __func__, __LINE__);
+			goto out_free_resources;
+	}
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	if (ioc->fault_reset_work_q)
+		queue_delayed_work(ioc->fault_reset_work_q,
+		    &ioc->fault_reset_work,
+		    msecs_to_jiffies(FAULT_POLLING_INTERVAL));
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+	return 0;
+
+ out_free_resources:
+
+	ioc->remove_host = 1;
+	mpt2sas_base_free_resources(ioc);
+	_base_release_memory_pools(ioc);
+	kfree(ioc->tm_cmds.reply);
+	kfree(ioc->transport_cmds.reply);
+	kfree(ioc->config_cmds.reply);
+	kfree(ioc->base_cmds.reply);
+	kfree(ioc->ctl_cmds.reply);
+	kfree(ioc->pfacts);
+	ioc->ctl_cmds.reply = NULL;
+	ioc->base_cmds.reply = NULL;
+	ioc->tm_cmds.reply = NULL;
+	ioc->transport_cmds.reply = NULL;
+	ioc->config_cmds.reply = NULL;
+	ioc->pfacts = NULL;
+	return r;
+}
+
+
+/**
+ * mpt2sas_base_detach - remove controller instance
+ * @ioc: per adapter object
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_base_detach(struct MPT2SAS_ADAPTER *ioc)
+{
+	unsigned long	 flags;
+	struct workqueue_struct *wq;
+
+	dexitprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	wq = ioc->fault_reset_work_q;
+	ioc->fault_reset_work_q = NULL;
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+	if (!cancel_delayed_work(&ioc->fault_reset_work))
+		flush_workqueue(wq);
+	destroy_workqueue(wq);
+
+	mpt2sas_base_free_resources(ioc);
+	_base_release_memory_pools(ioc);
+	kfree(ioc->pfacts);
+	kfree(ioc->ctl_cmds.reply);
+	kfree(ioc->base_cmds.reply);
+	kfree(ioc->tm_cmds.reply);
+	kfree(ioc->transport_cmds.reply);
+	kfree(ioc->config_cmds.reply);
+}
+
+/**
+ * _base_reset_handler - reset callback handler (for base)
+ * @ioc: per adapter object
+ * @reset_phase: phase
+ *
+ * The handler for doing any required cleanup or initialization.
+ *
+ * The reset phase can be MPT2_IOC_PRE_RESET, MPT2_IOC_AFTER_RESET,
+ * MPT2_IOC_DONE_RESET
+ *
+ * Return nothing.
+ */
+static void
+_base_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase)
+{
+	switch (reset_phase) {
+	case MPT2_IOC_PRE_RESET:
+		dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "MPT2_IOC_PRE_RESET\n", ioc->name, __func__));
+		break;
+	case MPT2_IOC_AFTER_RESET:
+		dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "MPT2_IOC_AFTER_RESET\n", ioc->name, __func__));
+		if (ioc->transport_cmds.status & MPT2_CMD_PENDING) {
+			ioc->transport_cmds.status |= MPT2_CMD_RESET;
+			mpt2sas_base_free_smid(ioc, ioc->transport_cmds.smid);
+			complete(&ioc->transport_cmds.done);
+		}
+		if (ioc->base_cmds.status & MPT2_CMD_PENDING) {
+			ioc->base_cmds.status |= MPT2_CMD_RESET;
+			mpt2sas_base_free_smid(ioc, ioc->base_cmds.smid);
+			complete(&ioc->base_cmds.done);
+		}
+		if (ioc->config_cmds.status & MPT2_CMD_PENDING) {
+			ioc->config_cmds.status |= MPT2_CMD_RESET;
+			mpt2sas_base_free_smid(ioc, ioc->config_cmds.smid);
+			complete(&ioc->config_cmds.done);
+		}
+		break;
+	case MPT2_IOC_DONE_RESET:
+		dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "MPT2_IOC_DONE_RESET\n", ioc->name, __func__));
+		break;
+	}
+	mpt2sas_scsih_reset_handler(ioc, reset_phase);
+	mpt2sas_ctl_reset_handler(ioc, reset_phase);
+}
+
+/**
+ * _wait_for_commands_to_complete - reset controller
+ * @ioc: Pointer to MPT_ADAPTER structure
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ *
+ * This function waiting(3s) for all pending commands to complete
+ * prior to putting controller in reset.
+ */
+static void
+_wait_for_commands_to_complete(struct MPT2SAS_ADAPTER *ioc, int sleep_flag)
+{
+	u32 ioc_state;
+	unsigned long flags;
+	u16 i;
+
+	ioc->pending_io_count = 0;
+	if (sleep_flag != CAN_SLEEP)
+		return;
+
+	ioc_state = mpt2sas_base_get_iocstate(ioc, 0);
+	if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
+		return;
+
+	/* pending command count */
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	for (i = 0; i < ioc->request_depth; i++)
+		if (ioc->scsi_lookup[i].cb_idx != 0xFF)
+			ioc->pending_io_count++;
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+
+	if (!ioc->pending_io_count)
+		return;
+
+	/* wait for pending commands to complete */
+	wait_event_timeout(ioc->reset_wq, ioc->pending_io_count == 0, 3 * HZ);
+}
+
+/**
+ * mpt2sas_base_hard_reset_handler - reset controller
+ * @ioc: Pointer to MPT_ADAPTER structure
+ * @sleep_flag: CAN_SLEEP or NO_SLEEP
+ * @type: FORCE_BIG_HAMMER or SOFT_RESET
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_base_hard_reset_handler(struct MPT2SAS_ADAPTER *ioc, int sleep_flag,
+    enum reset_type type)
+{
+	int r, i;
+	unsigned long flags;
+
+	dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: enter\n", ioc->name,
+	    __func__));
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	if (ioc->ioc_reset_in_progress) {
+		spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+		printk(MPT2SAS_ERR_FMT "%s: busy\n",
+		    ioc->name, __func__);
+		return -EBUSY;
+	}
+	ioc->ioc_reset_in_progress = 1;
+	ioc->shost_recovery = 1;
+	if (ioc->shost->shost_state == SHOST_RUNNING) {
+		/* set back to SHOST_RUNNING in mpt2sas_scsih.c */
+		scsi_host_set_state(ioc->shost, SHOST_RECOVERY);
+		printk(MPT2SAS_INFO_FMT "putting controller into "
+		    "SHOST_RECOVERY\n", ioc->name);
+	}
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+	_base_reset_handler(ioc, MPT2_IOC_PRE_RESET);
+	_wait_for_commands_to_complete(ioc, sleep_flag);
+	_base_mask_interrupts(ioc);
+	r = _base_make_ioc_ready(ioc, sleep_flag, type);
+	if (r)
+		goto out;
+	_base_reset_handler(ioc, MPT2_IOC_AFTER_RESET);
+	for (i = 0 ; i < ioc->facts.NumberOfPorts; i++)
+		r = _base_make_ioc_operational(ioc, ioc->pfacts[i].VF_ID,
+		    sleep_flag);
+	if (!r)
+		_base_reset_handler(ioc, MPT2_IOC_DONE_RESET);
+ out:
+	dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: %s\n",
+	    ioc->name, __func__, ((r == 0) ? "SUCCESS" : "FAILED")));
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	ioc->ioc_reset_in_progress = 0;
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+	return r;
+}
diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
new file mode 100644
index 000000000000..6945ff4d382e
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -0,0 +1,779 @@
+/*
+ * This is the Fusion MPT base driver providing common API layer interface
+ * for access to MPT (Message Passing Technology) firmware.
+ *
+ * This code is based on drivers/scsi/mpt2sas/mpt2_base.h
+ * Copyright (C) 2007-2008  LSI Corporation
+ *  (mailto:DL-MPTFusionLinux@lsi.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * NO WARRANTY
+ * THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT
+ * LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is
+ * solely responsible for determining the appropriateness of using and
+ * distributing the Program and assumes all risks associated with its
+ * exercise of rights under this Agreement, including but not limited to
+ * the risks and costs of program errors, damage to or loss of data,
+ * programs or equipment, and unavailability or interruption of operations.
+
+ * DISCLAIMER OF LIABILITY
+ * NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+ * TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
+ * HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
+ * USA.
+ */
+
+#ifndef MPT2SAS_BASE_H_INCLUDED
+#define MPT2SAS_BASE_H_INCLUDED
+
+#include "mpi/mpi2_type.h"
+#include "mpi/mpi2.h"
+#include "mpi/mpi2_ioc.h"
+#include "mpi/mpi2_cnfg.h"
+#include "mpi/mpi2_init.h"
+#include "mpi/mpi2_raid.h"
+#include "mpi/mpi2_tool.h"
+#include "mpi/mpi2_sas.h"
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_tcq.h>
+#include <scsi/scsi_transport_sas.h>
+#include <scsi/scsi_dbg.h>
+
+#include "mpt2sas_debug.h"
+
+/* driver versioning info */
+#define MPT2SAS_DRIVER_NAME		"mpt2sas"
+#define MPT2SAS_AUTHOR	"LSI Corporation <DL-MPTFusionLinux@lsi.com>"
+#define MPT2SAS_DESCRIPTION	"LSI MPT Fusion SAS 2.0 Device Driver"
+#define MPT2SAS_DRIVER_VERSION		"00.100.11.16"
+#define MPT2SAS_MAJOR_VERSION		00
+#define MPT2SAS_MINOR_VERSION		100
+#define MPT2SAS_BUILD_VERSION		11
+#define MPT2SAS_RELEASE_VERSION		16
+
+/*
+ * Set MPT2SAS_SG_DEPTH value based on user input.
+ */
+#ifdef CONFIG_SCSI_MPT2SAS_MAX_SGE
+#if     CONFIG_SCSI_MPT2SAS_MAX_SGE  < 16
+#define MPT2SAS_SG_DEPTH       16
+#elif CONFIG_SCSI_MPT2SAS_MAX_SGE  > 128
+#define MPT2SAS_SG_DEPTH       128
+#else
+#define MPT2SAS_SG_DEPTH       CONFIG_SCSI_MPT2SAS_MAX_SGE
+#endif
+#else
+#define MPT2SAS_SG_DEPTH       128 /* MAX_HW_SEGMENTS */
+#endif
+
+
+/*
+ * Generic Defines
+ */
+#define MPT2SAS_SATA_QUEUE_DEPTH	32
+#define MPT2SAS_SAS_QUEUE_DEPTH		254
+#define MPT2SAS_RAID_QUEUE_DEPTH	128
+
+#define MPT_NAME_LENGTH			32	/* generic length of strings */
+#define MPT_STRING_LENGTH		64
+
+#define	MPT_MAX_CALLBACKS		16
+
+#define	 CAN_SLEEP			1
+#define  NO_SLEEP			0
+
+#define INTERNAL_CMDS_COUNT		10	/* reserved cmds */
+
+#define MPI2_HIM_MASK			0xFFFFFFFF /* mask every bit*/
+
+#define MPT2SAS_INVALID_DEVICE_HANDLE	0xFFFF
+
+
+/*
+ * reset phases
+ */
+#define MPT2_IOC_PRE_RESET		1 /* prior to host reset */
+#define MPT2_IOC_AFTER_RESET		2 /* just after host reset */
+#define MPT2_IOC_DONE_RESET		3 /* links re-initialized */
+
+/*
+ * logging format
+ */
+#define MPT2SAS_FMT			"%s: "
+#define MPT2SAS_DEBUG_FMT		KERN_DEBUG MPT2SAS_FMT
+#define MPT2SAS_INFO_FMT		KERN_INFO MPT2SAS_FMT
+#define MPT2SAS_NOTE_FMT		KERN_NOTICE MPT2SAS_FMT
+#define MPT2SAS_WARN_FMT		KERN_WARNING MPT2SAS_FMT
+#define MPT2SAS_ERR_FMT			KERN_ERR MPT2SAS_FMT
+
+/*
+ * per target private data
+ */
+#define MPT_TARGET_FLAGS_RAID_COMPONENT	0x01
+#define MPT_TARGET_FLAGS_VOLUME		0x02
+#define MPT_TARGET_FLAGS_DELETED	0x04
+
+/**
+ * struct MPT2SAS_TARGET - starget private hostdata
+ * @starget: starget object
+ * @sas_address: target sas address
+ * @handle: device handle
+ * @num_luns: number luns
+ * @flags: MPT_TARGET_FLAGS_XXX flags
+ * @deleted: target flaged for deletion
+ * @tm_busy: target is busy with TM request.
+ */
+struct MPT2SAS_TARGET {
+	struct scsi_target *starget;
+	u64	sas_address;
+	u16	handle;
+	int	num_luns;
+	u32	flags;
+	u8	deleted;
+	u8	tm_busy;
+};
+
+/*
+ * per device private data
+ */
+#define MPT_DEVICE_FLAGS_INIT		0x01
+#define MPT_DEVICE_TLR_ON		0x02
+
+/**
+ * struct MPT2SAS_DEVICE - sdev private hostdata
+ * @sas_target: starget private hostdata
+ * @lun: lun number
+ * @flags: MPT_DEVICE_XXX flags
+ * @configured_lun: lun is configured
+ * @block: device is in SDEV_BLOCK state
+ * @tlr_snoop_check: flag used in determining whether to disable TLR
+ */
+struct MPT2SAS_DEVICE {
+	struct MPT2SAS_TARGET *sas_target;
+	unsigned int	lun;
+	u32	flags;
+	u8	configured_lun;
+	u8	block;
+	u8	tlr_snoop_check;
+};
+
+#define MPT2_CMD_NOT_USED	0x8000	/* free */
+#define MPT2_CMD_COMPLETE	0x0001	/* completed */
+#define MPT2_CMD_PENDING	0x0002	/* pending */
+#define MPT2_CMD_REPLY_VALID	0x0004	/* reply is valid */
+#define MPT2_CMD_RESET		0x0008	/* host reset dropped the command */
+
+/**
+ * struct _internal_cmd - internal commands struct
+ * @mutex: mutex
+ * @done: completion
+ * @reply: reply message pointer
+ * @status: MPT2_CMD_XXX status
+ * @smid: system message id
+ */
+struct _internal_cmd {
+	struct mutex mutex;
+	struct completion done;
+	void 	*reply;
+	u16	status;
+	u16	smid;
+};
+
+/*
+ * SAS Topology Structures
+ */
+
+/**
+ * struct _sas_device - attached device information
+ * @list: sas device list
+ * @starget: starget object
+ * @sas_address: device sas address
+ * @device_name: retrieved from the SAS IDENTIFY frame.
+ * @handle: device handle
+ * @parent_handle: handle to parent device
+ * @enclosure_handle: enclosure handle
+ * @enclosure_logical_id: enclosure logical identifier
+ * @volume_handle: volume handle (valid when hidden raid member)
+ * @volume_wwid: volume unique identifier
+ * @device_info: bitfield provides detailed info about the device
+ * @id: target id
+ * @channel: target channel
+ * @slot: number number
+ * @hidden_raid_component: set to 1 when this is a raid member
+ * @responding: used in _scsih_sas_device_mark_responding
+ */
+struct _sas_device {
+	struct list_head list;
+	struct scsi_target *starget;
+	u64	sas_address;
+	u64	device_name;
+	u16	handle;
+	u16	parent_handle;
+	u16	enclosure_handle;
+	u64	enclosure_logical_id;
+	u16	volume_handle;
+	u64	volume_wwid;
+	u32	device_info;
+	int	id;
+	int	channel;
+	u16	slot;
+	u8	hidden_raid_component;
+	u8	responding;
+};
+
+/**
+ * struct _raid_device - raid volume link list
+ * @list: sas device list
+ * @starget: starget object
+ * @sdev: scsi device struct (volumes are single lun)
+ * @wwid: unique identifier for the volume
+ * @handle: device handle
+ * @id: target id
+ * @channel: target channel
+ * @volume_type: the raid level
+ * @device_info: bitfield provides detailed info about the hidden components
+ * @num_pds: number of hidden raid components
+ * @responding: used in _scsih_raid_device_mark_responding
+ */
+struct _raid_device {
+	struct list_head list;
+	struct scsi_target *starget;
+	struct scsi_device *sdev;
+	u64	wwid;
+	u16	handle;
+	int	id;
+	int	channel;
+	u8	volume_type;
+	u32	device_info;
+	u8	num_pds;
+	u8	responding;
+};
+
+/**
+ * struct _boot_device - boot device info
+ * @is_raid: flag to indicate whether this is volume
+ * @device: holds pointer for either struct _sas_device or
+ *     struct _raid_device
+ */
+struct _boot_device {
+	u8 is_raid;
+	void *device;
+};
+
+/**
+ * struct _sas_port - wide/narrow sas port information
+ * @port_list: list of ports belonging to expander
+ * @handle: device handle for this port
+ * @sas_address: sas address of this port
+ * @num_phys: number of phys belonging to this port
+ * @remote_identify: attached device identification
+ * @rphy: sas transport rphy object
+ * @port: sas transport wide/narrow port object
+ * @phy_list: _sas_phy list objects belonging to this port
+ */
+struct _sas_port {
+	struct list_head port_list;
+	u16	handle;
+	u64	sas_address;
+	u8	num_phys;
+	struct sas_identify remote_identify;
+	struct sas_rphy *rphy;
+	struct sas_port *port;
+	struct list_head phy_list;
+};
+
+/**
+ * struct _sas_phy - phy information
+ * @port_siblings: list of phys belonging to a port
+ * @identify: phy identification
+ * @remote_identify: attached device identification
+ * @phy: sas transport phy object
+ * @phy_id: unique phy id
+ * @handle: device handle for this phy
+ * @attached_handle: device handle for attached device
+ */
+struct _sas_phy {
+	struct list_head port_siblings;
+	struct sas_identify identify;
+	struct sas_identify remote_identify;
+	struct sas_phy *phy;
+	u8	phy_id;
+	u16	handle;
+	u16	attached_handle;
+};
+
+/**
+ * struct _sas_node - sas_host/expander information
+ * @list: list of expanders
+ * @parent_dev: parent device class
+ * @num_phys: number phys belonging to this sas_host/expander
+ * @sas_address: sas address of this sas_host/expander
+ * @handle: handle for this sas_host/expander
+ * @parent_handle: parent handle
+ * @enclosure_handle: handle for this a member of an enclosure
+ * @device_info: bitwise defining capabilities of this sas_host/expander
+ * @responding: used in _scsih_expander_device_mark_responding
+ * @phy: a list of phys that make up this sas_host/expander
+ * @sas_port_list: list of ports attached to this sas_host/expander
+ */
+struct _sas_node {
+	struct list_head list;
+	struct device *parent_dev;
+	u8	num_phys;
+	u64	sas_address;
+	u16	handle;
+	u16	parent_handle;
+	u16	enclosure_handle;
+	u64	enclosure_logical_id;
+	u8	responding;
+	struct	_sas_phy *phy;
+	struct list_head sas_port_list;
+};
+
+/**
+ * enum reset_type - reset state
+ * @FORCE_BIG_HAMMER: issue diagnostic reset
+ * @SOFT_RESET: issue message_unit_reset, if fails to to big hammer
+ */
+enum reset_type {
+	FORCE_BIG_HAMMER,
+	SOFT_RESET,
+};
+
+/**
+ * struct request_tracker - firmware request tracker
+ * @smid: system message id
+ * @scmd: scsi request pointer
+ * @cb_idx: callback index
+ * @chain_list: list of chains associated to this IO
+ * @tracker_list: list of free request (ioc->free_list)
+ */
+struct request_tracker {
+	u16	smid;
+	struct scsi_cmnd *scmd;
+	u8	cb_idx;
+	struct list_head tracker_list;
+};
+
+typedef void (*MPT_ADD_SGE)(void *paddr, u32 flags_length, dma_addr_t dma_addr);
+
+/**
+ * struct MPT2SAS_ADAPTER - per adapter struct
+ * @list: ioc_list
+ * @shost: shost object
+ * @id: unique adapter id
+ * @pci_irq: irq number
+ * @name: generic ioc string
+ * @tmp_string: tmp string used for logging
+ * @pdev: pci pdev object
+ * @chip: memory mapped register space
+ * @chip_phys: physical addrss prior to mapping
+ * @pio_chip: I/O mapped register space
+ * @logging_level: see mpt2sas_debug.h
+ * @ir_firmware: IR firmware present
+ * @bars: bitmask of BAR's that must be configured
+ * @mask_interrupts: ignore interrupt
+ * @fault_reset_work_q_name: fw fault work queue
+ * @fault_reset_work_q: ""
+ * @fault_reset_work: ""
+ * @firmware_event_name: fw event work queue
+ * @firmware_event_thread: ""
+ * @fw_events_off: flag to turn off fw event handling
+ * @fw_event_lock:
+ * @fw_event_list: list of fw events
+ * @aen_event_read_flag: event log was read
+ * @broadcast_aen_busy: broadcast aen waiting to be serviced
+ * @ioc_reset_in_progress: host reset in progress
+ * @ioc_reset_in_progress_lock:
+ * @ioc_link_reset_in_progress: phy/hard reset in progress
+ * @ignore_loginfos: ignore loginfos during task managment
+ * @remove_host: flag for when driver unloads, to avoid sending dev resets
+ * @wait_for_port_enable_to_complete:
+ * @msix_enable: flag indicating msix is enabled
+ * @msix_vector_count: number msix vectors
+ * @msix_table: virt address to the msix table
+ * @msix_table_backup: backup msix table
+ * @scsi_io_cb_idx: shost generated commands
+ * @tm_cb_idx: task management commands
+ * @transport_cb_idx: transport internal commands
+ * @ctl_cb_idx: clt internal commands
+ * @base_cb_idx: base internal commands
+ * @config_cb_idx: base internal commands
+ * @base_cmds:
+ * @transport_cmds:
+ * @tm_cmds:
+ * @ctl_cmds:
+ * @config_cmds:
+ * @base_add_sg_single: handler for either 32/64 bit sgl's
+ * @event_type: bits indicating which events to log
+ * @event_context: unique id for each logged event
+ * @event_log: event log pointer
+ * @event_masks: events that are masked
+ * @facts: static facts data
+ * @pfacts: static port facts data
+ * @manu_pg0: static manufacturing page 0
+ * @bios_pg2: static bios page 2
+ * @bios_pg3: static bios page 3
+ * @ioc_pg8: static ioc page 8
+ * @iounit_pg0: static iounit page 0
+ * @iounit_pg1: static iounit page 1
+ * @sas_hba: sas host object
+ * @sas_expander_list: expander object list
+ * @sas_node_lock:
+ * @sas_device_list: sas device object list
+ * @sas_device_init_list: sas device object list (used only at init time)
+ * @sas_device_lock:
+ * @io_missing_delay: time for IO completed by fw when PDR enabled
+ * @device_missing_delay: time for device missing by fw when PDR enabled
+ * @config_page_sz: config page size
+ * @config_page: reserve memory for config page payload
+ * @config_page_dma:
+ * @sge_size: sg element size for either 32/64 bit
+ * @request_depth: hba request queue depth
+ * @request_sz: per request frame size
+ * @request: pool of request frames
+ * @request_dma:
+ * @request_dma_sz:
+ * @scsi_lookup: firmware request tracker list
+ * @scsi_lookup_lock:
+ * @free_list: free list of request
+ * @chain: pool of chains
+ * @pending_io_count:
+ * @reset_wq:
+ * @chain_dma:
+ * @max_sges_in_main_message: number sg elements in main message
+ * @max_sges_in_chain_message: number sg elements per chain
+ * @chains_needed_per_io: max chains per io
+ * @chain_offset_value_for_main_message: location 1st sg in main
+ * @chain_depth: total chains allocated
+ * @sense: pool of sense
+ * @sense_dma:
+ * @sense_dma_pool:
+ * @reply_depth: hba reply queue depth:
+ * @reply_sz: per reply frame size:
+ * @reply: pool of replys:
+ * @reply_dma:
+ * @reply_dma_pool:
+ * @reply_free_queue_depth: reply free depth
+ * @reply_free: pool for reply free queue (32 bit addr)
+ * @reply_free_dma:
+ * @reply_free_dma_pool:
+ * @reply_free_host_index: tail index in pool to insert free replys
+ * @reply_post_queue_depth: reply post queue depth
+ * @reply_post_free: pool for reply post (64bit descriptor)
+ * @reply_post_free_dma:
+ * @reply_post_free_dma_pool:
+ * @reply_post_host_index: head index in the pool where FW completes IO
+ */
+struct MPT2SAS_ADAPTER {
+	struct list_head list;
+	struct Scsi_Host *shost;
+	u8		id;
+	u32		pci_irq;
+	char		name[MPT_NAME_LENGTH];
+	char		tmp_string[MPT_STRING_LENGTH];
+	struct pci_dev	*pdev;
+	Mpi2SystemInterfaceRegs_t __iomem *chip;
+	unsigned long	chip_phys;
+	unsigned long	pio_chip;
+	int		logging_level;
+	u8		ir_firmware;
+	int		bars;
+	u8		mask_interrupts;
+
+	/* fw fault handler */
+	char		fault_reset_work_q_name[20];
+	struct workqueue_struct *fault_reset_work_q;
+	struct delayed_work fault_reset_work;
+
+	/* fw event handler */
+	char		firmware_event_name[20];
+	struct workqueue_struct	*firmware_event_thread;
+	u8		fw_events_off;
+	spinlock_t	fw_event_lock;
+	struct list_head fw_event_list;
+
+	 /* misc flags */
+	int		aen_event_read_flag;
+	u8		broadcast_aen_busy;
+	u8		ioc_reset_in_progress;
+	u8		shost_recovery;
+	spinlock_t 	ioc_reset_in_progress_lock;
+	u8		ioc_link_reset_in_progress;
+	u8		ignore_loginfos;
+	u8		remove_host;
+	u8		wait_for_port_enable_to_complete;
+
+	u8		msix_enable;
+	u16		msix_vector_count;
+	u32		*msix_table;
+	u32		*msix_table_backup;
+
+	/* internal commands, callback index */
+	u8		scsi_io_cb_idx;
+	u8		tm_cb_idx;
+	u8		transport_cb_idx;
+	u8		ctl_cb_idx;
+	u8		base_cb_idx;
+	u8		config_cb_idx;
+	struct _internal_cmd base_cmds;
+	struct _internal_cmd transport_cmds;
+	struct _internal_cmd tm_cmds;
+	struct _internal_cmd ctl_cmds;
+	struct _internal_cmd config_cmds;
+
+	MPT_ADD_SGE	base_add_sg_single;
+
+	/* event log */
+	u32		event_type[MPI2_EVENT_NOTIFY_EVENTMASK_WORDS];
+	u32		event_context;
+	void		*event_log;
+	u32		event_masks[MPI2_EVENT_NOTIFY_EVENTMASK_WORDS];
+
+	/* static config pages */
+	Mpi2IOCFactsReply_t facts;
+	Mpi2PortFactsReply_t *pfacts;
+	Mpi2ManufacturingPage0_t manu_pg0;
+	Mpi2BiosPage2_t	bios_pg2;
+	Mpi2BiosPage3_t	bios_pg3;
+	Mpi2IOCPage8_t ioc_pg8;
+	Mpi2IOUnitPage0_t iounit_pg0;
+	Mpi2IOUnitPage1_t iounit_pg1;
+
+	struct _boot_device req_boot_device;
+	struct _boot_device req_alt_boot_device;
+	struct _boot_device current_boot_device;
+
+	/* sas hba, expander, and device list */
+	struct _sas_node sas_hba;
+	struct list_head sas_expander_list;
+	spinlock_t	sas_node_lock;
+	struct list_head sas_device_list;
+	struct list_head sas_device_init_list;
+	spinlock_t	sas_device_lock;
+	struct list_head raid_device_list;
+	spinlock_t	raid_device_lock;
+	u8		io_missing_delay;
+	u16		device_missing_delay;
+	int		sas_id;
+
+	/* config page */
+	u16		config_page_sz;
+	void 		*config_page;
+	dma_addr_t	config_page_dma;
+
+	/* request */
+	u16		sge_size;
+	u16 		request_depth;
+	u16		request_sz;
+	u8		*request;
+	dma_addr_t	request_dma;
+	u32		request_dma_sz;
+	struct request_tracker *scsi_lookup;
+	spinlock_t scsi_lookup_lock;
+	struct list_head free_list;
+	int		pending_io_count;
+	wait_queue_head_t reset_wq;
+
+	/* chain */
+	u8		*chain;
+	dma_addr_t	chain_dma;
+	u16 		max_sges_in_main_message;
+	u16		max_sges_in_chain_message;
+	u16		chains_needed_per_io;
+	u16		chain_offset_value_for_main_message;
+	u16		chain_depth;
+
+	/* sense */
+	u8		*sense;
+	dma_addr_t	sense_dma;
+	struct dma_pool *sense_dma_pool;
+
+	/* reply */
+	u16		reply_sz;
+	u8		*reply;
+	dma_addr_t	reply_dma;
+	struct dma_pool *reply_dma_pool;
+
+	/* reply free queue */
+	u16 		reply_free_queue_depth;
+	u32		*reply_free;
+	dma_addr_t	reply_free_dma;
+	struct dma_pool *reply_free_dma_pool;
+	u32		reply_free_host_index;
+
+	/* reply post queue */
+	u16 		reply_post_queue_depth;
+	Mpi2ReplyDescriptorsUnion_t *reply_post_free;
+	dma_addr_t	reply_post_free_dma;
+	struct dma_pool *reply_post_free_dma_pool;
+	u32		reply_post_host_index;
+
+	/* diag buffer support */
+	u8		*diag_buffer[MPI2_DIAG_BUF_TYPE_COUNT];
+	u32		diag_buffer_sz[MPI2_DIAG_BUF_TYPE_COUNT];
+	dma_addr_t	diag_buffer_dma[MPI2_DIAG_BUF_TYPE_COUNT];
+	u8		diag_buffer_status[MPI2_DIAG_BUF_TYPE_COUNT];
+	u32		unique_id[MPI2_DIAG_BUF_TYPE_COUNT];
+	u32		product_specific[MPI2_DIAG_BUF_TYPE_COUNT][23];
+	u32		diagnostic_flags[MPI2_DIAG_BUF_TYPE_COUNT];
+};
+
+typedef void (*MPT_CALLBACK)(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID,
+    u32 reply);
+
+
+/* base shared API */
+extern struct list_head mpt2sas_ioc_list;
+
+int mpt2sas_base_attach(struct MPT2SAS_ADAPTER *ioc);
+void mpt2sas_base_detach(struct MPT2SAS_ADAPTER *ioc);
+int mpt2sas_base_map_resources(struct MPT2SAS_ADAPTER *ioc);
+void mpt2sas_base_free_resources(struct MPT2SAS_ADAPTER *ioc);
+int mpt2sas_base_hard_reset_handler(struct MPT2SAS_ADAPTER *ioc, int sleep_flag,
+    enum reset_type type);
+
+void *mpt2sas_base_get_msg_frame(struct MPT2SAS_ADAPTER *ioc, u16 smid);
+void *mpt2sas_base_get_sense_buffer(struct MPT2SAS_ADAPTER *ioc, u16 smid);
+void mpt2sas_base_build_zero_len_sge(struct MPT2SAS_ADAPTER *ioc, void *paddr);
+dma_addr_t mpt2sas_base_get_msg_frame_dma(struct MPT2SAS_ADAPTER *ioc, u16 smid);
+dma_addr_t mpt2sas_base_get_sense_buffer_dma(struct MPT2SAS_ADAPTER *ioc, u16 smid);
+
+u16 mpt2sas_base_get_smid(struct MPT2SAS_ADAPTER *ioc, u8 cb_idx);
+void mpt2sas_base_free_smid(struct MPT2SAS_ADAPTER *ioc, u16 smid);
+void mpt2sas_base_put_smid_scsi_io(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 vf_id,
+    u16 handle);
+void mpt2sas_base_put_smid_hi_priority(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 vf_id);
+void mpt2sas_base_put_smid_target_assist(struct MPT2SAS_ADAPTER *ioc, u16 smid,
+    u8 vf_id, u16 io_index);
+void mpt2sas_base_put_smid_default(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 vf_id);
+void mpt2sas_base_initialize_callback_handler(void);
+u8 mpt2sas_base_register_callback_handler(MPT_CALLBACK cb_func);
+void mpt2sas_base_release_callback_handler(u8 cb_idx);
+
+void mpt2sas_base_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID, u32 reply);
+void *mpt2sas_base_get_reply_virt_addr(struct MPT2SAS_ADAPTER *ioc, u32 phys_addr);
+
+u32 mpt2sas_base_get_iocstate(struct MPT2SAS_ADAPTER *ioc, int cooked);
+
+void mpt2sas_base_fault_info(struct MPT2SAS_ADAPTER *ioc , u16 fault_code);
+int mpt2sas_base_sas_iounit_control(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2SasIoUnitControlReply_t *mpi_reply, Mpi2SasIoUnitControlRequest_t
+    *mpi_request);
+int mpt2sas_base_scsi_enclosure_processor(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2SepReply_t *mpi_reply, Mpi2SepRequest_t *mpi_request);
+void mpt2sas_base_validate_event_type(struct MPT2SAS_ADAPTER *ioc, u32 *event_type);
+
+/* scsih shared API */
+void mpt2sas_scsih_issue_tm(struct MPT2SAS_ADAPTER *ioc, u16 handle, uint lun,
+    u8 type, u16 smid_task, ulong timeout);
+void mpt2sas_scsih_set_tm_flag(struct MPT2SAS_ADAPTER *ioc, u16 handle);
+void mpt2sas_scsih_clear_tm_flag(struct MPT2SAS_ADAPTER *ioc, u16 handle);
+struct _sas_node *mpt2sas_scsih_expander_find_by_handle(struct MPT2SAS_ADAPTER *ioc,
+    u16 handle);
+struct _sas_node *mpt2sas_scsih_expander_find_by_sas_address(struct MPT2SAS_ADAPTER
+    *ioc, u64 sas_address);
+struct _sas_device *mpt2sas_scsih_sas_device_find_by_sas_address(
+    struct MPT2SAS_ADAPTER *ioc, u64 sas_address);
+
+void mpt2sas_scsih_event_callback(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID, u32 reply);
+void mpt2sas_scsih_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase);
+
+/* config shared API */
+void mpt2sas_config_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID, u32 reply);
+int mpt2sas_config_get_number_hba_phys(struct MPT2SAS_ADAPTER *ioc, u8 *num_phys);
+int mpt2sas_config_get_manufacturing_pg0(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2ConfigReply_t *mpi_reply, Mpi2ManufacturingPage0_t *config_page);
+int mpt2sas_config_get_bios_pg2(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2BiosPage2_t *config_page);
+int mpt2sas_config_get_bios_pg3(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2BiosPage3_t *config_page);
+int mpt2sas_config_get_iounit_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2IOUnitPage0_t *config_page);
+int mpt2sas_config_get_sas_device_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasDevicePage0_t *config_page, u32 form, u32 handle);
+int mpt2sas_config_get_sas_device_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasDevicePage1_t *config_page, u32 form, u32 handle);
+int mpt2sas_config_get_sas_iounit_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasIOUnitPage0_t *config_page, u16 sz);
+int mpt2sas_config_get_iounit_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2IOUnitPage1_t *config_page);
+int mpt2sas_config_set_iounit_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2IOUnitPage1_t config_page);
+int mpt2sas_config_get_sas_iounit_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasIOUnitPage1_t *config_page, u16 sz);
+int mpt2sas_config_get_ioc_pg8(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2IOCPage8_t *config_page);
+int mpt2sas_config_get_expander_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2ExpanderPage0_t *config_page, u32 form, u32 handle);
+int mpt2sas_config_get_expander_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2ExpanderPage1_t *config_page, u32 phy_number, u16 handle);
+int mpt2sas_config_get_enclosure_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasEnclosurePage0_t *config_page, u32 form, u32 handle);
+int mpt2sas_config_get_phy_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasPhyPage0_t *config_page, u32 phy_number);
+int mpt2sas_config_get_phy_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasPhyPage1_t *config_page, u32 phy_number);
+int mpt2sas_config_get_raid_volume_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2RaidVolPage1_t *config_page, u32 form, u32 handle);
+int mpt2sas_config_get_number_pds(struct MPT2SAS_ADAPTER *ioc, u16 handle, u8 *num_pds);
+int mpt2sas_config_get_raid_volume_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2RaidVolPage0_t *config_page, u32 form, u32 handle, u16 sz);
+int mpt2sas_config_get_phys_disk_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2RaidPhysDiskPage0_t *config_page, u32 form,
+    u32 form_specific);
+int mpt2sas_config_get_volume_handle(struct MPT2SAS_ADAPTER *ioc, u16 pd_handle,
+    u16 *volume_handle);
+int mpt2sas_config_get_volume_wwid(struct MPT2SAS_ADAPTER *ioc, u16 volume_handle,
+    u64 *wwid);
+
+/* ctl shared API */
+extern struct device_attribute *mpt2sas_host_attrs[];
+extern struct device_attribute *mpt2sas_dev_attrs[];
+void mpt2sas_ctl_init(void);
+void mpt2sas_ctl_exit(void);
+void mpt2sas_ctl_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID, u32 reply);
+void mpt2sas_ctl_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase);
+void mpt2sas_ctl_event_callback(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID, u32 reply);
+void mpt2sas_ctl_add_to_event_log(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventNotificationReply_t *mpi_reply);
+
+/* transport shared API */
+void mpt2sas_transport_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID, u32 reply);
+struct _sas_port *mpt2sas_transport_port_add(struct MPT2SAS_ADAPTER *ioc,
+    u16 handle, u16 parent_handle);
+void mpt2sas_transport_port_remove(struct MPT2SAS_ADAPTER *ioc, u64 sas_address,
+    u16 parent_handle);
+int mpt2sas_transport_add_host_phy(struct MPT2SAS_ADAPTER *ioc, struct _sas_phy
+    *mpt2sas_phy, Mpi2SasPhyPage0_t phy_pg0, struct device *parent_dev);
+int mpt2sas_transport_add_expander_phy(struct MPT2SAS_ADAPTER *ioc, struct _sas_phy
+    *mpt2sas_phy, Mpi2ExpanderPage1_t expander_pg1, struct device *parent_dev);
+void mpt2sas_transport_update_phy_link_change(struct MPT2SAS_ADAPTER *ioc, u16 handle,
+   u16 attached_handle, u8 phy_number, u8 link_rate);
+extern struct sas_function_template mpt2sas_transport_functions;
+extern struct scsi_transport_template *mpt2sas_transport_template;
+
+#endif /* MPT2SAS_BASE_H_INCLUDED */
diff --git a/drivers/scsi/mpt2sas/mpt2sas_config.c b/drivers/scsi/mpt2sas/mpt2sas_config.c
new file mode 100644
index 000000000000..58cfb97846f7
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpt2sas_config.c
@@ -0,0 +1,1873 @@
+/*
+ * This module provides common API for accessing firmware configuration pages
+ *
+ * This code is based on drivers/scsi/mpt2sas/mpt2_base.c
+ * Copyright (C) 2007-2008  LSI Corporation
+ *  (mailto:DL-MPTFusionLinux@lsi.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * NO WARRANTY
+ * THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT
+ * LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is
+ * solely responsible for determining the appropriateness of using and
+ * distributing the Program and assumes all risks associated with its
+ * exercise of rights under this Agreement, including but not limited to
+ * the risks and costs of program errors, damage to or loss of data,
+ * programs or equipment, and unavailability or interruption of operations.
+
+ * DISCLAIMER OF LIABILITY
+ * NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+ * TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
+ * HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
+ * USA.
+ */
+
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/blkdev.h>
+#include <linux/sched.h>
+#include <linux/workqueue.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+
+#include "mpt2sas_base.h"
+
+/* local definitions */
+
+/* Timeout for config page request (in seconds) */
+#define MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT 15
+
+/* Common sgl flags for READING a config page. */
+#define MPT2_CONFIG_COMMON_SGLFLAGS ((MPI2_SGE_FLAGS_SIMPLE_ELEMENT | \
+    MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER \
+    | MPI2_SGE_FLAGS_END_OF_LIST) << MPI2_SGE_FLAGS_SHIFT)
+
+/* Common sgl flags for WRITING a config page. */
+#define MPT2_CONFIG_COMMON_WRITE_SGLFLAGS ((MPI2_SGE_FLAGS_SIMPLE_ELEMENT | \
+    MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER \
+    | MPI2_SGE_FLAGS_END_OF_LIST | MPI2_SGE_FLAGS_HOST_TO_IOC) \
+    << MPI2_SGE_FLAGS_SHIFT)
+
+/**
+ * struct config_request - obtain dma memory via routine
+ * @config_page_sz: size
+ * @config_page: virt pointer
+ * @config_page_dma: phys pointer
+ *
+ */
+struct config_request{
+	u16			config_page_sz;
+	void			*config_page;
+	dma_addr_t		config_page_dma;
+};
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _config_display_some_debug - debug routine
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @calling_function_name: string pass from calling function
+ * @mpi_reply: reply message frame
+ * Context: none.
+ *
+ * Function for displaying debug info helpfull when debugging issues
+ * in this module.
+ */
+static void
+_config_display_some_debug(struct MPT2SAS_ADAPTER *ioc, u16 smid,
+    char *calling_function_name, MPI2DefaultReply_t *mpi_reply)
+{
+	Mpi2ConfigRequest_t *mpi_request;
+	char *desc = NULL;
+
+	if (!(ioc->logging_level & MPT_DEBUG_CONFIG))
+		return;
+
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	switch (mpi_request->Header.PageType & MPI2_CONFIG_PAGETYPE_MASK) {
+	case MPI2_CONFIG_PAGETYPE_IO_UNIT:
+		desc = "io_unit";
+		break;
+	case MPI2_CONFIG_PAGETYPE_IOC:
+		desc = "ioc";
+		break;
+	case MPI2_CONFIG_PAGETYPE_BIOS:
+		desc = "bios";
+		break;
+	case MPI2_CONFIG_PAGETYPE_RAID_VOLUME:
+		desc = "raid_volume";
+		break;
+	case MPI2_CONFIG_PAGETYPE_MANUFACTURING:
+		desc = "manufaucturing";
+		break;
+	case MPI2_CONFIG_PAGETYPE_RAID_PHYSDISK:
+		desc = "physdisk";
+		break;
+	case MPI2_CONFIG_PAGETYPE_EXTENDED:
+		switch (mpi_request->ExtPageType) {
+		case MPI2_CONFIG_EXTPAGETYPE_SAS_IO_UNIT:
+			desc = "sas_io_unit";
+			break;
+		case MPI2_CONFIG_EXTPAGETYPE_SAS_EXPANDER:
+			desc = "sas_expander";
+			break;
+		case MPI2_CONFIG_EXTPAGETYPE_SAS_DEVICE:
+			desc = "sas_device";
+			break;
+		case MPI2_CONFIG_EXTPAGETYPE_SAS_PHY:
+			desc = "sas_phy";
+			break;
+		case MPI2_CONFIG_EXTPAGETYPE_LOG:
+			desc = "log";
+			break;
+		case MPI2_CONFIG_EXTPAGETYPE_ENCLOSURE:
+			desc = "enclosure";
+			break;
+		case MPI2_CONFIG_EXTPAGETYPE_RAID_CONFIG:
+			desc = "raid_config";
+			break;
+		case MPI2_CONFIG_EXTPAGETYPE_DRIVER_MAPPING:
+			desc = "driver_mappping";
+			break;
+		}
+		break;
+	}
+
+	if (!desc)
+		return;
+
+	printk(MPT2SAS_DEBUG_FMT "%s: %s(%d), action(%d), form(0x%08x), "
+	    "smid(%d)\n", ioc->name, calling_function_name, desc,
+	    mpi_request->Header.PageNumber, mpi_request->Action,
+	    le32_to_cpu(mpi_request->PageAddress), smid);
+
+	if (!mpi_reply)
+		return;
+
+	if (mpi_reply->IOCStatus || mpi_reply->IOCLogInfo)
+		printk(MPT2SAS_DEBUG_FMT
+		    "\tiocstatus(0x%04x), loginfo(0x%08x)\n",
+		    ioc->name, le16_to_cpu(mpi_reply->IOCStatus),
+		    le32_to_cpu(mpi_reply->IOCLogInfo));
+}
+#endif
+
+/**
+ * mpt2sas_config_done - config page completion routine
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ * Context: none.
+ *
+ * The callback handler when using _config_request.
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_config_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID, u32 reply)
+{
+	MPI2DefaultReply_t *mpi_reply;
+
+	if (ioc->config_cmds.status == MPT2_CMD_NOT_USED)
+		return;
+	if (ioc->config_cmds.smid != smid)
+		return;
+	ioc->config_cmds.status |= MPT2_CMD_COMPLETE;
+	mpi_reply =  mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	if (mpi_reply) {
+		ioc->config_cmds.status |= MPT2_CMD_REPLY_VALID;
+		memcpy(ioc->config_cmds.reply, mpi_reply,
+		    mpi_reply->MsgLength*4);
+	}
+	ioc->config_cmds.status &= ~MPT2_CMD_PENDING;
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	_config_display_some_debug(ioc, smid, "config_done", mpi_reply);
+#endif
+	complete(&ioc->config_cmds.done);
+}
+
+/**
+ * _config_request - main routine for sending config page requests
+ * @ioc: per adapter object
+ * @mpi_request: request message frame
+ * @mpi_reply: reply mf payload returned from firmware
+ * @timeout: timeout in seconds
+ * Context: sleep, the calling function needs to acquire the config_cmds.mutex
+ *
+ * A generic API for config page requests to firmware.
+ *
+ * The ioc->config_cmds.status flag should be MPT2_CMD_NOT_USED before calling
+ * this API.
+ *
+ * The callback index is set inside `ioc->config_cb_idx.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_config_request(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigRequest_t
+    *mpi_request, Mpi2ConfigReply_t *mpi_reply, int timeout)
+{
+	u16 smid;
+	u32 ioc_state;
+	unsigned long timeleft;
+	Mpi2ConfigRequest_t *config_request;
+	int r;
+	u8 retry_count;
+	u8 issue_reset;
+	u16 wait_state_count;
+
+	if (ioc->config_cmds.status != MPT2_CMD_NOT_USED) {
+		printk(MPT2SAS_ERR_FMT "%s: config_cmd in use\n",
+		    ioc->name, __func__);
+		return -EAGAIN;
+	}
+	retry_count = 0;
+
+ retry_config:
+	wait_state_count = 0;
+	ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
+		if (wait_state_count++ == MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT) {
+			printk(MPT2SAS_ERR_FMT
+			    "%s: failed due to ioc not operational\n",
+			    ioc->name, __func__);
+			ioc->config_cmds.status = MPT2_CMD_NOT_USED;
+			return -EFAULT;
+		}
+		ssleep(1);
+		ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+		printk(MPT2SAS_INFO_FMT "%s: waiting for "
+		    "operational state(count=%d)\n", ioc->name,
+		    __func__, wait_state_count);
+	}
+	if (wait_state_count)
+		printk(MPT2SAS_INFO_FMT "%s: ioc is operational\n",
+		    ioc->name, __func__);
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->config_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		ioc->config_cmds.status = MPT2_CMD_NOT_USED;
+		return -EAGAIN;
+	}
+
+	r = 0;
+	memset(mpi_reply, 0, sizeof(Mpi2ConfigReply_t));
+	ioc->config_cmds.status = MPT2_CMD_PENDING;
+	config_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->config_cmds.smid = smid;
+	memcpy(config_request, mpi_request, sizeof(Mpi2ConfigRequest_t));
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	_config_display_some_debug(ioc, smid, "config_request", NULL);
+#endif
+	mpt2sas_base_put_smid_default(ioc, smid, config_request->VF_ID);
+	timeleft = wait_for_completion_timeout(&ioc->config_cmds.done,
+	    timeout*HZ);
+	if (!(ioc->config_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n",
+		    ioc->name, __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2ConfigRequest_t)/4);
+		if (!(ioc->config_cmds.status & MPT2_CMD_RESET))
+			issue_reset = 1;
+		goto issue_host_reset;
+	}
+	if (ioc->config_cmds.status & MPT2_CMD_REPLY_VALID)
+		memcpy(mpi_reply, ioc->config_cmds.reply,
+		    sizeof(Mpi2ConfigReply_t));
+	if (retry_count)
+		printk(MPT2SAS_INFO_FMT "%s: retry completed!!\n",
+		    ioc->name, __func__);
+	ioc->config_cmds.status = MPT2_CMD_NOT_USED;
+	return r;
+
+ issue_host_reset:
+	if (issue_reset)
+		mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+		    FORCE_BIG_HAMMER);
+	ioc->config_cmds.status = MPT2_CMD_NOT_USED;
+	if (!retry_count) {
+		printk(MPT2SAS_INFO_FMT "%s: attempting retry\n",
+		    ioc->name, __func__);
+		retry_count++;
+		goto retry_config;
+	}
+	return -EFAULT;
+}
+
+/**
+ * _config_alloc_config_dma_memory - obtain physical memory
+ * @ioc: per adapter object
+ * @mem: struct config_request
+ *
+ * A wrapper for obtaining dma-able memory for config page request.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_config_alloc_config_dma_memory(struct MPT2SAS_ADAPTER *ioc,
+    struct config_request *mem)
+{
+	int r = 0;
+
+	mem->config_page = pci_alloc_consistent(ioc->pdev, mem->config_page_sz,
+	    &mem->config_page_dma);
+	if (!mem->config_page)
+		r = -ENOMEM;
+	return r;
+}
+
+/**
+ * _config_free_config_dma_memory - wrapper to free the memory
+ * @ioc: per adapter object
+ * @mem: struct config_request
+ *
+ * A wrapper to free dma-able memory when using _config_alloc_config_dma_memory.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static void
+_config_free_config_dma_memory(struct MPT2SAS_ADAPTER *ioc,
+    struct config_request *mem)
+{
+	pci_free_consistent(ioc->pdev, mem->config_page_sz, mem->config_page,
+	    mem->config_page_dma);
+}
+
+/**
+ * mpt2sas_config_get_manufacturing_pg0 - obtain manufacturing page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_manufacturing_pg0(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2ConfigReply_t *mpi_reply, Mpi2ManufacturingPage0_t *config_page)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2ManufacturingPage0_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_MANUFACTURING;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_MANUFACTURING0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2ManufacturingPage0_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_bios_pg2 - obtain bios page 2
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_bios_pg2(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2ConfigReply_t *mpi_reply, Mpi2BiosPage2_t *config_page)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2BiosPage2_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_BIOS;
+	mpi_request.Header.PageNumber = 2;
+	mpi_request.Header.PageVersion = MPI2_BIOSPAGE2_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2BiosPage2_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_bios_pg3 - obtain bios page 3
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_bios_pg3(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2BiosPage3_t *config_page)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2BiosPage3_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_BIOS;
+	mpi_request.Header.PageNumber = 3;
+	mpi_request.Header.PageVersion = MPI2_BIOSPAGE3_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2BiosPage3_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_iounit_pg0 - obtain iounit page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_iounit_pg0(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2ConfigReply_t *mpi_reply, Mpi2IOUnitPage0_t *config_page)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2IOUnitPage0_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_IO_UNIT;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_IOUNITPAGE0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2IOUnitPage0_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_iounit_pg1 - obtain iounit page 1
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_iounit_pg1(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2ConfigReply_t *mpi_reply, Mpi2IOUnitPage1_t *config_page)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2IOUnitPage1_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_IO_UNIT;
+	mpi_request.Header.PageNumber = 1;
+	mpi_request.Header.PageVersion = MPI2_IOUNITPAGE1_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2IOUnitPage1_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_set_iounit_pg1 - set iounit page 1
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_set_iounit_pg1(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2ConfigReply_t *mpi_reply, Mpi2IOUnitPage1_t config_page)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_IO_UNIT;
+	mpi_request.Header.PageNumber = 1;
+	mpi_request.Header.PageVersion = MPI2_IOUNITPAGE1_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_WRITE_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+
+	memset(mem.config_page, 0, mem.config_page_sz);
+	memcpy(mem.config_page, &config_page,
+	    sizeof(Mpi2IOUnitPage1_t));
+
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_WRITE_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_ioc_pg8 - obtain ioc page 8
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_ioc_pg8(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2ConfigReply_t *mpi_reply, Mpi2IOCPage8_t *config_page)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2IOCPage8_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_IOC;
+	mpi_request.Header.PageNumber = 8;
+	mpi_request.Header.PageVersion = MPI2_IOCPAGE8_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2IOCPage8_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_sas_device_pg0 - obtain sas device page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @form: GET_NEXT_HANDLE or HANDLE
+ * @handle: device handle
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_sas_device_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasDevicePage0_t *config_page, u32 form, u32 handle)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2SasDevicePage0_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_SAS_DEVICE;
+	mpi_request.Header.PageVersion = MPI2_SASDEVICE0_PAGEVERSION;
+	mpi_request.Header.PageNumber = 0;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress = cpu_to_le32(form | handle);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply->ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply->ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2SasDevicePage0_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_sas_device_pg1 - obtain sas device page 1
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @form: GET_NEXT_HANDLE or HANDLE
+ * @handle: device handle
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_sas_device_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasDevicePage1_t *config_page, u32 form, u32 handle)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2SasDevicePage1_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_SAS_DEVICE;
+	mpi_request.Header.PageVersion = MPI2_SASDEVICE1_PAGEVERSION;
+	mpi_request.Header.PageNumber = 1;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress = cpu_to_le32(form | handle);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply->ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply->ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2SasDevicePage1_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_number_hba_phys - obtain number of phys on the host
+ * @ioc: per adapter object
+ * @num_phys: pointer returned with the number of phys
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_number_hba_phys(struct MPT2SAS_ADAPTER *ioc, u8 *num_phys)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+	u16 ioc_status;
+	Mpi2ConfigReply_t mpi_reply;
+	Mpi2SasIOUnitPage0_t config_page;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_SAS_IO_UNIT;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_SASIOUNITPAGE0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, &mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply.Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply.Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply.Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply.ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply.ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply.ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, &mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r) {
+		ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+		    MPI2_IOCSTATUS_MASK;
+		if (ioc_status == MPI2_IOCSTATUS_SUCCESS) {
+			memcpy(&config_page, mem.config_page,
+			    min_t(u16, mem.config_page_sz,
+			    sizeof(Mpi2SasIOUnitPage0_t)));
+			*num_phys = config_page.NumPhys;
+		}
+	}
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_sas_iounit_pg0 - obtain sas iounit page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @sz: size of buffer passed in config_page
+ * Context: sleep.
+ *
+ * Calling function should call config_get_number_hba_phys prior to
+ * this function, so enough memory is allocated for config_page.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_sas_iounit_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasIOUnitPage0_t *config_page, u16 sz)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sz);
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_SAS_IO_UNIT;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_SASIOUNITPAGE0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply->ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply->ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, sz, mem.config_page_sz));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_sas_iounit_pg1 - obtain sas iounit page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @sz: size of buffer passed in config_page
+ * Context: sleep.
+ *
+ * Calling function should call config_get_number_hba_phys prior to
+ * this function, so enough memory is allocated for config_page.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_sas_iounit_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasIOUnitPage1_t *config_page, u16 sz)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sz);
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_SAS_IO_UNIT;
+	mpi_request.Header.PageNumber = 1;
+	mpi_request.Header.PageVersion = MPI2_SASIOUNITPAGE0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply->ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply->ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, sz, mem.config_page_sz));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_expander_pg0 - obtain expander page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @form: GET_NEXT_HANDLE or HANDLE
+ * @handle: expander handle
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_expander_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2ExpanderPage0_t *config_page, u32 form, u32 handle)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2ExpanderPage0_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_SAS_EXPANDER;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_SASEXPANDER0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress = cpu_to_le32(form | handle);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply->ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply->ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2ExpanderPage0_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_expander_pg1 - obtain expander page 1
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @phy_number: phy number
+ * @handle: expander handle
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_expander_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2ExpanderPage1_t *config_page, u32 phy_number,
+    u16 handle)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2ExpanderPage1_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_SAS_EXPANDER;
+	mpi_request.Header.PageNumber = 1;
+	mpi_request.Header.PageVersion = MPI2_SASEXPANDER1_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress =
+	    cpu_to_le32(MPI2_SAS_EXPAND_PGAD_FORM_HNDL_PHY_NUM |
+	    (phy_number << MPI2_SAS_EXPAND_PGAD_PHYNUM_SHIFT) | handle);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply->ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply->ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2ExpanderPage1_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_enclosure_pg0 - obtain enclosure page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @form: GET_NEXT_HANDLE or HANDLE
+ * @handle: expander handle
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_enclosure_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasEnclosurePage0_t *config_page, u32 form, u32 handle)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2SasEnclosurePage0_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_ENCLOSURE;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_SASENCLOSURE0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress = cpu_to_le32(form | handle);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply->ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply->ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2SasEnclosurePage0_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_phy_pg0 - obtain phy page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @phy_number: phy number
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_phy_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasPhyPage0_t *config_page, u32 phy_number)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2SasPhyPage0_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_SAS_PHY;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_SASPHY0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress =
+	    cpu_to_le32(MPI2_SAS_PHY_PGAD_FORM_PHY_NUMBER | phy_number);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply->ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply->ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2SasPhyPage0_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_phy_pg1 - obtain phy page 1
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @phy_number: phy number
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_phy_pg1(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2SasPhyPage1_t *config_page, u32 phy_number)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2SasPhyPage1_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_SAS_PHY;
+	mpi_request.Header.PageNumber = 1;
+	mpi_request.Header.PageVersion = MPI2_SASPHY1_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress =
+	    cpu_to_le32(MPI2_SAS_PHY_PGAD_FORM_PHY_NUMBER | phy_number);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply->ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply->ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2SasPhyPage1_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_raid_volume_pg1 - obtain raid volume page 1
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @form: GET_NEXT_HANDLE or HANDLE
+ * @handle: volume handle
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_raid_volume_pg1(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2ConfigReply_t *mpi_reply, Mpi2RaidVolPage1_t *config_page, u32 form,
+    u32 handle)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(config_page, 0, sizeof(Mpi2RaidVolPage1_t));
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_RAID_VOLUME;
+	mpi_request.Header.PageNumber = 1;
+	mpi_request.Header.PageVersion = MPI2_RAIDVOLPAGE1_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress = cpu_to_le32(form | handle);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2RaidVolPage1_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_number_pds - obtain number of phys disk assigned to volume
+ * @ioc: per adapter object
+ * @handle: volume handle
+ * @num_pds: returns pds count
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_number_pds(struct MPT2SAS_ADAPTER *ioc, u16 handle,
+    u8 *num_pds)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	Mpi2RaidVolPage0_t *config_page;
+	Mpi2ConfigReply_t mpi_reply;
+	int r;
+	struct config_request mem;
+	u16 ioc_status;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	*num_pds = 0;
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_RAID_VOLUME;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_RAIDVOLPAGE0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, &mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress =
+	    cpu_to_le32(MPI2_RAID_VOLUME_PGAD_FORM_HANDLE | handle);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply.Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply.Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply.Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply.Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply.Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, &mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r) {
+		ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+		    MPI2_IOCSTATUS_MASK;
+		if (ioc_status == MPI2_IOCSTATUS_SUCCESS) {
+			config_page = mem.config_page;
+			*num_pds = config_page->NumPhysDisks;
+		}
+	}
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_raid_volume_pg0 - obtain raid volume page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @form: GET_NEXT_HANDLE or HANDLE
+ * @handle: volume handle
+ * @sz: size of buffer passed in config_page
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_raid_volume_pg0(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2ConfigReply_t *mpi_reply, Mpi2RaidVolPage0_t *config_page, u32 form,
+    u32 handle, u16 sz)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	memset(config_page, 0, sz);
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_RAID_VOLUME;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_RAIDVOLPAGE0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress = cpu_to_le32(form | handle);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, sz, mem.config_page_sz));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_phys_disk_pg0 - obtain phys disk page 0
+ * @ioc: per adapter object
+ * @mpi_reply: reply mf payload returned from firmware
+ * @config_page: contents of the config page
+ * @form: GET_NEXT_PHYSDISKNUM, PHYSDISKNUM, DEVHANDLE
+ * @form_specific: specific to the form
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_phys_disk_pg0(struct MPT2SAS_ADAPTER *ioc, Mpi2ConfigReply_t
+    *mpi_reply, Mpi2RaidPhysDiskPage0_t *config_page, u32 form,
+    u32 form_specific)
+{
+	Mpi2ConfigRequest_t mpi_request;
+	int r;
+	struct config_request mem;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	memset(config_page, 0, sizeof(Mpi2RaidPhysDiskPage0_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_RAID_PHYSDISK;
+	mpi_request.Header.PageNumber = 0;
+	mpi_request.Header.PageVersion = MPI2_RAIDPHYSDISKPAGE0_PAGEVERSION;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress = cpu_to_le32(form | form_specific);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply->Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply->Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply->Header.PageType;
+	mpi_request.Header.PageLength = mpi_reply->Header.PageLength;
+	mem.config_page_sz = le16_to_cpu(mpi_reply->Header.PageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (!r)
+		memcpy(config_page, mem.config_page,
+		    min_t(u16, mem.config_page_sz,
+		    sizeof(Mpi2RaidPhysDiskPage0_t)));
+
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_volume_handle - returns volume handle for give hidden raid components
+ * @ioc: per adapter object
+ * @pd_handle: phys disk handle
+ * @volume_handle: volume handle
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_volume_handle(struct MPT2SAS_ADAPTER *ioc, u16 pd_handle,
+    u16 *volume_handle)
+{
+	Mpi2RaidConfigurationPage0_t *config_page;
+	Mpi2ConfigRequest_t mpi_request;
+	Mpi2ConfigReply_t mpi_reply;
+	int r, i;
+	struct config_request mem;
+	u16 ioc_status;
+
+	mutex_lock(&ioc->config_cmds.mutex);
+	*volume_handle = 0;
+	memset(&mpi_request, 0, sizeof(Mpi2ConfigRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_CONFIG;
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_HEADER;
+	mpi_request.Header.PageType = MPI2_CONFIG_PAGETYPE_EXTENDED;
+	mpi_request.ExtPageType = MPI2_CONFIG_EXTPAGETYPE_RAID_CONFIG;
+	mpi_request.Header.PageVersion = MPI2_RAIDCONFIG0_PAGEVERSION;
+	mpi_request.Header.PageNumber = 0;
+	mpt2sas_base_build_zero_len_sge(ioc, &mpi_request.PageBufferSGE);
+	r = _config_request(ioc, &mpi_request, &mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	mpi_request.PageAddress =
+	    cpu_to_le32(MPI2_RAID_PGAD_FORM_ACTIVE_CONFIG);
+	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_READ_CURRENT;
+	mpi_request.Header.PageVersion = mpi_reply.Header.PageVersion;
+	mpi_request.Header.PageNumber = mpi_reply.Header.PageNumber;
+	mpi_request.Header.PageType = mpi_reply.Header.PageType;
+	mpi_request.ExtPageLength = mpi_reply.ExtPageLength;
+	mpi_request.ExtPageType = mpi_reply.ExtPageType;
+	mem.config_page_sz = le16_to_cpu(mpi_reply.ExtPageLength) * 4;
+	if (mem.config_page_sz > ioc->config_page_sz) {
+		r = _config_alloc_config_dma_memory(ioc, &mem);
+		if (r)
+			goto out;
+	} else {
+		mem.config_page_dma = ioc->config_page_dma;
+		mem.config_page = ioc->config_page;
+	}
+	ioc->base_add_sg_single(&mpi_request.PageBufferSGE,
+	    MPT2_CONFIG_COMMON_SGLFLAGS | mem.config_page_sz,
+	    mem.config_page_dma);
+	r = _config_request(ioc, &mpi_request, &mpi_reply,
+	    MPT2_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto out;
+
+	r = -1;
+	ioc_status = le16_to_cpu(mpi_reply.IOCStatus) & MPI2_IOCSTATUS_MASK;
+	if (ioc_status != MPI2_IOCSTATUS_SUCCESS)
+		goto done;
+	config_page = mem.config_page;
+	for (i = 0; i < config_page->NumElements; i++) {
+		if ((config_page->ConfigElement[i].ElementFlags &
+		    MPI2_RAIDCONFIG0_EFLAGS_MASK_ELEMENT_TYPE) !=
+		    MPI2_RAIDCONFIG0_EFLAGS_VOL_PHYS_DISK_ELEMENT)
+			continue;
+		if (config_page->ConfigElement[i].PhysDiskDevHandle ==
+		    pd_handle) {
+			*volume_handle = le16_to_cpu(config_page->
+			    ConfigElement[i].VolDevHandle);
+			r = 0;
+			goto done;
+		}
+	}
+
+ done:
+	if (mem.config_page_sz > ioc->config_page_sz)
+		_config_free_config_dma_memory(ioc, &mem);
+
+ out:
+	mutex_unlock(&ioc->config_cmds.mutex);
+	return r;
+}
+
+/**
+ * mpt2sas_config_get_volume_wwid - returns wwid given the volume handle
+ * @ioc: per adapter object
+ * @volume_handle: volume handle
+ * @wwid: volume wwid
+ * Context: sleep.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_config_get_volume_wwid(struct MPT2SAS_ADAPTER *ioc, u16 volume_handle,
+    u64 *wwid)
+{
+	Mpi2ConfigReply_t mpi_reply;
+	Mpi2RaidVolPage1_t raid_vol_pg1;
+
+	*wwid = 0;
+	if (!(mpt2sas_config_get_raid_volume_pg1(ioc, &mpi_reply,
+	    &raid_vol_pg1, MPI2_RAID_VOLUME_PGAD_FORM_HANDLE,
+	    volume_handle))) {
+		*wwid = le64_to_cpu(raid_vol_pg1.WWID);
+		return 0;
+	} else
+		return -1;
+}
diff --git a/drivers/scsi/mpt2sas/mpt2sas_ctl.c b/drivers/scsi/mpt2sas/mpt2sas_ctl.c
new file mode 100644
index 000000000000..2d4f85c9d7a1
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpt2sas_ctl.c
@@ -0,0 +1,2516 @@
+/*
+ * Management Module Support for MPT (Message Passing Technology) based
+ * controllers
+ *
+ * This code is based on drivers/scsi/mpt2sas/mpt2_ctl.c
+ * Copyright (C) 2007-2008  LSI Corporation
+ *  (mailto:DL-MPTFusionLinux@lsi.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * NO WARRANTY
+ * THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT
+ * LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is
+ * solely responsible for determining the appropriateness of using and
+ * distributing the Program and assumes all risks associated with its
+ * exercise of rights under this Agreement, including but not limited to
+ * the risks and costs of program errors, damage to or loss of data,
+ * programs or equipment, and unavailability or interruption of operations.
+
+ * DISCLAIMER OF LIABILITY
+ * NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+ * TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
+ * HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
+ * USA.
+ */
+
+#include <linux/version.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <linux/smp_lock.h>
+#include <linux/compat.h>
+#include <linux/poll.h>
+
+#include <linux/io.h>
+#include <linux/uaccess.h>
+
+#include "mpt2sas_base.h"
+#include "mpt2sas_ctl.h"
+
+static struct fasync_struct *async_queue;
+static DECLARE_WAIT_QUEUE_HEAD(ctl_poll_wait);
+
+/**
+ * enum block_state - blocking state
+ * @NON_BLOCKING: non blocking
+ * @BLOCKING: blocking
+ *
+ * These states are for ioctls that need to wait for a response
+ * from firmware, so they probably require sleep.
+ */
+enum block_state {
+	NON_BLOCKING,
+	BLOCKING,
+};
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _ctl_display_some_debug - debug routine
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @calling_function_name: string pass from calling function
+ * @mpi_reply: reply message frame
+ * Context: none.
+ *
+ * Function for displaying debug info helpfull when debugging issues
+ * in this module.
+ */
+static void
+_ctl_display_some_debug(struct MPT2SAS_ADAPTER *ioc, u16 smid,
+    char *calling_function_name, MPI2DefaultReply_t *mpi_reply)
+{
+	Mpi2ConfigRequest_t *mpi_request;
+	char *desc = NULL;
+
+	if (!(ioc->logging_level & MPT_DEBUG_IOCTL))
+		return;
+
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	switch (mpi_request->Function) {
+	case MPI2_FUNCTION_SCSI_IO_REQUEST:
+	{
+		Mpi2SCSIIORequest_t *scsi_request =
+		    (Mpi2SCSIIORequest_t *)mpi_request;
+
+		snprintf(ioc->tmp_string, MPT_STRING_LENGTH,
+		    "scsi_io, cmd(0x%02x), cdb_len(%d)",
+		    scsi_request->CDB.CDB32[0],
+		    le16_to_cpu(scsi_request->IoFlags) & 0xF);
+		desc = ioc->tmp_string;
+		break;
+	}
+	case MPI2_FUNCTION_SCSI_TASK_MGMT:
+		desc = "task_mgmt";
+		break;
+	case MPI2_FUNCTION_IOC_INIT:
+		desc = "ioc_init";
+		break;
+	case MPI2_FUNCTION_IOC_FACTS:
+		desc = "ioc_facts";
+		break;
+	case MPI2_FUNCTION_CONFIG:
+	{
+		Mpi2ConfigRequest_t *config_request =
+		    (Mpi2ConfigRequest_t *)mpi_request;
+
+		snprintf(ioc->tmp_string, MPT_STRING_LENGTH,
+		    "config, type(0x%02x), ext_type(0x%02x), number(%d)",
+		    (config_request->Header.PageType &
+		     MPI2_CONFIG_PAGETYPE_MASK), config_request->ExtPageType,
+		    config_request->Header.PageNumber);
+		desc = ioc->tmp_string;
+		break;
+	}
+	case MPI2_FUNCTION_PORT_FACTS:
+		desc = "port_facts";
+		break;
+	case MPI2_FUNCTION_PORT_ENABLE:
+		desc = "port_enable";
+		break;
+	case MPI2_FUNCTION_EVENT_NOTIFICATION:
+		desc = "event_notification";
+		break;
+	case MPI2_FUNCTION_FW_DOWNLOAD:
+		desc = "fw_download";
+		break;
+	case MPI2_FUNCTION_FW_UPLOAD:
+		desc = "fw_upload";
+		break;
+	case MPI2_FUNCTION_RAID_ACTION:
+		desc = "raid_action";
+		break;
+	case MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH:
+	{
+		Mpi2SCSIIORequest_t *scsi_request =
+		    (Mpi2SCSIIORequest_t *)mpi_request;
+
+		snprintf(ioc->tmp_string, MPT_STRING_LENGTH,
+		    "raid_pass, cmd(0x%02x), cdb_len(%d)",
+		    scsi_request->CDB.CDB32[0],
+		    le16_to_cpu(scsi_request->IoFlags) & 0xF);
+		desc = ioc->tmp_string;
+		break;
+	}
+	case MPI2_FUNCTION_SAS_IO_UNIT_CONTROL:
+		desc = "sas_iounit_cntl";
+		break;
+	case MPI2_FUNCTION_SATA_PASSTHROUGH:
+		desc = "sata_pass";
+		break;
+	case MPI2_FUNCTION_DIAG_BUFFER_POST:
+		desc = "diag_buffer_post";
+		break;
+	case MPI2_FUNCTION_DIAG_RELEASE:
+		desc = "diag_release";
+		break;
+	case MPI2_FUNCTION_SMP_PASSTHROUGH:
+		desc = "smp_passthrough";
+		break;
+	}
+
+	if (!desc)
+		return;
+
+	printk(MPT2SAS_DEBUG_FMT "%s: %s, smid(%d)\n",
+	    ioc->name, calling_function_name, desc, smid);
+
+	if (!mpi_reply)
+		return;
+
+	if (mpi_reply->IOCStatus || mpi_reply->IOCLogInfo)
+		printk(MPT2SAS_DEBUG_FMT
+		    "\tiocstatus(0x%04x), loginfo(0x%08x)\n",
+		    ioc->name, le16_to_cpu(mpi_reply->IOCStatus),
+		    le32_to_cpu(mpi_reply->IOCLogInfo));
+
+	if (mpi_request->Function == MPI2_FUNCTION_SCSI_IO_REQUEST ||
+	    mpi_request->Function ==
+	    MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH) {
+		Mpi2SCSIIOReply_t *scsi_reply =
+		    (Mpi2SCSIIOReply_t *)mpi_reply;
+		if (scsi_reply->SCSIState || scsi_reply->SCSIStatus)
+			printk(MPT2SAS_DEBUG_FMT
+			    "\tscsi_state(0x%02x), scsi_status"
+			    "(0x%02x)\n", ioc->name,
+			    scsi_reply->SCSIState,
+			    scsi_reply->SCSIStatus);
+	}
+}
+#endif
+
+/**
+ * mpt2sas_ctl_done - ctl module completion routine
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ * Context: none.
+ *
+ * The callback handler when using ioc->ctl_cb_idx.
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_ctl_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID, u32 reply)
+{
+	MPI2DefaultReply_t *mpi_reply;
+
+	if (ioc->ctl_cmds.status == MPT2_CMD_NOT_USED)
+		return;
+	if (ioc->ctl_cmds.smid != smid)
+		return;
+	ioc->ctl_cmds.status |= MPT2_CMD_COMPLETE;
+	mpi_reply = mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	if (mpi_reply) {
+		memcpy(ioc->ctl_cmds.reply, mpi_reply, mpi_reply->MsgLength*4);
+		ioc->ctl_cmds.status |= MPT2_CMD_REPLY_VALID;
+	}
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	_ctl_display_some_debug(ioc, smid, "ctl_done", mpi_reply);
+#endif
+	ioc->ctl_cmds.status &= ~MPT2_CMD_PENDING;
+	complete(&ioc->ctl_cmds.done);
+}
+
+/**
+ * _ctl_check_event_type - determines when an event needs logging
+ * @ioc: per adapter object
+ * @event: firmware event
+ *
+ * The bitmask in ioc->event_type[] indicates which events should be
+ * be saved in the driver event_log.  This bitmask is set by application.
+ *
+ * Returns 1 when event should be captured, or zero means no match.
+ */
+static int
+_ctl_check_event_type(struct MPT2SAS_ADAPTER *ioc, u16 event)
+{
+	u16 i;
+	u32 desired_event;
+
+	if (event >= 128 || !event || !ioc->event_log)
+		return 0;
+
+	desired_event = (1 << (event % 32));
+	if (!desired_event)
+		desired_event = 1;
+	i = event / 32;
+	return desired_event & ioc->event_type[i];
+}
+
+/**
+ * mpt2sas_ctl_add_to_event_log - add event
+ * @ioc: per adapter object
+ * @mpi_reply: reply message frame
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_ctl_add_to_event_log(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventNotificationReply_t *mpi_reply)
+{
+	struct MPT2_IOCTL_EVENTS *event_log;
+	u16 event;
+	int i;
+	u32 sz, event_data_sz;
+	u8 send_aen = 0;
+
+	if (!ioc->event_log)
+		return;
+
+	event = le16_to_cpu(mpi_reply->Event);
+
+	if (_ctl_check_event_type(ioc, event)) {
+
+		/* insert entry into circular event_log */
+		i = ioc->event_context % MPT2SAS_CTL_EVENT_LOG_SIZE;
+		event_log = ioc->event_log;
+		event_log[i].event = event;
+		event_log[i].context = ioc->event_context++;
+
+		event_data_sz = le16_to_cpu(mpi_reply->EventDataLength)*4;
+		sz = min_t(u32, event_data_sz, MPT2_EVENT_DATA_SIZE);
+		memset(event_log[i].data, 0, MPT2_EVENT_DATA_SIZE);
+		memcpy(event_log[i].data, mpi_reply->EventData, sz);
+		send_aen = 1;
+	}
+
+	/* This aen_event_read_flag flag is set until the
+	 * application has read the event log.
+	 * For MPI2_EVENT_LOG_ENTRY_ADDED, we always notify.
+	 */
+	if (event == MPI2_EVENT_LOG_ENTRY_ADDED ||
+	    (send_aen && !ioc->aen_event_read_flag)) {
+		ioc->aen_event_read_flag = 1;
+		wake_up_interruptible(&ctl_poll_wait);
+		if (async_queue)
+			kill_fasync(&async_queue, SIGIO, POLL_IN);
+	}
+}
+
+/**
+ * mpt2sas_ctl_event_callback - firmware event handler (called at ISR time)
+ * @ioc: per adapter object
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ * Context: interrupt.
+ *
+ * This function merely adds a new work task into ioc->firmware_event_thread.
+ * The tasks are worked from _firmware_event_work in user context.
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_ctl_event_callback(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID, u32 reply)
+{
+	Mpi2EventNotificationReply_t *mpi_reply;
+
+	mpi_reply = mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	mpt2sas_ctl_add_to_event_log(ioc, mpi_reply);
+}
+
+/**
+ * _ctl_verify_adapter - validates ioc_number passed from application
+ * @ioc: per adapter object
+ * @iocpp: The ioc pointer is returned in this.
+ *
+ * Return (-1) means error, else ioc_number.
+ */
+static int
+_ctl_verify_adapter(int ioc_number, struct MPT2SAS_ADAPTER **iocpp)
+{
+	struct MPT2SAS_ADAPTER *ioc;
+
+	list_for_each_entry(ioc, &mpt2sas_ioc_list, list) {
+		if (ioc->id != ioc_number)
+			continue;
+		*iocpp = ioc;
+		return ioc_number;
+	}
+	*iocpp = NULL;
+	return -1;
+}
+
+/**
+ * mpt2sas_ctl_reset_handler - reset callback handler (for ctl)
+ * @ioc: per adapter object
+ * @reset_phase: phase
+ *
+ * The handler for doing any required cleanup or initialization.
+ *
+ * The reset phase can be MPT2_IOC_PRE_RESET, MPT2_IOC_AFTER_RESET,
+ * MPT2_IOC_DONE_RESET
+ */
+void
+mpt2sas_ctl_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase)
+{
+	switch (reset_phase) {
+	case MPT2_IOC_PRE_RESET:
+		dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "MPT2_IOC_PRE_RESET\n", ioc->name, __func__));
+		break;
+	case MPT2_IOC_AFTER_RESET:
+		dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "MPT2_IOC_AFTER_RESET\n", ioc->name, __func__));
+		if (ioc->ctl_cmds.status & MPT2_CMD_PENDING) {
+			ioc->ctl_cmds.status |= MPT2_CMD_RESET;
+			mpt2sas_base_free_smid(ioc, ioc->ctl_cmds.smid);
+			complete(&ioc->ctl_cmds.done);
+		}
+		break;
+	case MPT2_IOC_DONE_RESET:
+		dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "MPT2_IOC_DONE_RESET\n", ioc->name, __func__));
+		break;
+	}
+}
+
+/**
+ * _ctl_fasync -
+ * @fd -
+ * @filep -
+ * @mode -
+ *
+ * Called when application request fasyn callback handler.
+ */
+static int
+_ctl_fasync(int fd, struct file *filep, int mode)
+{
+	return fasync_helper(fd, filep, mode, &async_queue);
+}
+
+/**
+ * _ctl_release -
+ * @inode -
+ * @filep -
+ *
+ * Called when application releases the fasyn callback handler.
+ */
+static int
+_ctl_release(struct inode *inode, struct file *filep)
+{
+	return fasync_helper(-1, filep, 0, &async_queue);
+}
+
+/**
+ * _ctl_poll -
+ * @file -
+ * @wait -
+ *
+ */
+static unsigned int
+_ctl_poll(struct file *filep, poll_table *wait)
+{
+	struct MPT2SAS_ADAPTER *ioc;
+
+	poll_wait(filep, &ctl_poll_wait, wait);
+
+	list_for_each_entry(ioc, &mpt2sas_ioc_list, list) {
+		if (ioc->aen_event_read_flag)
+			return POLLIN | POLLRDNORM;
+	}
+	return 0;
+}
+
+/**
+ * _ctl_do_task_abort - assign an active smid to the abort_task
+ * @ioc: per adapter object
+ * @karg - (struct mpt2_ioctl_command)
+ * @tm_request - pointer to mf from user space
+ *
+ * Returns 0 when an smid if found, else fail.
+ * during failure, the reply frame is filled.
+ */
+static int
+_ctl_do_task_abort(struct MPT2SAS_ADAPTER *ioc, struct mpt2_ioctl_command *karg,
+    Mpi2SCSITaskManagementRequest_t *tm_request)
+{
+	u8 found = 0;
+	u16 i;
+	u16 handle;
+	struct scsi_cmnd *scmd;
+	struct MPT2SAS_DEVICE *priv_data;
+	unsigned long flags;
+	Mpi2SCSITaskManagementReply_t *tm_reply;
+	u32 sz;
+	u32 lun;
+
+	lun = scsilun_to_int((struct scsi_lun *)tm_request->LUN);
+
+	handle = le16_to_cpu(tm_request->DevHandle);
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	for (i = ioc->request_depth; i && !found; i--) {
+		scmd = ioc->scsi_lookup[i - 1].scmd;
+		if (scmd == NULL || scmd->device == NULL ||
+		    scmd->device->hostdata == NULL)
+			continue;
+		if (lun != scmd->device->lun)
+			continue;
+		priv_data = scmd->device->hostdata;
+		if (priv_data->sas_target == NULL)
+			continue;
+		if (priv_data->sas_target->handle != handle)
+			continue;
+		tm_request->TaskMID = cpu_to_le16(ioc->scsi_lookup[i - 1].smid);
+		found = 1;
+	}
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+
+	if (!found) {
+		dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "ABORT_TASK: "
+		    "DevHandle(0x%04x), lun(%d), no active mid!!\n", ioc->name,
+		    tm_request->DevHandle, lun));
+		tm_reply = ioc->ctl_cmds.reply;
+		tm_reply->DevHandle = tm_request->DevHandle;
+		tm_reply->Function = MPI2_FUNCTION_SCSI_TASK_MGMT;
+		tm_reply->TaskType = MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK;
+		tm_reply->MsgLength = sizeof(Mpi2SCSITaskManagementReply_t)/4;
+		tm_reply->VP_ID = tm_request->VP_ID;
+		tm_reply->VF_ID = tm_request->VF_ID;
+		sz = min_t(u32, karg->max_reply_bytes, ioc->reply_sz);
+		if (copy_to_user(karg->reply_frame_buf_ptr, ioc->ctl_cmds.reply,
+		    sz))
+			printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__,
+			    __LINE__, __func__);
+		return 1;
+	}
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "ABORT_TASK: "
+	    "DevHandle(0x%04x), lun(%d), smid(%d)\n", ioc->name,
+	    tm_request->DevHandle, lun, tm_request->TaskMID));
+	return 0;
+}
+
+/**
+ * _ctl_do_mpt_command - main handler for MPT2COMMAND opcode
+ * @ioc: per adapter object
+ * @karg - (struct mpt2_ioctl_command)
+ * @mf - pointer to mf in user space
+ * @state - NON_BLOCKING or BLOCKING
+ */
+static long
+_ctl_do_mpt_command(struct MPT2SAS_ADAPTER *ioc,
+    struct mpt2_ioctl_command karg, void __user *mf, enum block_state state)
+{
+	MPI2RequestHeader_t *mpi_request;
+	MPI2DefaultReply_t *mpi_reply;
+	u32 ioc_state;
+	u16 ioc_status;
+	u16 smid;
+	unsigned long timeout, timeleft;
+	u8 issue_reset;
+	u32 sz;
+	void *psge;
+	void *priv_sense = NULL;
+	void *data_out = NULL;
+	dma_addr_t data_out_dma;
+	size_t data_out_sz = 0;
+	void *data_in = NULL;
+	dma_addr_t data_in_dma;
+	size_t data_in_sz = 0;
+	u32 sgl_flags;
+	long ret;
+	u16 wait_state_count;
+
+	issue_reset = 0;
+
+	if (state == NON_BLOCKING && !mutex_trylock(&ioc->ctl_cmds.mutex))
+		return -EAGAIN;
+	else if (mutex_lock_interruptible(&ioc->ctl_cmds.mutex))
+		return -ERESTARTSYS;
+
+	if (ioc->ctl_cmds.status != MPT2_CMD_NOT_USED) {
+		printk(MPT2SAS_ERR_FMT "%s: ctl_cmd in use\n",
+		    ioc->name, __func__);
+		ret = -EAGAIN;
+		goto out;
+	}
+
+	wait_state_count = 0;
+	ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
+		if (wait_state_count++ == 10) {
+			printk(MPT2SAS_ERR_FMT
+			    "%s: failed due to ioc not operational\n",
+			    ioc->name, __func__);
+			ret = -EFAULT;
+			goto out;
+		}
+		ssleep(1);
+		ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+		printk(MPT2SAS_INFO_FMT "%s: waiting for "
+		    "operational state(count=%d)\n", ioc->name,
+		    __func__, wait_state_count);
+	}
+	if (wait_state_count)
+		printk(MPT2SAS_INFO_FMT "%s: ioc is operational\n",
+		    ioc->name, __func__);
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->ctl_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		ret = -EAGAIN;
+		goto out;
+	}
+
+	ret = 0;
+	ioc->ctl_cmds.status = MPT2_CMD_PENDING;
+	memset(ioc->ctl_cmds.reply, 0, ioc->reply_sz);
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->ctl_cmds.smid = smid;
+	data_out_sz = karg.data_out_size;
+	data_in_sz = karg.data_in_size;
+
+	/* copy in request message frame from user */
+	if (copy_from_user(mpi_request, mf, karg.data_sge_offset*4)) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__, __LINE__,
+		    __func__);
+		ret = -EFAULT;
+		mpt2sas_base_free_smid(ioc, smid);
+		goto out;
+	}
+
+	if (mpi_request->Function == MPI2_FUNCTION_SCSI_IO_REQUEST ||
+	    mpi_request->Function == MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH) {
+		if (!mpi_request->FunctionDependent1 ||
+		    mpi_request->FunctionDependent1 >
+		    cpu_to_le16(ioc->facts.MaxDevHandle)) {
+			ret = -EINVAL;
+			mpt2sas_base_free_smid(ioc, smid);
+			goto out;
+		}
+	}
+
+	/* obtain dma-able memory for data transfer */
+	if (data_out_sz) /* WRITE */ {
+		data_out = pci_alloc_consistent(ioc->pdev, data_out_sz,
+		    &data_out_dma);
+		if (!data_out) {
+			printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__,
+			    __LINE__, __func__);
+			ret = -ENOMEM;
+			mpt2sas_base_free_smid(ioc, smid);
+			goto out;
+		}
+		if (copy_from_user(data_out, karg.data_out_buf_ptr,
+			data_out_sz)) {
+			printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__,
+			    __LINE__, __func__);
+			ret =  -EFAULT;
+			mpt2sas_base_free_smid(ioc, smid);
+			goto out;
+		}
+	}
+
+	if (data_in_sz) /* READ */ {
+		data_in = pci_alloc_consistent(ioc->pdev, data_in_sz,
+		    &data_in_dma);
+		if (!data_in) {
+			printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__,
+			    __LINE__, __func__);
+			ret = -ENOMEM;
+			mpt2sas_base_free_smid(ioc, smid);
+			goto out;
+		}
+	}
+
+	/* add scatter gather elements */
+	psge = (void *)mpi_request + (karg.data_sge_offset*4);
+
+	if (!data_out_sz && !data_in_sz) {
+		mpt2sas_base_build_zero_len_sge(ioc, psge);
+	} else if (data_out_sz && data_in_sz) {
+		/* WRITE sgel first */
+		sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
+		    MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_HOST_TO_IOC);
+		sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
+		ioc->base_add_sg_single(psge, sgl_flags |
+		    data_out_sz, data_out_dma);
+
+		/* incr sgel */
+		psge += ioc->sge_size;
+
+		/* READ sgel last */
+		sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
+		    MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
+		    MPI2_SGE_FLAGS_END_OF_LIST);
+		sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
+		ioc->base_add_sg_single(psge, sgl_flags |
+		    data_in_sz, data_in_dma);
+	} else if (data_out_sz) /* WRITE */ {
+		sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
+		    MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
+		    MPI2_SGE_FLAGS_END_OF_LIST | MPI2_SGE_FLAGS_HOST_TO_IOC);
+		sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
+		ioc->base_add_sg_single(psge, sgl_flags |
+		    data_out_sz, data_out_dma);
+	} else if (data_in_sz) /* READ */ {
+		sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
+		    MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
+		    MPI2_SGE_FLAGS_END_OF_LIST);
+		sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
+		ioc->base_add_sg_single(psge, sgl_flags |
+		    data_in_sz, data_in_dma);
+	}
+
+	/* send command to firmware */
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	_ctl_display_some_debug(ioc, smid, "ctl_request", NULL);
+#endif
+
+	switch (mpi_request->Function) {
+	case MPI2_FUNCTION_SCSI_IO_REQUEST:
+	case MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH:
+	{
+		Mpi2SCSIIORequest_t *scsiio_request =
+		    (Mpi2SCSIIORequest_t *)mpi_request;
+		scsiio_request->SenseBufferLowAddress =
+		    (u32)mpt2sas_base_get_sense_buffer_dma(ioc, smid);
+		priv_sense = mpt2sas_base_get_sense_buffer(ioc, smid);
+		memset(priv_sense, 0, SCSI_SENSE_BUFFERSIZE);
+		mpt2sas_base_put_smid_scsi_io(ioc, smid, 0,
+		    le16_to_cpu(mpi_request->FunctionDependent1));
+		break;
+	}
+	case MPI2_FUNCTION_SCSI_TASK_MGMT:
+	{
+		Mpi2SCSITaskManagementRequest_t *tm_request =
+		    (Mpi2SCSITaskManagementRequest_t *)mpi_request;
+
+		if (tm_request->TaskType ==
+		    MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK) {
+			if (_ctl_do_task_abort(ioc, &karg, tm_request))
+				goto out;
+		}
+
+		mutex_lock(&ioc->tm_cmds.mutex);
+		mpt2sas_scsih_set_tm_flag(ioc, le16_to_cpu(
+		    tm_request->DevHandle));
+		mpt2sas_base_put_smid_hi_priority(ioc, smid,
+		    mpi_request->VF_ID);
+		break;
+	}
+	case MPI2_FUNCTION_SMP_PASSTHROUGH:
+	{
+		Mpi2SmpPassthroughRequest_t *smp_request =
+		    (Mpi2SmpPassthroughRequest_t *)mpi_request;
+		u8 *data;
+
+		/* ioc determines which port to use */
+		smp_request->PhysicalPort = 0xFF;
+		if (smp_request->PassthroughFlags &
+		    MPI2_SMP_PT_REQ_PT_FLAGS_IMMEDIATE)
+			data = (u8 *)&smp_request->SGL;
+		else
+			data = data_out;
+
+		if (data[1] == 0x91 && (data[10] == 1 || data[10] == 2)) {
+			ioc->ioc_link_reset_in_progress = 1;
+			ioc->ignore_loginfos = 1;
+		}
+		mpt2sas_base_put_smid_default(ioc, smid, mpi_request->VF_ID);
+		break;
+	}
+	case MPI2_FUNCTION_SAS_IO_UNIT_CONTROL:
+	{
+		Mpi2SasIoUnitControlRequest_t *sasiounit_request =
+		    (Mpi2SasIoUnitControlRequest_t *)mpi_request;
+
+		if (sasiounit_request->Operation == MPI2_SAS_OP_PHY_HARD_RESET
+		    || sasiounit_request->Operation ==
+		    MPI2_SAS_OP_PHY_LINK_RESET) {
+			ioc->ioc_link_reset_in_progress = 1;
+			ioc->ignore_loginfos = 1;
+		}
+		mpt2sas_base_put_smid_default(ioc, smid, mpi_request->VF_ID);
+		break;
+	}
+	default:
+		mpt2sas_base_put_smid_default(ioc, smid, mpi_request->VF_ID);
+		break;
+	}
+
+	if (karg.timeout < MPT2_IOCTL_DEFAULT_TIMEOUT)
+		timeout = MPT2_IOCTL_DEFAULT_TIMEOUT;
+	else
+		timeout = karg.timeout;
+	timeleft = wait_for_completion_timeout(&ioc->ctl_cmds.done,
+	    timeout*HZ);
+	if (mpi_request->Function == MPI2_FUNCTION_SCSI_TASK_MGMT) {
+		Mpi2SCSITaskManagementRequest_t *tm_request =
+		    (Mpi2SCSITaskManagementRequest_t *)mpi_request;
+		mutex_unlock(&ioc->tm_cmds.mutex);
+		mpt2sas_scsih_clear_tm_flag(ioc, le16_to_cpu(
+		    tm_request->DevHandle));
+	} else if ((mpi_request->Function == MPI2_FUNCTION_SMP_PASSTHROUGH ||
+	    mpi_request->Function == MPI2_FUNCTION_SAS_IO_UNIT_CONTROL) &&
+		ioc->ioc_link_reset_in_progress) {
+		ioc->ioc_link_reset_in_progress = 0;
+		ioc->ignore_loginfos = 0;
+	}
+	if (!(ioc->ctl_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n", ioc->name,
+		    __func__);
+		_debug_dump_mf(mpi_request, karg.data_sge_offset);
+		if (!(ioc->ctl_cmds.status & MPT2_CMD_RESET))
+			issue_reset = 1;
+		goto issue_host_reset;
+	}
+
+	mpi_reply = ioc->ctl_cmds.reply;
+	ioc_status = le16_to_cpu(mpi_reply->IOCStatus) & MPI2_IOCSTATUS_MASK;
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	if (mpi_reply->Function == MPI2_FUNCTION_SCSI_TASK_MGMT &&
+	    (ioc->logging_level & MPT_DEBUG_TM)) {
+		Mpi2SCSITaskManagementReply_t *tm_reply =
+		    (Mpi2SCSITaskManagementReply_t *)mpi_reply;
+
+		printk(MPT2SAS_DEBUG_FMT "TASK_MGMT: "
+		    "IOCStatus(0x%04x), IOCLogInfo(0x%08x), "
+		    "TerminationCount(0x%08x)\n", ioc->name,
+		    tm_reply->IOCStatus, tm_reply->IOCLogInfo,
+		    tm_reply->TerminationCount);
+	}
+#endif
+	/* copy out xdata to user */
+	if (data_in_sz) {
+		if (copy_to_user(karg.data_in_buf_ptr, data_in,
+		    data_in_sz)) {
+			printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__,
+			    __LINE__, __func__);
+			ret = -ENODATA;
+			goto out;
+		}
+	}
+
+	/* copy out reply message frame to user */
+	if (karg.max_reply_bytes) {
+		sz = min_t(u32, karg.max_reply_bytes, ioc->reply_sz);
+		if (copy_to_user(karg.reply_frame_buf_ptr, ioc->ctl_cmds.reply,
+		    sz)) {
+			printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__,
+			    __LINE__, __func__);
+			ret = -ENODATA;
+			goto out;
+		}
+	}
+
+	/* copy out sense to user */
+	if (karg.max_sense_bytes && (mpi_request->Function ==
+	    MPI2_FUNCTION_SCSI_IO_REQUEST || mpi_request->Function ==
+	    MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH)) {
+		sz = min_t(u32, karg.max_sense_bytes, SCSI_SENSE_BUFFERSIZE);
+		if (copy_to_user(karg.sense_data_ptr, priv_sense, sz)) {
+			printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__,
+			    __LINE__, __func__);
+			ret = -ENODATA;
+			goto out;
+		}
+	}
+
+ issue_host_reset:
+	if (issue_reset) {
+		if ((mpi_request->Function == MPI2_FUNCTION_SCSI_IO_REQUEST ||
+		    mpi_request->Function ==
+		    MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH)) {
+			printk(MPT2SAS_INFO_FMT "issue target reset: handle "
+			    "= (0x%04x)\n", ioc->name,
+			    mpi_request->FunctionDependent1);
+			mutex_lock(&ioc->tm_cmds.mutex);
+			mpt2sas_scsih_issue_tm(ioc,
+			    mpi_request->FunctionDependent1, 0,
+			    MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET, 0, 10);
+			ioc->tm_cmds.status = MPT2_CMD_NOT_USED;
+			mutex_unlock(&ioc->tm_cmds.mutex);
+		} else
+			mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+			    FORCE_BIG_HAMMER);
+	}
+
+ out:
+
+	/* free memory associated with sg buffers */
+	if (data_in)
+		pci_free_consistent(ioc->pdev, data_in_sz, data_in,
+		    data_in_dma);
+
+	if (data_out)
+		pci_free_consistent(ioc->pdev, data_out_sz, data_out,
+		    data_out_dma);
+
+	ioc->ctl_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_unlock(&ioc->ctl_cmds.mutex);
+	return ret;
+}
+
+/**
+ * _ctl_getiocinfo - main handler for MPT2IOCINFO opcode
+ * @arg - user space buffer containing ioctl content
+ */
+static long
+_ctl_getiocinfo(void __user *arg)
+{
+	struct mpt2_ioctl_iocinfo karg;
+	struct MPT2SAS_ADAPTER *ioc;
+	u8 revision;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: enter\n", ioc->name,
+	    __func__));
+
+	memset(&karg, 0 , sizeof(karg));
+	karg.adapter_type = MPT2_IOCTL_INTERFACE_SAS2;
+	if (ioc->pfacts)
+		karg.port_number = ioc->pfacts[0].PortNumber;
+	pci_read_config_byte(ioc->pdev, PCI_CLASS_REVISION, &revision);
+	karg.hw_rev = revision;
+	karg.pci_id = ioc->pdev->device;
+	karg.subsystem_device = ioc->pdev->subsystem_device;
+	karg.subsystem_vendor = ioc->pdev->subsystem_vendor;
+	karg.pci_information.u.bits.bus = ioc->pdev->bus->number;
+	karg.pci_information.u.bits.device = PCI_SLOT(ioc->pdev->devfn);
+	karg.pci_information.u.bits.function = PCI_FUNC(ioc->pdev->devfn);
+	karg.pci_information.segment_id = pci_domain_nr(ioc->pdev->bus);
+	karg.firmware_version = ioc->facts.FWVersion.Word;
+	strncpy(karg.driver_version, MPT2SAS_DRIVER_VERSION,
+	    MPT2_IOCTL_VERSION_LENGTH);
+	karg.driver_version[MPT2_IOCTL_VERSION_LENGTH - 1] = '\0';
+	karg.bios_version = le32_to_cpu(ioc->bios_pg3.BiosVersion);
+
+	if (copy_to_user(arg, &karg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	return 0;
+}
+
+/**
+ * _ctl_eventquery - main handler for MPT2EVENTQUERY opcode
+ * @arg - user space buffer containing ioctl content
+ */
+static long
+_ctl_eventquery(void __user *arg)
+{
+	struct mpt2_ioctl_eventquery karg;
+	struct MPT2SAS_ADAPTER *ioc;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: enter\n", ioc->name,
+	    __func__));
+
+	karg.event_entries = MPT2SAS_CTL_EVENT_LOG_SIZE;
+	memcpy(karg.event_types, ioc->event_type,
+	    MPI2_EVENT_NOTIFY_EVENTMASK_WORDS * sizeof(u32));
+
+	if (copy_to_user(arg, &karg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	return 0;
+}
+
+/**
+ * _ctl_eventenable - main handler for MPT2EVENTENABLE opcode
+ * @arg - user space buffer containing ioctl content
+ */
+static long
+_ctl_eventenable(void __user *arg)
+{
+	struct mpt2_ioctl_eventenable karg;
+	struct MPT2SAS_ADAPTER *ioc;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: enter\n", ioc->name,
+	    __func__));
+
+	if (ioc->event_log)
+		return 0;
+	memcpy(ioc->event_type, karg.event_types,
+	    MPI2_EVENT_NOTIFY_EVENTMASK_WORDS * sizeof(u32));
+	mpt2sas_base_validate_event_type(ioc, ioc->event_type);
+
+	/* initialize event_log */
+	ioc->event_context = 0;
+	ioc->aen_event_read_flag = 0;
+	ioc->event_log = kcalloc(MPT2SAS_CTL_EVENT_LOG_SIZE,
+	    sizeof(struct MPT2_IOCTL_EVENTS), GFP_KERNEL);
+	if (!ioc->event_log) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+/**
+ * _ctl_eventreport - main handler for MPT2EVENTREPORT opcode
+ * @arg - user space buffer containing ioctl content
+ */
+static long
+_ctl_eventreport(void __user *arg)
+{
+	struct mpt2_ioctl_eventreport karg;
+	struct MPT2SAS_ADAPTER *ioc;
+	u32 number_bytes, max_events, max;
+	struct mpt2_ioctl_eventreport __user *uarg = arg;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: enter\n", ioc->name,
+	    __func__));
+
+	number_bytes = karg.hdr.max_data_size -
+	    sizeof(struct mpt2_ioctl_header);
+	max_events = number_bytes/sizeof(struct MPT2_IOCTL_EVENTS);
+	max = min_t(u32, MPT2SAS_CTL_EVENT_LOG_SIZE, max_events);
+
+	/* If fewer than 1 event is requested, there must have
+	 * been some type of error.
+	 */
+	if (!max || !ioc->event_log)
+		return -ENODATA;
+
+	number_bytes = max * sizeof(struct MPT2_IOCTL_EVENTS);
+	if (copy_to_user(uarg->event_data, ioc->event_log, number_bytes)) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+
+	/* reset flag so SIGIO can restart */
+	ioc->aen_event_read_flag = 0;
+	return 0;
+}
+
+/**
+ * _ctl_do_reset - main handler for MPT2HARDRESET opcode
+ * @arg - user space buffer containing ioctl content
+ */
+static long
+_ctl_do_reset(void __user *arg)
+{
+	struct mpt2_ioctl_diag_reset karg;
+	struct MPT2SAS_ADAPTER *ioc;
+	int retval;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: enter\n", ioc->name,
+	    __func__));
+
+	retval = mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+	    FORCE_BIG_HAMMER);
+	printk(MPT2SAS_INFO_FMT "host reset: %s\n",
+	    ioc->name, ((!retval) ? "SUCCESS" : "FAILED"));
+	return 0;
+}
+
+/**
+ * _ctl_btdh_search_sas_device - searching for sas device
+ * @ioc: per adapter object
+ * @btdh: btdh ioctl payload
+ */
+static int
+_ctl_btdh_search_sas_device(struct MPT2SAS_ADAPTER *ioc,
+    struct mpt2_ioctl_btdh_mapping *btdh)
+{
+	struct _sas_device *sas_device;
+	unsigned long flags;
+	int rc = 0;
+
+	if (list_empty(&ioc->sas_device_list))
+		return rc;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	list_for_each_entry(sas_device, &ioc->sas_device_list, list) {
+		if (btdh->bus == 0xFFFFFFFF && btdh->id == 0xFFFFFFFF &&
+		    btdh->handle == sas_device->handle) {
+			btdh->bus = sas_device->channel;
+			btdh->id = sas_device->id;
+			rc = 1;
+			goto out;
+		} else if (btdh->bus == sas_device->channel && btdh->id ==
+		    sas_device->id && btdh->handle == 0xFFFF) {
+			btdh->handle = sas_device->handle;
+			rc = 1;
+			goto out;
+		}
+	}
+ out:
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	return rc;
+}
+
+/**
+ * _ctl_btdh_search_raid_device - searching for raid device
+ * @ioc: per adapter object
+ * @btdh: btdh ioctl payload
+ */
+static int
+_ctl_btdh_search_raid_device(struct MPT2SAS_ADAPTER *ioc,
+    struct mpt2_ioctl_btdh_mapping *btdh)
+{
+	struct _raid_device *raid_device;
+	unsigned long flags;
+	int rc = 0;
+
+	if (list_empty(&ioc->raid_device_list))
+		return rc;
+
+	spin_lock_irqsave(&ioc->raid_device_lock, flags);
+	list_for_each_entry(raid_device, &ioc->raid_device_list, list) {
+		if (btdh->bus == 0xFFFFFFFF && btdh->id == 0xFFFFFFFF &&
+		    btdh->handle == raid_device->handle) {
+			btdh->bus = raid_device->channel;
+			btdh->id = raid_device->id;
+			rc = 1;
+			goto out;
+		} else if (btdh->bus == raid_device->channel && btdh->id ==
+		    raid_device->id && btdh->handle == 0xFFFF) {
+			btdh->handle = raid_device->handle;
+			rc = 1;
+			goto out;
+		}
+	}
+ out:
+	spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+	return rc;
+}
+
+/**
+ * _ctl_btdh_mapping - main handler for MPT2BTDHMAPPING opcode
+ * @arg - user space buffer containing ioctl content
+ */
+static long
+_ctl_btdh_mapping(void __user *arg)
+{
+	struct mpt2_ioctl_btdh_mapping karg;
+	struct MPT2SAS_ADAPTER *ioc;
+	int rc;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	rc = _ctl_btdh_search_sas_device(ioc, &karg);
+	if (!rc)
+		_ctl_btdh_search_raid_device(ioc, &karg);
+
+	if (copy_to_user(arg, &karg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	return 0;
+}
+
+/**
+ * _ctl_diag_capability - return diag buffer capability
+ * @ioc: per adapter object
+ * @buffer_type: specifies either TRACE or SNAPSHOT
+ *
+ * returns 1 when diag buffer support is enabled in firmware
+ */
+static u8
+_ctl_diag_capability(struct MPT2SAS_ADAPTER *ioc, u8 buffer_type)
+{
+	u8 rc = 0;
+
+	switch (buffer_type) {
+	case MPI2_DIAG_BUF_TYPE_TRACE:
+		if (ioc->facts.IOCCapabilities &
+		    MPI2_IOCFACTS_CAPABILITY_DIAG_TRACE_BUFFER)
+			rc = 1;
+		break;
+	case MPI2_DIAG_BUF_TYPE_SNAPSHOT:
+		if (ioc->facts.IOCCapabilities &
+		    MPI2_IOCFACTS_CAPABILITY_SNAPSHOT_BUFFER)
+			rc = 1;
+		break;
+	}
+
+	return rc;
+}
+
+/**
+ * _ctl_diag_register - application register with driver
+ * @arg - user space buffer containing ioctl content
+ * @state - NON_BLOCKING or BLOCKING
+ *
+ * This will allow the driver to setup any required buffers that will be
+ * needed by firmware to communicate with the driver.
+ */
+static long
+_ctl_diag_register(void __user *arg, enum block_state state)
+{
+	struct mpt2_diag_register karg;
+	struct MPT2SAS_ADAPTER *ioc;
+	int rc, i;
+	void *request_data = NULL;
+	dma_addr_t request_data_dma;
+	u32 request_data_sz = 0;
+	Mpi2DiagBufferPostRequest_t *mpi_request;
+	Mpi2DiagBufferPostReply_t *mpi_reply;
+	u8 buffer_type;
+	unsigned long timeleft;
+	u16 smid;
+	u16 ioc_status;
+	u8 issue_reset = 0;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	buffer_type = karg.buffer_type;
+	if (!_ctl_diag_capability(ioc, buffer_type)) {
+		printk(MPT2SAS_ERR_FMT "%s: doesn't have capability for "
+		    "buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type);
+		return -EPERM;
+	}
+
+	if (ioc->diag_buffer_status[buffer_type] &
+	    MPT2_DIAG_BUFFER_IS_REGISTERED) {
+		printk(MPT2SAS_ERR_FMT "%s: already has a registered "
+		    "buffer for buffer_type(0x%02x)\n", ioc->name, __func__,
+		    buffer_type);
+		return -EINVAL;
+	}
+
+	if (karg.requested_buffer_size % 4)  {
+		printk(MPT2SAS_ERR_FMT "%s: the requested_buffer_size "
+		    "is not 4 byte aligned\n", ioc->name, __func__);
+		return -EINVAL;
+	}
+
+	if (state == NON_BLOCKING && !mutex_trylock(&ioc->ctl_cmds.mutex))
+		return -EAGAIN;
+	else if (mutex_lock_interruptible(&ioc->ctl_cmds.mutex))
+		return -ERESTARTSYS;
+
+	if (ioc->ctl_cmds.status != MPT2_CMD_NOT_USED) {
+		printk(MPT2SAS_ERR_FMT "%s: ctl_cmd in use\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->ctl_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	rc = 0;
+	ioc->ctl_cmds.status = MPT2_CMD_PENDING;
+	memset(ioc->ctl_cmds.reply, 0, ioc->reply_sz);
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->ctl_cmds.smid = smid;
+
+	request_data = ioc->diag_buffer[buffer_type];
+	request_data_sz = karg.requested_buffer_size;
+	ioc->unique_id[buffer_type] = karg.unique_id;
+	ioc->diag_buffer_status[buffer_type] = 0;
+	memcpy(ioc->product_specific[buffer_type], karg.product_specific,
+	    MPT2_PRODUCT_SPECIFIC_DWORDS);
+	ioc->diagnostic_flags[buffer_type] = karg.diagnostic_flags;
+
+	if (request_data) {
+		request_data_dma = ioc->diag_buffer_dma[buffer_type];
+		if (request_data_sz != ioc->diag_buffer_sz[buffer_type]) {
+			pci_free_consistent(ioc->pdev,
+			    ioc->diag_buffer_sz[buffer_type],
+			    request_data, request_data_dma);
+			request_data = NULL;
+		}
+	}
+
+	if (request_data == NULL) {
+		ioc->diag_buffer_sz[buffer_type] = 0;
+		ioc->diag_buffer_dma[buffer_type] = 0;
+		request_data = pci_alloc_consistent(
+			ioc->pdev, request_data_sz, &request_data_dma);
+		if (request_data == NULL) {
+			printk(MPT2SAS_ERR_FMT "%s: failed allocating memory"
+			    " for diag buffers, requested size(%d)\n",
+			    ioc->name, __func__, request_data_sz);
+			mpt2sas_base_free_smid(ioc, smid);
+			return -ENOMEM;
+		}
+		ioc->diag_buffer[buffer_type] = request_data;
+		ioc->diag_buffer_sz[buffer_type] = request_data_sz;
+		ioc->diag_buffer_dma[buffer_type] = request_data_dma;
+	}
+
+	mpi_request->Function = MPI2_FUNCTION_DIAG_BUFFER_POST;
+	mpi_request->BufferType = karg.buffer_type;
+	mpi_request->Flags = cpu_to_le32(karg.diagnostic_flags);
+	mpi_request->BufferAddress = cpu_to_le64(request_data_dma);
+	mpi_request->BufferLength = cpu_to_le32(request_data_sz);
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: diag_buffer(0x%p), "
+	    "dma(0x%llx), sz(%d)\n", ioc->name, __func__, request_data,
+	    (unsigned long long)request_data_dma, mpi_request->BufferLength));
+
+	for (i = 0; i < MPT2_PRODUCT_SPECIFIC_DWORDS; i++)
+		mpi_request->ProductSpecific[i] =
+			cpu_to_le32(ioc->product_specific[buffer_type][i]);
+
+	mpt2sas_base_put_smid_default(ioc, smid, mpi_request->VF_ID);
+	timeleft = wait_for_completion_timeout(&ioc->ctl_cmds.done,
+	    MPT2_IOCTL_DEFAULT_TIMEOUT*HZ);
+
+	if (!(ioc->ctl_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n", ioc->name,
+		    __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2DiagBufferPostRequest_t)/4);
+		if (!(ioc->ctl_cmds.status & MPT2_CMD_RESET))
+			issue_reset = 1;
+		goto issue_host_reset;
+	}
+
+	/* process the completed Reply Message Frame */
+	if ((ioc->ctl_cmds.status & MPT2_CMD_REPLY_VALID) == 0) {
+		printk(MPT2SAS_ERR_FMT "%s: no reply message\n",
+		    ioc->name, __func__);
+		rc = -EFAULT;
+		goto out;
+	}
+
+	mpi_reply = ioc->ctl_cmds.reply;
+	ioc_status = le16_to_cpu(mpi_reply->IOCStatus) & MPI2_IOCSTATUS_MASK;
+
+	if (ioc_status == MPI2_IOCSTATUS_SUCCESS) {
+		ioc->diag_buffer_status[buffer_type] |=
+			MPT2_DIAG_BUFFER_IS_REGISTERED;
+		dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: success\n",
+		    ioc->name, __func__));
+	} else {
+		printk(MPT2SAS_DEBUG_FMT "%s: ioc_status(0x%04x) "
+		    "log_info(0x%08x)\n", ioc->name, __func__,
+		    ioc_status, mpi_reply->IOCLogInfo);
+		rc = -EFAULT;
+	}
+
+ issue_host_reset:
+	if (issue_reset)
+		mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+		    FORCE_BIG_HAMMER);
+
+ out:
+
+	if (rc && request_data)
+		pci_free_consistent(ioc->pdev, request_data_sz,
+		    request_data, request_data_dma);
+
+	ioc->ctl_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_unlock(&ioc->ctl_cmds.mutex);
+	return rc;
+}
+
+/**
+ * _ctl_diag_unregister - application unregister with driver
+ * @arg - user space buffer containing ioctl content
+ *
+ * This will allow the driver to cleanup any memory allocated for diag
+ * messages and to free up any resources.
+ */
+static long
+_ctl_diag_unregister(void __user *arg)
+{
+	struct mpt2_diag_unregister karg;
+	struct MPT2SAS_ADAPTER *ioc;
+	void *request_data;
+	dma_addr_t request_data_dma;
+	u32 request_data_sz;
+	u8 buffer_type;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	buffer_type = karg.unique_id & 0x000000ff;
+	if (!_ctl_diag_capability(ioc, buffer_type)) {
+		printk(MPT2SAS_ERR_FMT "%s: doesn't have capability for "
+		    "buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type);
+		return -EPERM;
+	}
+
+	if ((ioc->diag_buffer_status[buffer_type] &
+	    MPT2_DIAG_BUFFER_IS_REGISTERED) == 0) {
+		printk(MPT2SAS_ERR_FMT "%s: buffer_type(0x%02x) is not "
+		    "registered\n", ioc->name, __func__, buffer_type);
+		return -EINVAL;
+	}
+	if ((ioc->diag_buffer_status[buffer_type] &
+	    MPT2_DIAG_BUFFER_IS_RELEASED) == 0) {
+		printk(MPT2SAS_ERR_FMT "%s: buffer_type(0x%02x) has not been "
+		    "released\n", ioc->name, __func__, buffer_type);
+		return -EINVAL;
+	}
+
+	if (karg.unique_id != ioc->unique_id[buffer_type]) {
+		printk(MPT2SAS_ERR_FMT "%s: unique_id(0x%08x) is not "
+		    "registered\n", ioc->name, __func__, karg.unique_id);
+		return -EINVAL;
+	}
+
+	request_data = ioc->diag_buffer[buffer_type];
+	if (!request_data) {
+		printk(MPT2SAS_ERR_FMT "%s: doesn't have memory allocated for "
+		    "buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type);
+		return -ENOMEM;
+	}
+
+	request_data_sz = ioc->diag_buffer_sz[buffer_type];
+	request_data_dma = ioc->diag_buffer_dma[buffer_type];
+	pci_free_consistent(ioc->pdev, request_data_sz,
+	    request_data, request_data_dma);
+	ioc->diag_buffer[buffer_type] = NULL;
+	ioc->diag_buffer_status[buffer_type] = 0;
+	return 0;
+}
+
+/**
+ * _ctl_diag_query - query relevant info associated with diag buffers
+ * @arg - user space buffer containing ioctl content
+ *
+ * The application will send only buffer_type and unique_id.  Driver will
+ * inspect unique_id first, if valid, fill in all the info.  If unique_id is
+ * 0x00, the driver will return info specified by Buffer Type.
+ */
+static long
+_ctl_diag_query(void __user *arg)
+{
+	struct mpt2_diag_query karg;
+	struct MPT2SAS_ADAPTER *ioc;
+	void *request_data;
+	int i;
+	u8 buffer_type;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	karg.application_flags = 0;
+	buffer_type = karg.buffer_type;
+
+	if (!_ctl_diag_capability(ioc, buffer_type)) {
+		printk(MPT2SAS_ERR_FMT "%s: doesn't have capability for "
+		    "buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type);
+		return -EPERM;
+	}
+
+	if ((ioc->diag_buffer_status[buffer_type] &
+	    MPT2_DIAG_BUFFER_IS_REGISTERED) == 0) {
+		printk(MPT2SAS_ERR_FMT "%s: buffer_type(0x%02x) is not "
+		    "registered\n", ioc->name, __func__, buffer_type);
+		return -EINVAL;
+	}
+
+	if (karg.unique_id & 0xffffff00) {
+		if (karg.unique_id != ioc->unique_id[buffer_type]) {
+			printk(MPT2SAS_ERR_FMT "%s: unique_id(0x%08x) is not "
+			    "registered\n", ioc->name, __func__,
+			    karg.unique_id);
+			return -EINVAL;
+		}
+	}
+
+	request_data = ioc->diag_buffer[buffer_type];
+	if (!request_data) {
+		printk(MPT2SAS_ERR_FMT "%s: doesn't have buffer for "
+		    "buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type);
+		return -ENOMEM;
+	}
+
+	if (ioc->diag_buffer_status[buffer_type] & MPT2_DIAG_BUFFER_IS_RELEASED)
+		karg.application_flags = (MPT2_APP_FLAGS_APP_OWNED |
+		    MPT2_APP_FLAGS_BUFFER_VALID);
+	else
+		karg.application_flags = (MPT2_APP_FLAGS_APP_OWNED |
+		    MPT2_APP_FLAGS_BUFFER_VALID |
+		    MPT2_APP_FLAGS_FW_BUFFER_ACCESS);
+
+	for (i = 0; i < MPT2_PRODUCT_SPECIFIC_DWORDS; i++)
+		karg.product_specific[i] =
+		    ioc->product_specific[buffer_type][i];
+
+	karg.total_buffer_size = ioc->diag_buffer_sz[buffer_type];
+	karg.driver_added_buffer_size = 0;
+	karg.unique_id = ioc->unique_id[buffer_type];
+	karg.diagnostic_flags = ioc->diagnostic_flags[buffer_type];
+
+	if (copy_to_user(arg, &karg, sizeof(struct mpt2_diag_query))) {
+		printk(MPT2SAS_ERR_FMT "%s: unable to write mpt2_diag_query "
+		    "data @ %p\n", ioc->name, __func__, arg);
+		return -EFAULT;
+	}
+	return 0;
+}
+
+/**
+ * _ctl_diag_release - request to send Diag Release Message to firmware
+ * @arg - user space buffer containing ioctl content
+ * @state - NON_BLOCKING or BLOCKING
+ *
+ * This allows ownership of the specified buffer to returned to the driver,
+ * allowing an application to read the buffer without fear that firmware is
+ * overwritting information in the buffer.
+ */
+static long
+_ctl_diag_release(void __user *arg, enum block_state state)
+{
+	struct mpt2_diag_release karg;
+	struct MPT2SAS_ADAPTER *ioc;
+	void *request_data;
+	int rc;
+	Mpi2DiagReleaseRequest_t *mpi_request;
+	Mpi2DiagReleaseReply_t *mpi_reply;
+	u8 buffer_type;
+	unsigned long timeleft;
+	u16 smid;
+	u16 ioc_status;
+	u8 issue_reset = 0;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	buffer_type = karg.unique_id & 0x000000ff;
+	if (!_ctl_diag_capability(ioc, buffer_type)) {
+		printk(MPT2SAS_ERR_FMT "%s: doesn't have capability for "
+		    "buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type);
+		return -EPERM;
+	}
+
+	if ((ioc->diag_buffer_status[buffer_type] &
+	    MPT2_DIAG_BUFFER_IS_REGISTERED) == 0) {
+		printk(MPT2SAS_ERR_FMT "%s: buffer_type(0x%02x) is not "
+		    "registered\n", ioc->name, __func__, buffer_type);
+		return -EINVAL;
+	}
+
+	if (karg.unique_id != ioc->unique_id[buffer_type]) {
+		printk(MPT2SAS_ERR_FMT "%s: unique_id(0x%08x) is not "
+		    "registered\n", ioc->name, __func__, karg.unique_id);
+		return -EINVAL;
+	}
+
+	if (ioc->diag_buffer_status[buffer_type] &
+	    MPT2_DIAG_BUFFER_IS_RELEASED) {
+		printk(MPT2SAS_ERR_FMT "%s: buffer_type(0x%02x) "
+		    "is already released\n", ioc->name, __func__,
+		    buffer_type);
+		return 0;
+	}
+
+	request_data = ioc->diag_buffer[buffer_type];
+
+	if (!request_data) {
+		printk(MPT2SAS_ERR_FMT "%s: doesn't have memory allocated for "
+		    "buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type);
+		return -ENOMEM;
+	}
+
+	if (state == NON_BLOCKING && !mutex_trylock(&ioc->ctl_cmds.mutex))
+		return -EAGAIN;
+	else if (mutex_lock_interruptible(&ioc->ctl_cmds.mutex))
+		return -ERESTARTSYS;
+
+	if (ioc->ctl_cmds.status != MPT2_CMD_NOT_USED) {
+		printk(MPT2SAS_ERR_FMT "%s: ctl_cmd in use\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->ctl_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	rc = 0;
+	ioc->ctl_cmds.status = MPT2_CMD_PENDING;
+	memset(ioc->ctl_cmds.reply, 0, ioc->reply_sz);
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->ctl_cmds.smid = smid;
+
+	mpi_request->Function = MPI2_FUNCTION_DIAG_RELEASE;
+	mpi_request->BufferType = buffer_type;
+
+	mpt2sas_base_put_smid_default(ioc, smid, mpi_request->VF_ID);
+	timeleft = wait_for_completion_timeout(&ioc->ctl_cmds.done,
+	    MPT2_IOCTL_DEFAULT_TIMEOUT*HZ);
+
+	if (!(ioc->ctl_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n", ioc->name,
+		    __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2DiagReleaseRequest_t)/4);
+		if (!(ioc->ctl_cmds.status & MPT2_CMD_RESET))
+			issue_reset = 1;
+		goto issue_host_reset;
+	}
+
+	/* process the completed Reply Message Frame */
+	if ((ioc->ctl_cmds.status & MPT2_CMD_REPLY_VALID) == 0) {
+		printk(MPT2SAS_ERR_FMT "%s: no reply message\n",
+		    ioc->name, __func__);
+		rc = -EFAULT;
+		goto out;
+	}
+
+	mpi_reply = ioc->ctl_cmds.reply;
+	ioc_status = le16_to_cpu(mpi_reply->IOCStatus) & MPI2_IOCSTATUS_MASK;
+
+	if (ioc_status == MPI2_IOCSTATUS_SUCCESS) {
+		ioc->diag_buffer_status[buffer_type] |=
+		    MPT2_DIAG_BUFFER_IS_RELEASED;
+		dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: success\n",
+		    ioc->name, __func__));
+	} else {
+		printk(MPT2SAS_DEBUG_FMT "%s: ioc_status(0x%04x) "
+		    "log_info(0x%08x)\n", ioc->name, __func__,
+		    ioc_status, mpi_reply->IOCLogInfo);
+		rc = -EFAULT;
+	}
+
+ issue_host_reset:
+	if (issue_reset)
+		mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+		    FORCE_BIG_HAMMER);
+
+ out:
+
+	ioc->ctl_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_unlock(&ioc->ctl_cmds.mutex);
+	return rc;
+}
+
+/**
+ * _ctl_diag_read_buffer - request for copy of the diag buffer
+ * @arg - user space buffer containing ioctl content
+ * @state - NON_BLOCKING or BLOCKING
+ */
+static long
+_ctl_diag_read_buffer(void __user *arg, enum block_state state)
+{
+	struct mpt2_diag_read_buffer karg;
+	struct mpt2_diag_read_buffer __user *uarg = arg;
+	struct MPT2SAS_ADAPTER *ioc;
+	void *request_data, *diag_data;
+	Mpi2DiagBufferPostRequest_t *mpi_request;
+	Mpi2DiagBufferPostReply_t *mpi_reply;
+	int rc, i;
+	u8 buffer_type;
+	unsigned long timeleft;
+	u16 smid;
+	u16 ioc_status;
+	u8 issue_reset = 0;
+
+	if (copy_from_user(&karg, arg, sizeof(karg))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s\n", ioc->name,
+	    __func__));
+
+	buffer_type = karg.unique_id & 0x000000ff;
+	if (!_ctl_diag_capability(ioc, buffer_type)) {
+		printk(MPT2SAS_ERR_FMT "%s: doesn't have capability for "
+		    "buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type);
+		return -EPERM;
+	}
+
+	if (karg.unique_id != ioc->unique_id[buffer_type]) {
+		printk(MPT2SAS_ERR_FMT "%s: unique_id(0x%08x) is not "
+		    "registered\n", ioc->name, __func__, karg.unique_id);
+		return -EINVAL;
+	}
+
+	request_data = ioc->diag_buffer[buffer_type];
+	if (!request_data) {
+		printk(MPT2SAS_ERR_FMT "%s: doesn't have buffer for "
+		    "buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type);
+		return -ENOMEM;
+	}
+
+	if ((karg.starting_offset % 4) || (karg.bytes_to_read % 4)) {
+		printk(MPT2SAS_ERR_FMT "%s: either the starting_offset "
+		    "or bytes_to_read are not 4 byte aligned\n", ioc->name,
+		    __func__);
+		return -EINVAL;
+	}
+
+	diag_data = (void *)(request_data + karg.starting_offset);
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: diag_buffer(%p), "
+	    "offset(%d), sz(%d)\n", ioc->name, __func__,
+	    diag_data, karg.starting_offset, karg.bytes_to_read));
+
+	if (copy_to_user((void __user *)uarg->diagnostic_data,
+	    diag_data, karg.bytes_to_read)) {
+		printk(MPT2SAS_ERR_FMT "%s: Unable to write "
+		    "mpt_diag_read_buffer_t data @ %p\n", ioc->name,
+		    __func__, diag_data);
+		return -EFAULT;
+	}
+
+	if ((karg.flags & MPT2_FLAGS_REREGISTER) == 0)
+		return 0;
+
+	dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: Reregister "
+		"buffer_type(0x%02x)\n", ioc->name, __func__, buffer_type));
+	if ((ioc->diag_buffer_status[buffer_type] &
+	    MPT2_DIAG_BUFFER_IS_RELEASED) == 0) {
+		dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "buffer_type(0x%02x) is still registered\n", ioc->name,
+		     __func__, buffer_type));
+		return 0;
+	}
+	/* Get a free request frame and save the message context.
+	*/
+	if (state == NON_BLOCKING && !mutex_trylock(&ioc->ctl_cmds.mutex))
+		return -EAGAIN;
+	else if (mutex_lock_interruptible(&ioc->ctl_cmds.mutex))
+		return -ERESTARTSYS;
+
+	if (ioc->ctl_cmds.status != MPT2_CMD_NOT_USED) {
+		printk(MPT2SAS_ERR_FMT "%s: ctl_cmd in use\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->ctl_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	rc = 0;
+	ioc->ctl_cmds.status = MPT2_CMD_PENDING;
+	memset(ioc->ctl_cmds.reply, 0, ioc->reply_sz);
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->ctl_cmds.smid = smid;
+
+	mpi_request->Function = MPI2_FUNCTION_DIAG_BUFFER_POST;
+	mpi_request->BufferType = buffer_type;
+	mpi_request->BufferLength =
+	    cpu_to_le32(ioc->diag_buffer_sz[buffer_type]);
+	mpi_request->BufferAddress =
+	    cpu_to_le64(ioc->diag_buffer_dma[buffer_type]);
+	for (i = 0; i < MPT2_PRODUCT_SPECIFIC_DWORDS; i++)
+		mpi_request->ProductSpecific[i] =
+			cpu_to_le32(ioc->product_specific[buffer_type][i]);
+
+	mpt2sas_base_put_smid_default(ioc, smid, mpi_request->VF_ID);
+	timeleft = wait_for_completion_timeout(&ioc->ctl_cmds.done,
+	    MPT2_IOCTL_DEFAULT_TIMEOUT*HZ);
+
+	if (!(ioc->ctl_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n", ioc->name,
+		    __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2DiagBufferPostRequest_t)/4);
+		if (!(ioc->ctl_cmds.status & MPT2_CMD_RESET))
+			issue_reset = 1;
+		goto issue_host_reset;
+	}
+
+	/* process the completed Reply Message Frame */
+	if ((ioc->ctl_cmds.status & MPT2_CMD_REPLY_VALID) == 0) {
+		printk(MPT2SAS_ERR_FMT "%s: no reply message\n",
+		    ioc->name, __func__);
+		rc = -EFAULT;
+		goto out;
+	}
+
+	mpi_reply = ioc->ctl_cmds.reply;
+	ioc_status = le16_to_cpu(mpi_reply->IOCStatus) & MPI2_IOCSTATUS_MASK;
+
+	if (ioc_status == MPI2_IOCSTATUS_SUCCESS) {
+		ioc->diag_buffer_status[buffer_type] |=
+		    MPT2_DIAG_BUFFER_IS_REGISTERED;
+		dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: success\n",
+		    ioc->name, __func__));
+	} else {
+		printk(MPT2SAS_DEBUG_FMT "%s: ioc_status(0x%04x) "
+		    "log_info(0x%08x)\n", ioc->name, __func__,
+		    ioc_status, mpi_reply->IOCLogInfo);
+		rc = -EFAULT;
+	}
+
+ issue_host_reset:
+	if (issue_reset)
+		mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+		    FORCE_BIG_HAMMER);
+
+ out:
+
+	ioc->ctl_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_unlock(&ioc->ctl_cmds.mutex);
+	return rc;
+}
+
+/**
+ * _ctl_ioctl_main - main ioctl entry point
+ * @file - (struct file)
+ * @cmd - ioctl opcode
+ * @arg -
+ */
+static long
+_ctl_ioctl_main(struct file *file, unsigned int cmd, void __user *arg)
+{
+	enum block_state state;
+	long ret = -EINVAL;
+	unsigned long flags;
+
+	state = (file->f_flags & O_NONBLOCK) ? NON_BLOCKING :
+	    BLOCKING;
+
+	switch (cmd) {
+	case MPT2IOCINFO:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_ioctl_iocinfo))
+			ret = _ctl_getiocinfo(arg);
+		break;
+	case MPT2COMMAND:
+	{
+		struct mpt2_ioctl_command karg;
+		struct mpt2_ioctl_command __user *uarg;
+		struct MPT2SAS_ADAPTER *ioc;
+
+		if (copy_from_user(&karg, arg, sizeof(karg))) {
+			printk(KERN_ERR "failure at %s:%d/%s()!\n",
+			    __FILE__, __LINE__, __func__);
+			return -EFAULT;
+		}
+
+		if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 ||
+		    !ioc)
+			return -ENODEV;
+
+		spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+		if (ioc->shost_recovery) {
+			spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock,
+			    flags);
+			return -EAGAIN;
+		}
+		spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_ioctl_command)) {
+			uarg = arg;
+			ret = _ctl_do_mpt_command(ioc, karg, &uarg->mf, state);
+		}
+		break;
+	}
+	case MPT2EVENTQUERY:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_ioctl_eventquery))
+			ret = _ctl_eventquery(arg);
+		break;
+	case MPT2EVENTENABLE:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_ioctl_eventenable))
+			ret = _ctl_eventenable(arg);
+		break;
+	case MPT2EVENTREPORT:
+		ret = _ctl_eventreport(arg);
+		break;
+	case MPT2HARDRESET:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_ioctl_diag_reset))
+			ret = _ctl_do_reset(arg);
+		break;
+	case MPT2BTDHMAPPING:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_ioctl_btdh_mapping))
+			ret = _ctl_btdh_mapping(arg);
+		break;
+	case MPT2DIAGREGISTER:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_diag_register))
+			ret = _ctl_diag_register(arg, state);
+		break;
+	case MPT2DIAGUNREGISTER:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_diag_unregister))
+			ret = _ctl_diag_unregister(arg);
+		break;
+	case MPT2DIAGQUERY:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_diag_query))
+			ret = _ctl_diag_query(arg);
+		break;
+	case MPT2DIAGRELEASE:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_diag_release))
+			ret = _ctl_diag_release(arg, state);
+		break;
+	case MPT2DIAGREADBUFFER:
+		if (_IOC_SIZE(cmd) == sizeof(struct mpt2_diag_read_buffer))
+			ret = _ctl_diag_read_buffer(arg, state);
+		break;
+	default:
+	{
+		struct mpt2_ioctl_command karg;
+		struct MPT2SAS_ADAPTER *ioc;
+
+		if (copy_from_user(&karg, arg, sizeof(karg))) {
+			printk(KERN_ERR "failure at %s:%d/%s()!\n",
+			    __FILE__, __LINE__, __func__);
+			return -EFAULT;
+		}
+
+		if (_ctl_verify_adapter(karg.hdr.ioc_number, &ioc) == -1 ||
+		    !ioc)
+			return -ENODEV;
+
+		dctlprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+		    "unsupported ioctl opcode(0x%08x)\n", ioc->name, cmd));
+		break;
+	}
+	}
+	return ret;
+}
+
+/**
+ * _ctl_ioctl - main ioctl entry point (unlocked)
+ * @file - (struct file)
+ * @cmd - ioctl opcode
+ * @arg -
+ */
+static long
+_ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	long ret;
+	lock_kernel();
+	ret = _ctl_ioctl_main(file, cmd, (void __user *)arg);
+	unlock_kernel();
+	return ret;
+}
+
+#ifdef CONFIG_COMPAT
+/**
+ * _ctl_compat_mpt_command - convert 32bit pointers to 64bit.
+ * @file - (struct file)
+ * @cmd - ioctl opcode
+ * @arg - (struct mpt2_ioctl_command32)
+ *
+ * MPT2COMMAND32 - Handle 32bit applications running on 64bit os.
+ */
+static long
+_ctl_compat_mpt_command(struct file *file, unsigned cmd, unsigned long arg)
+{
+	struct mpt2_ioctl_command32 karg32;
+	struct mpt2_ioctl_command32 __user *uarg;
+	struct mpt2_ioctl_command karg;
+	struct MPT2SAS_ADAPTER *ioc;
+	enum block_state state;
+	unsigned long flags;
+
+	if (_IOC_SIZE(cmd) != sizeof(struct mpt2_ioctl_command32))
+		return -EINVAL;
+
+	uarg = (struct mpt2_ioctl_command32 __user *) arg;
+
+	if (copy_from_user(&karg32, (char __user *)arg, sizeof(karg32))) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n",
+		    __FILE__, __LINE__, __func__);
+		return -EFAULT;
+	}
+	if (_ctl_verify_adapter(karg32.hdr.ioc_number, &ioc) == -1 || !ioc)
+		return -ENODEV;
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	if (ioc->shost_recovery) {
+		spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock,
+		    flags);
+		return -EAGAIN;
+	}
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+	memset(&karg, 0, sizeof(struct mpt2_ioctl_command));
+	karg.hdr.ioc_number = karg32.hdr.ioc_number;
+	karg.hdr.port_number = karg32.hdr.port_number;
+	karg.hdr.max_data_size = karg32.hdr.max_data_size;
+	karg.timeout = karg32.timeout;
+	karg.max_reply_bytes = karg32.max_reply_bytes;
+	karg.data_in_size = karg32.data_in_size;
+	karg.data_out_size = karg32.data_out_size;
+	karg.max_sense_bytes = karg32.max_sense_bytes;
+	karg.data_sge_offset = karg32.data_sge_offset;
+	memcpy(&karg.reply_frame_buf_ptr, &karg32.reply_frame_buf_ptr,
+	    sizeof(uint32_t));
+	memcpy(&karg.data_in_buf_ptr, &karg32.data_in_buf_ptr,
+	    sizeof(uint32_t));
+	memcpy(&karg.data_out_buf_ptr, &karg32.data_out_buf_ptr,
+	    sizeof(uint32_t));
+	memcpy(&karg.sense_data_ptr, &karg32.sense_data_ptr,
+	    sizeof(uint32_t));
+	state = (file->f_flags & O_NONBLOCK) ? NON_BLOCKING : BLOCKING;
+	return _ctl_do_mpt_command(ioc, karg, &uarg->mf, state);
+}
+
+/**
+ * _ctl_ioctl_compat - main ioctl entry point (compat)
+ * @file -
+ * @cmd -
+ * @arg -
+ *
+ * This routine handles 32 bit applications in 64bit os.
+ */
+static long
+_ctl_ioctl_compat(struct file *file, unsigned cmd, unsigned long arg)
+{
+	long ret;
+	lock_kernel();
+	if (cmd == MPT2COMMAND32)
+		ret = _ctl_compat_mpt_command(file, cmd, arg);
+	else
+		ret = _ctl_ioctl_main(file, cmd, (void __user *)arg);
+	unlock_kernel();
+	return ret;
+}
+#endif
+
+/* scsi host attributes */
+
+/**
+ * _ctl_version_fw_show - firmware version
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_version_fw_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, PAGE_SIZE, "%02d.%02d.%02d.%02d\n",
+	    (ioc->facts.FWVersion.Word & 0xFF000000) >> 24,
+	    (ioc->facts.FWVersion.Word & 0x00FF0000) >> 16,
+	    (ioc->facts.FWVersion.Word & 0x0000FF00) >> 8,
+	    ioc->facts.FWVersion.Word & 0x000000FF);
+}
+static DEVICE_ATTR(version_fw, S_IRUGO, _ctl_version_fw_show, NULL);
+
+/**
+ * _ctl_version_bios_show - bios version
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_version_bios_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	u32 version = le32_to_cpu(ioc->bios_pg3.BiosVersion);
+
+	return snprintf(buf, PAGE_SIZE, "%02d.%02d.%02d.%02d\n",
+	    (version & 0xFF000000) >> 24,
+	    (version & 0x00FF0000) >> 16,
+	    (version & 0x0000FF00) >> 8,
+	    version & 0x000000FF);
+}
+static DEVICE_ATTR(version_bios, S_IRUGO, _ctl_version_bios_show, NULL);
+
+/**
+ * _ctl_version_mpi_show - MPI (message passing interface) version
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_version_mpi_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, PAGE_SIZE, "%03x.%02x\n",
+	    ioc->facts.MsgVersion, ioc->facts.HeaderVersion >> 8);
+}
+static DEVICE_ATTR(version_mpi, S_IRUGO, _ctl_version_mpi_show, NULL);
+
+/**
+ * _ctl_version_product_show - product name
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_version_product_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, 16, "%s\n", ioc->manu_pg0.ChipName);
+}
+static DEVICE_ATTR(version_product, S_IRUGO,
+   _ctl_version_product_show, NULL);
+
+/**
+ * _ctl_version_nvdata_persistent_show - ndvata persistent version
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_version_nvdata_persistent_show(struct device *cdev,
+    struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, PAGE_SIZE, "%02xh\n",
+	    le16_to_cpu(ioc->iounit_pg0.NvdataVersionPersistent.Word));
+}
+static DEVICE_ATTR(version_nvdata_persistent, S_IRUGO,
+    _ctl_version_nvdata_persistent_show, NULL);
+
+/**
+ * _ctl_version_nvdata_default_show - nvdata default version
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_version_nvdata_default_show(struct device *cdev,
+    struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, PAGE_SIZE, "%02xh\n",
+	    le16_to_cpu(ioc->iounit_pg0.NvdataVersionDefault.Word));
+}
+static DEVICE_ATTR(version_nvdata_default, S_IRUGO,
+    _ctl_version_nvdata_default_show, NULL);
+
+/**
+ * _ctl_board_name_show - board name
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_board_name_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, 16, "%s\n", ioc->manu_pg0.BoardName);
+}
+static DEVICE_ATTR(board_name, S_IRUGO, _ctl_board_name_show, NULL);
+
+/**
+ * _ctl_board_assembly_show - board assembly name
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_board_assembly_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, 16, "%s\n", ioc->manu_pg0.BoardAssembly);
+}
+static DEVICE_ATTR(board_assembly, S_IRUGO,
+    _ctl_board_assembly_show, NULL);
+
+/**
+ * _ctl_board_tracer_show - board tracer number
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_board_tracer_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, 16, "%s\n", ioc->manu_pg0.BoardTracerNumber);
+}
+static DEVICE_ATTR(board_tracer, S_IRUGO,
+    _ctl_board_tracer_show, NULL);
+
+/**
+ * _ctl_io_delay_show - io missing delay
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * This is for firmware implemention for deboucing device
+ * removal events.
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_io_delay_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, PAGE_SIZE, "%02d\n", ioc->io_missing_delay);
+}
+static DEVICE_ATTR(io_delay, S_IRUGO,
+    _ctl_io_delay_show, NULL);
+
+/**
+ * _ctl_device_delay_show - device missing delay
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * This is for firmware implemention for deboucing device
+ * removal events.
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_device_delay_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, PAGE_SIZE, "%02d\n", ioc->device_missing_delay);
+}
+static DEVICE_ATTR(device_delay, S_IRUGO,
+    _ctl_device_delay_show, NULL);
+
+/**
+ * _ctl_fw_queue_depth_show - global credits
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * This is firmware queue depth limit
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_fw_queue_depth_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, PAGE_SIZE, "%02d\n", ioc->facts.RequestCredit);
+}
+static DEVICE_ATTR(fw_queue_depth, S_IRUGO,
+    _ctl_fw_queue_depth_show, NULL);
+
+/**
+ * _ctl_sas_address_show - sas address
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * This is the controller sas address
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_host_sas_address_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, PAGE_SIZE, "0x%016llx\n",
+	    (unsigned long long)ioc->sas_hba.sas_address);
+}
+static DEVICE_ATTR(host_sas_address, S_IRUGO,
+    _ctl_host_sas_address_show, NULL);
+
+/**
+ * _ctl_logging_level_show - logging level
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * A sysfs 'read/write' shost attribute.
+ */
+static ssize_t
+_ctl_logging_level_show(struct device *cdev, struct device_attribute *attr,
+    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	return snprintf(buf, PAGE_SIZE, "%08xh\n", ioc->logging_level);
+}
+static ssize_t
+_ctl_logging_level_store(struct device *cdev, struct device_attribute *attr,
+    const char *buf, size_t count)
+{
+	struct Scsi_Host *shost = class_to_shost(cdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	int val = 0;
+
+	if (sscanf(buf, "%x", &val) != 1)
+		return -EINVAL;
+
+	ioc->logging_level = val;
+	printk(MPT2SAS_INFO_FMT "logging_level=%08xh\n", ioc->name,
+	    ioc->logging_level);
+	return strlen(buf);
+}
+static DEVICE_ATTR(logging_level, S_IRUGO | S_IWUSR,
+    _ctl_logging_level_show, _ctl_logging_level_store);
+
+struct device_attribute *mpt2sas_host_attrs[] = {
+	&dev_attr_version_fw,
+	&dev_attr_version_bios,
+	&dev_attr_version_mpi,
+	&dev_attr_version_product,
+	&dev_attr_version_nvdata_persistent,
+	&dev_attr_version_nvdata_default,
+	&dev_attr_board_name,
+	&dev_attr_board_assembly,
+	&dev_attr_board_tracer,
+	&dev_attr_io_delay,
+	&dev_attr_device_delay,
+	&dev_attr_logging_level,
+	&dev_attr_fw_queue_depth,
+	&dev_attr_host_sas_address,
+	NULL,
+};
+
+/* device attributes */
+
+/**
+ * _ctl_device_sas_address_show - sas address
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * This is the sas address for the target
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_device_sas_address_show(struct device *dev, struct device_attribute *attr,
+    char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(dev);
+	struct MPT2SAS_DEVICE *sas_device_priv_data = sdev->hostdata;
+
+	return snprintf(buf, PAGE_SIZE, "0x%016llx\n",
+	    (unsigned long long)sas_device_priv_data->sas_target->sas_address);
+}
+static DEVICE_ATTR(sas_address, S_IRUGO, _ctl_device_sas_address_show, NULL);
+
+/**
+ * _ctl_device_handle_show - device handle
+ * @cdev - pointer to embedded class device
+ * @buf - the buffer returned
+ *
+ * This is the firmware assigned device handle
+ *
+ * A sysfs 'read-only' shost attribute.
+ */
+static ssize_t
+_ctl_device_handle_show(struct device *dev, struct device_attribute *attr,
+    char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(dev);
+	struct MPT2SAS_DEVICE *sas_device_priv_data = sdev->hostdata;
+
+	return snprintf(buf, PAGE_SIZE, "0x%04x\n",
+	    sas_device_priv_data->sas_target->handle);
+}
+static DEVICE_ATTR(sas_device_handle, S_IRUGO, _ctl_device_handle_show, NULL);
+
+struct device_attribute *mpt2sas_dev_attrs[] = {
+	&dev_attr_sas_address,
+	&dev_attr_sas_device_handle,
+	NULL,
+};
+
+static const struct file_operations ctl_fops = {
+	.owner = THIS_MODULE,
+	.unlocked_ioctl = _ctl_ioctl,
+	.release = _ctl_release,
+	.poll = _ctl_poll,
+	.fasync = _ctl_fasync,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl = _ctl_ioctl_compat,
+#endif
+};
+
+static struct miscdevice ctl_dev = {
+	.minor  = MPT2SAS_MINOR,
+	.name   = MPT2SAS_DEV_NAME,
+	.fops   = &ctl_fops,
+};
+
+/**
+ * mpt2sas_ctl_init - main entry point for ctl.
+ *
+ */
+void
+mpt2sas_ctl_init(void)
+{
+	async_queue = NULL;
+	if (misc_register(&ctl_dev) < 0)
+		printk(KERN_ERR "%s can't register misc device [minor=%d]\n",
+		    MPT2SAS_DRIVER_NAME, MPT2SAS_MINOR);
+
+	init_waitqueue_head(&ctl_poll_wait);
+}
+
+/**
+ * mpt2sas_ctl_exit - exit point for ctl
+ *
+ */
+void
+mpt2sas_ctl_exit(void)
+{
+	struct MPT2SAS_ADAPTER *ioc;
+	int i;
+
+	list_for_each_entry(ioc, &mpt2sas_ioc_list, list) {
+
+		/* free memory associated to diag buffers */
+		for (i = 0; i < MPI2_DIAG_BUF_TYPE_COUNT; i++) {
+			if (!ioc->diag_buffer[i])
+				continue;
+			pci_free_consistent(ioc->pdev, ioc->diag_buffer_sz[i],
+			    ioc->diag_buffer[i], ioc->diag_buffer_dma[i]);
+			ioc->diag_buffer[i] = NULL;
+			ioc->diag_buffer_status[i] = 0;
+		}
+
+		kfree(ioc->event_log);
+	}
+	misc_deregister(&ctl_dev);
+}
+
diff --git a/drivers/scsi/mpt2sas/mpt2sas_ctl.h b/drivers/scsi/mpt2sas/mpt2sas_ctl.h
new file mode 100644
index 000000000000..dbb6c0cf8889
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpt2sas_ctl.h
@@ -0,0 +1,416 @@
+/*
+ * Management Module Support for MPT (Message Passing Technology) based
+ * controllers
+ *
+ * This code is based on drivers/scsi/mpt2sas/mpt2_ctl.h
+ * Copyright (C) 2007-2008  LSI Corporation
+ *  (mailto:DL-MPTFusionLinux@lsi.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * NO WARRANTY
+ * THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT
+ * LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is
+ * solely responsible for determining the appropriateness of using and
+ * distributing the Program and assumes all risks associated with its
+ * exercise of rights under this Agreement, including but not limited to
+ * the risks and costs of program errors, damage to or loss of data,
+ * programs or equipment, and unavailability or interruption of operations.
+
+ * DISCLAIMER OF LIABILITY
+ * NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+ * TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
+ * HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
+ * USA.
+ */
+
+#ifndef MPT2SAS_CTL_H_INCLUDED
+#define MPT2SAS_CTL_H_INCLUDED
+
+#ifdef __KERNEL__
+#include <linux/miscdevice.h>
+#endif
+
+#define MPT2SAS_DEV_NAME	"mpt2ctl"
+#define MPT2_MAGIC_NUMBER	'm'
+#define MPT2_IOCTL_DEFAULT_TIMEOUT (10) /* in seconds */
+
+/**
+ * IOCTL opcodes
+ */
+#define MPT2IOCINFO	_IOWR(MPT2_MAGIC_NUMBER, 17, \
+    struct mpt2_ioctl_iocinfo)
+#define MPT2COMMAND	_IOWR(MPT2_MAGIC_NUMBER, 20, \
+    struct mpt2_ioctl_command)
+#ifdef CONFIG_COMPAT
+#define MPT2COMMAND32	_IOWR(MPT2_MAGIC_NUMBER, 20, \
+    struct mpt2_ioctl_command32)
+#endif
+#define MPT2EVENTQUERY	_IOWR(MPT2_MAGIC_NUMBER, 21, \
+    struct mpt2_ioctl_eventquery)
+#define MPT2EVENTENABLE	_IOWR(MPT2_MAGIC_NUMBER, 22, \
+    struct mpt2_ioctl_eventenable)
+#define MPT2EVENTREPORT	_IOWR(MPT2_MAGIC_NUMBER, 23, \
+    struct mpt2_ioctl_eventreport)
+#define MPT2HARDRESET	_IOWR(MPT2_MAGIC_NUMBER, 24, \
+    struct mpt2_ioctl_diag_reset)
+#define MPT2BTDHMAPPING	_IOWR(MPT2_MAGIC_NUMBER, 31, \
+    struct mpt2_ioctl_btdh_mapping)
+
+/* diag buffer support */
+#define MPT2DIAGREGISTER _IOWR(MPT2_MAGIC_NUMBER, 26, \
+    struct mpt2_diag_register)
+#define MPT2DIAGRELEASE	_IOWR(MPT2_MAGIC_NUMBER, 27, \
+    struct mpt2_diag_release)
+#define MPT2DIAGUNREGISTER _IOWR(MPT2_MAGIC_NUMBER, 28, \
+    struct mpt2_diag_unregister)
+#define MPT2DIAGQUERY	_IOWR(MPT2_MAGIC_NUMBER, 29, \
+    struct mpt2_diag_query)
+#define MPT2DIAGREADBUFFER _IOWR(MPT2_MAGIC_NUMBER, 30, \
+    struct mpt2_diag_read_buffer)
+
+/**
+ * struct mpt2_ioctl_header - main header structure
+ * @ioc_number -  IOC unit number
+ * @port_number - IOC port number
+ * @max_data_size - maximum number bytes to transfer on read
+ */
+struct mpt2_ioctl_header {
+	uint32_t ioc_number;
+	uint32_t port_number;
+	uint32_t max_data_size;
+};
+
+/**
+ * struct mpt2_ioctl_diag_reset - diagnostic reset
+ * @hdr - generic header
+ */
+struct mpt2_ioctl_diag_reset {
+	struct mpt2_ioctl_header hdr;
+};
+
+
+/**
+ * struct mpt2_ioctl_pci_info - pci device info
+ * @device - pci device id
+ * @function - pci function id
+ * @bus - pci bus id
+ * @segment_id - pci segment id
+ */
+struct mpt2_ioctl_pci_info {
+	union {
+		struct {
+			uint32_t device:5;
+			uint32_t function:3;
+			uint32_t bus:24;
+		} bits;
+		uint32_t  word;
+	} u;
+	uint32_t segment_id;
+};
+
+
+#define MPT2_IOCTL_INTERFACE_SCSI	(0x00)
+#define MPT2_IOCTL_INTERFACE_FC		(0x01)
+#define MPT2_IOCTL_INTERFACE_FC_IP	(0x02)
+#define MPT2_IOCTL_INTERFACE_SAS	(0x03)
+#define MPT2_IOCTL_INTERFACE_SAS2	(0x04)
+#define MPT2_IOCTL_VERSION_LENGTH	(32)
+
+/**
+ * struct mpt2_ioctl_iocinfo - generic controller info
+ * @hdr - generic header
+ * @adapter_type - type of adapter (spi, fc, sas)
+ * @port_number - port number
+ * @pci_id - PCI Id
+ * @hw_rev - hardware revision
+ * @sub_system_device - PCI subsystem Device ID
+ * @sub_system_vendor - PCI subsystem Vendor ID
+ * @rsvd0 - reserved
+ * @firmware_version - firmware version
+ * @bios_version - BIOS version
+ * @driver_version - driver version - 32 ASCII characters
+ * @rsvd1 - reserved
+ * @scsi_id - scsi id of adapter 0
+ * @rsvd2 - reserved
+ * @pci_information - pci info (2nd revision)
+ */
+struct mpt2_ioctl_iocinfo {
+	struct mpt2_ioctl_header hdr;
+	uint32_t adapter_type;
+	uint32_t port_number;
+	uint32_t pci_id;
+	uint32_t hw_rev;
+	uint32_t subsystem_device;
+	uint32_t subsystem_vendor;
+	uint32_t rsvd0;
+	uint32_t firmware_version;
+	uint32_t bios_version;
+	uint8_t driver_version[MPT2_IOCTL_VERSION_LENGTH];
+	uint8_t rsvd1;
+	uint8_t scsi_id;
+	uint16_t rsvd2;
+	struct mpt2_ioctl_pci_info pci_information;
+};
+
+
+/* number of event log entries */
+#define MPT2SAS_CTL_EVENT_LOG_SIZE (50)
+
+/**
+ * struct mpt2_ioctl_eventquery - query event count and type
+ * @hdr - generic header
+ * @event_entries - number of events returned by get_event_report
+ * @rsvd - reserved
+ * @event_types - type of events currently being captured
+ */
+struct mpt2_ioctl_eventquery {
+	struct mpt2_ioctl_header hdr;
+	uint16_t event_entries;
+	uint16_t rsvd;
+	uint32_t event_types[MPI2_EVENT_NOTIFY_EVENTMASK_WORDS];
+};
+
+/**
+ * struct mpt2_ioctl_eventenable - enable/disable event capturing
+ * @hdr - generic header
+ * @event_types - toggle off/on type of events to be captured
+ */
+struct mpt2_ioctl_eventenable {
+	struct mpt2_ioctl_header hdr;
+	uint32_t event_types[4];
+};
+
+#define MPT2_EVENT_DATA_SIZE (192)
+/**
+ * struct MPT2_IOCTL_EVENTS -
+ * @event - the event that was reported
+ * @context - unique value for each event assigned by driver
+ * @data - event data returned in fw reply message
+ */
+struct MPT2_IOCTL_EVENTS {
+	uint32_t event;
+	uint32_t context;
+	uint8_t data[MPT2_EVENT_DATA_SIZE];
+};
+
+/**
+ * struct mpt2_ioctl_eventreport - returing event log
+ * @hdr - generic header
+ * @event_data - (see struct MPT2_IOCTL_EVENTS)
+ */
+struct mpt2_ioctl_eventreport {
+	struct mpt2_ioctl_header hdr;
+	struct MPT2_IOCTL_EVENTS event_data[1];
+};
+
+/**
+ * struct mpt2_ioctl_command - generic mpt firmware passthru ioclt
+ * @hdr - generic header
+ * @timeout - command timeout in seconds. (if zero then use driver default
+ *  value).
+ * @reply_frame_buf_ptr - reply location
+ * @data_in_buf_ptr - destination for read
+ * @data_out_buf_ptr - data source for write
+ * @sense_data_ptr - sense data location
+ * @max_reply_bytes - maximum number of reply bytes to be sent to app.
+ * @data_in_size - number bytes for data transfer in (read)
+ * @data_out_size - number bytes for data transfer out (write)
+ * @max_sense_bytes - maximum number of bytes for auto sense buffers
+ * @data_sge_offset - offset in words from the start of the request message to
+ * the first SGL
+ * @mf[1];
+ */
+struct mpt2_ioctl_command {
+	struct mpt2_ioctl_header hdr;
+	uint32_t timeout;
+	void __user *reply_frame_buf_ptr;
+	void __user *data_in_buf_ptr;
+	void __user *data_out_buf_ptr;
+	void __user *sense_data_ptr;
+	uint32_t max_reply_bytes;
+	uint32_t data_in_size;
+	uint32_t data_out_size;
+	uint32_t max_sense_bytes;
+	uint32_t data_sge_offset;
+	uint8_t mf[1];
+};
+
+#ifdef CONFIG_COMPAT
+struct mpt2_ioctl_command32 {
+	struct mpt2_ioctl_header hdr;
+	uint32_t timeout;
+	uint32_t reply_frame_buf_ptr;
+	uint32_t data_in_buf_ptr;
+	uint32_t data_out_buf_ptr;
+	uint32_t sense_data_ptr;
+	uint32_t max_reply_bytes;
+	uint32_t data_in_size;
+	uint32_t data_out_size;
+	uint32_t max_sense_bytes;
+	uint32_t data_sge_offset;
+	uint8_t mf[1];
+};
+#endif
+
+/**
+ * struct mpt2_ioctl_btdh_mapping - mapping info
+ * @hdr - generic header
+ * @id - target device identification number
+ * @bus - SCSI bus number that the target device exists on
+ * @handle - device handle for the target device
+ * @rsvd - reserved
+ *
+ * To obtain a bus/id the application sets
+ * handle to valid handle, and bus/id to 0xFFFF.
+ *
+ * To obtain the device handle the application sets
+ * bus/id valid value, and the handle to 0xFFFF.
+ */
+struct mpt2_ioctl_btdh_mapping {
+	struct mpt2_ioctl_header hdr;
+	uint32_t id;
+	uint32_t bus;
+	uint16_t handle;
+	uint16_t rsvd;
+};
+
+
+/* status bits for ioc->diag_buffer_status */
+#define MPT2_DIAG_BUFFER_IS_REGISTERED 	(0x01)
+#define MPT2_DIAG_BUFFER_IS_RELEASED 	(0x02)
+
+/* application flags for mpt2_diag_register, mpt2_diag_query */
+#define MPT2_APP_FLAGS_APP_OWNED	(0x0001)
+#define MPT2_APP_FLAGS_BUFFER_VALID	(0x0002)
+#define MPT2_APP_FLAGS_FW_BUFFER_ACCESS	(0x0004)
+
+/* flags for mpt2_diag_read_buffer */
+#define MPT2_FLAGS_REREGISTER		(0x0001)
+
+#define MPT2_PRODUCT_SPECIFIC_DWORDS 		23
+
+/**
+ * struct mpt2_diag_register - application register with driver
+ * @hdr - generic header
+ * @reserved -
+ * @buffer_type - specifies either TRACE or SNAPSHOT
+ * @application_flags - misc flags
+ * @diagnostic_flags - specifies flags affecting command processing
+ * @product_specific - product specific information
+ * @requested_buffer_size - buffers size in bytes
+ * @unique_id - tag specified by application that is used to signal ownership
+ *  of the buffer.
+ *
+ * This will allow the driver to setup any required buffers that will be
+ * needed by firmware to communicate with the driver.
+ */
+struct mpt2_diag_register {
+	struct mpt2_ioctl_header hdr;
+	uint8_t reserved;
+	uint8_t buffer_type;
+	uint16_t application_flags;
+	uint32_t diagnostic_flags;
+	uint32_t product_specific[MPT2_PRODUCT_SPECIFIC_DWORDS];
+	uint32_t requested_buffer_size;
+	uint32_t unique_id;
+};
+
+/**
+ * struct mpt2_diag_unregister - application unregister with driver
+ * @hdr - generic header
+ * @unique_id - tag uniquely identifies the buffer to be unregistered
+ *
+ * This will allow the driver to cleanup any memory allocated for diag
+ * messages and to free up any resources.
+ */
+struct mpt2_diag_unregister {
+	struct mpt2_ioctl_header hdr;
+	uint32_t unique_id;
+};
+
+/**
+ * struct mpt2_diag_query - query relevant info associated with diag buffers
+ * @hdr - generic header
+ * @reserved -
+ * @buffer_type - specifies either TRACE or SNAPSHOT
+ * @application_flags - misc flags
+ * @diagnostic_flags - specifies flags affecting command processing
+ * @product_specific - product specific information
+ * @total_buffer_size - diag buffer size in bytes
+ * @driver_added_buffer_size - size of extra space appended to end of buffer
+ * @unique_id - unique id associated with this buffer.
+ *
+ * The application will send only buffer_type and unique_id.  Driver will
+ * inspect unique_id first, if valid, fill in all the info.  If unique_id is
+ * 0x00, the driver will return info specified by Buffer Type.
+ */
+struct mpt2_diag_query {
+	struct mpt2_ioctl_header hdr;
+	uint8_t reserved;
+	uint8_t buffer_type;
+	uint16_t application_flags;
+	uint32_t diagnostic_flags;
+	uint32_t product_specific[MPT2_PRODUCT_SPECIFIC_DWORDS];
+	uint32_t total_buffer_size;
+	uint32_t driver_added_buffer_size;
+	uint32_t unique_id;
+};
+
+/**
+ * struct mpt2_diag_release -  request to send Diag Release Message to firmware
+ * @hdr - generic header
+ * @unique_id - tag uniquely identifies the buffer to be released
+ *
+ * This allows ownership of the specified buffer to returned to the driver,
+ * allowing an application to read the buffer without fear that firmware is
+ * overwritting information in the buffer.
+ */
+struct mpt2_diag_release {
+	struct mpt2_ioctl_header hdr;
+	uint32_t unique_id;
+};
+
+/**
+ * struct mpt2_diag_read_buffer - request for copy of the diag buffer
+ * @hdr - generic header
+ * @status -
+ * @reserved -
+ * @flags - misc flags
+ * @starting_offset - starting offset within drivers buffer where to start
+ *  reading data at into the specified application buffer
+ * @bytes_to_read - number of bytes to copy from the drivers buffer into the
+ *  application buffer starting at starting_offset.
+ * @unique_id - unique id associated with this buffer.
+ * @diagnostic_data - data payload
+ */
+struct mpt2_diag_read_buffer {
+	struct mpt2_ioctl_header hdr;
+	uint8_t status;
+	uint8_t reserved;
+	uint16_t flags;
+	uint32_t starting_offset;
+	uint32_t bytes_to_read;
+	uint32_t unique_id;
+	uint32_t diagnostic_data[1];
+};
+
+#endif /* MPT2SAS_CTL_H_INCLUDED */
diff --git a/drivers/scsi/mpt2sas/mpt2sas_debug.h b/drivers/scsi/mpt2sas/mpt2sas_debug.h
new file mode 100644
index 000000000000..ad325096e842
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpt2sas_debug.h
@@ -0,0 +1,181 @@
+/*
+ * Logging Support for MPT (Message Passing Technology) based controllers
+ *
+ * This code is based on drivers/scsi/mpt2sas/mpt2_debug.c
+ * Copyright (C) 2007-2008  LSI Corporation
+ *  (mailto:DL-MPTFusionLinux@lsi.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * NO WARRANTY
+ * THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT
+ * LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is
+ * solely responsible for determining the appropriateness of using and
+ * distributing the Program and assumes all risks associated with its
+ * exercise of rights under this Agreement, including but not limited to
+ * the risks and costs of program errors, damage to or loss of data,
+ * programs or equipment, and unavailability or interruption of operations.
+
+ * DISCLAIMER OF LIABILITY
+ * NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+ * TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
+ * HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
+ * USA.
+ */
+
+#ifndef MPT2SAS_DEBUG_H_INCLUDED
+#define MPT2SAS_DEBUG_H_INCLUDED
+
+#define MPT_DEBUG			0x00000001
+#define MPT_DEBUG_MSG_FRAME		0x00000002
+#define MPT_DEBUG_SG			0x00000004
+#define MPT_DEBUG_EVENTS		0x00000008
+#define MPT_DEBUG_EVENT_WORK_TASK	0x00000010
+#define MPT_DEBUG_INIT			0x00000020
+#define MPT_DEBUG_EXIT			0x00000040
+#define MPT_DEBUG_FAIL			0x00000080
+#define MPT_DEBUG_TM			0x00000100
+#define MPT_DEBUG_REPLY			0x00000200
+#define MPT_DEBUG_HANDSHAKE		0x00000400
+#define MPT_DEBUG_CONFIG		0x00000800
+#define MPT_DEBUG_DL			0x00001000
+#define MPT_DEBUG_RESET			0x00002000
+#define MPT_DEBUG_SCSI			0x00004000
+#define MPT_DEBUG_IOCTL			0x00008000
+#define MPT_DEBUG_CSMISAS		0x00010000
+#define MPT_DEBUG_SAS			0x00020000
+#define MPT_DEBUG_TRANSPORT		0x00040000
+#define MPT_DEBUG_TASK_SET_FULL		0x00080000
+
+#define MPT_DEBUG_TARGET_MODE		0x00100000
+
+
+/*
+ * CONFIG_SCSI_MPT2SAS_LOGGING - enabled in Kconfig
+ */
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+#define MPT_CHECK_LOGGING(IOC, CMD, BITS)			\
+{								\
+	if (IOC->logging_level & BITS)				\
+		CMD;						\
+}
+#else
+#define MPT_CHECK_LOGGING(IOC, CMD, BITS)
+#endif /* CONFIG_SCSI_MPT2SAS_LOGGING */
+
+
+/*
+ * debug macros
+ */
+
+#define dprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG)
+
+#define dsgprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_SG)
+
+#define devtprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_EVENTS)
+
+#define dewtprintk(IOC, CMD)		\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_EVENT_WORK_TASK)
+
+#define dinitprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_INIT)
+
+#define dexitprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_EXIT)
+
+#define dfailprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_FAIL)
+
+#define dtmprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_TM)
+
+#define dreplyprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_REPLY)
+
+#define dhsprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_HANDSHAKE)
+
+#define dcprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_CONFIG)
+
+#define ddlprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_DL)
+
+#define drsprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_RESET)
+
+#define dsprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_SCSI)
+
+#define dctlprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_IOCTL)
+
+#define dcsmisasprintk(IOC, CMD)		\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_CSMISAS)
+
+#define dsasprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_SAS)
+
+#define dsastransport(IOC, CMD)		\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_SAS_WIDE)
+
+#define dmfprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_MSG_FRAME)
+
+#define dtsfprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_TASK_SET_FULL)
+
+#define dtransportprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_TRANSPORT)
+
+#define dTMprintk(IOC, CMD)			\
+	MPT_CHECK_LOGGING(IOC, CMD, MPT_DEBUG_TARGET_MODE)
+
+/* inline functions for dumping debug data*/
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _debug_dump_mf - print message frame contents
+ * @mpi_request: pointer to message frame
+ * @sz: number of dwords
+ */
+static inline void
+_debug_dump_mf(void *mpi_request, int sz)
+{
+	int i;
+	u32 *mfp = (u32 *)mpi_request;
+
+	printk(KERN_INFO "mf:\n\t");
+	for (i = 0; i < sz; i++) {
+		if (i && ((i % 8) == 0))
+			printk("\n\t");
+		printk("%08x ", le32_to_cpu(mfp[i]));
+	}
+	printk("\n");
+}
+#else
+#define _debug_dump_mf(mpi_request, sz)
+#endif /* CONFIG_SCSI_MPT2SAS_LOGGING */
+
+#endif /* MPT2SAS_DEBUG_H_INCLUDED */
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
new file mode 100644
index 000000000000..0c463c483c02
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -0,0 +1,5687 @@
+/*
+ * Scsi Host Layer for MPT (Message Passing Technology) based controllers
+ *
+ * This code is based on drivers/scsi/mpt2sas/mpt2_scsih.c
+ * Copyright (C) 2007-2008  LSI Corporation
+ *  (mailto:DL-MPTFusionLinux@lsi.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * NO WARRANTY
+ * THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT
+ * LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is
+ * solely responsible for determining the appropriateness of using and
+ * distributing the Program and assumes all risks associated with its
+ * exercise of rights under this Agreement, including but not limited to
+ * the risks and costs of program errors, damage to or loss of data,
+ * programs or equipment, and unavailability or interruption of operations.
+
+ * DISCLAIMER OF LIABILITY
+ * NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+ * TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
+ * HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
+ * USA.
+ */
+
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/blkdev.h>
+#include <linux/sched.h>
+#include <linux/workqueue.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+
+#include "mpt2sas_base.h"
+
+MODULE_AUTHOR(MPT2SAS_AUTHOR);
+MODULE_DESCRIPTION(MPT2SAS_DESCRIPTION);
+MODULE_LICENSE("GPL");
+MODULE_VERSION(MPT2SAS_DRIVER_VERSION);
+
+#define RAID_CHANNEL 1
+
+/* forward proto's */
+static void _scsih_expander_node_remove(struct MPT2SAS_ADAPTER *ioc,
+    struct _sas_node *sas_expander);
+static void _firmware_event_work(struct work_struct *work);
+
+/* global parameters */
+LIST_HEAD(mpt2sas_ioc_list);
+
+/* local parameters */
+static u8 scsi_io_cb_idx = -1;
+static u8 tm_cb_idx = -1;
+static u8 ctl_cb_idx = -1;
+static u8 base_cb_idx = -1;
+static u8 transport_cb_idx = -1;
+static u8 config_cb_idx = -1;
+static int mpt_ids;
+
+/* command line options */
+static u32 logging_level;
+MODULE_PARM_DESC(logging_level, " bits for enabling additional logging info "
+    "(default=0)");
+
+/* scsi-mid layer global parmeter is max_report_luns, which is 511 */
+#define MPT2SAS_MAX_LUN (16895)
+static int max_lun = MPT2SAS_MAX_LUN;
+module_param(max_lun, int, 0);
+MODULE_PARM_DESC(max_lun, " max lun, default=16895 ");
+
+/**
+ * struct sense_info - common structure for obtaining sense keys
+ * @skey: sense key
+ * @asc: additional sense code
+ * @ascq: additional sense code qualifier
+ */
+struct sense_info {
+	u8 skey;
+	u8 asc;
+	u8 ascq;
+};
+
+
+#define MPT2SAS_RESCAN_AFTER_HOST_RESET (0xFFFF)
+/**
+ * struct fw_event_work - firmware event struct
+ * @list: link list framework
+ * @work: work object (ioc->fault_reset_work_q)
+ * @ioc: per adapter object
+ * @VF_ID: virtual function id
+ * @host_reset_handling: handling events during host reset
+ * @ignore: flag meaning this event has been marked to ignore
+ * @event: firmware event MPI2_EVENT_XXX defined in mpt2_ioc.h
+ * @event_data: reply event data payload follows
+ *
+ * This object stored on ioc->fw_event_list.
+ */
+struct fw_event_work {
+	struct list_head 	list;
+	struct delayed_work	work;
+	struct MPT2SAS_ADAPTER *ioc;
+	u8			VF_ID;
+	u8			host_reset_handling;
+	u8			ignore;
+	u16			event;
+	void			*event_data;
+};
+
+/**
+ * struct _scsi_io_transfer - scsi io transfer
+ * @handle: sas device handle (assigned by firmware)
+ * @is_raid: flag set for hidden raid components
+ * @dir: DMA_TO_DEVICE, DMA_FROM_DEVICE,
+ * @data_length: data transfer length
+ * @data_dma: dma pointer to data
+ * @sense: sense data
+ * @lun: lun number
+ * @cdb_length: cdb length
+ * @cdb: cdb contents
+ * @valid_reply: flag set for reply message
+ * @timeout: timeout for this command
+ * @sense_length: sense length
+ * @ioc_status: ioc status
+ * @scsi_state: scsi state
+ * @scsi_status: scsi staus
+ * @log_info: log information
+ * @transfer_length: data length transfer when there is a reply message
+ *
+ * Used for sending internal scsi commands to devices within this module.
+ * Refer to _scsi_send_scsi_io().
+ */
+struct _scsi_io_transfer {
+	u16	handle;
+	u8	is_raid;
+	enum dma_data_direction dir;
+	u32	data_length;
+	dma_addr_t data_dma;
+	u8 	sense[SCSI_SENSE_BUFFERSIZE];
+	u32	lun;
+	u8	cdb_length;
+	u8	cdb[32];
+	u8	timeout;
+	u8	valid_reply;
+  /* the following bits are only valid when 'valid_reply = 1' */
+	u32	sense_length;
+	u16	ioc_status;
+	u8	scsi_state;
+	u8	scsi_status;
+	u32	log_info;
+	u32	transfer_length;
+};
+
+/*
+ * The pci device ids are defined in mpi/mpi2_cnfg.h.
+ */
+static struct pci_device_id scsih_pci_table[] = {
+	{ MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2004,
+		PCI_ANY_ID, PCI_ANY_ID },
+	/* Falcon ~ 2008*/
+	{ MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2008,
+		PCI_ANY_ID, PCI_ANY_ID },
+	/* Liberator ~ 2108 */
+	{ MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2108_1,
+		PCI_ANY_ID, PCI_ANY_ID },
+	{ MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2108_2,
+		PCI_ANY_ID, PCI_ANY_ID },
+	{ MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2108_3,
+		PCI_ANY_ID, PCI_ANY_ID },
+	{ MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2116_1,
+		PCI_ANY_ID, PCI_ANY_ID },
+	{ MPI2_MFGPAGE_VENDORID_LSI, MPI2_MFGPAGE_DEVID_SAS2116_2,
+		PCI_ANY_ID, PCI_ANY_ID },
+	{0}	/* Terminating entry */
+};
+MODULE_DEVICE_TABLE(pci, scsih_pci_table);
+
+/**
+ * scsih_set_debug_level - global setting of ioc->logging_level.
+ *
+ * Note: The logging levels are defined in mpt2sas_debug.h.
+ */
+static int
+scsih_set_debug_level(const char *val, struct kernel_param *kp)
+{
+	int ret = param_set_int(val, kp);
+	struct MPT2SAS_ADAPTER *ioc;
+
+	if (ret)
+		return ret;
+
+	printk(KERN_INFO "setting logging_level(0x%08x)\n", logging_level);
+	list_for_each_entry(ioc, &mpt2sas_ioc_list, list)
+		ioc->logging_level = logging_level;
+	return 0;
+}
+module_param_call(logging_level, scsih_set_debug_level, param_get_int,
+    &logging_level, 0644);
+
+/**
+ * _scsih_srch_boot_sas_address - search based on sas_address
+ * @sas_address: sas address
+ * @boot_device: boot device object from bios page 2
+ *
+ * Returns 1 when there's a match, 0 means no match.
+ */
+static inline int
+_scsih_srch_boot_sas_address(u64 sas_address,
+    Mpi2BootDeviceSasWwid_t *boot_device)
+{
+	return (sas_address == le64_to_cpu(boot_device->SASAddress)) ?  1 : 0;
+}
+
+/**
+ * _scsih_srch_boot_device_name - search based on device name
+ * @device_name: device name specified in INDENTIFY fram
+ * @boot_device: boot device object from bios page 2
+ *
+ * Returns 1 when there's a match, 0 means no match.
+ */
+static inline int
+_scsih_srch_boot_device_name(u64 device_name,
+    Mpi2BootDeviceDeviceName_t *boot_device)
+{
+	return (device_name == le64_to_cpu(boot_device->DeviceName)) ? 1 : 0;
+}
+
+/**
+ * _scsih_srch_boot_encl_slot - search based on enclosure_logical_id/slot
+ * @enclosure_logical_id: enclosure logical id
+ * @slot_number: slot number
+ * @boot_device: boot device object from bios page 2
+ *
+ * Returns 1 when there's a match, 0 means no match.
+ */
+static inline int
+_scsih_srch_boot_encl_slot(u64 enclosure_logical_id, u16 slot_number,
+    Mpi2BootDeviceEnclosureSlot_t *boot_device)
+{
+	return (enclosure_logical_id == le64_to_cpu(boot_device->
+	    EnclosureLogicalID) && slot_number == le16_to_cpu(boot_device->
+	    SlotNumber)) ? 1 : 0;
+}
+
+/**
+ * _scsih_is_boot_device - search for matching boot device.
+ * @sas_address: sas address
+ * @device_name: device name specified in INDENTIFY fram
+ * @enclosure_logical_id: enclosure logical id
+ * @slot_number: slot number
+ * @form: specifies boot device form
+ * @boot_device: boot device object from bios page 2
+ *
+ * Returns 1 when there's a match, 0 means no match.
+ */
+static int
+_scsih_is_boot_device(u64 sas_address, u64 device_name,
+    u64 enclosure_logical_id, u16 slot, u8 form,
+    Mpi2BiosPage2BootDevice_t *boot_device)
+{
+	int rc = 0;
+
+	switch (form) {
+	case MPI2_BIOSPAGE2_FORM_SAS_WWID:
+		if (!sas_address)
+			break;
+		rc = _scsih_srch_boot_sas_address(
+		    sas_address, &boot_device->SasWwid);
+		break;
+	case MPI2_BIOSPAGE2_FORM_ENCLOSURE_SLOT:
+		if (!enclosure_logical_id)
+			break;
+		rc = _scsih_srch_boot_encl_slot(
+		    enclosure_logical_id,
+		    slot, &boot_device->EnclosureSlot);
+		break;
+	case MPI2_BIOSPAGE2_FORM_DEVICE_NAME:
+		if (!device_name)
+			break;
+		rc = _scsih_srch_boot_device_name(
+		    device_name, &boot_device->DeviceName);
+		break;
+	case MPI2_BIOSPAGE2_FORM_NO_DEVICE_SPECIFIED:
+		break;
+	}
+
+	return rc;
+}
+
+/**
+ * _scsih_determine_boot_device - determine boot device.
+ * @ioc: per adapter object
+ * @device: either sas_device or raid_device object
+ * @is_raid: [flag] 1 = raid object, 0 = sas object
+ *
+ * Determines whether this device should be first reported device to
+ * to scsi-ml or sas transport, this purpose is for persistant boot device.
+ * There are primary, alternate, and current entries in bios page 2. The order
+ * priority is primary, alternate, then current.  This routine saves
+ * the corresponding device object and is_raid flag in the ioc object.
+ * The saved data to be used later in _scsih_probe_boot_devices().
+ */
+static void
+_scsih_determine_boot_device(struct MPT2SAS_ADAPTER *ioc,
+    void *device, u8 is_raid)
+{
+	struct _sas_device *sas_device;
+	struct _raid_device *raid_device;
+	u64 sas_address;
+	u64 device_name;
+	u64 enclosure_logical_id;
+	u16 slot;
+
+	 /* only process this function when driver loads */
+	if (!ioc->wait_for_port_enable_to_complete)
+		return;
+
+	if (!is_raid) {
+		sas_device = device;
+		sas_address = sas_device->sas_address;
+		device_name = sas_device->device_name;
+		enclosure_logical_id = sas_device->enclosure_logical_id;
+		slot = sas_device->slot;
+	} else {
+		raid_device = device;
+		sas_address = raid_device->wwid;
+		device_name = 0;
+		enclosure_logical_id = 0;
+		slot = 0;
+	}
+
+	if (!ioc->req_boot_device.device) {
+		if (_scsih_is_boot_device(sas_address, device_name,
+		    enclosure_logical_id, slot,
+		    (ioc->bios_pg2.ReqBootDeviceForm &
+		    MPI2_BIOSPAGE2_FORM_MASK),
+		    &ioc->bios_pg2.RequestedBootDevice)) {
+			dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+			   "%s: req_boot_device(0x%016llx)\n",
+			    ioc->name, __func__,
+			    (unsigned long long)sas_address));
+			ioc->req_boot_device.device = device;
+			ioc->req_boot_device.is_raid = is_raid;
+		}
+	}
+
+	if (!ioc->req_alt_boot_device.device) {
+		if (_scsih_is_boot_device(sas_address, device_name,
+		    enclosure_logical_id, slot,
+		    (ioc->bios_pg2.ReqAltBootDeviceForm &
+		    MPI2_BIOSPAGE2_FORM_MASK),
+		    &ioc->bios_pg2.RequestedAltBootDevice)) {
+			dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+			   "%s: req_alt_boot_device(0x%016llx)\n",
+			    ioc->name, __func__,
+			    (unsigned long long)sas_address));
+			ioc->req_alt_boot_device.device = device;
+			ioc->req_alt_boot_device.is_raid = is_raid;
+		}
+	}
+
+	if (!ioc->current_boot_device.device) {
+		if (_scsih_is_boot_device(sas_address, device_name,
+		    enclosure_logical_id, slot,
+		    (ioc->bios_pg2.CurrentBootDeviceForm &
+		    MPI2_BIOSPAGE2_FORM_MASK),
+		    &ioc->bios_pg2.CurrentBootDevice)) {
+			dinitprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+			   "%s: current_boot_device(0x%016llx)\n",
+			    ioc->name, __func__,
+			    (unsigned long long)sas_address));
+			ioc->current_boot_device.device = device;
+			ioc->current_boot_device.is_raid = is_raid;
+		}
+	}
+}
+
+/**
+ * mpt2sas_scsih_sas_device_find_by_sas_address - sas device search
+ * @ioc: per adapter object
+ * @sas_address: sas address
+ * Context: Calling function should acquire ioc->sas_device_lock
+ *
+ * This searches for sas_device based on sas_address, then return sas_device
+ * object.
+ */
+struct _sas_device *
+mpt2sas_scsih_sas_device_find_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
+    u64 sas_address)
+{
+	struct _sas_device *sas_device, *r;
+
+	r = NULL;
+	/* check the sas_device_init_list */
+	list_for_each_entry(sas_device, &ioc->sas_device_init_list,
+	    list) {
+		if (sas_device->sas_address != sas_address)
+			continue;
+		r = sas_device;
+		goto out;
+	}
+
+	/* then check the sas_device_list */
+	list_for_each_entry(sas_device, &ioc->sas_device_list, list) {
+		if (sas_device->sas_address != sas_address)
+			continue;
+		r = sas_device;
+		goto out;
+	}
+ out:
+	return r;
+}
+
+/**
+ * _scsih_sas_device_find_by_handle - sas device search
+ * @ioc: per adapter object
+ * @handle: sas device handle (assigned by firmware)
+ * Context: Calling function should acquire ioc->sas_device_lock
+ *
+ * This searches for sas_device based on sas_address, then return sas_device
+ * object.
+ */
+static struct _sas_device *
+_scsih_sas_device_find_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct _sas_device *sas_device, *r;
+
+	r = NULL;
+	if (ioc->wait_for_port_enable_to_complete) {
+		list_for_each_entry(sas_device, &ioc->sas_device_init_list,
+		    list) {
+			if (sas_device->handle != handle)
+				continue;
+			r = sas_device;
+			goto out;
+		}
+	} else {
+		list_for_each_entry(sas_device, &ioc->sas_device_list, list) {
+			if (sas_device->handle != handle)
+				continue;
+			r = sas_device;
+			goto out;
+		}
+	}
+
+ out:
+	return r;
+}
+
+/**
+ * _scsih_sas_device_remove - remove sas_device from list.
+ * @ioc: per adapter object
+ * @sas_device: the sas_device object
+ * Context: This function will acquire ioc->sas_device_lock.
+ *
+ * Removing object and freeing associated memory from the ioc->sas_device_list.
+ */
+static void
+_scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc,
+    struct _sas_device *sas_device)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	list_del(&sas_device->list);
+	memset(sas_device, 0, sizeof(struct _sas_device));
+	kfree(sas_device);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+}
+
+/**
+ * _scsih_sas_device_add - insert sas_device to the list.
+ * @ioc: per adapter object
+ * @sas_device: the sas_device object
+ * Context: This function will acquire ioc->sas_device_lock.
+ *
+ * Adding new object to the ioc->sas_device_list.
+ */
+static void
+_scsih_sas_device_add(struct MPT2SAS_ADAPTER *ioc,
+    struct _sas_device *sas_device)
+{
+	unsigned long flags;
+	u16 handle, parent_handle;
+	u64 sas_address;
+
+	dewtprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: handle"
+	    "(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
+	    sas_device->handle, (unsigned long long)sas_device->sas_address));
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	list_add_tail(&sas_device->list, &ioc->sas_device_list);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	handle = sas_device->handle;
+	parent_handle = sas_device->parent_handle;
+	sas_address = sas_device->sas_address;
+	if (!mpt2sas_transport_port_add(ioc, handle, parent_handle)) {
+		_scsih_sas_device_remove(ioc, sas_device);
+	} else if (!sas_device->starget) {
+		mpt2sas_transport_port_remove(ioc, sas_address, parent_handle);
+		_scsih_sas_device_remove(ioc, sas_device);
+	}
+}
+
+/**
+ * _scsih_sas_device_init_add - insert sas_device to the list.
+ * @ioc: per adapter object
+ * @sas_device: the sas_device object
+ * Context: This function will acquire ioc->sas_device_lock.
+ *
+ * Adding new object at driver load time to the ioc->sas_device_init_list.
+ */
+static void
+_scsih_sas_device_init_add(struct MPT2SAS_ADAPTER *ioc,
+    struct _sas_device *sas_device)
+{
+	unsigned long flags;
+
+	dewtprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: handle"
+	    "(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
+	    sas_device->handle, (unsigned long long)sas_device->sas_address));
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	_scsih_determine_boot_device(ioc, sas_device, 0);
+}
+
+/**
+ * mpt2sas_scsih_expander_find_by_handle - expander device search
+ * @ioc: per adapter object
+ * @handle: expander handle (assigned by firmware)
+ * Context: Calling function should acquire ioc->sas_device_lock
+ *
+ * This searches for expander device based on handle, then returns the
+ * sas_node object.
+ */
+struct _sas_node *
+mpt2sas_scsih_expander_find_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct _sas_node *sas_expander, *r;
+
+	r = NULL;
+	list_for_each_entry(sas_expander, &ioc->sas_expander_list, list) {
+		if (sas_expander->handle != handle)
+			continue;
+		r = sas_expander;
+		goto out;
+	}
+ out:
+	return r;
+}
+
+/**
+ * _scsih_raid_device_find_by_id - raid device search
+ * @ioc: per adapter object
+ * @id: sas device target id
+ * @channel: sas device channel
+ * Context: Calling function should acquire ioc->raid_device_lock
+ *
+ * This searches for raid_device based on target id, then return raid_device
+ * object.
+ */
+static struct _raid_device *
+_scsih_raid_device_find_by_id(struct MPT2SAS_ADAPTER *ioc, int id, int channel)
+{
+	struct _raid_device *raid_device, *r;
+
+	r = NULL;
+	list_for_each_entry(raid_device, &ioc->raid_device_list, list) {
+		if (raid_device->id == id && raid_device->channel == channel) {
+			r = raid_device;
+			goto out;
+		}
+	}
+
+ out:
+	return r;
+}
+
+/**
+ * _scsih_raid_device_find_by_handle - raid device search
+ * @ioc: per adapter object
+ * @handle: sas device handle (assigned by firmware)
+ * Context: Calling function should acquire ioc->raid_device_lock
+ *
+ * This searches for raid_device based on handle, then return raid_device
+ * object.
+ */
+static struct _raid_device *
+_scsih_raid_device_find_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct _raid_device *raid_device, *r;
+
+	r = NULL;
+	list_for_each_entry(raid_device, &ioc->raid_device_list, list) {
+		if (raid_device->handle != handle)
+			continue;
+		r = raid_device;
+		goto out;
+	}
+
+ out:
+	return r;
+}
+
+/**
+ * _scsih_raid_device_find_by_wwid - raid device search
+ * @ioc: per adapter object
+ * @handle: sas device handle (assigned by firmware)
+ * Context: Calling function should acquire ioc->raid_device_lock
+ *
+ * This searches for raid_device based on wwid, then return raid_device
+ * object.
+ */
+static struct _raid_device *
+_scsih_raid_device_find_by_wwid(struct MPT2SAS_ADAPTER *ioc, u64 wwid)
+{
+	struct _raid_device *raid_device, *r;
+
+	r = NULL;
+	list_for_each_entry(raid_device, &ioc->raid_device_list, list) {
+		if (raid_device->wwid != wwid)
+			continue;
+		r = raid_device;
+		goto out;
+	}
+
+ out:
+	return r;
+}
+
+/**
+ * _scsih_raid_device_add - add raid_device object
+ * @ioc: per adapter object
+ * @raid_device: raid_device object
+ *
+ * This is added to the raid_device_list link list.
+ */
+static void
+_scsih_raid_device_add(struct MPT2SAS_ADAPTER *ioc,
+    struct _raid_device *raid_device)
+{
+	unsigned long flags;
+
+	dewtprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: handle"
+	    "(0x%04x), wwid(0x%016llx)\n", ioc->name, __func__,
+	    raid_device->handle, (unsigned long long)raid_device->wwid));
+
+	spin_lock_irqsave(&ioc->raid_device_lock, flags);
+	list_add_tail(&raid_device->list, &ioc->raid_device_list);
+	spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+}
+
+/**
+ * _scsih_raid_device_remove - delete raid_device object
+ * @ioc: per adapter object
+ * @raid_device: raid_device object
+ *
+ * This is removed from the raid_device_list link list.
+ */
+static void
+_scsih_raid_device_remove(struct MPT2SAS_ADAPTER *ioc,
+    struct _raid_device *raid_device)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->raid_device_lock, flags);
+	list_del(&raid_device->list);
+	memset(raid_device, 0, sizeof(struct _raid_device));
+	kfree(raid_device);
+	spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+}
+
+/**
+ * mpt2sas_scsih_expander_find_by_sas_address - expander device search
+ * @ioc: per adapter object
+ * @sas_address: sas address
+ * Context: Calling function should acquire ioc->sas_node_lock.
+ *
+ * This searches for expander device based on sas_address, then returns the
+ * sas_node object.
+ */
+struct _sas_node *
+mpt2sas_scsih_expander_find_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
+    u64 sas_address)
+{
+	struct _sas_node *sas_expander, *r;
+
+	r = NULL;
+	list_for_each_entry(sas_expander, &ioc->sas_expander_list, list) {
+		if (sas_expander->sas_address != sas_address)
+			continue;
+		r = sas_expander;
+		goto out;
+	}
+ out:
+	return r;
+}
+
+/**
+ * _scsih_expander_node_add - insert expander device to the list.
+ * @ioc: per adapter object
+ * @sas_expander: the sas_device object
+ * Context: This function will acquire ioc->sas_node_lock.
+ *
+ * Adding new object to the ioc->sas_expander_list.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_expander_node_add(struct MPT2SAS_ADAPTER *ioc,
+    struct _sas_node *sas_expander)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_node_lock, flags);
+	list_add_tail(&sas_expander->list, &ioc->sas_expander_list);
+	spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+}
+
+/**
+ * _scsih_is_end_device - determines if device is an end device
+ * @device_info: bitfield providing information about the device.
+ * Context: none
+ *
+ * Returns 1 if end device.
+ */
+static int
+_scsih_is_end_device(u32 device_info)
+{
+	if (device_info & MPI2_SAS_DEVICE_INFO_END_DEVICE &&
+		((device_info & MPI2_SAS_DEVICE_INFO_SSP_TARGET) |
+		(device_info & MPI2_SAS_DEVICE_INFO_STP_TARGET) |
+		(device_info & MPI2_SAS_DEVICE_INFO_SATA_DEVICE)))
+		return 1;
+	else
+		return 0;
+}
+
+/**
+ * _scsih_scsi_lookup_get - returns scmd entry
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * Context: This function will acquire ioc->scsi_lookup_lock.
+ *
+ * Returns the smid stored scmd pointer.
+ */
+static struct scsi_cmnd *
+_scsih_scsi_lookup_get(struct MPT2SAS_ADAPTER *ioc, u16 smid)
+{
+	unsigned long	flags;
+	struct scsi_cmnd *scmd;
+
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	scmd = ioc->scsi_lookup[smid - 1].scmd;
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+	return scmd;
+}
+
+/**
+ * mptscsih_getclear_scsi_lookup - returns scmd entry
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * Context: This function will acquire ioc->scsi_lookup_lock.
+ *
+ * Returns the smid stored scmd pointer, as well as clearing the scmd pointer.
+ */
+static struct scsi_cmnd *
+_scsih_scsi_lookup_getclear(struct MPT2SAS_ADAPTER *ioc, u16 smid)
+{
+	unsigned long	flags;
+	struct scsi_cmnd *scmd;
+
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	scmd = ioc->scsi_lookup[smid - 1].scmd;
+	ioc->scsi_lookup[smid - 1].scmd = NULL;
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+	return scmd;
+}
+
+/**
+ * _scsih_scsi_lookup_set - updates scmd entry in lookup
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @scmd: pointer to scsi command object
+ * Context: This function will acquire ioc->scsi_lookup_lock.
+ *
+ * This will save scmd pointer in the scsi_lookup array.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_scsi_lookup_set(struct MPT2SAS_ADAPTER *ioc, u16 smid,
+    struct scsi_cmnd *scmd)
+{
+	unsigned long	flags;
+
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	ioc->scsi_lookup[smid - 1].scmd = scmd;
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+}
+
+/**
+ * _scsih_scsi_lookup_find_by_scmd - scmd lookup
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @scmd: pointer to scsi command object
+ * Context: This function will acquire ioc->scsi_lookup_lock.
+ *
+ * This will search for a scmd pointer in the scsi_lookup array,
+ * returning the revelent smid.  A returned value of zero means invalid.
+ */
+static u16
+_scsih_scsi_lookup_find_by_scmd(struct MPT2SAS_ADAPTER *ioc, struct scsi_cmnd
+    *scmd)
+{
+	u16 smid;
+	unsigned long	flags;
+	int i;
+
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	smid = 0;
+	for (i = 0; i < ioc->request_depth; i++) {
+		if (ioc->scsi_lookup[i].scmd == scmd) {
+			smid = i + 1;
+			goto out;
+		}
+	}
+ out:
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+	return smid;
+}
+
+/**
+ * _scsih_scsi_lookup_find_by_target - search for matching channel:id
+ * @ioc: per adapter object
+ * @id: target id
+ * @channel: channel
+ * Context: This function will acquire ioc->scsi_lookup_lock.
+ *
+ * This will search for a matching channel:id in the scsi_lookup array,
+ * returning 1 if found.
+ */
+static u8
+_scsih_scsi_lookup_find_by_target(struct MPT2SAS_ADAPTER *ioc, int id,
+    int channel)
+{
+	u8 found;
+	unsigned long	flags;
+	int i;
+
+	spin_lock_irqsave(&ioc->scsi_lookup_lock, flags);
+	found = 0;
+	for (i = 0 ; i < ioc->request_depth; i++) {
+		if (ioc->scsi_lookup[i].scmd &&
+		    (ioc->scsi_lookup[i].scmd->device->id == id &&
+		    ioc->scsi_lookup[i].scmd->device->channel == channel)) {
+			found = 1;
+			goto out;
+		}
+	}
+ out:
+	spin_unlock_irqrestore(&ioc->scsi_lookup_lock, flags);
+	return found;
+}
+
+/**
+ * _scsih_get_chain_buffer_dma - obtain block of chains (dma address)
+ * @ioc: per adapter object
+ * @smid: system request message index
+ *
+ * Returns phys pointer to chain buffer.
+ */
+static dma_addr_t
+_scsih_get_chain_buffer_dma(struct MPT2SAS_ADAPTER *ioc, u16 smid)
+{
+	return ioc->chain_dma + ((smid - 1) * (ioc->request_sz *
+	    ioc->chains_needed_per_io));
+}
+
+/**
+ * _scsih_get_chain_buffer - obtain block of chains assigned to a mf request
+ * @ioc: per adapter object
+ * @smid: system request message index
+ *
+ * Returns virt pointer to chain buffer.
+ */
+static void *
+_scsih_get_chain_buffer(struct MPT2SAS_ADAPTER *ioc, u16 smid)
+{
+	return (void *)(ioc->chain + ((smid - 1) * (ioc->request_sz *
+	    ioc->chains_needed_per_io)));
+}
+
+/**
+ * _scsih_build_scatter_gather - main sg creation routine
+ * @ioc: per adapter object
+ * @scmd: scsi command
+ * @smid: system request message index
+ * Context: none.
+ *
+ * The main routine that builds scatter gather table from a given
+ * scsi request sent via the .queuecommand main handler.
+ *
+ * Returns 0 success, anything else error
+ */
+static int
+_scsih_build_scatter_gather(struct MPT2SAS_ADAPTER *ioc,
+    struct scsi_cmnd *scmd, u16 smid)
+{
+	Mpi2SCSIIORequest_t *mpi_request;
+	dma_addr_t chain_dma;
+	struct scatterlist *sg_scmd;
+	void *sg_local, *chain;
+	u32 chain_offset;
+	u32 chain_length;
+	u32 chain_flags;
+	u32 sges_left;
+	u32 sges_in_segment;
+	u32 sgl_flags;
+	u32 sgl_flags_last_element;
+	u32 sgl_flags_end_buffer;
+
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+
+	/* init scatter gather flags */
+	sgl_flags = MPI2_SGE_FLAGS_SIMPLE_ELEMENT;
+	if (scmd->sc_data_direction == DMA_TO_DEVICE)
+		sgl_flags |= MPI2_SGE_FLAGS_HOST_TO_IOC;
+	sgl_flags_last_element = (sgl_flags | MPI2_SGE_FLAGS_LAST_ELEMENT)
+	    << MPI2_SGE_FLAGS_SHIFT;
+	sgl_flags_end_buffer = (sgl_flags | MPI2_SGE_FLAGS_LAST_ELEMENT |
+	    MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_END_OF_LIST)
+	    << MPI2_SGE_FLAGS_SHIFT;
+	sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
+
+	sg_scmd = scsi_sglist(scmd);
+	sges_left = scsi_dma_map(scmd);
+	if (!sges_left) {
+		sdev_printk(KERN_ERR, scmd->device, "pci_map_sg"
+		" failed: request for %d bytes!\n", scsi_bufflen(scmd));
+		return -ENOMEM;
+	}
+
+	sg_local = &mpi_request->SGL;
+	sges_in_segment = ioc->max_sges_in_main_message;
+	if (sges_left <= sges_in_segment)
+		goto fill_in_last_segment;
+
+	mpi_request->ChainOffset = (offsetof(Mpi2SCSIIORequest_t, SGL) +
+	    (sges_in_segment * ioc->sge_size))/4;
+
+	/* fill in main message segment when there is a chain following */
+	while (sges_in_segment) {
+		if (sges_in_segment == 1)
+			ioc->base_add_sg_single(sg_local,
+			    sgl_flags_last_element | sg_dma_len(sg_scmd),
+			    sg_dma_address(sg_scmd));
+		else
+			ioc->base_add_sg_single(sg_local, sgl_flags |
+			    sg_dma_len(sg_scmd), sg_dma_address(sg_scmd));
+		sg_scmd = sg_next(sg_scmd);
+		sg_local += ioc->sge_size;
+		sges_left--;
+		sges_in_segment--;
+	}
+
+	/* initializing the chain flags and pointers */
+	chain_flags = MPI2_SGE_FLAGS_CHAIN_ELEMENT << MPI2_SGE_FLAGS_SHIFT;
+	chain = _scsih_get_chain_buffer(ioc, smid);
+	chain_dma = _scsih_get_chain_buffer_dma(ioc, smid);
+	do {
+		sges_in_segment = (sges_left <=
+		    ioc->max_sges_in_chain_message) ? sges_left :
+		    ioc->max_sges_in_chain_message;
+		chain_offset = (sges_left == sges_in_segment) ?
+		    0 : (sges_in_segment * ioc->sge_size)/4;
+		chain_length = sges_in_segment * ioc->sge_size;
+		if (chain_offset) {
+			chain_offset = chain_offset <<
+			    MPI2_SGE_CHAIN_OFFSET_SHIFT;
+			chain_length += ioc->sge_size;
+		}
+		ioc->base_add_sg_single(sg_local, chain_flags | chain_offset |
+		    chain_length, chain_dma);
+		sg_local = chain;
+		if (!chain_offset)
+			goto fill_in_last_segment;
+
+		/* fill in chain segments */
+		while (sges_in_segment) {
+			if (sges_in_segment == 1)
+				ioc->base_add_sg_single(sg_local,
+				    sgl_flags_last_element |
+				    sg_dma_len(sg_scmd),
+				    sg_dma_address(sg_scmd));
+			else
+				ioc->base_add_sg_single(sg_local, sgl_flags |
+				    sg_dma_len(sg_scmd),
+				    sg_dma_address(sg_scmd));
+			sg_scmd = sg_next(sg_scmd);
+			sg_local += ioc->sge_size;
+			sges_left--;
+			sges_in_segment--;
+		}
+
+		chain_dma += ioc->request_sz;
+		chain += ioc->request_sz;
+	} while (1);
+
+
+ fill_in_last_segment:
+
+	/* fill the last segment */
+	while (sges_left) {
+		if (sges_left == 1)
+			ioc->base_add_sg_single(sg_local, sgl_flags_end_buffer |
+			    sg_dma_len(sg_scmd), sg_dma_address(sg_scmd));
+		else
+			ioc->base_add_sg_single(sg_local, sgl_flags |
+			    sg_dma_len(sg_scmd), sg_dma_address(sg_scmd));
+		sg_scmd = sg_next(sg_scmd);
+		sg_local += ioc->sge_size;
+		sges_left--;
+	}
+
+	return 0;
+}
+
+/**
+ * scsih_change_queue_depth - setting device queue depth
+ * @sdev: scsi device struct
+ * @qdepth: requested queue depth
+ *
+ * Returns queue depth.
+ */
+static int
+scsih_change_queue_depth(struct scsi_device *sdev, int qdepth)
+{
+	struct Scsi_Host *shost = sdev->host;
+	int max_depth;
+	int tag_type;
+
+	max_depth = shost->can_queue;
+	if (!sdev->tagged_supported)
+		max_depth = 1;
+	if (qdepth > max_depth)
+		qdepth = max_depth;
+	tag_type = (qdepth == 1) ? 0 : MSG_SIMPLE_TAG;
+	scsi_adjust_queue_depth(sdev, tag_type, qdepth);
+
+	if (sdev->inquiry_len > 7)
+		sdev_printk(KERN_INFO, sdev, "qdepth(%d), tagged(%d), "
+		"simple(%d), ordered(%d), scsi_level(%d), cmd_que(%d)\n",
+		sdev->queue_depth, sdev->tagged_supported, sdev->simple_tags,
+		sdev->ordered_tags, sdev->scsi_level,
+		(sdev->inquiry[7] & 2) >> 1);
+
+	return sdev->queue_depth;
+}
+
+/**
+ * scsih_change_queue_depth - changing device queue tag type
+ * @sdev: scsi device struct
+ * @tag_type: requested tag type
+ *
+ * Returns queue tag type.
+ */
+static int
+scsih_change_queue_type(struct scsi_device *sdev, int tag_type)
+{
+	if (sdev->tagged_supported) {
+		scsi_set_tag_type(sdev, tag_type);
+		if (tag_type)
+			scsi_activate_tcq(sdev, sdev->queue_depth);
+		else
+			scsi_deactivate_tcq(sdev, sdev->queue_depth);
+	} else
+		tag_type = 0;
+
+	return tag_type;
+}
+
+/**
+ * scsih_target_alloc - target add routine
+ * @starget: scsi target struct
+ *
+ * Returns 0 if ok. Any other return is assumed to be an error and
+ * the device is ignored.
+ */
+static int
+scsih_target_alloc(struct scsi_target *starget)
+{
+	struct Scsi_Host *shost = dev_to_shost(&starget->dev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	struct _sas_device *sas_device;
+	struct _raid_device *raid_device;
+	unsigned long flags;
+	struct sas_rphy *rphy;
+
+	sas_target_priv_data = kzalloc(sizeof(struct scsi_target), GFP_KERNEL);
+	if (!sas_target_priv_data)
+		return -ENOMEM;
+
+	starget->hostdata = sas_target_priv_data;
+	sas_target_priv_data->starget = starget;
+	sas_target_priv_data->handle = MPT2SAS_INVALID_DEVICE_HANDLE;
+
+	/* RAID volumes */
+	if (starget->channel == RAID_CHANNEL) {
+		spin_lock_irqsave(&ioc->raid_device_lock, flags);
+		raid_device = _scsih_raid_device_find_by_id(ioc, starget->id,
+		    starget->channel);
+		if (raid_device) {
+			sas_target_priv_data->handle = raid_device->handle;
+			sas_target_priv_data->sas_address = raid_device->wwid;
+			sas_target_priv_data->flags |= MPT_TARGET_FLAGS_VOLUME;
+			raid_device->starget = starget;
+		}
+		spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+		return 0;
+	}
+
+	/* sas/sata devices */
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	rphy = dev_to_rphy(starget->dev.parent);
+	sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
+	   rphy->identify.sas_address);
+
+	if (sas_device) {
+		sas_target_priv_data->handle = sas_device->handle;
+		sas_target_priv_data->sas_address = sas_device->sas_address;
+		sas_device->starget = starget;
+		sas_device->id = starget->id;
+		sas_device->channel = starget->channel;
+		if (sas_device->hidden_raid_component)
+			sas_target_priv_data->flags |=
+			    MPT_TARGET_FLAGS_RAID_COMPONENT;
+	}
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	return 0;
+}
+
+/**
+ * scsih_target_destroy - target destroy routine
+ * @starget: scsi target struct
+ *
+ * Returns nothing.
+ */
+static void
+scsih_target_destroy(struct scsi_target *starget)
+{
+	struct Scsi_Host *shost = dev_to_shost(&starget->dev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	struct _sas_device *sas_device;
+	struct _raid_device *raid_device;
+	unsigned long flags;
+	struct sas_rphy *rphy;
+
+	sas_target_priv_data = starget->hostdata;
+	if (!sas_target_priv_data)
+		return;
+
+	if (starget->channel == RAID_CHANNEL) {
+		spin_lock_irqsave(&ioc->raid_device_lock, flags);
+		raid_device = _scsih_raid_device_find_by_id(ioc, starget->id,
+		    starget->channel);
+		if (raid_device) {
+			raid_device->starget = NULL;
+			raid_device->sdev = NULL;
+		}
+		spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+		goto out;
+	}
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	rphy = dev_to_rphy(starget->dev.parent);
+	sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
+	   rphy->identify.sas_address);
+	if (sas_device)
+		sas_device->starget = NULL;
+
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+ out:
+	kfree(sas_target_priv_data);
+	starget->hostdata = NULL;
+}
+
+/**
+ * scsih_slave_alloc - device add routine
+ * @sdev: scsi device struct
+ *
+ * Returns 0 if ok. Any other return is assumed to be an error and
+ * the device is ignored.
+ */
+static int
+scsih_slave_alloc(struct scsi_device *sdev)
+{
+	struct Scsi_Host *shost;
+	struct MPT2SAS_ADAPTER *ioc;
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	struct scsi_target *starget;
+	struct _raid_device *raid_device;
+	struct _sas_device *sas_device;
+	unsigned long flags;
+
+	sas_device_priv_data = kzalloc(sizeof(struct scsi_device), GFP_KERNEL);
+	if (!sas_device_priv_data)
+		return -ENOMEM;
+
+	sas_device_priv_data->lun = sdev->lun;
+	sas_device_priv_data->flags = MPT_DEVICE_FLAGS_INIT;
+
+	starget = scsi_target(sdev);
+	sas_target_priv_data = starget->hostdata;
+	sas_target_priv_data->num_luns++;
+	sas_device_priv_data->sas_target = sas_target_priv_data;
+	sdev->hostdata = sas_device_priv_data;
+	if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_RAID_COMPONENT))
+		sdev->no_uld_attach = 1;
+
+	shost = dev_to_shost(&starget->dev);
+	ioc = shost_priv(shost);
+	if (starget->channel == RAID_CHANNEL) {
+		spin_lock_irqsave(&ioc->raid_device_lock, flags);
+		raid_device = _scsih_raid_device_find_by_id(ioc,
+		    starget->id, starget->channel);
+		if (raid_device)
+			raid_device->sdev = sdev; /* raid is single lun */
+		spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+	} else {
+		/* set TLR bit for SSP devices */
+		if (!(ioc->facts.IOCCapabilities &
+		     MPI2_IOCFACTS_CAPABILITY_TLR))
+			goto out;
+		spin_lock_irqsave(&ioc->sas_device_lock, flags);
+		sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
+		   sas_device_priv_data->sas_target->sas_address);
+		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+		if (sas_device && sas_device->device_info &
+		    MPI2_SAS_DEVICE_INFO_SSP_TARGET)
+			sas_device_priv_data->flags |= MPT_DEVICE_TLR_ON;
+	}
+
+ out:
+	return 0;
+}
+
+/**
+ * scsih_slave_destroy - device destroy routine
+ * @sdev: scsi device struct
+ *
+ * Returns nothing.
+ */
+static void
+scsih_slave_destroy(struct scsi_device *sdev)
+{
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	struct scsi_target *starget;
+
+	if (!sdev->hostdata)
+		return;
+
+	starget = scsi_target(sdev);
+	sas_target_priv_data = starget->hostdata;
+	sas_target_priv_data->num_luns--;
+	kfree(sdev->hostdata);
+	sdev->hostdata = NULL;
+}
+
+/**
+ * scsih_display_sata_capabilities - sata capabilities
+ * @ioc: per adapter object
+ * @sas_device: the sas_device object
+ * @sdev: scsi device struct
+ */
+static void
+scsih_display_sata_capabilities(struct MPT2SAS_ADAPTER *ioc,
+    struct _sas_device *sas_device, struct scsi_device *sdev)
+{
+	Mpi2ConfigReply_t mpi_reply;
+	Mpi2SasDevicePage0_t sas_device_pg0;
+	u32 ioc_status;
+	u16 flags;
+	u32 device_info;
+
+	if ((mpt2sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
+	    MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, sas_device->handle))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return;
+	}
+
+	ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+	    MPI2_IOCSTATUS_MASK;
+	if (ioc_status != MPI2_IOCSTATUS_SUCCESS) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return;
+	}
+
+	flags = le16_to_cpu(sas_device_pg0.Flags);
+	device_info = le16_to_cpu(sas_device_pg0.DeviceInfo);
+
+	sdev_printk(KERN_INFO, sdev,
+	    "atapi(%s), ncq(%s), asyn_notify(%s), smart(%s), fua(%s), "
+	    "sw_preserve(%s)\n",
+	    (device_info & MPI2_SAS_DEVICE_INFO_ATAPI_DEVICE) ? "y" : "n",
+	    (flags & MPI2_SAS_DEVICE0_FLAGS_SATA_NCQ_SUPPORTED) ? "y" : "n",
+	    (flags & MPI2_SAS_DEVICE0_FLAGS_SATA_ASYNCHRONOUS_NOTIFY) ? "y" :
+	    "n",
+	    (flags & MPI2_SAS_DEVICE0_FLAGS_SATA_SMART_SUPPORTED) ? "y" : "n",
+	    (flags & MPI2_SAS_DEVICE0_FLAGS_SATA_FUA_SUPPORTED) ? "y" : "n",
+	    (flags & MPI2_SAS_DEVICE0_FLAGS_SATA_SW_PRESERVE) ? "y" : "n");
+}
+
+/**
+ * _scsih_get_volume_capabilities - volume capabilities
+ * @ioc: per adapter object
+ * @sas_device: the raid_device object
+ */
+static void
+_scsih_get_volume_capabilities(struct MPT2SAS_ADAPTER *ioc,
+    struct _raid_device *raid_device)
+{
+	Mpi2RaidVolPage0_t *vol_pg0;
+	Mpi2RaidPhysDiskPage0_t pd_pg0;
+	Mpi2SasDevicePage0_t sas_device_pg0;
+	Mpi2ConfigReply_t mpi_reply;
+	u16 sz;
+	u8 num_pds;
+
+	if ((mpt2sas_config_get_number_pds(ioc, raid_device->handle,
+	    &num_pds)) || !num_pds) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return;
+	}
+
+	raid_device->num_pds = num_pds;
+	sz = offsetof(Mpi2RaidVolPage0_t, PhysDisk) + (num_pds *
+	    sizeof(Mpi2RaidVol0PhysDisk_t));
+	vol_pg0 = kzalloc(sz, GFP_KERNEL);
+	if (!vol_pg0) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return;
+	}
+
+	if ((mpt2sas_config_get_raid_volume_pg0(ioc, &mpi_reply, vol_pg0,
+	     MPI2_RAID_VOLUME_PGAD_FORM_HANDLE, raid_device->handle, sz))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		kfree(vol_pg0);
+		return;
+	}
+
+	raid_device->volume_type = vol_pg0->VolumeType;
+
+	/* figure out what the underlying devices are by
+	 * obtaining the device_info bits for the 1st device
+	 */
+	if (!(mpt2sas_config_get_phys_disk_pg0(ioc, &mpi_reply,
+	    &pd_pg0, MPI2_PHYSDISK_PGAD_FORM_PHYSDISKNUM,
+	    vol_pg0->PhysDisk[0].PhysDiskNum))) {
+		if (!(mpt2sas_config_get_sas_device_pg0(ioc, &mpi_reply,
+		    &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
+		    le16_to_cpu(pd_pg0.DevHandle)))) {
+			raid_device->device_info =
+			    le32_to_cpu(sas_device_pg0.DeviceInfo);
+		}
+	}
+
+	kfree(vol_pg0);
+}
+
+/**
+ * scsih_slave_configure - device configure routine.
+ * @sdev: scsi device struct
+ *
+ * Returns 0 if ok. Any other return is assumed to be an error and
+ * the device is ignored.
+ */
+static int
+scsih_slave_configure(struct scsi_device *sdev)
+{
+	struct Scsi_Host *shost = sdev->host;
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	struct _sas_device *sas_device;
+	struct _raid_device *raid_device;
+	unsigned long flags;
+	int qdepth;
+	u8 ssp_target = 0;
+	char *ds = "";
+	char *r_level = "";
+
+	qdepth = 1;
+	sas_device_priv_data = sdev->hostdata;
+	sas_device_priv_data->configured_lun = 1;
+	sas_device_priv_data->flags &= ~MPT_DEVICE_FLAGS_INIT;
+	sas_target_priv_data = sas_device_priv_data->sas_target;
+
+	/* raid volume handling */
+	if (sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME) {
+
+		spin_lock_irqsave(&ioc->raid_device_lock, flags);
+		raid_device = _scsih_raid_device_find_by_handle(ioc,
+		     sas_target_priv_data->handle);
+		spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+		if (!raid_device) {
+			printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+			    ioc->name, __FILE__, __LINE__, __func__);
+			return 0;
+		}
+
+		_scsih_get_volume_capabilities(ioc, raid_device);
+
+		/* RAID Queue Depth Support
+		 * IS volume = underlying qdepth of drive type, either
+		 *    MPT2SAS_SAS_QUEUE_DEPTH or MPT2SAS_SATA_QUEUE_DEPTH
+		 * IM/IME/R10 = 128 (MPT2SAS_RAID_QUEUE_DEPTH)
+		 */
+		if (raid_device->device_info &
+		    MPI2_SAS_DEVICE_INFO_SSP_TARGET) {
+			qdepth = MPT2SAS_SAS_QUEUE_DEPTH;
+			ds = "SSP";
+		} else {
+			qdepth = MPT2SAS_SATA_QUEUE_DEPTH;
+			 if (raid_device->device_info &
+			    MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
+				ds = "SATA";
+			else
+				ds = "STP";
+		}
+
+		switch (raid_device->volume_type) {
+		case MPI2_RAID_VOL_TYPE_RAID0:
+			r_level = "RAID0";
+			break;
+		case MPI2_RAID_VOL_TYPE_RAID1E:
+			qdepth = MPT2SAS_RAID_QUEUE_DEPTH;
+			r_level = "RAID1E";
+			break;
+		case MPI2_RAID_VOL_TYPE_RAID1:
+			qdepth = MPT2SAS_RAID_QUEUE_DEPTH;
+			r_level = "RAID1";
+			break;
+		case MPI2_RAID_VOL_TYPE_RAID10:
+			qdepth = MPT2SAS_RAID_QUEUE_DEPTH;
+			r_level = "RAID10";
+			break;
+		case MPI2_RAID_VOL_TYPE_UNKNOWN:
+		default:
+			qdepth = MPT2SAS_RAID_QUEUE_DEPTH;
+			r_level = "RAIDX";
+			break;
+		}
+
+		sdev_printk(KERN_INFO, sdev, "%s: "
+		    "handle(0x%04x), wwid(0x%016llx), pd_count(%d), type(%s)\n",
+		    r_level, raid_device->handle,
+		    (unsigned long long)raid_device->wwid,
+		    raid_device->num_pds, ds);
+		scsih_change_queue_depth(sdev, qdepth);
+		return 0;
+	}
+
+	/* non-raid handling */
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
+	   sas_device_priv_data->sas_target->sas_address);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	if (sas_device) {
+		if (sas_target_priv_data->flags &
+		    MPT_TARGET_FLAGS_RAID_COMPONENT) {
+			mpt2sas_config_get_volume_handle(ioc,
+			    sas_device->handle, &sas_device->volume_handle);
+			mpt2sas_config_get_volume_wwid(ioc,
+			    sas_device->volume_handle,
+			    &sas_device->volume_wwid);
+		}
+		if (sas_device->device_info & MPI2_SAS_DEVICE_INFO_SSP_TARGET) {
+			qdepth = MPT2SAS_SAS_QUEUE_DEPTH;
+			ssp_target = 1;
+			ds = "SSP";
+		} else {
+			qdepth = MPT2SAS_SATA_QUEUE_DEPTH;
+			if (sas_device->device_info &
+			    MPI2_SAS_DEVICE_INFO_STP_TARGET)
+				ds = "STP";
+			else if (sas_device->device_info &
+			    MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
+				ds = "SATA";
+		}
+
+		sdev_printk(KERN_INFO, sdev, "%s: handle(0x%04x), "
+		    "sas_addr(0x%016llx), device_name(0x%016llx)\n",
+		    ds, sas_device->handle,
+		    (unsigned long long)sas_device->sas_address,
+		    (unsigned long long)sas_device->device_name);
+		sdev_printk(KERN_INFO, sdev, "%s: "
+		    "enclosure_logical_id(0x%016llx), slot(%d)\n", ds,
+		    (unsigned long long) sas_device->enclosure_logical_id,
+		    sas_device->slot);
+
+		if (!ssp_target)
+			scsih_display_sata_capabilities(ioc, sas_device, sdev);
+	}
+
+	scsih_change_queue_depth(sdev, qdepth);
+
+	if (ssp_target)
+		sas_read_port_mode_page(sdev);
+	return 0;
+}
+
+/**
+ * scsih_bios_param - fetch head, sector, cylinder info for a disk
+ * @sdev: scsi device struct
+ * @bdev: pointer to block device context
+ * @capacity: device size (in 512 byte sectors)
+ * @params: three element array to place output:
+ *              params[0] number of heads (max 255)
+ *              params[1] number of sectors (max 63)
+ *              params[2] number of cylinders
+ *
+ * Return nothing.
+ */
+static int
+scsih_bios_param(struct scsi_device *sdev, struct block_device *bdev,
+    sector_t capacity, int params[])
+{
+	int		heads;
+	int		sectors;
+	sector_t	cylinders;
+	ulong 		dummy;
+
+	heads = 64;
+	sectors = 32;
+
+	dummy = heads * sectors;
+	cylinders = capacity;
+	sector_div(cylinders, dummy);
+
+	/*
+	 * Handle extended translation size for logical drives
+	 * > 1Gb
+	 */
+	if ((ulong)capacity >= 0x200000) {
+		heads = 255;
+		sectors = 63;
+		dummy = heads * sectors;
+		cylinders = capacity;
+		sector_div(cylinders, dummy);
+	}
+
+	/* return result */
+	params[0] = heads;
+	params[1] = sectors;
+	params[2] = cylinders;
+
+	return 0;
+}
+
+/**
+ * _scsih_response_code - translation of device response code
+ * @ioc: per adapter object
+ * @response_code: response code returned by the device
+ *
+ * Return nothing.
+ */
+static void
+_scsih_response_code(struct MPT2SAS_ADAPTER *ioc, u8 response_code)
+{
+	char *desc;
+
+	switch (response_code) {
+	case MPI2_SCSITASKMGMT_RSP_TM_COMPLETE:
+		desc = "task management request completed";
+		break;
+	case MPI2_SCSITASKMGMT_RSP_INVALID_FRAME:
+		desc = "invalid frame";
+		break;
+	case MPI2_SCSITASKMGMT_RSP_TM_NOT_SUPPORTED:
+		desc = "task management request not supported";
+		break;
+	case MPI2_SCSITASKMGMT_RSP_TM_FAILED:
+		desc = "task management request failed";
+		break;
+	case MPI2_SCSITASKMGMT_RSP_TM_SUCCEEDED:
+		desc = "task management request succeeded";
+		break;
+	case MPI2_SCSITASKMGMT_RSP_TM_INVALID_LUN:
+		desc = "invalid lun";
+		break;
+	case 0xA:
+		desc = "overlapped tag attempted";
+		break;
+	case MPI2_SCSITASKMGMT_RSP_IO_QUEUED_ON_IOC:
+		desc = "task queued, however not sent to target";
+		break;
+	default:
+		desc = "unknown";
+		break;
+	}
+	printk(MPT2SAS_WARN_FMT "response_code(0x%01x): %s\n",
+		ioc->name, response_code, desc);
+}
+
+/**
+ * scsih_tm_done - tm completion routine
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ * Context: none.
+ *
+ * The callback handler when using scsih_issue_tm.
+ *
+ * Return nothing.
+ */
+static void
+scsih_tm_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID, u32 reply)
+{
+	MPI2DefaultReply_t *mpi_reply;
+
+	if (ioc->tm_cmds.status == MPT2_CMD_NOT_USED)
+		return;
+	if (ioc->tm_cmds.smid != smid)
+		return;
+	ioc->tm_cmds.status |= MPT2_CMD_COMPLETE;
+	mpi_reply =  mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	if (mpi_reply) {
+		memcpy(ioc->tm_cmds.reply, mpi_reply, mpi_reply->MsgLength*4);
+		ioc->tm_cmds.status |= MPT2_CMD_REPLY_VALID;
+	}
+	ioc->tm_cmds.status &= ~MPT2_CMD_PENDING;
+	complete(&ioc->tm_cmds.done);
+}
+
+/**
+ * mpt2sas_scsih_set_tm_flag - set per target tm_busy
+ * @ioc: per adapter object
+ * @handle: device handle
+ *
+ * During taskmangement request, we need to freeze the device queue.
+ */
+void
+mpt2sas_scsih_set_tm_flag(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	struct scsi_device *sdev;
+	u8 skip = 0;
+
+	shost_for_each_device(sdev, ioc->shost) {
+		if (skip)
+			continue;
+		sas_device_priv_data = sdev->hostdata;
+		if (!sas_device_priv_data)
+			continue;
+		if (sas_device_priv_data->sas_target->handle == handle) {
+			sas_device_priv_data->sas_target->tm_busy = 1;
+			skip = 1;
+			ioc->ignore_loginfos = 1;
+		}
+	}
+}
+
+/**
+ * mpt2sas_scsih_clear_tm_flag - clear per target tm_busy
+ * @ioc: per adapter object
+ * @handle: device handle
+ *
+ * During taskmangement request, we need to freeze the device queue.
+ */
+void
+mpt2sas_scsih_clear_tm_flag(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	struct scsi_device *sdev;
+	u8 skip = 0;
+
+	shost_for_each_device(sdev, ioc->shost) {
+		if (skip)
+			continue;
+		sas_device_priv_data = sdev->hostdata;
+		if (!sas_device_priv_data)
+			continue;
+		if (sas_device_priv_data->sas_target->handle == handle) {
+			sas_device_priv_data->sas_target->tm_busy = 0;
+			skip = 1;
+			ioc->ignore_loginfos = 0;
+		}
+	}
+}
+
+/**
+ * mpt2sas_scsih_issue_tm - main routine for sending tm requests
+ * @ioc: per adapter struct
+ * @device_handle: device handle
+ * @lun: lun number
+ * @type: MPI2_SCSITASKMGMT_TASKTYPE__XXX (defined in mpi2_init.h)
+ * @smid_task: smid assigned to the task
+ * @timeout: timeout in seconds
+ * Context: The calling function needs to acquire the tm_cmds.mutex
+ *
+ * A generic API for sending task management requests to firmware.
+ *
+ * The ioc->tm_cmds.status flag should be MPT2_CMD_NOT_USED before calling
+ * this API.
+ *
+ * The callback index is set inside `ioc->tm_cb_idx`.
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_scsih_issue_tm(struct MPT2SAS_ADAPTER *ioc, u16 handle, uint lun,
+    u8 type, u16 smid_task, ulong timeout)
+{
+	Mpi2SCSITaskManagementRequest_t *mpi_request;
+	Mpi2SCSITaskManagementReply_t *mpi_reply;
+	u16 smid = 0;
+	u32 ioc_state;
+	unsigned long timeleft;
+	u8 VF_ID = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	if (ioc->tm_cmds.status != MPT2_CMD_NOT_USED ||
+	    ioc->shost_recovery) {
+		spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+		printk(MPT2SAS_INFO_FMT "%s: host reset in progress!\n",
+		    __func__, ioc->name);
+		return;
+	}
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+	ioc_state = mpt2sas_base_get_iocstate(ioc, 0);
+	if (ioc_state & MPI2_DOORBELL_USED) {
+		dhsprintk(ioc, printk(MPT2SAS_DEBUG_FMT "unexpected doorbell "
+		    "active!\n", ioc->name));
+		goto issue_host_reset;
+	}
+
+	if ((ioc_state & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) {
+		mpt2sas_base_fault_info(ioc, ioc_state &
+		    MPI2_DOORBELL_DATA_MASK);
+		goto issue_host_reset;
+	}
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->tm_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		return;
+	}
+
+	dtmprintk(ioc, printk(MPT2SAS_INFO_FMT "sending tm: handle(0x%04x),"
+	    " task_type(0x%02x), smid(%d)\n", ioc->name, handle, type, smid));
+	ioc->tm_cmds.status = MPT2_CMD_PENDING;
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->tm_cmds.smid = smid;
+	memset(mpi_request, 0, sizeof(Mpi2SCSITaskManagementRequest_t));
+	mpi_request->Function = MPI2_FUNCTION_SCSI_TASK_MGMT;
+	mpi_request->DevHandle = cpu_to_le16(handle);
+	mpi_request->TaskType = type;
+	mpi_request->TaskMID = cpu_to_le16(smid_task);
+	int_to_scsilun(lun, (struct scsi_lun *)mpi_request->LUN);
+	mpt2sas_scsih_set_tm_flag(ioc, handle);
+	mpt2sas_base_put_smid_hi_priority(ioc, smid, VF_ID);
+	timeleft = wait_for_completion_timeout(&ioc->tm_cmds.done, timeout*HZ);
+	mpt2sas_scsih_clear_tm_flag(ioc, handle);
+	if (!(ioc->tm_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n",
+		    ioc->name, __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2SCSITaskManagementRequest_t)/4);
+		if (!(ioc->tm_cmds.status & MPT2_CMD_RESET))
+			goto issue_host_reset;
+	}
+
+	if (ioc->tm_cmds.status & MPT2_CMD_REPLY_VALID) {
+		mpi_reply = ioc->tm_cmds.reply;
+		dtmprintk(ioc, printk(MPT2SAS_INFO_FMT "complete tm: "
+		    "ioc_status(0x%04x), loginfo(0x%08x), term_count(0x%08x)\n",
+		    ioc->name, le16_to_cpu(mpi_reply->IOCStatus),
+		    le32_to_cpu(mpi_reply->IOCLogInfo),
+		    le32_to_cpu(mpi_reply->TerminationCount)));
+		if (ioc->logging_level & MPT_DEBUG_TM)
+			_scsih_response_code(ioc, mpi_reply->ResponseCode);
+	}
+	return;
+ issue_host_reset:
+	mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, FORCE_BIG_HAMMER);
+}
+
+/**
+ * scsih_abort - eh threads main abort routine
+ * @sdev: scsi device struct
+ *
+ * Returns SUCCESS if command aborted else FAILED
+ */
+static int
+scsih_abort(struct scsi_cmnd *scmd)
+{
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	u16 smid;
+	u16 handle;
+	int r;
+	struct scsi_cmnd *scmd_lookup;
+
+	printk(MPT2SAS_INFO_FMT "attempting task abort! scmd(%p)\n",
+	    ioc->name, scmd);
+	scsi_print_command(scmd);
+
+	sas_device_priv_data = scmd->device->hostdata;
+	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
+		printk(MPT2SAS_INFO_FMT "device been deleted! scmd(%p)\n",
+		    ioc->name, scmd);
+		scmd->result = DID_NO_CONNECT << 16;
+		scmd->scsi_done(scmd);
+		r = SUCCESS;
+		goto out;
+	}
+
+	/* search for the command */
+	smid = _scsih_scsi_lookup_find_by_scmd(ioc, scmd);
+	if (!smid) {
+		scmd->result = DID_RESET << 16;
+		r = SUCCESS;
+		goto out;
+	}
+
+	/* for hidden raid components and volumes this is not supported */
+	if (sas_device_priv_data->sas_target->flags &
+	    MPT_TARGET_FLAGS_RAID_COMPONENT ||
+	    sas_device_priv_data->sas_target->flags & MPT_TARGET_FLAGS_VOLUME) {
+		scmd->result = DID_RESET << 16;
+		r = FAILED;
+		goto out;
+	}
+
+	mutex_lock(&ioc->tm_cmds.mutex);
+	handle = sas_device_priv_data->sas_target->handle;
+	mpt2sas_scsih_issue_tm(ioc, handle, sas_device_priv_data->lun,
+	    MPI2_SCSITASKMGMT_TASKTYPE_ABORT_TASK, smid, 30);
+
+	/* sanity check - see whether command actually completed */
+	scmd_lookup = _scsih_scsi_lookup_get(ioc, smid);
+	if (scmd_lookup && (scmd_lookup->serial_number == scmd->serial_number))
+		r = FAILED;
+	else
+		r = SUCCESS;
+	ioc->tm_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_unlock(&ioc->tm_cmds.mutex);
+
+ out:
+	printk(MPT2SAS_INFO_FMT "task abort: %s scmd(%p)\n",
+	    ioc->name, ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
+	return r;
+}
+
+
+/**
+ * scsih_dev_reset - eh threads main device reset routine
+ * @sdev: scsi device struct
+ *
+ * Returns SUCCESS if command aborted else FAILED
+ */
+static int
+scsih_dev_reset(struct scsi_cmnd *scmd)
+{
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	struct _sas_device *sas_device;
+	unsigned long flags;
+	u16	handle;
+	int r;
+
+	printk(MPT2SAS_INFO_FMT "attempting target reset! scmd(%p)\n",
+	    ioc->name, scmd);
+	scsi_print_command(scmd);
+
+	sas_device_priv_data = scmd->device->hostdata;
+	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
+		printk(MPT2SAS_INFO_FMT "device been deleted! scmd(%p)\n",
+		    ioc->name, scmd);
+		scmd->result = DID_NO_CONNECT << 16;
+		scmd->scsi_done(scmd);
+		r = SUCCESS;
+		goto out;
+	}
+
+	/* for hidden raid components obtain the volume_handle */
+	handle = 0;
+	if (sas_device_priv_data->sas_target->flags &
+	    MPT_TARGET_FLAGS_RAID_COMPONENT) {
+		spin_lock_irqsave(&ioc->sas_device_lock, flags);
+		sas_device = _scsih_sas_device_find_by_handle(ioc,
+		   sas_device_priv_data->sas_target->handle);
+		if (sas_device)
+			handle = sas_device->volume_handle;
+		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	} else
+		handle = sas_device_priv_data->sas_target->handle;
+
+	if (!handle) {
+		scmd->result = DID_RESET << 16;
+		r = FAILED;
+		goto out;
+	}
+
+	mutex_lock(&ioc->tm_cmds.mutex);
+	mpt2sas_scsih_issue_tm(ioc, handle, 0,
+	    MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET, 0, 30);
+
+	/*
+	 *  sanity check see whether all commands to this target been
+	 *  completed
+	 */
+	if (_scsih_scsi_lookup_find_by_target(ioc, scmd->device->id,
+	    scmd->device->channel))
+		r = FAILED;
+	else
+		r = SUCCESS;
+	ioc->tm_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_unlock(&ioc->tm_cmds.mutex);
+
+ out:
+	printk(MPT2SAS_INFO_FMT "target reset: %s scmd(%p)\n",
+	    ioc->name, ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
+	return r;
+}
+
+/**
+ * scsih_abort - eh threads main host reset routine
+ * @sdev: scsi device struct
+ *
+ * Returns SUCCESS if command aborted else FAILED
+ */
+static int
+scsih_host_reset(struct scsi_cmnd *scmd)
+{
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
+	int r, retval;
+
+	printk(MPT2SAS_INFO_FMT "attempting host reset! scmd(%p)\n",
+	    ioc->name, scmd);
+	scsi_print_command(scmd);
+
+	retval = mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+	    FORCE_BIG_HAMMER);
+	r = (retval < 0) ? FAILED : SUCCESS;
+	printk(MPT2SAS_INFO_FMT "host reset: %s scmd(%p)\n",
+	    ioc->name, ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
+
+	return r;
+}
+
+/**
+ * _scsih_fw_event_add - insert and queue up fw_event
+ * @ioc: per adapter object
+ * @fw_event: object describing the event
+ * Context: This function will acquire ioc->fw_event_lock.
+ *
+ * This adds the firmware event object into link list, then queues it up to
+ * be processed from user context.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_fw_event_add(struct MPT2SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
+{
+	unsigned long flags;
+
+	if (ioc->firmware_event_thread == NULL)
+		return;
+
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	list_add_tail(&fw_event->list, &ioc->fw_event_list);
+	INIT_DELAYED_WORK(&fw_event->work, _firmware_event_work);
+	queue_delayed_work(ioc->firmware_event_thread, &fw_event->work, 1);
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+}
+
+/**
+ * _scsih_fw_event_free - delete fw_event
+ * @ioc: per adapter object
+ * @fw_event: object describing the event
+ * Context: This function will acquire ioc->fw_event_lock.
+ *
+ * This removes firmware event object from link list, frees associated memory.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_fw_event_free(struct MPT2SAS_ADAPTER *ioc, struct fw_event_work
+    *fw_event)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	list_del(&fw_event->list);
+	kfree(fw_event->event_data);
+	kfree(fw_event);
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+}
+
+/**
+ * _scsih_fw_event_add - requeue an event
+ * @ioc: per adapter object
+ * @fw_event: object describing the event
+ * Context: This function will acquire ioc->fw_event_lock.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_fw_event_requeue(struct MPT2SAS_ADAPTER *ioc, struct fw_event_work
+    *fw_event, unsigned long delay)
+{
+	unsigned long flags;
+	if (ioc->firmware_event_thread == NULL)
+		return;
+
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	queue_delayed_work(ioc->firmware_event_thread, &fw_event->work, delay);
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+}
+
+/**
+ * _scsih_fw_event_off - turn flag off preventing event handling
+ * @ioc: per adapter object
+ *
+ * Used to prevent handling of firmware events during adapter reset
+ * driver unload.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_fw_event_off(struct MPT2SAS_ADAPTER *ioc)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	ioc->fw_events_off = 1;
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+
+}
+
+/**
+ * _scsih_fw_event_on - turn flag on allowing firmware event handling
+ * @ioc: per adapter object
+ *
+ * Returns nothing.
+ */
+static void
+_scsih_fw_event_on(struct MPT2SAS_ADAPTER *ioc)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	ioc->fw_events_off = 0;
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+}
+
+/**
+ * _scsih_ublock_io_device - set the device state to SDEV_RUNNING
+ * @ioc: per adapter object
+ * @handle: device handle
+ *
+ * During device pull we need to appropiately set the sdev state.
+ */
+static void
+_scsih_ublock_io_device(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	struct scsi_device *sdev;
+
+	shost_for_each_device(sdev, ioc->shost) {
+		sas_device_priv_data = sdev->hostdata;
+		if (!sas_device_priv_data)
+			continue;
+		if (!sas_device_priv_data->block)
+			continue;
+		if (sas_device_priv_data->sas_target->handle == handle) {
+			dewtprintk(ioc, sdev_printk(KERN_INFO, sdev,
+			    MPT2SAS_INFO_FMT "SDEV_RUNNING: "
+			    "handle(0x%04x)\n", ioc->name, handle));
+			sas_device_priv_data->block = 0;
+			scsi_device_set_state(sdev, SDEV_RUNNING);
+		}
+	}
+}
+
+/**
+ * _scsih_block_io_device - set the device state to SDEV_BLOCK
+ * @ioc: per adapter object
+ * @handle: device handle
+ *
+ * During device pull we need to appropiately set the sdev state.
+ */
+static void
+_scsih_block_io_device(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	struct scsi_device *sdev;
+
+	shost_for_each_device(sdev, ioc->shost) {
+		sas_device_priv_data = sdev->hostdata;
+		if (!sas_device_priv_data)
+			continue;
+		if (sas_device_priv_data->block)
+			continue;
+		if (sas_device_priv_data->sas_target->handle == handle) {
+			dewtprintk(ioc, sdev_printk(KERN_INFO, sdev,
+			    MPT2SAS_INFO_FMT "SDEV_BLOCK: "
+			    "handle(0x%04x)\n", ioc->name, handle));
+			sas_device_priv_data->block = 1;
+			scsi_device_set_state(sdev, SDEV_BLOCK);
+		}
+	}
+}
+
+/**
+ * _scsih_block_io_to_children_attached_to_ex
+ * @ioc: per adapter object
+ * @sas_expander: the sas_device object
+ *
+ * This routine set sdev state to SDEV_BLOCK for all devices
+ * attached to this expander. This function called when expander is
+ * pulled.
+ */
+static void
+_scsih_block_io_to_children_attached_to_ex(struct MPT2SAS_ADAPTER *ioc,
+    struct _sas_node *sas_expander)
+{
+	struct _sas_port *mpt2sas_port;
+	struct _sas_device *sas_device;
+	struct _sas_node *expander_sibling;
+	unsigned long flags;
+
+	if (!sas_expander)
+		return;
+
+	list_for_each_entry(mpt2sas_port,
+	   &sas_expander->sas_port_list, port_list) {
+		if (mpt2sas_port->remote_identify.device_type ==
+		    SAS_END_DEVICE) {
+			spin_lock_irqsave(&ioc->sas_device_lock, flags);
+			sas_device =
+			    mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
+			   mpt2sas_port->remote_identify.sas_address);
+			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+			if (!sas_device)
+				continue;
+			_scsih_block_io_device(ioc, sas_device->handle);
+		}
+	}
+
+	list_for_each_entry(mpt2sas_port,
+	   &sas_expander->sas_port_list, port_list) {
+
+		if (mpt2sas_port->remote_identify.device_type ==
+		    MPI2_SAS_DEVICE_INFO_EDGE_EXPANDER ||
+		    mpt2sas_port->remote_identify.device_type ==
+		    MPI2_SAS_DEVICE_INFO_FANOUT_EXPANDER) {
+
+			spin_lock_irqsave(&ioc->sas_node_lock, flags);
+			expander_sibling =
+			    mpt2sas_scsih_expander_find_by_sas_address(
+			    ioc, mpt2sas_port->remote_identify.sas_address);
+			spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+			_scsih_block_io_to_children_attached_to_ex(ioc,
+			    expander_sibling);
+		}
+	}
+}
+
+/**
+ * _scsih_block_io_to_children_attached_directly
+ * @ioc: per adapter object
+ * @event_data: topology change event data
+ *
+ * This routine set sdev state to SDEV_BLOCK for all devices
+ * direct attached during device pull.
+ */
+static void
+_scsih_block_io_to_children_attached_directly(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventDataSasTopologyChangeList_t *event_data)
+{
+	int i;
+	u16 handle;
+	u16 reason_code;
+	u8 phy_number;
+
+	for (i = 0; i < event_data->NumEntries; i++) {
+		handle = le16_to_cpu(event_data->PHY[i].AttachedDevHandle);
+		if (!handle)
+			continue;
+		phy_number = event_data->StartPhyNum + i;
+		reason_code = event_data->PHY[i].PhyStatus &
+		    MPI2_EVENT_SAS_TOPO_RC_MASK;
+		if (reason_code == MPI2_EVENT_SAS_TOPO_RC_DELAY_NOT_RESPONDING)
+			_scsih_block_io_device(ioc, handle);
+	}
+}
+
+/**
+ * _scsih_check_topo_delete_events - sanity check on topo events
+ * @ioc: per adapter object
+ * @event_data: the event data payload
+ *
+ * This routine added to better handle cable breaker.
+ *
+ * This handles the case where driver recieves multiple expander
+ * add and delete events in a single shot.  When there is a delete event
+ * the routine will void any pending add events waiting in the event queue.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_check_topo_delete_events(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventDataSasTopologyChangeList_t *event_data)
+{
+	struct fw_event_work *fw_event;
+	Mpi2EventDataSasTopologyChangeList_t *local_event_data;
+	u16 expander_handle;
+	struct _sas_node *sas_expander;
+	unsigned long flags;
+
+	expander_handle = le16_to_cpu(event_data->ExpanderDevHandle);
+	if (expander_handle < ioc->sas_hba.num_phys) {
+		_scsih_block_io_to_children_attached_directly(ioc, event_data);
+		return;
+	}
+
+	if (event_data->ExpStatus == MPI2_EVENT_SAS_TOPO_ES_DELAY_NOT_RESPONDING
+	 || event_data->ExpStatus == MPI2_EVENT_SAS_TOPO_ES_NOT_RESPONDING) {
+		spin_lock_irqsave(&ioc->sas_node_lock, flags);
+		sas_expander = mpt2sas_scsih_expander_find_by_handle(ioc,
+		    expander_handle);
+		spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+		_scsih_block_io_to_children_attached_to_ex(ioc, sas_expander);
+	} else if (event_data->ExpStatus == MPI2_EVENT_SAS_TOPO_ES_RESPONDING)
+		_scsih_block_io_to_children_attached_directly(ioc, event_data);
+
+	if (event_data->ExpStatus != MPI2_EVENT_SAS_TOPO_ES_NOT_RESPONDING)
+		return;
+
+	/* mark ignore flag for pending events */
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	list_for_each_entry(fw_event, &ioc->fw_event_list, list) {
+		if (fw_event->event != MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST ||
+		    fw_event->ignore)
+			continue;
+		local_event_data = fw_event->event_data;
+		if (local_event_data->ExpStatus ==
+		    MPI2_EVENT_SAS_TOPO_ES_ADDED ||
+		    local_event_data->ExpStatus ==
+		    MPI2_EVENT_SAS_TOPO_ES_RESPONDING) {
+			if (le16_to_cpu(local_event_data->ExpanderDevHandle) ==
+			    expander_handle) {
+				dewtprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+				    "setting ignoring flag\n", ioc->name));
+				fw_event->ignore = 1;
+			}
+		}
+	}
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+}
+
+/**
+ * _scsih_queue_rescan - queue a topology rescan from user context
+ * @ioc: per adapter object
+ *
+ * Return nothing.
+ */
+static void
+_scsih_queue_rescan(struct MPT2SAS_ADAPTER *ioc)
+{
+	struct fw_event_work *fw_event;
+
+	if (ioc->wait_for_port_enable_to_complete)
+		return;
+	fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+	if (!fw_event)
+		return;
+	fw_event->event = MPT2SAS_RESCAN_AFTER_HOST_RESET;
+	fw_event->ioc = ioc;
+	_scsih_fw_event_add(ioc, fw_event);
+}
+
+/**
+ * _scsih_flush_running_cmds - completing outstanding commands.
+ * @ioc: per adapter object
+ *
+ * The flushing out of all pending scmd commands following host reset,
+ * where all IO is dropped to the floor.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_flush_running_cmds(struct MPT2SAS_ADAPTER *ioc)
+{
+	struct scsi_cmnd *scmd;
+	u16 smid;
+	u16 count = 0;
+
+	for (smid = 1; smid <= ioc->request_depth; smid++) {
+		scmd = _scsih_scsi_lookup_getclear(ioc, smid);
+		if (!scmd)
+			continue;
+		count++;
+		mpt2sas_base_free_smid(ioc, smid);
+		scsi_dma_unmap(scmd);
+		scmd->result = DID_RESET << 16;
+		scmd->scsi_done(scmd);
+	}
+	dtmprintk(ioc, printk(MPT2SAS_INFO_FMT "completing %d cmds\n",
+	    ioc->name, count));
+}
+
+/**
+ * mpt2sas_scsih_reset_handler - reset callback handler (for scsih)
+ * @ioc: per adapter object
+ * @reset_phase: phase
+ *
+ * The handler for doing any required cleanup or initialization.
+ *
+ * The reset phase can be MPT2_IOC_PRE_RESET, MPT2_IOC_AFTER_RESET,
+ * MPT2_IOC_DONE_RESET
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_scsih_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase)
+{
+	switch (reset_phase) {
+	case MPT2_IOC_PRE_RESET:
+		dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "MPT2_IOC_PRE_RESET\n", ioc->name, __func__));
+		_scsih_fw_event_off(ioc);
+		break;
+	case MPT2_IOC_AFTER_RESET:
+		dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "MPT2_IOC_AFTER_RESET\n", ioc->name, __func__));
+		if (ioc->tm_cmds.status & MPT2_CMD_PENDING) {
+			ioc->tm_cmds.status |= MPT2_CMD_RESET;
+			mpt2sas_base_free_smid(ioc, ioc->tm_cmds.smid);
+			complete(&ioc->tm_cmds.done);
+		}
+		_scsih_fw_event_on(ioc);
+		_scsih_flush_running_cmds(ioc);
+		break;
+	case MPT2_IOC_DONE_RESET:
+		dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: "
+		    "MPT2_IOC_DONE_RESET\n", ioc->name, __func__));
+		_scsih_queue_rescan(ioc);
+		break;
+	}
+}
+
+/**
+ * scsih_qcmd - main scsi request entry point
+ * @scmd: pointer to scsi command object
+ * @done: function pointer to be invoked on completion
+ *
+ * The callback index is set inside `ioc->scsi_io_cb_idx`.
+ *
+ * Returns 0 on success.  If there's a failure, return either:
+ * SCSI_MLQUEUE_DEVICE_BUSY if the device queue is full, or
+ * SCSI_MLQUEUE_HOST_BUSY if the entire host queue is full
+ */
+static int
+scsih_qcmd(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
+{
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	Mpi2SCSIIORequest_t *mpi_request;
+	u32 mpi_control;
+	u16 smid;
+	unsigned long flags;
+
+	scmd->scsi_done = done;
+	sas_device_priv_data = scmd->device->hostdata;
+	if (!sas_device_priv_data) {
+		scmd->result = DID_NO_CONNECT << 16;
+		scmd->scsi_done(scmd);
+		return 0;
+	}
+
+	sas_target_priv_data = sas_device_priv_data->sas_target;
+	if (!sas_target_priv_data || sas_target_priv_data->handle ==
+	    MPT2SAS_INVALID_DEVICE_HANDLE || sas_target_priv_data->deleted) {
+		scmd->result = DID_NO_CONNECT << 16;
+		scmd->scsi_done(scmd);
+		return 0;
+	}
+
+	/* see if we are busy with task managment stuff */
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	if (sas_target_priv_data->tm_busy ||
+	    ioc->shost_recovery || ioc->ioc_link_reset_in_progress) {
+		spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+		return SCSI_MLQUEUE_HOST_BUSY;
+	}
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+	if (scmd->sc_data_direction == DMA_FROM_DEVICE)
+		mpi_control = MPI2_SCSIIO_CONTROL_READ;
+	else if (scmd->sc_data_direction == DMA_TO_DEVICE)
+		mpi_control = MPI2_SCSIIO_CONTROL_WRITE;
+	else
+		mpi_control = MPI2_SCSIIO_CONTROL_NODATATRANSFER;
+
+	/* set tags */
+	if (!(sas_device_priv_data->flags & MPT_DEVICE_FLAGS_INIT)) {
+		if (scmd->device->tagged_supported) {
+			if (scmd->device->ordered_tags)
+				mpi_control |= MPI2_SCSIIO_CONTROL_ORDEREDQ;
+			else
+				mpi_control |= MPI2_SCSIIO_CONTROL_SIMPLEQ;
+		} else
+/* MPI Revision I (UNIT = 0xA) - removed MPI2_SCSIIO_CONTROL_UNTAGGED */
+/*			mpi_control |= MPI2_SCSIIO_CONTROL_UNTAGGED;
+ */
+			mpi_control |= (0x500);
+
+	} else
+		mpi_control |= MPI2_SCSIIO_CONTROL_SIMPLEQ;
+
+	if ((sas_device_priv_data->flags & MPT_DEVICE_TLR_ON))
+		mpi_control |= MPI2_SCSIIO_CONTROL_TLR_ON;
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->scsi_io_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		goto out;
+	}
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	memset(mpi_request, 0, sizeof(Mpi2SCSIIORequest_t));
+	mpi_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST;
+	if (sas_device_priv_data->sas_target->flags &
+	    MPT_TARGET_FLAGS_RAID_COMPONENT)
+		mpi_request->Function = MPI2_FUNCTION_RAID_SCSI_IO_PASSTHROUGH;
+	else
+		mpi_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST;
+	mpi_request->DevHandle =
+	    cpu_to_le16(sas_device_priv_data->sas_target->handle);
+	mpi_request->DataLength = cpu_to_le32(scsi_bufflen(scmd));
+	mpi_request->Control = cpu_to_le32(mpi_control);
+	mpi_request->IoFlags = cpu_to_le16(scmd->cmd_len);
+	mpi_request->MsgFlags = MPI2_SCSIIO_MSGFLAGS_SYSTEM_SENSE_ADDR;
+	mpi_request->SenseBufferLength = SCSI_SENSE_BUFFERSIZE;
+	mpi_request->SenseBufferLowAddress =
+	    (u32)mpt2sas_base_get_sense_buffer_dma(ioc, smid);
+	mpi_request->SGLOffset0 = offsetof(Mpi2SCSIIORequest_t, SGL) / 4;
+	mpi_request->SGLFlags = cpu_to_le16(MPI2_SCSIIO_SGLFLAGS_TYPE_MPI +
+	    MPI2_SCSIIO_SGLFLAGS_SYSTEM_ADDR);
+
+	int_to_scsilun(sas_device_priv_data->lun, (struct scsi_lun *)
+	    mpi_request->LUN);
+	memcpy(mpi_request->CDB.CDB32, scmd->cmnd, scmd->cmd_len);
+
+	if (!mpi_request->DataLength) {
+		mpt2sas_base_build_zero_len_sge(ioc, &mpi_request->SGL);
+	} else {
+		if (_scsih_build_scatter_gather(ioc, scmd, smid)) {
+			mpt2sas_base_free_smid(ioc, smid);
+			goto out;
+		}
+	}
+
+	_scsih_scsi_lookup_set(ioc, smid, scmd);
+	mpt2sas_base_put_smid_scsi_io(ioc, smid, 0,
+	    sas_device_priv_data->sas_target->handle);
+	return 0;
+
+ out:
+	return SCSI_MLQUEUE_HOST_BUSY;
+}
+
+/**
+ * _scsih_normalize_sense - normalize descriptor and fixed format sense data
+ * @sense_buffer: sense data returned by target
+ * @data: normalized skey/asc/ascq
+ *
+ * Return nothing.
+ */
+static void
+_scsih_normalize_sense(char *sense_buffer, struct sense_info *data)
+{
+	if ((sense_buffer[0] & 0x7F) >= 0x72) {
+		/* descriptor format */
+		data->skey = sense_buffer[1] & 0x0F;
+		data->asc = sense_buffer[2];
+		data->ascq = sense_buffer[3];
+	} else {
+		/* fixed format */
+		data->skey = sense_buffer[2] & 0x0F;
+		data->asc = sense_buffer[12];
+		data->ascq = sense_buffer[13];
+	}
+}
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _scsih_scsi_ioc_info - translated non-succesfull SCSI_IO request
+ * @ioc: per adapter object
+ * @scmd: pointer to scsi command object
+ * @mpi_reply: reply mf payload returned from firmware
+ *
+ * scsi_status - SCSI Status code returned from target device
+ * scsi_state - state info associated with SCSI_IO determined by ioc
+ * ioc_status - ioc supplied status info
+ *
+ * Return nothing.
+ */
+static void
+_scsih_scsi_ioc_info(struct MPT2SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
+    Mpi2SCSIIOReply_t *mpi_reply, u16 smid)
+{
+	u32 response_info;
+	u8 *response_bytes;
+	u16 ioc_status = le16_to_cpu(mpi_reply->IOCStatus) &
+	    MPI2_IOCSTATUS_MASK;
+	u8 scsi_state = mpi_reply->SCSIState;
+	u8 scsi_status = mpi_reply->SCSIStatus;
+	char *desc_ioc_state = NULL;
+	char *desc_scsi_status = NULL;
+	char *desc_scsi_state = ioc->tmp_string;
+
+	switch (ioc_status) {
+	case MPI2_IOCSTATUS_SUCCESS:
+		desc_ioc_state = "success";
+		break;
+	case MPI2_IOCSTATUS_INVALID_FUNCTION:
+		desc_ioc_state = "invalid function";
+		break;
+	case MPI2_IOCSTATUS_SCSI_RECOVERED_ERROR:
+		desc_ioc_state = "scsi recovered error";
+		break;
+	case MPI2_IOCSTATUS_SCSI_INVALID_DEVHANDLE:
+		desc_ioc_state = "scsi invalid dev handle";
+		break;
+	case MPI2_IOCSTATUS_SCSI_DEVICE_NOT_THERE:
+		desc_ioc_state = "scsi device not there";
+		break;
+	case MPI2_IOCSTATUS_SCSI_DATA_OVERRUN:
+		desc_ioc_state = "scsi data overrun";
+		break;
+	case MPI2_IOCSTATUS_SCSI_DATA_UNDERRUN:
+		desc_ioc_state = "scsi data underrun";
+		break;
+	case MPI2_IOCSTATUS_SCSI_IO_DATA_ERROR:
+		desc_ioc_state = "scsi io data error";
+		break;
+	case MPI2_IOCSTATUS_SCSI_PROTOCOL_ERROR:
+		desc_ioc_state = "scsi protocol error";
+		break;
+	case MPI2_IOCSTATUS_SCSI_TASK_TERMINATED:
+		desc_ioc_state = "scsi task terminated";
+		break;
+	case MPI2_IOCSTATUS_SCSI_RESIDUAL_MISMATCH:
+		desc_ioc_state = "scsi residual mismatch";
+		break;
+	case MPI2_IOCSTATUS_SCSI_TASK_MGMT_FAILED:
+		desc_ioc_state = "scsi task mgmt failed";
+		break;
+	case MPI2_IOCSTATUS_SCSI_IOC_TERMINATED:
+		desc_ioc_state = "scsi ioc terminated";
+		break;
+	case MPI2_IOCSTATUS_SCSI_EXT_TERMINATED:
+		desc_ioc_state = "scsi ext terminated";
+		break;
+	default:
+		desc_ioc_state = "unknown";
+		break;
+	}
+
+	switch (scsi_status) {
+	case MPI2_SCSI_STATUS_GOOD:
+		desc_scsi_status = "good";
+		break;
+	case MPI2_SCSI_STATUS_CHECK_CONDITION:
+		desc_scsi_status = "check condition";
+		break;
+	case MPI2_SCSI_STATUS_CONDITION_MET:
+		desc_scsi_status = "condition met";
+		break;
+	case MPI2_SCSI_STATUS_BUSY:
+		desc_scsi_status = "busy";
+		break;
+	case MPI2_SCSI_STATUS_INTERMEDIATE:
+		desc_scsi_status = "intermediate";
+		break;
+	case MPI2_SCSI_STATUS_INTERMEDIATE_CONDMET:
+		desc_scsi_status = "intermediate condmet";
+		break;
+	case MPI2_SCSI_STATUS_RESERVATION_CONFLICT:
+		desc_scsi_status = "reservation conflict";
+		break;
+	case MPI2_SCSI_STATUS_COMMAND_TERMINATED:
+		desc_scsi_status = "command terminated";
+		break;
+	case MPI2_SCSI_STATUS_TASK_SET_FULL:
+		desc_scsi_status = "task set full";
+		break;
+	case MPI2_SCSI_STATUS_ACA_ACTIVE:
+		desc_scsi_status = "aca active";
+		break;
+	case MPI2_SCSI_STATUS_TASK_ABORTED:
+		desc_scsi_status = "task aborted";
+		break;
+	default:
+		desc_scsi_status = "unknown";
+		break;
+	}
+
+	desc_scsi_state[0] = '\0';
+	if (!scsi_state)
+		desc_scsi_state = " ";
+	if (scsi_state & MPI2_SCSI_STATE_RESPONSE_INFO_VALID)
+		strcat(desc_scsi_state, "response info ");
+	if (scsi_state & MPI2_SCSI_STATE_TERMINATED)
+		strcat(desc_scsi_state, "state terminated ");
+	if (scsi_state & MPI2_SCSI_STATE_NO_SCSI_STATUS)
+		strcat(desc_scsi_state, "no status ");
+	if (scsi_state & MPI2_SCSI_STATE_AUTOSENSE_FAILED)
+		strcat(desc_scsi_state, "autosense failed ");
+	if (scsi_state & MPI2_SCSI_STATE_AUTOSENSE_VALID)
+		strcat(desc_scsi_state, "autosense valid ");
+
+	scsi_print_command(scmd);
+	printk(MPT2SAS_WARN_FMT "\tdev handle(0x%04x), "
+	    "ioc_status(%s)(0x%04x), smid(%d)\n", ioc->name,
+	    le16_to_cpu(mpi_reply->DevHandle), desc_ioc_state,
+		ioc_status, smid);
+	printk(MPT2SAS_WARN_FMT "\trequest_len(%d), underflow(%d), "
+	    "resid(%d)\n", ioc->name, scsi_bufflen(scmd), scmd->underflow,
+	    scsi_get_resid(scmd));
+	printk(MPT2SAS_WARN_FMT "\ttag(%d), transfer_count(%d), "
+	    "sc->result(0x%08x)\n", ioc->name, le16_to_cpu(mpi_reply->TaskTag),
+	    le32_to_cpu(mpi_reply->TransferCount), scmd->result);
+	printk(MPT2SAS_WARN_FMT "\tscsi_status(%s)(0x%02x), "
+	    "scsi_state(%s)(0x%02x)\n", ioc->name, desc_scsi_status,
+	    scsi_status, desc_scsi_state, scsi_state);
+
+	if (scsi_state & MPI2_SCSI_STATE_AUTOSENSE_VALID) {
+		struct sense_info data;
+		_scsih_normalize_sense(scmd->sense_buffer, &data);
+		printk(MPT2SAS_WARN_FMT "\t[sense_key,asc,ascq]: "
+		    "[0x%02x,0x%02x,0x%02x]\n", ioc->name, data.skey,
+		    data.asc, data.ascq);
+	}
+
+	if (scsi_state & MPI2_SCSI_STATE_RESPONSE_INFO_VALID) {
+		response_info = le32_to_cpu(mpi_reply->ResponseInfo);
+		response_bytes = (u8 *)&response_info;
+		_scsih_response_code(ioc, response_bytes[3]);
+	}
+}
+#endif
+
+/**
+ * _scsih_smart_predicted_fault - illuminate Fault LED
+ * @ioc: per adapter object
+ * @handle: device handle
+ *
+ * Return nothing.
+ */
+static void
+_scsih_smart_predicted_fault(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	Mpi2SepReply_t mpi_reply;
+	Mpi2SepRequest_t mpi_request;
+	struct scsi_target *starget;
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	Mpi2EventNotificationReply_t *event_reply;
+	Mpi2EventDataSasDeviceStatusChange_t *event_data;
+	struct _sas_device *sas_device;
+	ssize_t sz;
+	unsigned long flags;
+
+	/* only handle non-raid devices */
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	if (!sas_device) {
+		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+		return;
+	}
+	starget = sas_device->starget;
+	sas_target_priv_data = starget->hostdata;
+
+	if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_RAID_COMPONENT) ||
+	   ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))) {
+		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+		return;
+	}
+	starget_printk(KERN_WARNING, starget, "predicted fault\n");
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	if (ioc->pdev->subsystem_vendor == PCI_VENDOR_ID_IBM) {
+		memset(&mpi_request, 0, sizeof(Mpi2SepRequest_t));
+		mpi_request.Function = MPI2_FUNCTION_SCSI_ENCLOSURE_PROCESSOR;
+		mpi_request.Action = MPI2_SEP_REQ_ACTION_WRITE_STATUS;
+		mpi_request.SlotStatus =
+		    MPI2_SEP_REQ_SLOTSTATUS_PREDICTED_FAULT;
+		mpi_request.DevHandle = cpu_to_le16(handle);
+		mpi_request.Flags = MPI2_SEP_REQ_FLAGS_DEVHANDLE_ADDRESS;
+		if ((mpt2sas_base_scsi_enclosure_processor(ioc, &mpi_reply,
+		    &mpi_request)) != 0) {
+			printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+			    ioc->name, __FILE__, __LINE__, __func__);
+			return;
+		}
+
+		if (mpi_reply.IOCStatus || mpi_reply.IOCLogInfo) {
+			dewtprintk(ioc, printk(MPT2SAS_INFO_FMT
+			    "enclosure_processor: ioc_status (0x%04x), "
+			    "loginfo(0x%08x)\n", ioc->name,
+			    le16_to_cpu(mpi_reply.IOCStatus),
+			    le32_to_cpu(mpi_reply.IOCLogInfo)));
+			return;
+		}
+	}
+
+	/* insert into event log */
+	sz = offsetof(Mpi2EventNotificationReply_t, EventData) +
+	     sizeof(Mpi2EventDataSasDeviceStatusChange_t);
+	event_reply = kzalloc(sz, GFP_KERNEL);
+	if (!event_reply) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return;
+	}
+
+	event_reply->Function = MPI2_FUNCTION_EVENT_NOTIFICATION;
+	event_reply->Event =
+	    cpu_to_le16(MPI2_EVENT_SAS_DEVICE_STATUS_CHANGE);
+	event_reply->MsgLength = sz/4;
+	event_reply->EventDataLength =
+	    cpu_to_le16(sizeof(Mpi2EventDataSasDeviceStatusChange_t)/4);
+	event_data = (Mpi2EventDataSasDeviceStatusChange_t *)
+	    event_reply->EventData;
+	event_data->ReasonCode = MPI2_EVENT_SAS_DEV_STAT_RC_SMART_DATA;
+	event_data->ASC = 0x5D;
+	event_data->DevHandle = cpu_to_le16(handle);
+	event_data->SASAddress = cpu_to_le64(sas_target_priv_data->sas_address);
+	mpt2sas_ctl_add_to_event_log(ioc, event_reply);
+	kfree(event_reply);
+}
+
+/**
+ * scsih_io_done - scsi request callback
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ *
+ * Callback handler when using scsih_qcmd.
+ *
+ * Return nothing.
+ */
+static void
+scsih_io_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID, u32 reply)
+{
+	Mpi2SCSIIORequest_t *mpi_request;
+	Mpi2SCSIIOReply_t *mpi_reply;
+	struct scsi_cmnd *scmd;
+	u16 ioc_status;
+	u32 xfer_cnt;
+	u8 scsi_state;
+	u8 scsi_status;
+	u32 log_info;
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	u32 response_code;
+
+	mpi_reply = mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	scmd = _scsih_scsi_lookup_getclear(ioc, smid);
+	if (scmd == NULL)
+		return;
+
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+
+	if (mpi_reply == NULL) {
+		scmd->result = DID_OK << 16;
+		goto out;
+	}
+
+	sas_device_priv_data = scmd->device->hostdata;
+	if (!sas_device_priv_data || !sas_device_priv_data->sas_target ||
+	     sas_device_priv_data->sas_target->deleted) {
+		scmd->result = DID_NO_CONNECT << 16;
+		goto out;
+	}
+
+	/* turning off TLR */
+	if (!sas_device_priv_data->tlr_snoop_check) {
+		sas_device_priv_data->tlr_snoop_check++;
+		if (sas_device_priv_data->flags & MPT_DEVICE_TLR_ON) {
+			response_code = (le32_to_cpu(mpi_reply->ResponseInfo)
+			    >> 24);
+			if (response_code ==
+			    MPI2_SCSITASKMGMT_RSP_INVALID_FRAME)
+				sas_device_priv_data->flags &=
+				    ~MPT_DEVICE_TLR_ON;
+		}
+	}
+
+	xfer_cnt = le32_to_cpu(mpi_reply->TransferCount);
+	scsi_set_resid(scmd, scsi_bufflen(scmd) - xfer_cnt);
+	ioc_status = le16_to_cpu(mpi_reply->IOCStatus);
+	if (ioc_status & MPI2_IOCSTATUS_FLAG_LOG_INFO_AVAILABLE)
+		log_info =  le32_to_cpu(mpi_reply->IOCLogInfo);
+	else
+		log_info = 0;
+	ioc_status &= MPI2_IOCSTATUS_MASK;
+	scsi_state = mpi_reply->SCSIState;
+	scsi_status = mpi_reply->SCSIStatus;
+
+	if (ioc_status == MPI2_IOCSTATUS_SCSI_DATA_UNDERRUN && xfer_cnt == 0 &&
+	    (scsi_status == MPI2_SCSI_STATUS_BUSY ||
+	     scsi_status == MPI2_SCSI_STATUS_RESERVATION_CONFLICT ||
+	     scsi_status == MPI2_SCSI_STATUS_TASK_SET_FULL)) {
+		ioc_status = MPI2_IOCSTATUS_SUCCESS;
+	}
+
+	if (scsi_state & MPI2_SCSI_STATE_AUTOSENSE_VALID) {
+		struct sense_info data;
+		const void *sense_data = mpt2sas_base_get_sense_buffer(ioc,
+		    smid);
+		memcpy(scmd->sense_buffer, sense_data,
+		    le32_to_cpu(mpi_reply->SenseCount));
+		_scsih_normalize_sense(scmd->sense_buffer, &data);
+		/* failure prediction threshold exceeded */
+		if (data.asc == 0x5D)
+			_scsih_smart_predicted_fault(ioc,
+			    le16_to_cpu(mpi_reply->DevHandle));
+	}
+
+	switch (ioc_status) {
+	case MPI2_IOCSTATUS_BUSY:
+	case MPI2_IOCSTATUS_INSUFFICIENT_RESOURCES:
+		scmd->result = SAM_STAT_BUSY;
+		break;
+
+	case MPI2_IOCSTATUS_SCSI_DEVICE_NOT_THERE:
+		scmd->result = DID_NO_CONNECT << 16;
+		break;
+
+	case MPI2_IOCSTATUS_SCSI_IOC_TERMINATED:
+		if (sas_device_priv_data->block) {
+			scmd->result = (DID_BUS_BUSY << 16);
+			break;
+		}
+
+	case MPI2_IOCSTATUS_SCSI_TASK_TERMINATED:
+	case MPI2_IOCSTATUS_SCSI_EXT_TERMINATED:
+		scmd->result = DID_RESET << 16;
+		break;
+
+	case MPI2_IOCSTATUS_SCSI_RESIDUAL_MISMATCH:
+		if ((xfer_cnt == 0) || (scmd->underflow > xfer_cnt))
+			scmd->result = DID_SOFT_ERROR << 16;
+		else
+			scmd->result = (DID_OK << 16) | scsi_status;
+		break;
+
+	case MPI2_IOCSTATUS_SCSI_DATA_UNDERRUN:
+		scmd->result = (DID_OK << 16) | scsi_status;
+
+		if ((scsi_state & MPI2_SCSI_STATE_AUTOSENSE_VALID))
+			break;
+
+		if (xfer_cnt < scmd->underflow) {
+			if (scsi_status == SAM_STAT_BUSY)
+				scmd->result = SAM_STAT_BUSY;
+			else
+				scmd->result = DID_SOFT_ERROR << 16;
+		} else if (scsi_state & (MPI2_SCSI_STATE_AUTOSENSE_FAILED |
+		     MPI2_SCSI_STATE_NO_SCSI_STATUS))
+			scmd->result = DID_SOFT_ERROR << 16;
+		else if (scsi_state & MPI2_SCSI_STATE_TERMINATED)
+			scmd->result = DID_RESET << 16;
+		else if (!xfer_cnt && scmd->cmnd[0] == REPORT_LUNS) {
+			mpi_reply->SCSIState = MPI2_SCSI_STATE_AUTOSENSE_VALID;
+			mpi_reply->SCSIStatus = SAM_STAT_CHECK_CONDITION;
+			scmd->result = (DRIVER_SENSE << 24) |
+			    SAM_STAT_CHECK_CONDITION;
+			scmd->sense_buffer[0] = 0x70;
+			scmd->sense_buffer[2] = ILLEGAL_REQUEST;
+			scmd->sense_buffer[12] = 0x20;
+			scmd->sense_buffer[13] = 0;
+		}
+		break;
+
+	case MPI2_IOCSTATUS_SCSI_DATA_OVERRUN:
+		scsi_set_resid(scmd, 0);
+	case MPI2_IOCSTATUS_SCSI_RECOVERED_ERROR:
+	case MPI2_IOCSTATUS_SUCCESS:
+		scmd->result = (DID_OK << 16) | scsi_status;
+		if (scsi_state & (MPI2_SCSI_STATE_AUTOSENSE_FAILED |
+		     MPI2_SCSI_STATE_NO_SCSI_STATUS))
+			scmd->result = DID_SOFT_ERROR << 16;
+		else if (scsi_state & MPI2_SCSI_STATE_TERMINATED)
+			scmd->result = DID_RESET << 16;
+		break;
+
+	case MPI2_IOCSTATUS_SCSI_PROTOCOL_ERROR:
+	case MPI2_IOCSTATUS_INVALID_FUNCTION:
+	case MPI2_IOCSTATUS_INVALID_SGL:
+	case MPI2_IOCSTATUS_INTERNAL_ERROR:
+	case MPI2_IOCSTATUS_INVALID_FIELD:
+	case MPI2_IOCSTATUS_INVALID_STATE:
+	case MPI2_IOCSTATUS_SCSI_IO_DATA_ERROR:
+	case MPI2_IOCSTATUS_SCSI_TASK_MGMT_FAILED:
+	default:
+		scmd->result = DID_SOFT_ERROR << 16;
+		break;
+
+	}
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	if (scmd->result && (ioc->logging_level & MPT_DEBUG_REPLY))
+		_scsih_scsi_ioc_info(ioc , scmd, mpi_reply, smid);
+#endif
+
+ out:
+	scsi_dma_unmap(scmd);
+	scmd->scsi_done(scmd);
+}
+
+/**
+ * _scsih_link_change - process phy link changes
+ * @ioc: per adapter object
+ * @handle: phy handle
+ * @attached_handle: valid for devices attached to link
+ * @phy_number: phy number
+ * @link_rate: new link rate
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_link_change(struct MPT2SAS_ADAPTER *ioc, u16 handle, u16 attached_handle,
+   u8 phy_number, u8 link_rate)
+{
+	mpt2sas_transport_update_phy_link_change(ioc, handle, attached_handle,
+	    phy_number, link_rate);
+}
+
+/**
+ * _scsih_sas_host_refresh - refreshing sas host object contents
+ * @ioc: per adapter object
+ * @update: update link information
+ * Context: user
+ *
+ * During port enable, fw will send topology events for every device. Its
+ * possible that the handles may change from the previous setting, so this
+ * code keeping handles updating if changed.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_host_refresh(struct MPT2SAS_ADAPTER *ioc, u8 update)
+{
+	u16 sz;
+	u16 ioc_status;
+	int i;
+	Mpi2ConfigReply_t mpi_reply;
+	Mpi2SasIOUnitPage0_t *sas_iounit_pg0 = NULL;
+
+	dtmprintk(ioc, printk(MPT2SAS_INFO_FMT
+	    "updating handles for sas_host(0x%016llx)\n",
+	    ioc->name, (unsigned long long)ioc->sas_hba.sas_address));
+
+	sz = offsetof(Mpi2SasIOUnitPage0_t, PhyData) + (ioc->sas_hba.num_phys
+	    * sizeof(Mpi2SasIOUnit0PhyData_t));
+	sas_iounit_pg0 = kzalloc(sz, GFP_KERNEL);
+	if (!sas_iounit_pg0) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return;
+	}
+	if (!(mpt2sas_config_get_sas_iounit_pg0(ioc, &mpi_reply,
+	    sas_iounit_pg0, sz))) {
+		ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+		    MPI2_IOCSTATUS_MASK;
+		if (ioc_status != MPI2_IOCSTATUS_SUCCESS)
+			goto out;
+		for (i = 0; i < ioc->sas_hba.num_phys ; i++) {
+			ioc->sas_hba.phy[i].handle =
+			    le16_to_cpu(sas_iounit_pg0->PhyData[i].
+				ControllerDevHandle);
+			if (update)
+				_scsih_link_change(ioc,
+				    ioc->sas_hba.phy[i].handle,
+				    le16_to_cpu(sas_iounit_pg0->PhyData[i].
+				    AttachedDevHandle), i,
+				    sas_iounit_pg0->PhyData[i].
+				    NegotiatedLinkRate >> 4);
+		}
+	}
+
+ out:
+	kfree(sas_iounit_pg0);
+}
+
+/**
+ * _scsih_sas_host_add - create sas host object
+ * @ioc: per adapter object
+ *
+ * Creating host side data object, stored in ioc->sas_hba
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_host_add(struct MPT2SAS_ADAPTER *ioc)
+{
+	int i;
+	Mpi2ConfigReply_t mpi_reply;
+	Mpi2SasIOUnitPage0_t *sas_iounit_pg0 = NULL;
+	Mpi2SasIOUnitPage1_t *sas_iounit_pg1 = NULL;
+	Mpi2SasPhyPage0_t phy_pg0;
+	Mpi2SasDevicePage0_t sas_device_pg0;
+	Mpi2SasEnclosurePage0_t enclosure_pg0;
+	u16 ioc_status;
+	u16 sz;
+	u16 device_missing_delay;
+
+	mpt2sas_config_get_number_hba_phys(ioc, &ioc->sas_hba.num_phys);
+	if (!ioc->sas_hba.num_phys) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return;
+	}
+
+	/* sas_iounit page 0 */
+	sz = offsetof(Mpi2SasIOUnitPage0_t, PhyData) + (ioc->sas_hba.num_phys *
+	    sizeof(Mpi2SasIOUnit0PhyData_t));
+	sas_iounit_pg0 = kzalloc(sz, GFP_KERNEL);
+	if (!sas_iounit_pg0) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return;
+	}
+	if ((mpt2sas_config_get_sas_iounit_pg0(ioc, &mpi_reply,
+	    sas_iounit_pg0, sz))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out;
+	}
+	ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+	    MPI2_IOCSTATUS_MASK;
+	if (ioc_status != MPI2_IOCSTATUS_SUCCESS) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out;
+	}
+
+	/* sas_iounit page 1 */
+	sz = offsetof(Mpi2SasIOUnitPage1_t, PhyData) + (ioc->sas_hba.num_phys *
+	    sizeof(Mpi2SasIOUnit1PhyData_t));
+	sas_iounit_pg1 = kzalloc(sz, GFP_KERNEL);
+	if (!sas_iounit_pg1) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out;
+	}
+	if ((mpt2sas_config_get_sas_iounit_pg1(ioc, &mpi_reply,
+	    sas_iounit_pg1, sz))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out;
+	}
+	ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+	    MPI2_IOCSTATUS_MASK;
+	if (ioc_status != MPI2_IOCSTATUS_SUCCESS) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out;
+	}
+
+	ioc->io_missing_delay =
+	    le16_to_cpu(sas_iounit_pg1->IODeviceMissingDelay);
+	device_missing_delay =
+	    le16_to_cpu(sas_iounit_pg1->ReportDeviceMissingDelay);
+	if (device_missing_delay & MPI2_SASIOUNIT1_REPORT_MISSING_UNIT_16)
+		ioc->device_missing_delay = (device_missing_delay &
+		    MPI2_SASIOUNIT1_REPORT_MISSING_TIMEOUT_MASK) * 16;
+	else
+		ioc->device_missing_delay = device_missing_delay &
+		    MPI2_SASIOUNIT1_REPORT_MISSING_TIMEOUT_MASK;
+
+	ioc->sas_hba.parent_dev = &ioc->shost->shost_gendev;
+	ioc->sas_hba.phy = kcalloc(ioc->sas_hba.num_phys,
+	    sizeof(struct _sas_phy), GFP_KERNEL);
+	if (!ioc->sas_hba.phy) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out;
+	}
+	for (i = 0; i < ioc->sas_hba.num_phys ; i++) {
+		if ((mpt2sas_config_get_phy_pg0(ioc, &mpi_reply, &phy_pg0,
+		    i))) {
+			printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+			    ioc->name, __FILE__, __LINE__, __func__);
+			goto out;
+		}
+		ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+		    MPI2_IOCSTATUS_MASK;
+		if (ioc_status != MPI2_IOCSTATUS_SUCCESS) {
+			printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+			    ioc->name, __FILE__, __LINE__, __func__);
+			goto out;
+		}
+		ioc->sas_hba.phy[i].handle =
+		    le16_to_cpu(sas_iounit_pg0->PhyData[i].ControllerDevHandle);
+		ioc->sas_hba.phy[i].phy_id = i;
+		mpt2sas_transport_add_host_phy(ioc, &ioc->sas_hba.phy[i],
+		    phy_pg0, ioc->sas_hba.parent_dev);
+	}
+	if ((mpt2sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
+	    MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, ioc->sas_hba.phy[0].handle))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out;
+	}
+	ioc->sas_hba.handle = le16_to_cpu(sas_device_pg0.DevHandle);
+	ioc->sas_hba.enclosure_handle =
+	    le16_to_cpu(sas_device_pg0.EnclosureHandle);
+	ioc->sas_hba.sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
+	printk(MPT2SAS_INFO_FMT "host_add: handle(0x%04x), "
+	    "sas_addr(0x%016llx), phys(%d)\n", ioc->name, ioc->sas_hba.handle,
+	    (unsigned long long) ioc->sas_hba.sas_address,
+	    ioc->sas_hba.num_phys) ;
+
+	if (ioc->sas_hba.enclosure_handle) {
+		if (!(mpt2sas_config_get_enclosure_pg0(ioc, &mpi_reply,
+		    &enclosure_pg0,
+		   MPI2_SAS_ENCLOS_PGAD_FORM_HANDLE,
+		   ioc->sas_hba.enclosure_handle))) {
+			ioc->sas_hba.enclosure_logical_id =
+			    le64_to_cpu(enclosure_pg0.EnclosureLogicalID);
+		}
+	}
+
+ out:
+	kfree(sas_iounit_pg1);
+	kfree(sas_iounit_pg0);
+}
+
+/**
+ * _scsih_expander_add -  creating expander object
+ * @ioc: per adapter object
+ * @handle: expander handle
+ *
+ * Creating expander object, stored in ioc->sas_expander_list.
+ *
+ * Return 0 for success, else error.
+ */
+static int
+_scsih_expander_add(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct _sas_node *sas_expander;
+	Mpi2ConfigReply_t mpi_reply;
+	Mpi2ExpanderPage0_t expander_pg0;
+	Mpi2ExpanderPage1_t expander_pg1;
+	Mpi2SasEnclosurePage0_t enclosure_pg0;
+	u32 ioc_status;
+	u16 parent_handle;
+	__le64 sas_address;
+	int i;
+	unsigned long flags;
+	struct _sas_port *mpt2sas_port;
+	int rc = 0;
+
+	if (!handle)
+		return -1;
+
+	if ((mpt2sas_config_get_expander_pg0(ioc, &mpi_reply, &expander_pg0,
+	    MPI2_SAS_EXPAND_PGAD_FORM_HNDL, handle))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+
+	ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+	    MPI2_IOCSTATUS_MASK;
+	if (ioc_status != MPI2_IOCSTATUS_SUCCESS) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+
+	/* handle out of order topology events */
+	parent_handle = le16_to_cpu(expander_pg0.ParentDevHandle);
+	if (parent_handle >= ioc->sas_hba.num_phys) {
+		spin_lock_irqsave(&ioc->sas_node_lock, flags);
+		sas_expander = mpt2sas_scsih_expander_find_by_handle(ioc,
+		    parent_handle);
+		spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+		if (!sas_expander) {
+			rc = _scsih_expander_add(ioc, parent_handle);
+			if (rc != 0)
+				return rc;
+		}
+	}
+
+	sas_address = le64_to_cpu(expander_pg0.SASAddress);
+
+	spin_lock_irqsave(&ioc->sas_node_lock, flags);
+	sas_expander = mpt2sas_scsih_expander_find_by_sas_address(ioc,
+	    sas_address);
+	spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+
+	if (sas_expander)
+		return 0;
+
+	sas_expander = kzalloc(sizeof(struct _sas_node),
+	    GFP_KERNEL);
+	if (!sas_expander) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+
+	sas_expander->handle = handle;
+	sas_expander->num_phys = expander_pg0.NumPhys;
+	sas_expander->parent_handle = parent_handle;
+	sas_expander->enclosure_handle =
+	    le16_to_cpu(expander_pg0.EnclosureHandle);
+	sas_expander->sas_address = sas_address;
+
+	printk(MPT2SAS_INFO_FMT "expander_add: handle(0x%04x),"
+	    " parent(0x%04x), sas_addr(0x%016llx), phys(%d)\n", ioc->name,
+	    handle, sas_expander->parent_handle, (unsigned long long)
+	    sas_expander->sas_address, sas_expander->num_phys);
+
+	if (!sas_expander->num_phys)
+		goto out_fail;
+	sas_expander->phy = kcalloc(sas_expander->num_phys,
+	    sizeof(struct _sas_phy), GFP_KERNEL);
+	if (!sas_expander->phy) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		rc = -1;
+		goto out_fail;
+	}
+
+	INIT_LIST_HEAD(&sas_expander->sas_port_list);
+	mpt2sas_port = mpt2sas_transport_port_add(ioc, handle,
+	    sas_expander->parent_handle);
+	if (!mpt2sas_port) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		rc = -1;
+		goto out_fail;
+	}
+	sas_expander->parent_dev = &mpt2sas_port->rphy->dev;
+
+	for (i = 0 ; i < sas_expander->num_phys ; i++) {
+		if ((mpt2sas_config_get_expander_pg1(ioc, &mpi_reply,
+		    &expander_pg1, i, handle))) {
+			printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+			    ioc->name, __FILE__, __LINE__, __func__);
+			continue;
+		}
+		sas_expander->phy[i].handle = handle;
+		sas_expander->phy[i].phy_id = i;
+		mpt2sas_transport_add_expander_phy(ioc, &sas_expander->phy[i],
+		    expander_pg1, sas_expander->parent_dev);
+	}
+
+	if (sas_expander->enclosure_handle) {
+		if (!(mpt2sas_config_get_enclosure_pg0(ioc, &mpi_reply,
+		    &enclosure_pg0, MPI2_SAS_ENCLOS_PGAD_FORM_HANDLE,
+		   sas_expander->enclosure_handle))) {
+			sas_expander->enclosure_logical_id =
+			    le64_to_cpu(enclosure_pg0.EnclosureLogicalID);
+		}
+	}
+
+	_scsih_expander_node_add(ioc, sas_expander);
+	 return 0;
+
+ out_fail:
+
+	if (sas_expander)
+		kfree(sas_expander->phy);
+	kfree(sas_expander);
+	return rc;
+}
+
+/**
+ * _scsih_expander_remove - removing expander object
+ * @ioc: per adapter object
+ * @handle: expander handle
+ *
+ * Return nothing.
+ */
+static void
+_scsih_expander_remove(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct _sas_node *sas_expander;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_node_lock, flags);
+	sas_expander = mpt2sas_scsih_expander_find_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+	_scsih_expander_node_remove(ioc, sas_expander);
+}
+
+/**
+ * _scsih_add_device -  creating sas device object
+ * @ioc: per adapter object
+ * @handle: sas device handle
+ * @phy_num: phy number end device attached to
+ * @is_pd: is this hidden raid component
+ *
+ * Creating end device object, stored in ioc->sas_device_list.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_scsih_add_device(struct MPT2SAS_ADAPTER *ioc, u16 handle, u8 phy_num, u8 is_pd)
+{
+	Mpi2ConfigReply_t mpi_reply;
+	Mpi2SasDevicePage0_t sas_device_pg0;
+	Mpi2SasEnclosurePage0_t enclosure_pg0;
+	struct _sas_device *sas_device;
+	u32 ioc_status;
+	__le64 sas_address;
+	u32 device_info;
+	unsigned long flags;
+
+	if ((mpt2sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
+	    MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+
+	ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+	    MPI2_IOCSTATUS_MASK;
+	if (ioc_status != MPI2_IOCSTATUS_SUCCESS) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+
+	/* check if device is present */
+	if (!(le16_to_cpu(sas_device_pg0.Flags) &
+	    MPI2_SAS_DEVICE0_FLAGS_DEVICE_PRESENT)) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		printk(MPT2SAS_ERR_FMT "Flags = 0x%04x\n",
+		    ioc->name, le16_to_cpu(sas_device_pg0.Flags));
+		return -1;
+	}
+
+	/* check if there were any issus with discovery */
+	if (sas_device_pg0.AccessStatus ==
+	    MPI2_SAS_DEVICE0_ASTATUS_SATA_INIT_FAILED) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		printk(MPT2SAS_ERR_FMT "AccessStatus = 0x%02x\n",
+		    ioc->name, sas_device_pg0.AccessStatus);
+		return -1;
+	}
+
+	/* check if this is end device */
+	device_info = le32_to_cpu(sas_device_pg0.DeviceInfo);
+	if (!(_scsih_is_end_device(device_info))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+
+	sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
+	    sas_address);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	if (sas_device) {
+		_scsih_ublock_io_device(ioc, handle);
+		return 0;
+	}
+
+	sas_device = kzalloc(sizeof(struct _sas_device),
+	    GFP_KERNEL);
+	if (!sas_device) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+
+	sas_device->handle = handle;
+	sas_device->parent_handle =
+	    le16_to_cpu(sas_device_pg0.ParentDevHandle);
+	sas_device->enclosure_handle =
+	    le16_to_cpu(sas_device_pg0.EnclosureHandle);
+	sas_device->slot =
+	    le16_to_cpu(sas_device_pg0.Slot);
+	sas_device->device_info = device_info;
+	sas_device->sas_address = sas_address;
+	sas_device->hidden_raid_component = is_pd;
+
+	/* get enclosure_logical_id */
+	if (!(mpt2sas_config_get_enclosure_pg0(ioc, &mpi_reply, &enclosure_pg0,
+	   MPI2_SAS_ENCLOS_PGAD_FORM_HANDLE,
+	   sas_device->enclosure_handle))) {
+		sas_device->enclosure_logical_id =
+		    le64_to_cpu(enclosure_pg0.EnclosureLogicalID);
+	}
+
+	/* get device name */
+	sas_device->device_name = le64_to_cpu(sas_device_pg0.DeviceName);
+
+	if (ioc->wait_for_port_enable_to_complete)
+		_scsih_sas_device_init_add(ioc, sas_device);
+	else
+		_scsih_sas_device_add(ioc, sas_device);
+
+	return 0;
+}
+
+/**
+ * _scsih_remove_device -  removing sas device object
+ * @ioc: per adapter object
+ * @handle: sas device handle
+ *
+ * Return nothing.
+ */
+static void
+_scsih_remove_device(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	struct _sas_device *sas_device;
+	unsigned long flags;
+	Mpi2SasIoUnitControlReply_t mpi_reply;
+	Mpi2SasIoUnitControlRequest_t mpi_request;
+	u16 device_handle;
+
+	/* lookup sas_device */
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	if (!sas_device) {
+		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+		return;
+	}
+
+	dewtprintk(ioc, printk(MPT2SAS_INFO_FMT "%s: enter: handle"
+	    "(0x%04x)\n", ioc->name, __func__, handle));
+
+	if (sas_device->starget && sas_device->starget->hostdata) {
+		sas_target_priv_data = sas_device->starget->hostdata;
+		sas_target_priv_data->deleted = 1;
+	}
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	if (ioc->remove_host)
+		goto out;
+
+	/* Target Reset to flush out all the outstanding IO */
+	device_handle = (sas_device->hidden_raid_component) ?
+	    sas_device->volume_handle : handle;
+	if (device_handle) {
+		dewtprintk(ioc, printk(MPT2SAS_INFO_FMT "issue target reset: "
+		    "handle(0x%04x)\n", ioc->name, device_handle));
+		mutex_lock(&ioc->tm_cmds.mutex);
+		mpt2sas_scsih_issue_tm(ioc, device_handle, 0,
+		    MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET, 0, 10);
+		ioc->tm_cmds.status = MPT2_CMD_NOT_USED;
+		mutex_unlock(&ioc->tm_cmds.mutex);
+		dewtprintk(ioc, printk(MPT2SAS_INFO_FMT "issue target reset "
+		    "done: handle(0x%04x)\n", ioc->name, device_handle));
+	}
+
+	/* SAS_IO_UNIT_CNTR - send REMOVE_DEVICE */
+	dewtprintk(ioc, printk(MPT2SAS_INFO_FMT "sas_iounit: handle"
+	    "(0x%04x)\n", ioc->name, handle));
+	memset(&mpi_request, 0, sizeof(Mpi2SasIoUnitControlRequest_t));
+	mpi_request.Function = MPI2_FUNCTION_SAS_IO_UNIT_CONTROL;
+	mpi_request.Operation = MPI2_SAS_OP_REMOVE_DEVICE;
+	mpi_request.DevHandle = handle;
+	mpi_request.VF_ID = 0;
+	if ((mpt2sas_base_sas_iounit_control(ioc, &mpi_reply,
+	    &mpi_request)) != 0) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+	}
+
+	dewtprintk(ioc, printk(MPT2SAS_INFO_FMT "sas_iounit: ioc_status"
+	    "(0x%04x), loginfo(0x%08x)\n", ioc->name,
+	    le16_to_cpu(mpi_reply.IOCStatus),
+	    le32_to_cpu(mpi_reply.IOCLogInfo)));
+
+ out:
+	mpt2sas_transport_port_remove(ioc, sas_device->sas_address,
+	    sas_device->parent_handle);
+
+	printk(MPT2SAS_INFO_FMT "removing handle(0x%04x), sas_addr"
+	    "(0x%016llx)\n", ioc->name, sas_device->handle,
+	    (unsigned long long) sas_device->sas_address);
+	_scsih_sas_device_remove(ioc, sas_device);
+
+	dewtprintk(ioc, printk(MPT2SAS_INFO_FMT "%s: exit: handle"
+	    "(0x%04x)\n", ioc->name, __func__, handle));
+}
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _scsih_sas_topology_change_event_debug - debug for topology event
+ * @ioc: per adapter object
+ * @event_data: event data payload
+ * Context: user.
+ */
+static void
+_scsih_sas_topology_change_event_debug(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventDataSasTopologyChangeList_t *event_data)
+{
+	int i;
+	u16 handle;
+	u16 reason_code;
+	u8 phy_number;
+	char *status_str = NULL;
+	char link_rate[25];
+
+	switch (event_data->ExpStatus) {
+	case MPI2_EVENT_SAS_TOPO_ES_ADDED:
+		status_str = "add";
+		break;
+	case MPI2_EVENT_SAS_TOPO_ES_NOT_RESPONDING:
+		status_str = "remove";
+		break;
+	case MPI2_EVENT_SAS_TOPO_ES_RESPONDING:
+		status_str =  "responding";
+		break;
+	case MPI2_EVENT_SAS_TOPO_ES_DELAY_NOT_RESPONDING:
+		status_str = "remove delay";
+		break;
+	default:
+		status_str = "unknown status";
+		break;
+	}
+	printk(MPT2SAS_DEBUG_FMT "sas topology change: (%s)\n",
+	    ioc->name, status_str);
+	printk(KERN_DEBUG "\thandle(0x%04x), enclosure_handle(0x%04x) "
+	    "start_phy(%02d), count(%d)\n",
+	    le16_to_cpu(event_data->ExpanderDevHandle),
+	    le16_to_cpu(event_data->EnclosureHandle),
+	    event_data->StartPhyNum, event_data->NumEntries);
+	for (i = 0; i < event_data->NumEntries; i++) {
+		handle = le16_to_cpu(event_data->PHY[i].AttachedDevHandle);
+		if (!handle)
+			continue;
+		phy_number = event_data->StartPhyNum + i;
+		reason_code = event_data->PHY[i].PhyStatus &
+		    MPI2_EVENT_SAS_TOPO_RC_MASK;
+		switch (reason_code) {
+		case MPI2_EVENT_SAS_TOPO_RC_TARG_ADDED:
+			snprintf(link_rate, 25, ": add, link(0x%02x)",
+			    (event_data->PHY[i].LinkRate >> 4));
+			status_str = link_rate;
+			break;
+		case MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING:
+			status_str = ": remove";
+			break;
+		case MPI2_EVENT_SAS_TOPO_RC_DELAY_NOT_RESPONDING:
+			status_str = ": remove_delay";
+			break;
+		case MPI2_EVENT_SAS_TOPO_RC_PHY_CHANGED:
+			snprintf(link_rate, 25, ": link(0x%02x)",
+			    (event_data->PHY[i].LinkRate >> 4));
+			status_str = link_rate;
+			break;
+		case MPI2_EVENT_SAS_TOPO_RC_NO_CHANGE:
+			status_str = ": responding";
+			break;
+		default:
+			status_str = ": unknown";
+			break;
+		}
+		printk(KERN_DEBUG "\tphy(%02d), attached_handle(0x%04x)%s\n",
+		    phy_number, handle, status_str);
+	}
+}
+#endif
+
+/**
+ * _scsih_sas_topology_change_event - handle topology changes
+ * @ioc: per adapter object
+ * @VF_ID:
+ * @event_data: event data payload
+ * fw_event:
+ * Context: user.
+ *
+ */
+static void
+_scsih_sas_topology_change_event(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+    Mpi2EventDataSasTopologyChangeList_t *event_data,
+    struct fw_event_work *fw_event)
+{
+	int i;
+	u16 parent_handle, handle;
+	u16 reason_code;
+	u8 phy_number;
+	struct _sas_node *sas_expander;
+	unsigned long flags;
+	u8 link_rate_;
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK)
+		_scsih_sas_topology_change_event_debug(ioc, event_data);
+#endif
+
+	if (!ioc->sas_hba.num_phys)
+		_scsih_sas_host_add(ioc);
+	else
+		_scsih_sas_host_refresh(ioc, 0);
+
+	if (fw_event->ignore) {
+		dewtprintk(ioc, printk(MPT2SAS_DEBUG_FMT "ignoring expander "
+		    "event\n", ioc->name));
+		return;
+	}
+
+	parent_handle = le16_to_cpu(event_data->ExpanderDevHandle);
+
+	/* handle expander add */
+	if (event_data->ExpStatus == MPI2_EVENT_SAS_TOPO_ES_ADDED)
+		if (_scsih_expander_add(ioc, parent_handle) != 0)
+			return;
+
+	/* handle siblings events */
+	for (i = 0; i < event_data->NumEntries; i++) {
+		if (fw_event->ignore) {
+			dewtprintk(ioc, printk(MPT2SAS_DEBUG_FMT "ignoring "
+			    "expander event\n", ioc->name));
+			return;
+		}
+		if (event_data->PHY[i].PhyStatus &
+		    MPI2_EVENT_SAS_TOPO_PHYSTATUS_VACANT)
+			continue;
+		handle = le16_to_cpu(event_data->PHY[i].AttachedDevHandle);
+		if (!handle)
+			continue;
+		phy_number = event_data->StartPhyNum + i;
+		reason_code = event_data->PHY[i].PhyStatus &
+		    MPI2_EVENT_SAS_TOPO_RC_MASK;
+		link_rate_ = event_data->PHY[i].LinkRate >> 4;
+		switch (reason_code) {
+		case MPI2_EVENT_SAS_TOPO_RC_PHY_CHANGED:
+		case MPI2_EVENT_SAS_TOPO_RC_TARG_ADDED:
+			if (!parent_handle) {
+				if (phy_number < ioc->sas_hba.num_phys)
+					_scsih_link_change(ioc,
+					   ioc->sas_hba.phy[phy_number].handle,
+					   handle, phy_number, link_rate_);
+			} else {
+				spin_lock_irqsave(&ioc->sas_node_lock, flags);
+				sas_expander =
+				    mpt2sas_scsih_expander_find_by_handle(ioc,
+					parent_handle);
+				spin_unlock_irqrestore(&ioc->sas_node_lock,
+				    flags);
+				if (sas_expander) {
+					if (phy_number < sas_expander->num_phys)
+						_scsih_link_change(ioc,
+						   sas_expander->
+						   phy[phy_number].handle,
+						   handle, phy_number,
+						   link_rate_);
+				}
+			}
+			if (reason_code == MPI2_EVENT_SAS_TOPO_RC_PHY_CHANGED) {
+				if (link_rate_ >= MPI2_SAS_NEG_LINK_RATE_1_5)
+					_scsih_ublock_io_device(ioc, handle);
+			}
+			if (reason_code == MPI2_EVENT_SAS_TOPO_RC_TARG_ADDED) {
+				if (link_rate_ < MPI2_SAS_NEG_LINK_RATE_1_5)
+					break;
+				_scsih_add_device(ioc, handle, phy_number, 0);
+			}
+			break;
+		case MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING:
+			_scsih_remove_device(ioc, handle);
+			break;
+		}
+	}
+
+	/* handle expander removal */
+	if (event_data->ExpStatus == MPI2_EVENT_SAS_TOPO_ES_NOT_RESPONDING)
+		_scsih_expander_remove(ioc, parent_handle);
+
+}
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _scsih_sas_device_status_change_event_debug - debug for device event
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_device_status_change_event_debug(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventDataSasDeviceStatusChange_t *event_data)
+{
+	char *reason_str = NULL;
+
+	switch (event_data->ReasonCode) {
+	case MPI2_EVENT_SAS_DEV_STAT_RC_SMART_DATA:
+		reason_str = "smart data";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_UNSUPPORTED:
+		reason_str = "unsupported device discovered";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET:
+		reason_str = "internal device reset";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_TASK_ABORT_INTERNAL:
+		reason_str = "internal task abort";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_ABORT_TASK_SET_INTERNAL:
+		reason_str = "internal task abort set";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_CLEAR_TASK_SET_INTERNAL:
+		reason_str = "internal clear task set";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_QUERY_TASK_INTERNAL:
+		reason_str = "internal query task";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_SATA_INIT_FAILURE:
+		reason_str = "sata init failure";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_CMP_INTERNAL_DEV_RESET:
+		reason_str = "internal device reset complete";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_CMP_TASK_ABORT_INTERNAL:
+		reason_str = "internal task abort complete";
+		break;
+	case MPI2_EVENT_SAS_DEV_STAT_RC_ASYNC_NOTIFICATION:
+		reason_str = "internal async notification";
+		break;
+	default:
+		reason_str = "unknown reason";
+		break;
+	}
+	printk(MPT2SAS_DEBUG_FMT "device status change: (%s)\n"
+	    "\thandle(0x%04x), sas address(0x%016llx)", ioc->name,
+	    reason_str, le16_to_cpu(event_data->DevHandle),
+	    (unsigned long long)le64_to_cpu(event_data->SASAddress));
+	if (event_data->ReasonCode == MPI2_EVENT_SAS_DEV_STAT_RC_SMART_DATA)
+		printk(MPT2SAS_DEBUG_FMT ", ASC(0x%x), ASCQ(0x%x)\n", ioc->name,
+		    event_data->ASC, event_data->ASCQ);
+	printk(KERN_INFO "\n");
+}
+#endif
+
+/**
+ * _scsih_sas_device_status_change_event - handle device status change
+ * @ioc: per adapter object
+ * @VF_ID:
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_device_status_change_event(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+    Mpi2EventDataSasDeviceStatusChange_t *event_data)
+{
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK)
+		_scsih_sas_device_status_change_event_debug(ioc, event_data);
+#endif
+}
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _scsih_sas_enclosure_dev_status_change_event_debug - debug for enclosure event
+ * @ioc: per adapter object
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_enclosure_dev_status_change_event_debug(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventDataSasEnclDevStatusChange_t *event_data)
+{
+	char *reason_str = NULL;
+
+	switch (event_data->ReasonCode) {
+	case MPI2_EVENT_SAS_ENCL_RC_ADDED:
+		reason_str = "enclosure add";
+		break;
+	case MPI2_EVENT_SAS_ENCL_RC_NOT_RESPONDING:
+		reason_str = "enclosure remove";
+		break;
+	default:
+		reason_str = "unknown reason";
+		break;
+	}
+
+	printk(MPT2SAS_DEBUG_FMT "enclosure status change: (%s)\n"
+	    "\thandle(0x%04x), enclosure logical id(0x%016llx)"
+	    " number slots(%d)\n", ioc->name, reason_str,
+	    le16_to_cpu(event_data->EnclosureHandle),
+	    (unsigned long long)le64_to_cpu(event_data->EnclosureLogicalID),
+	    le16_to_cpu(event_data->StartSlot));
+}
+#endif
+
+/**
+ * _scsih_sas_enclosure_dev_status_change_event - handle enclosure events
+ * @ioc: per adapter object
+ * @VF_ID:
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_enclosure_dev_status_change_event(struct MPT2SAS_ADAPTER *ioc,
+    u8 VF_ID, Mpi2EventDataSasEnclDevStatusChange_t *event_data)
+{
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK)
+		_scsih_sas_enclosure_dev_status_change_event_debug(ioc,
+		     event_data);
+#endif
+}
+
+/**
+ * _scsih_sas_broadcast_primative_event - handle broadcast events
+ * @ioc: per adapter object
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_broadcast_primative_event(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+    Mpi2EventDataSasBroadcastPrimitive_t *event_data)
+{
+	struct scsi_cmnd *scmd;
+	u16 smid, handle;
+	u32 lun;
+	struct MPT2SAS_DEVICE *sas_device_priv_data;
+	u32 termination_count;
+	u32 query_count;
+	Mpi2SCSITaskManagementReply_t *mpi_reply;
+
+	dewtprintk(ioc, printk(MPT2SAS_DEBUG_FMT "broadcast primative: "
+	    "phy number(%d), width(%d)\n", ioc->name, event_data->PhyNum,
+	    event_data->PortWidth));
+
+	dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: enter\n", ioc->name,
+	    __func__));
+
+	mutex_lock(&ioc->tm_cmds.mutex);
+	termination_count = 0;
+	query_count = 0;
+	mpi_reply = ioc->tm_cmds.reply;
+	for (smid = 1; smid <= ioc->request_depth; smid++) {
+		scmd = _scsih_scsi_lookup_get(ioc, smid);
+		if (!scmd)
+			continue;
+		sas_device_priv_data = scmd->device->hostdata;
+		if (!sas_device_priv_data || !sas_device_priv_data->sas_target)
+			continue;
+		 /* skip hidden raid components */
+		if (sas_device_priv_data->sas_target->flags &
+		    MPT_TARGET_FLAGS_RAID_COMPONENT)
+			continue;
+		 /* skip volumes */
+		if (sas_device_priv_data->sas_target->flags &
+		    MPT_TARGET_FLAGS_VOLUME)
+			continue;
+
+		handle = sas_device_priv_data->sas_target->handle;
+		lun = sas_device_priv_data->lun;
+		query_count++;
+
+		mpt2sas_scsih_issue_tm(ioc, handle, lun,
+		    MPI2_SCSITASKMGMT_TASKTYPE_QUERY_TASK, smid, 30);
+		termination_count += le32_to_cpu(mpi_reply->TerminationCount);
+
+		if ((mpi_reply->IOCStatus == MPI2_IOCSTATUS_SUCCESS) &&
+		    (mpi_reply->ResponseCode ==
+		     MPI2_SCSITASKMGMT_RSP_TM_SUCCEEDED ||
+		     mpi_reply->ResponseCode ==
+		     MPI2_SCSITASKMGMT_RSP_IO_QUEUED_ON_IOC))
+			continue;
+
+		mpt2sas_scsih_issue_tm(ioc, handle, lun,
+		    MPI2_SCSITASKMGMT_TASKTYPE_ABRT_TASK_SET, smid, 30);
+		termination_count += le32_to_cpu(mpi_reply->TerminationCount);
+	}
+	ioc->tm_cmds.status = MPT2_CMD_NOT_USED;
+	ioc->broadcast_aen_busy = 0;
+	mutex_unlock(&ioc->tm_cmds.mutex);
+
+	dtmprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+	    "%s - exit, query_count = %d termination_count = %d\n",
+	    ioc->name, __func__, query_count, termination_count));
+}
+
+/**
+ * _scsih_sas_discovery_event - handle discovery events
+ * @ioc: per adapter object
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_discovery_event(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+    Mpi2EventDataSasDiscovery_t *event_data)
+{
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK) {
+		printk(MPT2SAS_DEBUG_FMT "discovery event: (%s)", ioc->name,
+		    (event_data->ReasonCode == MPI2_EVENT_SAS_DISC_RC_STARTED) ?
+		    "start" : "stop");
+	if (event_data->DiscoveryStatus)
+		printk(MPT2SAS_DEBUG_FMT ", discovery_status(0x%08x)",
+		    ioc->name, le32_to_cpu(event_data->DiscoveryStatus));
+	printk("\n");
+	}
+#endif
+
+	if (event_data->ReasonCode == MPI2_EVENT_SAS_DISC_RC_STARTED &&
+	    !ioc->sas_hba.num_phys)
+		_scsih_sas_host_add(ioc);
+}
+
+/**
+ * _scsih_reprobe_lun - reprobing lun
+ * @sdev: scsi device struct
+ * @no_uld_attach: sdev->no_uld_attach flag setting
+ *
+ **/
+static void
+_scsih_reprobe_lun(struct scsi_device *sdev, void *no_uld_attach)
+{
+	int rc;
+
+	sdev->no_uld_attach = no_uld_attach ? 1 : 0;
+	sdev_printk(KERN_INFO, sdev, "%s raid component\n",
+	    sdev->no_uld_attach ? "hidding" : "exposing");
+	rc = scsi_device_reprobe(sdev);
+}
+
+/**
+ * _scsih_reprobe_target - reprobing target
+ * @starget: scsi target struct
+ * @no_uld_attach: sdev->no_uld_attach flag setting
+ *
+ * Note: no_uld_attach flag determines whether the disk device is attached
+ * to block layer. A value of `1` means to not attach.
+ **/
+static void
+_scsih_reprobe_target(struct scsi_target *starget, int no_uld_attach)
+{
+	struct MPT2SAS_TARGET *sas_target_priv_data = starget->hostdata;
+
+	if (no_uld_attach)
+		sas_target_priv_data->flags |= MPT_TARGET_FLAGS_RAID_COMPONENT;
+	else
+		sas_target_priv_data->flags &= ~MPT_TARGET_FLAGS_RAID_COMPONENT;
+
+	starget_for_each_device(starget, no_uld_attach ? (void *)1 : NULL,
+	    _scsih_reprobe_lun);
+}
+/**
+ * _scsih_sas_volume_add - add new volume
+ * @ioc: per adapter object
+ * @element: IR config element data
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_volume_add(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventIrConfigElement_t *element)
+{
+	struct _raid_device *raid_device;
+	unsigned long flags;
+	u64 wwid;
+	u16 handle = le16_to_cpu(element->VolDevHandle);
+	int rc;
+
+#if 0 /* RAID_HACKS */
+	if (le32_to_cpu(event_data->Flags) &
+	    MPI2_EVENT_IR_CHANGE_FLAGS_FOREIGN_CONFIG)
+		return;
+#endif
+
+	mpt2sas_config_get_volume_wwid(ioc, handle, &wwid);
+	if (!wwid) {
+		printk(MPT2SAS_ERR_FMT
+		    "failure at %s:%d/%s()!\n", ioc->name,
+		    __FILE__, __LINE__, __func__);
+		return;
+	}
+
+	spin_lock_irqsave(&ioc->raid_device_lock, flags);
+	raid_device = _scsih_raid_device_find_by_wwid(ioc, wwid);
+	spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+
+	if (raid_device)
+		return;
+
+	raid_device = kzalloc(sizeof(struct _raid_device), GFP_KERNEL);
+	if (!raid_device) {
+		printk(MPT2SAS_ERR_FMT
+		    "failure at %s:%d/%s()!\n", ioc->name,
+		    __FILE__, __LINE__, __func__);
+		return;
+	}
+
+	raid_device->id = ioc->sas_id++;
+	raid_device->channel = RAID_CHANNEL;
+	raid_device->handle = handle;
+	raid_device->wwid = wwid;
+	_scsih_raid_device_add(ioc, raid_device);
+	if (!ioc->wait_for_port_enable_to_complete) {
+		rc = scsi_add_device(ioc->shost, RAID_CHANNEL,
+		    raid_device->id, 0);
+		if (rc)
+			_scsih_raid_device_remove(ioc, raid_device);
+	} else
+		_scsih_determine_boot_device(ioc, raid_device, 1);
+}
+
+/**
+ * _scsih_sas_volume_delete - delete volume
+ * @ioc: per adapter object
+ * @element: IR config element data
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_volume_delete(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventIrConfigElement_t *element)
+{
+	struct _raid_device *raid_device;
+	u16 handle = le16_to_cpu(element->VolDevHandle);
+	unsigned long flags;
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+
+#if 0 /* RAID_HACKS */
+	if (le32_to_cpu(event_data->Flags) &
+	    MPI2_EVENT_IR_CHANGE_FLAGS_FOREIGN_CONFIG)
+		return;
+#endif
+
+	spin_lock_irqsave(&ioc->raid_device_lock, flags);
+	raid_device = _scsih_raid_device_find_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+	if (!raid_device)
+		return;
+	if (raid_device->starget) {
+		sas_target_priv_data = raid_device->starget->hostdata;
+		sas_target_priv_data->deleted = 1;
+		scsi_remove_target(&raid_device->starget->dev);
+	}
+	_scsih_raid_device_remove(ioc, raid_device);
+}
+
+/**
+ * _scsih_sas_pd_expose - expose pd component to /dev/sdX
+ * @ioc: per adapter object
+ * @element: IR config element data
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_pd_expose(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventIrConfigElement_t *element)
+{
+	struct _sas_device *sas_device;
+	unsigned long flags;
+	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	if (!sas_device)
+		return;
+
+	/* exposing raid component */
+	sas_device->volume_handle = 0;
+	sas_device->volume_wwid = 0;
+	sas_device->hidden_raid_component = 0;
+	_scsih_reprobe_target(sas_device->starget, 0);
+}
+
+/**
+ * _scsih_sas_pd_hide - hide pd component from /dev/sdX
+ * @ioc: per adapter object
+ * @element: IR config element data
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_pd_hide(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventIrConfigElement_t *element)
+{
+	struct _sas_device *sas_device;
+	unsigned long flags;
+	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	if (!sas_device)
+		return;
+
+	/* hiding raid component */
+	mpt2sas_config_get_volume_handle(ioc, handle,
+	    &sas_device->volume_handle);
+	mpt2sas_config_get_volume_wwid(ioc, sas_device->volume_handle,
+	    &sas_device->volume_wwid);
+	sas_device->hidden_raid_component = 1;
+	_scsih_reprobe_target(sas_device->starget, 1);
+}
+
+/**
+ * _scsih_sas_pd_delete - delete pd component
+ * @ioc: per adapter object
+ * @element: IR config element data
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_pd_delete(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventIrConfigElement_t *element)
+{
+	struct _sas_device *sas_device;
+	unsigned long flags;
+	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	if (!sas_device)
+		return;
+	_scsih_remove_device(ioc, handle);
+}
+
+/**
+ * _scsih_sas_pd_add - remove pd component
+ * @ioc: per adapter object
+ * @element: IR config element data
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_pd_add(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventIrConfigElement_t *element)
+{
+	struct _sas_device *sas_device;
+	unsigned long flags;
+	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	if (sas_device)
+		sas_device->hidden_raid_component = 1;
+	else
+		_scsih_add_device(ioc, handle, 0, 1);
+}
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _scsih_sas_ir_config_change_event_debug - debug for IR Config Change events
+ * @ioc: per adapter object
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_ir_config_change_event_debug(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventDataIrConfigChangeList_t *event_data)
+{
+	Mpi2EventIrConfigElement_t *element;
+	u8 element_type;
+	int i;
+	char *reason_str = NULL, *element_str = NULL;
+
+	element = (Mpi2EventIrConfigElement_t *)&event_data->ConfigElement[0];
+
+	printk(MPT2SAS_DEBUG_FMT "raid config change: (%s), elements(%d)\n",
+	    ioc->name, (le32_to_cpu(event_data->Flags) &
+	    MPI2_EVENT_IR_CHANGE_FLAGS_FOREIGN_CONFIG) ?
+	    "foreign" : "native", event_data->NumElements);
+	for (i = 0; i < event_data->NumElements; i++, element++) {
+		switch (element->ReasonCode) {
+		case MPI2_EVENT_IR_CHANGE_RC_ADDED:
+			reason_str = "add";
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_REMOVED:
+			reason_str = "remove";
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_NO_CHANGE:
+			reason_str = "no change";
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_HIDE:
+			reason_str = "hide";
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_UNHIDE:
+			reason_str = "unhide";
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_VOLUME_CREATED:
+			reason_str = "volume_created";
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_VOLUME_DELETED:
+			reason_str = "volume_deleted";
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_PD_CREATED:
+			reason_str = "pd_created";
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_PD_DELETED:
+			reason_str = "pd_deleted";
+			break;
+		default:
+			reason_str = "unknown reason";
+			break;
+		}
+		element_type = le16_to_cpu(element->ElementFlags) &
+		    MPI2_EVENT_IR_CHANGE_EFLAGS_ELEMENT_TYPE_MASK;
+		switch (element_type) {
+		case MPI2_EVENT_IR_CHANGE_EFLAGS_VOLUME_ELEMENT:
+			element_str = "volume";
+			break;
+		case MPI2_EVENT_IR_CHANGE_EFLAGS_VOLPHYSDISK_ELEMENT:
+			element_str = "phys disk";
+			break;
+		case MPI2_EVENT_IR_CHANGE_EFLAGS_HOTSPARE_ELEMENT:
+			element_str = "hot spare";
+			break;
+		default:
+			element_str = "unknown element";
+			break;
+		}
+		printk(KERN_DEBUG "\t(%s:%s), vol handle(0x%04x), "
+		    "pd handle(0x%04x), pd num(0x%02x)\n", element_str,
+		    reason_str, le16_to_cpu(element->VolDevHandle),
+		    le16_to_cpu(element->PhysDiskDevHandle),
+		    element->PhysDiskNum);
+	}
+}
+#endif
+
+/**
+ * _scsih_sas_ir_config_change_event - handle ir configuration change events
+ * @ioc: per adapter object
+ * @VF_ID:
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_ir_config_change_event(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+    Mpi2EventDataIrConfigChangeList_t *event_data)
+{
+	Mpi2EventIrConfigElement_t *element;
+	int i;
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK)
+		_scsih_sas_ir_config_change_event_debug(ioc, event_data);
+
+#endif
+
+	element = (Mpi2EventIrConfigElement_t *)&event_data->ConfigElement[0];
+	for (i = 0; i < event_data->NumElements; i++, element++) {
+
+		switch (element->ReasonCode) {
+		case MPI2_EVENT_IR_CHANGE_RC_VOLUME_CREATED:
+		case MPI2_EVENT_IR_CHANGE_RC_ADDED:
+			_scsih_sas_volume_add(ioc, element);
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_VOLUME_DELETED:
+		case MPI2_EVENT_IR_CHANGE_RC_REMOVED:
+			_scsih_sas_volume_delete(ioc, element);
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_PD_CREATED:
+			_scsih_sas_pd_hide(ioc, element);
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_PD_DELETED:
+			_scsih_sas_pd_expose(ioc, element);
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_HIDE:
+			_scsih_sas_pd_add(ioc, element);
+			break;
+		case MPI2_EVENT_IR_CHANGE_RC_UNHIDE:
+			_scsih_sas_pd_delete(ioc, element);
+			break;
+		}
+	}
+}
+
+/**
+ * _scsih_sas_ir_volume_event - IR volume event
+ * @ioc: per adapter object
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_ir_volume_event(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+    Mpi2EventDataIrVolume_t *event_data)
+{
+	u64 wwid;
+	unsigned long flags;
+	struct _raid_device *raid_device;
+	u16 handle;
+	u32 state;
+	int rc;
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+
+	if (event_data->ReasonCode != MPI2_EVENT_IR_VOLUME_RC_STATE_CHANGED)
+		return;
+
+	handle = le16_to_cpu(event_data->VolDevHandle);
+	state = le32_to_cpu(event_data->NewValue);
+	dewtprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: handle(0x%04x), "
+	    "old(0x%08x), new(0x%08x)\n", ioc->name, __func__,  handle,
+	    le32_to_cpu(event_data->PreviousValue), state));
+
+	spin_lock_irqsave(&ioc->raid_device_lock, flags);
+	raid_device = _scsih_raid_device_find_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+
+	switch (state) {
+	case MPI2_RAID_VOL_STATE_MISSING:
+	case MPI2_RAID_VOL_STATE_FAILED:
+		if (!raid_device)
+			break;
+		if (raid_device->starget) {
+			sas_target_priv_data = raid_device->starget->hostdata;
+			sas_target_priv_data->deleted = 1;
+			scsi_remove_target(&raid_device->starget->dev);
+		}
+		_scsih_raid_device_remove(ioc, raid_device);
+		break;
+
+	case MPI2_RAID_VOL_STATE_ONLINE:
+	case MPI2_RAID_VOL_STATE_DEGRADED:
+	case MPI2_RAID_VOL_STATE_OPTIMAL:
+		if (raid_device)
+			break;
+
+		mpt2sas_config_get_volume_wwid(ioc, handle, &wwid);
+		if (!wwid) {
+			printk(MPT2SAS_ERR_FMT
+			    "failure at %s:%d/%s()!\n", ioc->name,
+			    __FILE__, __LINE__, __func__);
+			break;
+		}
+
+		raid_device = kzalloc(sizeof(struct _raid_device), GFP_KERNEL);
+		if (!raid_device) {
+			printk(MPT2SAS_ERR_FMT
+			    "failure at %s:%d/%s()!\n", ioc->name,
+			    __FILE__, __LINE__, __func__);
+			break;
+		}
+
+		raid_device->id = ioc->sas_id++;
+		raid_device->channel = RAID_CHANNEL;
+		raid_device->handle = handle;
+		raid_device->wwid = wwid;
+		_scsih_raid_device_add(ioc, raid_device);
+		rc = scsi_add_device(ioc->shost, RAID_CHANNEL,
+		    raid_device->id, 0);
+		if (rc)
+			_scsih_raid_device_remove(ioc, raid_device);
+		break;
+
+	case MPI2_RAID_VOL_STATE_INITIALIZING:
+	default:
+		break;
+	}
+}
+
+/**
+ * _scsih_sas_ir_physical_disk_event - PD event
+ * @ioc: per adapter object
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_ir_physical_disk_event(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+   Mpi2EventDataIrPhysicalDisk_t *event_data)
+{
+	u16 handle;
+	u32 state;
+	struct _sas_device *sas_device;
+	unsigned long flags;
+
+	if (event_data->ReasonCode != MPI2_EVENT_IR_PHYSDISK_RC_STATE_CHANGED)
+		return;
+
+	handle = le16_to_cpu(event_data->PhysDiskDevHandle);
+	state = le32_to_cpu(event_data->NewValue);
+
+	dewtprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s: handle(0x%04x), "
+	    "old(0x%08x), new(0x%08x)\n", ioc->name, __func__,  handle,
+	    le32_to_cpu(event_data->PreviousValue), state));
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	switch (state) {
+#if 0
+	case MPI2_RAID_PD_STATE_OFFLINE:
+		if (sas_device)
+			_scsih_remove_device(ioc, handle);
+		break;
+#endif
+	case MPI2_RAID_PD_STATE_ONLINE:
+	case MPI2_RAID_PD_STATE_DEGRADED:
+	case MPI2_RAID_PD_STATE_REBUILDING:
+	case MPI2_RAID_PD_STATE_OPTIMAL:
+		if (sas_device)
+			sas_device->hidden_raid_component = 1;
+		else
+			_scsih_add_device(ioc, handle, 0, 1);
+		break;
+
+	case MPI2_RAID_PD_STATE_NOT_CONFIGURED:
+	case MPI2_RAID_PD_STATE_NOT_COMPATIBLE:
+	case MPI2_RAID_PD_STATE_HOT_SPARE:
+	default:
+		break;
+	}
+}
+
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+/**
+ * _scsih_sas_ir_operation_status_event_debug - debug for IR op event
+ * @ioc: per adapter object
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_ir_operation_status_event_debug(struct MPT2SAS_ADAPTER *ioc,
+    Mpi2EventDataIrOperationStatus_t *event_data)
+{
+	char *reason_str = NULL;
+
+	switch (event_data->RAIDOperation) {
+	case MPI2_EVENT_IR_RAIDOP_RESYNC:
+		reason_str = "resync";
+		break;
+	case MPI2_EVENT_IR_RAIDOP_ONLINE_CAP_EXPANSION:
+		reason_str = "online capacity expansion";
+		break;
+	case MPI2_EVENT_IR_RAIDOP_CONSISTENCY_CHECK:
+		reason_str = "consistency check";
+		break;
+	default:
+		reason_str = "unknown reason";
+		break;
+	}
+
+	printk(MPT2SAS_INFO_FMT "raid operational status: (%s)"
+	    "\thandle(0x%04x), percent complete(%d)\n",
+	    ioc->name, reason_str,
+	    le16_to_cpu(event_data->VolDevHandle),
+	    event_data->PercentComplete);
+}
+#endif
+
+/**
+ * _scsih_sas_ir_operation_status_event - handle RAID operation events
+ * @ioc: per adapter object
+ * @VF_ID:
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_sas_ir_operation_status_event(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+    Mpi2EventDataIrOperationStatus_t *event_data)
+{
+#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
+	if (ioc->logging_level & MPT_DEBUG_EVENT_WORK_TASK)
+		_scsih_sas_ir_operation_status_event_debug(ioc, event_data);
+#endif
+}
+
+/**
+ * _scsih_task_set_full - handle task set full
+ * @ioc: per adapter object
+ * @event_data: event data payload
+ * Context: user.
+ *
+ * Throttle back qdepth.
+ */
+static void
+_scsih_task_set_full(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID,
+    Mpi2EventDataTaskSetFull_t *event_data)
+{
+	unsigned long flags;
+	struct _sas_device *sas_device;
+	static struct _raid_device *raid_device;
+	struct scsi_device *sdev;
+	int depth;
+	u16 current_depth;
+	u16 handle;
+	int id, channel;
+	u64 sas_address;
+
+	current_depth = le16_to_cpu(event_data->CurrentDepth);
+	handle = le16_to_cpu(event_data->DevHandle);
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	if (!sas_device) {
+		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+		return;
+	}
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	id = sas_device->id;
+	channel = sas_device->channel;
+	sas_address = sas_device->sas_address;
+
+	/* if hidden raid component, then change to volume characteristics */
+	if (sas_device->hidden_raid_component && sas_device->volume_handle) {
+		spin_lock_irqsave(&ioc->raid_device_lock, flags);
+		raid_device = _scsih_raid_device_find_by_handle(
+		    ioc, sas_device->volume_handle);
+		spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+		if (raid_device) {
+			id = raid_device->id;
+			channel = raid_device->channel;
+			handle = raid_device->handle;
+			sas_address = raid_device->wwid;
+		}
+	}
+
+	if (ioc->logging_level & MPT_DEBUG_TASK_SET_FULL)
+		starget_printk(KERN_DEBUG, sas_device->starget, "task set "
+		    "full: handle(0x%04x), sas_addr(0x%016llx), depth(%d)\n",
+		    handle, (unsigned long long)sas_address, current_depth);
+
+	shost_for_each_device(sdev, ioc->shost) {
+		if (sdev->id == id && sdev->channel == channel) {
+			if (current_depth > sdev->queue_depth) {
+				if (ioc->logging_level &
+				    MPT_DEBUG_TASK_SET_FULL)
+					sdev_printk(KERN_INFO, sdev, "strange "
+					    "observation, the queue depth is"
+					    " (%d) meanwhile fw queue depth "
+					    "is (%d)\n", sdev->queue_depth,
+					    current_depth);
+				continue;
+			}
+			depth = scsi_track_queue_full(sdev,
+			    current_depth - 1);
+			if (depth > 0)
+				sdev_printk(KERN_INFO, sdev, "Queue depth "
+				    "reduced to (%d)\n", depth);
+			else if (depth < 0)
+				sdev_printk(KERN_INFO, sdev, "Tagged Command "
+				    "Queueing is being disabled\n");
+			else if (depth == 0)
+				if (ioc->logging_level &
+				     MPT_DEBUG_TASK_SET_FULL)
+					sdev_printk(KERN_INFO, sdev,
+					     "Queue depth not changed yet\n");
+		}
+	}
+}
+
+/**
+ * _scsih_mark_responding_sas_device - mark a sas_devices as responding
+ * @ioc: per adapter object
+ * @sas_address: sas address
+ * @slot: enclosure slot id
+ * @handle: device handle
+ *
+ * After host reset, find out whether devices are still responding.
+ * Used in _scsi_remove_unresponsive_sas_devices.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_mark_responding_sas_device(struct MPT2SAS_ADAPTER *ioc, u64 sas_address,
+    u16 slot, u16 handle)
+{
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	struct scsi_target *starget;
+	struct _sas_device *sas_device;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	list_for_each_entry(sas_device, &ioc->sas_device_list, list) {
+		if (sas_device->sas_address == sas_address &&
+		    sas_device->slot == slot && sas_device->starget) {
+			sas_device->responding = 1;
+			starget_printk(KERN_INFO, sas_device->starget,
+			    "handle(0x%04x), sas_addr(0x%016llx), enclosure "
+			    "logical id(0x%016llx), slot(%d)\n", handle,
+			    (unsigned long long)sas_device->sas_address,
+			    (unsigned long long)
+			    sas_device->enclosure_logical_id,
+			    sas_device->slot);
+			if (sas_device->handle == handle)
+				goto out;
+			printk(KERN_INFO "\thandle changed from(0x%04x)!!!\n",
+			    sas_device->handle);
+			sas_device->handle = handle;
+			starget = sas_device->starget;
+			sas_target_priv_data = starget->hostdata;
+			sas_target_priv_data->handle = handle;
+			goto out;
+		}
+	}
+ out:
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+}
+
+/**
+ * _scsih_search_responding_sas_devices -
+ * @ioc: per adapter object
+ *
+ * After host reset, find out whether devices are still responding.
+ * If not remove.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_search_responding_sas_devices(struct MPT2SAS_ADAPTER *ioc)
+{
+	Mpi2SasDevicePage0_t sas_device_pg0;
+	Mpi2ConfigReply_t mpi_reply;
+	u16 ioc_status;
+	__le64 sas_address;
+	u16 handle;
+	u32 device_info;
+	u16 slot;
+
+	printk(MPT2SAS_INFO_FMT "%s\n", ioc->name, __func__);
+
+	if (list_empty(&ioc->sas_device_list))
+		return;
+
+	handle = 0xFFFF;
+	while (!(mpt2sas_config_get_sas_device_pg0(ioc, &mpi_reply,
+	    &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_GET_NEXT_HANDLE,
+	    handle))) {
+		ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+		    MPI2_IOCSTATUS_MASK;
+		if (ioc_status == MPI2_IOCSTATUS_CONFIG_INVALID_PAGE)
+			break;
+		handle = le16_to_cpu(sas_device_pg0.DevHandle);
+		device_info = le32_to_cpu(sas_device_pg0.DeviceInfo);
+		if (!(_scsih_is_end_device(device_info)))
+			continue;
+		sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
+		slot = le16_to_cpu(sas_device_pg0.Slot);
+		_scsih_mark_responding_sas_device(ioc, sas_address, slot,
+		    handle);
+	}
+}
+
+/**
+ * _scsih_mark_responding_raid_device - mark a raid_device as responding
+ * @ioc: per adapter object
+ * @wwid: world wide identifier for raid volume
+ * @handle: device handle
+ *
+ * After host reset, find out whether devices are still responding.
+ * Used in _scsi_remove_unresponsive_raid_devices.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_mark_responding_raid_device(struct MPT2SAS_ADAPTER *ioc, u64 wwid,
+    u16 handle)
+{
+	struct MPT2SAS_TARGET *sas_target_priv_data;
+	struct scsi_target *starget;
+	struct _raid_device *raid_device;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->raid_device_lock, flags);
+	list_for_each_entry(raid_device, &ioc->raid_device_list, list) {
+		if (raid_device->wwid == wwid && raid_device->starget) {
+			raid_device->responding = 1;
+			starget_printk(KERN_INFO, raid_device->starget,
+			    "handle(0x%04x), wwid(0x%016llx)\n", handle,
+			    (unsigned long long)raid_device->wwid);
+			if (raid_device->handle == handle)
+				goto out;
+			printk(KERN_INFO "\thandle changed from(0x%04x)!!!\n",
+			    raid_device->handle);
+			raid_device->handle = handle;
+			starget = raid_device->starget;
+			sas_target_priv_data = starget->hostdata;
+			sas_target_priv_data->handle = handle;
+			goto out;
+		}
+	}
+ out:
+	spin_unlock_irqrestore(&ioc->raid_device_lock, flags);
+}
+
+/**
+ * _scsih_search_responding_raid_devices -
+ * @ioc: per adapter object
+ *
+ * After host reset, find out whether devices are still responding.
+ * If not remove.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_search_responding_raid_devices(struct MPT2SAS_ADAPTER *ioc)
+{
+	Mpi2RaidVolPage1_t volume_pg1;
+	Mpi2ConfigReply_t mpi_reply;
+	u16 ioc_status;
+	u16 handle;
+
+	printk(MPT2SAS_INFO_FMT "%s\n", ioc->name, __func__);
+
+	if (list_empty(&ioc->raid_device_list))
+		return;
+
+	handle = 0xFFFF;
+	while (!(mpt2sas_config_get_raid_volume_pg1(ioc, &mpi_reply,
+	    &volume_pg1, MPI2_RAID_VOLUME_PGAD_FORM_GET_NEXT_HANDLE, handle))) {
+		ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+		    MPI2_IOCSTATUS_MASK;
+		if (ioc_status == MPI2_IOCSTATUS_CONFIG_INVALID_PAGE)
+			break;
+		handle = le16_to_cpu(volume_pg1.DevHandle);
+		_scsih_mark_responding_raid_device(ioc,
+		    le64_to_cpu(volume_pg1.WWID), handle);
+	}
+}
+
+/**
+ * _scsih_mark_responding_expander - mark a expander as responding
+ * @ioc: per adapter object
+ * @sas_address: sas address
+ * @handle:
+ *
+ * After host reset, find out whether devices are still responding.
+ * Used in _scsi_remove_unresponsive_expanders.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_mark_responding_expander(struct MPT2SAS_ADAPTER *ioc, u64 sas_address,
+     u16 handle)
+{
+	struct _sas_node *sas_expander;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_node_lock, flags);
+	list_for_each_entry(sas_expander, &ioc->sas_expander_list, list) {
+		if (sas_expander->sas_address == sas_address) {
+			sas_expander->responding = 1;
+			if (sas_expander->handle != handle) {
+				printk(KERN_INFO "old handle(0x%04x)\n",
+				    sas_expander->handle);
+				sas_expander->handle = handle;
+			}
+			goto out;
+		}
+	}
+ out:
+	spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+}
+
+/**
+ * _scsih_search_responding_expanders -
+ * @ioc: per adapter object
+ *
+ * After host reset, find out whether devices are still responding.
+ * If not remove.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_search_responding_expanders(struct MPT2SAS_ADAPTER *ioc)
+{
+	Mpi2ExpanderPage0_t expander_pg0;
+	Mpi2ConfigReply_t mpi_reply;
+	u16 ioc_status;
+	__le64 sas_address;
+	u16 handle;
+
+	printk(MPT2SAS_INFO_FMT "%s\n", ioc->name, __func__);
+
+	if (list_empty(&ioc->sas_expander_list))
+		return;
+
+	handle = 0xFFFF;
+	while (!(mpt2sas_config_get_expander_pg0(ioc, &mpi_reply, &expander_pg0,
+	    MPI2_SAS_EXPAND_PGAD_FORM_GET_NEXT_HNDL, handle))) {
+
+		ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+		    MPI2_IOCSTATUS_MASK;
+		if (ioc_status == MPI2_IOCSTATUS_CONFIG_INVALID_PAGE)
+			break;
+
+		handle = le16_to_cpu(expander_pg0.DevHandle);
+		sas_address = le64_to_cpu(expander_pg0.SASAddress);
+		printk(KERN_INFO "\texpander present: handle(0x%04x), "
+		    "sas_addr(0x%016llx)\n", handle,
+		    (unsigned long long)sas_address);
+		_scsih_mark_responding_expander(ioc, sas_address, handle);
+	}
+
+}
+
+/**
+ * _scsih_remove_unresponding_devices - removing unresponding devices
+ * @ioc: per adapter object
+ *
+ * Return nothing.
+ */
+static void
+_scsih_remove_unresponding_devices(struct MPT2SAS_ADAPTER *ioc)
+{
+	struct _sas_device *sas_device, *sas_device_next;
+	struct _sas_node *sas_expander, *sas_expander_next;
+	struct _raid_device *raid_device, *raid_device_next;
+	unsigned long flags;
+
+	_scsih_search_responding_sas_devices(ioc);
+	_scsih_search_responding_raid_devices(ioc);
+	_scsih_search_responding_expanders(ioc);
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	ioc->shost_recovery = 0;
+	if (ioc->shost->shost_state == SHOST_RECOVERY) {
+		printk(MPT2SAS_INFO_FMT "putting controller into "
+		    "SHOST_RUNNING\n", ioc->name);
+		scsi_host_set_state(ioc->shost, SHOST_RUNNING);
+	}
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+	list_for_each_entry_safe(sas_device, sas_device_next,
+	    &ioc->sas_device_list, list) {
+		if (sas_device->responding) {
+			sas_device->responding = 0;
+			continue;
+		}
+		if (sas_device->starget)
+			starget_printk(KERN_INFO, sas_device->starget,
+			    "removing: handle(0x%04x), sas_addr(0x%016llx), "
+			    "enclosure logical id(0x%016llx), slot(%d)\n",
+			    sas_device->handle,
+			    (unsigned long long)sas_device->sas_address,
+			    (unsigned long long)
+			    sas_device->enclosure_logical_id,
+			    sas_device->slot);
+		_scsih_remove_device(ioc, sas_device->handle);
+	}
+
+	list_for_each_entry_safe(raid_device, raid_device_next,
+	    &ioc->raid_device_list, list) {
+		if (raid_device->responding) {
+			raid_device->responding = 0;
+			continue;
+		}
+		if (raid_device->starget) {
+			starget_printk(KERN_INFO, raid_device->starget,
+			    "removing: handle(0x%04x), wwid(0x%016llx)\n",
+			      raid_device->handle,
+			    (unsigned long long)raid_device->wwid);
+			scsi_remove_target(&raid_device->starget->dev);
+		}
+		_scsih_raid_device_remove(ioc, raid_device);
+	}
+
+	list_for_each_entry_safe(sas_expander, sas_expander_next,
+	    &ioc->sas_expander_list, list) {
+		if (sas_expander->responding) {
+			sas_expander->responding = 0;
+			continue;
+		}
+		printk("\tremoving expander: handle(0x%04x), "
+		    " sas_addr(0x%016llx)\n", sas_expander->handle,
+		    (unsigned long long)sas_expander->sas_address);
+		_scsih_expander_remove(ioc, sas_expander->handle);
+	}
+}
+
+/**
+ * _firmware_event_work - delayed task for processing firmware events
+ * @ioc: per adapter object
+ * @work: equal to the fw_event_work object
+ * Context: user.
+ *
+ * Return nothing.
+ */
+static void
+_firmware_event_work(struct work_struct *work)
+{
+	struct fw_event_work *fw_event = container_of(work,
+	    struct fw_event_work, work.work);
+	unsigned long flags;
+	struct MPT2SAS_ADAPTER *ioc = fw_event->ioc;
+
+	/* This is invoked by calling _scsih_queue_rescan(). */
+	if (fw_event->event == MPT2SAS_RESCAN_AFTER_HOST_RESET) {
+		_scsih_fw_event_free(ioc, fw_event);
+		_scsih_sas_host_refresh(ioc, 1);
+		_scsih_remove_unresponding_devices(ioc);
+		return;
+	}
+
+	/* the queue is being flushed so ignore this event */
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	if (ioc->fw_events_off || ioc->remove_host) {
+		spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+		_scsih_fw_event_free(ioc, fw_event);
+		return;
+	}
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	if (ioc->shost_recovery) {
+		spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+		_scsih_fw_event_requeue(ioc, fw_event, 1000);
+		return;
+	}
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+	switch (fw_event->event) {
+	case MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST:
+		_scsih_sas_topology_change_event(ioc, fw_event->VF_ID,
+		    fw_event->event_data, fw_event);
+		break;
+	case MPI2_EVENT_SAS_DEVICE_STATUS_CHANGE:
+		_scsih_sas_device_status_change_event(ioc, fw_event->VF_ID,
+		    fw_event->event_data);
+		break;
+	case MPI2_EVENT_SAS_DISCOVERY:
+		_scsih_sas_discovery_event(ioc, fw_event->VF_ID,
+		    fw_event->event_data);
+		break;
+	case MPI2_EVENT_SAS_BROADCAST_PRIMITIVE:
+		_scsih_sas_broadcast_primative_event(ioc, fw_event->VF_ID,
+		    fw_event->event_data);
+		break;
+	case MPI2_EVENT_SAS_ENCL_DEVICE_STATUS_CHANGE:
+		_scsih_sas_enclosure_dev_status_change_event(ioc,
+		    fw_event->VF_ID, fw_event->event_data);
+		break;
+	case MPI2_EVENT_IR_CONFIGURATION_CHANGE_LIST:
+		_scsih_sas_ir_config_change_event(ioc, fw_event->VF_ID,
+		    fw_event->event_data);
+		break;
+	case MPI2_EVENT_IR_VOLUME:
+		_scsih_sas_ir_volume_event(ioc, fw_event->VF_ID,
+		    fw_event->event_data);
+		break;
+	case MPI2_EVENT_IR_PHYSICAL_DISK:
+		_scsih_sas_ir_physical_disk_event(ioc, fw_event->VF_ID,
+		    fw_event->event_data);
+		break;
+	case MPI2_EVENT_IR_OPERATION_STATUS:
+		_scsih_sas_ir_operation_status_event(ioc, fw_event->VF_ID,
+		    fw_event->event_data);
+		break;
+	case MPI2_EVENT_TASK_SET_FULL:
+		_scsih_task_set_full(ioc, fw_event->VF_ID,
+		    fw_event->event_data);
+		break;
+	}
+	_scsih_fw_event_free(ioc, fw_event);
+}
+
+/**
+ * mpt2sas_scsih_event_callback - firmware event handler (called at ISR time)
+ * @ioc: per adapter object
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ * Context: interrupt.
+ *
+ * This function merely adds a new work task into ioc->firmware_event_thread.
+ * The tasks are worked from _firmware_event_work in user context.
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_scsih_event_callback(struct MPT2SAS_ADAPTER *ioc, u8 VF_ID, u32 reply)
+{
+	struct fw_event_work *fw_event;
+	Mpi2EventNotificationReply_t *mpi_reply;
+	unsigned long flags;
+	u16 event;
+
+	/* events turned off due to host reset or driver unloading */
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	if (ioc->fw_events_off || ioc->remove_host) {
+		spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+		return;
+	}
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+
+	mpi_reply =  mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	event = le16_to_cpu(mpi_reply->Event);
+
+	switch (event) {
+	/* handle these */
+	case MPI2_EVENT_SAS_BROADCAST_PRIMITIVE:
+	{
+		Mpi2EventDataSasBroadcastPrimitive_t *baen_data =
+		    (Mpi2EventDataSasBroadcastPrimitive_t *)
+		    mpi_reply->EventData;
+
+		if (baen_data->Primitive !=
+		    MPI2_EVENT_PRIMITIVE_ASYNCHRONOUS_EVENT ||
+		    ioc->broadcast_aen_busy)
+			return;
+		ioc->broadcast_aen_busy = 1;
+		break;
+	}
+
+	case MPI2_EVENT_SAS_TOPOLOGY_CHANGE_LIST:
+		_scsih_check_topo_delete_events(ioc,
+		    (Mpi2EventDataSasTopologyChangeList_t *)
+		    mpi_reply->EventData);
+		break;
+
+	case MPI2_EVENT_SAS_DEVICE_STATUS_CHANGE:
+	case MPI2_EVENT_IR_OPERATION_STATUS:
+	case MPI2_EVENT_SAS_DISCOVERY:
+	case MPI2_EVENT_SAS_ENCL_DEVICE_STATUS_CHANGE:
+	case MPI2_EVENT_IR_VOLUME:
+	case MPI2_EVENT_IR_PHYSICAL_DISK:
+	case MPI2_EVENT_IR_CONFIGURATION_CHANGE_LIST:
+	case MPI2_EVENT_TASK_SET_FULL:
+		break;
+
+	default: /* ignore the rest */
+		return;
+	}
+
+	fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+	if (!fw_event) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return;
+	}
+	fw_event->event_data =
+	    kzalloc(mpi_reply->EventDataLength*4, GFP_ATOMIC);
+	if (!fw_event->event_data) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		kfree(fw_event);
+		return;
+	}
+
+	memcpy(fw_event->event_data, mpi_reply->EventData,
+	    mpi_reply->EventDataLength*4);
+	fw_event->ioc = ioc;
+	fw_event->VF_ID = VF_ID;
+	fw_event->event = event;
+	_scsih_fw_event_add(ioc, fw_event);
+}
+
+/* shost template */
+static struct scsi_host_template scsih_driver_template = {
+	.module				= THIS_MODULE,
+	.name				= "Fusion MPT SAS Host",
+	.proc_name			= MPT2SAS_DRIVER_NAME,
+	.queuecommand			= scsih_qcmd,
+	.target_alloc			= scsih_target_alloc,
+	.slave_alloc			= scsih_slave_alloc,
+	.slave_configure		= scsih_slave_configure,
+	.target_destroy			= scsih_target_destroy,
+	.slave_destroy			= scsih_slave_destroy,
+	.change_queue_depth 		= scsih_change_queue_depth,
+	.change_queue_type		= scsih_change_queue_type,
+	.eh_abort_handler		= scsih_abort,
+	.eh_device_reset_handler	= scsih_dev_reset,
+	.eh_host_reset_handler		= scsih_host_reset,
+	.bios_param			= scsih_bios_param,
+	.can_queue			= 1,
+	.this_id			= -1,
+	.sg_tablesize			= MPT2SAS_SG_DEPTH,
+	.max_sectors			= 8192,
+	.cmd_per_lun			= 7,
+	.use_clustering			= ENABLE_CLUSTERING,
+	.shost_attrs			= mpt2sas_host_attrs,
+	.sdev_attrs			= mpt2sas_dev_attrs,
+};
+
+/**
+ * _scsih_expander_node_remove - removing expander device from list.
+ * @ioc: per adapter object
+ * @sas_expander: the sas_device object
+ * Context: Calling function should acquire ioc->sas_node_lock.
+ *
+ * Removing object and freeing associated memory from the
+ * ioc->sas_expander_list.
+ *
+ * Return nothing.
+ */
+static void
+_scsih_expander_node_remove(struct MPT2SAS_ADAPTER *ioc,
+    struct _sas_node *sas_expander)
+{
+	struct _sas_port *mpt2sas_port;
+	struct _sas_device *sas_device;
+	struct _sas_node *expander_sibling;
+	unsigned long flags;
+
+	if (!sas_expander)
+		return;
+
+	/* remove sibling ports attached to this expander */
+ retry_device_search:
+	list_for_each_entry(mpt2sas_port,
+	   &sas_expander->sas_port_list, port_list) {
+		if (mpt2sas_port->remote_identify.device_type ==
+		    SAS_END_DEVICE) {
+			spin_lock_irqsave(&ioc->sas_device_lock, flags);
+			sas_device =
+			    mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
+			   mpt2sas_port->remote_identify.sas_address);
+			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+			if (!sas_device)
+				continue;
+			_scsih_remove_device(ioc, sas_device->handle);
+			goto retry_device_search;
+		}
+	}
+
+ retry_expander_search:
+	list_for_each_entry(mpt2sas_port,
+	   &sas_expander->sas_port_list, port_list) {
+
+		if (mpt2sas_port->remote_identify.device_type ==
+		    MPI2_SAS_DEVICE_INFO_EDGE_EXPANDER ||
+		    mpt2sas_port->remote_identify.device_type ==
+		    MPI2_SAS_DEVICE_INFO_FANOUT_EXPANDER) {
+
+			spin_lock_irqsave(&ioc->sas_node_lock, flags);
+			expander_sibling =
+			    mpt2sas_scsih_expander_find_by_sas_address(
+			    ioc, mpt2sas_port->remote_identify.sas_address);
+			spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+			if (!expander_sibling)
+				continue;
+			_scsih_expander_remove(ioc, expander_sibling->handle);
+			goto retry_expander_search;
+		}
+	}
+
+	mpt2sas_transport_port_remove(ioc, sas_expander->sas_address,
+	    sas_expander->parent_handle);
+
+	printk(MPT2SAS_INFO_FMT "expander_remove: handle"
+	   "(0x%04x), sas_addr(0x%016llx)\n", ioc->name,
+	    sas_expander->handle, (unsigned long long)
+	    sas_expander->sas_address);
+
+	list_del(&sas_expander->list);
+	kfree(sas_expander->phy);
+	kfree(sas_expander);
+}
+
+/**
+ * scsih_remove - detach and remove add host
+ * @pdev: PCI device struct
+ *
+ * Return nothing.
+ */
+static void __devexit
+scsih_remove(struct pci_dev *pdev)
+{
+	struct Scsi_Host *shost = pci_get_drvdata(pdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	struct _sas_port *mpt2sas_port;
+	struct _sas_device *sas_device;
+	struct _sas_node *expander_sibling;
+	struct workqueue_struct	*wq;
+	unsigned long flags;
+
+	ioc->remove_host = 1;
+	_scsih_fw_event_off(ioc);
+
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	wq = ioc->firmware_event_thread;
+	ioc->firmware_event_thread = NULL;
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+	if (wq)
+		destroy_workqueue(wq);
+
+	/* free ports attached to the sas_host */
+ retry_again:
+	list_for_each_entry(mpt2sas_port,
+	   &ioc->sas_hba.sas_port_list, port_list) {
+		if (mpt2sas_port->remote_identify.device_type ==
+		    SAS_END_DEVICE) {
+			sas_device =
+			    mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
+			   mpt2sas_port->remote_identify.sas_address);
+			if (sas_device) {
+				_scsih_remove_device(ioc, sas_device->handle);
+				goto retry_again;
+			}
+		} else {
+			expander_sibling =
+			    mpt2sas_scsih_expander_find_by_sas_address(ioc,
+			    mpt2sas_port->remote_identify.sas_address);
+			if (expander_sibling) {
+				_scsih_expander_remove(ioc,
+				    expander_sibling->handle);
+				goto retry_again;
+			}
+		}
+	}
+
+	/* free phys attached to the sas_host */
+	if (ioc->sas_hba.num_phys) {
+		kfree(ioc->sas_hba.phy);
+		ioc->sas_hba.phy = NULL;
+		ioc->sas_hba.num_phys = 0;
+	}
+
+	sas_remove_host(shost);
+	mpt2sas_base_detach(ioc);
+	list_del(&ioc->list);
+	scsi_remove_host(shost);
+	scsi_host_put(shost);
+}
+
+/**
+ * _scsih_probe_boot_devices - reports 1st device
+ * @ioc: per adapter object
+ *
+ * If specified in bios page 2, this routine reports the 1st
+ * device scsi-ml or sas transport for persistent boot device
+ * purposes.  Please refer to function _scsih_determine_boot_device()
+ */
+static void
+_scsih_probe_boot_devices(struct MPT2SAS_ADAPTER *ioc)
+{
+	u8 is_raid;
+	void *device;
+	struct _sas_device *sas_device;
+	struct _raid_device *raid_device;
+	u16 handle, parent_handle;
+	u64 sas_address;
+	unsigned long flags;
+	int rc;
+
+	device = NULL;
+	if (ioc->req_boot_device.device) {
+		device =  ioc->req_boot_device.device;
+		is_raid = ioc->req_boot_device.is_raid;
+	} else if (ioc->req_alt_boot_device.device) {
+		device =  ioc->req_alt_boot_device.device;
+		is_raid = ioc->req_alt_boot_device.is_raid;
+	} else if (ioc->current_boot_device.device) {
+		device =  ioc->current_boot_device.device;
+		is_raid = ioc->current_boot_device.is_raid;
+	}
+
+	if (!device)
+		return;
+
+	if (is_raid) {
+		raid_device = device;
+		rc = scsi_add_device(ioc->shost, RAID_CHANNEL,
+		    raid_device->id, 0);
+		if (rc)
+			_scsih_raid_device_remove(ioc, raid_device);
+	} else {
+		sas_device = device;
+		handle = sas_device->handle;
+		parent_handle = sas_device->parent_handle;
+		sas_address = sas_device->sas_address;
+		spin_lock_irqsave(&ioc->sas_device_lock, flags);
+		list_move_tail(&sas_device->list, &ioc->sas_device_list);
+		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+		if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
+		    sas_device->parent_handle)) {
+			_scsih_sas_device_remove(ioc, sas_device);
+		} else if (!sas_device->starget) {
+			mpt2sas_transport_port_remove(ioc, sas_address,
+			    parent_handle);
+			_scsih_sas_device_remove(ioc, sas_device);
+		}
+	}
+}
+
+/**
+ * _scsih_probe_raid - reporting raid volumes to scsi-ml
+ * @ioc: per adapter object
+ *
+ * Called during initial loading of the driver.
+ */
+static void
+_scsih_probe_raid(struct MPT2SAS_ADAPTER *ioc)
+{
+	struct _raid_device *raid_device, *raid_next;
+	int rc;
+
+	list_for_each_entry_safe(raid_device, raid_next,
+	    &ioc->raid_device_list, list) {
+		if (raid_device->starget)
+			continue;
+		rc = scsi_add_device(ioc->shost, RAID_CHANNEL,
+		    raid_device->id, 0);
+		if (rc)
+			_scsih_raid_device_remove(ioc, raid_device);
+	}
+}
+
+/**
+ * _scsih_probe_sas - reporting raid volumes to sas transport
+ * @ioc: per adapter object
+ *
+ * Called during initial loading of the driver.
+ */
+static void
+_scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
+{
+	struct _sas_device *sas_device, *next;
+	unsigned long flags;
+	u16 handle, parent_handle;
+	u64 sas_address;
+
+	/* SAS Device List */
+	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
+	    list) {
+		spin_lock_irqsave(&ioc->sas_device_lock, flags);
+		list_move_tail(&sas_device->list, &ioc->sas_device_list);
+		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+		handle = sas_device->handle;
+		parent_handle = sas_device->parent_handle;
+		sas_address = sas_device->sas_address;
+		if (!mpt2sas_transport_port_add(ioc, handle, parent_handle)) {
+			_scsih_sas_device_remove(ioc, sas_device);
+		} else if (!sas_device->starget) {
+			mpt2sas_transport_port_remove(ioc, sas_address,
+			    parent_handle);
+			_scsih_sas_device_remove(ioc, sas_device);
+		}
+	}
+}
+
+/**
+ * _scsih_probe_devices - probing for devices
+ * @ioc: per adapter object
+ *
+ * Called during initial loading of the driver.
+ */
+static void
+_scsih_probe_devices(struct MPT2SAS_ADAPTER *ioc)
+{
+	u16 volume_mapping_flags =
+	    le16_to_cpu(ioc->ioc_pg8.IRVolumeMappingFlags) &
+	    MPI2_IOCPAGE8_IRFLAGS_MASK_VOLUME_MAPPING_MODE;
+
+	if (!(ioc->facts.ProtocolFlags & MPI2_IOCFACTS_PROTOCOL_SCSI_INITIATOR))
+		return;  /* return when IOC doesn't support initiator mode */
+
+	_scsih_probe_boot_devices(ioc);
+
+	if (ioc->ir_firmware) {
+		if ((volume_mapping_flags &
+		     MPI2_IOCPAGE8_IRFLAGS_HIGH_VOLUME_MAPPING)) {
+			_scsih_probe_sas(ioc);
+			_scsih_probe_raid(ioc);
+		} else {
+			_scsih_probe_raid(ioc);
+			_scsih_probe_sas(ioc);
+		}
+	} else
+		_scsih_probe_sas(ioc);
+}
+
+/**
+ * scsih_probe - attach and add scsi host
+ * @pdev: PCI device struct
+ * @id: pci device id
+ *
+ * Returns 0 success, anything else error.
+ */
+static int
+scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct MPT2SAS_ADAPTER *ioc;
+	struct Scsi_Host *shost;
+
+	shost = scsi_host_alloc(&scsih_driver_template,
+	    sizeof(struct MPT2SAS_ADAPTER));
+	if (!shost)
+		return -ENODEV;
+
+	/* init local params */
+	ioc = shost_priv(shost);
+	memset(ioc, 0, sizeof(struct MPT2SAS_ADAPTER));
+	INIT_LIST_HEAD(&ioc->list);
+	list_add_tail(&ioc->list, &mpt2sas_ioc_list);
+	ioc->shost = shost;
+	ioc->id = mpt_ids++;
+	sprintf(ioc->name, "%s%d", MPT2SAS_DRIVER_NAME, ioc->id);
+	ioc->pdev = pdev;
+	ioc->scsi_io_cb_idx = scsi_io_cb_idx;
+	ioc->tm_cb_idx = tm_cb_idx;
+	ioc->ctl_cb_idx = ctl_cb_idx;
+	ioc->base_cb_idx = base_cb_idx;
+	ioc->transport_cb_idx = transport_cb_idx;
+	ioc->config_cb_idx = config_cb_idx;
+	ioc->logging_level = logging_level;
+	/* misc semaphores and spin locks */
+	spin_lock_init(&ioc->ioc_reset_in_progress_lock);
+	spin_lock_init(&ioc->scsi_lookup_lock);
+	spin_lock_init(&ioc->sas_device_lock);
+	spin_lock_init(&ioc->sas_node_lock);
+	spin_lock_init(&ioc->fw_event_lock);
+	spin_lock_init(&ioc->raid_device_lock);
+
+	INIT_LIST_HEAD(&ioc->sas_device_list);
+	INIT_LIST_HEAD(&ioc->sas_device_init_list);
+	INIT_LIST_HEAD(&ioc->sas_expander_list);
+	INIT_LIST_HEAD(&ioc->fw_event_list);
+	INIT_LIST_HEAD(&ioc->raid_device_list);
+	INIT_LIST_HEAD(&ioc->sas_hba.sas_port_list);
+
+	/* init shost parameters */
+	shost->max_cmd_len = 16;
+	shost->max_lun = max_lun;
+	shost->transportt = mpt2sas_transport_template;
+	shost->unique_id = ioc->id;
+
+	if ((scsi_add_host(shost, &pdev->dev))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		list_del(&ioc->list);
+		goto out_add_shost_fail;
+	}
+
+	/* event thread */
+	snprintf(ioc->firmware_event_name, sizeof(ioc->firmware_event_name),
+	    "fw_event%d", ioc->id);
+	ioc->firmware_event_thread = create_singlethread_workqueue(
+	    ioc->firmware_event_name);
+	if (!ioc->firmware_event_thread) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out_thread_fail;
+	}
+
+	ioc->wait_for_port_enable_to_complete = 1;
+	if ((mpt2sas_base_attach(ioc))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out_attach_fail;
+	}
+
+	ioc->wait_for_port_enable_to_complete = 0;
+	_scsih_probe_devices(ioc);
+	return 0;
+
+ out_attach_fail:
+	destroy_workqueue(ioc->firmware_event_thread);
+ out_thread_fail:
+	list_del(&ioc->list);
+	scsi_remove_host(shost);
+ out_add_shost_fail:
+	return -ENODEV;
+}
+
+#ifdef CONFIG_PM
+/**
+ * scsih_suspend - power management suspend main entry point
+ * @pdev: PCI device struct
+ * @state: PM state change to (usually PCI_D3)
+ *
+ * Returns 0 success, anything else error.
+ */
+static int
+scsih_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+	struct Scsi_Host *shost = pci_get_drvdata(pdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	u32 device_state;
+
+	flush_scheduled_work();
+	scsi_block_requests(shost);
+	device_state = pci_choose_state(pdev, state);
+	printk(MPT2SAS_INFO_FMT "pdev=0x%p, slot=%s, entering "
+	    "operating state [D%d]\n", ioc->name, pdev,
+	    pci_name(pdev), device_state);
+
+	mpt2sas_base_free_resources(ioc);
+	pci_save_state(pdev);
+	pci_disable_device(pdev);
+	pci_set_power_state(pdev, device_state);
+	return 0;
+}
+
+/**
+ * scsih_resume - power management resume main entry point
+ * @pdev: PCI device struct
+ *
+ * Returns 0 success, anything else error.
+ */
+static int
+scsih_resume(struct pci_dev *pdev)
+{
+	struct Scsi_Host *shost = pci_get_drvdata(pdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	u32 device_state = pdev->current_state;
+	int r;
+
+	printk(MPT2SAS_INFO_FMT "pdev=0x%p, slot=%s, previous "
+	    "operating state [D%d]\n", ioc->name, pdev,
+	    pci_name(pdev), device_state);
+
+	pci_set_power_state(pdev, PCI_D0);
+	pci_enable_wake(pdev, PCI_D0, 0);
+	pci_restore_state(pdev);
+	ioc->pdev = pdev;
+	r = mpt2sas_base_map_resources(ioc);
+	if (r)
+		return r;
+
+	mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
+	scsi_unblock_requests(shost);
+	return 0;
+}
+#endif /* CONFIG_PM */
+
+
+static struct pci_driver scsih_driver = {
+	.name		= MPT2SAS_DRIVER_NAME,
+	.id_table	= scsih_pci_table,
+	.probe		= scsih_probe,
+	.remove		= __devexit_p(scsih_remove),
+#ifdef CONFIG_PM
+	.suspend	= scsih_suspend,
+	.resume		= scsih_resume,
+#endif
+};
+
+
+/**
+ * scsih_init - main entry point for this driver.
+ *
+ * Returns 0 success, anything else error.
+ */
+static int __init
+scsih_init(void)
+{
+	int error;
+
+	mpt_ids = 0;
+	printk(KERN_INFO "%s version %s loaded\n", MPT2SAS_DRIVER_NAME,
+	    MPT2SAS_DRIVER_VERSION);
+
+	mpt2sas_transport_template =
+	    sas_attach_transport(&mpt2sas_transport_functions);
+	if (!mpt2sas_transport_template)
+		return -ENODEV;
+
+	mpt2sas_base_initialize_callback_handler();
+
+	 /* queuecommand callback hander */
+	scsi_io_cb_idx = mpt2sas_base_register_callback_handler(scsih_io_done);
+
+	/* task managment callback handler */
+	tm_cb_idx = mpt2sas_base_register_callback_handler(scsih_tm_done);
+
+	/* base internal commands callback handler */
+	base_cb_idx = mpt2sas_base_register_callback_handler(mpt2sas_base_done);
+
+	/* transport internal commands callback handler */
+	transport_cb_idx = mpt2sas_base_register_callback_handler(
+	    mpt2sas_transport_done);
+
+	/* configuration page API internal commands callback handler */
+	config_cb_idx = mpt2sas_base_register_callback_handler(
+	    mpt2sas_config_done);
+
+	/* ctl module callback handler */
+	ctl_cb_idx = mpt2sas_base_register_callback_handler(mpt2sas_ctl_done);
+
+	mpt2sas_ctl_init();
+
+	error = pci_register_driver(&scsih_driver);
+	if (error)
+		sas_release_transport(mpt2sas_transport_template);
+
+	return error;
+}
+
+/**
+ * scsih_exit - exit point for this driver (when it is a module).
+ *
+ * Returns 0 success, anything else error.
+ */
+static void __exit
+scsih_exit(void)
+{
+	printk(KERN_INFO "mpt2sas version %s unloading\n",
+	    MPT2SAS_DRIVER_VERSION);
+
+	pci_unregister_driver(&scsih_driver);
+
+	sas_release_transport(mpt2sas_transport_template);
+	mpt2sas_base_release_callback_handler(scsi_io_cb_idx);
+	mpt2sas_base_release_callback_handler(tm_cb_idx);
+	mpt2sas_base_release_callback_handler(base_cb_idx);
+	mpt2sas_base_release_callback_handler(transport_cb_idx);
+	mpt2sas_base_release_callback_handler(config_cb_idx);
+	mpt2sas_base_release_callback_handler(ctl_cb_idx);
+
+	mpt2sas_ctl_exit();
+}
+
+module_init(scsih_init);
+module_exit(scsih_exit);
diff --git a/drivers/scsi/mpt2sas/mpt2sas_transport.c b/drivers/scsi/mpt2sas/mpt2sas_transport.c
new file mode 100644
index 000000000000..e03dc0b1e1a0
--- /dev/null
+++ b/drivers/scsi/mpt2sas/mpt2sas_transport.c
@@ -0,0 +1,1211 @@
+/*
+ * SAS Transport Layer for MPT (Message Passing Technology) based controllers
+ *
+ * This code is based on drivers/scsi/mpt2sas/mpt2_transport.c
+ * Copyright (C) 2007-2008  LSI Corporation
+ *  (mailto:DL-MPTFusionLinux@lsi.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * NO WARRANTY
+ * THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT
+ * LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is
+ * solely responsible for determining the appropriateness of using and
+ * distributing the Program and assumes all risks associated with its
+ * exercise of rights under this Agreement, including but not limited to
+ * the risks and costs of program errors, damage to or loss of data,
+ * programs or equipment, and unavailability or interruption of operations.
+
+ * DISCLAIMER OF LIABILITY
+ * NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
+ * TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
+ * USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED
+ * HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301,
+ * USA.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/sched.h>
+#include <linux/workqueue.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_transport_sas.h>
+#include <scsi/scsi_dbg.h>
+
+#include "mpt2sas_base.h"
+/**
+ * _transport_sas_node_find_by_handle - sas node search
+ * @ioc: per adapter object
+ * @handle: expander or hba handle (assigned by firmware)
+ * Context: Calling function should acquire ioc->sas_node_lock.
+ *
+ * Search for either hba phys or expander device based on handle, then returns
+ * the sas_node object.
+ */
+static struct _sas_node *
+_transport_sas_node_find_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
+{
+	int i;
+
+	for (i = 0; i < ioc->sas_hba.num_phys; i++)
+		if (ioc->sas_hba.phy[i].handle == handle)
+			return &ioc->sas_hba;
+
+	return mpt2sas_scsih_expander_find_by_handle(ioc, handle);
+}
+
+/**
+ * _transport_convert_phy_link_rate -
+ * @link_rate: link rate returned from mpt firmware
+ *
+ * Convert link_rate from mpi fusion into sas_transport form.
+ */
+static enum sas_linkrate
+_transport_convert_phy_link_rate(u8 link_rate)
+{
+	enum sas_linkrate rc;
+
+	switch (link_rate) {
+	case MPI2_SAS_NEG_LINK_RATE_1_5:
+		rc = SAS_LINK_RATE_1_5_GBPS;
+		break;
+	case MPI2_SAS_NEG_LINK_RATE_3_0:
+		rc = SAS_LINK_RATE_3_0_GBPS;
+		break;
+	case MPI2_SAS_NEG_LINK_RATE_6_0:
+		rc = SAS_LINK_RATE_6_0_GBPS;
+		break;
+	case MPI2_SAS_NEG_LINK_RATE_PHY_DISABLED:
+		rc = SAS_PHY_DISABLED;
+		break;
+	case MPI2_SAS_NEG_LINK_RATE_NEGOTIATION_FAILED:
+		rc = SAS_LINK_RATE_FAILED;
+		break;
+	case MPI2_SAS_NEG_LINK_RATE_PORT_SELECTOR:
+		rc = SAS_SATA_PORT_SELECTOR;
+		break;
+	case MPI2_SAS_NEG_LINK_RATE_SMP_RESET_IN_PROGRESS:
+		rc = SAS_PHY_RESET_IN_PROGRESS;
+		break;
+	default:
+	case MPI2_SAS_NEG_LINK_RATE_SATA_OOB_COMPLETE:
+	case MPI2_SAS_NEG_LINK_RATE_UNKNOWN_LINK_RATE:
+		rc = SAS_LINK_RATE_UNKNOWN;
+		break;
+	}
+	return rc;
+}
+
+/**
+ * _transport_set_identify - set identify for phys and end devices
+ * @ioc: per adapter object
+ * @handle: device handle
+ * @identify: sas identify info
+ *
+ * Populates sas identify info.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+_transport_set_identify(struct MPT2SAS_ADAPTER *ioc, u16 handle,
+    struct sas_identify *identify)
+{
+	Mpi2SasDevicePage0_t sas_device_pg0;
+	Mpi2ConfigReply_t mpi_reply;
+	u32 device_info;
+	u32 ioc_status;
+
+	if ((mpt2sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
+	    MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+
+	ioc_status = le16_to_cpu(mpi_reply.IOCStatus) &
+	    MPI2_IOCSTATUS_MASK;
+	if (ioc_status != MPI2_IOCSTATUS_SUCCESS) {
+		printk(MPT2SAS_ERR_FMT "handle(0x%04x), ioc_status(0x%04x)"
+		    "\nfailure at %s:%d/%s()!\n", ioc->name, handle, ioc_status,
+		     __FILE__, __LINE__, __func__);
+		return -1;
+	}
+
+	memset(identify, 0, sizeof(identify));
+	device_info = le32_to_cpu(sas_device_pg0.DeviceInfo);
+
+	/* sas_address */
+	identify->sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
+
+	/* device_type */
+	switch (device_info & MPI2_SAS_DEVICE_INFO_MASK_DEVICE_TYPE) {
+	case MPI2_SAS_DEVICE_INFO_NO_DEVICE:
+		identify->device_type = SAS_PHY_UNUSED;
+		break;
+	case MPI2_SAS_DEVICE_INFO_END_DEVICE:
+		identify->device_type = SAS_END_DEVICE;
+		break;
+	case MPI2_SAS_DEVICE_INFO_EDGE_EXPANDER:
+		identify->device_type = SAS_EDGE_EXPANDER_DEVICE;
+		break;
+	case MPI2_SAS_DEVICE_INFO_FANOUT_EXPANDER:
+		identify->device_type = SAS_FANOUT_EXPANDER_DEVICE;
+		break;
+	}
+
+	/* initiator_port_protocols */
+	if (device_info & MPI2_SAS_DEVICE_INFO_SSP_INITIATOR)
+		identify->initiator_port_protocols |= SAS_PROTOCOL_SSP;
+	if (device_info & MPI2_SAS_DEVICE_INFO_STP_INITIATOR)
+		identify->initiator_port_protocols |= SAS_PROTOCOL_STP;
+	if (device_info & MPI2_SAS_DEVICE_INFO_SMP_INITIATOR)
+		identify->initiator_port_protocols |= SAS_PROTOCOL_SMP;
+	if (device_info & MPI2_SAS_DEVICE_INFO_SATA_HOST)
+		identify->initiator_port_protocols |= SAS_PROTOCOL_SATA;
+
+	/* target_port_protocols */
+	if (device_info & MPI2_SAS_DEVICE_INFO_SSP_TARGET)
+		identify->target_port_protocols |= SAS_PROTOCOL_SSP;
+	if (device_info & MPI2_SAS_DEVICE_INFO_STP_TARGET)
+		identify->target_port_protocols |= SAS_PROTOCOL_STP;
+	if (device_info & MPI2_SAS_DEVICE_INFO_SMP_TARGET)
+		identify->target_port_protocols |= SAS_PROTOCOL_SMP;
+	if (device_info & MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
+		identify->target_port_protocols |= SAS_PROTOCOL_SATA;
+
+	return 0;
+}
+
+/**
+ * mpt2sas_transport_done -  internal transport layer callback handler.
+ * @ioc: per adapter object
+ * @smid: system request message index
+ * @VF_ID: virtual function id
+ * @reply: reply message frame(lower 32bit addr)
+ *
+ * Callback handler when sending internal generated transport cmds.
+ * The callback index passed is `ioc->transport_cb_idx`
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_transport_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 VF_ID,
+    u32 reply)
+{
+	MPI2DefaultReply_t *mpi_reply;
+
+	mpi_reply =  mpt2sas_base_get_reply_virt_addr(ioc, reply);
+	if (ioc->transport_cmds.status == MPT2_CMD_NOT_USED)
+		return;
+	if (ioc->transport_cmds.smid != smid)
+		return;
+	ioc->transport_cmds.status |= MPT2_CMD_COMPLETE;
+	if (mpi_reply) {
+		memcpy(ioc->transport_cmds.reply, mpi_reply,
+		    mpi_reply->MsgLength*4);
+		ioc->transport_cmds.status |= MPT2_CMD_REPLY_VALID;
+	}
+	ioc->transport_cmds.status &= ~MPT2_CMD_PENDING;
+	complete(&ioc->transport_cmds.done);
+}
+
+/* report manufacture request structure */
+struct rep_manu_request{
+	u8 smp_frame_type;
+	u8 function;
+	u8 reserved;
+	u8 request_length;
+};
+
+/* report manufacture reply structure */
+struct rep_manu_reply{
+	u8 smp_frame_type; /* 0x41 */
+	u8 function; /* 0x01 */
+	u8 function_result;
+	u8 response_length;
+	u16 expander_change_count;
+	u8 reserved0[2];
+	u8 sas_format:1;
+	u8 reserved1:7;
+	u8 reserved2[3];
+	u8 vendor_id[SAS_EXPANDER_VENDOR_ID_LEN];
+	u8 product_id[SAS_EXPANDER_PRODUCT_ID_LEN];
+	u8 product_rev[SAS_EXPANDER_PRODUCT_REV_LEN];
+	u8 component_vendor_id[SAS_EXPANDER_COMPONENT_VENDOR_ID_LEN];
+	u16 component_id;
+	u8 component_revision_id;
+	u8 reserved3;
+	u8 vendor_specific[8];
+};
+
+/**
+ * transport_expander_report_manufacture - obtain SMP report_manufacture
+ * @ioc: per adapter object
+ * @sas_address: expander sas address
+ * @edev: the sas_expander_device object
+ *
+ * Fills in the sas_expander_device object when SMP port is created.
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+transport_expander_report_manufacture(struct MPT2SAS_ADAPTER *ioc,
+    u64 sas_address, struct sas_expander_device *edev)
+{
+	Mpi2SmpPassthroughRequest_t *mpi_request;
+	Mpi2SmpPassthroughReply_t *mpi_reply;
+	struct rep_manu_reply *manufacture_reply;
+	struct rep_manu_request *manufacture_request;
+	int rc;
+	u16 smid;
+	u32 ioc_state;
+	unsigned long timeleft;
+	void *psge;
+	u32 sgl_flags;
+	u8 issue_reset = 0;
+	unsigned long flags;
+	void *data_out = NULL;
+	dma_addr_t data_out_dma;
+	u32 sz;
+	u64 *sas_address_le;
+	u16 wait_state_count;
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	if (ioc->ioc_reset_in_progress) {
+		spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+		printk(MPT2SAS_INFO_FMT "%s: host reset in progress!\n",
+		    __func__, ioc->name);
+		return -EFAULT;
+	}
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+	mutex_lock(&ioc->transport_cmds.mutex);
+
+	if (ioc->transport_cmds.status != MPT2_CMD_NOT_USED) {
+		printk(MPT2SAS_ERR_FMT "%s: transport_cmds in use\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+	ioc->transport_cmds.status = MPT2_CMD_PENDING;
+
+	wait_state_count = 0;
+	ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
+		if (wait_state_count++ == 10) {
+			printk(MPT2SAS_ERR_FMT
+			    "%s: failed due to ioc not operational\n",
+			    ioc->name, __func__);
+			rc = -EFAULT;
+			goto out;
+		}
+		ssleep(1);
+		ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+		printk(MPT2SAS_INFO_FMT "%s: waiting for "
+		    "operational state(count=%d)\n", ioc->name,
+		    __func__, wait_state_count);
+	}
+	if (wait_state_count)
+		printk(MPT2SAS_INFO_FMT "%s: ioc is operational\n",
+		    ioc->name, __func__);
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->transport_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	rc = 0;
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->transport_cmds.smid = smid;
+
+	sz = sizeof(struct rep_manu_request) + sizeof(struct rep_manu_reply);
+	data_out = pci_alloc_consistent(ioc->pdev, sz, &data_out_dma);
+
+	if (!data_out) {
+		printk(KERN_ERR "failure at %s:%d/%s()!\n", __FILE__,
+		    __LINE__, __func__);
+		rc = -ENOMEM;
+		mpt2sas_base_free_smid(ioc, smid);
+		goto out;
+	}
+
+	manufacture_request = data_out;
+	manufacture_request->smp_frame_type = 0x40;
+	manufacture_request->function = 1;
+	manufacture_request->reserved = 0;
+	manufacture_request->request_length = 0;
+
+	memset(mpi_request, 0, sizeof(Mpi2SmpPassthroughRequest_t));
+	mpi_request->Function = MPI2_FUNCTION_SMP_PASSTHROUGH;
+	mpi_request->PhysicalPort = 0xFF;
+	sas_address_le = (u64 *)&mpi_request->SASAddress;
+	*sas_address_le = cpu_to_le64(sas_address);
+	mpi_request->RequestDataLength = sizeof(struct rep_manu_request);
+	psge = &mpi_request->SGL;
+
+	/* WRITE sgel first */
+	sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
+	    MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_HOST_TO_IOC);
+	sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
+	ioc->base_add_sg_single(psge, sgl_flags |
+	    sizeof(struct rep_manu_request), data_out_dma);
+
+	/* incr sgel */
+	psge += ioc->sge_size;
+
+	/* READ sgel last */
+	sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
+	    MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
+	    MPI2_SGE_FLAGS_END_OF_LIST);
+	sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
+	ioc->base_add_sg_single(psge, sgl_flags |
+	    sizeof(struct rep_manu_reply), data_out_dma +
+	    sizeof(struct rep_manu_request));
+
+	dtransportprintk(ioc, printk(MPT2SAS_DEBUG_FMT "report_manufacture - "
+	    "send to sas_addr(0x%016llx)\n", ioc->name,
+	    (unsigned long long)sas_address));
+	mpt2sas_base_put_smid_default(ioc, smid, 0 /* VF_ID */);
+	timeleft = wait_for_completion_timeout(&ioc->transport_cmds.done,
+	    10*HZ);
+
+	if (!(ioc->transport_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s: timeout\n",
+		    ioc->name, __func__);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2SmpPassthroughRequest_t)/4);
+		if (!(ioc->transport_cmds.status & MPT2_CMD_RESET))
+			issue_reset = 1;
+		goto issue_host_reset;
+	}
+
+	dtransportprintk(ioc, printk(MPT2SAS_DEBUG_FMT "report_manufacture - "
+	    "complete\n", ioc->name));
+
+	if (ioc->transport_cmds.status & MPT2_CMD_REPLY_VALID) {
+		u8 *tmp;
+
+		mpi_reply = ioc->transport_cmds.reply;
+
+		dtransportprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+		    "report_manufacture - reply data transfer size(%d)\n",
+		    ioc->name, le16_to_cpu(mpi_reply->ResponseDataLength)));
+
+		if (le16_to_cpu(mpi_reply->ResponseDataLength) !=
+		    sizeof(struct rep_manu_reply))
+			goto out;
+
+		manufacture_reply = data_out + sizeof(struct rep_manu_request);
+		strncpy(edev->vendor_id, manufacture_reply->vendor_id,
+		     SAS_EXPANDER_VENDOR_ID_LEN);
+		strncpy(edev->product_id, manufacture_reply->product_id,
+		     SAS_EXPANDER_PRODUCT_ID_LEN);
+		strncpy(edev->product_rev, manufacture_reply->product_rev,
+		     SAS_EXPANDER_PRODUCT_REV_LEN);
+		edev->level = manufacture_reply->sas_format;
+		if (manufacture_reply->sas_format) {
+			strncpy(edev->component_vendor_id,
+			    manufacture_reply->component_vendor_id,
+			     SAS_EXPANDER_COMPONENT_VENDOR_ID_LEN);
+			tmp = (u8 *)&manufacture_reply->component_id;
+			edev->component_id = tmp[0] << 8 | tmp[1];
+			edev->component_revision_id =
+			    manufacture_reply->component_revision_id;
+		}
+	} else
+		dtransportprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+		    "report_manufacture - no reply\n", ioc->name));
+
+ issue_host_reset:
+	if (issue_reset)
+		mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+		    FORCE_BIG_HAMMER);
+ out:
+	ioc->transport_cmds.status = MPT2_CMD_NOT_USED;
+	if (data_out)
+		pci_free_consistent(ioc->pdev, sz, data_out, data_out_dma);
+
+	mutex_unlock(&ioc->transport_cmds.mutex);
+	return rc;
+}
+
+/**
+ * mpt2sas_transport_port_add - insert port to the list
+ * @ioc: per adapter object
+ * @handle: handle of attached device
+ * @parent_handle: parent handle(either hba or expander)
+ * Context: This function will acquire ioc->sas_node_lock.
+ *
+ * Adding new port object to the sas_node->sas_port_list.
+ *
+ * Returns mpt2sas_port.
+ */
+struct _sas_port *
+mpt2sas_transport_port_add(struct MPT2SAS_ADAPTER *ioc, u16 handle,
+    u16 parent_handle)
+{
+	struct _sas_phy *mpt2sas_phy, *next;
+	struct _sas_port *mpt2sas_port;
+	unsigned long flags;
+	struct _sas_node *sas_node;
+	struct sas_rphy *rphy;
+	int i;
+	struct sas_port *port;
+
+	if (!parent_handle)
+		return NULL;
+
+	mpt2sas_port = kzalloc(sizeof(struct _sas_port),
+	    GFP_KERNEL);
+	if (!mpt2sas_port) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return NULL;
+	}
+
+	INIT_LIST_HEAD(&mpt2sas_port->port_list);
+	INIT_LIST_HEAD(&mpt2sas_port->phy_list);
+	spin_lock_irqsave(&ioc->sas_node_lock, flags);
+	sas_node = _transport_sas_node_find_by_handle(ioc, parent_handle);
+	spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+
+	if (!sas_node) {
+		printk(MPT2SAS_ERR_FMT "%s: Could not find parent(0x%04x)!\n",
+		    ioc->name, __func__, parent_handle);
+		goto out_fail;
+	}
+
+	mpt2sas_port->handle = parent_handle;
+	mpt2sas_port->sas_address = sas_node->sas_address;
+	if ((_transport_set_identify(ioc, handle,
+	    &mpt2sas_port->remote_identify))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out_fail;
+	}
+
+	if (mpt2sas_port->remote_identify.device_type == SAS_PHY_UNUSED) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out_fail;
+	}
+
+	for (i = 0; i < sas_node->num_phys; i++) {
+		if (sas_node->phy[i].remote_identify.sas_address !=
+		    mpt2sas_port->remote_identify.sas_address)
+			continue;
+		list_add_tail(&sas_node->phy[i].port_siblings,
+		    &mpt2sas_port->phy_list);
+		mpt2sas_port->num_phys++;
+	}
+
+	if (!mpt2sas_port->num_phys) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out_fail;
+	}
+
+	port = sas_port_alloc_num(sas_node->parent_dev);
+	if ((sas_port_add(port))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		goto out_fail;
+	}
+
+	list_for_each_entry(mpt2sas_phy, &mpt2sas_port->phy_list,
+	    port_siblings) {
+		if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
+			dev_printk(KERN_INFO, &port->dev, "add: handle(0x%04x)"
+			    ", sas_addr(0x%016llx), phy(%d)\n", handle,
+			    (unsigned long long)
+			    mpt2sas_port->remote_identify.sas_address,
+			    mpt2sas_phy->phy_id);
+		sas_port_add_phy(port, mpt2sas_phy->phy);
+	}
+
+	mpt2sas_port->port = port;
+	if (mpt2sas_port->remote_identify.device_type == SAS_END_DEVICE)
+		rphy = sas_end_device_alloc(port);
+	else
+		rphy = sas_expander_alloc(port,
+		    mpt2sas_port->remote_identify.device_type);
+
+	rphy->identify = mpt2sas_port->remote_identify;
+	if ((sas_rphy_add(rphy))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+	}
+	if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
+		dev_printk(KERN_INFO, &rphy->dev, "add: handle(0x%04x), "
+		    "sas_addr(0x%016llx)\n", handle,
+		    (unsigned long long)
+		    mpt2sas_port->remote_identify.sas_address);
+	mpt2sas_port->rphy = rphy;
+	spin_lock_irqsave(&ioc->sas_node_lock, flags);
+	list_add_tail(&mpt2sas_port->port_list, &sas_node->sas_port_list);
+	spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+
+	/* fill in report manufacture */
+	if (mpt2sas_port->remote_identify.device_type ==
+	    MPI2_SAS_DEVICE_INFO_EDGE_EXPANDER ||
+	    mpt2sas_port->remote_identify.device_type ==
+	    MPI2_SAS_DEVICE_INFO_FANOUT_EXPANDER)
+		transport_expander_report_manufacture(ioc,
+		    mpt2sas_port->remote_identify.sas_address,
+		    rphy_to_expander_device(rphy));
+
+	return mpt2sas_port;
+
+ out_fail:
+	list_for_each_entry_safe(mpt2sas_phy, next, &mpt2sas_port->phy_list,
+	    port_siblings)
+		list_del(&mpt2sas_phy->port_siblings);
+	kfree(mpt2sas_port);
+	return NULL;
+}
+
+/**
+ * mpt2sas_transport_port_remove - remove port from the list
+ * @ioc: per adapter object
+ * @sas_address: sas address of attached device
+ * @parent_handle: handle to the upstream parent(either hba or expander)
+ * Context: This function will acquire ioc->sas_node_lock.
+ *
+ * Removing object and freeing associated memory from the
+ * ioc->sas_port_list.
+ *
+ * Return nothing.
+ */
+void
+mpt2sas_transport_port_remove(struct MPT2SAS_ADAPTER *ioc, u64 sas_address,
+    u16 parent_handle)
+{
+	int i;
+	unsigned long flags;
+	struct _sas_port *mpt2sas_port, *next;
+	struct _sas_node *sas_node;
+	u8 found = 0;
+	struct _sas_phy *mpt2sas_phy, *next_phy;
+
+	spin_lock_irqsave(&ioc->sas_node_lock, flags);
+	sas_node = _transport_sas_node_find_by_handle(ioc, parent_handle);
+	spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+	if (!sas_node)
+		return;
+	list_for_each_entry_safe(mpt2sas_port, next, &sas_node->sas_port_list,
+	    port_list) {
+		if (mpt2sas_port->remote_identify.sas_address != sas_address)
+			continue;
+		found = 1;
+		list_del(&mpt2sas_port->port_list);
+		goto out;
+	}
+ out:
+	if (!found)
+		return;
+
+	for (i = 0; i < sas_node->num_phys; i++) {
+		if (sas_node->phy[i].remote_identify.sas_address == sas_address)
+			memset(&sas_node->phy[i].remote_identify, 0 ,
+			    sizeof(struct sas_identify));
+	}
+
+	list_for_each_entry_safe(mpt2sas_phy, next_phy,
+	    &mpt2sas_port->phy_list, port_siblings) {
+		if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
+			dev_printk(KERN_INFO, &mpt2sas_port->port->dev,
+			    "remove: parent_handle(0x%04x), "
+			    "sas_addr(0x%016llx), phy(%d)\n", parent_handle,
+			    (unsigned long long)
+			    mpt2sas_port->remote_identify.sas_address,
+			    mpt2sas_phy->phy_id);
+		sas_port_delete_phy(mpt2sas_port->port, mpt2sas_phy->phy);
+		list_del(&mpt2sas_phy->port_siblings);
+	}
+	sas_port_delete(mpt2sas_port->port);
+	kfree(mpt2sas_port);
+}
+
+/**
+ * mpt2sas_transport_add_host_phy - report sas_host phy to transport
+ * @ioc: per adapter object
+ * @mpt2sas_phy: mpt2sas per phy object
+ * @phy_pg0: sas phy page 0
+ * @parent_dev: parent device class object
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_transport_add_host_phy(struct MPT2SAS_ADAPTER *ioc, struct _sas_phy
+    *mpt2sas_phy, Mpi2SasPhyPage0_t phy_pg0, struct device *parent_dev)
+{
+	struct sas_phy *phy;
+	int phy_index = mpt2sas_phy->phy_id;
+
+
+	INIT_LIST_HEAD(&mpt2sas_phy->port_siblings);
+	phy = sas_phy_alloc(parent_dev, phy_index);
+	if (!phy) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+	if ((_transport_set_identify(ioc, mpt2sas_phy->handle,
+	    &mpt2sas_phy->identify))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+	phy->identify = mpt2sas_phy->identify;
+	mpt2sas_phy->attached_handle = le16_to_cpu(phy_pg0.AttachedDevHandle);
+	if (mpt2sas_phy->attached_handle)
+		_transport_set_identify(ioc, mpt2sas_phy->attached_handle,
+		    &mpt2sas_phy->remote_identify);
+	phy->identify.phy_identifier = mpt2sas_phy->phy_id;
+	phy->negotiated_linkrate = _transport_convert_phy_link_rate(
+	    phy_pg0.NegotiatedLinkRate & MPI2_SAS_NEG_LINK_RATE_MASK_PHYSICAL);
+	phy->minimum_linkrate_hw = _transport_convert_phy_link_rate(
+	    phy_pg0.HwLinkRate & MPI2_SAS_HWRATE_MIN_RATE_MASK);
+	phy->maximum_linkrate_hw = _transport_convert_phy_link_rate(
+	    phy_pg0.HwLinkRate >> 4);
+	phy->minimum_linkrate = _transport_convert_phy_link_rate(
+	    phy_pg0.ProgrammedLinkRate & MPI2_SAS_PRATE_MIN_RATE_MASK);
+	phy->maximum_linkrate = _transport_convert_phy_link_rate(
+	    phy_pg0.ProgrammedLinkRate >> 4);
+
+	if ((sas_phy_add(phy))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		sas_phy_free(phy);
+		return -1;
+	}
+	if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
+		dev_printk(KERN_INFO, &phy->dev,
+		    "add: handle(0x%04x), sas_addr(0x%016llx)\n"
+		    "\tattached_handle(0x%04x), sas_addr(0x%016llx)\n",
+		    mpt2sas_phy->handle, (unsigned long long)
+		    mpt2sas_phy->identify.sas_address,
+		    mpt2sas_phy->attached_handle,
+		    (unsigned long long)
+		    mpt2sas_phy->remote_identify.sas_address);
+	mpt2sas_phy->phy = phy;
+	return 0;
+}
+
+
+/**
+ * mpt2sas_transport_add_expander_phy - report expander phy to transport
+ * @ioc: per adapter object
+ * @mpt2sas_phy: mpt2sas per phy object
+ * @expander_pg1: expander page 1
+ * @parent_dev: parent device class object
+ *
+ * Returns 0 for success, non-zero for failure.
+ */
+int
+mpt2sas_transport_add_expander_phy(struct MPT2SAS_ADAPTER *ioc, struct _sas_phy
+    *mpt2sas_phy, Mpi2ExpanderPage1_t expander_pg1, struct device *parent_dev)
+{
+	struct sas_phy *phy;
+	int phy_index = mpt2sas_phy->phy_id;
+
+	INIT_LIST_HEAD(&mpt2sas_phy->port_siblings);
+	phy = sas_phy_alloc(parent_dev, phy_index);
+	if (!phy) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+	if ((_transport_set_identify(ioc, mpt2sas_phy->handle,
+	    &mpt2sas_phy->identify))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -1;
+	}
+	phy->identify = mpt2sas_phy->identify;
+	mpt2sas_phy->attached_handle =
+	    le16_to_cpu(expander_pg1.AttachedDevHandle);
+	if (mpt2sas_phy->attached_handle)
+		_transport_set_identify(ioc, mpt2sas_phy->attached_handle,
+		    &mpt2sas_phy->remote_identify);
+	phy->identify.phy_identifier = mpt2sas_phy->phy_id;
+	phy->negotiated_linkrate = _transport_convert_phy_link_rate(
+	    expander_pg1.NegotiatedLinkRate &
+	    MPI2_SAS_NEG_LINK_RATE_MASK_PHYSICAL);
+	phy->minimum_linkrate_hw = _transport_convert_phy_link_rate(
+	    expander_pg1.HwLinkRate & MPI2_SAS_HWRATE_MIN_RATE_MASK);
+	phy->maximum_linkrate_hw = _transport_convert_phy_link_rate(
+	    expander_pg1.HwLinkRate >> 4);
+	phy->minimum_linkrate = _transport_convert_phy_link_rate(
+	    expander_pg1.ProgrammedLinkRate & MPI2_SAS_PRATE_MIN_RATE_MASK);
+	phy->maximum_linkrate = _transport_convert_phy_link_rate(
+	    expander_pg1.ProgrammedLinkRate >> 4);
+
+	if ((sas_phy_add(phy))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		sas_phy_free(phy);
+		return -1;
+	}
+	if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
+		dev_printk(KERN_INFO, &phy->dev,
+		    "add: handle(0x%04x), sas_addr(0x%016llx)\n"
+		    "\tattached_handle(0x%04x), sas_addr(0x%016llx)\n",
+		    mpt2sas_phy->handle, (unsigned long long)
+		    mpt2sas_phy->identify.sas_address,
+		    mpt2sas_phy->attached_handle,
+		    (unsigned long long)
+		    mpt2sas_phy->remote_identify.sas_address);
+	mpt2sas_phy->phy = phy;
+	return 0;
+}
+
+/**
+ * mpt2sas_transport_update_phy_link_change - refreshing phy link changes and attached devices
+ * @ioc: per adapter object
+ * @handle: handle to sas_host or expander
+ * @attached_handle: attached device handle
+ * @phy_numberv: phy number
+ * @link_rate: new link rate
+ *
+ * Returns nothing.
+ */
+void
+mpt2sas_transport_update_phy_link_change(struct MPT2SAS_ADAPTER *ioc,
+    u16 handle, u16 attached_handle, u8 phy_number, u8 link_rate)
+{
+	unsigned long flags;
+	struct _sas_node *sas_node;
+	struct _sas_phy *mpt2sas_phy;
+
+	spin_lock_irqsave(&ioc->sas_node_lock, flags);
+	sas_node = _transport_sas_node_find_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+	if (!sas_node)
+		return;
+
+	mpt2sas_phy = &sas_node->phy[phy_number];
+	mpt2sas_phy->attached_handle = attached_handle;
+	if (attached_handle && (link_rate >= MPI2_SAS_NEG_LINK_RATE_1_5))
+		_transport_set_identify(ioc, mpt2sas_phy->attached_handle,
+		    &mpt2sas_phy->remote_identify);
+	else
+		memset(&mpt2sas_phy->remote_identify, 0 , sizeof(struct
+		    sas_identify));
+
+	if (mpt2sas_phy->phy)
+		mpt2sas_phy->phy->negotiated_linkrate =
+		    _transport_convert_phy_link_rate(link_rate);
+
+	if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
+		dev_printk(KERN_INFO, &mpt2sas_phy->phy->dev,
+		    "refresh: handle(0x%04x), sas_addr(0x%016llx),\n"
+		    "\tlink_rate(0x%02x), phy(%d)\n"
+		    "\tattached_handle(0x%04x), sas_addr(0x%016llx)\n",
+		    handle, (unsigned long long)
+		    mpt2sas_phy->identify.sas_address, link_rate,
+		    phy_number, attached_handle,
+		    (unsigned long long)
+		    mpt2sas_phy->remote_identify.sas_address);
+}
+
+static inline void *
+phy_to_ioc(struct sas_phy *phy)
+{
+	struct Scsi_Host *shost = dev_to_shost(phy->dev.parent);
+	return shost_priv(shost);
+}
+
+static inline void *
+rphy_to_ioc(struct sas_rphy *rphy)
+{
+	struct Scsi_Host *shost = dev_to_shost(rphy->dev.parent->parent);
+	return shost_priv(shost);
+}
+
+/**
+ * transport_get_linkerrors -
+ * @phy: The sas phy object
+ *
+ * Only support sas_host direct attached phys.
+ * Returns 0 for success, non-zero for failure.
+ *
+ */
+static int
+transport_get_linkerrors(struct sas_phy *phy)
+{
+	struct MPT2SAS_ADAPTER *ioc = phy_to_ioc(phy);
+	struct _sas_phy *mpt2sas_phy;
+	Mpi2ConfigReply_t mpi_reply;
+	Mpi2SasPhyPage1_t phy_pg1;
+	int i;
+
+	for (i = 0, mpt2sas_phy = NULL; i < ioc->sas_hba.num_phys &&
+	    !mpt2sas_phy; i++) {
+		if (ioc->sas_hba.phy[i].phy != phy)
+			continue;
+		mpt2sas_phy = &ioc->sas_hba.phy[i];
+	}
+
+	if (!mpt2sas_phy) /* this phy not on sas_host */
+		return -EINVAL;
+
+	if ((mpt2sas_config_get_phy_pg1(ioc, &mpi_reply, &phy_pg1,
+		    mpt2sas_phy->phy_id))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -ENXIO;
+	}
+
+	if (mpi_reply.IOCStatus || mpi_reply.IOCLogInfo)
+		printk(MPT2SAS_INFO_FMT "phy(%d), ioc_status"
+		    "(0x%04x), loginfo(0x%08x)\n", ioc->name,
+		    mpt2sas_phy->phy_id,
+		    le16_to_cpu(mpi_reply.IOCStatus),
+		    le32_to_cpu(mpi_reply.IOCLogInfo));
+
+	phy->invalid_dword_count = le32_to_cpu(phy_pg1.InvalidDwordCount);
+	phy->running_disparity_error_count =
+	    le32_to_cpu(phy_pg1.RunningDisparityErrorCount);
+	phy->loss_of_dword_sync_count =
+	    le32_to_cpu(phy_pg1.LossDwordSynchCount);
+	phy->phy_reset_problem_count =
+	    le32_to_cpu(phy_pg1.PhyResetProblemCount);
+	return 0;
+}
+
+/**
+ * transport_get_enclosure_identifier -
+ * @phy: The sas phy object
+ *
+ * Obtain the enclosure logical id for an expander.
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+transport_get_enclosure_identifier(struct sas_rphy *rphy, u64 *identifier)
+{
+	struct MPT2SAS_ADAPTER *ioc = rphy_to_ioc(rphy);
+	struct _sas_node *sas_expander;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_node_lock, flags);
+	sas_expander = mpt2sas_scsih_expander_find_by_sas_address(ioc,
+	    rphy->identify.sas_address);
+	spin_unlock_irqrestore(&ioc->sas_node_lock, flags);
+
+	if (!sas_expander)
+		return -ENXIO;
+
+	*identifier = sas_expander->enclosure_logical_id;
+	return 0;
+}
+
+/**
+ * transport_get_bay_identifier -
+ * @phy: The sas phy object
+ *
+ * Returns the slot id for a device that resides inside an enclosure.
+ */
+static int
+transport_get_bay_identifier(struct sas_rphy *rphy)
+{
+	struct MPT2SAS_ADAPTER *ioc = rphy_to_ioc(rphy);
+	struct _sas_device *sas_device;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
+	    rphy->identify.sas_address);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	if (!sas_device)
+		return -ENXIO;
+
+	return sas_device->slot;
+}
+
+/**
+ * transport_phy_reset -
+ * @phy: The sas phy object
+ * @hard_reset:
+ *
+ * Only support sas_host direct attached phys.
+ * Returns 0 for success, non-zero for failure.
+ */
+static int
+transport_phy_reset(struct sas_phy *phy, int hard_reset)
+{
+	struct MPT2SAS_ADAPTER *ioc = phy_to_ioc(phy);
+	struct _sas_phy *mpt2sas_phy;
+	Mpi2SasIoUnitControlReply_t mpi_reply;
+	Mpi2SasIoUnitControlRequest_t mpi_request;
+	int i;
+
+	for (i = 0, mpt2sas_phy = NULL; i < ioc->sas_hba.num_phys &&
+	    !mpt2sas_phy; i++) {
+		if (ioc->sas_hba.phy[i].phy != phy)
+			continue;
+		mpt2sas_phy = &ioc->sas_hba.phy[i];
+	}
+
+	if (!mpt2sas_phy) /* this phy not on sas_host */
+		return -EINVAL;
+
+	memset(&mpi_request, 0, sizeof(Mpi2SasIoUnitControlReply_t));
+	mpi_request.Function = MPI2_FUNCTION_SAS_IO_UNIT_CONTROL;
+	mpi_request.Operation = hard_reset ?
+	    MPI2_SAS_OP_PHY_HARD_RESET : MPI2_SAS_OP_PHY_LINK_RESET;
+	mpi_request.PhyNum = mpt2sas_phy->phy_id;
+
+	if ((mpt2sas_base_sas_iounit_control(ioc, &mpi_reply, &mpi_request))) {
+		printk(MPT2SAS_ERR_FMT "failure at %s:%d/%s()!\n",
+		    ioc->name, __FILE__, __LINE__, __func__);
+		return -ENXIO;
+	}
+
+	if (mpi_reply.IOCStatus || mpi_reply.IOCLogInfo)
+		printk(MPT2SAS_INFO_FMT "phy(%d), ioc_status"
+		    "(0x%04x), loginfo(0x%08x)\n", ioc->name,
+		    mpt2sas_phy->phy_id,
+		    le16_to_cpu(mpi_reply.IOCStatus),
+		    le32_to_cpu(mpi_reply.IOCLogInfo));
+
+	return 0;
+}
+
+/**
+ * transport_smp_handler - transport portal for smp passthru
+ * @shost: shost object
+ * @rphy: sas transport rphy object
+ * @req:
+ *
+ * This used primarily for smp_utils.
+ * Example:
+ *           smp_rep_general /sys/class/bsg/expander-5:0
+ */
+static int
+transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
+    struct request *req)
+{
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	Mpi2SmpPassthroughRequest_t *mpi_request;
+	Mpi2SmpPassthroughReply_t *mpi_reply;
+	int rc;
+	u16 smid;
+	u32 ioc_state;
+	unsigned long timeleft;
+	void *psge;
+	u32 sgl_flags;
+	u8 issue_reset = 0;
+	unsigned long flags;
+	dma_addr_t dma_addr_in = 0;
+	dma_addr_t dma_addr_out = 0;
+	u16 wait_state_count;
+	struct request *rsp = req->next_rq;
+
+	if (!rsp) {
+		printk(MPT2SAS_ERR_FMT "%s: the smp response space is "
+		    "missing\n", ioc->name, __func__);
+		return -EINVAL;
+	}
+
+	/* do we need to support multiple segments? */
+	if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
+		printk(MPT2SAS_ERR_FMT "%s: multiple segments req %u %u, "
+		    "rsp %u %u\n", ioc->name, __func__, req->bio->bi_vcnt,
+		    req->data_len, rsp->bio->bi_vcnt, rsp->data_len);
+		return -EINVAL;
+	}
+
+	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
+	if (ioc->ioc_reset_in_progress) {
+		spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+		printk(MPT2SAS_INFO_FMT "%s: host reset in progress!\n",
+		    __func__, ioc->name);
+		return -EFAULT;
+	}
+	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
+
+	rc = mutex_lock_interruptible(&ioc->transport_cmds.mutex);
+	if (rc)
+		return rc;
+
+	if (ioc->transport_cmds.status != MPT2_CMD_NOT_USED) {
+		printk(MPT2SAS_ERR_FMT "%s: transport_cmds in use\n", ioc->name,
+		    __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+	ioc->transport_cmds.status = MPT2_CMD_PENDING;
+
+	wait_state_count = 0;
+	ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
+		if (wait_state_count++ == 10) {
+			printk(MPT2SAS_ERR_FMT
+			    "%s: failed due to ioc not operational\n",
+			    ioc->name, __func__);
+			rc = -EFAULT;
+			goto out;
+		}
+		ssleep(1);
+		ioc_state = mpt2sas_base_get_iocstate(ioc, 1);
+		printk(MPT2SAS_INFO_FMT "%s: waiting for "
+		    "operational state(count=%d)\n", ioc->name,
+		    __func__, wait_state_count);
+	}
+	if (wait_state_count)
+		printk(MPT2SAS_INFO_FMT "%s: ioc is operational\n",
+		    ioc->name, __func__);
+
+	smid = mpt2sas_base_get_smid(ioc, ioc->transport_cb_idx);
+	if (!smid) {
+		printk(MPT2SAS_ERR_FMT "%s: failed obtaining a smid\n",
+		    ioc->name, __func__);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	rc = 0;
+	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
+	ioc->transport_cmds.smid = smid;
+
+	memset(mpi_request, 0, sizeof(Mpi2SmpPassthroughRequest_t));
+	mpi_request->Function = MPI2_FUNCTION_SMP_PASSTHROUGH;
+	mpi_request->PhysicalPort = 0xFF;
+	*((u64 *)&mpi_request->SASAddress) = (rphy) ?
+	    cpu_to_le64(rphy->identify.sas_address) :
+	    cpu_to_le64(ioc->sas_hba.sas_address);
+	mpi_request->RequestDataLength = cpu_to_le16(req->data_len - 4);
+	psge = &mpi_request->SGL;
+
+	/* WRITE sgel first */
+	sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
+	    MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_HOST_TO_IOC);
+	sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
+	dma_addr_out = pci_map_single(ioc->pdev, bio_data(req->bio),
+	      req->data_len, PCI_DMA_BIDIRECTIONAL);
+	if (!dma_addr_out) {
+		mpt2sas_base_free_smid(ioc, le16_to_cpu(smid));
+		goto unmap;
+	}
+
+	ioc->base_add_sg_single(psge, sgl_flags | (req->data_len - 4),
+	    dma_addr_out);
+
+	/* incr sgel */
+	psge += ioc->sge_size;
+
+	/* READ sgel last */
+	sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
+	    MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
+	    MPI2_SGE_FLAGS_END_OF_LIST);
+	sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
+	dma_addr_in =  pci_map_single(ioc->pdev, bio_data(rsp->bio),
+	      rsp->data_len, PCI_DMA_BIDIRECTIONAL);
+	if (!dma_addr_in) {
+		mpt2sas_base_free_smid(ioc, le16_to_cpu(smid));
+		goto unmap;
+	}
+
+	ioc->base_add_sg_single(psge, sgl_flags | (rsp->data_len + 4),
+	    dma_addr_in);
+
+	dtransportprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s - "
+	    "sending smp request\n", ioc->name, __func__));
+
+	mpt2sas_base_put_smid_default(ioc, smid, 0 /* VF_ID */);
+	timeleft = wait_for_completion_timeout(&ioc->transport_cmds.done,
+	    10*HZ);
+
+	if (!(ioc->transport_cmds.status & MPT2_CMD_COMPLETE)) {
+		printk(MPT2SAS_ERR_FMT "%s : timeout\n",
+		    __func__, ioc->name);
+		_debug_dump_mf(mpi_request,
+		    sizeof(Mpi2SmpPassthroughRequest_t)/4);
+		if (!(ioc->transport_cmds.status & MPT2_CMD_RESET))
+			issue_reset = 1;
+		goto issue_host_reset;
+	}
+
+	dtransportprintk(ioc, printk(MPT2SAS_DEBUG_FMT "%s - "
+	    "complete\n", ioc->name, __func__));
+
+	if (ioc->transport_cmds.status & MPT2_CMD_REPLY_VALID) {
+
+		mpi_reply = ioc->transport_cmds.reply;
+
+		dtransportprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+		    "%s - reply data transfer size(%d)\n",
+		    ioc->name, __func__,
+		    le16_to_cpu(mpi_reply->ResponseDataLength)));
+
+		memcpy(req->sense, mpi_reply, sizeof(*mpi_reply));
+		req->sense_len = sizeof(*mpi_reply);
+		req->data_len = 0;
+		rsp->data_len -= mpi_reply->ResponseDataLength;
+
+	} else {
+		dtransportprintk(ioc, printk(MPT2SAS_DEBUG_FMT
+		    "%s - no reply\n", ioc->name, __func__));
+		rc = -ENXIO;
+	}
+
+ issue_host_reset:
+	if (issue_reset) {
+		mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP,
+		    FORCE_BIG_HAMMER);
+		rc = -ETIMEDOUT;
+	}
+
+ unmap:
+	if (dma_addr_out)
+		pci_unmap_single(ioc->pdev, dma_addr_out, req->data_len,
+		    PCI_DMA_BIDIRECTIONAL);
+	if (dma_addr_in)
+		pci_unmap_single(ioc->pdev, dma_addr_in, rsp->data_len,
+		    PCI_DMA_BIDIRECTIONAL);
+
+ out:
+	ioc->transport_cmds.status = MPT2_CMD_NOT_USED;
+	mutex_unlock(&ioc->transport_cmds.mutex);
+	return rc;
+}
+
+struct sas_function_template mpt2sas_transport_functions = {
+	.get_linkerrors		= transport_get_linkerrors,
+	.get_enclosure_identifier = transport_get_enclosure_identifier,
+	.get_bay_identifier	= transport_get_bay_identifier,
+	.phy_reset		= transport_phy_reset,
+	.smp_handler		= transport_smp_handler,
+};
+
+struct scsi_transport_template *mpt2sas_transport_template;
diff --git a/drivers/scsi/osd/Kbuild b/drivers/scsi/osd/Kbuild
new file mode 100644
index 000000000000..0e207aa67d16
--- /dev/null
+++ b/drivers/scsi/osd/Kbuild
@@ -0,0 +1,45 @@
+#
+# Kbuild for the OSD modules
+#
+# Copyright (C) 2008 Panasas Inc.  All rights reserved.
+#
+# Authors:
+#   Boaz Harrosh <bharrosh@panasas.com>
+#   Benny Halevy <bhalevy@panasas.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2
+#
+
+ifneq ($(OSD_INC),)
+# we are built out-of-tree Kconfigure everything as on
+
+CONFIG_SCSI_OSD_INITIATOR=m
+ccflags-y += -DCONFIG_SCSI_OSD_INITIATOR -DCONFIG_SCSI_OSD_INITIATOR_MODULE
+
+CONFIG_SCSI_OSD_ULD=m
+ccflags-y += -DCONFIG_SCSI_OSD_ULD -DCONFIG_SCSI_OSD_ULD_MODULE
+
+# CONFIG_SCSI_OSD_DPRINT_SENSE =
+#	0 - no print of errors
+#	1 - print errors
+#	2 - errors + warrnings
+ccflags-y += -DCONFIG_SCSI_OSD_DPRINT_SENSE=1
+
+# Uncomment to turn debug on
+# ccflags-y += -DCONFIG_SCSI_OSD_DEBUG
+
+# if we are built out-of-tree and the hosting kernel has OSD headers
+# then "ccflags-y +=" will not pick the out-off-tree headers. Only by doing
+# this it will work. This might break in future kernels
+LINUXINCLUDE := -I$(OSD_INC) $(LINUXINCLUDE)
+
+endif
+
+# libosd.ko - osd-initiator library
+libosd-y := osd_initiator.o
+obj-$(CONFIG_SCSI_OSD_INITIATOR) += libosd.o
+
+# osd.ko - SCSI ULD and char-device
+osd-y := osd_uld.o
+obj-$(CONFIG_SCSI_OSD_ULD) += osd.o
diff --git a/drivers/scsi/osd/Kconfig b/drivers/scsi/osd/Kconfig
new file mode 100644
index 000000000000..861b5cebaeae
--- /dev/null
+++ b/drivers/scsi/osd/Kconfig
@@ -0,0 +1,53 @@
+#
+# Kernel configuration file for the OSD scsi protocol
+#
+# Copyright (C) 2008 Panasas Inc.  All rights reserved.
+#
+# Authors:
+#   Boaz Harrosh <bharrosh@panasas.com>
+#   Benny Halevy <bhalevy@panasas.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public version 2 License as
+# published by the Free Software Foundation
+#
+# FIXME: SCSI_OSD_INITIATOR should select CONFIG (HMAC) SHA1 somehow.
+#        How is it done properly?
+#
+
+config SCSI_OSD_INITIATOR
+	tristate "OSD-Initiator library"
+	depends on SCSI
+	help
+		Enable the OSD-Initiator library (libosd.ko).
+		NOTE: You must also select CRYPTO_SHA1 + CRYPTO_HMAC and their
+		dependencies
+
+config SCSI_OSD_ULD
+	tristate "OSD Upper Level driver"
+	depends on SCSI_OSD_INITIATOR
+	help
+		Build a SCSI upper layer driver that exports /dev/osdX devices
+		to user-mode for testing and controlling OSD devices. It is also
+		needed by exofs, for mounting an OSD based file system.
+
+config SCSI_OSD_DPRINT_SENSE
+    int "(0-2) When sense is returned, DEBUG print all sense descriptors"
+    default 1
+    depends on SCSI_OSD_INITIATOR
+    help
+        When a CHECK_CONDITION status is returned from a target, and a
+        sense-buffer is retrieved, turning this on will dump a full
+        sense-decoding message. Setting to 2 will also print recoverable
+        errors that might be regularly returned for some filesystem
+        operations.
+
+config SCSI_OSD_DEBUG
+	bool "Compile All OSD modules with lots of DEBUG prints"
+	default n
+	depends on SCSI_OSD_INITIATOR
+	help
+		OSD Code is populated with lots of OSD_DEBUG(..) printouts to
+		dmesg. Enable this if you found a bug and you want to help us
+		track the problem (see also MAINTAINERS). Setting this will also
+		force SCSI_OSD_DPRINT_SENSE=2.
diff --git a/drivers/scsi/osd/Makefile b/drivers/scsi/osd/Makefile
new file mode 100755
index 000000000000..d905344f83ba
--- /dev/null
+++ b/drivers/scsi/osd/Makefile
@@ -0,0 +1,37 @@
+#
+# Makefile for the OSD modules (out of tree)
+#
+# Copyright (C) 2008 Panasas Inc.  All rights reserved.
+#
+# Authors:
+#   Boaz Harrosh <bharrosh@panasas.com>
+#   Benny Halevy <bhalevy@panasas.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2
+#
+# This Makefile is used to call the kernel Makefile in case of an out-of-tree
+# build.
+# $KSRC should point to a Kernel source tree otherwise host's default is
+# used. (eg. /lib/modules/`uname -r`/build)
+
+# include path for out-of-tree Headers
+OSD_INC ?= `pwd`/../../../include
+
+# allow users to override these
+# e.g. to compile for a kernel that you aren't currently running
+KSRC ?= /lib/modules/$(shell uname -r)/build
+KBUILD_OUTPUT ?=
+ARCH ?=
+V ?= 0
+
+# this is the basic Kbuild out-of-tree invocation, with the M= option
+KBUILD_BASE = +$(MAKE) -C $(KSRC) M=`pwd` KBUILD_OUTPUT=$(KBUILD_OUTPUT) ARCH=$(ARCH) V=$(V)
+
+all: libosd
+
+libosd: ;
+	$(KBUILD_BASE) OSD_INC=$(OSD_INC) modules
+
+clean:
+	$(KBUILD_BASE) clean
diff --git a/drivers/scsi/osd/osd_debug.h b/drivers/scsi/osd/osd_debug.h
new file mode 100644
index 000000000000..579e491f11df
--- /dev/null
+++ b/drivers/scsi/osd/osd_debug.h
@@ -0,0 +1,30 @@
+/*
+ * osd_debug.h - Some kprintf macros
+ *
+ * Copyright (C) 2008 Panasas Inc.  All rights reserved.
+ *
+ * Authors:
+ *   Boaz Harrosh <bharrosh@panasas.com>
+ *   Benny Halevy <bhalevy@panasas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ *
+ */
+#ifndef __OSD_DEBUG_H__
+#define __OSD_DEBUG_H__
+
+#define OSD_ERR(fmt, a...) printk(KERN_ERR "osd: " fmt, ##a)
+#define OSD_INFO(fmt, a...) printk(KERN_NOTICE "osd: " fmt, ##a)
+
+#ifdef CONFIG_SCSI_OSD_DEBUG
+#define OSD_DEBUG(fmt, a...) \
+	printk(KERN_NOTICE "osd @%s:%d: " fmt, __func__, __LINE__, ##a)
+#else
+#define OSD_DEBUG(fmt, a...) do {} while (0)
+#endif
+
+/* u64 has problems with printk this will cast it to unsigned long long */
+#define _LLU(x) (unsigned long long)(x)
+
+#endif /* ndef __OSD_DEBUG_H__ */
diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c
new file mode 100644
index 000000000000..552f58b655d1
--- /dev/null
+++ b/drivers/scsi/osd/osd_initiator.c
@@ -0,0 +1,1657 @@
+/*
+ * osd_initiator - Main body of the osd initiator library.
+ *
+ * Note: The file does not contain the advanced security functionality which
+ * is only needed by the security_manager's initiators.
+ *
+ * Copyright (C) 2008 Panasas Inc.  All rights reserved.
+ *
+ * Authors:
+ *   Boaz Harrosh <bharrosh@panasas.com>
+ *   Benny Halevy <bhalevy@panasas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the Panasas company nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <scsi/osd_initiator.h>
+#include <scsi/osd_sec.h>
+#include <scsi/osd_attributes.h>
+#include <scsi/osd_sense.h>
+
+#include <scsi/scsi_device.h>
+
+#include "osd_debug.h"
+
+#ifndef __unused
+#    define __unused			__attribute__((unused))
+#endif
+
+enum { OSD_REQ_RETRIES = 1 };
+
+MODULE_AUTHOR("Boaz Harrosh <bharrosh@panasas.com>");
+MODULE_DESCRIPTION("open-osd initiator library libosd.ko");
+MODULE_LICENSE("GPL");
+
+static inline void build_test(void)
+{
+	/* structures were not packed */
+	BUILD_BUG_ON(sizeof(struct osd_capability) != OSD_CAP_LEN);
+	BUILD_BUG_ON(sizeof(struct osdv2_cdb) != OSD_TOTAL_CDB_LEN);
+	BUILD_BUG_ON(sizeof(struct osdv1_cdb) != OSDv1_TOTAL_CDB_LEN);
+}
+
+static const char *_osd_ver_desc(struct osd_request *or)
+{
+	return osd_req_is_ver1(or) ? "OSD1" : "OSD2";
+}
+
+#define ATTR_DEF_RI(id, len) ATTR_DEF(OSD_APAGE_ROOT_INFORMATION, id, len)
+
+static int _osd_print_system_info(struct osd_dev *od, void *caps)
+{
+	struct osd_request *or;
+	struct osd_attr get_attrs[] = {
+		ATTR_DEF_RI(OSD_ATTR_RI_VENDOR_IDENTIFICATION, 8),
+		ATTR_DEF_RI(OSD_ATTR_RI_PRODUCT_IDENTIFICATION, 16),
+		ATTR_DEF_RI(OSD_ATTR_RI_PRODUCT_MODEL, 32),
+		ATTR_DEF_RI(OSD_ATTR_RI_PRODUCT_REVISION_LEVEL, 4),
+		ATTR_DEF_RI(OSD_ATTR_RI_PRODUCT_SERIAL_NUMBER, 64 /*variable*/),
+		ATTR_DEF_RI(OSD_ATTR_RI_OSD_NAME, 64 /*variable*/),
+		ATTR_DEF_RI(OSD_ATTR_RI_TOTAL_CAPACITY, 8),
+		ATTR_DEF_RI(OSD_ATTR_RI_USED_CAPACITY, 8),
+		ATTR_DEF_RI(OSD_ATTR_RI_NUMBER_OF_PARTITIONS, 8),
+		ATTR_DEF_RI(OSD_ATTR_RI_CLOCK, 6),
+		/* IBM-OSD-SIM Has a bug with this one put it last */
+		ATTR_DEF_RI(OSD_ATTR_RI_OSD_SYSTEM_ID, 20),
+	};
+	void *iter = NULL, *pFirst;
+	int nelem = ARRAY_SIZE(get_attrs), a = 0;
+	int ret;
+
+	or = osd_start_request(od, GFP_KERNEL);
+	if (!or)
+		return -ENOMEM;
+
+	/* get attrs */
+	osd_req_get_attributes(or, &osd_root_object);
+	osd_req_add_get_attr_list(or, get_attrs, ARRAY_SIZE(get_attrs));
+
+	ret = osd_finalize_request(or, 0, caps, NULL);
+	if (ret)
+		goto out;
+
+	ret = osd_execute_request(or);
+	if (ret) {
+		OSD_ERR("Failed to detect %s => %d\n", _osd_ver_desc(or), ret);
+		goto out;
+	}
+
+	osd_req_decode_get_attr_list(or, get_attrs, &nelem, &iter);
+
+	OSD_INFO("Detected %s device\n",
+		_osd_ver_desc(or));
+
+	pFirst = get_attrs[a++].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_VENDOR_IDENTIFICATION [%s]\n",
+		(char *)pFirst);
+
+	pFirst = get_attrs[a++].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_PRODUCT_IDENTIFICATION [%s]\n",
+		(char *)pFirst);
+
+	pFirst = get_attrs[a++].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_PRODUCT_MODEL [%s]\n",
+		(char *)pFirst);
+
+	pFirst = get_attrs[a++].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_PRODUCT_REVISION_LEVEL [%u]\n",
+		pFirst ? get_unaligned_be32(pFirst) : ~0U);
+
+	pFirst = get_attrs[a++].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_PRODUCT_SERIAL_NUMBER [%s]\n",
+		(char *)pFirst);
+
+	pFirst = get_attrs[a].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_OSD_NAME [%s]\n", (char *)pFirst);
+	a++;
+
+	pFirst = get_attrs[a++].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_TOTAL_CAPACITY [0x%llx]\n",
+		pFirst ? _LLU(get_unaligned_be64(pFirst)) : ~0ULL);
+
+	pFirst = get_attrs[a++].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_USED_CAPACITY [0x%llx]\n",
+		pFirst ? _LLU(get_unaligned_be64(pFirst)) : ~0ULL);
+
+	pFirst = get_attrs[a++].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_NUMBER_OF_PARTITIONS [%llu]\n",
+		pFirst ? _LLU(get_unaligned_be64(pFirst)) : ~0ULL);
+
+	if (a >= nelem)
+		goto out;
+
+	/* FIXME: Where are the time utilities */
+	pFirst = get_attrs[a++].val_ptr;
+	OSD_INFO("OSD_ATTR_RI_CLOCK [0x%02x%02x%02x%02x%02x%02x]\n",
+		((char *)pFirst)[0], ((char *)pFirst)[1],
+		((char *)pFirst)[2], ((char *)pFirst)[3],
+		((char *)pFirst)[4], ((char *)pFirst)[5]);
+
+	if (a < nelem) { /* IBM-OSD-SIM bug, Might not have it */
+		unsigned len = get_attrs[a].len;
+		char sid_dump[32*4 + 2]; /* 2nibbles+space+ASCII */
+
+		hex_dump_to_buffer(get_attrs[a].val_ptr, len, 32, 1,
+				   sid_dump, sizeof(sid_dump), true);
+		OSD_INFO("OSD_ATTR_RI_OSD_SYSTEM_ID(%d) [%s]\n", len, sid_dump);
+		a++;
+	}
+out:
+	osd_end_request(or);
+	return ret;
+}
+
+int osd_auto_detect_ver(struct osd_dev *od, void *caps)
+{
+	int ret;
+
+	/* Auto-detect the osd version */
+	ret = _osd_print_system_info(od, caps);
+	if (ret) {
+		osd_dev_set_ver(od, OSD_VER1);
+		OSD_DEBUG("converting to OSD1\n");
+		ret = _osd_print_system_info(od, caps);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(osd_auto_detect_ver);
+
+static unsigned _osd_req_cdb_len(struct osd_request *or)
+{
+	return osd_req_is_ver1(or) ? OSDv1_TOTAL_CDB_LEN : OSD_TOTAL_CDB_LEN;
+}
+
+static unsigned _osd_req_alist_elem_size(struct osd_request *or, unsigned len)
+{
+	return osd_req_is_ver1(or) ?
+		osdv1_attr_list_elem_size(len) :
+		osdv2_attr_list_elem_size(len);
+}
+
+static unsigned _osd_req_alist_size(struct osd_request *or, void *list_head)
+{
+	return osd_req_is_ver1(or) ?
+		osdv1_list_size(list_head) :
+		osdv2_list_size(list_head);
+}
+
+static unsigned _osd_req_sizeof_alist_header(struct osd_request *or)
+{
+	return osd_req_is_ver1(or) ?
+		sizeof(struct osdv1_attributes_list_header) :
+		sizeof(struct osdv2_attributes_list_header);
+}
+
+static void _osd_req_set_alist_type(struct osd_request *or,
+	void *list, int list_type)
+{
+	if (osd_req_is_ver1(or)) {
+		struct osdv1_attributes_list_header *attr_list = list;
+
+		memset(attr_list, 0, sizeof(*attr_list));
+		attr_list->type = list_type;
+	} else {
+		struct osdv2_attributes_list_header *attr_list = list;
+
+		memset(attr_list, 0, sizeof(*attr_list));
+		attr_list->type = list_type;
+	}
+}
+
+static bool _osd_req_is_alist_type(struct osd_request *or,
+	void *list, int list_type)
+{
+	if (!list)
+		return false;
+
+	if (osd_req_is_ver1(or)) {
+		struct osdv1_attributes_list_header *attr_list = list;
+
+		return attr_list->type == list_type;
+	} else {
+		struct osdv2_attributes_list_header *attr_list = list;
+
+		return attr_list->type == list_type;
+	}
+}
+
+/* This is for List-objects not Attributes-Lists */
+static void _osd_req_encode_olist(struct osd_request *or,
+	struct osd_obj_id_list *list)
+{
+	struct osd_cdb_head *cdbh = osd_cdb_head(&or->cdb);
+
+	if (osd_req_is_ver1(or)) {
+		cdbh->v1.list_identifier = list->list_identifier;
+		cdbh->v1.start_address = list->continuation_id;
+	} else {
+		cdbh->v2.list_identifier = list->list_identifier;
+		cdbh->v2.start_address = list->continuation_id;
+	}
+}
+
+static osd_cdb_offset osd_req_encode_offset(struct osd_request *or,
+	u64 offset, unsigned *padding)
+{
+	return __osd_encode_offset(offset, padding,
+			osd_req_is_ver1(or) ?
+				OSDv1_OFFSET_MIN_SHIFT : OSD_OFFSET_MIN_SHIFT,
+			OSD_OFFSET_MAX_SHIFT);
+}
+
+static struct osd_security_parameters *
+_osd_req_sec_params(struct osd_request *or)
+{
+	struct osd_cdb *ocdb = &or->cdb;
+
+	if (osd_req_is_ver1(or))
+		return &ocdb->v1.sec_params;
+	else
+		return &ocdb->v2.sec_params;
+}
+
+void osd_dev_init(struct osd_dev *osdd, struct scsi_device *scsi_device)
+{
+	memset(osdd, 0, sizeof(*osdd));
+	osdd->scsi_device = scsi_device;
+	osdd->def_timeout = BLK_DEFAULT_SG_TIMEOUT;
+#ifdef OSD_VER1_SUPPORT
+	osdd->version = OSD_VER2;
+#endif
+	/* TODO: Allocate pools for osd_request attributes ... */
+}
+EXPORT_SYMBOL(osd_dev_init);
+
+void osd_dev_fini(struct osd_dev *osdd)
+{
+	/* TODO: De-allocate pools */
+
+	osdd->scsi_device = NULL;
+}
+EXPORT_SYMBOL(osd_dev_fini);
+
+static struct osd_request *_osd_request_alloc(gfp_t gfp)
+{
+	struct osd_request *or;
+
+	/* TODO: Use mempool with one saved request */
+	or = kzalloc(sizeof(*or), gfp);
+	return or;
+}
+
+static void _osd_request_free(struct osd_request *or)
+{
+	kfree(or);
+}
+
+struct osd_request *osd_start_request(struct osd_dev *dev, gfp_t gfp)
+{
+	struct osd_request *or;
+
+	or = _osd_request_alloc(gfp);
+	if (!or)
+		return NULL;
+
+	or->osd_dev = dev;
+	or->alloc_flags = gfp;
+	or->timeout = dev->def_timeout;
+	or->retries = OSD_REQ_RETRIES;
+
+	return or;
+}
+EXPORT_SYMBOL(osd_start_request);
+
+/*
+ * If osd_finalize_request() was called but the request was not executed through
+ * the block layer, then we must release BIOs.
+ */
+static void _abort_unexecuted_bios(struct request *rq)
+{
+	struct bio *bio;
+
+	while ((bio = rq->bio) != NULL) {
+		rq->bio = bio->bi_next;
+		bio_endio(bio, 0);
+	}
+}
+
+static void _osd_free_seg(struct osd_request *or __unused,
+	struct _osd_req_data_segment *seg)
+{
+	if (!seg->buff || !seg->alloc_size)
+		return;
+
+	kfree(seg->buff);
+	seg->buff = NULL;
+	seg->alloc_size = 0;
+}
+
+void osd_end_request(struct osd_request *or)
+{
+	struct request *rq = or->request;
+
+	_osd_free_seg(or, &or->set_attr);
+	_osd_free_seg(or, &or->enc_get_attr);
+	_osd_free_seg(or, &or->get_attr);
+
+	if (rq) {
+		if (rq->next_rq) {
+			_abort_unexecuted_bios(rq->next_rq);
+			blk_put_request(rq->next_rq);
+		}
+
+		_abort_unexecuted_bios(rq);
+		blk_put_request(rq);
+	}
+	_osd_request_free(or);
+}
+EXPORT_SYMBOL(osd_end_request);
+
+int osd_execute_request(struct osd_request *or)
+{
+	return blk_execute_rq(or->request->q, NULL, or->request, 0);
+}
+EXPORT_SYMBOL(osd_execute_request);
+
+static void osd_request_async_done(struct request *req, int error)
+{
+	struct osd_request *or = req->end_io_data;
+
+	or->async_error = error;
+
+	if (error)
+		OSD_DEBUG("osd_request_async_done error recieved %d\n", error);
+
+	if (or->async_done)
+		or->async_done(or, or->async_private);
+	else
+		osd_end_request(or);
+}
+
+int osd_execute_request_async(struct osd_request *or,
+	osd_req_done_fn *done, void *private)
+{
+	or->request->end_io_data = or;
+	or->async_private = private;
+	or->async_done = done;
+
+	blk_execute_rq_nowait(or->request->q, NULL, or->request, 0,
+			      osd_request_async_done);
+	return 0;
+}
+EXPORT_SYMBOL(osd_execute_request_async);
+
+u8 sg_out_pad_buffer[1 << OSDv1_OFFSET_MIN_SHIFT];
+u8 sg_in_pad_buffer[1 << OSDv1_OFFSET_MIN_SHIFT];
+
+static int _osd_realloc_seg(struct osd_request *or,
+	struct _osd_req_data_segment *seg, unsigned max_bytes)
+{
+	void *buff;
+
+	if (seg->alloc_size >= max_bytes)
+		return 0;
+
+	buff = krealloc(seg->buff, max_bytes, or->alloc_flags);
+	if (!buff) {
+		OSD_ERR("Failed to Realloc %d-bytes was-%d\n", max_bytes,
+			seg->alloc_size);
+		return -ENOMEM;
+	}
+
+	memset(buff + seg->alloc_size, 0, max_bytes - seg->alloc_size);
+	seg->buff = buff;
+	seg->alloc_size = max_bytes;
+	return 0;
+}
+
+static int _alloc_set_attr_list(struct osd_request *or,
+	const struct osd_attr *oa, unsigned nelem, unsigned add_bytes)
+{
+	unsigned total_bytes = add_bytes;
+
+	for (; nelem; --nelem, ++oa)
+		total_bytes += _osd_req_alist_elem_size(or, oa->len);
+
+	OSD_DEBUG("total_bytes=%d\n", total_bytes);
+	return _osd_realloc_seg(or, &or->set_attr, total_bytes);
+}
+
+static int _alloc_get_attr_desc(struct osd_request *or, unsigned max_bytes)
+{
+	OSD_DEBUG("total_bytes=%d\n", max_bytes);
+	return _osd_realloc_seg(or, &or->enc_get_attr, max_bytes);
+}
+
+static int _alloc_get_attr_list(struct osd_request *or)
+{
+	OSD_DEBUG("total_bytes=%d\n", or->get_attr.total_bytes);
+	return _osd_realloc_seg(or, &or->get_attr, or->get_attr.total_bytes);
+}
+
+/*
+ * Common to all OSD commands
+ */
+
+static void _osdv1_req_encode_common(struct osd_request *or,
+	__be16 act, const struct osd_obj_id *obj, u64 offset, u64 len)
+{
+	struct osdv1_cdb *ocdb = &or->cdb.v1;
+
+	/*
+	 * For speed, the commands
+	 *	OSD_ACT_PERFORM_SCSI_COMMAND	, V1 0x8F7E, V2 0x8F7C
+	 *	OSD_ACT_SCSI_TASK_MANAGEMENT	, V1 0x8F7F, V2 0x8F7D
+	 * are not supported here. Should pass zero and set after the call
+	 */
+	act &= cpu_to_be16(~0x0080); /* V1 action code */
+
+	OSD_DEBUG("OSDv1 execute opcode 0x%x\n", be16_to_cpu(act));
+
+	ocdb->h.varlen_cdb.opcode = VARIABLE_LENGTH_CMD;
+	ocdb->h.varlen_cdb.additional_cdb_length = OSD_ADDITIONAL_CDB_LENGTH;
+	ocdb->h.varlen_cdb.service_action = act;
+
+	ocdb->h.partition = cpu_to_be64(obj->partition);
+	ocdb->h.object = cpu_to_be64(obj->id);
+	ocdb->h.v1.length = cpu_to_be64(len);
+	ocdb->h.v1.start_address = cpu_to_be64(offset);
+}
+
+static void _osdv2_req_encode_common(struct osd_request *or,
+	 __be16 act, const struct osd_obj_id *obj, u64 offset, u64 len)
+{
+	struct osdv2_cdb *ocdb = &or->cdb.v2;
+
+	OSD_DEBUG("OSDv2 execute opcode 0x%x\n", be16_to_cpu(act));
+
+	ocdb->h.varlen_cdb.opcode = VARIABLE_LENGTH_CMD;
+	ocdb->h.varlen_cdb.additional_cdb_length = OSD_ADDITIONAL_CDB_LENGTH;
+	ocdb->h.varlen_cdb.service_action = act;
+
+	ocdb->h.partition = cpu_to_be64(obj->partition);
+	ocdb->h.object = cpu_to_be64(obj->id);
+	ocdb->h.v2.length = cpu_to_be64(len);
+	ocdb->h.v2.start_address = cpu_to_be64(offset);
+}
+
+static void _osd_req_encode_common(struct osd_request *or,
+	__be16 act, const struct osd_obj_id *obj, u64 offset, u64 len)
+{
+	if (osd_req_is_ver1(or))
+		_osdv1_req_encode_common(or, act, obj, offset, len);
+	else
+		_osdv2_req_encode_common(or, act, obj, offset, len);
+}
+
+/*
+ * Device commands
+ */
+/*TODO: void osd_req_set_master_seed_xchg(struct osd_request *, ...); */
+/*TODO: void osd_req_set_master_key(struct osd_request *, ...); */
+
+void osd_req_format(struct osd_request *or, u64 tot_capacity)
+{
+	_osd_req_encode_common(or, OSD_ACT_FORMAT_OSD, &osd_root_object, 0,
+				tot_capacity);
+}
+EXPORT_SYMBOL(osd_req_format);
+
+int osd_req_list_dev_partitions(struct osd_request *or,
+	osd_id initial_id, struct osd_obj_id_list *list, unsigned nelem)
+{
+	return osd_req_list_partition_objects(or, 0, initial_id, list, nelem);
+}
+EXPORT_SYMBOL(osd_req_list_dev_partitions);
+
+static void _osd_req_encode_flush(struct osd_request *or,
+	enum osd_options_flush_scope_values op)
+{
+	struct osd_cdb_head *ocdb = osd_cdb_head(&or->cdb);
+
+	ocdb->command_specific_options = op;
+}
+
+void osd_req_flush_obsd(struct osd_request *or,
+	enum osd_options_flush_scope_values op)
+{
+	_osd_req_encode_common(or, OSD_ACT_FLUSH_OSD, &osd_root_object, 0, 0);
+	_osd_req_encode_flush(or, op);
+}
+EXPORT_SYMBOL(osd_req_flush_obsd);
+
+/*TODO: void osd_req_perform_scsi_command(struct osd_request *,
+	const u8 *cdb, ...); */
+/*TODO: void osd_req_task_management(struct osd_request *, ...); */
+
+/*
+ * Partition commands
+ */
+static void _osd_req_encode_partition(struct osd_request *or,
+	__be16 act, osd_id partition)
+{
+	struct osd_obj_id par = {
+		.partition = partition,
+		.id = 0,
+	};
+
+	_osd_req_encode_common(or, act, &par, 0, 0);
+}
+
+void osd_req_create_partition(struct osd_request *or, osd_id partition)
+{
+	_osd_req_encode_partition(or, OSD_ACT_CREATE_PARTITION, partition);
+}
+EXPORT_SYMBOL(osd_req_create_partition);
+
+void osd_req_remove_partition(struct osd_request *or, osd_id partition)
+{
+	_osd_req_encode_partition(or, OSD_ACT_REMOVE_PARTITION, partition);
+}
+EXPORT_SYMBOL(osd_req_remove_partition);
+
+/*TODO: void osd_req_set_partition_key(struct osd_request *,
+	osd_id partition, u8 new_key_id[OSD_CRYPTO_KEYID_SIZE],
+	u8 seed[OSD_CRYPTO_SEED_SIZE]); */
+
+static int _osd_req_list_objects(struct osd_request *or,
+	__be16 action, const struct osd_obj_id *obj, osd_id initial_id,
+	struct osd_obj_id_list *list, unsigned nelem)
+{
+	struct request_queue *q = or->osd_dev->scsi_device->request_queue;
+	u64 len = nelem * sizeof(osd_id) + sizeof(*list);
+	struct bio *bio;
+
+	_osd_req_encode_common(or, action, obj, (u64)initial_id, len);
+
+	if (list->list_identifier)
+		_osd_req_encode_olist(or, list);
+
+	WARN_ON(or->in.bio);
+	bio = bio_map_kern(q, list, len, or->alloc_flags);
+	if (!bio) {
+		OSD_ERR("!!! Failed to allocate list_objects BIO\n");
+		return -ENOMEM;
+	}
+
+	bio->bi_rw &= ~(1 << BIO_RW);
+	or->in.bio = bio;
+	or->in.total_bytes = bio->bi_size;
+	return 0;
+}
+
+int osd_req_list_partition_collections(struct osd_request *or,
+	osd_id partition, osd_id initial_id, struct osd_obj_id_list *list,
+	unsigned nelem)
+{
+	struct osd_obj_id par = {
+		.partition = partition,
+		.id = 0,
+	};
+
+	return osd_req_list_collection_objects(or, &par, initial_id, list,
+					       nelem);
+}
+EXPORT_SYMBOL(osd_req_list_partition_collections);
+
+int osd_req_list_partition_objects(struct osd_request *or,
+	osd_id partition, osd_id initial_id, struct osd_obj_id_list *list,
+	unsigned nelem)
+{
+	struct osd_obj_id par = {
+		.partition = partition,
+		.id = 0,
+	};
+
+	return _osd_req_list_objects(or, OSD_ACT_LIST, &par, initial_id, list,
+				     nelem);
+}
+EXPORT_SYMBOL(osd_req_list_partition_objects);
+
+void osd_req_flush_partition(struct osd_request *or,
+	osd_id partition, enum osd_options_flush_scope_values op)
+{
+	_osd_req_encode_partition(or, OSD_ACT_FLUSH_PARTITION, partition);
+	_osd_req_encode_flush(or, op);
+}
+EXPORT_SYMBOL(osd_req_flush_partition);
+
+/*
+ * Collection commands
+ */
+/*TODO: void osd_req_create_collection(struct osd_request *,
+	const struct osd_obj_id *); */
+/*TODO: void osd_req_remove_collection(struct osd_request *,
+	const struct osd_obj_id *); */
+
+int osd_req_list_collection_objects(struct osd_request *or,
+	const struct osd_obj_id *obj, osd_id initial_id,
+	struct osd_obj_id_list *list, unsigned nelem)
+{
+	return _osd_req_list_objects(or, OSD_ACT_LIST_COLLECTION, obj,
+				     initial_id, list, nelem);
+}
+EXPORT_SYMBOL(osd_req_list_collection_objects);
+
+/*TODO: void query(struct osd_request *, ...); V2 */
+
+void osd_req_flush_collection(struct osd_request *or,
+	const struct osd_obj_id *obj, enum osd_options_flush_scope_values op)
+{
+	_osd_req_encode_common(or, OSD_ACT_FLUSH_PARTITION, obj, 0, 0);
+	_osd_req_encode_flush(or, op);
+}
+EXPORT_SYMBOL(osd_req_flush_collection);
+
+/*TODO: void get_member_attrs(struct osd_request *, ...); V2 */
+/*TODO: void set_member_attrs(struct osd_request *, ...); V2 */
+
+/*
+ * Object commands
+ */
+void osd_req_create_object(struct osd_request *or, struct osd_obj_id *obj)
+{
+	_osd_req_encode_common(or, OSD_ACT_CREATE, obj, 0, 0);
+}
+EXPORT_SYMBOL(osd_req_create_object);
+
+void osd_req_remove_object(struct osd_request *or, struct osd_obj_id *obj)
+{
+	_osd_req_encode_common(or, OSD_ACT_REMOVE, obj, 0, 0);
+}
+EXPORT_SYMBOL(osd_req_remove_object);
+
+
+/*TODO: void osd_req_create_multi(struct osd_request *or,
+	struct osd_obj_id *first, struct osd_obj_id_list *list, unsigned nelem);
+*/
+
+void osd_req_write(struct osd_request *or,
+	const struct osd_obj_id *obj, struct bio *bio, u64 offset)
+{
+	_osd_req_encode_common(or, OSD_ACT_WRITE, obj, offset, bio->bi_size);
+	WARN_ON(or->out.bio || or->out.total_bytes);
+	bio->bi_rw |= (1 << BIO_RW);
+	or->out.bio = bio;
+	or->out.total_bytes = bio->bi_size;
+}
+EXPORT_SYMBOL(osd_req_write);
+
+/*TODO: void osd_req_append(struct osd_request *,
+	const struct osd_obj_id *, struct bio *data_out); */
+/*TODO: void osd_req_create_write(struct osd_request *,
+	const struct osd_obj_id *, struct bio *data_out, u64 offset); */
+/*TODO: void osd_req_clear(struct osd_request *,
+	const struct osd_obj_id *, u64 offset, u64 len); */
+/*TODO: void osd_req_punch(struct osd_request *,
+	const struct osd_obj_id *, u64 offset, u64 len); V2 */
+
+void osd_req_flush_object(struct osd_request *or,
+	const struct osd_obj_id *obj, enum osd_options_flush_scope_values op,
+	/*V2*/ u64 offset, /*V2*/ u64 len)
+{
+	if (unlikely(osd_req_is_ver1(or) && (offset || len))) {
+		OSD_DEBUG("OSD Ver1 flush on specific range ignored\n");
+		offset = 0;
+		len = 0;
+	}
+
+	_osd_req_encode_common(or, OSD_ACT_FLUSH, obj, offset, len);
+	_osd_req_encode_flush(or, op);
+}
+EXPORT_SYMBOL(osd_req_flush_object);
+
+void osd_req_read(struct osd_request *or,
+	const struct osd_obj_id *obj, struct bio *bio, u64 offset)
+{
+	_osd_req_encode_common(or, OSD_ACT_READ, obj, offset, bio->bi_size);
+	WARN_ON(or->in.bio || or->in.total_bytes);
+	bio->bi_rw &= ~(1 << BIO_RW);
+	or->in.bio = bio;
+	or->in.total_bytes = bio->bi_size;
+}
+EXPORT_SYMBOL(osd_req_read);
+
+void osd_req_get_attributes(struct osd_request *or,
+	const struct osd_obj_id *obj)
+{
+	_osd_req_encode_common(or, OSD_ACT_GET_ATTRIBUTES, obj, 0, 0);
+}
+EXPORT_SYMBOL(osd_req_get_attributes);
+
+void osd_req_set_attributes(struct osd_request *or,
+	const struct osd_obj_id *obj)
+{
+	_osd_req_encode_common(or, OSD_ACT_SET_ATTRIBUTES, obj, 0, 0);
+}
+EXPORT_SYMBOL(osd_req_set_attributes);
+
+/*
+ * Attributes List-mode
+ */
+
+int osd_req_add_set_attr_list(struct osd_request *or,
+	const struct osd_attr *oa, unsigned nelem)
+{
+	unsigned total_bytes = or->set_attr.total_bytes;
+	void *attr_last;
+	int ret;
+
+	if (or->attributes_mode &&
+	    or->attributes_mode != OSD_CDB_GET_SET_ATTR_LISTS) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+	or->attributes_mode = OSD_CDB_GET_SET_ATTR_LISTS;
+
+	if (!total_bytes) { /* first-time: allocate and put list header */
+		total_bytes = _osd_req_sizeof_alist_header(or);
+		ret = _alloc_set_attr_list(or, oa, nelem, total_bytes);
+		if (ret)
+			return ret;
+		_osd_req_set_alist_type(or, or->set_attr.buff,
+					OSD_ATTR_LIST_SET_RETRIEVE);
+	}
+	attr_last = or->set_attr.buff + total_bytes;
+
+	for (; nelem; --nelem) {
+		struct osd_attributes_list_element *attr;
+		unsigned elem_size = _osd_req_alist_elem_size(or, oa->len);
+
+		total_bytes += elem_size;
+		if (unlikely(or->set_attr.alloc_size < total_bytes)) {
+			or->set_attr.total_bytes = total_bytes - elem_size;
+			ret = _alloc_set_attr_list(or, oa, nelem, total_bytes);
+			if (ret)
+				return ret;
+			attr_last =
+				or->set_attr.buff + or->set_attr.total_bytes;
+		}
+
+		attr = attr_last;
+		attr->attr_page = cpu_to_be32(oa->attr_page);
+		attr->attr_id = cpu_to_be32(oa->attr_id);
+		attr->attr_bytes = cpu_to_be16(oa->len);
+		memcpy(attr->attr_val, oa->val_ptr, oa->len);
+
+		attr_last += elem_size;
+		++oa;
+	}
+
+	or->set_attr.total_bytes = total_bytes;
+	return 0;
+}
+EXPORT_SYMBOL(osd_req_add_set_attr_list);
+
+static int _append_map_kern(struct request *req,
+	void *buff, unsigned len, gfp_t flags)
+{
+	struct bio *bio;
+	int ret;
+
+	bio = bio_map_kern(req->q, buff, len, flags);
+	if (IS_ERR(bio)) {
+		OSD_ERR("Failed bio_map_kern(%p, %d) => %ld\n", buff, len,
+			PTR_ERR(bio));
+		return PTR_ERR(bio);
+	}
+	ret = blk_rq_append_bio(req->q, req, bio);
+	if (ret) {
+		OSD_ERR("Failed blk_rq_append_bio(%p) => %d\n", bio, ret);
+		bio_put(bio);
+	}
+	return ret;
+}
+
+static int _req_append_segment(struct osd_request *or,
+	unsigned padding, struct _osd_req_data_segment *seg,
+	struct _osd_req_data_segment *last_seg, struct _osd_io_info *io)
+{
+	void *pad_buff;
+	int ret;
+
+	if (padding) {
+		/* check if we can just add it to last buffer */
+		if (last_seg &&
+		    (padding <= last_seg->alloc_size - last_seg->total_bytes))
+			pad_buff = last_seg->buff + last_seg->total_bytes;
+		else
+			pad_buff = io->pad_buff;
+
+		ret = _append_map_kern(io->req, pad_buff, padding,
+				       or->alloc_flags);
+		if (ret)
+			return ret;
+		io->total_bytes += padding;
+	}
+
+	ret = _append_map_kern(io->req, seg->buff, seg->total_bytes,
+			       or->alloc_flags);
+	if (ret)
+		return ret;
+
+	io->total_bytes += seg->total_bytes;
+	OSD_DEBUG("padding=%d buff=%p total_bytes=%d\n", padding, seg->buff,
+		  seg->total_bytes);
+	return 0;
+}
+
+static int _osd_req_finalize_set_attr_list(struct osd_request *or)
+{
+	struct osd_cdb_head *cdbh = osd_cdb_head(&or->cdb);
+	unsigned padding;
+	int ret;
+
+	if (!or->set_attr.total_bytes) {
+		cdbh->attrs_list.set_attr_offset = OSD_OFFSET_UNUSED;
+		return 0;
+	}
+
+	cdbh->attrs_list.set_attr_bytes = cpu_to_be32(or->set_attr.total_bytes);
+	cdbh->attrs_list.set_attr_offset =
+		osd_req_encode_offset(or, or->out.total_bytes, &padding);
+
+	ret = _req_append_segment(or, padding, &or->set_attr,
+				  or->out.last_seg, &or->out);
+	if (ret)
+		return ret;
+
+	or->out.last_seg = &or->set_attr;
+	return 0;
+}
+
+int osd_req_add_get_attr_list(struct osd_request *or,
+	const struct osd_attr *oa, unsigned nelem)
+{
+	unsigned total_bytes = or->enc_get_attr.total_bytes;
+	void *attr_last;
+	int ret;
+
+	if (or->attributes_mode &&
+	    or->attributes_mode != OSD_CDB_GET_SET_ATTR_LISTS) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+	or->attributes_mode = OSD_CDB_GET_SET_ATTR_LISTS;
+
+	/* first time calc data-in list header size */
+	if (!or->get_attr.total_bytes)
+		or->get_attr.total_bytes = _osd_req_sizeof_alist_header(or);
+
+	/* calc data-out info */
+	if (!total_bytes) { /* first-time: allocate and put list header */
+		unsigned max_bytes;
+
+		total_bytes = _osd_req_sizeof_alist_header(or);
+		max_bytes = total_bytes +
+			nelem * sizeof(struct osd_attributes_list_attrid);
+		ret = _alloc_get_attr_desc(or, max_bytes);
+		if (ret)
+			return ret;
+
+		_osd_req_set_alist_type(or, or->enc_get_attr.buff,
+					OSD_ATTR_LIST_GET);
+	}
+	attr_last = or->enc_get_attr.buff + total_bytes;
+
+	for (; nelem; --nelem) {
+		struct osd_attributes_list_attrid *attrid;
+		const unsigned cur_size = sizeof(*attrid);
+
+		total_bytes += cur_size;
+		if (unlikely(or->enc_get_attr.alloc_size < total_bytes)) {
+			or->enc_get_attr.total_bytes = total_bytes - cur_size;
+			ret = _alloc_get_attr_desc(or,
+					total_bytes + nelem * sizeof(*attrid));
+			if (ret)
+				return ret;
+			attr_last = or->enc_get_attr.buff +
+				or->enc_get_attr.total_bytes;
+		}
+
+		attrid = attr_last;
+		attrid->attr_page = cpu_to_be32(oa->attr_page);
+		attrid->attr_id = cpu_to_be32(oa->attr_id);
+
+		attr_last += cur_size;
+
+		/* calc data-in size */
+		or->get_attr.total_bytes +=
+			_osd_req_alist_elem_size(or, oa->len);
+		++oa;
+	}
+
+	or->enc_get_attr.total_bytes = total_bytes;
+
+	OSD_DEBUG(
+	       "get_attr.total_bytes=%u(%u) enc_get_attr.total_bytes=%u(%Zu)\n",
+	       or->get_attr.total_bytes,
+	       or->get_attr.total_bytes - _osd_req_sizeof_alist_header(or),
+	       or->enc_get_attr.total_bytes,
+	       (or->enc_get_attr.total_bytes - _osd_req_sizeof_alist_header(or))
+			/ sizeof(struct osd_attributes_list_attrid));
+
+	return 0;
+}
+EXPORT_SYMBOL(osd_req_add_get_attr_list);
+
+static int _osd_req_finalize_get_attr_list(struct osd_request *or)
+{
+	struct osd_cdb_head *cdbh = osd_cdb_head(&or->cdb);
+	unsigned out_padding;
+	unsigned in_padding;
+	int ret;
+
+	if (!or->enc_get_attr.total_bytes) {
+		cdbh->attrs_list.get_attr_desc_offset = OSD_OFFSET_UNUSED;
+		cdbh->attrs_list.get_attr_offset = OSD_OFFSET_UNUSED;
+		return 0;
+	}
+
+	ret = _alloc_get_attr_list(or);
+	if (ret)
+		return ret;
+
+	/* The out-going buffer info update */
+	OSD_DEBUG("out-going\n");
+	cdbh->attrs_list.get_attr_desc_bytes =
+		cpu_to_be32(or->enc_get_attr.total_bytes);
+
+	cdbh->attrs_list.get_attr_desc_offset =
+		osd_req_encode_offset(or, or->out.total_bytes, &out_padding);
+
+	ret = _req_append_segment(or, out_padding, &or->enc_get_attr,
+				  or->out.last_seg, &or->out);
+	if (ret)
+		return ret;
+	or->out.last_seg = &or->enc_get_attr;
+
+	/* The incoming buffer info update */
+	OSD_DEBUG("in-coming\n");
+	cdbh->attrs_list.get_attr_alloc_length =
+		cpu_to_be32(or->get_attr.total_bytes);
+
+	cdbh->attrs_list.get_attr_offset =
+		osd_req_encode_offset(or, or->in.total_bytes, &in_padding);
+
+	ret = _req_append_segment(or, in_padding, &or->get_attr, NULL,
+				  &or->in);
+	if (ret)
+		return ret;
+	or->in.last_seg = &or->get_attr;
+
+	return 0;
+}
+
+int osd_req_decode_get_attr_list(struct osd_request *or,
+	struct osd_attr *oa, int *nelem, void **iterator)
+{
+	unsigned cur_bytes, returned_bytes;
+	int n;
+	const unsigned sizeof_attr_list = _osd_req_sizeof_alist_header(or);
+	void *cur_p;
+
+	if (!_osd_req_is_alist_type(or, or->get_attr.buff,
+				    OSD_ATTR_LIST_SET_RETRIEVE)) {
+		oa->attr_page = 0;
+		oa->attr_id = 0;
+		oa->val_ptr = NULL;
+		oa->len = 0;
+		*iterator = NULL;
+		return 0;
+	}
+
+	if (*iterator) {
+		BUG_ON((*iterator < or->get_attr.buff) ||
+		     (or->get_attr.buff + or->get_attr.alloc_size < *iterator));
+		cur_p = *iterator;
+		cur_bytes = (*iterator - or->get_attr.buff) - sizeof_attr_list;
+		returned_bytes = or->get_attr.total_bytes;
+	} else { /* first time decode the list header */
+		cur_bytes = sizeof_attr_list;
+		returned_bytes = _osd_req_alist_size(or, or->get_attr.buff) +
+					sizeof_attr_list;
+
+		cur_p = or->get_attr.buff + sizeof_attr_list;
+
+		if (returned_bytes > or->get_attr.alloc_size) {
+			OSD_DEBUG("target report: space was not big enough! "
+				  "Allocate=%u Needed=%u\n",
+				  or->get_attr.alloc_size,
+				  returned_bytes + sizeof_attr_list);
+
+			returned_bytes =
+				or->get_attr.alloc_size - sizeof_attr_list;
+		}
+		or->get_attr.total_bytes = returned_bytes;
+	}
+
+	for (n = 0; (n < *nelem) && (cur_bytes < returned_bytes); ++n) {
+		struct osd_attributes_list_element *attr = cur_p;
+		unsigned inc;
+
+		oa->len = be16_to_cpu(attr->attr_bytes);
+		inc = _osd_req_alist_elem_size(or, oa->len);
+		OSD_DEBUG("oa->len=%d inc=%d cur_bytes=%d\n",
+			  oa->len, inc, cur_bytes);
+		cur_bytes += inc;
+		if (cur_bytes > returned_bytes) {
+			OSD_ERR("BAD FOOD from target. list not valid!"
+				"c=%d r=%d n=%d\n",
+				cur_bytes, returned_bytes, n);
+			oa->val_ptr = NULL;
+			break;
+		}
+
+		oa->attr_page = be32_to_cpu(attr->attr_page);
+		oa->attr_id = be32_to_cpu(attr->attr_id);
+		oa->val_ptr = attr->attr_val;
+
+		cur_p += inc;
+		++oa;
+	}
+
+	*iterator = (returned_bytes - cur_bytes) ? cur_p : NULL;
+	*nelem = n;
+	return returned_bytes - cur_bytes;
+}
+EXPORT_SYMBOL(osd_req_decode_get_attr_list);
+
+/*
+ * Attributes Page-mode
+ */
+
+int osd_req_add_get_attr_page(struct osd_request *or,
+	u32 page_id, void *attar_page, unsigned max_page_len,
+	const struct osd_attr *set_one_attr)
+{
+	struct osd_cdb_head *cdbh = osd_cdb_head(&or->cdb);
+
+	if (or->attributes_mode &&
+	    or->attributes_mode != OSD_CDB_GET_ATTR_PAGE_SET_ONE) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+	or->attributes_mode = OSD_CDB_GET_ATTR_PAGE_SET_ONE;
+
+	or->get_attr.buff = attar_page;
+	or->get_attr.total_bytes = max_page_len;
+
+	or->set_attr.buff = set_one_attr->val_ptr;
+	or->set_attr.total_bytes = set_one_attr->len;
+
+	cdbh->attrs_page.get_attr_page = cpu_to_be32(page_id);
+	cdbh->attrs_page.get_attr_alloc_length = cpu_to_be32(max_page_len);
+	/* ocdb->attrs_page.get_attr_offset; */
+
+	cdbh->attrs_page.set_attr_page = cpu_to_be32(set_one_attr->attr_page);
+	cdbh->attrs_page.set_attr_id = cpu_to_be32(set_one_attr->attr_id);
+	cdbh->attrs_page.set_attr_length = cpu_to_be32(set_one_attr->len);
+	/* ocdb->attrs_page.set_attr_offset; */
+	return 0;
+}
+EXPORT_SYMBOL(osd_req_add_get_attr_page);
+
+static int _osd_req_finalize_attr_page(struct osd_request *or)
+{
+	struct osd_cdb_head *cdbh = osd_cdb_head(&or->cdb);
+	unsigned in_padding, out_padding;
+	int ret;
+
+	/* returned page */
+	cdbh->attrs_page.get_attr_offset =
+		osd_req_encode_offset(or, or->in.total_bytes, &in_padding);
+
+	ret = _req_append_segment(or, in_padding, &or->get_attr, NULL,
+				  &or->in);
+	if (ret)
+		return ret;
+
+	/* set one value */
+	cdbh->attrs_page.set_attr_offset =
+		osd_req_encode_offset(or, or->out.total_bytes, &out_padding);
+
+	ret = _req_append_segment(or, out_padding, &or->enc_get_attr, NULL,
+				  &or->out);
+	return ret;
+}
+
+static int _osd_req_finalize_data_integrity(struct osd_request *or,
+	bool has_in, bool has_out, const u8 *cap_key)
+{
+	struct osd_security_parameters *sec_parms = _osd_req_sec_params(or);
+	int ret;
+
+	if (!osd_is_sec_alldata(sec_parms))
+		return 0;
+
+	if (has_out) {
+		struct _osd_req_data_segment seg = {
+			.buff = &or->out_data_integ,
+			.total_bytes = sizeof(or->out_data_integ),
+		};
+		unsigned pad;
+
+		or->out_data_integ.data_bytes = cpu_to_be64(
+			or->out.bio ? or->out.bio->bi_size : 0);
+		or->out_data_integ.set_attributes_bytes = cpu_to_be64(
+			or->set_attr.total_bytes);
+		or->out_data_integ.get_attributes_bytes = cpu_to_be64(
+			or->enc_get_attr.total_bytes);
+
+		sec_parms->data_out_integrity_check_offset =
+			osd_req_encode_offset(or, or->out.total_bytes, &pad);
+
+		ret = _req_append_segment(or, pad, &seg, or->out.last_seg,
+					  &or->out);
+		if (ret)
+			return ret;
+		or->out.last_seg = NULL;
+
+		/* they are now all chained to request sign them all together */
+		osd_sec_sign_data(&or->out_data_integ, or->out.req->bio,
+				  cap_key);
+	}
+
+	if (has_in) {
+		struct _osd_req_data_segment seg = {
+			.buff = &or->in_data_integ,
+			.total_bytes = sizeof(or->in_data_integ),
+		};
+		unsigned pad;
+
+		sec_parms->data_in_integrity_check_offset =
+			osd_req_encode_offset(or, or->in.total_bytes, &pad);
+
+		ret = _req_append_segment(or, pad, &seg, or->in.last_seg,
+					  &or->in);
+		if (ret)
+			return ret;
+
+		or->in.last_seg = NULL;
+	}
+
+	return 0;
+}
+
+/*
+ * osd_finalize_request and helpers
+ */
+
+static int _init_blk_request(struct osd_request *or,
+	bool has_in, bool has_out)
+{
+	gfp_t flags = or->alloc_flags;
+	struct scsi_device *scsi_device = or->osd_dev->scsi_device;
+	struct request_queue *q = scsi_device->request_queue;
+	struct request *req;
+	int ret = -ENOMEM;
+
+	req = blk_get_request(q, has_out, flags);
+	if (!req)
+		goto out;
+
+	or->request = req;
+	req->cmd_type = REQ_TYPE_BLOCK_PC;
+	req->timeout = or->timeout;
+	req->retries = or->retries;
+	req->sense = or->sense;
+	req->sense_len = 0;
+
+	if (has_out) {
+		or->out.req = req;
+		if (has_in) {
+			/* allocate bidi request */
+			req = blk_get_request(q, READ, flags);
+			if (!req) {
+				OSD_DEBUG("blk_get_request for bidi failed\n");
+				goto out;
+			}
+			req->cmd_type = REQ_TYPE_BLOCK_PC;
+			or->in.req = or->request->next_rq = req;
+		}
+	} else if (has_in)
+		or->in.req = req;
+
+	ret = 0;
+out:
+	OSD_DEBUG("or=%p has_in=%d has_out=%d => %d, %p\n",
+			or, has_in, has_out, ret, or->request);
+	return ret;
+}
+
+int osd_finalize_request(struct osd_request *or,
+	u8 options, const void *cap, const u8 *cap_key)
+{
+	struct osd_cdb_head *cdbh = osd_cdb_head(&or->cdb);
+	bool has_in, has_out;
+	int ret;
+
+	if (options & OSD_REQ_FUA)
+		cdbh->options |= OSD_CDB_FUA;
+
+	if (options & OSD_REQ_DPO)
+		cdbh->options |= OSD_CDB_DPO;
+
+	if (options & OSD_REQ_BYPASS_TIMESTAMPS)
+		cdbh->timestamp_control = OSD_CDB_BYPASS_TIMESTAMPS;
+
+	osd_set_caps(&or->cdb, cap);
+
+	has_in = or->in.bio || or->get_attr.total_bytes;
+	has_out = or->out.bio || or->set_attr.total_bytes ||
+		or->enc_get_attr.total_bytes;
+
+	ret = _init_blk_request(or, has_in, has_out);
+	if (ret) {
+		OSD_DEBUG("_init_blk_request failed\n");
+		return ret;
+	}
+
+	if (or->out.bio) {
+		ret = blk_rq_append_bio(or->request->q, or->out.req,
+					or->out.bio);
+		if (ret) {
+			OSD_DEBUG("blk_rq_append_bio out failed\n");
+			return ret;
+		}
+		OSD_DEBUG("out bytes=%llu (bytes_req=%u)\n",
+			_LLU(or->out.total_bytes), or->out.req->data_len);
+	}
+	if (or->in.bio) {
+		ret = blk_rq_append_bio(or->request->q, or->in.req, or->in.bio);
+		if (ret) {
+			OSD_DEBUG("blk_rq_append_bio in failed\n");
+			return ret;
+		}
+		OSD_DEBUG("in bytes=%llu (bytes_req=%u)\n",
+			_LLU(or->in.total_bytes), or->in.req->data_len);
+	}
+
+	or->out.pad_buff = sg_out_pad_buffer;
+	or->in.pad_buff = sg_in_pad_buffer;
+
+	if (!or->attributes_mode)
+		or->attributes_mode = OSD_CDB_GET_SET_ATTR_LISTS;
+	cdbh->command_specific_options |= or->attributes_mode;
+	if (or->attributes_mode == OSD_CDB_GET_ATTR_PAGE_SET_ONE) {
+		ret = _osd_req_finalize_attr_page(or);
+	} else {
+		/* TODO: I think that for the GET_ATTR command these 2 should
+		 * be reversed to keep them in execution order (for embeded
+		 * targets with low memory footprint)
+		 */
+		ret = _osd_req_finalize_set_attr_list(or);
+		if (ret) {
+			OSD_DEBUG("_osd_req_finalize_set_attr_list failed\n");
+			return ret;
+		}
+
+		ret = _osd_req_finalize_get_attr_list(or);
+		if (ret) {
+			OSD_DEBUG("_osd_req_finalize_get_attr_list failed\n");
+			return ret;
+		}
+	}
+
+	ret = _osd_req_finalize_data_integrity(or, has_in, has_out, cap_key);
+	if (ret)
+		return ret;
+
+	osd_sec_sign_cdb(&or->cdb, cap_key);
+
+	or->request->cmd = or->cdb.buff;
+	or->request->cmd_len = _osd_req_cdb_len(or);
+
+	return 0;
+}
+EXPORT_SYMBOL(osd_finalize_request);
+
+#define OSD_SENSE_PRINT1(fmt, a...) \
+	do { \
+		if (__cur_sense_need_output) \
+			OSD_ERR(fmt, ##a); \
+	} while (0)
+
+#define OSD_SENSE_PRINT2(fmt, a...) OSD_SENSE_PRINT1("    " fmt, ##a)
+
+int osd_req_decode_sense_full(struct osd_request *or,
+	struct osd_sense_info *osi, bool silent,
+	struct osd_obj_id *bad_obj_list __unused, int max_obj __unused,
+	struct osd_attr *bad_attr_list, int max_attr)
+{
+	int sense_len, original_sense_len;
+	struct osd_sense_info local_osi;
+	struct scsi_sense_descriptor_based *ssdb;
+	void *cur_descriptor;
+#if (CONFIG_SCSI_OSD_DPRINT_SENSE == 0)
+	const bool __cur_sense_need_output = false;
+#else
+	bool __cur_sense_need_output = !silent;
+#endif
+
+	if (!or->request->errors)
+		return 0;
+
+	ssdb = or->request->sense;
+	sense_len = or->request->sense_len;
+	if ((sense_len < (int)sizeof(*ssdb) || !ssdb->sense_key)) {
+		OSD_ERR("Block-layer returned error(0x%x) but "
+			"sense_len(%u) || key(%d) is empty\n",
+			or->request->errors, sense_len, ssdb->sense_key);
+		return -EIO;
+	}
+
+	if ((ssdb->response_code != 0x72) && (ssdb->response_code != 0x73)) {
+		OSD_ERR("Unrecognized scsi sense: rcode=%x length=%d\n",
+			ssdb->response_code, sense_len);
+		return -EIO;
+	}
+
+	osi = osi ? : &local_osi;
+	memset(osi, 0, sizeof(*osi));
+	osi->key = ssdb->sense_key;
+	osi->additional_code = be16_to_cpu(ssdb->additional_sense_code);
+	original_sense_len = ssdb->additional_sense_length + 8;
+
+#if (CONFIG_SCSI_OSD_DPRINT_SENSE == 1)
+	if (__cur_sense_need_output)
+		__cur_sense_need_output = (osi->key > scsi_sk_recovered_error);
+#endif
+	OSD_SENSE_PRINT1("Main Sense information key=0x%x length(%d, %d) "
+			"additional_code=0x%x\n",
+			osi->key, original_sense_len, sense_len,
+			osi->additional_code);
+
+	if (original_sense_len < sense_len)
+		sense_len = original_sense_len;
+
+	cur_descriptor = ssdb->ssd;
+	sense_len -= sizeof(*ssdb);
+	while (sense_len > 0) {
+		struct scsi_sense_descriptor *ssd = cur_descriptor;
+		int cur_len = ssd->additional_length + 2;
+
+		sense_len -= cur_len;
+
+		if (sense_len < 0)
+			break; /* sense was truncated */
+
+		switch (ssd->descriptor_type) {
+		case scsi_sense_information:
+		case scsi_sense_command_specific_information:
+		{
+			struct scsi_sense_command_specific_data_descriptor
+				*sscd = cur_descriptor;
+
+			osi->command_info =
+				get_unaligned_be64(&sscd->information) ;
+			OSD_SENSE_PRINT2(
+				"command_specific_information 0x%llx \n",
+				_LLU(osi->command_info));
+			break;
+		}
+		case scsi_sense_key_specific:
+		{
+			struct scsi_sense_key_specific_data_descriptor
+				*ssks = cur_descriptor;
+
+			osi->sense_info = get_unaligned_be16(&ssks->value);
+			OSD_SENSE_PRINT2(
+				"sense_key_specific_information %u"
+				"sksv_cd_bpv_bp (0x%x)\n",
+				osi->sense_info, ssks->sksv_cd_bpv_bp);
+			break;
+		}
+		case osd_sense_object_identification:
+		{ /*FIXME: Keep first not last, Store in array*/
+			struct osd_sense_identification_data_descriptor
+				*osidd = cur_descriptor;
+
+			osi->not_initiated_command_functions =
+				le32_to_cpu(osidd->not_initiated_functions);
+			osi->completed_command_functions =
+				le32_to_cpu(osidd->completed_functions);
+			osi->obj.partition = be64_to_cpu(osidd->partition_id);
+			osi->obj.id = be64_to_cpu(osidd->object_id);
+			OSD_SENSE_PRINT2(
+				"object_identification pid=0x%llx oid=0x%llx\n",
+				_LLU(osi->obj.partition), _LLU(osi->obj.id));
+			OSD_SENSE_PRINT2(
+				"not_initiated_bits(%x) "
+				"completed_command_bits(%x)\n",
+				osi->not_initiated_command_functions,
+				osi->completed_command_functions);
+			break;
+		}
+		case osd_sense_response_integrity_check:
+		{
+			struct osd_sense_response_integrity_check_descriptor
+				*osricd = cur_descriptor;
+			const unsigned len =
+					  sizeof(osricd->integrity_check_value);
+			char key_dump[len*4 + 2]; /* 2nibbles+space+ASCII */
+
+			hex_dump_to_buffer(osricd->integrity_check_value, len,
+				       32, 1, key_dump, sizeof(key_dump), true);
+			OSD_SENSE_PRINT2("response_integrity [%s]\n", key_dump);
+		}
+		case osd_sense_attribute_identification:
+		{
+			struct osd_sense_attributes_data_descriptor
+				*osadd = cur_descriptor;
+			int len = min(cur_len, sense_len);
+			int i = 0;
+			struct osd_sense_attr *pattr = osadd->sense_attrs;
+
+			while (len < 0) {
+				u32 attr_page = be32_to_cpu(pattr->attr_page);
+				u32 attr_id = be32_to_cpu(pattr->attr_id);
+
+				if (i++ == 0) {
+					osi->attr.attr_page = attr_page;
+					osi->attr.attr_id = attr_id;
+				}
+
+				if (bad_attr_list && max_attr) {
+					bad_attr_list->attr_page = attr_page;
+					bad_attr_list->attr_id = attr_id;
+					bad_attr_list++;
+					max_attr--;
+				}
+				OSD_SENSE_PRINT2(
+					"osd_sense_attribute_identification"
+					"attr_page=0x%x attr_id=0x%x\n",
+					attr_page, attr_id);
+			}
+		}
+		/*These are not legal for OSD*/
+		case scsi_sense_field_replaceable_unit:
+			OSD_SENSE_PRINT2("scsi_sense_field_replaceable_unit\n");
+			break;
+		case scsi_sense_stream_commands:
+			OSD_SENSE_PRINT2("scsi_sense_stream_commands\n");
+			break;
+		case scsi_sense_block_commands:
+			OSD_SENSE_PRINT2("scsi_sense_block_commands\n");
+			break;
+		case scsi_sense_ata_return:
+			OSD_SENSE_PRINT2("scsi_sense_ata_return\n");
+			break;
+		default:
+			if (ssd->descriptor_type <= scsi_sense_Reserved_last)
+				OSD_SENSE_PRINT2(
+					"scsi_sense Reserved descriptor (0x%x)",
+					ssd->descriptor_type);
+			else
+				OSD_SENSE_PRINT2(
+					"scsi_sense Vendor descriptor (0x%x)",
+					ssd->descriptor_type);
+		}
+
+		cur_descriptor += cur_len;
+	}
+
+	return (osi->key > scsi_sk_recovered_error) ? -EIO : 0;
+}
+EXPORT_SYMBOL(osd_req_decode_sense_full);
+
+/*
+ * Implementation of osd_sec.h API
+ * TODO: Move to a separate osd_sec.c file at a later stage.
+ */
+
+enum { OSD_SEC_CAP_V1_ALL_CAPS =
+	OSD_SEC_CAP_APPEND | OSD_SEC_CAP_OBJ_MGMT | OSD_SEC_CAP_REMOVE   |
+	OSD_SEC_CAP_CREATE | OSD_SEC_CAP_SET_ATTR | OSD_SEC_CAP_GET_ATTR |
+	OSD_SEC_CAP_WRITE  | OSD_SEC_CAP_READ     | OSD_SEC_CAP_POL_SEC  |
+	OSD_SEC_CAP_GLOBAL | OSD_SEC_CAP_DEV_MGMT
+};
+
+enum { OSD_SEC_CAP_V2_ALL_CAPS =
+	OSD_SEC_CAP_V1_ALL_CAPS | OSD_SEC_CAP_QUERY | OSD_SEC_CAP_M_OBJECT
+};
+
+void osd_sec_init_nosec_doall_caps(void *caps,
+	const struct osd_obj_id *obj, bool is_collection, const bool is_v1)
+{
+	struct osd_capability *cap = caps;
+	u8 type;
+	u8 descriptor_type;
+
+	if (likely(obj->id)) {
+		if (unlikely(is_collection)) {
+			type = OSD_SEC_OBJ_COLLECTION;
+			descriptor_type = is_v1 ? OSD_SEC_OBJ_DESC_OBJ :
+						  OSD_SEC_OBJ_DESC_COL;
+		} else {
+			type = OSD_SEC_OBJ_USER;
+			descriptor_type = OSD_SEC_OBJ_DESC_OBJ;
+		}
+		WARN_ON(!obj->partition);
+	} else {
+		type = obj->partition ? OSD_SEC_OBJ_PARTITION :
+					OSD_SEC_OBJ_ROOT;
+		descriptor_type = OSD_SEC_OBJ_DESC_PAR;
+	}
+
+	memset(cap, 0, sizeof(*cap));
+
+	cap->h.format = OSD_SEC_CAP_FORMAT_VER1;
+	cap->h.integrity_algorithm__key_version = 0; /* MAKE_BYTE(0, 0); */
+	cap->h.security_method = OSD_SEC_NOSEC;
+/*	cap->expiration_time;
+	cap->AUDIT[30-10];
+	cap->discriminator[42-30];
+	cap->object_created_time; */
+	cap->h.object_type = type;
+	osd_sec_set_caps(&cap->h, OSD_SEC_CAP_V1_ALL_CAPS);
+	cap->h.object_descriptor_type = descriptor_type;
+	cap->od.obj_desc.policy_access_tag = 0;
+	cap->od.obj_desc.allowed_partition_id = cpu_to_be64(obj->partition);
+	cap->od.obj_desc.allowed_object_id = cpu_to_be64(obj->id);
+}
+EXPORT_SYMBOL(osd_sec_init_nosec_doall_caps);
+
+/* FIXME: Extract version from caps pointer.
+ *        Also Pete's target only supports caps from OSDv1 for now
+ */
+void osd_set_caps(struct osd_cdb *cdb, const void *caps)
+{
+	bool is_ver1 = true;
+	/* NOTE: They start at same address */
+	memcpy(&cdb->v1.caps, caps, is_ver1 ? OSDv1_CAP_LEN : OSD_CAP_LEN);
+}
+
+bool osd_is_sec_alldata(struct osd_security_parameters *sec_parms __unused)
+{
+	return false;
+}
+
+void osd_sec_sign_cdb(struct osd_cdb *ocdb __unused, const u8 *cap_key __unused)
+{
+}
+
+void osd_sec_sign_data(void *data_integ __unused,
+		       struct bio *bio __unused, const u8 *cap_key __unused)
+{
+}
+
+/*
+ * Declared in osd_protocol.h
+ * 4.12.5 Data-In and Data-Out buffer offsets
+ * byte offset = mantissa * (2^(exponent+8))
+ * Returns the smallest allowed encoded offset that contains given @offset
+ * The actual encoded offset returned is @offset + *@padding.
+ */
+osd_cdb_offset __osd_encode_offset(
+	u64 offset, unsigned *padding, int min_shift, int max_shift)
+{
+	u64 try_offset = -1, mod, align;
+	osd_cdb_offset be32_offset;
+	int shift;
+
+	*padding = 0;
+	if (!offset)
+		return 0;
+
+	for (shift = min_shift; shift < max_shift; ++shift) {
+		try_offset = offset >> shift;
+		if (try_offset < (1 << OSD_OFFSET_MAX_BITS))
+			break;
+	}
+
+	BUG_ON(shift == max_shift);
+
+	align = 1 << shift;
+	mod = offset & (align - 1);
+	if (mod) {
+		*padding = align - mod;
+		try_offset += 1;
+	}
+
+	try_offset |= ((shift - 8) & 0xf) << 28;
+	be32_offset = cpu_to_be32((u32)try_offset);
+
+	OSD_DEBUG("offset=%llu mantissa=%llu exp=%d encoded=%x pad=%d\n",
+		 _LLU(offset), _LLU(try_offset & 0x0FFFFFFF), shift,
+		 be32_offset, *padding);
+	return be32_offset;
+}
diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
new file mode 100644
index 000000000000..f8b1a749958b
--- /dev/null
+++ b/drivers/scsi/osd/osd_uld.c
@@ -0,0 +1,487 @@
+/*
+ * osd_uld.c - OSD Upper Layer Driver
+ *
+ * A Linux driver module that registers as a SCSI ULD and probes
+ * for OSD type SCSI devices.
+ * It's main function is to export osd devices to in-kernel users like
+ * osdfs and pNFS-objects-LD. It also provides one ioctl for running
+ * in Kernel tests.
+ *
+ * Copyright (C) 2008 Panasas Inc.  All rights reserved.
+ *
+ * Authors:
+ *   Boaz Harrosh <bharrosh@panasas.com>
+ *   Benny Halevy <bhalevy@panasas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the Panasas company nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/namei.h>
+#include <linux/cdev.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/idr.h>
+#include <linux/major.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_ioctl.h>
+
+#include <scsi/osd_initiator.h>
+#include <scsi/osd_sec.h>
+
+#include "osd_debug.h"
+
+#ifndef TYPE_OSD
+#  define TYPE_OSD 0x11
+#endif
+
+#ifndef SCSI_OSD_MAJOR
+#  define SCSI_OSD_MAJOR 260
+#endif
+#define SCSI_OSD_MAX_MINOR 64
+
+static const char osd_name[] = "osd";
+static const char *osd_version_string = "open-osd 0.1.0";
+const char osd_symlink[] = "scsi_osd";
+
+MODULE_AUTHOR("Boaz Harrosh <bharrosh@panasas.com>");
+MODULE_DESCRIPTION("open-osd Upper-Layer-Driver osd.ko");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CHARDEV_MAJOR(SCSI_OSD_MAJOR);
+MODULE_ALIAS_SCSI_DEVICE(TYPE_OSD);
+
+struct osd_uld_device {
+	int minor;
+	struct kref kref;
+	struct cdev cdev;
+	struct osd_dev od;
+	struct gendisk *disk;
+	struct device *class_member;
+};
+
+static void __uld_get(struct osd_uld_device *oud);
+static void __uld_put(struct osd_uld_device *oud);
+
+/*
+ * Char Device operations
+ */
+
+static int osd_uld_open(struct inode *inode, struct file *file)
+{
+	struct osd_uld_device *oud = container_of(inode->i_cdev,
+					struct osd_uld_device, cdev);
+
+	__uld_get(oud);
+	/* cache osd_uld_device on file handle */
+	file->private_data = oud;
+	OSD_DEBUG("osd_uld_open %p\n", oud);
+	return 0;
+}
+
+static int osd_uld_release(struct inode *inode, struct file *file)
+{
+	struct osd_uld_device *oud = file->private_data;
+
+	OSD_DEBUG("osd_uld_release %p\n", file->private_data);
+	file->private_data = NULL;
+	__uld_put(oud);
+	return 0;
+}
+
+/* FIXME: Only one vector for now */
+unsigned g_test_ioctl;
+do_test_fn *g_do_test;
+
+int osduld_register_test(unsigned ioctl, do_test_fn *do_test)
+{
+	if (g_test_ioctl)
+		return -EINVAL;
+
+	g_test_ioctl = ioctl;
+	g_do_test = do_test;
+	return 0;
+}
+EXPORT_SYMBOL(osduld_register_test);
+
+void osduld_unregister_test(unsigned ioctl)
+{
+	if (ioctl == g_test_ioctl) {
+		g_test_ioctl = 0;
+		g_do_test = NULL;
+	}
+}
+EXPORT_SYMBOL(osduld_unregister_test);
+
+static do_test_fn *_find_ioctl(unsigned cmd)
+{
+	if (g_test_ioctl == cmd)
+		return g_do_test;
+	else
+		return NULL;
+}
+
+static long osd_uld_ioctl(struct file *file, unsigned int cmd,
+	unsigned long arg)
+{
+	struct osd_uld_device *oud = file->private_data;
+	int ret;
+	do_test_fn *do_test;
+
+	do_test = _find_ioctl(cmd);
+	if (do_test)
+		ret = do_test(&oud->od, cmd, arg);
+	else {
+		OSD_ERR("Unknown ioctl %d: osd_uld_device=%p\n", cmd, oud);
+		ret = -ENOIOCTLCMD;
+	}
+	return ret;
+}
+
+static const struct file_operations osd_fops = {
+	.owner          = THIS_MODULE,
+	.open           = osd_uld_open,
+	.release        = osd_uld_release,
+	.unlocked_ioctl = osd_uld_ioctl,
+};
+
+struct osd_dev *osduld_path_lookup(const char *path)
+{
+	struct nameidata nd;
+	struct inode *inode;
+	struct cdev *cdev;
+	struct osd_uld_device *uninitialized_var(oud);
+	int error;
+
+	if (!path || !*path) {
+		OSD_ERR("Mount with !path || !*path\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	error = path_lookup(path, LOOKUP_FOLLOW, &nd);
+	if (error) {
+		OSD_ERR("path_lookup of %s faild=>%d\n", path, error);
+		return ERR_PTR(error);
+	}
+
+	inode = nd.path.dentry->d_inode;
+	error = -EINVAL; /* Not the right device e.g osd_uld_device */
+	if (!S_ISCHR(inode->i_mode)) {
+		OSD_DEBUG("!S_ISCHR()\n");
+		goto out;
+	}
+
+	cdev = inode->i_cdev;
+	if (!cdev) {
+		OSD_ERR("Before mounting an OSD Based filesystem\n");
+		OSD_ERR("  user-mode must open+close the %s device\n", path);
+		OSD_ERR("  Example: bash: echo < %s\n", path);
+		goto out;
+	}
+
+	/* The Magic wand. Is it our char-dev */
+	/* TODO: Support sg devices */
+	if (cdev->owner != THIS_MODULE) {
+		OSD_ERR("Error mounting %s - is not an OSD device\n", path);
+		goto out;
+	}
+
+	oud = container_of(cdev, struct osd_uld_device, cdev);
+
+	__uld_get(oud);
+	error = 0;
+
+out:
+	path_put(&nd.path);
+	return error ? ERR_PTR(error) : &oud->od;
+}
+EXPORT_SYMBOL(osduld_path_lookup);
+
+void osduld_put_device(struct osd_dev *od)
+{
+	if (od) {
+		struct osd_uld_device *oud = container_of(od,
+						struct osd_uld_device, od);
+
+		__uld_put(oud);
+	}
+}
+EXPORT_SYMBOL(osduld_put_device);
+
+/*
+ * Scsi Device operations
+ */
+
+static int __detect_osd(struct osd_uld_device *oud)
+{
+	struct scsi_device *scsi_device = oud->od.scsi_device;
+	char caps[OSD_CAP_LEN];
+	int error;
+
+	/* sending a test_unit_ready as first command seems to be needed
+	 * by some targets
+	 */
+	OSD_DEBUG("start scsi_test_unit_ready %p %p %p\n",
+			oud, scsi_device, scsi_device->request_queue);
+	error = scsi_test_unit_ready(scsi_device, 10*HZ, 5, NULL);
+	if (error)
+		OSD_ERR("warning: scsi_test_unit_ready failed\n");
+
+	osd_sec_init_nosec_doall_caps(caps, &osd_root_object, false, true);
+	if (osd_auto_detect_ver(&oud->od, caps))
+		return -ENODEV;
+
+	return 0;
+}
+
+static struct class *osd_sysfs_class;
+static DEFINE_IDA(osd_minor_ida);
+
+static int osd_probe(struct device *dev)
+{
+	struct scsi_device *scsi_device = to_scsi_device(dev);
+	struct gendisk *disk;
+	struct osd_uld_device *oud;
+	int minor;
+	int error;
+
+	if (scsi_device->type != TYPE_OSD)
+		return -ENODEV;
+
+	do {
+		if (!ida_pre_get(&osd_minor_ida, GFP_KERNEL))
+			return -ENODEV;
+
+		error = ida_get_new(&osd_minor_ida, &minor);
+	} while (error == -EAGAIN);
+
+	if (error)
+		return error;
+	if (minor >= SCSI_OSD_MAX_MINOR) {
+		error = -EBUSY;
+		goto err_retract_minor;
+	}
+
+	error = -ENOMEM;
+	oud = kzalloc(sizeof(*oud), GFP_KERNEL);
+	if (NULL == oud)
+		goto err_retract_minor;
+
+	kref_init(&oud->kref);
+	dev_set_drvdata(dev, oud);
+	oud->minor = minor;
+
+	/* allocate a disk and set it up */
+	/* FIXME: do we need this since sg has already done that */
+	disk = alloc_disk(1);
+	if (!disk) {
+		OSD_ERR("alloc_disk failed\n");
+		goto err_free_osd;
+	}
+	disk->major = SCSI_OSD_MAJOR;
+	disk->first_minor = oud->minor;
+	sprintf(disk->disk_name, "osd%d", oud->minor);
+	oud->disk = disk;
+
+	/* hold one more reference to the scsi_device that will get released
+	 * in __release, in case a logout is happening while fs is mounted
+	 */
+	scsi_device_get(scsi_device);
+	osd_dev_init(&oud->od, scsi_device);
+
+	/* Detect the OSD Version */
+	error = __detect_osd(oud);
+	if (error) {
+		OSD_ERR("osd detection failed, non-compatible OSD device\n");
+		goto err_put_disk;
+	}
+
+	/* init the char-device for communication with user-mode */
+	cdev_init(&oud->cdev, &osd_fops);
+	oud->cdev.owner = THIS_MODULE;
+	error = cdev_add(&oud->cdev,
+			 MKDEV(SCSI_OSD_MAJOR, oud->minor), 1);
+	if (error) {
+		OSD_ERR("cdev_add failed\n");
+		goto err_put_disk;
+	}
+	kobject_get(&oud->cdev.kobj); /* 2nd ref see osd_remove() */
+
+	/* class_member */
+	oud->class_member = device_create(osd_sysfs_class, dev,
+		MKDEV(SCSI_OSD_MAJOR, oud->minor), "%s", disk->disk_name);
+	if (IS_ERR(oud->class_member)) {
+		OSD_ERR("class_device_create failed\n");
+		error = PTR_ERR(oud->class_member);
+		goto err_put_cdev;
+	}
+
+	dev_set_drvdata(oud->class_member, oud);
+	error = sysfs_create_link(&scsi_device->sdev_gendev.kobj,
+				  &oud->class_member->kobj, osd_symlink);
+	if (error)
+		OSD_ERR("warning: unable to make symlink\n");
+
+	OSD_INFO("osd_probe %s\n", disk->disk_name);
+	return 0;
+
+err_put_cdev:
+	cdev_del(&oud->cdev);
+err_put_disk:
+	scsi_device_put(scsi_device);
+	put_disk(disk);
+err_free_osd:
+	dev_set_drvdata(dev, NULL);
+	kfree(oud);
+err_retract_minor:
+	ida_remove(&osd_minor_ida, minor);
+	return error;
+}
+
+static int osd_remove(struct device *dev)
+{
+	struct scsi_device *scsi_device = to_scsi_device(dev);
+	struct osd_uld_device *oud = dev_get_drvdata(dev);
+
+	if (!oud || (oud->od.scsi_device != scsi_device)) {
+		OSD_ERR("Half cooked osd-device %p,%p || %p!=%p",
+			dev, oud, oud ? oud->od.scsi_device : NULL,
+			scsi_device);
+	}
+
+	sysfs_remove_link(&oud->od.scsi_device->sdev_gendev.kobj, osd_symlink);
+
+	if (oud->class_member)
+		device_destroy(osd_sysfs_class,
+			       MKDEV(SCSI_OSD_MAJOR, oud->minor));
+
+	/* We have 2 references to the cdev. One is released here
+	 * and also takes down the /dev/osdX mapping. The second
+	 * Will be released in __remove() after all users have released
+	 * the osd_uld_device.
+	 */
+	if (oud->cdev.owner)
+		cdev_del(&oud->cdev);
+
+	__uld_put(oud);
+	return 0;
+}
+
+static void __remove(struct kref *kref)
+{
+	struct osd_uld_device *oud = container_of(kref,
+					struct osd_uld_device, kref);
+	struct scsi_device *scsi_device = oud->od.scsi_device;
+
+	/* now let delete the char_dev */
+	kobject_put(&oud->cdev.kobj);
+
+	osd_dev_fini(&oud->od);
+	scsi_device_put(scsi_device);
+
+	OSD_INFO("osd_remove %s\n",
+		 oud->disk ? oud->disk->disk_name : NULL);
+
+	if (oud->disk)
+		put_disk(oud->disk);
+
+	ida_remove(&osd_minor_ida, oud->minor);
+	kfree(oud);
+}
+
+static void __uld_get(struct osd_uld_device *oud)
+{
+	kref_get(&oud->kref);
+}
+
+static void __uld_put(struct osd_uld_device *oud)
+{
+	kref_put(&oud->kref, __remove);
+}
+
+/*
+ * Global driver and scsi registration
+ */
+
+static struct scsi_driver osd_driver = {
+	.owner			= THIS_MODULE,
+	.gendrv = {
+		.name		= osd_name,
+		.probe		= osd_probe,
+		.remove		= osd_remove,
+	}
+};
+
+static int __init osd_uld_init(void)
+{
+	int err;
+
+	osd_sysfs_class = class_create(THIS_MODULE, osd_symlink);
+	if (IS_ERR(osd_sysfs_class)) {
+		OSD_ERR("Unable to register sysfs class => %ld\n",
+			PTR_ERR(osd_sysfs_class));
+		return PTR_ERR(osd_sysfs_class);
+	}
+
+	err = register_chrdev_region(MKDEV(SCSI_OSD_MAJOR, 0),
+				     SCSI_OSD_MAX_MINOR, osd_name);
+	if (err) {
+		OSD_ERR("Unable to register major %d for osd ULD => %d\n",
+			SCSI_OSD_MAJOR, err);
+		goto err_out;
+	}
+
+	err = scsi_register_driver(&osd_driver.gendrv);
+	if (err) {
+		OSD_ERR("scsi_register_driver failed => %d\n", err);
+		goto err_out_chrdev;
+	}
+
+	OSD_INFO("LOADED %s\n", osd_version_string);
+	return 0;
+
+err_out_chrdev:
+	unregister_chrdev_region(MKDEV(SCSI_OSD_MAJOR, 0), SCSI_OSD_MAX_MINOR);
+err_out:
+	class_destroy(osd_sysfs_class);
+	return err;
+}
+
+static void __exit osd_uld_exit(void)
+{
+	scsi_unregister_driver(&osd_driver.gendrv);
+	unregister_chrdev_region(MKDEV(SCSI_OSD_MAJOR, 0), SCSI_OSD_MAX_MINOR);
+	class_destroy(osd_sysfs_class);
+	OSD_INFO("UNLOADED %s\n", osd_version_string);
+}
+
+module_init(osd_uld_init);
+module_exit(osd_uld_exit);
diff --git a/drivers/scsi/osst.c b/drivers/scsi/osst.c
index 0ea78d9a37db..acb835837eec 100644
--- a/drivers/scsi/osst.c
+++ b/drivers/scsi/osst.c
@@ -280,8 +280,8 @@ static int osst_chk_result(struct osst_tape * STp, struct osst_request * SRpnt)
 			static	int	notyetprinted = 1;
 
 			printk(KERN_WARNING
-			     "%s:W: Warning %x (sugg. bt 0x%x, driver bt 0x%x, host bt 0x%x).\n",
-			     name, result, suggestion(result), driver_byte(result) & DRIVER_MASK,
+			     "%s:W: Warning %x (driver bt 0x%x, host bt 0x%x).\n",
+			     name, result, driver_byte(result),
 			     host_byte(result));
 			if (notyetprinted) {
 				notyetprinted = 0;
@@ -317,18 +317,25 @@ static int osst_chk_result(struct osst_tape * STp, struct osst_request * SRpnt)
 
 
 /* Wakeup from interrupt */
-static void osst_sleep_done(void *data, char *sense, int result, int resid)
+static void osst_end_async(struct request *req, int update)
 {
-	struct osst_request *SRpnt = data;
+	struct osst_request *SRpnt = req->end_io_data;
 	struct osst_tape *STp = SRpnt->stp;
+	struct rq_map_data *mdata = &SRpnt->stp->buffer->map_data;
 
-	memcpy(SRpnt->sense, sense, SCSI_SENSE_BUFFERSIZE);
-	STp->buffer->cmdstat.midlevel_result = SRpnt->result = result;
+	STp->buffer->cmdstat.midlevel_result = SRpnt->result = req->errors;
 #if DEBUG
 	STp->write_pending = 0;
 #endif
 	if (SRpnt->waiting)
 		complete(SRpnt->waiting);
+
+	if (SRpnt->bio) {
+		kfree(mdata->pages);
+		blk_rq_unmap_user(SRpnt->bio);
+	}
+
+	__blk_put_request(req->q, req);
 }
 
 /* osst_request memory management */
@@ -342,6 +349,74 @@ static void osst_release_request(struct osst_request *streq)
 	kfree(streq);
 }
 
+static int osst_execute(struct osst_request *SRpnt, const unsigned char *cmd,
+			int cmd_len, int data_direction, void *buffer, unsigned bufflen,
+			int use_sg, int timeout, int retries)
+{
+	struct request *req;
+	struct page **pages = NULL;
+	struct rq_map_data *mdata = &SRpnt->stp->buffer->map_data;
+
+	int err = 0;
+	int write = (data_direction == DMA_TO_DEVICE);
+
+	req = blk_get_request(SRpnt->stp->device->request_queue, write, GFP_KERNEL);
+	if (!req)
+		return DRIVER_ERROR << 24;
+
+	req->cmd_type = REQ_TYPE_BLOCK_PC;
+	req->cmd_flags |= REQ_QUIET;
+
+	SRpnt->bio = NULL;
+
+	if (use_sg) {
+		struct scatterlist *sg, *sgl = (struct scatterlist *)buffer;
+		int i;
+
+		pages = kzalloc(use_sg * sizeof(struct page *), GFP_KERNEL);
+		if (!pages)
+			goto free_req;
+
+		for_each_sg(sgl, sg, use_sg, i)
+			pages[i] = sg_page(sg);
+
+		mdata->null_mapped = 1;
+
+		mdata->page_order = get_order(sgl[0].length);
+		mdata->nr_entries =
+			DIV_ROUND_UP(bufflen, PAGE_SIZE << mdata->page_order);
+		mdata->offset = 0;
+
+		err = blk_rq_map_user(req->q, req, mdata, NULL, bufflen, GFP_KERNEL);
+		if (err) {
+			kfree(pages);
+			goto free_req;
+		}
+		SRpnt->bio = req->bio;
+		mdata->pages = pages;
+
+	} else if (bufflen) {
+		err = blk_rq_map_kern(req->q, req, buffer, bufflen, GFP_KERNEL);
+		if (err)
+			goto free_req;
+	}
+
+	req->cmd_len = cmd_len;
+	memset(req->cmd, 0, BLK_MAX_CDB); /* ATAPI hates garbage after CDB */
+	memcpy(req->cmd, cmd, req->cmd_len);
+	req->sense = SRpnt->sense;
+	req->sense_len = 0;
+	req->timeout = timeout;
+	req->retries = retries;
+	req->end_io_data = SRpnt;
+
+	blk_execute_rq_nowait(req->q, NULL, req, 1, osst_end_async);
+	return 0;
+free_req:
+	blk_put_request(req);
+	return DRIVER_ERROR << 24;
+}
+
 /* Do the scsi command. Waits until command performed if do_wait is true.
    Otherwise osst_write_behind_check() is used to check that the command
    has finished. */
@@ -403,8 +478,8 @@ static	struct osst_request * osst_do_scsi(struct osst_request *SRpnt, struct oss
 	STp->buffer->cmdstat.have_sense = 0;
 	STp->buffer->syscall_result = 0;
 
-	if (scsi_execute_async(STp->device, cmd, COMMAND_SIZE(cmd[0]), direction, bp, bytes,
-			use_sg, timeout, retries, SRpnt, osst_sleep_done, GFP_KERNEL))
+	if (osst_execute(SRpnt, cmd, COMMAND_SIZE(cmd[0]), direction, bp, bytes,
+			 use_sg, timeout, retries))
 		/* could not allocate the buffer or request was too large */
 		(STp->buffer)->syscall_result = (-EBUSY);
 	else if (do_wait) {
@@ -5286,11 +5361,6 @@ static int enlarge_buffer(struct osst_buffer *STbuffer, int need_dma)
 		struct page *page = alloc_pages(priority, (OS_FRAME_SIZE - got <= PAGE_SIZE) ? 0 : order);
 		STbuffer->sg[segs].offset = 0;
 		if (page == NULL) {
-			if (OS_FRAME_SIZE - got <= (max_segs - segs) * b_size / 2 && order) {
-				b_size /= 2;  /* Large enough for the rest of the buffers */
-				order--;
-				continue;
-			}
 			printk(KERN_WARNING "osst :W: Failed to enlarge buffer to %d bytes.\n",
 						OS_FRAME_SIZE);
 #if DEBUG
diff --git a/drivers/scsi/osst.h b/drivers/scsi/osst.h
index 5aa22740b5df..11d26c57f3f8 100644
--- a/drivers/scsi/osst.h
+++ b/drivers/scsi/osst.h
@@ -520,6 +520,7 @@ struct osst_buffer {
   int syscall_result;
   struct osst_request *last_SRpnt;
   struct st_cmdstatus cmdstat;
+  struct rq_map_data map_data;
   unsigned char *b_data;
   os_aux_t *aux;               /* onstream AUX structure at end of each block     */
   unsigned short use_sg;       /* zero or number of s/g segments for this adapter */
@@ -634,6 +635,7 @@ struct osst_request {
 	int result;
 	struct osst_tape *stp;
 	struct completion *waiting;
+	struct bio *bio;
 };
 
 /* Values of write_type */
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index cbcd3f681b62..a2ef03243a2c 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -967,6 +967,110 @@ int scsi_track_queue_full(struct scsi_device *sdev, int depth)
 EXPORT_SYMBOL(scsi_track_queue_full);
 
 /**
+ * scsi_vpd_inquiry - Request a device provide us with a VPD page
+ * @sdev: The device to ask
+ * @buffer: Where to put the result
+ * @page: Which Vital Product Data to return
+ * @len: The length of the buffer
+ *
+ * This is an internal helper function.  You probably want to use
+ * scsi_get_vpd_page instead.
+ *
+ * Returns 0 on success or a negative error number.
+ */
+static int scsi_vpd_inquiry(struct scsi_device *sdev, unsigned char *buffer,
+							u8 page, unsigned len)
+{
+	int result;
+	unsigned char cmd[16];
+
+	cmd[0] = INQUIRY;
+	cmd[1] = 1;		/* EVPD */
+	cmd[2] = page;
+	cmd[3] = len >> 8;
+	cmd[4] = len & 0xff;
+	cmd[5] = 0;		/* Control byte */
+
+	/*
+	 * I'm not convinced we need to try quite this hard to get VPD, but
+	 * all the existing users tried this hard.
+	 */
+	result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer,
+				  len + 4, NULL, 30 * HZ, 3, NULL);
+	if (result)
+		return result;
+
+	/* Sanity check that we got the page back that we asked for */
+	if (buffer[1] != page)
+		return -EIO;
+
+	return 0;
+}
+
+/**
+ * scsi_get_vpd_page - Get Vital Product Data from a SCSI device
+ * @sdev: The device to ask
+ * @page: Which Vital Product Data to return
+ *
+ * SCSI devices may optionally supply Vital Product Data.  Each 'page'
+ * of VPD is defined in the appropriate SCSI document (eg SPC, SBC).
+ * If the device supports this VPD page, this routine returns a pointer
+ * to a buffer containing the data from that page.  The caller is
+ * responsible for calling kfree() on this pointer when it is no longer
+ * needed.  If we cannot retrieve the VPD page this routine returns %NULL.
+ */
+unsigned char *scsi_get_vpd_page(struct scsi_device *sdev, u8 page)
+{
+	int i, result;
+	unsigned int len;
+	unsigned char *buf = kmalloc(259, GFP_KERNEL);
+
+	if (!buf)
+		return NULL;
+
+	/* Ask for all the pages supported by this device */
+	result = scsi_vpd_inquiry(sdev, buf, 0, 255);
+	if (result)
+		goto fail;
+
+	/* If the user actually wanted this page, we can skip the rest */
+	if (page == 0)
+		return buf;
+
+	for (i = 0; i < buf[3]; i++)
+		if (buf[i + 4] == page)
+			goto found;
+	/* The device claims it doesn't support the requested page */
+	goto fail;
+
+ found:
+	result = scsi_vpd_inquiry(sdev, buf, page, 255);
+	if (result)
+		goto fail;
+
+	/*
+	 * Some pages are longer than 255 bytes.  The actual length of
+	 * the page is returned in the header.
+	 */
+	len = (buf[2] << 8) | buf[3];
+	if (len <= 255)
+		return buf;
+
+	kfree(buf);
+	buf = kmalloc(len + 4, GFP_KERNEL);
+	result = scsi_vpd_inquiry(sdev, buf, page, len);
+	if (result)
+		goto fail;
+
+	return buf;
+
+ fail:
+	kfree(buf);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(scsi_get_vpd_page);
+
+/**
  * scsi_device_get  -  get an additional reference to a scsi_device
  * @sdev:	device to get a reference to
  *
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 6eebd0bbe8a8..213123b0486b 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -40,6 +40,9 @@
 #include <linux/moduleparam.h>
 #include <linux/scatterlist.h>
 #include <linux/blkdev.h>
+#include <linux/crc-t10dif.h>
+
+#include <net/checksum.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -48,8 +51,7 @@
 #include <scsi/scsicam.h>
 #include <scsi/scsi_eh.h>
 
-#include <linux/stat.h>
-
+#include "sd.h"
 #include "scsi_logging.h"
 
 #define SCSI_DEBUG_VERSION "1.81"
@@ -95,6 +97,10 @@ static const char * scsi_debug_version_date = "20070104";
 #define DEF_FAKE_RW	0
 #define DEF_VPD_USE_HOSTNO 1
 #define DEF_SECTOR_SIZE 512
+#define DEF_DIX 0
+#define DEF_DIF 0
+#define DEF_GUARD 0
+#define DEF_ATO 1
 
 /* bit mask values for scsi_debug_opts */
 #define SCSI_DEBUG_OPT_NOISE   1
@@ -102,6 +108,8 @@ static const char * scsi_debug_version_date = "20070104";
 #define SCSI_DEBUG_OPT_TIMEOUT   4
 #define SCSI_DEBUG_OPT_RECOVERED_ERR   8
 #define SCSI_DEBUG_OPT_TRANSPORT_ERR   16
+#define SCSI_DEBUG_OPT_DIF_ERR   32
+#define SCSI_DEBUG_OPT_DIX_ERR   64
 /* When "every_nth" > 0 then modulo "every_nth" commands:
  *   - a no response is simulated if SCSI_DEBUG_OPT_TIMEOUT is set
  *   - a RECOVERED_ERROR is simulated on successful read and write
@@ -144,6 +152,10 @@ static int scsi_debug_virtual_gb = DEF_VIRTUAL_GB;
 static int scsi_debug_fake_rw = DEF_FAKE_RW;
 static int scsi_debug_vpd_use_hostno = DEF_VPD_USE_HOSTNO;
 static int scsi_debug_sector_size = DEF_SECTOR_SIZE;
+static int scsi_debug_dix = DEF_DIX;
+static int scsi_debug_dif = DEF_DIF;
+static int scsi_debug_guard = DEF_GUARD;
+static int scsi_debug_ato = DEF_ATO;
 
 static int scsi_debug_cmnd_count = 0;
 
@@ -204,11 +216,15 @@ struct sdebug_queued_cmd {
 static struct sdebug_queued_cmd queued_arr[SCSI_DEBUG_CANQUEUE];
 
 static unsigned char * fake_storep;	/* ramdisk storage */
+static unsigned char *dif_storep;	/* protection info */
 
 static int num_aborts = 0;
 static int num_dev_resets = 0;
 static int num_bus_resets = 0;
 static int num_host_resets = 0;
+static int dix_writes;
+static int dix_reads;
+static int dif_errors;
 
 static DEFINE_SPINLOCK(queued_arr_lock);
 static DEFINE_RWLOCK(atomic_rw);
@@ -217,6 +233,11 @@ static char sdebug_proc_name[] = "scsi_debug";
 
 static struct bus_type pseudo_lld_bus;
 
+static inline sector_t dif_offset(sector_t sector)
+{
+	return sector << 3;
+}
+
 static struct device_driver sdebug_driverfs_driver = {
 	.name 		= sdebug_proc_name,
 	.bus		= &pseudo_lld_bus,
@@ -225,6 +246,9 @@ static struct device_driver sdebug_driverfs_driver = {
 static const int check_condition_result =
 		(DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION;
 
+static const int illegal_condition_result =
+	(DRIVER_SENSE << 24) | (DID_ABORT << 16) | SAM_STAT_CHECK_CONDITION;
+
 static unsigned char ctrl_m_pg[] = {0xa, 10, 2, 0, 0, 0, 0, 0,
 				    0, 0, 0x2, 0x4b};
 static unsigned char iec_m_pg[] = {0x1c, 0xa, 0x08, 0, 0, 0, 0, 0,
@@ -726,7 +750,12 @@ static int resp_inquiry(struct scsi_cmnd * scp, int target,
 		} else if (0x86 == cmd[2]) { /* extended inquiry */
 			arr[1] = cmd[2];	/*sanity */
 			arr[3] = 0x3c;	/* number of following entries */
-			arr[4] = 0x0;   /* no protection stuff */
+			if (scsi_debug_dif == SD_DIF_TYPE3_PROTECTION)
+				arr[4] = 0x4;	/* SPT: GRD_CHK:1 */
+			else if (scsi_debug_dif)
+				arr[4] = 0x5;   /* SPT: GRD_CHK:1, REF_CHK:1 */
+			else
+				arr[4] = 0x0;   /* no protection stuff */
 			arr[5] = 0x7;   /* head of q, ordered + simple q's */
 		} else if (0x87 == cmd[2]) { /* mode page policy */
 			arr[1] = cmd[2];	/*sanity */
@@ -767,6 +796,7 @@ static int resp_inquiry(struct scsi_cmnd * scp, int target,
 	arr[2] = scsi_debug_scsi_level;
 	arr[3] = 2;    /* response_data_format==2 */
 	arr[4] = SDEBUG_LONG_INQ_SZ - 5;
+	arr[5] = scsi_debug_dif ? 1 : 0; /* PROTECT bit */
 	if (0 == scsi_debug_vpd_use_hostno)
 		arr[5] = 0x10; /* claim: implicit TGPS */
 	arr[6] = 0x10; /* claim: MultiP */
@@ -915,6 +945,12 @@ static int resp_readcap16(struct scsi_cmnd * scp,
 	arr[9] = (scsi_debug_sector_size >> 16) & 0xff;
 	arr[10] = (scsi_debug_sector_size >> 8) & 0xff;
 	arr[11] = scsi_debug_sector_size & 0xff;
+
+	if (scsi_debug_dif) {
+		arr[12] = (scsi_debug_dif - 1) << 1; /* P_TYPE */
+		arr[12] |= 1; /* PROT_EN */
+	}
+
 	return fill_from_dev_buffer(scp, arr,
 				    min(alloc_len, SDEBUG_READCAP16_ARR_SZ));
 }
@@ -1066,6 +1102,10 @@ static int resp_ctrl_m_pg(unsigned char * p, int pcontrol, int target)
 		ctrl_m_pg[2] |= 0x4;
 	else
 		ctrl_m_pg[2] &= ~0x4;
+
+	if (scsi_debug_ato)
+		ctrl_m_pg[5] |= 0x80; /* ATO=1 */
+
 	memcpy(p, ctrl_m_pg, sizeof(ctrl_m_pg));
 	if (1 == pcontrol)
 		memcpy(p + 2, ch_ctrl_m_pg, sizeof(ch_ctrl_m_pg));
@@ -1536,6 +1576,87 @@ static int do_device_access(struct scsi_cmnd *scmd,
 	return ret;
 }
 
+static int prot_verify_read(struct scsi_cmnd *SCpnt, sector_t start_sec,
+			    unsigned int sectors)
+{
+	unsigned int i, resid;
+	struct scatterlist *psgl;
+	struct sd_dif_tuple *sdt;
+	sector_t sector;
+	sector_t tmp_sec = start_sec;
+	void *paddr;
+
+	start_sec = do_div(tmp_sec, sdebug_store_sectors);
+
+	sdt = (struct sd_dif_tuple *)(dif_storep + dif_offset(start_sec));
+
+	for (i = 0 ; i < sectors ; i++) {
+		u16 csum;
+
+		if (sdt[i].app_tag == 0xffff)
+			continue;
+
+		sector = start_sec + i;
+
+		switch (scsi_debug_guard) {
+		case 1:
+			csum = ip_compute_csum(fake_storep +
+					       sector * scsi_debug_sector_size,
+					       scsi_debug_sector_size);
+			break;
+		case 0:
+			csum = crc_t10dif(fake_storep +
+					  sector * scsi_debug_sector_size,
+					  scsi_debug_sector_size);
+			csum = cpu_to_be16(csum);
+			break;
+		default:
+			BUG();
+		}
+
+		if (sdt[i].guard_tag != csum) {
+			printk(KERN_ERR "%s: GUARD check failed on sector %lu" \
+			       " rcvd 0x%04x, data 0x%04x\n", __func__,
+			       (unsigned long)sector,
+			       be16_to_cpu(sdt[i].guard_tag),
+			       be16_to_cpu(csum));
+			dif_errors++;
+			return 0x01;
+		}
+
+		if (scsi_debug_dif != SD_DIF_TYPE3_PROTECTION &&
+		    be32_to_cpu(sdt[i].ref_tag) != (sector & 0xffffffff)) {
+			printk(KERN_ERR "%s: REF check failed on sector %lu\n",
+			       __func__, (unsigned long)sector);
+			dif_errors++;
+			return 0x03;
+		}
+	}
+
+	resid = sectors * 8; /* Bytes of protection data to copy into sgl */
+	sector = start_sec;
+
+	scsi_for_each_prot_sg(SCpnt, psgl, scsi_prot_sg_count(SCpnt), i) {
+		int len = min(psgl->length, resid);
+
+		paddr = kmap_atomic(sg_page(psgl), KM_IRQ0) + psgl->offset;
+		memcpy(paddr, dif_storep + dif_offset(sector), len);
+
+		sector += len >> 3;
+		if (sector >= sdebug_store_sectors) {
+			/* Force wrap */
+			tmp_sec = sector;
+			sector = do_div(tmp_sec, sdebug_store_sectors);
+		}
+		resid -= len;
+		kunmap_atomic(paddr, KM_IRQ0);
+	}
+
+	dix_reads++;
+
+	return 0;
+}
+
 static int resp_read(struct scsi_cmnd *SCpnt, unsigned long long lba,
 		     unsigned int num, struct sdebug_dev_info *devip)
 {
@@ -1563,12 +1684,162 @@ static int resp_read(struct scsi_cmnd *SCpnt, unsigned long long lba,
 		}
 		return check_condition_result;
 	}
+
+	/* DIX + T10 DIF */
+	if (scsi_debug_dix && scsi_prot_sg_count(SCpnt)) {
+		int prot_ret = prot_verify_read(SCpnt, lba, num);
+
+		if (prot_ret) {
+			mk_sense_buffer(devip, ABORTED_COMMAND, 0x10, prot_ret);
+			return illegal_condition_result;
+		}
+	}
+
 	read_lock_irqsave(&atomic_rw, iflags);
 	ret = do_device_access(SCpnt, devip, lba, num, 0);
 	read_unlock_irqrestore(&atomic_rw, iflags);
 	return ret;
 }
 
+void dump_sector(unsigned char *buf, int len)
+{
+	int i, j;
+
+	printk(KERN_ERR ">>> Sector Dump <<<\n");
+
+	for (i = 0 ; i < len ; i += 16) {
+		printk(KERN_ERR "%04d: ", i);
+
+		for (j = 0 ; j < 16 ; j++) {
+			unsigned char c = buf[i+j];
+			if (c >= 0x20 && c < 0x7e)
+				printk(" %c ", buf[i+j]);
+			else
+				printk("%02x ", buf[i+j]);
+		}
+
+		printk("\n");
+	}
+}
+
+static int prot_verify_write(struct scsi_cmnd *SCpnt, sector_t start_sec,
+			     unsigned int sectors)
+{
+	int i, j, ret;
+	struct sd_dif_tuple *sdt;
+	struct scatterlist *dsgl = scsi_sglist(SCpnt);
+	struct scatterlist *psgl = scsi_prot_sglist(SCpnt);
+	void *daddr, *paddr;
+	sector_t tmp_sec = start_sec;
+	sector_t sector;
+	int ppage_offset;
+	unsigned short csum;
+
+	sector = do_div(tmp_sec, sdebug_store_sectors);
+
+	if (((SCpnt->cmnd[1] >> 5) & 7) != 1) {
+		printk(KERN_WARNING "scsi_debug: WRPROTECT != 1\n");
+		return 0;
+	}
+
+	BUG_ON(scsi_sg_count(SCpnt) == 0);
+	BUG_ON(scsi_prot_sg_count(SCpnt) == 0);
+
+	paddr = kmap_atomic(sg_page(psgl), KM_IRQ1) + psgl->offset;
+	ppage_offset = 0;
+
+	/* For each data page */
+	scsi_for_each_sg(SCpnt, dsgl, scsi_sg_count(SCpnt), i) {
+		daddr = kmap_atomic(sg_page(dsgl), KM_IRQ0) + dsgl->offset;
+
+		/* For each sector-sized chunk in data page */
+		for (j = 0 ; j < dsgl->length ; j += scsi_debug_sector_size) {
+
+			/* If we're at the end of the current
+			 * protection page advance to the next one
+			 */
+			if (ppage_offset >= psgl->length) {
+				kunmap_atomic(paddr, KM_IRQ1);
+				psgl = sg_next(psgl);
+				BUG_ON(psgl == NULL);
+				paddr = kmap_atomic(sg_page(psgl), KM_IRQ1)
+					+ psgl->offset;
+				ppage_offset = 0;
+			}
+
+			sdt = paddr + ppage_offset;
+
+			switch (scsi_debug_guard) {
+			case 1:
+				csum = ip_compute_csum(daddr,
+						       scsi_debug_sector_size);
+				break;
+			case 0:
+				csum = cpu_to_be16(crc_t10dif(daddr,
+						      scsi_debug_sector_size));
+				break;
+			default:
+				BUG();
+				ret = 0;
+				goto out;
+			}
+
+			if (sdt->guard_tag != csum) {
+				printk(KERN_ERR
+				       "%s: GUARD check failed on sector %lu " \
+				       "rcvd 0x%04x, calculated 0x%04x\n",
+				       __func__, (unsigned long)sector,
+				       be16_to_cpu(sdt->guard_tag),
+				       be16_to_cpu(csum));
+				ret = 0x01;
+				dump_sector(daddr, scsi_debug_sector_size);
+				goto out;
+			}
+
+			if (scsi_debug_dif != SD_DIF_TYPE3_PROTECTION &&
+			    be32_to_cpu(sdt->ref_tag)
+			    != (start_sec & 0xffffffff)) {
+				printk(KERN_ERR
+				       "%s: REF check failed on sector %lu\n",
+				       __func__, (unsigned long)sector);
+				ret = 0x03;
+				dump_sector(daddr, scsi_debug_sector_size);
+				goto out;
+			}
+
+			/* Would be great to copy this in bigger
+			 * chunks.  However, for the sake of
+			 * correctness we need to verify each sector
+			 * before writing it to "stable" storage
+			 */
+			memcpy(dif_storep + dif_offset(sector), sdt, 8);
+
+			sector++;
+
+			if (sector == sdebug_store_sectors)
+				sector = 0;	/* Force wrap */
+
+			start_sec++;
+			daddr += scsi_debug_sector_size;
+			ppage_offset += sizeof(struct sd_dif_tuple);
+		}
+
+		kunmap_atomic(daddr, KM_IRQ0);
+	}
+
+	kunmap_atomic(paddr, KM_IRQ1);
+
+	dix_writes++;
+
+	return 0;
+
+out:
+	dif_errors++;
+	kunmap_atomic(daddr, KM_IRQ0);
+	kunmap_atomic(paddr, KM_IRQ1);
+	return ret;
+}
+
 static int resp_write(struct scsi_cmnd *SCpnt, unsigned long long lba,
 		      unsigned int num, struct sdebug_dev_info *devip)
 {
@@ -1579,6 +1850,16 @@ static int resp_write(struct scsi_cmnd *SCpnt, unsigned long long lba,
 	if (ret)
 		return ret;
 
+	/* DIX + T10 DIF */
+	if (scsi_debug_dix && scsi_prot_sg_count(SCpnt)) {
+		int prot_ret = prot_verify_write(SCpnt, lba, num);
+
+		if (prot_ret) {
+			mk_sense_buffer(devip, ILLEGAL_REQUEST, 0x10, prot_ret);
+			return illegal_condition_result;
+		}
+	}
+
 	write_lock_irqsave(&atomic_rw, iflags);
 	ret = do_device_access(SCpnt, devip, lba, num, 1);
 	write_unlock_irqrestore(&atomic_rw, iflags);
@@ -2095,6 +2376,10 @@ module_param_named(virtual_gb, scsi_debug_virtual_gb, int, S_IRUGO | S_IWUSR);
 module_param_named(vpd_use_hostno, scsi_debug_vpd_use_hostno, int,
 		   S_IRUGO | S_IWUSR);
 module_param_named(sector_size, scsi_debug_sector_size, int, S_IRUGO);
+module_param_named(dix, scsi_debug_dix, int, S_IRUGO);
+module_param_named(dif, scsi_debug_dif, int, S_IRUGO);
+module_param_named(guard, scsi_debug_guard, int, S_IRUGO);
+module_param_named(ato, scsi_debug_ato, int, S_IRUGO);
 
 MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
 MODULE_DESCRIPTION("SCSI debug adapter driver");
@@ -2117,7 +2402,10 @@ MODULE_PARM_DESC(scsi_level, "SCSI level to simulate(def=5[SPC-3])");
 MODULE_PARM_DESC(virtual_gb, "virtual gigabyte size (def=0 -> use dev_size_mb)");
 MODULE_PARM_DESC(vpd_use_hostno, "0 -> dev ids ignore hostno (def=1 -> unique dev ids)");
 MODULE_PARM_DESC(sector_size, "hardware sector size in bytes (def=512)");
-
+MODULE_PARM_DESC(dix, "data integrity extensions mask (def=0)");
+MODULE_PARM_DESC(dif, "data integrity field type: 0-3 (def=0)");
+MODULE_PARM_DESC(guard, "protection checksum: 0=crc, 1=ip (def=0)");
+MODULE_PARM_DESC(ato, "application tag ownership: 0=disk 1=host (def=1)");
 
 static char sdebug_info[256];
 
@@ -2164,14 +2452,14 @@ static int scsi_debug_proc_info(struct Scsi_Host *host, char *buffer, char **sta
 	    "delay=%d, max_luns=%d, scsi_level=%d\n"
 	    "sector_size=%d bytes, cylinders=%d, heads=%d, sectors=%d\n"
 	    "number of aborts=%d, device_reset=%d, bus_resets=%d, "
-	    "host_resets=%d\n",
+	    "host_resets=%d\ndix_reads=%d dix_writes=%d dif_errors=%d\n",
 	    SCSI_DEBUG_VERSION, scsi_debug_version_date, scsi_debug_num_tgts,
 	    scsi_debug_dev_size_mb, scsi_debug_opts, scsi_debug_every_nth,
 	    scsi_debug_cmnd_count, scsi_debug_delay,
 	    scsi_debug_max_luns, scsi_debug_scsi_level,
 	    scsi_debug_sector_size, sdebug_cylinders_per, sdebug_heads,
 	    sdebug_sectors_per, num_aborts, num_dev_resets, num_bus_resets,
-	    num_host_resets);
+	    num_host_resets, dix_reads, dix_writes, dif_errors);
 	if (pos < offset) {
 		len = 0;
 		begin = pos;
@@ -2452,6 +2740,31 @@ static ssize_t sdebug_sector_size_show(struct device_driver * ddp, char * buf)
 }
 DRIVER_ATTR(sector_size, S_IRUGO, sdebug_sector_size_show, NULL);
 
+static ssize_t sdebug_dix_show(struct device_driver *ddp, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%d\n", scsi_debug_dix);
+}
+DRIVER_ATTR(dix, S_IRUGO, sdebug_dix_show, NULL);
+
+static ssize_t sdebug_dif_show(struct device_driver *ddp, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%d\n", scsi_debug_dif);
+}
+DRIVER_ATTR(dif, S_IRUGO, sdebug_dif_show, NULL);
+
+static ssize_t sdebug_guard_show(struct device_driver *ddp, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%d\n", scsi_debug_guard);
+}
+DRIVER_ATTR(guard, S_IRUGO, sdebug_guard_show, NULL);
+
+static ssize_t sdebug_ato_show(struct device_driver *ddp, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%d\n", scsi_debug_ato);
+}
+DRIVER_ATTR(ato, S_IRUGO, sdebug_ato_show, NULL);
+
+
 /* Note: The following function creates attribute files in the
    /sys/bus/pseudo/drivers/scsi_debug directory. The advantage of these
    files (over those found in the /sys/module/scsi_debug/parameters
@@ -2478,11 +2791,19 @@ static int do_create_driverfs_files(void)
 	ret |= driver_create_file(&sdebug_driverfs_driver, &driver_attr_virtual_gb);
 	ret |= driver_create_file(&sdebug_driverfs_driver, &driver_attr_vpd_use_hostno);
 	ret |= driver_create_file(&sdebug_driverfs_driver, &driver_attr_sector_size);
+	ret |= driver_create_file(&sdebug_driverfs_driver, &driver_attr_dix);
+	ret |= driver_create_file(&sdebug_driverfs_driver, &driver_attr_dif);
+	ret |= driver_create_file(&sdebug_driverfs_driver, &driver_attr_guard);
+	ret |= driver_create_file(&sdebug_driverfs_driver, &driver_attr_ato);
 	return ret;
 }
 
 static void do_remove_driverfs_files(void)
 {
+	driver_remove_file(&sdebug_driverfs_driver, &driver_attr_ato);
+	driver_remove_file(&sdebug_driverfs_driver, &driver_attr_guard);
+	driver_remove_file(&sdebug_driverfs_driver, &driver_attr_dif);
+	driver_remove_file(&sdebug_driverfs_driver, &driver_attr_dix);
 	driver_remove_file(&sdebug_driverfs_driver, &driver_attr_sector_size);
 	driver_remove_file(&sdebug_driverfs_driver, &driver_attr_vpd_use_hostno);
 	driver_remove_file(&sdebug_driverfs_driver, &driver_attr_virtual_gb);
@@ -2526,11 +2847,33 @@ static int __init scsi_debug_init(void)
 	case 4096:
 		break;
 	default:
-		printk(KERN_ERR "scsi_debug_init: invalid sector_size %u\n",
+		printk(KERN_ERR "scsi_debug_init: invalid sector_size %d\n",
 		       scsi_debug_sector_size);
 		return -EINVAL;
 	}
 
+	switch (scsi_debug_dif) {
+
+	case SD_DIF_TYPE0_PROTECTION:
+	case SD_DIF_TYPE1_PROTECTION:
+	case SD_DIF_TYPE3_PROTECTION:
+		break;
+
+	default:
+		printk(KERN_ERR "scsi_debug_init: dif must be 0, 1 or 3\n");
+		return -EINVAL;
+	}
+
+	if (scsi_debug_guard > 1) {
+		printk(KERN_ERR "scsi_debug_init: guard must be 0 or 1\n");
+		return -EINVAL;
+	}
+
+	if (scsi_debug_ato > 1) {
+		printk(KERN_ERR "scsi_debug_init: ato must be 0 or 1\n");
+		return -EINVAL;
+	}
+
 	if (scsi_debug_dev_size_mb < 1)
 		scsi_debug_dev_size_mb = 1;  /* force minimum 1 MB ramdisk */
 	sz = (unsigned long)scsi_debug_dev_size_mb * 1048576;
@@ -2563,6 +2906,24 @@ static int __init scsi_debug_init(void)
 	if (scsi_debug_num_parts > 0)
 		sdebug_build_parts(fake_storep, sz);
 
+	if (scsi_debug_dif) {
+		int dif_size;
+
+		dif_size = sdebug_store_sectors * sizeof(struct sd_dif_tuple);
+		dif_storep = vmalloc(dif_size);
+
+		printk(KERN_ERR "scsi_debug_init: dif_storep %u bytes @ %p\n",
+		       dif_size, dif_storep);
+
+		if (dif_storep == NULL) {
+			printk(KERN_ERR "scsi_debug_init: out of mem. (DIX)\n");
+			ret = -ENOMEM;
+			goto free_vm;
+		}
+
+		memset(dif_storep, 0xff, dif_size);
+	}
+
 	ret = device_register(&pseudo_primary);
 	if (ret < 0) {
 		printk(KERN_WARNING "scsi_debug: device_register error: %d\n",
@@ -2615,6 +2976,8 @@ bus_unreg:
 dev_unreg:
 	device_unregister(&pseudo_primary);
 free_vm:
+	if (dif_storep)
+		vfree(dif_storep);
 	vfree(fake_storep);
 
 	return ret;
@@ -2632,6 +2995,9 @@ static void __exit scsi_debug_exit(void)
 	bus_unregister(&pseudo_lld_bus);
 	device_unregister(&pseudo_primary);
 
+	if (dif_storep)
+		vfree(dif_storep);
+
 	vfree(fake_storep);
 }
 
@@ -2732,6 +3098,8 @@ int scsi_debug_queuecommand(struct scsi_cmnd *SCpnt, done_funct_t done)
 	struct sdebug_dev_info *devip = NULL;
 	int inj_recovered = 0;
 	int inj_transport = 0;
+	int inj_dif = 0;
+	int inj_dix = 0;
 	int delay_override = 0;
 
 	scsi_set_resid(SCpnt, 0);
@@ -2769,6 +3137,10 @@ int scsi_debug_queuecommand(struct scsi_cmnd *SCpnt, done_funct_t done)
 			inj_recovered = 1; /* to reads and writes below */
 		else if (SCSI_DEBUG_OPT_TRANSPORT_ERR & scsi_debug_opts)
 			inj_transport = 1; /* to reads and writes below */
+		else if (SCSI_DEBUG_OPT_DIF_ERR & scsi_debug_opts)
+			inj_dif = 1; /* to reads and writes below */
+		else if (SCSI_DEBUG_OPT_DIX_ERR & scsi_debug_opts)
+			inj_dix = 1; /* to reads and writes below */
 	}
 
 	if (devip->wlun) {
@@ -2870,6 +3242,12 @@ int scsi_debug_queuecommand(struct scsi_cmnd *SCpnt, done_funct_t done)
 			mk_sense_buffer(devip, ABORTED_COMMAND,
 					TRANSPORT_PROBLEM, ACK_NAK_TO);
 			errsts = check_condition_result;
+		} else if (inj_dif && (0 == errsts)) {
+			mk_sense_buffer(devip, ABORTED_COMMAND, 0x10, 1);
+			errsts = illegal_condition_result;
+		} else if (inj_dix && (0 == errsts)) {
+			mk_sense_buffer(devip, ILLEGAL_REQUEST, 0x10, 1);
+			errsts = illegal_condition_result;
 		}
 		break;
 	case REPORT_LUNS:	/* mandatory, ignore unit attention */
@@ -2894,6 +3272,12 @@ int scsi_debug_queuecommand(struct scsi_cmnd *SCpnt, done_funct_t done)
 			mk_sense_buffer(devip, RECOVERED_ERROR,
 					THRESHOLD_EXCEEDED, 0);
 			errsts = check_condition_result;
+		} else if (inj_dif && (0 == errsts)) {
+			mk_sense_buffer(devip, ABORTED_COMMAND, 0x10, 1);
+			errsts = illegal_condition_result;
+		} else if (inj_dix && (0 == errsts)) {
+			mk_sense_buffer(devip, ILLEGAL_REQUEST, 0x10, 1);
+			errsts = illegal_condition_result;
 		}
 		break;
 	case MODE_SENSE:
@@ -2982,6 +3366,7 @@ static int sdebug_driver_probe(struct device * dev)
         int error = 0;
         struct sdebug_host_info *sdbg_host;
         struct Scsi_Host *hpnt;
+	int host_prot;
 
 	sdbg_host = to_sdebug_host(dev);
 
@@ -3000,6 +3385,50 @@ static int sdebug_driver_probe(struct device * dev)
 		hpnt->max_id = scsi_debug_num_tgts;
 	hpnt->max_lun = SAM2_WLUN_REPORT_LUNS;	/* = scsi_debug_max_luns; */
 
+	host_prot = 0;
+
+	switch (scsi_debug_dif) {
+
+	case SD_DIF_TYPE1_PROTECTION:
+		host_prot = SHOST_DIF_TYPE1_PROTECTION;
+		if (scsi_debug_dix)
+			host_prot |= SHOST_DIX_TYPE1_PROTECTION;
+		break;
+
+	case SD_DIF_TYPE2_PROTECTION:
+		host_prot = SHOST_DIF_TYPE2_PROTECTION;
+		if (scsi_debug_dix)
+			host_prot |= SHOST_DIX_TYPE2_PROTECTION;
+		break;
+
+	case SD_DIF_TYPE3_PROTECTION:
+		host_prot = SHOST_DIF_TYPE3_PROTECTION;
+		if (scsi_debug_dix)
+			host_prot |= SHOST_DIX_TYPE3_PROTECTION;
+		break;
+
+	default:
+		if (scsi_debug_dix)
+			host_prot |= SHOST_DIX_TYPE0_PROTECTION;
+		break;
+	}
+
+	scsi_host_set_prot(hpnt, host_prot);
+
+	printk(KERN_INFO "scsi_debug: host protection%s%s%s%s%s%s%s\n",
+	       (host_prot & SHOST_DIF_TYPE1_PROTECTION) ? " DIF1" : "",
+	       (host_prot & SHOST_DIF_TYPE2_PROTECTION) ? " DIF2" : "",
+	       (host_prot & SHOST_DIF_TYPE3_PROTECTION) ? " DIF3" : "",
+	       (host_prot & SHOST_DIX_TYPE0_PROTECTION) ? " DIX0" : "",
+	       (host_prot & SHOST_DIX_TYPE1_PROTECTION) ? " DIX1" : "",
+	       (host_prot & SHOST_DIX_TYPE2_PROTECTION) ? " DIX2" : "",
+	       (host_prot & SHOST_DIX_TYPE3_PROTECTION) ? " DIX3" : "");
+
+	if (scsi_debug_guard == 1)
+		scsi_host_set_guard(hpnt, SHOST_DIX_GUARD_IP);
+	else
+		scsi_host_set_guard(hpnt, SHOST_DIX_GUARD_CRC);
+
         error = scsi_add_host(hpnt, &sdbg_host->dev);
         if (error) {
                 printk(KERN_ERR "%s: scsi_add_host failed\n", __func__);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ad6a1370761e..0c2c73be1974 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1441,6 +1441,11 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 	}
 }
 
+static void eh_lock_door_done(struct request *req, int uptodate)
+{
+	__blk_put_request(req->q, req);
+}
+
 /**
  * scsi_eh_lock_door - Prevent medium removal for the specified device
  * @sdev:	SCSI device to prevent medium removal
@@ -1463,19 +1468,28 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
  */
 static void scsi_eh_lock_door(struct scsi_device *sdev)
 {
-	unsigned char cmnd[MAX_COMMAND_SIZE];
+	struct request *req;
 
-	cmnd[0] = ALLOW_MEDIUM_REMOVAL;
-	cmnd[1] = 0;
-	cmnd[2] = 0;
-	cmnd[3] = 0;
-	cmnd[4] = SCSI_REMOVAL_PREVENT;
-	cmnd[5] = 0;
+	req = blk_get_request(sdev->request_queue, READ, GFP_KERNEL);
+	if (!req)
+		return;
 
-	scsi_execute_async(sdev, cmnd, 6, DMA_NONE, NULL, 0, 0, 10 * HZ,
-			   5, NULL, NULL, GFP_KERNEL);
-}
+	req->cmd[0] = ALLOW_MEDIUM_REMOVAL;
+	req->cmd[1] = 0;
+	req->cmd[2] = 0;
+	req->cmd[3] = 0;
+	req->cmd[4] = SCSI_REMOVAL_PREVENT;
+	req->cmd[5] = 0;
 
+	req->cmd_len = COMMAND_SIZE(req->cmd[0]);
+
+	req->cmd_type = REQ_TYPE_BLOCK_PC;
+	req->cmd_flags |= REQ_QUIET;
+	req->timeout = 10 * HZ;
+	req->retries = 5;
+
+	blk_execute_rq_nowait(req->q, NULL, req, 1, eh_lock_door_done);
+}
 
 /**
  * scsi_restart_operations - restart io operations to the specified host.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b82ffd90632e..4b13e36d3aa0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -277,196 +277,6 @@ int scsi_execute_req(struct scsi_device *sdev, const unsigned char *cmd,
 }
 EXPORT_SYMBOL(scsi_execute_req);
 
-struct scsi_io_context {
-	void *data;
-	void (*done)(void *data, char *sense, int result, int resid);
-	char sense[SCSI_SENSE_BUFFERSIZE];
-};
-
-static struct kmem_cache *scsi_io_context_cache;
-
-static void scsi_end_async(struct request *req, int uptodate)
-{
-	struct scsi_io_context *sioc = req->end_io_data;
-
-	if (sioc->done)
-		sioc->done(sioc->data, sioc->sense, req->errors, req->data_len);
-
-	kmem_cache_free(scsi_io_context_cache, sioc);
-	__blk_put_request(req->q, req);
-}
-
-static int scsi_merge_bio(struct request *rq, struct bio *bio)
-{
-	struct request_queue *q = rq->q;
-
-	bio->bi_flags &= ~(1 << BIO_SEG_VALID);
-	if (rq_data_dir(rq) == WRITE)
-		bio->bi_rw |= (1 << BIO_RW);
-	blk_queue_bounce(q, &bio);
-
-	return blk_rq_append_bio(q, rq, bio);
-}
-
-static void scsi_bi_endio(struct bio *bio, int error)
-{
-	bio_put(bio);
-}
-
-/**
- * scsi_req_map_sg - map a scatterlist into a request
- * @rq:		request to fill
- * @sgl:	scatterlist
- * @nsegs:	number of elements
- * @bufflen:	len of buffer
- * @gfp:	memory allocation flags
- *
- * scsi_req_map_sg maps a scatterlist into a request so that the
- * request can be sent to the block layer. We do not trust the scatterlist
- * sent to use, as some ULDs use that struct to only organize the pages.
- */
-static int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
-			   int nsegs, unsigned bufflen, gfp_t gfp)
-{
-	struct request_queue *q = rq->q;
-	int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	unsigned int data_len = bufflen, len, bytes, off;
-	struct scatterlist *sg;
-	struct page *page;
-	struct bio *bio = NULL;
-	int i, err, nr_vecs = 0;
-
-	for_each_sg(sgl, sg, nsegs, i) {
-		page = sg_page(sg);
-		off = sg->offset;
-		len = sg->length;
-
-		while (len > 0 && data_len > 0) {
-			/*
-			 * sg sends a scatterlist that is larger than
-			 * the data_len it wants transferred for certain
-			 * IO sizes
-			 */
-			bytes = min_t(unsigned int, len, PAGE_SIZE - off);
-			bytes = min(bytes, data_len);
-
-			if (!bio) {
-				nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
-				nr_pages -= nr_vecs;
-
-				bio = bio_alloc(gfp, nr_vecs);
-				if (!bio) {
-					err = -ENOMEM;
-					goto free_bios;
-				}
-				bio->bi_end_io = scsi_bi_endio;
-			}
-
-			if (bio_add_pc_page(q, bio, page, bytes, off) !=
-			    bytes) {
-				bio_put(bio);
-				err = -EINVAL;
-				goto free_bios;
-			}
-
-			if (bio->bi_vcnt >= nr_vecs) {
-				err = scsi_merge_bio(rq, bio);
-				if (err) {
-					bio_endio(bio, 0);
-					goto free_bios;
-				}
-				bio = NULL;
-			}
-
-			page++;
-			len -= bytes;
-			data_len -=bytes;
-			off = 0;
-		}
-	}
-
-	rq->buffer = rq->data = NULL;
-	rq->data_len = bufflen;
-	return 0;
-
-free_bios:
-	while ((bio = rq->bio) != NULL) {
-		rq->bio = bio->bi_next;
-		/*
-		 * call endio instead of bio_put incase it was bounced
-		 */
-		bio_endio(bio, 0);
-	}
-
-	return err;
-}
-
-/**
- * scsi_execute_async - insert request
- * @sdev:	scsi device
- * @cmd:	scsi command
- * @cmd_len:	length of scsi cdb
- * @data_direction: DMA_TO_DEVICE, DMA_FROM_DEVICE, or DMA_NONE
- * @buffer:	data buffer (this can be a kernel buffer or scatterlist)
- * @bufflen:	len of buffer
- * @use_sg:	if buffer is a scatterlist this is the number of elements
- * @timeout:	request timeout in seconds
- * @retries:	number of times to retry request
- * @privdata:	data passed to done()
- * @done:	callback function when done
- * @gfp:	memory allocation flags
- */
-int scsi_execute_async(struct scsi_device *sdev, const unsigned char *cmd,
-		       int cmd_len, int data_direction, void *buffer, unsigned bufflen,
-		       int use_sg, int timeout, int retries, void *privdata,
-		       void (*done)(void *, char *, int, int), gfp_t gfp)
-{
-	struct request *req;
-	struct scsi_io_context *sioc;
-	int err = 0;
-	int write = (data_direction == DMA_TO_DEVICE);
-
-	sioc = kmem_cache_zalloc(scsi_io_context_cache, gfp);
-	if (!sioc)
-		return DRIVER_ERROR << 24;
-
-	req = blk_get_request(sdev->request_queue, write, gfp);
-	if (!req)
-		goto free_sense;
-	req->cmd_type = REQ_TYPE_BLOCK_PC;
-	req->cmd_flags |= REQ_QUIET;
-
-	if (use_sg)
-		err = scsi_req_map_sg(req, buffer, use_sg, bufflen, gfp);
-	else if (bufflen)
-		err = blk_rq_map_kern(req->q, req, buffer, bufflen, gfp);
-
-	if (err)
-		goto free_req;
-
-	req->cmd_len = cmd_len;
-	memset(req->cmd, 0, BLK_MAX_CDB); /* ATAPI hates garbage after CDB */
-	memcpy(req->cmd, cmd, req->cmd_len);
-	req->sense = sioc->sense;
-	req->sense_len = 0;
-	req->timeout = timeout;
-	req->retries = retries;
-	req->end_io_data = sioc;
-
-	sioc->data = privdata;
-	sioc->done = done;
-
-	blk_execute_rq_nowait(req->q, NULL, req, 1, scsi_end_async);
-	return 0;
-
-free_req:
-	blk_put_request(req);
-free_sense:
-	kmem_cache_free(scsi_io_context_cache, sioc);
-	return DRIVER_ERROR << 24;
-}
-EXPORT_SYMBOL_GPL(scsi_execute_async);
-
 /*
  * Function:    scsi_init_cmd_errh()
  *
@@ -1920,20 +1730,12 @@ int __init scsi_init_queue(void)
 {
 	int i;
 
-	scsi_io_context_cache = kmem_cache_create("scsi_io_context",
-					sizeof(struct scsi_io_context),
-					0, 0, NULL);
-	if (!scsi_io_context_cache) {
-		printk(KERN_ERR "SCSI: can't init scsi io context cache\n");
-		return -ENOMEM;
-	}
-
 	scsi_sdb_cache = kmem_cache_create("scsi_data_buffer",
 					   sizeof(struct scsi_data_buffer),
 					   0, 0, NULL);
 	if (!scsi_sdb_cache) {
 		printk(KERN_ERR "SCSI: can't init scsi sdb cache\n");
-		goto cleanup_io_context;
+		return -ENOMEM;
 	}
 
 	for (i = 0; i < SG_MEMPOOL_NR; i++) {
@@ -1968,8 +1770,6 @@ cleanup_sdb:
 			kmem_cache_destroy(sgp->slab);
 	}
 	kmem_cache_destroy(scsi_sdb_cache);
-cleanup_io_context:
-	kmem_cache_destroy(scsi_io_context_cache);
 
 	return -ENOMEM;
 }
@@ -1978,7 +1778,6 @@ void scsi_exit_queue(void)
 {
 	int i;
 
-	kmem_cache_destroy(scsi_io_context_cache);
 	kmem_cache_destroy(scsi_sdb_cache);
 
 	for (i = 0; i < SG_MEMPOOL_NR; i++) {
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 8f4de20c9deb..a14d245a66b8 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -797,6 +797,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	case TYPE_ENCLOSURE:
 	case TYPE_COMM:
 	case TYPE_RAID:
+	case TYPE_OSD:
 		sdev->writeable = 1;
 		break;
 	case TYPE_ROM:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index da63802cbf9d..fa4711d12744 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1043,7 +1043,6 @@ EXPORT_SYMBOL(scsi_register_interface);
 /**
  * scsi_sysfs_add_host - add scsi host to subsystem
  * @shost:     scsi host struct to add to subsystem
- * @dev:       parent struct device pointer
  **/
 int scsi_sysfs_add_host(struct Scsi_Host *shost)
 {
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 3ee4eb40abcf..a152f89ae51c 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -95,7 +95,7 @@ static struct {
 	{ FC_PORTTYPE_NPORT,	"NPort (fabric via point-to-point)" },
 	{ FC_PORTTYPE_NLPORT,	"NLPort (fabric via loop)" },
 	{ FC_PORTTYPE_LPORT,	"LPort (private loop)" },
-	{ FC_PORTTYPE_PTP,	"Point-To-Point (direct nport connection" },
+	{ FC_PORTTYPE_PTP,	"Point-To-Point (direct nport connection)" },
 	{ FC_PORTTYPE_NPIV,		"NPIV VPORT" },
 };
 fc_enum_name_search(port_type, fc_port_type, fc_port_type_names)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 2adfab8c11c1..094795455293 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -246,30 +246,13 @@ static int iscsi_setup_host(struct transport_container *tc, struct device *dev,
 	memset(ihost, 0, sizeof(*ihost));
 	atomic_set(&ihost->nr_scans, 0);
 	mutex_init(&ihost->mutex);
-
-	snprintf(ihost->scan_workq_name, sizeof(ihost->scan_workq_name),
-		 "iscsi_scan_%d", shost->host_no);
-	ihost->scan_workq = create_singlethread_workqueue(
-						ihost->scan_workq_name);
-	if (!ihost->scan_workq)
-		return -ENOMEM;
-	return 0;
-}
-
-static int iscsi_remove_host(struct transport_container *tc, struct device *dev,
-			     struct device *cdev)
-{
-	struct Scsi_Host *shost = dev_to_shost(dev);
-	struct iscsi_cls_host *ihost = shost->shost_data;
-
-	destroy_workqueue(ihost->scan_workq);
 	return 0;
 }
 
 static DECLARE_TRANSPORT_CLASS(iscsi_host_class,
 			       "iscsi_host",
 			       iscsi_setup_host,
-			       iscsi_remove_host,
+			       NULL,
 			       NULL);
 
 static DECLARE_TRANSPORT_CLASS(iscsi_session_class,
@@ -568,7 +551,7 @@ static void __iscsi_unblock_session(struct work_struct *work)
 	 * scanning from userspace).
 	 */
 	if (shost->hostt->scan_finished) {
-		if (queue_work(ihost->scan_workq, &session->scan_work))
+		if (scsi_queue_work(shost, &session->scan_work))
 			atomic_inc(&ihost->nr_scans);
 	}
 }
@@ -636,14 +619,6 @@ static void __iscsi_unbind_session(struct work_struct *work)
 	iscsi_session_event(session, ISCSI_KEVENT_UNBIND_SESSION);
 }
 
-static int iscsi_unbind_session(struct iscsi_cls_session *session)
-{
-	struct Scsi_Host *shost = iscsi_session_to_shost(session);
-	struct iscsi_cls_host *ihost = shost->shost_data;
-
-	return queue_work(ihost->scan_workq, &session->unbind_work);
-}
-
 struct iscsi_cls_session *
 iscsi_alloc_session(struct Scsi_Host *shost, struct iscsi_transport *transport,
 		    int dd_size)
@@ -796,7 +771,6 @@ static int iscsi_iter_destroy_conn_fn(struct device *dev, void *data)
 void iscsi_remove_session(struct iscsi_cls_session *session)
 {
 	struct Scsi_Host *shost = iscsi_session_to_shost(session);
-	struct iscsi_cls_host *ihost = shost->shost_data;
 	unsigned long flags;
 	int err;
 
@@ -821,7 +795,7 @@ void iscsi_remove_session(struct iscsi_cls_session *session)
 
 	scsi_target_unblock(&session->dev);
 	/* flush running scans then delete devices */
-	flush_workqueue(ihost->scan_workq);
+	scsi_flush_work(shost);
 	__iscsi_unbind_session(&session->unbind_work);
 
 	/* hw iscsi may not have removed all connections from session */
@@ -1215,14 +1189,15 @@ iscsi_if_create_session(struct iscsi_internal *priv, struct iscsi_endpoint *ep,
 {
 	struct iscsi_transport *transport = priv->iscsi_transport;
 	struct iscsi_cls_session *session;
-	uint32_t host_no;
+	struct Scsi_Host *shost;
 
 	session = transport->create_session(ep, cmds_max, queue_depth,
-					    initial_cmdsn, &host_no);
+					    initial_cmdsn);
 	if (!session)
 		return -ENOMEM;
 
-	ev->r.c_session_ret.host_no = host_no;
+	shost = iscsi_session_to_shost(session);
+	ev->r.c_session_ret.host_no = shost->host_no;
 	ev->r.c_session_ret.sid = session->sid;
 	return 0;
 }
@@ -1439,7 +1414,8 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	case ISCSI_UEVENT_UNBIND_SESSION:
 		session = iscsi_session_lookup(ev->u.d_session.sid);
 		if (session)
-			iscsi_unbind_session(session);
+			scsi_queue_work(iscsi_session_to_shost(session),
+					&session->unbind_work);
 		else
 			err = -EINVAL;
 		break;
@@ -1801,8 +1777,7 @@ iscsi_register_transport(struct iscsi_transport *tt)
 	priv->daemon_pid = -1;
 	priv->iscsi_transport = tt;
 	priv->t.user_scan = iscsi_user_scan;
-	if (!(tt->caps & CAP_DATA_PATH_OFFLOAD))
-		priv->t.create_work_queue = 1;
+	priv->t.create_work_queue = 1;
 
 	priv->dev.class = &iscsi_transport_class;
 	dev_set_name(&priv->dev, "%s", tt->name);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4970ae4a62d6..aeab5d9dff27 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1273,42 +1273,126 @@ disable:
 	sdkp->capacity = 0;
 }
 
-/*
- * read disk capacity
- */
-static void
-sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
+static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
+			struct scsi_sense_hdr *sshdr, int sense_valid,
+			int the_result)
+{
+	sd_print_result(sdkp, the_result);
+	if (driver_byte(the_result) & DRIVER_SENSE)
+		sd_print_sense_hdr(sdkp, sshdr);
+	else
+		sd_printk(KERN_NOTICE, sdkp, "Sense not available.\n");
+
+	/*
+	 * Set dirty bit for removable devices if not ready -
+	 * sometimes drives will not report this properly.
+	 */
+	if (sdp->removable &&
+	    sense_valid && sshdr->sense_key == NOT_READY)
+		sdp->changed = 1;
+
+	/*
+	 * We used to set media_present to 0 here to indicate no media
+	 * in the drive, but some drives fail read capacity even with
+	 * media present, so we can't do that.
+	 */
+	sdkp->capacity = 0; /* unknown mapped to zero - as usual */
+}
+
+#define RC16_LEN 32
+#if RC16_LEN > SD_BUF_SIZE
+#error RC16_LEN must not be more than SD_BUF_SIZE
+#endif
+
+static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
+						unsigned char *buffer)
 {
 	unsigned char cmd[16];
-	int the_result, retries;
-	int sector_size = 0;
-	/* Force READ CAPACITY(16) when PROTECT=1 */
-	int longrc = scsi_device_protection(sdkp->device) ? 1 : 0;
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
-	struct scsi_device *sdp = sdkp->device;
+	int the_result;
+	int retries = 3;
+	unsigned long long lba;
+	unsigned sector_size;
 
-repeat:
-	retries = 3;
 	do {
-		if (longrc) {
-			memset((void *) cmd, 0, 16);
-			cmd[0] = SERVICE_ACTION_IN;
-			cmd[1] = SAI_READ_CAPACITY_16;
-			cmd[13] = 13;
-			memset((void *) buffer, 0, 13);
-		} else {
-			cmd[0] = READ_CAPACITY;
-			memset((void *) &cmd[1], 0, 9);
-			memset((void *) buffer, 0, 8);
+		memset(cmd, 0, 16);
+		cmd[0] = SERVICE_ACTION_IN;
+		cmd[1] = SAI_READ_CAPACITY_16;
+		cmd[13] = RC16_LEN;
+		memset(buffer, 0, RC16_LEN);
+
+		the_result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
+					buffer, RC16_LEN, &sshdr,
+					SD_TIMEOUT, SD_MAX_RETRIES, NULL);
+
+		if (media_not_present(sdkp, &sshdr))
+			return -ENODEV;
+
+		if (the_result) {
+			sense_valid = scsi_sense_valid(&sshdr);
+			if (sense_valid &&
+			    sshdr.sense_key == ILLEGAL_REQUEST &&
+			    (sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
+			    sshdr.ascq == 0x00)
+				/* Invalid Command Operation Code or
+				 * Invalid Field in CDB, just retry
+				 * silently with RC10 */
+				return -EINVAL;
 		}
-		
+		retries--;
+
+	} while (the_result && retries);
+
+	if (the_result) {
+		sd_printk(KERN_NOTICE, sdkp, "READ CAPACITY(16) failed\n");
+		read_capacity_error(sdkp, sdp, &sshdr, sense_valid, the_result);
+		return -EINVAL;
+	}
+
+	sector_size =	(buffer[8] << 24) | (buffer[9] << 16) |
+			(buffer[10] << 8) | buffer[11];
+	lba =  (((u64)buffer[0] << 56) | ((u64)buffer[1] << 48) |
+		((u64)buffer[2] << 40) | ((u64)buffer[3] << 32) |
+		((u64)buffer[4] << 24) | ((u64)buffer[5] << 16) |
+		((u64)buffer[6] << 8) | (u64)buffer[7]);
+
+	sd_read_protection_type(sdkp, buffer);
+
+	if ((sizeof(sdkp->capacity) == 4) && (lba >= 0xffffffffULL)) {
+		sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
+			"kernel compiled with support for large block "
+			"devices.\n");
+		sdkp->capacity = 0;
+		return -EOVERFLOW;
+	}
+
+	sdkp->capacity = lba + 1;
+	return sector_size;
+}
+
+static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
+						unsigned char *buffer)
+{
+	unsigned char cmd[16];
+	struct scsi_sense_hdr sshdr;
+	int sense_valid = 0;
+	int the_result;
+	int retries = 3;
+	sector_t lba;
+	unsigned sector_size;
+
+	do {
+		cmd[0] = READ_CAPACITY;
+		memset(&cmd[1], 0, 9);
+		memset(buffer, 0, 8);
+
 		the_result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
-					      buffer, longrc ? 13 : 8, &sshdr,
-					      SD_TIMEOUT, SD_MAX_RETRIES, NULL);
+					buffer, 8, &sshdr,
+					SD_TIMEOUT, SD_MAX_RETRIES, NULL);
 
 		if (media_not_present(sdkp, &sshdr))
-			return;
+			return -ENODEV;
 
 		if (the_result)
 			sense_valid = scsi_sense_valid(&sshdr);
@@ -1316,85 +1400,96 @@ repeat:
 
 	} while (the_result && retries);
 
-	if (the_result && !longrc) {
+	if (the_result) {
 		sd_printk(KERN_NOTICE, sdkp, "READ CAPACITY failed\n");
-		sd_print_result(sdkp, the_result);
-		if (driver_byte(the_result) & DRIVER_SENSE)
-			sd_print_sense_hdr(sdkp, &sshdr);
-		else
-			sd_printk(KERN_NOTICE, sdkp, "Sense not available.\n");
+		read_capacity_error(sdkp, sdp, &sshdr, sense_valid, the_result);
+		return -EINVAL;
+	}
 
-		/* Set dirty bit for removable devices if not ready -
-		 * sometimes drives will not report this properly. */
-		if (sdp->removable &&
-		    sense_valid && sshdr.sense_key == NOT_READY)
-			sdp->changed = 1;
+	sector_size =	(buffer[4] << 24) | (buffer[5] << 16) |
+			(buffer[6] << 8) | buffer[7];
+	lba =	(buffer[0] << 24) | (buffer[1] << 16) |
+		(buffer[2] << 8) | buffer[3];
 
-		/* Either no media are present but the drive didn't tell us,
-		   or they are present but the read capacity command fails */
-		/* sdkp->media_present = 0; -- not always correct */
-		sdkp->capacity = 0; /* unknown mapped to zero - as usual */
+	if ((sizeof(sdkp->capacity) == 4) && (lba == 0xffffffff)) {
+		sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
+			"kernel compiled with support for large block "
+			"devices.\n");
+		sdkp->capacity = 0;
+		return -EOVERFLOW;
+	}
 
-		return;
-	} else if (the_result && longrc) {
-		/* READ CAPACITY(16) has been failed */
-		sd_printk(KERN_NOTICE, sdkp, "READ CAPACITY(16) failed\n");
-		sd_print_result(sdkp, the_result);
-		sd_printk(KERN_NOTICE, sdkp, "Use 0xffffffff as device size\n");
+	sdkp->capacity = lba + 1;
+	return sector_size;
+}
 
-		sdkp->capacity = 1 + (sector_t) 0xffffffff;		
-		goto got_data;
-	}	
-	
-	if (!longrc) {
-		sector_size = (buffer[4] << 24) |
-			(buffer[5] << 16) | (buffer[6] << 8) | buffer[7];
-		if (buffer[0] == 0xff && buffer[1] == 0xff &&
-		    buffer[2] == 0xff && buffer[3] == 0xff) {
-			if(sizeof(sdkp->capacity) > 4) {
-				sd_printk(KERN_NOTICE, sdkp, "Very big device. "
-					  "Trying to use READ CAPACITY(16).\n");
-				longrc = 1;
-				goto repeat;
-			}
-			sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use "
-				  "a kernel compiled with support for large "
-				  "block devices.\n");
-			sdkp->capacity = 0;
+static int sd_try_rc16_first(struct scsi_device *sdp)
+{
+	if (sdp->scsi_level > SCSI_SPC_2)
+		return 1;
+	if (scsi_device_protection(sdp))
+		return 1;
+	return 0;
+}
+
+/*
+ * read disk capacity
+ */
+static void
+sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+	int sector_size;
+	struct scsi_device *sdp = sdkp->device;
+	sector_t old_capacity = sdkp->capacity;
+
+	if (sd_try_rc16_first(sdp)) {
+		sector_size = read_capacity_16(sdkp, sdp, buffer);
+		if (sector_size == -EOVERFLOW)
 			goto got_data;
-		}
-		sdkp->capacity = 1 + (((sector_t)buffer[0] << 24) |
-			(buffer[1] << 16) |
-			(buffer[2] << 8) |
-			buffer[3]);			
+		if (sector_size == -ENODEV)
+			return;
+		if (sector_size < 0)
+			sector_size = read_capacity_10(sdkp, sdp, buffer);
+		if (sector_size < 0)
+			return;
 	} else {
-		sdkp->capacity = 1 + (((u64)buffer[0] << 56) |
-			((u64)buffer[1] << 48) |
-			((u64)buffer[2] << 40) |
-			((u64)buffer[3] << 32) |
-			((sector_t)buffer[4] << 24) |
-			((sector_t)buffer[5] << 16) |
-			((sector_t)buffer[6] << 8)  |
-			(sector_t)buffer[7]);
-			
-		sector_size = (buffer[8] << 24) |
-			(buffer[9] << 16) | (buffer[10] << 8) | buffer[11];
-
-		sd_read_protection_type(sdkp, buffer);
-	}	
-
-	/* Some devices return the total number of sectors, not the
-	 * highest sector number.  Make the necessary adjustment. */
-	if (sdp->fix_capacity) {
-		--sdkp->capacity;
+		sector_size = read_capacity_10(sdkp, sdp, buffer);
+		if (sector_size == -EOVERFLOW)
+			goto got_data;
+		if (sector_size < 0)
+			return;
+		if ((sizeof(sdkp->capacity) > 4) &&
+		    (sdkp->capacity > 0xffffffffULL)) {
+			int old_sector_size = sector_size;
+			sd_printk(KERN_NOTICE, sdkp, "Very big device. "
+					"Trying to use READ CAPACITY(16).\n");
+			sector_size = read_capacity_16(sdkp, sdp, buffer);
+			if (sector_size < 0) {
+				sd_printk(KERN_NOTICE, sdkp,
+					"Using 0xffffffff as device size\n");
+				sdkp->capacity = 1 + (sector_t) 0xffffffff;
+				sector_size = old_sector_size;
+				goto got_data;
+			}
+		}
+	}
 
-	/* Some devices have version which report the correct sizes
-	 * and others which do not. We guess size according to a heuristic
-	 * and err on the side of lowering the capacity. */
-	} else {
-		if (sdp->guess_capacity)
-			if (sdkp->capacity & 0x01) /* odd sizes are odd */
-				--sdkp->capacity;
+	/* Some devices are known to return the total number of blocks,
+	 * not the highest block number.  Some devices have versions
+	 * which do this and others which do not.  Some devices we might
+	 * suspect of doing this but we don't know for certain.
+	 *
+	 * If we know the reported capacity is wrong, decrement it.  If
+	 * we can only guess, then assume the number of blocks is even
+	 * (usually true but not always) and err on the side of lowering
+	 * the capacity.
+	 */
+	if (sdp->fix_capacity ||
+	    (sdp->guess_capacity && (sdkp->capacity & 0x01))) {
+		sd_printk(KERN_INFO, sdkp, "Adjusting the sector count "
+				"from its reported value: %llu\n",
+				(unsigned long long) sdkp->capacity);
+		--sdkp->capacity;
 	}
 
 got_data:
@@ -1437,10 +1532,11 @@ got_data:
 		string_get_size(sz, STRING_UNITS_10, cap_str_10,
 				sizeof(cap_str_10));
 
-		sd_printk(KERN_NOTICE, sdkp,
-			  "%llu %d-byte hardware sectors: (%s/%s)\n",
-			  (unsigned long long)sdkp->capacity,
-			  sector_size, cap_str_10, cap_str_2);
+		if (sdkp->first_scan || old_capacity != sdkp->capacity)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "%llu %d-byte hardware sectors: (%s/%s)\n",
+				  (unsigned long long)sdkp->capacity,
+				  sector_size, cap_str_10, cap_str_2);
 	}
 
 	/* Rescale capacity to 512-byte units */
@@ -1477,6 +1573,7 @@ sd_read_write_protect_flag(struct scsi_disk *sdkp, unsigned char *buffer)
 	int res;
 	struct scsi_device *sdp = sdkp->device;
 	struct scsi_mode_data data;
+	int old_wp = sdkp->write_prot;
 
 	set_disk_ro(sdkp->disk, 0);
 	if (sdp->skip_ms_page_3f) {
@@ -1517,11 +1614,13 @@ sd_read_write_protect_flag(struct scsi_disk *sdkp, unsigned char *buffer)
 	} else {
 		sdkp->write_prot = ((data.device_specific & 0x80) != 0);
 		set_disk_ro(sdkp->disk, sdkp->write_prot);
-		sd_printk(KERN_NOTICE, sdkp, "Write Protect is %s\n",
-			  sdkp->write_prot ? "on" : "off");
-		sd_printk(KERN_DEBUG, sdkp,
-			  "Mode Sense: %02x %02x %02x %02x\n",
-			  buffer[0], buffer[1], buffer[2], buffer[3]);
+		if (sdkp->first_scan || old_wp != sdkp->write_prot) {
+			sd_printk(KERN_NOTICE, sdkp, "Write Protect is %s\n",
+				  sdkp->write_prot ? "on" : "off");
+			sd_printk(KERN_DEBUG, sdkp,
+				  "Mode Sense: %02x %02x %02x %02x\n",
+				  buffer[0], buffer[1], buffer[2], buffer[3]);
+		}
 	}
 }
 
@@ -1539,6 +1638,9 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer)
 	int modepage;
 	struct scsi_mode_data data;
 	struct scsi_sense_hdr sshdr;
+	int old_wce = sdkp->WCE;
+	int old_rcd = sdkp->RCD;
+	int old_dpofua = sdkp->DPOFUA;
 
 	if (sdp->skip_ms_page_8)
 		goto defaults;
@@ -1610,12 +1712,14 @@ sd_read_cache_type(struct scsi_disk *sdkp, unsigned char *buffer)
 			sdkp->DPOFUA = 0;
 		}
 
-		sd_printk(KERN_NOTICE, sdkp,
-		       "Write cache: %s, read cache: %s, %s\n",
-		       sdkp->WCE ? "enabled" : "disabled",
-		       sdkp->RCD ? "disabled" : "enabled",
-		       sdkp->DPOFUA ? "supports DPO and FUA"
-		       : "doesn't support DPO or FUA");
+		if (sdkp->first_scan || old_wce != sdkp->WCE ||
+		    old_rcd != sdkp->RCD || old_dpofua != sdkp->DPOFUA)
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Write cache: %s, read cache: %s, %s\n",
+				  sdkp->WCE ? "enabled" : "disabled",
+				  sdkp->RCD ? "disabled" : "enabled",
+				  sdkp->DPOFUA ? "supports DPO and FUA"
+				  : "doesn't support DPO or FUA");
 
 		return;
 	}
@@ -1711,15 +1815,6 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		goto out;
 	}
 
-	/* defaults, until the device tells us otherwise */
-	sdp->sector_size = 512;
-	sdkp->capacity = 0;
-	sdkp->media_present = 1;
-	sdkp->write_prot = 0;
-	sdkp->WCE = 0;
-	sdkp->RCD = 0;
-	sdkp->ATO = 0;
-
 	sd_spinup_disk(sdkp);
 
 	/*
@@ -1733,6 +1828,8 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_app_tag_own(sdkp, buffer);
 	}
 
+	sdkp->first_scan = 0;
+
 	/*
 	 * We now have all cache related info, determine how we deal
 	 * with ordered requests.  Note that as the current SCSI
@@ -1843,6 +1940,16 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 	gd->private_data = &sdkp->driver;
 	gd->queue = sdkp->device->request_queue;
 
+	/* defaults, until the device tells us otherwise */
+	sdp->sector_size = 512;
+	sdkp->capacity = 0;
+	sdkp->media_present = 1;
+	sdkp->write_prot = 0;
+	sdkp->WCE = 0;
+	sdkp->RCD = 0;
+	sdkp->ATO = 0;
+	sdkp->first_scan = 1;
+
 	sd_revalidate_disk(gd);
 
 	blk_queue_prep_rq(sdp->request_queue, sd_prep_fn);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 75638e7d3f66..708778cf5f06 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -53,6 +53,7 @@ struct scsi_disk {
 	unsigned	WCE : 1;	/* state of disk WCE bit */
 	unsigned	RCD : 1;	/* state of disk RCD bit, unused */
 	unsigned	DPOFUA : 1;	/* state of disk DPOFUA bit */
+	unsigned	first_scan : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index e946e05db7f7..c9146d751cbf 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -345,44 +345,21 @@ static int ses_enclosure_find_by_addr(struct enclosure_device *edev,
 	return 0;
 }
 
-#define VPD_INQUIRY_SIZE 36
-
 static void ses_match_to_enclosure(struct enclosure_device *edev,
 				   struct scsi_device *sdev)
 {
-	unsigned char *buf = kmalloc(VPD_INQUIRY_SIZE, GFP_KERNEL);
+	unsigned char *buf;
 	unsigned char *desc;
-	u16 vpd_len;
+	unsigned int vpd_len;
 	struct efd efd = {
 		.addr = 0,
 	};
-	unsigned char cmd[] = {
-		INQUIRY,
-		1,
-		0x83,
-		VPD_INQUIRY_SIZE >> 8,
-		VPD_INQUIRY_SIZE & 0xff,
-		0
-	};
 
+	buf = scsi_get_vpd_page(sdev, 0x83);
 	if (!buf)
 		return;
 
-	if (scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf,
-			     VPD_INQUIRY_SIZE, NULL, SES_TIMEOUT, SES_RETRIES,
-			     NULL))
-		goto free;
-
-	vpd_len = (buf[2] << 8) + buf[3];
-	kfree(buf);
-	buf = kmalloc(vpd_len, GFP_KERNEL);
-	if (!buf)
-		return;
-	cmd[3] = vpd_len >> 8;
-	cmd[4] = vpd_len & 0xff;
-	if (scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf,
-			     vpd_len, NULL, SES_TIMEOUT, SES_RETRIES, NULL))
-		goto free;
+	vpd_len = ((buf[2] << 8) | buf[3]) + 4;
 
 	desc = buf + 4;
 	while (desc < buf + vpd_len) {
@@ -393,7 +370,7 @@ static void ses_match_to_enclosure(struct enclosure_device *edev,
 		u8 type = desc[1] & 0x0f;
 		u8 len = desc[3];
 
-		if (piv && code_set == 1 && assoc == 1 && code_set == 1
+		if (piv && code_set == 1 && assoc == 1
 		    && proto == SCSI_PROTOCOL_SAS && type == 3 && len == 8)
 			efd.addr = (u64)desc[4] << 56 |
 				(u64)desc[5] << 48 |
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index b4ef2f84ea32..ffc87851f2e8 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -98,7 +98,6 @@ static int scatter_elem_sz = SG_SCATTER_SZ;
 static int scatter_elem_sz_prev = SG_SCATTER_SZ;
 
 #define SG_SECTOR_SZ 512
-#define SG_SECTOR_MSK (SG_SECTOR_SZ - 1)
 
 static int sg_add(struct device *, struct class_interface *);
 static void sg_remove(struct device *, struct class_interface *);
@@ -137,10 +136,11 @@ typedef struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 	volatile char done;	/* 0->before bh, 1->before read, 2->read */
 	struct request *rq;
 	struct bio *bio;
+	struct execute_work ew;
 } Sg_request;
 
 typedef struct sg_fd {		/* holds the state of a file descriptor */
-	struct sg_fd *nextfp;	/* NULL when last opened fd on this device */
+	struct list_head sfd_siblings;
 	struct sg_device *parentdp;	/* owning device */
 	wait_queue_head_t read_wait;	/* queue read until command done */
 	rwlock_t rq_list_lock;	/* protect access to list in req_arr */
@@ -158,6 +158,8 @@ typedef struct sg_fd {		/* holds the state of a file descriptor */
 	char next_cmd_len;	/* 0 -> automatic (def), >0 -> use on next write() */
 	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
 	char mmap_called;	/* 0 -> mmap() never called on this fd */
+	struct kref f_ref;
+	struct execute_work ew;
 } Sg_fd;
 
 typedef struct sg_device { /* holds the state of each scsi generic device */
@@ -165,27 +167,25 @@ typedef struct sg_device { /* holds the state of each scsi generic device */
 	wait_queue_head_t o_excl_wait;	/* queue open() when O_EXCL in use */
 	int sg_tablesize;	/* adapter's max scatter-gather table size */
 	u32 index;		/* device index number */
-	Sg_fd *headfp;		/* first open fd belonging to this device */
+	struct list_head sfds;
 	volatile char detached;	/* 0->attached, 1->detached pending removal */
 	volatile char exclude;	/* opened for exclusive access */
 	char sgdebug;		/* 0->off, 1->sense, 9->dump dev, 10-> all devs */
 	struct gendisk *disk;
 	struct cdev * cdev;	/* char_dev [sysfs: /sys/cdev/major/sg<n>] */
+	struct kref d_ref;
 } Sg_device;
 
-static int sg_fasync(int fd, struct file *filp, int mode);
 /* tasklet or soft irq callback */
 static void sg_rq_end_io(struct request *rq, int uptodate);
 static int sg_start_req(Sg_request *srp, unsigned char *cmd);
 static void sg_finish_rem_req(Sg_request * srp);
 static int sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size);
-static int sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp,
-			 int tablesize);
 static ssize_t sg_new_read(Sg_fd * sfp, char __user *buf, size_t count,
 			   Sg_request * srp);
 static ssize_t sg_new_write(Sg_fd *sfp, struct file *file,
 			const char __user *buf, size_t count, int blocking,
-			int read_only, Sg_request **o_srp);
+			int read_only, int sg_io_owned, Sg_request **o_srp);
 static int sg_common_write(Sg_fd * sfp, Sg_request * srp,
 			   unsigned char *cmnd, int timeout, int blocking);
 static int sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer);
@@ -194,16 +194,13 @@ static void sg_build_reserve(Sg_fd * sfp, int req_size);
 static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
 static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
 static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
-static int sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
-static void __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
+static void sg_remove_sfp(struct kref *);
 static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
 static Sg_request *sg_add_request(Sg_fd * sfp);
 static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
 static int sg_res_in_use(Sg_fd * sfp);
 static Sg_device *sg_get_dev(int dev);
-#ifdef CONFIG_SCSI_PROC_FS
-static int sg_last_dev(void);
-#endif
+static void sg_put_dev(Sg_device *sdp);
 
 #define SZ_SG_HEADER sizeof(struct sg_header)
 #define SZ_SG_IO_HDR sizeof(sg_io_hdr_t)
@@ -237,22 +234,17 @@ sg_open(struct inode *inode, struct file *filp)
 	nonseekable_open(inode, filp);
 	SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev, flags));
 	sdp = sg_get_dev(dev);
-	if ((!sdp) || (!sdp->device)) {
-		unlock_kernel();
-		return -ENXIO;
-	}
-	if (sdp->detached) {
-		unlock_kernel();
-		return -ENODEV;
+	if (IS_ERR(sdp)) {
+		retval = PTR_ERR(sdp);
+		sdp = NULL;
+		goto sg_put;
 	}
 
 	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
 	retval = scsi_device_get(sdp->device);
-	if (retval) {
-		unlock_kernel();
-		return retval;
-	}
+	if (retval)
+		goto sg_put;
 
 	if (!((flags & O_NONBLOCK) ||
 	      scsi_block_when_processing_errors(sdp->device))) {
@@ -266,13 +258,13 @@ sg_open(struct inode *inode, struct file *filp)
 			retval = -EPERM; /* Can't lock it with read only access */
 			goto error_out;
 		}
-		if (sdp->headfp && (flags & O_NONBLOCK)) {
+		if (!list_empty(&sdp->sfds) && (flags & O_NONBLOCK)) {
 			retval = -EBUSY;
 			goto error_out;
 		}
 		res = 0;
 		__wait_event_interruptible(sdp->o_excl_wait,
-			((sdp->headfp || sdp->exclude) ? 0 : (sdp->exclude = 1)), res);
+					   ((!list_empty(&sdp->sfds) || sdp->exclude) ? 0 : (sdp->exclude = 1)), res);
 		if (res) {
 			retval = res;	/* -ERESTARTSYS because signal hit process */
 			goto error_out;
@@ -294,7 +286,7 @@ sg_open(struct inode *inode, struct file *filp)
 		retval = -ENODEV;
 		goto error_out;
 	}
-	if (!sdp->headfp) {	/* no existing opens on this device */
+	if (list_empty(&sdp->sfds)) {	/* no existing opens on this device */
 		sdp->sgdebug = 0;
 		q = sdp->device->request_queue;
 		sdp->sg_tablesize = min(q->max_hw_segments,
@@ -303,16 +295,20 @@ sg_open(struct inode *inode, struct file *filp)
 	if ((sfp = sg_add_sfp(sdp, dev)))
 		filp->private_data = sfp;
 	else {
-		if (flags & O_EXCL)
+		if (flags & O_EXCL) {
 			sdp->exclude = 0;	/* undo if error */
+			wake_up_interruptible(&sdp->o_excl_wait);
+		}
 		retval = -ENOMEM;
 		goto error_out;
 	}
-	unlock_kernel();
-	return 0;
-
-      error_out:
-	scsi_device_put(sdp->device);
+	retval = 0;
+error_out:
+	if (retval)
+		scsi_device_put(sdp->device);
+sg_put:
+	if (sdp)
+		sg_put_dev(sdp);
 	unlock_kernel();
 	return retval;
 }
@@ -327,13 +323,13 @@ sg_release(struct inode *inode, struct file *filp)
 	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
-	if (0 == sg_remove_sfp(sdp, sfp)) {	/* Returns 1 when sdp gone */
-		if (!sdp->detached) {
-			scsi_device_put(sdp->device);
-		}
-		sdp->exclude = 0;
-		wake_up_interruptible(&sdp->o_excl_wait);
-	}
+
+	sfp->closed = 1;
+
+	sdp->exclude = 0;
+	wake_up_interruptible(&sdp->o_excl_wait);
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 	return 0;
 }
 
@@ -557,7 +553,8 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 		return -EFAULT;
 	blocking = !(filp->f_flags & O_NONBLOCK);
 	if (old_hdr.reply_len < 0)
-		return sg_new_write(sfp, filp, buf, count, blocking, 0, NULL);
+		return sg_new_write(sfp, filp, buf, count,
+				    blocking, 0, 0, NULL);
 	if (count < (SZ_SG_HEADER + 6))
 		return -EIO;	/* The minimum scsi command length is 6 bytes. */
 
@@ -638,7 +635,7 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 
 static ssize_t
 sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
-		 size_t count, int blocking, int read_only,
+		 size_t count, int blocking, int read_only, int sg_io_owned,
 		 Sg_request **o_srp)
 {
 	int k;
@@ -658,6 +655,7 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
 		SCSI_LOG_TIMEOUT(1, printk("sg_new_write: queue full\n"));
 		return -EDOM;
 	}
+	srp->sg_io_owned = sg_io_owned;
 	hp = &srp->header;
 	if (__copy_from_user(hp, buf, SZ_SG_IO_HDR)) {
 		sg_remove_request(sfp, srp);
@@ -755,24 +753,13 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
 	hp->duration = jiffies_to_msecs(jiffies);
 
 	srp->rq->timeout = timeout;
+	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
 	blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
 			      srp->rq, 1, sg_rq_end_io);
 	return 0;
 }
 
 static int
-sg_srp_done(Sg_request *srp, Sg_fd *sfp)
-{
-	unsigned long iflags;
-	int done;
-
-	read_lock_irqsave(&sfp->rq_list_lock, iflags);
-	done = srp->done;
-	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return done;
-}
-
-static int
 sg_ioctl(struct inode *inode, struct file *filp,
 	 unsigned int cmd_in, unsigned long arg)
 {
@@ -804,27 +791,26 @@ sg_ioctl(struct inode *inode, struct file *filp,
 				return -EFAULT;
 			result =
 			    sg_new_write(sfp, filp, p, SZ_SG_IO_HDR,
-					 blocking, read_only, &srp);
+					 blocking, read_only, 1, &srp);
 			if (result < 0)
 				return result;
-			srp->sg_io_owned = 1;
 			while (1) {
 				result = 0;	/* following macro to beat race condition */
 				__wait_event_interruptible(sfp->read_wait,
-					(sdp->detached || sfp->closed || sg_srp_done(srp, sfp)),
-							   result);
+					(srp->done || sdp->detached),
+					result);
 				if (sdp->detached)
 					return -ENODEV;
-				if (sfp->closed)
-					return 0;	/* request packet dropped already */
-				if (0 == result)
+				write_lock_irq(&sfp->rq_list_lock);
+				if (srp->done) {
+					srp->done = 2;
+					write_unlock_irq(&sfp->rq_list_lock);
 					break;
+				}
 				srp->orphan = 1;
+				write_unlock_irq(&sfp->rq_list_lock);
 				return result;	/* -ERESTARTSYS because signal hit process */
 			}
-			write_lock_irqsave(&sfp->rq_list_lock, iflags);
-			srp->done = 2;
-			write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 			result = sg_new_read(sfp, p, SZ_SG_IO_HDR, srp);
 			return (result < 0) ? result : 0;
 		}
@@ -1238,6 +1224,15 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 	return 0;
 }
 
+static void sg_rq_end_io_usercontext(struct work_struct *work)
+{
+	struct sg_request *srp = container_of(work, struct sg_request, ew.work);
+	struct sg_fd *sfp = srp->parentfp;
+
+	sg_finish_rem_req(srp);
+	kref_put(&sfp->f_ref, sg_remove_sfp);
+}
+
 /*
  * This function is a "bottom half" handler that is called by the mid
  * level when a command is completed (or has failed).
@@ -1245,24 +1240,23 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 static void sg_rq_end_io(struct request *rq, int uptodate)
 {
 	struct sg_request *srp = rq->end_io_data;
-	Sg_device *sdp = NULL;
+	Sg_device *sdp;
 	Sg_fd *sfp;
 	unsigned long iflags;
 	unsigned int ms;
 	char *sense;
-	int result, resid;
+	int result, resid, done = 1;
 
-	if (NULL == srp) {
-		printk(KERN_ERR "sg_cmd_done: NULL request\n");
+	if (WARN_ON(srp->done != 0))
 		return;
-	}
+
 	sfp = srp->parentfp;
-	if (sfp)
-		sdp = sfp->parentdp;
-	if ((NULL == sdp) || sdp->detached) {
-		printk(KERN_INFO "sg_cmd_done: device detached\n");
+	if (WARN_ON(sfp == NULL))
 		return;
-	}
+
+	sdp = sfp->parentdp;
+	if (unlikely(sdp->detached))
+		printk(KERN_INFO "sg_rq_end_io: device detached\n");
 
 	sense = rq->sense;
 	result = rq->errors;
@@ -1301,33 +1295,25 @@ static void sg_rq_end_io(struct request *rq, int uptodate)
 	}
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
-	if (sfp->closed) {	/* whoops this fd already released, cleanup */
-		SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, freeing ...\n"));
-		sg_finish_rem_req(srp);
-		srp = NULL;
-		if (NULL == sfp->headrp) {
-			SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, final cleanup\n"));
-			if (0 == sg_remove_sfp(sdp, sfp)) {	/* device still present */
-				scsi_device_put(sdp->device);
-			}
-			sfp = NULL;
-		}
-	} else if (srp && srp->orphan) {
+	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	if (unlikely(srp->orphan)) {
 		if (sfp->keep_orphan)
 			srp->sg_io_owned = 0;
-		else {
-			sg_finish_rem_req(srp);
-			srp = NULL;
-		}
+		else
+			done = 0;
 	}
-	if (sfp && srp) {
-		/* Now wake up any sg_read() that is waiting for this packet. */
-		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
-		write_lock_irqsave(&sfp->rq_list_lock, iflags);
-		srp->done = 1;
+	srp->done = done;
+	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+
+	if (likely(done)) {
+		/* Now wake up any sg_read() that is waiting for this
+		 * packet.
+		 */
 		wake_up_interruptible(&sfp->read_wait);
-		write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	}
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+		kref_put(&sfp->f_ref, sg_remove_sfp);
+	} else
+		execute_in_process_context(sg_rq_end_io_usercontext, &srp->ew);
 }
 
 static struct file_operations sg_fops = {
@@ -1362,17 +1348,18 @@ static Sg_device *sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 		printk(KERN_WARNING "kmalloc Sg_device failure\n");
 		return ERR_PTR(-ENOMEM);
 	}
-	error = -ENOMEM;
+
 	if (!idr_pre_get(&sg_index_idr, GFP_KERNEL)) {
 		printk(KERN_WARNING "idr expansion Sg_device failure\n");
+		error = -ENOMEM;
 		goto out;
 	}
 
 	write_lock_irqsave(&sg_index_lock, iflags);
-	error = idr_get_new(&sg_index_idr, sdp, &k);
-	write_unlock_irqrestore(&sg_index_lock, iflags);
 
+	error = idr_get_new(&sg_index_idr, sdp, &k);
 	if (error) {
+		write_unlock_irqrestore(&sg_index_lock, iflags);
 		printk(KERN_WARNING "idr allocation Sg_device failure: %d\n",
 		       error);
 		goto out;
@@ -1386,9 +1373,13 @@ static Sg_device *sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 	disk->first_minor = k;
 	sdp->disk = disk;
 	sdp->device = scsidp;
+	INIT_LIST_HEAD(&sdp->sfds);
 	init_waitqueue_head(&sdp->o_excl_wait);
 	sdp->sg_tablesize = min(q->max_hw_segments, q->max_phys_segments);
 	sdp->index = k;
+	kref_init(&sdp->d_ref);
+
+	write_unlock_irqrestore(&sg_index_lock, iflags);
 
 	error = 0;
  out:
@@ -1399,6 +1390,8 @@ static Sg_device *sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 	return sdp;
 
  overflow:
+	idr_remove(&sg_index_idr, k);
+	write_unlock_irqrestore(&sg_index_lock, iflags);
 	sdev_printk(KERN_WARNING, scsidp,
 		    "Unable to attach sg device type=%d, minor "
 		    "number exceeds %d\n", scsidp->type, SG_MAX_DEVS - 1);
@@ -1486,49 +1479,46 @@ out:
 	return error;
 }
 
-static void
-sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
+static void sg_device_destroy(struct kref *kref)
+{
+	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
+	unsigned long flags;
+
+	/* CAUTION!  Note that the device can still be found via idr_find()
+	 * even though the refcount is 0.  Therefore, do idr_remove() BEFORE
+	 * any other cleanup.
+	 */
+
+	write_lock_irqsave(&sg_index_lock, flags);
+	idr_remove(&sg_index_idr, sdp->index);
+	write_unlock_irqrestore(&sg_index_lock, flags);
+
+	SCSI_LOG_TIMEOUT(3,
+		printk("sg_device_destroy: %s\n",
+			sdp->disk->disk_name));
+
+	put_disk(sdp->disk);
+	kfree(sdp);
+}
+
+static void sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
 {
 	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
 	Sg_device *sdp = dev_get_drvdata(cl_dev);
 	unsigned long iflags;
 	Sg_fd *sfp;
-	Sg_fd *tsfp;
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int delay;
 
-	if (!sdp)
+	if (!sdp || sdp->detached)
 		return;
 
-	delay = 0;
+	SCSI_LOG_TIMEOUT(3, printk("sg_remove: %s\n", sdp->disk->disk_name));
+
+	/* Need a write lock to set sdp->detached. */
 	write_lock_irqsave(&sg_index_lock, iflags);
-	if (sdp->headfp) {
-		sdp->detached = 1;
-		for (sfp = sdp->headfp; sfp; sfp = tsfp) {
-			tsfp = sfp->nextfp;
-			for (srp = sfp->headrp; srp; srp = tsrp) {
-				tsrp = srp->nextrp;
-				if (sfp->closed || (0 == sg_srp_done(srp, sfp)))
-					sg_finish_rem_req(srp);
-			}
-			if (sfp->closed) {
-				scsi_device_put(sdp->device);
-				__sg_remove_sfp(sdp, sfp);
-			} else {
-				delay = 1;
-				wake_up_interruptible(&sfp->read_wait);
-				kill_fasync(&sfp->async_qp, SIGPOLL,
-					    POLL_HUP);
-			}
-		}
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d, dirty\n", sdp->index));
-		if (NULL == sdp->headfp) {
-			idr_remove(&sg_index_idr, sdp->index);
-		}
-	} else {	/* nothing active, simple case */
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d\n", sdp->index));
-		idr_remove(&sg_index_idr, sdp->index);
+	sdp->detached = 1;
+	list_for_each_entry(sfp, &sdp->sfds, sfd_siblings) {
+		wake_up_interruptible(&sfp->read_wait);
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
 	write_unlock_irqrestore(&sg_index_lock, iflags);
 
@@ -1536,13 +1526,8 @@ sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
 	device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index));
 	cdev_del(sdp->cdev);
 	sdp->cdev = NULL;
-	put_disk(sdp->disk);
-	sdp->disk = NULL;
-	if (NULL == sdp->headfp)
-		kfree(sdp);
 
-	if (delay)
-		msleep(10);	/* dirty detach so delay device destruction */
+	sg_put_dev(sdp);
 }
 
 module_param_named(scatter_elem_sz, scatter_elem_sz, int, S_IRUGO | S_IWUSR);
@@ -1736,8 +1721,8 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
 		return -EFAULT;
 	if (0 == blk_size)
 		++blk_size;	/* don't know why */
-/* round request up to next highest SG_SECTOR_SZ byte boundary */
-	blk_size = (blk_size + SG_SECTOR_MSK) & (~SG_SECTOR_MSK);
+	/* round request up to next highest SG_SECTOR_SZ byte boundary */
+	blk_size = ALIGN(blk_size, SG_SECTOR_SZ);
 	SCSI_LOG_TIMEOUT(4, printk("sg_build_indirect: buff_size=%d, blk_size=%d\n",
 				   buff_size, blk_size));
 
@@ -1939,22 +1924,6 @@ sg_get_rq_mark(Sg_fd * sfp, int pack_id)
 	return resp;
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
-static Sg_request *
-sg_get_nth_request(Sg_fd * sfp, int nth)
-{
-	Sg_request *resp;
-	unsigned long iflags;
-	int k;
-
-	read_lock_irqsave(&sfp->rq_list_lock, iflags);
-	for (k = 0, resp = sfp->headrp; resp && (k < nth);
-	     ++k, resp = resp->nextrp) ;
-	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return resp;
-}
-#endif
-
 /* always adds to end of list */
 static Sg_request *
 sg_add_request(Sg_fd * sfp)
@@ -2030,22 +1999,6 @@ sg_remove_request(Sg_fd * sfp, Sg_request * srp)
 	return res;
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
-static Sg_fd *
-sg_get_nth_sfp(Sg_device * sdp, int nth)
-{
-	Sg_fd *resp;
-	unsigned long iflags;
-	int k;
-
-	read_lock_irqsave(&sg_index_lock, iflags);
-	for (k = 0, resp = sdp->headfp; resp && (k < nth);
-	     ++k, resp = resp->nextfp) ;
-	read_unlock_irqrestore(&sg_index_lock, iflags);
-	return resp;
-}
-#endif
-
 static Sg_fd *
 sg_add_sfp(Sg_device * sdp, int dev)
 {
@@ -2060,6 +2013,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	init_waitqueue_head(&sfp->read_wait);
 	rwlock_init(&sfp->rq_list_lock);
 
+	kref_init(&sfp->f_ref);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
 	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
 	sfp->force_packid = SG_DEF_FORCE_PACK_ID;
@@ -2069,14 +2023,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	sfp->keep_orphan = SG_DEF_KEEP_ORPHAN;
 	sfp->parentdp = sdp;
 	write_lock_irqsave(&sg_index_lock, iflags);
-	if (!sdp->headfp)
-		sdp->headfp = sfp;
-	else {			/* add to tail of existing list */
-		Sg_fd *pfp = sdp->headfp;
-		while (pfp->nextfp)
-			pfp = pfp->nextfp;
-		pfp->nextfp = sfp;
-	}
+	list_add_tail(&sfp->sfd_siblings, &sdp->sfds);
 	write_unlock_irqrestore(&sg_index_lock, iflags);
 	SCSI_LOG_TIMEOUT(3, printk("sg_add_sfp: sfp=0x%p\n", sfp));
 	if (unlikely(sg_big_buff != def_reserved_size))
@@ -2087,75 +2034,52 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	sg_build_reserve(sfp, bufflen);
 	SCSI_LOG_TIMEOUT(3, printk("sg_add_sfp:   bufflen=%d, k_use_sg=%d\n",
 			   sfp->reserve.bufflen, sfp->reserve.k_use_sg));
+
+	kref_get(&sdp->d_ref);
+	__module_get(THIS_MODULE);
 	return sfp;
 }
 
-static void
-__sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
+static void sg_remove_sfp_usercontext(struct work_struct *work)
 {
-	Sg_fd *fp;
-	Sg_fd *prev_fp;
+	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
+	struct sg_device *sdp = sfp->parentdp;
+
+	/* Cleanup any responses which were never read(). */
+	while (sfp->headrp)
+		sg_finish_rem_req(sfp->headrp);
 
-	prev_fp = sdp->headfp;
-	if (sfp == prev_fp)
-		sdp->headfp = prev_fp->nextfp;
-	else {
-		while ((fp = prev_fp->nextfp)) {
-			if (sfp == fp) {
-				prev_fp->nextfp = fp->nextfp;
-				break;
-			}
-			prev_fp = fp;
-		}
-	}
 	if (sfp->reserve.bufflen > 0) {
-		SCSI_LOG_TIMEOUT(6, 
-			printk("__sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
-			(int) sfp->reserve.bufflen, (int) sfp->reserve.k_use_sg));
+		SCSI_LOG_TIMEOUT(6,
+			printk("sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
+				(int) sfp->reserve.bufflen,
+				(int) sfp->reserve.k_use_sg));
 		sg_remove_scat(&sfp->reserve);
 	}
-	sfp->parentdp = NULL;
-	SCSI_LOG_TIMEOUT(6, printk("__sg_remove_sfp:    sfp=0x%p\n", sfp));
+
+	SCSI_LOG_TIMEOUT(6,
+		printk("sg_remove_sfp: %s, sfp=0x%p\n",
+			sdp->disk->disk_name,
+			sfp));
 	kfree(sfp);
+
+	scsi_device_put(sdp->device);
+	sg_put_dev(sdp);
+	module_put(THIS_MODULE);
 }
 
-/* Returns 0 in normal case, 1 when detached and sdp object removed */
-static int
-sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
+static void sg_remove_sfp(struct kref *kref)
 {
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int dirty = 0;
-	int res = 0;
+	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
+	struct sg_device *sdp = sfp->parentdp;
+	unsigned long iflags;
 
-	for (srp = sfp->headrp; srp; srp = tsrp) {
-		tsrp = srp->nextrp;
-		if (sg_srp_done(srp, sfp))
-			sg_finish_rem_req(srp);
-		else
-			++dirty;
-	}
-	if (0 == dirty) {
-		unsigned long iflags;
-
-		write_lock_irqsave(&sg_index_lock, iflags);
-		__sg_remove_sfp(sdp, sfp);
-		if (sdp->detached && (NULL == sdp->headfp)) {
-			idr_remove(&sg_index_idr, sdp->index);
-			kfree(sdp);
-			res = 1;
-		}
-		write_unlock_irqrestore(&sg_index_lock, iflags);
-	} else {
-		/* MOD_INC's to inhibit unloading sg and associated adapter driver */
-		/* only bump the access_count if we actually succeeded in
-		 * throwing another counter on the host module */
-		scsi_device_get(sdp->device);	/* XXX: retval ignored? */	
-		sfp->closed = 1;	/* flag dirty state on this fd */
-		SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
-				  dirty));
-	}
-	return res;
+	write_lock_irqsave(&sg_index_lock, iflags);
+	list_del(&sfp->sfd_siblings);
+	write_unlock_irqrestore(&sg_index_lock, iflags);
+	wake_up_interruptible(&sdp->o_excl_wait);
+
+	execute_in_process_context(sg_remove_sfp_usercontext, &sfp->ew);
 }
 
 static int
@@ -2197,19 +2121,38 @@ sg_last_dev(void)
 }
 #endif
 
-static Sg_device *
-sg_get_dev(int dev)
+/* must be called with sg_index_lock held */
+static Sg_device *sg_lookup_dev(int dev)
 {
-	Sg_device *sdp;
-	unsigned long iflags;
+	return idr_find(&sg_index_idr, dev);
+}
 
-	read_lock_irqsave(&sg_index_lock, iflags);
-	sdp = idr_find(&sg_index_idr, dev);
-	read_unlock_irqrestore(&sg_index_lock, iflags);
+static Sg_device *sg_get_dev(int dev)
+{
+	struct sg_device *sdp;
+	unsigned long flags;
+
+	read_lock_irqsave(&sg_index_lock, flags);
+	sdp = sg_lookup_dev(dev);
+	if (!sdp)
+		sdp = ERR_PTR(-ENXIO);
+	else if (sdp->detached) {
+		/* If sdp->detached, then the refcount may already be 0, in
+		 * which case it would be a bug to do kref_get().
+		 */
+		sdp = ERR_PTR(-ENODEV);
+	} else
+		kref_get(&sdp->d_ref);
+	read_unlock_irqrestore(&sg_index_lock, flags);
 
 	return sdp;
 }
 
+static void sg_put_dev(struct sg_device *sdp)
+{
+	kref_put(&sdp->d_ref, sg_device_destroy);
+}
+
 #ifdef CONFIG_SCSI_PROC_FS
 
 static struct proc_dir_entry *sg_proc_sgp = NULL;
@@ -2466,8 +2409,10 @@ static int sg_proc_seq_show_dev(struct seq_file *s, void *v)
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
 	struct scsi_device *scsidp;
+	unsigned long iflags;
 
-	sdp = it ? sg_get_dev(it->index) : NULL;
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
 		seq_printf(s, "%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
 			      scsidp->host->host_no, scsidp->channel,
@@ -2478,6 +2423,7 @@ static int sg_proc_seq_show_dev(struct seq_file *s, void *v)
 			      (int) scsi_device_online(scsidp));
 	else
 		seq_printf(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 
@@ -2491,16 +2437,20 @@ static int sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
 	struct scsi_device *scsidp;
+	unsigned long iflags;
 
-	sdp = it ? sg_get_dev(it->index) : NULL;
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
 		seq_printf(s, "%8.8s\t%16.16s\t%4.4s\n",
 			   scsidp->vendor, scsidp->model, scsidp->rev);
 	else
 		seq_printf(s, "<no active device>\n");
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 
+/* must be called while holding sg_index_lock */
 static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp)
 {
 	int k, m, new_interface, blen, usg;
@@ -2510,9 +2460,12 @@ static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp)
 	const char * cp;
 	unsigned int ms;
 
-	for (k = 0; (fp = sg_get_nth_sfp(sdp, k)); ++k) {
+	k = 0;
+	list_for_each_entry(fp, &sdp->sfds, sfd_siblings) {
+		k++;
+		read_lock(&fp->rq_list_lock); /* irqs already disabled */
 		seq_printf(s, "   FD(%d): timeout=%dms bufflen=%d "
-			   "(res)sgat=%d low_dma=%d\n", k + 1,
+			   "(res)sgat=%d low_dma=%d\n", k,
 			   jiffies_to_msecs(fp->timeout),
 			   fp->reserve.bufflen,
 			   (int) fp->reserve.k_use_sg,
@@ -2520,7 +2473,9 @@ static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp)
 		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=%d\n",
 			   (int) fp->cmd_q, (int) fp->force_packid,
 			   (int) fp->keep_orphan, (int) fp->closed);
-		for (m = 0; (srp = sg_get_nth_request(fp, m)); ++m) {
+		for (m = 0, srp = fp->headrp;
+				srp != NULL;
+				++m, srp = srp->nextrp) {
 			hp = &srp->header;
 			new_interface = (hp->interface_id == '\0') ? 0 : 1;
 			if (srp->res_used) {
@@ -2557,6 +2512,7 @@ static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp)
 		}
 		if (0 == m)
 			seq_printf(s, "     No requests active\n");
+		read_unlock(&fp->rq_list_lock);
 	}
 }
 
@@ -2569,39 +2525,34 @@ static int sg_proc_seq_show_debug(struct seq_file *s, void *v)
 {
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
+	unsigned long iflags;
 
 	if (it && (0 == it->index)) {
 		seq_printf(s, "max_active_device=%d(origin 1)\n",
 			   (int)it->max);
 		seq_printf(s, " def_reserved_size=%d\n", sg_big_buff);
 	}
-	sdp = it ? sg_get_dev(it->index) : NULL;
-	if (sdp) {
-		struct scsi_device *scsidp = sdp->device;
 
-		if (NULL == scsidp) {
-			seq_printf(s, "device %d detached ??\n", 
-				   (int)it->index);
-			return 0;
-		}
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? sg_lookup_dev(it->index) : NULL;
+	if (sdp && !list_empty(&sdp->sfds)) {
+		struct scsi_device *scsidp = sdp->device;
 
-		if (sg_get_nth_sfp(sdp, 0)) {
-			seq_printf(s, " >>> device=%s ",
-				sdp->disk->disk_name);
-			if (sdp->detached)
-				seq_printf(s, "detached pending close ");
-			else
-				seq_printf
-				    (s, "scsi%d chan=%d id=%d lun=%d   em=%d",
-				     scsidp->host->host_no,
-				     scsidp->channel, scsidp->id,
-				     scsidp->lun,
-				     scsidp->host->hostt->emulated);
-			seq_printf(s, " sg_tablesize=%d excl=%d\n",
-				   sdp->sg_tablesize, sdp->exclude);
-		}
+		seq_printf(s, " >>> device=%s ", sdp->disk->disk_name);
+		if (sdp->detached)
+			seq_printf(s, "detached pending close ");
+		else
+			seq_printf
+			    (s, "scsi%d chan=%d id=%d lun=%d   em=%d",
+			     scsidp->host->host_no,
+			     scsidp->channel, scsidp->id,
+			     scsidp->lun,
+			     scsidp->host->hostt->emulated);
+		seq_printf(s, " sg_tablesize=%d excl=%d\n",
+			   sdp->sg_tablesize, sdp->exclude);
 		sg_proc_debug_helper(s, sdp);
 	}
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index c6f19ee8f2cb..eb24efea8f14 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -374,9 +374,9 @@ static int st_chk_result(struct scsi_tape *STp, struct st_request * SRpnt)
 	if (!debugging) { /* Abnormal conditions for tape */
 		if (!cmdstatp->have_sense)
 			printk(KERN_WARNING
-			       "%s: Error %x (sugg. bt 0x%x, driver bt 0x%x, host bt 0x%x).\n",
-			       name, result, suggestion(result),
-			       driver_byte(result) & DRIVER_MASK, host_byte(result));
+			       "%s: Error %x (driver bt 0x%x, host bt 0x%x).\n",
+			       name, result, driver_byte(result),
+			       host_byte(result));
 		else if (cmdstatp->have_sense &&
 			 scode != NO_SENSE &&
 			 scode != RECOVERED_ERROR &&
diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index a3a18ad73125..47b614e8580c 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -1,7 +1,7 @@
 /*
  * SuperTrak EX Series Storage Controller driver for Linux
  *
- *	Copyright (C) 2005, 2006 Promise Technology Inc.
+ *	Copyright (C) 2005-2009 Promise Technology Inc.
  *
  *	This program is free software; you can redistribute it and/or
  *	modify it under the terms of the GNU General Public License
@@ -36,8 +36,8 @@
 #include <scsi/scsi_eh.h>
 
 #define DRV_NAME "stex"
-#define ST_DRIVER_VERSION "3.6.0000.1"
-#define ST_VER_MAJOR 		3
+#define ST_DRIVER_VERSION "4.6.0000.1"
+#define ST_VER_MAJOR 		4
 #define ST_VER_MINOR 		6
 #define ST_OEM 			0
 #define ST_BUILD_VER 		1
@@ -103,7 +103,7 @@ enum {
 	MU_REQ_COUNT				= (MU_MAX_REQUEST + 1),
 	MU_STATUS_COUNT				= (MU_MAX_REQUEST + 1),
 
-	STEX_CDB_LENGTH				= MAX_COMMAND_SIZE,
+	STEX_CDB_LENGTH				= 16,
 	REQ_VARIABLE_LEN			= 1024,
 	STATUS_VAR_LEN				= 128,
 	ST_CAN_QUEUE				= MU_MAX_REQUEST,
@@ -114,15 +114,19 @@ enum {
 	SG_CF_EOT				= 0x80,	/* end of table */
 	SG_CF_64B				= 0x40,	/* 64 bit item */
 	SG_CF_HOST				= 0x20,	/* sg in host memory */
+	MSG_DATA_DIR_ND				= 0,
+	MSG_DATA_DIR_IN				= 1,
+	MSG_DATA_DIR_OUT			= 2,
 
 	st_shasta				= 0,
 	st_vsc					= 1,
 	st_vsc1					= 2,
 	st_yosemite				= 3,
+	st_seq					= 4,
 
 	PASSTHRU_REQ_TYPE			= 0x00000001,
 	PASSTHRU_REQ_NO_WAKEUP			= 0x00000100,
-	ST_INTERNAL_TIMEOUT			= 30,
+	ST_INTERNAL_TIMEOUT			= 180,
 
 	ST_TO_CMD				= 0,
 	ST_FROM_CMD				= 1,
@@ -152,35 +156,6 @@ enum {
 	ST_ADDITIONAL_MEM			= 0x200000,
 };
 
-/* SCSI inquiry data */
-typedef struct st_inq {
-	u8 DeviceType			:5;
-	u8 DeviceTypeQualifier		:3;
-	u8 DeviceTypeModifier		:7;
-	u8 RemovableMedia		:1;
-	u8 Versions;
-	u8 ResponseDataFormat		:4;
-	u8 HiSupport			:1;
-	u8 NormACA			:1;
-	u8 ReservedBit			:1;
-	u8 AERC				:1;
-	u8 AdditionalLength;
-	u8 Reserved[2];
-	u8 SoftReset			:1;
-	u8 CommandQueue			:1;
-	u8 Reserved2			:1;
-	u8 LinkedCommands		:1;
-	u8 Synchronous			:1;
-	u8 Wide16Bit			:1;
-	u8 Wide32Bit			:1;
-	u8 RelativeAddressing		:1;
-	u8 VendorId[8];
-	u8 ProductId[16];
-	u8 ProductRevisionLevel[4];
-	u8 VendorSpecific[20];
-	u8 Reserved3[40];
-} ST_INQ;
-
 struct st_sgitem {
 	u8 ctrl;	/* SG_CF_xxx */
 	u8 reserved[3];
@@ -222,7 +197,7 @@ struct req_msg {
 	u8 target;
 	u8 task_attr;
 	u8 task_manage;
-	u8 prd_entry;
+	u8 data_dir;
 	u8 payload_sz;		/* payload size in 4-byte, not used */
 	u8 cdb[STEX_CDB_LENGTH];
 	u8 variable[REQ_VARIABLE_LEN];
@@ -284,7 +259,7 @@ struct st_drvver {
 #define MU_REQ_BUFFER_SIZE	(MU_REQ_COUNT * sizeof(struct req_msg))
 #define MU_STATUS_BUFFER_SIZE	(MU_STATUS_COUNT * sizeof(struct status_msg))
 #define MU_BUFFER_SIZE		(MU_REQ_BUFFER_SIZE + MU_STATUS_BUFFER_SIZE)
-#define STEX_EXTRA_SIZE		max(sizeof(struct st_frame), sizeof(ST_INQ))
+#define STEX_EXTRA_SIZE		sizeof(struct st_frame)
 #define STEX_BUFFER_SIZE	(MU_BUFFER_SIZE + STEX_EXTRA_SIZE)
 
 struct st_ccb {
@@ -346,8 +321,8 @@ MODULE_VERSION(ST_DRIVER_VERSION);
 static void stex_gettime(__le32 *time)
 {
 	struct timeval tv;
-	do_gettimeofday(&tv);
 
+	do_gettimeofday(&tv);
 	*time = cpu_to_le32(tv.tv_sec & 0xffffffff);
 	*(time + 1) = cpu_to_le32((tv.tv_sec >> 16) >> 16);
 }
@@ -368,7 +343,7 @@ static void stex_invalid_field(struct scsi_cmnd *cmd,
 {
 	cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION;
 
-	/* "Invalid field in cbd" */
+	/* "Invalid field in cdb" */
 	scsi_build_sense_buffer(0, cmd->sense_buffer, ILLEGAL_REQUEST, 0x24,
 				0x0);
 	done(cmd);
@@ -497,6 +472,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* done)(struct scsi_cmnd *))
 	unsigned int id,lun;
 	struct req_msg *req;
 	u16 tag;
+
 	host = cmd->device->host;
 	id = cmd->device->id;
 	lun = cmd->device->lun;
@@ -508,6 +484,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* done)(struct scsi_cmnd *))
 		static char ms10_caching_page[12] =
 			{ 0, 0x12, 0, 0, 0, 0, 0, 0, 0x8, 0xa, 0x4, 0 };
 		unsigned char page;
+
 		page = cmd->cmnd[2] & 0x3f;
 		if (page == 0x8 || page == 0x3f) {
 			scsi_sg_copy_from_buffer(cmd, ms10_caching_page,
@@ -551,6 +528,7 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* done)(struct scsi_cmnd *))
 		if (cmd->cmnd[1] == PASSTHRU_GET_DRVVER) {
 			struct st_drvver ver;
 			size_t cp_len = sizeof(ver);
+
 			ver.major = ST_VER_MAJOR;
 			ver.minor = ST_VER_MINOR;
 			ver.oem = ST_OEM;
@@ -584,6 +562,13 @@ stex_queuecommand(struct scsi_cmnd *cmd, void (* done)(struct scsi_cmnd *))
 	/* cdb */
 	memcpy(req->cdb, cmd->cmnd, STEX_CDB_LENGTH);
 
+	if (cmd->sc_data_direction == DMA_FROM_DEVICE)
+		req->data_dir = MSG_DATA_DIR_IN;
+	else if (cmd->sc_data_direction == DMA_TO_DEVICE)
+		req->data_dir = MSG_DATA_DIR_OUT;
+	else
+		req->data_dir = MSG_DATA_DIR_ND;
+
 	hba->ccb[tag].cmd = cmd;
 	hba->ccb[tag].sense_bufflen = SCSI_SENSE_BUFFERSIZE;
 	hba->ccb[tag].sense_buffer = cmd->sense_buffer;
@@ -642,6 +627,7 @@ static void stex_copy_data(struct st_ccb *ccb,
 	struct status_msg *resp, unsigned int variable)
 {
 	size_t count = variable;
+
 	if (resp->scsi_status != SAM_STAT_GOOD) {
 		if (ccb->sense_buffer != NULL)
 			memcpy(ccb->sense_buffer, resp->variable,
@@ -661,24 +647,6 @@ static void stex_ys_commands(struct st_hba *hba,
 		resp->scsi_status != SAM_STAT_CHECK_CONDITION) {
 		scsi_set_resid(ccb->cmd, scsi_bufflen(ccb->cmd) -
 			le32_to_cpu(*(__le32 *)&resp->variable[0]));
-		return;
-	}
-
-	if (resp->srb_status != 0)
-		return;
-
-	/* determine inquiry command status by DeviceTypeQualifier */
-	if (ccb->cmd->cmnd[0] == INQUIRY &&
-		resp->scsi_status == SAM_STAT_GOOD) {
-		ST_INQ *inq_data;
-
-		scsi_sg_copy_to_buffer(ccb->cmd, hba->copy_buffer,
-				       STEX_EXTRA_SIZE);
-		inq_data = (ST_INQ *)hba->copy_buffer;
-		if (inq_data->DeviceTypeQualifier != 0)
-			ccb->srb_status = SRB_STATUS_SELECTION_TIMEOUT;
-		else
-			ccb->srb_status = SRB_STATUS_SUCCESS;
 	}
 }
 
@@ -746,6 +714,7 @@ static void stex_mu_intr(struct st_hba *hba, u32 doorbell)
 				stex_copy_data(ccb, resp, size);
 		}
 
+		ccb->req = NULL;
 		ccb->srb_status = resp->srb_status;
 		ccb->scsi_status = resp->scsi_status;
 
@@ -983,6 +952,7 @@ static int stex_reset(struct scsi_cmnd *cmd)
 	struct st_hba *hba;
 	unsigned long flags;
 	unsigned long before;
+
 	hba = (struct st_hba *) &cmd->device->host->hostdata[0];
 
 	printk(KERN_INFO DRV_NAME
@@ -1067,6 +1037,7 @@ static struct scsi_host_template driver_template = {
 static int stex_set_dma_mask(struct pci_dev * pdev)
 {
 	int ret;
+
 	if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)
 		&& !pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK))
 		return 0;
@@ -1124,9 +1095,9 @@ stex_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 
 	hba->cardtype = (unsigned int) id->driver_data;
-	if (hba->cardtype == st_vsc && (pdev->subsystem_device & 0xf) == 0x1)
+	if (hba->cardtype == st_vsc && (pdev->subsystem_device & 1))
 		hba->cardtype = st_vsc1;
-	hba->dma_size = (hba->cardtype == st_vsc1) ?
+	hba->dma_size = (hba->cardtype == st_vsc1 || hba->cardtype == st_seq) ?
 		(STEX_BUFFER_SIZE + ST_ADDITIONAL_MEM) : (STEX_BUFFER_SIZE);
 	hba->dma_mem = dma_alloc_coherent(&pdev->dev,
 		hba->dma_size, &hba->dma_handle, GFP_KERNEL);
@@ -1146,10 +1117,10 @@ stex_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		host->max_lun = 8;
 		host->max_id = 16 + 1;
 	} else if (hba->cardtype == st_yosemite) {
-		host->max_lun = 128;
+		host->max_lun = 256;
 		host->max_id = 1 + 1;
 	} else {
-		/* st_vsc and st_vsc1 */
+		/* st_vsc , st_vsc1 and st_seq */
 		host->max_lun = 1;
 		host->max_id = 128 + 1;
 	}
@@ -1299,18 +1270,10 @@ static struct pci_device_id stex_pci_tbl[] = {
 	{ 0x105a, 0x7250, PCI_ANY_ID, PCI_ANY_ID, 0, 0, st_vsc },
 
 	/* st_yosemite */
-	{ 0x105a, 0x8650, PCI_ANY_ID, 0x4600, 0, 0,
-		st_yosemite }, /* SuperTrak EX4650 */
-	{ 0x105a, 0x8650, PCI_ANY_ID, 0x4610, 0, 0,
-		st_yosemite }, /* SuperTrak EX4650o */
-	{ 0x105a, 0x8650, PCI_ANY_ID, 0x8600, 0, 0,
-		st_yosemite }, /* SuperTrak EX8650EL */
-	{ 0x105a, 0x8650, PCI_ANY_ID, 0x8601, 0, 0,
-		st_yosemite }, /* SuperTrak EX8650 */
-	{ 0x105a, 0x8650, PCI_ANY_ID, 0x8602, 0, 0,
-		st_yosemite }, /* SuperTrak EX8654 */
-	{ 0x105a, 0x8650, PCI_ANY_ID, PCI_ANY_ID, 0, 0,
-		st_yosemite }, /* generic st_yosemite */
+	{ 0x105a, 0x8650, PCI_ANY_ID, PCI_ANY_ID, 0, 0, st_yosemite },
+
+	/* st_seq */
+	{ 0x105a, 0x3360, PCI_ANY_ID, PCI_ANY_ID, 0, 0, st_seq },
 	{ }	/* terminate list */
 };
 MODULE_DEVICE_TABLE(pci, stex_pci_tbl);
diff --git a/drivers/scsi/sym53c8xx_2/sym_glue.c b/drivers/scsi/sym53c8xx_2/sym_glue.c
index f4e6cde1fd0d..23e782015880 100644
--- a/drivers/scsi/sym53c8xx_2/sym_glue.c
+++ b/drivers/scsi/sym53c8xx_2/sym_glue.c
@@ -792,9 +792,9 @@ static int sym53c8xx_slave_configure(struct scsi_device *sdev)
 
 	/*
 	 *  Select queue depth from driver setup.
-	 *  Donnot use more than configured by user.
-	 *  Use at least 2.
-	 *  Donnot use more than our maximum.
+	 *  Do not use more than configured by user.
+	 *  Use at least 1.
+	 *  Do not use more than our maximum.
 	 */
 	reqtags = sym_driver_setup.max_tag;
 	if (reqtags > tp->usrtags)
@@ -803,7 +803,7 @@ static int sym53c8xx_slave_configure(struct scsi_device *sdev)
 		reqtags = 0;
 	if (reqtags > SYM_CONF_MAX_TAG)
 		reqtags = SYM_CONF_MAX_TAG;
-	depth_to_use = reqtags ? reqtags : 2;
+	depth_to_use = reqtags ? reqtags : 1;
 	scsi_adjust_queue_depth(sdev,
 				sdev->tagged_supported ? MSG_SIMPLE_TAG : 0,
 				depth_to_use);
@@ -1236,14 +1236,29 @@ static int sym53c8xx_proc_info(struct Scsi_Host *shost, char *buffer,
 #endif /* SYM_LINUX_PROC_INFO_SUPPORT */
 
 /*
+ * Free resources claimed by sym_iomap_device().  Note that
+ * sym_free_resources() should be used instead of this function after calling
+ * sym_attach().
+ */
+static void __devinit
+sym_iounmap_device(struct sym_device *device)
+{
+	if (device->s.ioaddr)
+		pci_iounmap(device->pdev, device->s.ioaddr);
+	if (device->s.ramaddr)
+		pci_iounmap(device->pdev, device->s.ramaddr);
+}
+
+/*
  *	Free controller resources.
  */
-static void sym_free_resources(struct sym_hcb *np, struct pci_dev *pdev)
+static void sym_free_resources(struct sym_hcb *np, struct pci_dev *pdev,
+		int do_free_irq)
 {
 	/*
 	 *  Free O/S specific resources.
 	 */
-	if (pdev->irq)
+	if (do_free_irq)
 		free_irq(pdev->irq, np->s.host);
 	if (np->s.ioaddr)
 		pci_iounmap(pdev, np->s.ioaddr);
@@ -1271,10 +1286,11 @@ static struct Scsi_Host * __devinit sym_attach(struct scsi_host_template *tpnt,
 {
 	struct sym_data *sym_data;
 	struct sym_hcb *np = NULL;
-	struct Scsi_Host *shost;
+	struct Scsi_Host *shost = NULL;
 	struct pci_dev *pdev = dev->pdev;
 	unsigned long flags;
 	struct sym_fw *fw;
+	int do_free_irq = 0;
 
 	printk(KERN_INFO "sym%d: <%s> rev 0x%x at pci %s irq %u\n",
 		unit, dev->chip.name, pdev->revision, pci_name(pdev),
@@ -1285,11 +1301,11 @@ static struct Scsi_Host * __devinit sym_attach(struct scsi_host_template *tpnt,
 	 */
 	fw = sym_find_firmware(&dev->chip);
 	if (!fw)
-		return NULL;
+		goto attach_failed;
 
 	shost = scsi_host_alloc(tpnt, sizeof(*sym_data));
 	if (!shost)
-		return NULL;
+		goto attach_failed;
 	sym_data = shost_priv(shost);
 
 	/*
@@ -1319,6 +1335,10 @@ static struct Scsi_Host * __devinit sym_attach(struct scsi_host_template *tpnt,
 	np->maxoffs	= dev->chip.offset_max;
 	np->maxburst	= dev->chip.burst_max;
 	np->myaddr	= dev->host_id;
+	np->mmio_ba	= (u32)dev->mmio_base;
+	np->ram_ba	= (u32)dev->ram_base;
+	np->s.ioaddr	= dev->s.ioaddr;
+	np->s.ramaddr	= dev->s.ramaddr;
 
 	/*
 	 *  Edit its name.
@@ -1334,22 +1354,6 @@ static struct Scsi_Host * __devinit sym_attach(struct scsi_host_template *tpnt,
 		goto attach_failed;
 	}
 
-	/*
-	 *  Try to map the controller chip to
-	 *  virtual and physical memory.
-	 */
-	np->mmio_ba = (u32)dev->mmio_base;
-	np->s.ioaddr	= dev->s.ioaddr;
-	np->s.ramaddr	= dev->s.ramaddr;
-
-	/*
-	 *  Map on-chip RAM if present and supported.
-	 */
-	if (!(np->features & FE_RAM))
-		dev->ram_base = 0;
-	if (dev->ram_base)
-		np->ram_ba = (u32)dev->ram_base;
-
 	if (sym_hcb_attach(shost, fw, dev->nvram))
 		goto attach_failed;
 
@@ -1364,6 +1368,7 @@ static struct Scsi_Host * __devinit sym_attach(struct scsi_host_template *tpnt,
 			sym_name(np), pdev->irq);
 		goto attach_failed;
 	}
+	do_free_irq = 1;
 
 	/*
 	 *  After SCSI devices have been opened, we cannot
@@ -1416,12 +1421,13 @@ static struct Scsi_Host * __devinit sym_attach(struct scsi_host_template *tpnt,
 		   "TERMINATION, DEVICE POWER etc.!\n", sym_name(np));
 	spin_unlock_irqrestore(shost->host_lock, flags);
  attach_failed:
-	if (!shost)
-		return NULL;
-	printf_info("%s: giving up ...\n", sym_name(np));
+	printf_info("sym%d: giving up ...\n", unit);
 	if (np)
-		sym_free_resources(np, pdev);
-	scsi_host_put(shost);
+		sym_free_resources(np, pdev, do_free_irq);
+	else
+		sym_iounmap_device(dev);
+	if (shost)
+		scsi_host_put(shost);
 
 	return NULL;
  }
@@ -1550,30 +1556,28 @@ static int __devinit sym_set_workarounds(struct sym_device *device)
 }
 
 /*
- *  Read and check the PCI configuration for any detected NCR 
- *  boards and save data for attaching after all boards have 
- *  been detected.
+ * Map HBA registers and on-chip SRAM (if present).
  */
-static void __devinit
-sym_init_device(struct pci_dev *pdev, struct sym_device *device)
+static int __devinit
+sym_iomap_device(struct sym_device *device)
 {
-	int i = 2;
+	struct pci_dev *pdev = device->pdev;
 	struct pci_bus_region bus_addr;
-
-	device->host_id = SYM_SETUP_HOST_ID;
-	device->pdev = pdev;
+	int i = 2;
 
 	pcibios_resource_to_bus(pdev, &bus_addr, &pdev->resource[1]);
 	device->mmio_base = bus_addr.start;
 
-	/*
-	 * If the BAR is 64-bit, resource 2 will be occupied by the
-	 * upper 32 bits
-	 */
-	if (!pdev->resource[i].flags)
-		i++;
-	pcibios_resource_to_bus(pdev, &bus_addr, &pdev->resource[i]);
-	device->ram_base = bus_addr.start;
+	if (device->chip.features & FE_RAM) {
+		/*
+		 * If the BAR is 64-bit, resource 2 will be occupied by the
+		 * upper 32 bits
+		 */
+		if (!pdev->resource[i].flags)
+			i++;
+		pcibios_resource_to_bus(pdev, &bus_addr, &pdev->resource[i]);
+		device->ram_base = bus_addr.start;
+	}
 
 #ifdef CONFIG_SCSI_SYM53C8XX_MMIO
 	if (device->mmio_base)
@@ -1583,9 +1587,21 @@ sym_init_device(struct pci_dev *pdev, struct sym_device *device)
 	if (!device->s.ioaddr)
 		device->s.ioaddr = pci_iomap(pdev, 0,
 						pci_resource_len(pdev, 0));
-	if (device->ram_base)
+	if (!device->s.ioaddr) {
+		dev_err(&pdev->dev, "could not map registers; giving up.\n");
+		return -EIO;
+	}
+	if (device->ram_base) {
 		device->s.ramaddr = pci_iomap(pdev, i,
 						pci_resource_len(pdev, i));
+		if (!device->s.ramaddr) {
+			dev_warn(&pdev->dev,
+				"could not map SRAM; continuing anyway.\n");
+			device->ram_base = 0;
+		}
+	}
+
+	return 0;
 }
 
 /*
@@ -1659,7 +1675,8 @@ static int sym_detach(struct Scsi_Host *shost, struct pci_dev *pdev)
 	udelay(10);
 	OUTB(np, nc_istat, 0);
 
-	sym_free_resources(np, pdev);
+	sym_free_resources(np, pdev, 1);
+	scsi_host_put(shost);
 
 	return 1;
 }
@@ -1696,9 +1713,13 @@ static int __devinit sym2_probe(struct pci_dev *pdev,
 	struct sym_device sym_dev;
 	struct sym_nvram nvram;
 	struct Scsi_Host *shost;
+	int do_iounmap = 0;
+	int do_disable_device = 1;
 
 	memset(&sym_dev, 0, sizeof(sym_dev));
 	memset(&nvram, 0, sizeof(nvram));
+	sym_dev.pdev = pdev;
+	sym_dev.host_id = SYM_SETUP_HOST_ID;
 
 	if (pci_enable_device(pdev))
 		goto leave;
@@ -1708,12 +1729,17 @@ static int __devinit sym2_probe(struct pci_dev *pdev,
 	if (pci_request_regions(pdev, NAME53C8XX))
 		goto disable;
 
-	sym_init_device(pdev, &sym_dev);
 	if (sym_check_supported(&sym_dev))
 		goto free;
 
-	if (sym_check_raid(&sym_dev))
-		goto leave;	/* Don't disable the device */
+	if (sym_iomap_device(&sym_dev))
+		goto free;
+	do_iounmap = 1;
+
+	if (sym_check_raid(&sym_dev)) {
+		do_disable_device = 0;	/* Don't disable the device */
+		goto free;
+	}
 
 	if (sym_set_workarounds(&sym_dev))
 		goto free;
@@ -1722,6 +1748,7 @@ static int __devinit sym2_probe(struct pci_dev *pdev,
 
 	sym_get_nvram(&sym_dev, &nvram);
 
+	do_iounmap = 0; /* Don't sym_iounmap_device() after sym_attach(). */
 	shost = sym_attach(&sym2_template, attach_count, &sym_dev);
 	if (!shost)
 		goto free;
@@ -1737,9 +1764,12 @@ static int __devinit sym2_probe(struct pci_dev *pdev,
  detach:
 	sym_detach(pci_get_drvdata(pdev), pdev);
  free:
+	if (do_iounmap)
+		sym_iounmap_device(&sym_dev);
 	pci_release_regions(pdev);
  disable:
-	pci_disable_device(pdev);
+	if (do_disable_device)
+		pci_disable_device(pdev);
  leave:
 	return -ENODEV;
 }
@@ -1749,7 +1779,6 @@ static void sym2_remove(struct pci_dev *pdev)
 	struct Scsi_Host *shost = pci_get_drvdata(pdev);
 
 	scsi_remove_host(shost);
-	scsi_host_put(shost);
 	sym_detach(shost, pdev);
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.c b/drivers/scsi/sym53c8xx_2/sym_hipd.c
index 98df1651404f..ccea7db59f49 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.c
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.c
@@ -1433,13 +1433,12 @@ static int sym_prepare_nego(struct sym_hcb *np, struct sym_ccb *cp, u_char *msgp
 	 * Many devices implement PPR in a buggy way, so only use it if we
 	 * really want to.
 	 */
-	if (goal->offset &&
-	    (goal->iu || goal->dt || goal->qas || (goal->period < 0xa))) {
+	if (goal->renego == NS_PPR || (goal->offset &&
+	    (goal->iu || goal->dt || goal->qas || (goal->period < 0xa)))) {
 		nego = NS_PPR;
-	} else if (spi_width(starget) != goal->width) {
+	} else if (goal->renego == NS_WIDE || goal->width) {
 		nego = NS_WIDE;
-	} else if (spi_period(starget) != goal->period ||
-		   spi_offset(starget) != goal->offset) {
+	} else if (goal->renego == NS_SYNC || goal->offset) {
 		nego = NS_SYNC;
 	} else {
 		goal->check_nego = 0;
@@ -2040,6 +2039,29 @@ static void sym_settrans(struct sym_hcb *np, int target, u_char opts, u_char ofs
 	}
 }
 
+static void sym_announce_transfer_rate(struct sym_tcb *tp)
+{
+	struct scsi_target *starget = tp->starget;
+
+	if (tp->tprint.period != spi_period(starget) ||
+	    tp->tprint.offset != spi_offset(starget) ||
+	    tp->tprint.width != spi_width(starget) ||
+	    tp->tprint.iu != spi_iu(starget) ||
+	    tp->tprint.dt != spi_dt(starget) ||
+	    tp->tprint.qas != spi_qas(starget) ||
+	    !tp->tprint.check_nego) {
+		tp->tprint.period = spi_period(starget);
+		tp->tprint.offset = spi_offset(starget);
+		tp->tprint.width = spi_width(starget);
+		tp->tprint.iu = spi_iu(starget);
+		tp->tprint.dt = spi_dt(starget);
+		tp->tprint.qas = spi_qas(starget);
+		tp->tprint.check_nego = 1;
+
+		spi_display_xfer_agreement(starget);
+	}
+}
+
 /*
  *  We received a WDTR.
  *  Let everything be aware of the changes.
@@ -2049,11 +2071,13 @@ static void sym_setwide(struct sym_hcb *np, int target, u_char wide)
 	struct sym_tcb *tp = &np->target[target];
 	struct scsi_target *starget = tp->starget;
 
-	if (spi_width(starget) == wide)
-		return;
-
 	sym_settrans(np, target, 0, 0, 0, wide, 0, 0);
 
+	if (wide)
+		tp->tgoal.renego = NS_WIDE;
+	else
+		tp->tgoal.renego = 0;
+	tp->tgoal.check_nego = 0;
 	tp->tgoal.width = wide;
 	spi_offset(starget) = 0;
 	spi_period(starget) = 0;
@@ -2063,7 +2087,7 @@ static void sym_setwide(struct sym_hcb *np, int target, u_char wide)
 	spi_qas(starget) = 0;
 
 	if (sym_verbose >= 3)
-		spi_display_xfer_agreement(starget);
+		sym_announce_transfer_rate(tp);
 }
 
 /*
@@ -2080,6 +2104,12 @@ sym_setsync(struct sym_hcb *np, int target,
 
 	sym_settrans(np, target, 0, ofs, per, wide, div, fak);
 
+	if (wide)
+		tp->tgoal.renego = NS_WIDE;
+	else if (ofs)
+		tp->tgoal.renego = NS_SYNC;
+	else
+		tp->tgoal.renego = 0;
 	spi_period(starget) = per;
 	spi_offset(starget) = ofs;
 	spi_iu(starget) = spi_dt(starget) = spi_qas(starget) = 0;
@@ -2090,7 +2120,7 @@ sym_setsync(struct sym_hcb *np, int target,
 		tp->tgoal.check_nego = 0;
 	}
 
-	spi_display_xfer_agreement(starget);
+	sym_announce_transfer_rate(tp);
 }
 
 /*
@@ -2106,6 +2136,10 @@ sym_setpprot(struct sym_hcb *np, int target, u_char opts, u_char ofs,
 
 	sym_settrans(np, target, opts, ofs, per, wide, div, fak);
 
+	if (wide || ofs)
+		tp->tgoal.renego = NS_PPR;
+	else
+		tp->tgoal.renego = 0;
 	spi_width(starget) = tp->tgoal.width = wide;
 	spi_period(starget) = tp->tgoal.period = per;
 	spi_offset(starget) = tp->tgoal.offset = ofs;
@@ -2114,7 +2148,7 @@ sym_setpprot(struct sym_hcb *np, int target, u_char opts, u_char ofs,
 	spi_qas(starget) = tp->tgoal.qas = !!(opts & PPR_OPT_QAS);
 	tp->tgoal.check_nego = 0;
 
-	spi_display_xfer_agreement(starget);
+	sym_announce_transfer_rate(tp);
 }
 
 /*
@@ -3516,6 +3550,7 @@ static void sym_sir_task_recovery(struct sym_hcb *np, int num)
 			spi_dt(starget) = 0;
 			spi_qas(starget) = 0;
 			tp->tgoal.check_nego = 1;
+			tp->tgoal.renego = 0;
 		}
 
 		/*
@@ -5135,9 +5170,14 @@ int sym_queue_scsiio(struct sym_hcb *np, struct scsi_cmnd *cmd, struct sym_ccb *
 	/*
 	 *  Build a negotiation message if needed.
 	 *  (nego_status is filled by sym_prepare_nego())
+	 *
+	 *  Always negotiate on INQUIRY and REQUEST SENSE.
+	 *
 	 */
 	cp->nego_status = 0;
-	if (tp->tgoal.check_nego && !tp->nego_cp && lp) {
+	if ((tp->tgoal.check_nego ||
+	     cmd->cmnd[0] == INQUIRY || cmd->cmnd[0] == REQUEST_SENSE) &&
+	    !tp->nego_cp && lp) {
 		msglen += sym_prepare_nego(np, cp, msgptr + msglen);
 	}
 
diff --git a/drivers/scsi/sym53c8xx_2/sym_hipd.h b/drivers/scsi/sym53c8xx_2/sym_hipd.h
index ad078805e62b..61d28fcfffbf 100644
--- a/drivers/scsi/sym53c8xx_2/sym_hipd.h
+++ b/drivers/scsi/sym53c8xx_2/sym_hipd.h
@@ -354,6 +354,7 @@ struct sym_trans {
 	unsigned int dt:1;
 	unsigned int qas:1;
 	unsigned int check_nego:1;
+	unsigned int renego:2;
 };
 
 /*
@@ -419,6 +420,9 @@ struct sym_tcb {
 	/* Transfer goal */
 	struct sym_trans tgoal;
 
+	/* Last printed transfer speed */
+	struct sym_trans tprint;
+
 	/*
 	 * Keep track of the CCB used for the negotiation in order
 	 * to ensure that only 1 negotiation is queued at a time.
diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index d48c8553539d..49aedb36dc19 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -787,7 +787,7 @@ void usb_stor_invoke_transport(struct scsi_cmnd *srb, struct us_data *us)
 	/* Did we transfer less than the minimum amount required? */
 	if ((srb->result == SAM_STAT_GOOD || srb->sense_buffer[2] == 0) &&
 			scsi_bufflen(srb) - scsi_get_resid(srb) < srb->underflow)
-		srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);
+		srb->result = DID_ERROR << 16;
 
 	last_sector_hacks(us, srb);
 	return;
diff --git a/drivers/watchdog/rdc321x_wdt.c b/drivers/watchdog/rdc321x_wdt.c
index bf92802f2bbe..36e221beedcd 100644
--- a/drivers/watchdog/rdc321x_wdt.c
+++ b/drivers/watchdog/rdc321x_wdt.c
@@ -37,7 +37,7 @@
 #include <linux/io.h>
 #include <linux/uaccess.h>
 
-#include <asm/mach-rdc321x/rdc321x_defs.h>
+#include <asm/rdc321x_defs.h>
 
 #define RDC_WDT_MASK	0x80000000 /* Mask */
 #define RDC_WDT_EN	0x00800000 /* Enable bit */
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index eb0dfdeaa949..30963af5dba0 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -26,9 +26,11 @@
 #include <linux/irq.h>
 #include <linux/module.h>
 #include <linux/string.h>
+#include <linux/bootmem.h>
 
 #include <asm/ptrace.h>
 #include <asm/irq.h>
+#include <asm/idle.h>
 #include <asm/sync_bitops.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/hypervisor.h>
@@ -50,36 +52,55 @@ static DEFINE_PER_CPU(int, virq_to_irq[NR_VIRQS]) = {[0 ... NR_VIRQS-1] = -1};
 /* IRQ <-> IPI mapping */
 static DEFINE_PER_CPU(int, ipi_to_irq[XEN_NR_IPIS]) = {[0 ... XEN_NR_IPIS-1] = -1};
 
-/* Packed IRQ information: binding type, sub-type index, and event channel. */
-struct packed_irq
-{
-	unsigned short evtchn;
-	unsigned char index;
-	unsigned char type;
-};
-
-static struct packed_irq irq_info[NR_IRQS];
-
-/* Binding types. */
-enum {
-	IRQT_UNBOUND,
+/* Interrupt types. */
+enum xen_irq_type {
+	IRQT_UNBOUND = 0,
 	IRQT_PIRQ,
 	IRQT_VIRQ,
 	IRQT_IPI,
 	IRQT_EVTCHN
 };
 
-/* Convenient shorthand for packed representation of an unbound IRQ. */
-#define IRQ_UNBOUND	mk_irq_info(IRQT_UNBOUND, 0, 0)
+/*
+ * Packed IRQ information:
+ * type - enum xen_irq_type
+ * event channel - irq->event channel mapping
+ * cpu - cpu this event channel is bound to
+ * index - type-specific information:
+ *    PIRQ - vector, with MSB being "needs EIO"
+ *    VIRQ - virq number
+ *    IPI - IPI vector
+ *    EVTCHN -
+ */
+struct irq_info
+{
+	enum xen_irq_type type;	/* type */
+	unsigned short evtchn;	/* event channel */
+	unsigned short cpu;	/* cpu bound */
+
+	union {
+		unsigned short virq;
+		enum ipi_vector ipi;
+		struct {
+			unsigned short gsi;
+			unsigned short vector;
+		} pirq;
+	} u;
+};
+
+static struct irq_info irq_info[NR_IRQS];
 
 static int evtchn_to_irq[NR_EVENT_CHANNELS] = {
 	[0 ... NR_EVENT_CHANNELS-1] = -1
 };
-static unsigned long cpu_evtchn_mask[NR_CPUS][NR_EVENT_CHANNELS/BITS_PER_LONG];
-static u8 cpu_evtchn[NR_EVENT_CHANNELS];
-
-/* Reference counts for bindings to IRQs. */
-static int irq_bindcount[NR_IRQS];
+struct cpu_evtchn_s {
+	unsigned long bits[NR_EVENT_CHANNELS/BITS_PER_LONG];
+};
+static struct cpu_evtchn_s *cpu_evtchn_mask_p;
+static inline unsigned long *cpu_evtchn_mask(int cpu)
+{
+	return cpu_evtchn_mask_p[cpu].bits;
+}
 
 /* Xen will never allocate port zero for any purpose. */
 #define VALID_EVTCHN(chn)	((chn) != 0)
@@ -87,27 +108,108 @@ static int irq_bindcount[NR_IRQS];
 static struct irq_chip xen_dynamic_chip;
 
 /* Constructor for packed IRQ information. */
-static inline struct packed_irq mk_irq_info(u32 type, u32 index, u32 evtchn)
+static struct irq_info mk_unbound_info(void)
+{
+	return (struct irq_info) { .type = IRQT_UNBOUND };
+}
+
+static struct irq_info mk_evtchn_info(unsigned short evtchn)
+{
+	return (struct irq_info) { .type = IRQT_EVTCHN, .evtchn = evtchn,
+			.cpu = 0 };
+}
+
+static struct irq_info mk_ipi_info(unsigned short evtchn, enum ipi_vector ipi)
 {
-	return (struct packed_irq) { evtchn, index, type };
+	return (struct irq_info) { .type = IRQT_IPI, .evtchn = evtchn,
+			.cpu = 0, .u.ipi = ipi };
+}
+
+static struct irq_info mk_virq_info(unsigned short evtchn, unsigned short virq)
+{
+	return (struct irq_info) { .type = IRQT_VIRQ, .evtchn = evtchn,
+			.cpu = 0, .u.virq = virq };
+}
+
+static struct irq_info mk_pirq_info(unsigned short evtchn,
+				    unsigned short gsi, unsigned short vector)
+{
+	return (struct irq_info) { .type = IRQT_PIRQ, .evtchn = evtchn,
+			.cpu = 0, .u.pirq = { .gsi = gsi, .vector = vector } };
 }
 
 /*
  * Accessors for packed IRQ information.
  */
-static inline unsigned int evtchn_from_irq(int irq)
+static struct irq_info *info_for_irq(unsigned irq)
+{
+	return &irq_info[irq];
+}
+
+static unsigned int evtchn_from_irq(unsigned irq)
 {
-	return irq_info[irq].evtchn;
+	return info_for_irq(irq)->evtchn;
 }
 
-static inline unsigned int index_from_irq(int irq)
+static enum ipi_vector ipi_from_irq(unsigned irq)
 {
-	return irq_info[irq].index;
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info == NULL);
+	BUG_ON(info->type != IRQT_IPI);
+
+	return info->u.ipi;
 }
 
-static inline unsigned int type_from_irq(int irq)
+static unsigned virq_from_irq(unsigned irq)
 {
-	return irq_info[irq].type;
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info == NULL);
+	BUG_ON(info->type != IRQT_VIRQ);
+
+	return info->u.virq;
+}
+
+static unsigned gsi_from_irq(unsigned irq)
+{
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info == NULL);
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	return info->u.pirq.gsi;
+}
+
+static unsigned vector_from_irq(unsigned irq)
+{
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info == NULL);
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	return info->u.pirq.vector;
+}
+
+static enum xen_irq_type type_from_irq(unsigned irq)
+{
+	return info_for_irq(irq)->type;
+}
+
+static unsigned cpu_from_irq(unsigned irq)
+{
+	return info_for_irq(irq)->cpu;
+}
+
+static unsigned int cpu_from_evtchn(unsigned int evtchn)
+{
+	int irq = evtchn_to_irq[evtchn];
+	unsigned ret = 0;
+
+	if (irq != -1)
+		ret = cpu_from_irq(irq);
+
+	return ret;
 }
 
 static inline unsigned long active_evtchns(unsigned int cpu,
@@ -115,7 +217,7 @@ static inline unsigned long active_evtchns(unsigned int cpu,
 					   unsigned int idx)
 {
 	return (sh->evtchn_pending[idx] &
-		cpu_evtchn_mask[cpu][idx] &
+		cpu_evtchn_mask(cpu)[idx] &
 		~sh->evtchn_mask[idx]);
 }
 
@@ -125,13 +227,13 @@ static void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu)
 
 	BUG_ON(irq == -1);
 #ifdef CONFIG_SMP
-	irq_to_desc(irq)->affinity = cpumask_of_cpu(cpu);
+	cpumask_copy(irq_to_desc(irq)->affinity, cpumask_of(cpu));
 #endif
 
-	__clear_bit(chn, cpu_evtchn_mask[cpu_evtchn[chn]]);
-	__set_bit(chn, cpu_evtchn_mask[cpu]);
+	__clear_bit(chn, cpu_evtchn_mask(cpu_from_irq(irq)));
+	__set_bit(chn, cpu_evtchn_mask(cpu));
 
-	cpu_evtchn[chn] = cpu;
+	irq_info[irq].cpu = cpu;
 }
 
 static void init_evtchn_cpu_bindings(void)
@@ -142,17 +244,11 @@ static void init_evtchn_cpu_bindings(void)
 
 	/* By default all event channels notify CPU#0. */
 	for_each_irq_desc(i, desc) {
-		desc->affinity = cpumask_of_cpu(0);
+		cpumask_copy(desc->affinity, cpumask_of(0));
 	}
 #endif
 
-	memset(cpu_evtchn, 0, sizeof(cpu_evtchn));
-	memset(cpu_evtchn_mask[0], ~0, sizeof(cpu_evtchn_mask[0]));
-}
-
-static inline unsigned int cpu_from_evtchn(unsigned int evtchn)
-{
-	return cpu_evtchn[evtchn];
+	memset(cpu_evtchn_mask(0), ~0, sizeof(cpu_evtchn_mask(0)));
 }
 
 static inline void clear_evtchn(int port)
@@ -232,9 +328,8 @@ static int find_unbound_irq(void)
 	int irq;
 	struct irq_desc *desc;
 
-	/* Only allocate from dynirq range */
 	for (irq = 0; irq < nr_irqs; irq++)
-		if (irq_bindcount[irq] == 0)
+		if (irq_info[irq].type == IRQT_UNBOUND)
 			break;
 
 	if (irq == nr_irqs)
@@ -244,6 +339,8 @@ static int find_unbound_irq(void)
 	if (WARN_ON(desc == NULL))
 		return -1;
 
+	dynamic_irq_init(irq);
+
 	return irq;
 }
 
@@ -258,16 +355,13 @@ int bind_evtchn_to_irq(unsigned int evtchn)
 	if (irq == -1) {
 		irq = find_unbound_irq();
 
-		dynamic_irq_init(irq);
 		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
 					      handle_level_irq, "event");
 
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_EVTCHN, 0, evtchn);
+		irq_info[irq] = mk_evtchn_info(evtchn);
 	}
 
-	irq_bindcount[irq]++;
-
 	spin_unlock(&irq_mapping_update_lock);
 
 	return irq;
@@ -282,12 +376,12 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
 	spin_lock(&irq_mapping_update_lock);
 
 	irq = per_cpu(ipi_to_irq, cpu)[ipi];
+
 	if (irq == -1) {
 		irq = find_unbound_irq();
 		if (irq < 0)
 			goto out;
 
-		dynamic_irq_init(irq);
 		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
 					      handle_level_irq, "ipi");
 
@@ -298,15 +392,12 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
 		evtchn = bind_ipi.port;
 
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_IPI, ipi, evtchn);
-
+		irq_info[irq] = mk_ipi_info(evtchn, ipi);
 		per_cpu(ipi_to_irq, cpu)[ipi] = irq;
 
 		bind_evtchn_to_cpu(evtchn, cpu);
 	}
 
-	irq_bindcount[irq]++;
-
  out:
 	spin_unlock(&irq_mapping_update_lock);
 	return irq;
@@ -332,20 +423,17 @@ static int bind_virq_to_irq(unsigned int virq, unsigned int cpu)
 
 		irq = find_unbound_irq();
 
-		dynamic_irq_init(irq);
 		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
 					      handle_level_irq, "virq");
 
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_VIRQ, virq, evtchn);
+		irq_info[irq] = mk_virq_info(evtchn, virq);
 
 		per_cpu(virq_to_irq, cpu)[virq] = irq;
 
 		bind_evtchn_to_cpu(evtchn, cpu);
 	}
 
-	irq_bindcount[irq]++;
-
 	spin_unlock(&irq_mapping_update_lock);
 
 	return irq;
@@ -358,7 +446,7 @@ static void unbind_from_irq(unsigned int irq)
 
 	spin_lock(&irq_mapping_update_lock);
 
-	if ((--irq_bindcount[irq] == 0) && VALID_EVTCHN(evtchn)) {
+	if (VALID_EVTCHN(evtchn)) {
 		close.port = evtchn;
 		if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
 			BUG();
@@ -366,11 +454,11 @@ static void unbind_from_irq(unsigned int irq)
 		switch (type_from_irq(irq)) {
 		case IRQT_VIRQ:
 			per_cpu(virq_to_irq, cpu_from_evtchn(evtchn))
-				[index_from_irq(irq)] = -1;
+				[virq_from_irq(irq)] = -1;
 			break;
 		case IRQT_IPI:
 			per_cpu(ipi_to_irq, cpu_from_evtchn(evtchn))
-				[index_from_irq(irq)] = -1;
+				[ipi_from_irq(irq)] = -1;
 			break;
 		default:
 			break;
@@ -380,7 +468,7 @@ static void unbind_from_irq(unsigned int irq)
 		bind_evtchn_to_cpu(evtchn, 0);
 
 		evtchn_to_irq[evtchn] = -1;
-		irq_info[irq] = IRQ_UNBOUND;
+		irq_info[irq] = mk_unbound_info();
 
 		dynamic_irq_cleanup(irq);
 	}
@@ -498,8 +586,8 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id)
 	for(i = 0; i < NR_EVENT_CHANNELS; i++) {
 		if (sync_test_bit(i, sh->evtchn_pending)) {
 			printk("  %d: event %d -> irq %d\n",
-				cpu_evtchn[i], i,
-				evtchn_to_irq[i]);
+			       cpu_from_evtchn(i), i,
+			       evtchn_to_irq[i]);
 		}
 	}
 
@@ -508,7 +596,6 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-
 /*
  * Search the CPUs pending events bitmasks.  For each one found, map
  * the event number to an irq, and feed it into do_IRQ() for
@@ -521,11 +608,15 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id)
 void xen_evtchn_do_upcall(struct pt_regs *regs)
 {
 	int cpu = get_cpu();
+	struct pt_regs *old_regs = set_irq_regs(regs);
 	struct shared_info *s = HYPERVISOR_shared_info;
 	struct vcpu_info *vcpu_info = __get_cpu_var(xen_vcpu);
 	static DEFINE_PER_CPU(unsigned, nesting_count);
  	unsigned count;
 
+	exit_idle();
+	irq_enter();
+
 	do {
 		unsigned long pending_words;
 
@@ -550,7 +641,7 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 				int irq = evtchn_to_irq[port];
 
 				if (irq != -1)
-					xen_do_IRQ(irq, regs);
+					handle_irq(irq, regs);
 			}
 		}
 
@@ -561,12 +652,17 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 	} while(count != 1);
 
 out:
+	irq_exit();
+	set_irq_regs(old_regs);
+
 	put_cpu();
 }
 
 /* Rebind a new event channel to an existing irq. */
 void rebind_evtchn_irq(int evtchn, int irq)
 {
+	struct irq_info *info = info_for_irq(irq);
+
 	/* Make sure the irq is masked, since the new event channel
 	   will also be masked. */
 	disable_irq(irq);
@@ -576,11 +672,11 @@ void rebind_evtchn_irq(int evtchn, int irq)
 	/* After resume the irq<->evtchn mappings are all cleared out */
 	BUG_ON(evtchn_to_irq[evtchn] != -1);
 	/* Expect irq to have been bound before,
-	   so the bindcount should be non-0 */
-	BUG_ON(irq_bindcount[irq] == 0);
+	   so there should be a proper type */
+	BUG_ON(info->type == IRQT_UNBOUND);
 
 	evtchn_to_irq[evtchn] = irq;
-	irq_info[irq] = mk_irq_info(IRQT_EVTCHN, 0, evtchn);
+	irq_info[irq] = mk_evtchn_info(evtchn);
 
 	spin_unlock(&irq_mapping_update_lock);
 
@@ -690,8 +786,7 @@ static void restore_cpu_virqs(unsigned int cpu)
 		if ((irq = per_cpu(virq_to_irq, cpu)[virq]) == -1)
 			continue;
 
-		BUG_ON(irq_info[irq].type != IRQT_VIRQ);
-		BUG_ON(irq_info[irq].index != virq);
+		BUG_ON(virq_from_irq(irq) != virq);
 
 		/* Get a new binding from Xen. */
 		bind_virq.virq = virq;
@@ -703,7 +798,7 @@ static void restore_cpu_virqs(unsigned int cpu)
 
 		/* Record the new mapping. */
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_VIRQ, virq, evtchn);
+		irq_info[irq] = mk_virq_info(evtchn, virq);
 		bind_evtchn_to_cpu(evtchn, cpu);
 
 		/* Ready for use. */
@@ -720,8 +815,7 @@ static void restore_cpu_ipis(unsigned int cpu)
 		if ((irq = per_cpu(ipi_to_irq, cpu)[ipi]) == -1)
 			continue;
 
-		BUG_ON(irq_info[irq].type != IRQT_IPI);
-		BUG_ON(irq_info[irq].index != ipi);
+		BUG_ON(ipi_from_irq(irq) != ipi);
 
 		/* Get a new binding from Xen. */
 		bind_ipi.vcpu = cpu;
@@ -732,7 +826,7 @@ static void restore_cpu_ipis(unsigned int cpu)
 
 		/* Record the new mapping. */
 		evtchn_to_irq[evtchn] = irq;
-		irq_info[irq] = mk_irq_info(IRQT_IPI, ipi, evtchn);
+		irq_info[irq] = mk_ipi_info(evtchn, ipi);
 		bind_evtchn_to_cpu(evtchn, cpu);
 
 		/* Ready for use. */
@@ -812,8 +906,11 @@ void xen_irq_resume(void)
 
 static struct irq_chip xen_dynamic_chip __read_mostly = {
 	.name		= "xen-dyn",
+
+	.disable	= disable_dynirq,
 	.mask		= disable_dynirq,
 	.unmask		= enable_dynirq,
+
 	.ack		= ack_dynirq,
 	.set_affinity	= set_affinity_irq,
 	.retrigger	= retrigger_dynirq,
@@ -822,6 +919,10 @@ static struct irq_chip xen_dynamic_chip __read_mostly = {
 void __init xen_init_IRQ(void)
 {
 	int i;
+	size_t size = nr_cpu_ids * sizeof(struct cpu_evtchn_s);
+
+	cpu_evtchn_mask_p = alloc_bootmem(size);
+	BUG_ON(cpu_evtchn_mask_p == NULL);
 
 	init_evtchn_cpu_bindings();
 
@@ -829,9 +930,5 @@ void __init xen_init_IRQ(void)
 	for (i = 0; i < NR_EVENT_CHANNELS; i++)
 		mask_evtchn(i);
 
-	/* Dynamic IRQ space is currently unbound. Zero the refcnts. */
-	for (i = 0; i < nr_irqs; i++)
-		irq_bindcount[i] = 0;
-
 	irq_ctx_init(smp_processor_id());
 }
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 56892a142ee2..3ccd348d112d 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -108,7 +108,7 @@ static void do_suspend(void)
 	/* XXX use normal device tree? */
 	xenbus_suspend();
 
-	err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
+	err = stop_machine(xen_suspend, &cancelled, cpumask_of(0));
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
 		goto out;
diff --git a/include/acpi/acpiosxf.h b/include/acpi/acpiosxf.h
index a62720a7edc0..ab0b85cf21f3 100644
--- a/include/acpi/acpiosxf.h
+++ b/include/acpi/acpiosxf.h
@@ -144,6 +144,7 @@ void __iomem *acpi_os_map_memory(acpi_physical_address where,
 				acpi_size length);
 
 void acpi_os_unmap_memory(void __iomem * logical_address, acpi_size size);
+void early_acpi_os_unmap_memory(void __iomem * virt, acpi_size size);
 
 #ifdef ACPI_FUTURE_USAGE
 acpi_status
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index c8e8cf45830f..cc40102fe2f3 100644
--- a/include/acpi/acpixf.h
+++ b/include/acpi/acpixf.h
@@ -130,6 +130,10 @@ acpi_get_table_header(acpi_string signature,
 		      struct acpi_table_header *out_table_header);
 
 acpi_status
+acpi_get_table_with_size(acpi_string signature,
+	       u32 instance, struct acpi_table_header **out_table,
+	       acpi_size *tbl_size);
+acpi_status
 acpi_get_table(acpi_string signature,
 	       u32 instance, struct acpi_table_header **out_table);
 
diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h
index b0e63c672ebd..00f45ff081a6 100644
--- a/include/asm-generic/percpu.h
+++ b/include/asm-generic/percpu.h
@@ -80,4 +80,56 @@ extern void setup_per_cpu_areas(void);
 #define DECLARE_PER_CPU(type, name) extern PER_CPU_ATTRIBUTES \
 					__typeof__(type) per_cpu_var(name)
 
+/*
+ * Optional methods for optimized non-lvalue per-cpu variable access.
+ *
+ * @var can be a percpu variable or a field of it and its size should
+ * equal char, int or long.  percpu_read() evaluates to a lvalue and
+ * all others to void.
+ *
+ * These operations are guaranteed to be atomic w.r.t. preemption.
+ * The generic versions use plain get/put_cpu_var().  Archs are
+ * encouraged to implement single-instruction alternatives which don't
+ * require preemption protection.
+ */
+#ifndef percpu_read
+# define percpu_read(var)						\
+  ({									\
+	typeof(per_cpu_var(var)) __tmp_var__;				\
+	__tmp_var__ = get_cpu_var(var);					\
+	put_cpu_var(var);						\
+	__tmp_var__;							\
+  })
+#endif
+
+#define __percpu_generic_to_op(var, val, op)				\
+do {									\
+	get_cpu_var(var) op val;					\
+	put_cpu_var(var);						\
+} while (0)
+
+#ifndef percpu_write
+# define percpu_write(var, val)		__percpu_generic_to_op(var, (val), =)
+#endif
+
+#ifndef percpu_add
+# define percpu_add(var, val)		__percpu_generic_to_op(var, (val), +=)
+#endif
+
+#ifndef percpu_sub
+# define percpu_sub(var, val)		__percpu_generic_to_op(var, (val), -=)
+#endif
+
+#ifndef percpu_and
+# define percpu_and(var, val)		__percpu_generic_to_op(var, (val), &=)
+#endif
+
+#ifndef percpu_or
+# define percpu_or(var, val)		__percpu_generic_to_op(var, (val), |=)
+#endif
+
+#ifndef percpu_xor
+# define percpu_xor(var, val)		__percpu_generic_to_op(var, (val), ^=)
+#endif
+
 #endif /* _ASM_GENERIC_PERCPU_H_ */
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 79a7ff925bf8..4ce48e878530 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -9,7 +9,7 @@ extern char __bss_start[], __bss_stop[];
 extern char __init_begin[], __init_end[];
 extern char _sinittext[], _einittext[];
 extern char _end[];
-extern char __per_cpu_start[], __per_cpu_end[];
+extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
 extern char __kprobes_text_start[], __kprobes_text_end[];
 extern char __initdata_begin[], __initdata_end[];
 extern char __start_rodata[], __end_rodata[];
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index aca40b93bd28..a654d724d3b0 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -427,12 +427,59 @@
   	*(.initcall7.init)						\
   	*(.initcall7s.init)
 
+/**
+ * PERCPU_VADDR - define output section for percpu area
+ * @vaddr: explicit base address (optional)
+ * @phdr: destination PHDR (optional)
+ *
+ * Macro which expands to output section for percpu area.  If @vaddr
+ * is not blank, it specifies explicit base address and all percpu
+ * symbols will be offset from the given address.  If blank, @vaddr
+ * always equals @laddr + LOAD_OFFSET.
+ *
+ * @phdr defines the output PHDR to use if not blank.  Be warned that
+ * output PHDR is sticky.  If @phdr is specified, the next output
+ * section in the linker script will go there too.  @phdr should have
+ * a leading colon.
+ *
+ * Note that this macros defines __per_cpu_load as an absolute symbol.
+ * If there is no need to put the percpu section at a predetermined
+ * address, use PERCPU().
+ */
+#define PERCPU_VADDR(vaddr, phdr)					\
+	VMLINUX_SYMBOL(__per_cpu_load) = .;				\
+	.data.percpu vaddr : AT(VMLINUX_SYMBOL(__per_cpu_load)		\
+				- LOAD_OFFSET) {			\
+		VMLINUX_SYMBOL(__per_cpu_start) = .;			\
+		*(.data.percpu.first)					\
+		*(.data.percpu.page_aligned)				\
+		*(.data.percpu)						\
+		*(.data.percpu.shared_aligned)				\
+		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
+	} phdr								\
+	. = VMLINUX_SYMBOL(__per_cpu_load) + SIZEOF(.data.percpu);
+
+/**
+ * PERCPU - define output section for percpu area, simple version
+ * @align: required alignment
+ *
+ * Align to @align and outputs output section for percpu area.  This
+ * macro doesn't maniuplate @vaddr or @phdr and __per_cpu_load and
+ * __per_cpu_start will be identical.
+ *
+ * This macro is equivalent to ALIGN(align); PERCPU_VADDR( , ) except
+ * that __per_cpu_load is defined as a relative symbol against
+ * .data.percpu which is required for relocatable x86_32
+ * configuration.
+ */
 #define PERCPU(align)							\
 	. = ALIGN(align);						\
-	VMLINUX_SYMBOL(__per_cpu_start) = .;				\
-	.data.percpu  : AT(ADDR(.data.percpu) - LOAD_OFFSET) {		\
+	.data.percpu	: AT(ADDR(.data.percpu) - LOAD_OFFSET) {	\
+		VMLINUX_SYMBOL(__per_cpu_load) = .;			\
+		VMLINUX_SYMBOL(__per_cpu_start) = .;			\
+		*(.data.percpu.first)					\
 		*(.data.percpu.page_aligned)				\
 		*(.data.percpu)						\
 		*(.data.percpu.shared_aligned)				\
-	}								\
-	VMLINUX_SYMBOL(__per_cpu_end) = .;
+		VMLINUX_SYMBOL(__per_cpu_end) = .;			\
+	}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 6fce2fc2d124..78199151c00b 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -79,6 +79,7 @@ typedef int (*acpi_table_handler) (struct acpi_table_header *table);
 typedef int (*acpi_table_entry_handler) (struct acpi_subtable_header *header, const unsigned long end);
 
 char * __acpi_map_table (unsigned long phys_addr, unsigned long size);
+void __acpi_unmap_table(char *map, unsigned long size);
 int early_acpi_boot_init(void);
 int acpi_boot_init (void);
 int acpi_boot_table_init (void);
diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 95837bfb5256..455d83219fae 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -65,23 +65,20 @@ extern void free_bootmem(unsigned long addr, unsigned long size);
 #define BOOTMEM_DEFAULT		0
 #define BOOTMEM_EXCLUSIVE	(1<<0)
 
+extern int reserve_bootmem(unsigned long addr,
+			   unsigned long size,
+			   int flags);
 extern int reserve_bootmem_node(pg_data_t *pgdat,
-				 unsigned long physaddr,
-				 unsigned long size,
-				 int flags);
-#ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
-extern int reserve_bootmem(unsigned long addr, unsigned long size, int flags);
-#endif
+				unsigned long physaddr,
+				unsigned long size,
+				int flags);
 
-extern void *__alloc_bootmem_nopanic(unsigned long size,
+extern void *__alloc_bootmem(unsigned long size,
 			     unsigned long align,
 			     unsigned long goal);
-extern void *__alloc_bootmem(unsigned long size,
+extern void *__alloc_bootmem_nopanic(unsigned long size,
 				     unsigned long align,
 				     unsigned long goal);
-extern void *__alloc_bootmem_low(unsigned long size,
-				 unsigned long align,
-				 unsigned long goal);
 extern void *__alloc_bootmem_node(pg_data_t *pgdat,
 				  unsigned long size,
 				  unsigned long align,
@@ -90,30 +87,35 @@ extern void *__alloc_bootmem_node_nopanic(pg_data_t *pgdat,
 				  unsigned long size,
 				  unsigned long align,
 				  unsigned long goal);
+extern void *__alloc_bootmem_low(unsigned long size,
+				 unsigned long align,
+				 unsigned long goal);
 extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 				      unsigned long size,
 				      unsigned long align,
 				      unsigned long goal);
-#ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
+
 #define alloc_bootmem(x) \
 	__alloc_bootmem(x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
 #define alloc_bootmem_nopanic(x) \
 	__alloc_bootmem_nopanic(x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_low(x) \
-	__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
 #define alloc_bootmem_pages(x) \
 	__alloc_bootmem(x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
 #define alloc_bootmem_pages_nopanic(x) \
 	__alloc_bootmem_nopanic(x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
-#define alloc_bootmem_low_pages(x) \
-	__alloc_bootmem_low(x, PAGE_SIZE, 0)
 #define alloc_bootmem_node(pgdat, x) \
 	__alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
 #define alloc_bootmem_pages_node(pgdat, x) \
 	__alloc_bootmem_node(pgdat, x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+#define alloc_bootmem_pages_node_nopanic(pgdat, x) \
+	__alloc_bootmem_node_nopanic(pgdat, x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+
+#define alloc_bootmem_low(x) \
+	__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
+#define alloc_bootmem_low_pages(x) \
+	__alloc_bootmem_low(x, PAGE_SIZE, 0)
 #define alloc_bootmem_low_pages_node(pgdat, x) \
 	__alloc_bootmem_low_node(pgdat, x, PAGE_SIZE, 0)
-#endif /* !CONFIG_HAVE_ARCH_BOOTMEM_NODE */
 
 extern int reserve_bootmem_generic(unsigned long addr, unsigned long size,
 				   int flags);
diff --git a/include/linux/bsg.h b/include/linux/bsg.h
index 3f0c64ace424..ecb4730d0868 100644
--- a/include/linux/bsg.h
+++ b/include/linux/bsg.h
@@ -1,6 +1,8 @@
 #ifndef BSG_H
 #define BSG_H
 
+#include <linux/types.h>
+
 #define BSG_PROTOCOL_SCSI		0
 
 #define BSG_SUB_PROTOCOL_SCSI_CMD	0
diff --git a/include/linux/decompress/bunzip2.h b/include/linux/decompress/bunzip2.h
new file mode 100644
index 000000000000..115272137a9c
--- /dev/null
+++ b/include/linux/decompress/bunzip2.h
@@ -0,0 +1,10 @@
+#ifndef DECOMPRESS_BUNZIP2_H
+#define DECOMPRESS_BUNZIP2_H
+
+int bunzip2(unsigned char *inbuf, int len,
+	    int(*fill)(void*, unsigned int),
+	    int(*flush)(void*, unsigned int),
+	    unsigned char *output,
+	    int *pos,
+	    void(*error)(char *x));
+#endif
diff --git a/include/linux/decompress/generic.h b/include/linux/decompress/generic.h
new file mode 100644
index 000000000000..6dfb856327bb
--- /dev/null
+++ b/include/linux/decompress/generic.h
@@ -0,0 +1,33 @@
+#ifndef DECOMPRESS_GENERIC_H
+#define DECOMPRESS_GENERIC_H
+
+/* Minimal chunksize to be read.
+ *Bzip2 prefers at least 4096
+ *Lzma prefers 0x10000 */
+#define COMPR_IOBUF_SIZE	4096
+
+typedef int (*decompress_fn) (unsigned char *inbuf, int len,
+			      int(*fill)(void*, unsigned int),
+			      int(*writebb)(void*, unsigned int),
+			      unsigned char *output,
+			      int *posp,
+			      void(*error)(char *x));
+
+/* inbuf   - input buffer
+ *len     - len of pre-read data in inbuf
+ *fill    - function to fill inbuf if empty
+ *writebb - function to write out outbug
+ *posp    - if non-null, input position (number of bytes read) will be
+ *	  returned here
+ *
+ *If len != 0, the inbuf is initialized (with as much data), and fill
+ *should not be called
+ *If len = 0, the inbuf is allocated, but empty. Its size is IOBUF_SIZE
+ *fill should be called (repeatedly...) to read data, at most IOBUF_SIZE
+ */
+
+/* Utility routine to detect the decompression method */
+decompress_fn decompress_method(const unsigned char *inbuf, int len,
+				const char **name);
+
+#endif
diff --git a/include/linux/decompress/inflate.h b/include/linux/decompress/inflate.h
new file mode 100644
index 000000000000..f9b06ccc3e5c
--- /dev/null
+++ b/include/linux/decompress/inflate.h
@@ -0,0 +1,13 @@
+#ifndef INFLATE_H
+#define INFLATE_H
+
+/* Other housekeeping constants */
+#define INBUFSIZ 4096
+
+int gunzip(unsigned char *inbuf, int len,
+	   int(*fill)(void*, unsigned int),
+	   int(*flush)(void*, unsigned int),
+	   unsigned char *output,
+	   int *pos,
+	   void(*error_fn)(char *x));
+#endif
diff --git a/include/linux/decompress/mm.h b/include/linux/decompress/mm.h
new file mode 100644
index 000000000000..12ff8c3f1d05
--- /dev/null
+++ b/include/linux/decompress/mm.h
@@ -0,0 +1,87 @@
+/*
+ * linux/compr_mm.h
+ *
+ * Memory management for pre-boot and ramdisk uncompressors
+ *
+ * Authors: Alain Knaff <alain@knaff.lu>
+ *
+ */
+
+#ifndef DECOMPR_MM_H
+#define DECOMPR_MM_H
+
+#ifdef STATIC
+
+/* Code active when included from pre-boot environment: */
+
+/* A trivial malloc implementation, adapted from
+ *  malloc by Hannu Savolainen 1993 and Matthias Urlichs 1994
+ */
+static unsigned long malloc_ptr;
+static int malloc_count;
+
+static void *malloc(int size)
+{
+	void *p;
+
+	if (size < 0)
+		error("Malloc error");
+	if (!malloc_ptr)
+		malloc_ptr = free_mem_ptr;
+
+	malloc_ptr = (malloc_ptr + 3) & ~3;     /* Align */
+
+	p = (void *)malloc_ptr;
+	malloc_ptr += size;
+
+	if (free_mem_end_ptr && malloc_ptr >= free_mem_end_ptr)
+		error("Out of memory");
+
+	malloc_count++;
+	return p;
+}
+
+static void free(void *where)
+{
+	malloc_count--;
+	if (!malloc_count)
+		malloc_ptr = free_mem_ptr;
+}
+
+#define large_malloc(a) malloc(a)
+#define large_free(a) free(a)
+
+#define set_error_fn(x)
+
+#define INIT
+
+#else /* STATIC */
+
+/* Code active when compiled standalone for use when loading ramdisk: */
+
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
+
+/* Use defines rather than static inline in order to avoid spurious
+ * warnings when not needed (indeed large_malloc / large_free are not
+ * needed by inflate */
+
+#define malloc(a) kmalloc(a, GFP_KERNEL)
+#define free(a) kfree(a)
+
+#define large_malloc(a) vmalloc(a)
+#define large_free(a) vfree(a)
+
+static void(*error)(char *m);
+#define set_error_fn(x) error = x;
+
+#define INIT __init
+#define STATIC
+
+#include <linux/init.h>
+
+#endif /* STATIC */
+
+#endif /* DECOMPR_MM_H */
diff --git a/include/linux/decompress/unlzma.h b/include/linux/decompress/unlzma.h
new file mode 100644
index 000000000000..7796538f1bf4
--- /dev/null
+++ b/include/linux/decompress/unlzma.h
@@ -0,0 +1,12 @@
+#ifndef DECOMPRESS_UNLZMA_H
+#define DECOMPRESS_UNLZMA_H
+
+int unlzma(unsigned char *, int,
+	   int(*fill)(void*, unsigned int),
+	   int(*flush)(void*, unsigned int),
+	   unsigned char *output,
+	   int *posp,
+	   void(*error)(char *x)
+	);
+
+#endif
diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h
index 5ca54d77079f..7605c5e9589f 100644
--- a/include/linux/elfcore.h
+++ b/include/linux/elfcore.h
@@ -111,6 +111,15 @@ static inline void elf_core_copy_regs(elf_gregset_t *elfregs, struct pt_regs *re
 #endif
 }
 
+static inline void elf_core_copy_kernel_regs(elf_gregset_t *elfregs, struct pt_regs *regs)
+{
+#ifdef ELF_CORE_COPY_KERNEL_REGS
+	ELF_CORE_COPY_KERNEL_REGS((*elfregs), regs);
+#else
+	elf_core_copy_regs(elfregs, regs);
+#endif
+}
+
 static inline int elf_core_copy_task_regs(struct task_struct *t, elf_gregset_t* elfregs)
 {
 #ifdef ELF_CORE_COPY_TASK_REGS
diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index 0216e1bdbc56..cfe4fe1b7132 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -78,6 +78,7 @@
 #define ETH_P_PAE	0x888E		/* Port Access Entity (IEEE 802.1X) */
 #define ETH_P_AOE	0x88A2		/* ATA over Ethernet		*/
 #define ETH_P_TIPC	0x88CA		/* TIPC 			*/
+#define ETH_P_FCOE	0x8906		/* Fibre Channel over Ethernet  */
 #define ETH_P_EDSA	0xDADA		/* Ethertype DSA [ NOT AN OFFICIALLY REGISTERED ID ] */
 
 /*
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 91658d076598..0c9cb63e6895 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -484,6 +484,7 @@ int show_interrupts(struct seq_file *p, void *v);
 struct irq_desc;
 
 extern int early_irq_init(void);
+extern int arch_probe_nr_irqs(void);
 extern int arch_early_irq_init(void);
 extern int arch_init_chip_data(struct irq_desc *desc, int cpu);
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 6db939a575bd..873e4ac11b81 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -180,11 +180,11 @@ struct irq_desc {
 	unsigned int		irqs_unhandled;
 	spinlock_t		lock;
 #ifdef CONFIG_SMP
-	cpumask_t		affinity;
+	cpumask_var_t		affinity;
 	unsigned int		cpu;
-#endif
 #ifdef CONFIG_GENERIC_PENDING_IRQ
-	cpumask_t		pending_mask;
+	cpumask_var_t		pending_mask;
+#endif
 #endif
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry	*dir;
@@ -414,4 +414,84 @@ extern int set_irq_msi(unsigned int irq, struct msi_desc *entry);
 
 #endif /* !CONFIG_S390 */
 
+#ifdef CONFIG_SMP
+/**
+ * init_alloc_desc_masks - allocate cpumasks for irq_desc
+ * @desc:	pointer to irq_desc struct
+ * @cpu:	cpu which will be handling the cpumasks
+ * @boot:	true if need bootmem
+ *
+ * Allocates affinity and pending_mask cpumask if required.
+ * Returns true if successful (or not required).
+ * Side effect: affinity has all bits set, pending_mask has all bits clear.
+ */
+static inline bool init_alloc_desc_masks(struct irq_desc *desc, int cpu,
+								bool boot)
+{
+	int node;
+
+	if (boot) {
+		alloc_bootmem_cpumask_var(&desc->affinity);
+		cpumask_setall(desc->affinity);
+
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+		alloc_bootmem_cpumask_var(&desc->pending_mask);
+		cpumask_clear(desc->pending_mask);
+#endif
+		return true;
+	}
+
+	node = cpu_to_node(cpu);
+
+	if (!alloc_cpumask_var_node(&desc->affinity, GFP_ATOMIC, node))
+		return false;
+	cpumask_setall(desc->affinity);
+
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+	if (!alloc_cpumask_var_node(&desc->pending_mask, GFP_ATOMIC, node)) {
+		free_cpumask_var(desc->affinity);
+		return false;
+	}
+	cpumask_clear(desc->pending_mask);
+#endif
+	return true;
+}
+
+/**
+ * init_copy_desc_masks - copy cpumasks for irq_desc
+ * @old_desc:	pointer to old irq_desc struct
+ * @new_desc:	pointer to new irq_desc struct
+ *
+ * Insures affinity and pending_masks are copied to new irq_desc.
+ * If !CONFIG_CPUMASKS_OFFSTACK the cpumasks are embedded in the
+ * irq_desc struct so the copy is redundant.
+ */
+
+static inline void init_copy_desc_masks(struct irq_desc *old_desc,
+					struct irq_desc *new_desc)
+{
+#ifdef CONFIG_CPUMASKS_OFFSTACK
+	cpumask_copy(new_desc->affinity, old_desc->affinity);
+
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+	cpumask_copy(new_desc->pending_mask, old_desc->pending_mask);
+#endif
+#endif
+}
+
+#else /* !CONFIG_SMP */
+
+static inline bool init_alloc_desc_masks(struct irq_desc *desc, int cpu,
+								bool boot)
+{
+	return true;
+}
+
+static inline void init_copy_desc_masks(struct irq_desc *old_desc,
+					struct irq_desc *new_desc)
+{
+}
+
+#endif	/* CONFIG_SMP */
+
 #endif /* _LINUX_IRQ_H */
diff --git a/include/linux/irqnr.h b/include/linux/irqnr.h
index 52ebbb4b161d..ec87b212ff7d 100644
--- a/include/linux/irqnr.h
+++ b/include/linux/irqnr.h
@@ -20,6 +20,7 @@
 
 # define for_each_irq_desc_reverse(irq, desc)                          \
 	for (irq = nr_irqs - 1; irq >= 0; irq--)
+
 #else /* CONFIG_GENERIC_HARDIRQS */
 
 extern int nr_irqs;
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 32851eef48f0..2ec6cc14a114 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -182,6 +182,14 @@ struct kprobe_blackpoint {
 DECLARE_PER_CPU(struct kprobe *, current_kprobe);
 DECLARE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
 
+/*
+ * For #ifdef avoidance:
+ */
+static inline int kprobes_built_in(void)
+{
+	return 1;
+}
+
 #ifdef CONFIG_KRETPROBES
 extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				   struct pt_regs *regs);
@@ -271,8 +279,16 @@ void unregister_kretprobes(struct kretprobe **rps, int num);
 void kprobe_flush_task(struct task_struct *tk);
 void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head);
 
-#else /* CONFIG_KPROBES */
+#else /* !CONFIG_KPROBES: */
 
+static inline int kprobes_built_in(void)
+{
+	return 0;
+}
+static inline int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
+{
+	return 0;
+}
 static inline struct kprobe *get_kprobe(void *addr)
 {
 	return NULL;
@@ -329,5 +345,5 @@ static inline void unregister_kretprobes(struct kretprobe **rps, int num)
 static inline void kprobe_flush_task(struct task_struct *tk)
 {
 }
-#endif				/* CONFIG_KPROBES */
-#endif				/* _LINUX_KPROBES_H */
+#endif /* CONFIG_KPROBES */
+#endif /* _LINUX_KPROBES_H */
diff --git a/include/linux/magic.h b/include/linux/magic.h
index 0b4df7eba852..5b4e28bcb788 100644
--- a/include/linux/magic.h
+++ b/include/linux/magic.h
@@ -49,4 +49,5 @@
 #define FUTEXFS_SUPER_MAGIC	0xBAD1DEA
 #define INOTIFYFS_SUPER_MAGIC	0x2BAD1DEA
 
+#define STACK_END_MAGIC		0x57AC6E9D
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/include/linux/major.h b/include/linux/major.h
index 88249452b935..058ec15dd060 100644
--- a/include/linux/major.h
+++ b/include/linux/major.h
@@ -171,5 +171,6 @@
 #define VIOTAPE_MAJOR		230
 
 #define BLOCK_EXT_MAJOR		259
+#define SCSI_OSD_MAJOR		260	/* open-osd's OSD scsi device */
 
 #endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index a820f816a49e..beb6ec99cfef 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -26,6 +26,7 @@
 #define TUN_MINOR		200
 #define MWAVE_MINOR		219	/* ACP/Mwave Modem */
 #define MPT_MINOR		220
+#define MPT2SAS_MINOR		221
 #define HPET_MINOR		228
 #define FUSE_MINOR		229
 #define KVM_MINOR		232
diff --git a/include/linux/mmiotrace.h b/include/linux/mmiotrace.h
index 139d7c88d9c9..3d1b7bde1283 100644
--- a/include/linux/mmiotrace.h
+++ b/include/linux/mmiotrace.h
@@ -1,5 +1,5 @@
-#ifndef MMIOTRACE_H
-#define MMIOTRACE_H
+#ifndef _LINUX_MMIOTRACE_H
+#define _LINUX_MMIOTRACE_H
 
 #include <linux/types.h>
 #include <linux/list.h>
@@ -13,28 +13,34 @@ typedef void (*kmmio_post_handler_t)(struct kmmio_probe *,
 				unsigned long condition, struct pt_regs *);
 
 struct kmmio_probe {
-	struct list_head list; /* kmmio internal list */
-	unsigned long addr; /* start location of the probe point */
-	unsigned long len; /* length of the probe region */
-	kmmio_pre_handler_t pre_handler; /* Called before addr is executed. */
-	kmmio_post_handler_t post_handler; /* Called after addr is executed */
-	void *private;
+	/* kmmio internal list: */
+	struct list_head	list;
+	/* start location of the probe point: */
+	unsigned long		addr;
+	/* length of the probe region: */
+	unsigned long		len;
+	/* Called before addr is executed: */
+	kmmio_pre_handler_t	pre_handler;
+	/* Called after addr is executed: */
+	kmmio_post_handler_t	post_handler;
+	void			*private;
 };
 
+extern unsigned int kmmio_count;
+
+extern int register_kmmio_probe(struct kmmio_probe *p);
+extern void unregister_kmmio_probe(struct kmmio_probe *p);
+
+#ifdef CONFIG_MMIOTRACE
 /* kmmio is active by some kmmio_probes? */
 static inline int is_kmmio_active(void)
 {
-	extern unsigned int kmmio_count;
 	return kmmio_count;
 }
 
-extern int register_kmmio_probe(struct kmmio_probe *p);
-extern void unregister_kmmio_probe(struct kmmio_probe *p);
-
 /* Called from page fault handler. */
 extern int kmmio_handler(struct pt_regs *regs, unsigned long addr);
 
-#ifdef CONFIG_MMIOTRACE
 /* Called from ioremap.c */
 extern void mmiotrace_ioremap(resource_size_t offset, unsigned long size,
 							void __iomem *addr);
@@ -43,7 +49,17 @@ extern void mmiotrace_iounmap(volatile void __iomem *addr);
 /* For anyone to insert markers. Remember trailing newline. */
 extern int mmiotrace_printk(const char *fmt, ...)
 				__attribute__ ((format (printf, 1, 2)));
-#else
+#else /* !CONFIG_MMIOTRACE: */
+static inline int is_kmmio_active(void)
+{
+	return 0;
+}
+
+static inline int kmmio_handler(struct pt_regs *regs, unsigned long addr)
+{
+	return 0;
+}
+
 static inline void mmiotrace_ioremap(resource_size_t offset,
 					unsigned long size, void __iomem *addr)
 {
@@ -63,28 +79,28 @@ static inline int mmiotrace_printk(const char *fmt, ...)
 #endif /* CONFIG_MMIOTRACE */
 
 enum mm_io_opcode {
-	MMIO_READ = 0x1,     /* struct mmiotrace_rw */
-	MMIO_WRITE = 0x2,    /* struct mmiotrace_rw */
-	MMIO_PROBE = 0x3,    /* struct mmiotrace_map */
-	MMIO_UNPROBE = 0x4,  /* struct mmiotrace_map */
-	MMIO_UNKNOWN_OP = 0x5, /* struct mmiotrace_rw */
+	MMIO_READ	= 0x1,	/* struct mmiotrace_rw */
+	MMIO_WRITE	= 0x2,	/* struct mmiotrace_rw */
+	MMIO_PROBE	= 0x3,	/* struct mmiotrace_map */
+	MMIO_UNPROBE	= 0x4,	/* struct mmiotrace_map */
+	MMIO_UNKNOWN_OP = 0x5,	/* struct mmiotrace_rw */
 };
 
 struct mmiotrace_rw {
-	resource_size_t phys;	/* PCI address of register */
-	unsigned long value;
-	unsigned long pc;	/* optional program counter */
-	int map_id;
-	unsigned char opcode;	/* one of MMIO_{READ,WRITE,UNKNOWN_OP} */
-	unsigned char width;	/* size of register access in bytes */
+	resource_size_t	phys;	/* PCI address of register */
+	unsigned long	value;
+	unsigned long	pc;	/* optional program counter */
+	int		map_id;
+	unsigned char	opcode;	/* one of MMIO_{READ,WRITE,UNKNOWN_OP} */
+	unsigned char	width;	/* size of register access in bytes */
 };
 
 struct mmiotrace_map {
-	resource_size_t phys;	/* base address in PCI space */
-	unsigned long virt;	/* base virtual address */
-	unsigned long len;	/* mapping size */
-	int map_id;
-	unsigned char opcode;	/* MMIO_PROBE or MMIO_UNPROBE */
+	resource_size_t	phys;	/* base address in PCI space */
+	unsigned long	virt;	/* base virtual address */
+	unsigned long	len;	/* mapping size */
+	int		map_id;
+	unsigned char	opcode;	/* MMIO_PROBE or MMIO_UNPROBE */
 };
 
 /* in kernel/trace/trace_mmiotrace.c */
@@ -94,4 +110,4 @@ extern void mmio_trace_rw(struct mmiotrace_rw *rw);
 extern void mmio_trace_mapping(struct mmiotrace_map *map);
 extern int mmio_trace_printk(const char *fmt, va_list args);
 
-#endif /* MMIOTRACE_H */
+#endif /* _LINUX_MMIOTRACE_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1b55952a17f6..2e7783f4a755 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -594,6 +594,14 @@ struct net_device_ops {
 #define HAVE_NETDEV_POLL
 	void                    (*ndo_poll_controller)(struct net_device *dev);
 #endif
+#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
+	int			(*ndo_fcoe_ddp_setup)(struct net_device *dev,
+						      u16 xid,
+						      struct scatterlist *sgl,
+						      unsigned int sgc);
+	int			(*ndo_fcoe_ddp_done)(struct net_device *dev,
+						     u16 xid);
+#endif
 };
 
 /*
@@ -662,14 +670,17 @@ struct net_device
 #define NETIF_F_GRO		16384	/* Generic receive offload */
 #define NETIF_F_LRO		32768	/* large receive offload */
 
+#define NETIF_F_FCOE_CRC	(1 << 24) /* FCoE CRC32 */
+
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
-#define NETIF_F_GSO_MASK	0xffff0000
+#define NETIF_F_GSO_MASK	0x00ff0000
 #define NETIF_F_TSO		(SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
 #define NETIF_F_UFO		(SKB_GSO_UDP << NETIF_F_GSO_SHIFT)
 #define NETIF_F_GSO_ROBUST	(SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)
 #define NETIF_F_TSO_ECN		(SKB_GSO_TCP_ECN << NETIF_F_GSO_SHIFT)
 #define NETIF_F_TSO6		(SKB_GSO_TCPV6 << NETIF_F_GSO_SHIFT)
+#define NETIF_F_FSO		(SKB_GSO_FCOE << NETIF_F_GSO_SHIFT)
 
 	/* List of features with software fallbacks. */
 #define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
@@ -852,6 +863,11 @@ struct net_device
 	struct dcbnl_rtnl_ops *dcbnl_ops;
 #endif
 
+#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
+	/* max exchange id for FCoE LRO by ddp */
+	unsigned int		fcoe_ddp_xid;
+#endif
+
 #ifdef CONFIG_COMPAT_NET_DEV_OPS
 	struct {
 		int			(*init)(struct net_device *dev);
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 9f2a3751873a..ee5615d65211 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -5,53 +5,66 @@
 #include <linux/slab.h> /* For kmalloc() */
 #include <linux/smp.h>
 #include <linux/cpumask.h>
+#include <linux/pfn.h>
 
 #include <asm/percpu.h>
 
+#ifndef PER_CPU_BASE_SECTION
+#ifdef CONFIG_SMP
+#define PER_CPU_BASE_SECTION ".data.percpu"
+#else
+#define PER_CPU_BASE_SECTION ".data"
+#endif
+#endif
+
 #ifdef CONFIG_SMP
-#define DEFINE_PER_CPU(type, name)					\
-	__attribute__((__section__(".data.percpu")))			\
-	PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name
 
 #ifdef MODULE
-#define SHARED_ALIGNED_SECTION ".data.percpu"
+#define PER_CPU_SHARED_ALIGNED_SECTION ""
 #else
-#define SHARED_ALIGNED_SECTION ".data.percpu.shared_aligned"
+#define PER_CPU_SHARED_ALIGNED_SECTION ".shared_aligned"
 #endif
+#define PER_CPU_FIRST_SECTION ".first"
 
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)			\
-	__attribute__((__section__(SHARED_ALIGNED_SECTION)))		\
-	PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name		\
-	____cacheline_aligned_in_smp
+#else
+
+#define PER_CPU_SHARED_ALIGNED_SECTION ""
+#define PER_CPU_FIRST_SECTION ""
 
-#define DEFINE_PER_CPU_PAGE_ALIGNED(type, name)			\
-	__attribute__((__section__(".data.percpu.page_aligned")))	\
+#endif
+
+#define DEFINE_PER_CPU_SECTION(type, name, section)			\
+	__attribute__((__section__(PER_CPU_BASE_SECTION section)))	\
 	PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name
-#else
+
 #define DEFINE_PER_CPU(type, name)					\
-	PER_CPU_ATTRIBUTES __typeof__(type) per_cpu__##name
+	DEFINE_PER_CPU_SECTION(type, name, "")
 
-#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)		      \
-	DEFINE_PER_CPU(type, name)
+#define DEFINE_PER_CPU_SHARED_ALIGNED(type, name)			\
+	DEFINE_PER_CPU_SECTION(type, name, PER_CPU_SHARED_ALIGNED_SECTION) \
+	____cacheline_aligned_in_smp
 
-#define DEFINE_PER_CPU_PAGE_ALIGNED(type, name)		      \
-	DEFINE_PER_CPU(type, name)
-#endif
+#define DEFINE_PER_CPU_PAGE_ALIGNED(type, name)				\
+	DEFINE_PER_CPU_SECTION(type, name, ".page_aligned")
+
+#define DEFINE_PER_CPU_FIRST(type, name)				\
+	DEFINE_PER_CPU_SECTION(type, name, PER_CPU_FIRST_SECTION)
 
 #define EXPORT_PER_CPU_SYMBOL(var) EXPORT_SYMBOL(per_cpu__##var)
 #define EXPORT_PER_CPU_SYMBOL_GPL(var) EXPORT_SYMBOL_GPL(per_cpu__##var)
 
-/* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
-#ifndef PERCPU_ENOUGH_ROOM
+/* enough to cover all DEFINE_PER_CPUs in modules */
 #ifdef CONFIG_MODULES
-#define PERCPU_MODULE_RESERVE	8192
+#define PERCPU_MODULE_RESERVE		(8 << 10)
 #else
-#define PERCPU_MODULE_RESERVE	0
+#define PERCPU_MODULE_RESERVE		0
 #endif
 
+#ifndef PERCPU_ENOUGH_ROOM
 #define PERCPU_ENOUGH_ROOM						\
-	(__per_cpu_end - __per_cpu_start + PERCPU_MODULE_RESERVE)
-#endif	/* PERCPU_ENOUGH_ROOM */
+	(ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES) +	\
+	 PERCPU_MODULE_RESERVE)
+#endif
 
 /*
  * Must be an lvalue. Since @var must be a simple identifier,
@@ -65,52 +78,94 @@
 
 #ifdef CONFIG_SMP
 
+#ifdef CONFIG_HAVE_DYNAMIC_PER_CPU_AREA
+
+/* minimum unit size, also is the maximum supported allocation size */
+#define PCPU_MIN_UNIT_SIZE		PFN_ALIGN(64 << 10)
+
+/*
+ * PERCPU_DYNAMIC_RESERVE indicates the amount of free area to piggy
+ * back on the first chunk for dynamic percpu allocation if arch is
+ * manually allocating and mapping it for faster access (as a part of
+ * large page mapping for example).
+ *
+ * The following values give between one and two pages of free space
+ * after typical minimal boot (2-way SMP, single disk and NIC) with
+ * both defconfig and a distro config on x86_64 and 32.  More
+ * intelligent way to determine this would be nice.
+ */
+#if BITS_PER_LONG > 32
+#define PERCPU_DYNAMIC_RESERVE		(20 << 10)
+#else
+#define PERCPU_DYNAMIC_RESERVE		(12 << 10)
+#endif
+
+extern void *pcpu_base_addr;
+
+typedef struct page * (*pcpu_get_page_fn_t)(unsigned int cpu, int pageno);
+typedef void (*pcpu_populate_pte_fn_t)(unsigned long addr);
+
+extern size_t __init pcpu_setup_first_chunk(pcpu_get_page_fn_t get_page_fn,
+				size_t static_size, size_t reserved_size,
+				ssize_t dyn_size, ssize_t unit_size,
+				void *base_addr,
+				pcpu_populate_pte_fn_t populate_pte_fn);
+
+extern ssize_t __init pcpu_embed_first_chunk(
+				size_t static_size, size_t reserved_size,
+				ssize_t dyn_size, ssize_t unit_size);
+
+/*
+ * Use this to get to a cpu's version of the per-cpu object
+ * dynamically allocated. Non-atomic access to the current CPU's
+ * version should probably be combined with get_cpu()/put_cpu().
+ */
+#define per_cpu_ptr(ptr, cpu)	SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))
+
+extern void *__alloc_reserved_percpu(size_t size, size_t align);
+
+#else /* CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
+
 struct percpu_data {
 	void *ptrs[1];
 };
 
 #define __percpu_disguise(pdata) (struct percpu_data *)~(unsigned long)(pdata)
-/* 
- * Use this to get to a cpu's version of the per-cpu object dynamically
- * allocated. Non-atomic access to the current CPU's version should
- * probably be combined with get_cpu()/put_cpu().
- */ 
-#define percpu_ptr(ptr, cpu)                              \
-({                                                        \
-        struct percpu_data *__p = __percpu_disguise(ptr); \
-        (__typeof__(ptr))__p->ptrs[(cpu)];	          \
+
+#define per_cpu_ptr(ptr, cpu)						\
+({									\
+        struct percpu_data *__p = __percpu_disguise(ptr);		\
+        (__typeof__(ptr))__p->ptrs[(cpu)];				\
 })
 
-extern void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask);
-extern void percpu_free(void *__pdata);
+#endif /* CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
+
+extern void *__alloc_percpu(size_t size, size_t align);
+extern void free_percpu(void *__pdata);
 
 #else /* CONFIG_SMP */
 
-#define percpu_ptr(ptr, cpu) ({ (void)(cpu); (ptr); })
+#define per_cpu_ptr(ptr, cpu) ({ (void)(cpu); (ptr); })
 
-static __always_inline void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask)
+static inline void *__alloc_percpu(size_t size, size_t align)
 {
-	return kzalloc(size, gfp);
+	/*
+	 * Can't easily make larger alignment work with kmalloc.  WARN
+	 * on it.  Larger alignment should only be used for module
+	 * percpu sections on SMP for which this path isn't used.
+	 */
+	WARN_ON_ONCE(align > SMP_CACHE_BYTES);
+	return kzalloc(size, GFP_KERNEL);
 }
 
-static inline void percpu_free(void *__pdata)
+static inline void free_percpu(void *p)
 {
-	kfree(__pdata);
+	kfree(p);
 }
 
 #endif /* CONFIG_SMP */
 
-#define percpu_alloc_mask(size, gfp, mask) \
-	__percpu_alloc_mask((size), (gfp), &(mask))
-
-#define percpu_alloc(size, gfp) percpu_alloc_mask((size), (gfp), cpu_online_map)
-
-/* (legacy) interface for use without CPU hotplug handling */
-
-#define __alloc_percpu(size)	percpu_alloc_mask((size), GFP_KERNEL, \
-						  cpu_possible_map)
-#define alloc_percpu(type)	(type *)__alloc_percpu(sizeof(type))
-#define free_percpu(ptr)	percpu_free((ptr))
-#define per_cpu_ptr(ptr, cpu)	percpu_ptr((ptr), (cpu))
+#define alloc_percpu(type)	(type *)__alloc_percpu(sizeof(type), \
+						       __alignof__(type))
 
 #endif /* __LINUX_PERCPU_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ff904b0606d4..1d19c025f9d2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1185,10 +1185,9 @@ struct task_struct {
 	pid_t pid;
 	pid_t tgid;
 
-#ifdef CONFIG_CC_STACKPROTECTOR
 	/* Canary value for the -fstack-protector gcc feature */
 	unsigned long stack_canary;
-#endif
+
 	/* 
 	 * pointers to (original) parent process, youngest child, younger sibling,
 	 * older sibling, respectively.  (p->father can be replaced with 
@@ -2107,6 +2106,19 @@ static inline int object_is_on_stack(void *obj)
 
 extern void thread_info_cache_init(void);
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+static inline unsigned long stack_not_used(struct task_struct *p)
+{
+	unsigned long *n = end_of_stack(p);
+
+	do { 	/* Skip over canary */
+		n++;
+	} while (!*n);
+
+	return (unsigned long)n - (unsigned long)end_of_stack(p);
+}
+#endif
+
 /* set thread flags in other task's structures
  * - see asm/thread_info.h for TIF_xxxx flags available
  */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bb1981fd60f3..55d67300fa10 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -236,6 +236,8 @@ enum {
 	SKB_GSO_TCP_ECN = 1 << 3,
 
 	SKB_GSO_TCPV6 = 1 << 4,
+
+	SKB_GSO_FCOE = 1 << 5,
 };
 
 #if BITS_PER_LONG > 32
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 715196b09d67..bbacb7baa446 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -176,6 +176,12 @@ static inline void init_call_single_data(void)
 #define put_cpu()		preempt_enable()
 #define put_cpu_no_resched()	preempt_enable_no_resched()
 
+/*
+ * Callback to arch code if there's nosmp or maxcpus=0 on the
+ * boot command line:
+ */
+extern void arch_disable_smp_support(void);
+
 void smp_setup_processor_id(void);
 
 #endif /* __LINUX_SMP_H */
diff --git a/include/linux/stackprotector.h b/include/linux/stackprotector.h
new file mode 100644
index 000000000000..6f3e54c704c0
--- /dev/null
+++ b/include/linux/stackprotector.h
@@ -0,0 +1,16 @@
+#ifndef _LINUX_STACKPROTECTOR_H
+#define _LINUX_STACKPROTECTOR_H 1
+
+#include <linux/compiler.h>
+#include <linux/sched.h>
+#include <linux/random.h>
+
+#ifdef CONFIG_CC_STACKPROTECTOR
+# include <asm/stackprotector.h>
+#else
+static inline void boot_init_stack_canary(void)
+{
+}
+#endif
+
+#endif
diff --git a/include/linux/topology.h b/include/linux/topology.h
index e632d29f0544..a16b9e06f2e5 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -193,5 +193,11 @@ int arch_update_cpu_topology(void);
 #ifndef topology_core_siblings
 #define topology_core_siblings(cpu)		cpumask_of_cpu(cpu)
 #endif
+#ifndef topology_thread_cpumask
+#define topology_thread_cpumask(cpu)		cpumask_of(cpu)
+#endif
+#ifndef topology_core_cpumask
+#define topology_core_cpumask(cpu)		cpumask_of(cpu)
+#endif
 
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9c0890c7a06a..a43ebec3a7b9 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -95,6 +95,9 @@ extern struct vm_struct *remove_vm_area(const void *addr);
 
 extern int map_vm_area(struct vm_struct *area, pgprot_t prot,
 			struct page ***pages);
+extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
+				    pgprot_t prot, struct page **pages);
+extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long size);
 extern void unmap_kernel_range(unsigned long addr, unsigned long size);
 
 /* Allocate/destroy a 'vmalloc' VM area. */
@@ -110,5 +113,6 @@ extern long vwrite(char *buf, char *addr, unsigned long count);
  */
 extern rwlock_t vmlist_lock;
 extern struct vm_struct *vmlist;
+extern __init void vm_area_register_early(struct vm_struct *vm, size_t align);
 
 #endif /* _LINUX_VMALLOC_H */
diff --git a/include/scsi/fc/fc_fcoe.h b/include/scsi/fc/fc_fcoe.h
index f271d9cc0fc2..ccb3dbe90463 100644
--- a/include/scsi/fc/fc_fcoe.h
+++ b/include/scsi/fc/fc_fcoe.h
@@ -25,13 +25,6 @@
  */
 
 /*
- * The FCoE ethertype eventually goes in net/if_ether.h.
- */
-#ifndef ETH_P_FCOE
-#define	ETH_P_FCOE	0x8906		/* FCOE ether type */
-#endif
-
-/*
  * FC_FCOE_OUI hasn't been standardized yet.   XXX TBD.
  */
 #ifndef FC_FCOE_OUI
diff --git a/include/scsi/fc_frame.h b/include/scsi/fc_frame.h
index 04d34a71355f..59511057cee0 100644
--- a/include/scsi/fc_frame.h
+++ b/include/scsi/fc_frame.h
@@ -54,8 +54,7 @@
 #define fr_eof(fp)	(fr_cb(fp)->fr_eof)
 #define fr_flags(fp)	(fr_cb(fp)->fr_flags)
 #define fr_max_payload(fp)	(fr_cb(fp)->fr_max_payload)
-#define fr_cmd(fp)	(fr_cb(fp)->fr_cmd)
-#define fr_dir(fp)	(fr_cmd(fp)->sc_data_direction)
+#define fr_fsp(fp)	(fr_cb(fp)->fr_fsp)
 #define fr_crc(fp)	(fr_cb(fp)->fr_crc)
 
 struct fc_frame {
@@ -66,7 +65,7 @@ struct fcoe_rcv_info {
 	struct packet_type  *ptype;
 	struct fc_lport	*fr_dev;	/* transport layer private pointer */
 	struct fc_seq	*fr_seq;	/* for use with exchange manager */
-	struct scsi_cmnd *fr_cmd;	/* for use of scsi command */
+	struct fc_fcp_pkt *fr_fsp;	/* for the corresponding fcp I/O */
 	u32		fr_crc;
 	u16		fr_max_payload;	/* max FC payload */
 	enum fc_sof	fr_sof;		/* start of frame delimiter */
@@ -218,20 +217,6 @@ static inline bool fc_frame_is_cmd(const struct fc_frame *fp)
 	return fc_frame_rctl(fp) == FC_RCTL_DD_UNSOL_CMD;
 }
 
-static inline bool fc_frame_is_read(const struct fc_frame *fp)
-{
-	if (fc_frame_is_cmd(fp) && fr_cmd(fp))
-		return fr_dir(fp) == DMA_FROM_DEVICE;
-	return false;
-}
-
-static inline bool fc_frame_is_write(const struct fc_frame *fp)
-{
-	if (fc_frame_is_cmd(fp) && fr_cmd(fp))
-		return fr_dir(fp) == DMA_TO_DEVICE;
-	return false;
-}
-
 /*
  * Check for leaks.
  * Print the frame header of any currently allocated frame, assuming there
diff --git a/include/scsi/libfc.h b/include/scsi/libfc.h
index a2e126b86e3e..a70eafaad084 100644
--- a/include/scsi/libfc.h
+++ b/include/scsi/libfc.h
@@ -245,6 +245,7 @@ struct fc_fcp_pkt {
 	 */
 	struct fcp_cmnd cdb_cmd;
 	size_t		xfer_len;
+	u16		xfer_ddp;	/* this xfer is ddped */
 	u32		xfer_contig_end; /* offset of end of contiguous xfer */
 	u16		max_payload;	/* max payload size in bytes */
 
@@ -267,6 +268,15 @@ struct fc_fcp_pkt {
 	u8		recov_retry;	/* count of recovery retries */
 	struct fc_seq	*recov_seq;	/* sequence for REC or SRR */
 };
+/*
+ * FC_FCP HELPER FUNCTIONS
+ *****************************/
+static inline bool fc_fcp_is_read(const struct fc_fcp_pkt *fsp)
+{
+	if (fsp && fsp->cmd)
+		return fsp->cmd->sc_data_direction == DMA_FROM_DEVICE;
+	return false;
+}
 
 /*
  * Structure and function definitions for managing Fibre Channel Exchanges
@@ -400,6 +410,21 @@ struct libfc_function_template {
 					void *arg, unsigned int timer_msec);
 
 	/*
+	 * Sets up the DDP context for a given exchange id on the given
+	 * scatterlist if LLD supports DDP for large receive.
+	 *
+	 * STATUS: OPTIONAL
+	 */
+	int (*ddp_setup)(struct fc_lport *lp, u16 xid,
+			 struct scatterlist *sgl, unsigned int sgc);
+	/*
+	 * Completes the DDP transfer and returns the length of data DDPed
+	 * for the given exchange id.
+	 *
+	 * STATUS: OPTIONAL
+	 */
+	int (*ddp_done)(struct fc_lport *lp, u16 xid);
+	/*
 	 * Send a frame using an existing sequence and exchange.
 	 *
 	 * STATUS: OPTIONAL
@@ -654,6 +679,7 @@ struct fc_lport {
 	u16			link_speed;
 	u16			link_supported_speeds;
 	u16			lro_xid;	/* max xid for fcoe lro */
+	unsigned int		lso_max;	/* max large send size */
 	struct fc_ns_fts	fcts;	        /* FC-4 type masks */
 	struct fc_els_rnid_gen	rnid_gen;	/* RNID information */
 
@@ -821,6 +847,11 @@ int fc_change_queue_type(struct scsi_device *sdev, int tag_type);
 void fc_fcp_destroy(struct fc_lport *);
 
 /*
+ * Set up direct-data placement for this I/O request
+ */
+void fc_fcp_ddp_setup(struct fc_fcp_pkt *fsp, u16 xid);
+
+/*
  * ELS/CT interface
  *****************************/
 /*
diff --git a/include/scsi/libfcoe.h b/include/scsi/libfcoe.h
index 941818f29f59..c41f7d0c6efc 100644
--- a/include/scsi/libfcoe.h
+++ b/include/scsi/libfcoe.h
@@ -124,24 +124,6 @@ static inline u16 skb_fc_rxid(const struct sk_buff *skb)
 	return be16_to_cpu(skb_fc_header(skb)->fh_rx_id);
 }
 
-/* FIXME - DMA_BIDIRECTIONAL ? */
-#define skb_cb(skb)	((struct fcoe_rcv_info *)&((skb)->cb[0]))
-#define skb_cmd(skb)	(skb_cb(skb)->fr_cmd)
-#define skb_dir(skb)	(skb_cmd(skb)->sc_data_direction)
-static inline bool skb_fc_is_read(const struct sk_buff *skb)
-{
-	if (skb_fc_is_cmd(skb) && skb_cmd(skb))
-		return skb_dir(skb) == DMA_FROM_DEVICE;
-	return false;
-}
-
-static inline bool skb_fc_is_write(const struct sk_buff *skb)
-{
-	if (skb_fc_is_cmd(skb) && skb_cmd(skb))
-		return skb_dir(skb) == DMA_TO_DEVICE;
-	return false;
-}
-
 /* libfcoe funcs */
 int fcoe_reset(struct Scsi_Host *shost);
 u64 fcoe_wwn_from_mac(unsigned char mac[MAX_ADDR_LEN],
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 7360e1916e75..7ffaed2f94dd 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -45,18 +45,10 @@ struct iscsi_session;
 struct iscsi_nopin;
 struct device;
 
-/* #define DEBUG_SCSI */
-#ifdef DEBUG_SCSI
-#define debug_scsi(fmt...) printk(KERN_INFO "iscsi: " fmt)
-#else
-#define debug_scsi(fmt...)
-#endif
-
 #define ISCSI_DEF_XMIT_CMDS_MAX	128	/* must be power of 2 */
 #define ISCSI_MGMT_CMDS_MAX	15
 
-#define ISCSI_DEF_CMD_PER_LUN		32
-#define ISCSI_MAX_CMD_PER_LUN		128
+#define ISCSI_DEF_CMD_PER_LUN	32
 
 /* Task Mgmt states */
 enum {
@@ -326,6 +318,9 @@ struct iscsi_host {
 	spinlock_t		lock;
 	int			num_sessions;
 	int			state;
+
+	struct workqueue_struct	*workq;
+	char			workq_name[20];
 };
 
 /*
@@ -351,7 +346,8 @@ extern int iscsi_host_get_param(struct Scsi_Host *shost,
 				enum iscsi_host_param param, char *buf);
 extern int iscsi_host_add(struct Scsi_Host *shost, struct device *pdev);
 extern struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
-					  int dd_data_size, uint16_t qdepth);
+					  int dd_data_size,
+					  bool xmit_can_sleep);
 extern void iscsi_host_remove(struct Scsi_Host *shost);
 extern void iscsi_host_free(struct Scsi_Host *shost);
 
@@ -382,11 +378,12 @@ extern void iscsi_conn_stop(struct iscsi_cls_conn *, int);
 extern int iscsi_conn_bind(struct iscsi_cls_session *, struct iscsi_cls_conn *,
 			   int);
 extern void iscsi_conn_failure(struct iscsi_conn *conn, enum iscsi_err err);
-extern void iscsi_session_failure(struct iscsi_cls_session *cls_session,
+extern void iscsi_session_failure(struct iscsi_session *session,
 				  enum iscsi_err err);
 extern int iscsi_conn_get_param(struct iscsi_cls_conn *cls_conn,
 				enum iscsi_param param, char *buf);
 extern void iscsi_suspend_tx(struct iscsi_conn *conn);
+extern void iscsi_conn_queue_work(struct iscsi_conn *conn);
 
 #define iscsi_conn_printk(prefix, _c, fmt, a...) \
 	iscsi_cls_conn_printk(prefix, ((struct iscsi_conn *)_c)->cls_conn, \
diff --git a/include/scsi/osd_attributes.h b/include/scsi/osd_attributes.h
new file mode 100644
index 000000000000..f888a6fda073
--- /dev/null
+++ b/include/scsi/osd_attributes.h
@@ -0,0 +1,327 @@
+#ifndef __OSD_ATTRIBUTES_H__
+#define __OSD_ATTRIBUTES_H__
+
+#include "osd_protocol.h"
+
+/*
+ * Contains types and constants that define attribute pages and attribute
+ * numbers and their data types.
+ */
+
+#define ATTR_SET(pg, id, l, ptr) \
+	{ .attr_page = pg, .attr_id = id, .len = l, .val_ptr = ptr }
+
+#define ATTR_DEF(pg, id, l) ATTR_SET(pg, id, l, NULL)
+
+/* osd-r10 4.7.3 Attributes pages */
+enum {
+	OSD_APAGE_OBJECT_FIRST		= 0x0,
+	OSD_APAGE_OBJECT_DIRECTORY	= 0,
+	OSD_APAGE_OBJECT_INFORMATION	= 1,
+	OSD_APAGE_OBJECT_QUOTAS		= 2,
+	OSD_APAGE_OBJECT_TIMESTAMP	= 3,
+	OSD_APAGE_OBJECT_COLLECTIONS	= 4,
+	OSD_APAGE_OBJECT_SECURITY	= 5,
+	OSD_APAGE_OBJECT_LAST		= 0x2fffffff,
+
+	OSD_APAGE_PARTITION_FIRST	= 0x30000000,
+	OSD_APAGE_PARTITION_DIRECTORY	= OSD_APAGE_PARTITION_FIRST + 0,
+	OSD_APAGE_PARTITION_INFORMATION = OSD_APAGE_PARTITION_FIRST + 1,
+	OSD_APAGE_PARTITION_QUOTAS	= OSD_APAGE_PARTITION_FIRST + 2,
+	OSD_APAGE_PARTITION_TIMESTAMP	= OSD_APAGE_PARTITION_FIRST + 3,
+	OSD_APAGE_PARTITION_SECURITY	= OSD_APAGE_PARTITION_FIRST + 5,
+	OSD_APAGE_PARTITION_LAST	= 0x5FFFFFFF,
+
+	OSD_APAGE_COLLECTION_FIRST	= 0x60000000,
+	OSD_APAGE_COLLECTION_DIRECTORY	= OSD_APAGE_COLLECTION_FIRST + 0,
+	OSD_APAGE_COLLECTION_INFORMATION = OSD_APAGE_COLLECTION_FIRST + 1,
+	OSD_APAGE_COLLECTION_TIMESTAMP	= OSD_APAGE_COLLECTION_FIRST + 3,
+	OSD_APAGE_COLLECTION_SECURITY	= OSD_APAGE_COLLECTION_FIRST + 5,
+	OSD_APAGE_COLLECTION_LAST	= 0x8FFFFFFF,
+
+	OSD_APAGE_ROOT_FIRST		= 0x90000000,
+	OSD_APAGE_ROOT_DIRECTORY	= OSD_APAGE_ROOT_FIRST + 0,
+	OSD_APAGE_ROOT_INFORMATION	= OSD_APAGE_ROOT_FIRST + 1,
+	OSD_APAGE_ROOT_QUOTAS		= OSD_APAGE_ROOT_FIRST + 2,
+	OSD_APAGE_ROOT_TIMESTAMP	= OSD_APAGE_ROOT_FIRST + 3,
+	OSD_APAGE_ROOT_SECURITY		= OSD_APAGE_ROOT_FIRST + 5,
+	OSD_APAGE_ROOT_LAST		= 0xBFFFFFFF,
+
+	OSD_APAGE_RESERVED_TYPE_FIRST	= 0xC0000000,
+	OSD_APAGE_RESERVED_TYPE_LAST	= 0xEFFFFFFF,
+
+	OSD_APAGE_COMMON_FIRST		= 0xF0000000,
+	OSD_APAGE_COMMON_LAST		= 0xFFFFFFFE,
+
+	OSD_APAGE_REQUEST_ALL		= 0xFFFFFFFF,
+};
+
+/* subcategories of attr pages within each range above */
+enum {
+	OSD_APAGE_STD_FIRST		= 0x0,
+	OSD_APAGE_STD_DIRECTORY		= 0,
+	OSD_APAGE_STD_INFORMATION	= 1,
+	OSD_APAGE_STD_QUOTAS		= 2,
+	OSD_APAGE_STD_TIMESTAMP		= 3,
+	OSD_APAGE_STD_COLLECTIONS	= 4,
+	OSD_APAGE_STD_POLICY_SECURITY	= 5,
+	OSD_APAGE_STD_LAST		= 0x0000007F,
+
+	OSD_APAGE_RESERVED_FIRST	= 0x00000080,
+	OSD_APAGE_RESERVED_LAST		= 0x00007FFF,
+
+	OSD_APAGE_OTHER_STD_FIRST	= 0x00008000,
+	OSD_APAGE_OTHER_STD_LAST	= 0x0000EFFF,
+
+	OSD_APAGE_PUBLIC_FIRST		= 0x0000F000,
+	OSD_APAGE_PUBLIC_LAST		= 0x0000FFFF,
+
+	OSD_APAGE_APP_DEFINED_FIRST	= 0x00010000,
+	OSD_APAGE_APP_DEFINED_LAST	= 0x1FFFFFFF,
+
+	OSD_APAGE_VENDOR_SPECIFIC_FIRST	= 0x20000000,
+	OSD_APAGE_VENDOR_SPECIFIC_LAST	= 0x2FFFFFFF,
+};
+
+enum {
+	OSD_ATTR_PAGE_IDENTIFICATION = 0, /* in all pages 40 bytes */
+};
+
+struct page_identification {
+	u8 vendor_identification[8];
+	u8 page_identification[32];
+}  __packed;
+
+struct osd_attr_page_header {
+	__be32 page_number;
+	__be32 page_length;
+} __packed;
+
+/* 7.1.2.8 Root Information attributes page (OSD_APAGE_ROOT_INFORMATION) */
+enum {
+	OSD_ATTR_RI_OSD_SYSTEM_ID            = 0x3,   /* 20       */
+	OSD_ATTR_RI_VENDOR_IDENTIFICATION    = 0x4,   /* 8        */
+	OSD_ATTR_RI_PRODUCT_IDENTIFICATION   = 0x5,   /* 16       */
+	OSD_ATTR_RI_PRODUCT_MODEL            = 0x6,   /* 32       */
+	OSD_ATTR_RI_PRODUCT_REVISION_LEVEL   = 0x7,   /* 4        */
+	OSD_ATTR_RI_PRODUCT_SERIAL_NUMBER    = 0x8,   /* variable */
+	OSD_ATTR_RI_OSD_NAME                 = 0x9,   /* variable */
+	OSD_ATTR_RI_TOTAL_CAPACITY           = 0x80,  /* 8        */
+	OSD_ATTR_RI_USED_CAPACITY            = 0x81,  /* 8        */
+	OSD_ATTR_RI_NUMBER_OF_PARTITIONS     = 0xC0,  /* 8        */
+	OSD_ATTR_RI_CLOCK                    = 0x100, /* 6        */
+};
+/* Root_Information_attributes_page does not have a get_page structure */
+
+/* 7.1.2.9 Partition Information attributes page
+ * (OSD_APAGE_PARTITION_INFORMATION)
+ */
+enum {
+	OSD_ATTR_PI_PARTITION_ID            = 0x1,     /* 8        */
+	OSD_ATTR_PI_USERNAME                = 0x9,     /* variable */
+	OSD_ATTR_PI_USED_CAPACITY           = 0x81,    /* 8        */
+	OSD_ATTR_PI_NUMBER_OF_OBJECTS       = 0xC1,    /* 8        */
+};
+/* Partition Information attributes page does not have a get_page structure */
+
+/* 7.1.2.10 Collection Information attributes page
+ * (OSD_APAGE_COLLECTION_INFORMATION)
+ */
+enum {
+	OSD_ATTR_CI_PARTITION_ID           = 0x1,       /* 8        */
+	OSD_ATTR_CI_COLLECTION_OBJECT_ID   = 0x2,       /* 8        */
+	OSD_ATTR_CI_USERNAME               = 0x9,       /* variable */
+	OSD_ATTR_CI_USED_CAPACITY          = 0x81,      /* 8        */
+};
+/* Collection Information attributes page does not have a get_page structure */
+
+/* 7.1.2.11 User Object Information attributes page
+ * (OSD_APAGE_OBJECT_INFORMATION)
+ */
+enum {
+	OSD_ATTR_OI_PARTITION_ID         = 0x1,       /* 8        */
+	OSD_ATTR_OI_OBJECT_ID            = 0x2,       /* 8        */
+	OSD_ATTR_OI_USERNAME             = 0x9,       /* variable */
+	OSD_ATTR_OI_USED_CAPACITY        = 0x81,      /* 8        */
+	OSD_ATTR_OI_LOGICAL_LENGTH       = 0x82,      /* 8        */
+};
+/* Object Information attributes page does not have a get_page structure */
+
+/* 7.1.2.12 Root Quotas attributes page (OSD_APAGE_ROOT_QUOTAS) */
+enum {
+	OSD_ATTR_RQ_DEFAULT_MAXIMUM_USER_OBJECT_LENGTH     = 0x1,      /* 8  */
+	OSD_ATTR_RQ_PARTITION_CAPACITY_QUOTA               = 0x10001,  /* 8  */
+	OSD_ATTR_RQ_PARTITION_OBJECT_COUNT                 = 0x10002,  /* 8  */
+	OSD_ATTR_RQ_PARTITION_COLLECTIONS_PER_USER_OBJECT  = 0x10081,  /* 4  */
+	OSD_ATTR_RQ_PARTITION_COUNT                        = 0x20002,  /* 8  */
+};
+
+struct Root_Quotas_attributes_page {
+	struct osd_attr_page_header hdr; /* id=R+2, size=0x24 */
+	__be64 default_maximum_user_object_length;
+	__be64 partition_capacity_quota;
+	__be64 partition_object_count;
+	__be64 partition_collections_per_user_object;
+	__be64 partition_count;
+}  __packed;
+
+/* 7.1.2.13 Partition Quotas attributes page (OSD_APAGE_PARTITION_QUOTAS)*/
+enum {
+	OSD_ATTR_PQ_DEFAULT_MAXIMUM_USER_OBJECT_LENGTH  = 0x1,        /* 8 */
+	OSD_ATTR_PQ_CAPACITY_QUOTA                      = 0x10001,    /* 8 */
+	OSD_ATTR_PQ_OBJECT_COUNT                        = 0x10002,    /* 8 */
+	OSD_ATTR_PQ_COLLECTIONS_PER_USER_OBJECT         = 0x10081,    /* 4 */
+};
+
+struct Partition_Quotas_attributes_page {
+	struct osd_attr_page_header hdr; /* id=P+2, size=0x1C */
+	__be64 default_maximum_user_object_length;
+	__be64 capacity_quota;
+	__be64 object_count;
+	__be64 collections_per_user_object;
+}  __packed;
+
+/* 7.1.2.14 User Object Quotas attributes page (OSD_APAGE_OBJECT_QUOTAS) */
+enum {
+	OSD_ATTR_OQ_MAXIMUM_LENGTH  = 0x1,        /* 8 */
+};
+
+struct Object_Quotas_attributes_page {
+	struct osd_attr_page_header hdr; /* id=U+2, size=0x8 */
+	__be64 maximum_length;
+}  __packed;
+
+/* 7.1.2.15 Root Timestamps attributes page (OSD_APAGE_ROOT_TIMESTAMP) */
+enum {
+	OSD_ATTR_RT_ATTRIBUTES_ACCESSED_TIME  = 0x2,        /* 6 */
+	OSD_ATTR_RT_ATTRIBUTES_MODIFIED_TIME  = 0x3,        /* 6 */
+	OSD_ATTR_RT_TIMESTAMP_BYPASS          = 0xFFFFFFFE, /* 1 */
+};
+
+struct root_timestamps_attributes_page {
+	struct osd_attr_page_header hdr; /* id=R+3, size=0xD */
+	struct osd_timestamp attributes_accessed_time;
+	struct osd_timestamp attributes_modified_time;
+	u8 timestamp_bypass;
+}  __packed;
+
+/* 7.1.2.16 Partition Timestamps attributes page
+ * (OSD_APAGE_PARTITION_TIMESTAMP)
+ */
+enum {
+	OSD_ATTR_PT_CREATED_TIME              = 0x1,        /* 6 */
+	OSD_ATTR_PT_ATTRIBUTES_ACCESSED_TIME  = 0x2,        /* 6 */
+	OSD_ATTR_PT_ATTRIBUTES_MODIFIED_TIME  = 0x3,        /* 6 */
+	OSD_ATTR_PT_DATA_ACCESSED_TIME        = 0x4,        /* 6 */
+	OSD_ATTR_PT_DATA_MODIFIED_TIME        = 0x5,        /* 6 */
+	OSD_ATTR_PT_TIMESTAMP_BYPASS          = 0xFFFFFFFE, /* 1 */
+};
+
+struct partition_timestamps_attributes_page {
+	struct osd_attr_page_header hdr; /* id=P+3, size=0x1F */
+	struct osd_timestamp created_time;
+	struct osd_timestamp attributes_accessed_time;
+	struct osd_timestamp attributes_modified_time;
+	struct osd_timestamp data_accessed_time;
+	struct osd_timestamp data_modified_time;
+	u8 timestamp_bypass;
+}  __packed;
+
+/* 7.1.2.17/18 Collection/Object Timestamps attributes page
+ * (OSD_APAGE_COLLECTION_TIMESTAMP/OSD_APAGE_OBJECT_TIMESTAMP)
+ */
+enum {
+	OSD_ATTR_OT_CREATED_TIME              = 0x1,        /* 6 */
+	OSD_ATTR_OT_ATTRIBUTES_ACCESSED_TIME  = 0x2,        /* 6 */
+	OSD_ATTR_OT_ATTRIBUTES_MODIFIED_TIME  = 0x3,        /* 6 */
+	OSD_ATTR_OT_DATA_ACCESSED_TIME        = 0x4,        /* 6 */
+	OSD_ATTR_OT_DATA_MODIFIED_TIME        = 0x5,        /* 6 */
+};
+
+/* same for collection */
+struct object_timestamps_attributes_page {
+	struct osd_attr_page_header hdr; /* id=C+3/3, size=0x1E */
+	struct osd_timestamp created_time;
+	struct osd_timestamp attributes_accessed_time;
+	struct osd_timestamp attributes_modified_time;
+	struct osd_timestamp data_accessed_time;
+	struct osd_timestamp data_modified_time;
+}  __packed;
+
+/* 7.1.2.19 Collections attributes page */
+/* TBD */
+
+/* 7.1.2.20 Root Policy/Security attributes page (OSD_APAGE_ROOT_SECURITY) */
+enum {
+	OSD_ATTR_RS_DEFAULT_SECURITY_METHOD           = 0x1,       /* 1      */
+	OSD_ATTR_RS_OLDEST_VALID_NONCE_LIMIT          = 0x2,       /* 6      */
+	OSD_ATTR_RS_NEWEST_VALID_NONCE_LIMIT          = 0x3,       /* 6      */
+	OSD_ATTR_RS_PARTITION_DEFAULT_SECURITY_METHOD = 0x6,       /* 1      */
+	OSD_ATTR_RS_SUPPORTED_SECURITY_METHODS        = 0x7,       /* 2      */
+	OSD_ATTR_RS_ADJUSTABLE_CLOCK                  = 0x9,       /* 6      */
+	OSD_ATTR_RS_MASTER_KEY_IDENTIFIER             = 0x7FFD,    /* 0 or 7 */
+	OSD_ATTR_RS_ROOT_KEY_IDENTIFIER               = 0x7FFE,    /* 0 or 7 */
+	OSD_ATTR_RS_SUPPORTED_INTEGRITY_ALGORITHM_0   = 0x80000000,/* 1,(x16)*/
+	OSD_ATTR_RS_SUPPORTED_DH_GROUP_0              = 0x80000010,/* 1,(x16)*/
+};
+
+struct root_security_attributes_page {
+	struct osd_attr_page_header hdr; /* id=R+5, size=0x3F */
+	u8 default_security_method;
+	u8 partition_default_security_method;
+	__be16 supported_security_methods;
+	u8 mki_valid_rki_valid;
+	struct osd_timestamp oldest_valid_nonce_limit;
+	struct osd_timestamp newest_valid_nonce_limit;
+	struct osd_timestamp adjustable_clock;
+	u8 master_key_identifier[32-25];
+	u8 root_key_identifier[39-32];
+	u8 supported_integrity_algorithm[16];
+	u8 supported_dh_group[16];
+}  __packed;
+
+/* 7.1.2.21 Partition Policy/Security attributes page
+ * (OSD_APAGE_PARTITION_SECURITY)
+ */
+enum {
+	OSD_ATTR_PS_DEFAULT_SECURITY_METHOD        = 0x1,        /* 1      */
+	OSD_ATTR_PS_OLDEST_VALID_NONCE             = 0x2,        /* 6      */
+	OSD_ATTR_PS_NEWEST_VALID_NONCE             = 0x3,        /* 6      */
+	OSD_ATTR_PS_REQUEST_NONCE_LIST_DEPTH       = 0x4,        /* 2      */
+	OSD_ATTR_PS_FROZEN_WORKING_KEY_BIT_MASK    = 0x5,        /* 2      */
+	OSD_ATTR_PS_PARTITION_KEY_IDENTIFIER       = 0x7FFF,     /* 0 or 7 */
+	OSD_ATTR_PS_WORKING_KEY_IDENTIFIER_FIRST   = 0x8000,     /* 0 or 7 */
+	OSD_ATTR_PS_WORKING_KEY_IDENTIFIER_LAST    = 0x800F,     /* 0 or 7 */
+	OSD_ATTR_PS_POLICY_ACCESS_TAG              = 0x40000001, /* 4      */
+	OSD_ATTR_PS_USER_OBJECT_POLICY_ACCESS_TAG  = 0x40000002, /* 4      */
+};
+
+struct partition_security_attributes_page {
+	struct osd_attr_page_header hdr; /* id=p+5, size=0x8f */
+	u8 reserved[3];
+	u8 default_security_method;
+	struct osd_timestamp oldest_valid_nonce;
+	struct osd_timestamp newest_valid_nonce;
+	__be16 request_nonce_list_depth;
+	__be16 frozen_working_key_bit_mask;
+	__be32 policy_access_tag;
+	__be32 user_object_policy_access_tag;
+	u8 pki_valid;
+	__be16 wki_00_0f_vld;
+	struct osd_key_identifier partition_key_identifier;
+	struct osd_key_identifier working_key_identifiers[16];
+}  __packed;
+
+/* 7.1.2.22/23 Collection/Object Policy-Security attributes page
+ * (OSD_APAGE_COLLECTION_SECURITY/OSD_APAGE_OBJECT_SECURITY)
+ */
+enum {
+	OSD_ATTR_OS_POLICY_ACCESS_TAG              = 0x40000001, /* 4      */
+};
+
+struct object_security_attributes_page {
+	struct osd_attr_page_header hdr; /* id=C+5/5, size=4 */
+	__be32 policy_access_tag;
+}  __packed;
+
+#endif /*ndef __OSD_ATTRIBUTES_H__*/
diff --git a/include/scsi/osd_initiator.h b/include/scsi/osd_initiator.h
new file mode 100644
index 000000000000..b24d9616eb46
--- /dev/null
+++ b/include/scsi/osd_initiator.h
@@ -0,0 +1,433 @@
+/*
+ * osd_initiator.h - OSD initiator API definition
+ *
+ * Copyright (C) 2008 Panasas Inc.  All rights reserved.
+ *
+ * Authors:
+ *   Boaz Harrosh <bharrosh@panasas.com>
+ *   Benny Halevy <bhalevy@panasas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ *
+ */
+#ifndef __OSD_INITIATOR_H__
+#define __OSD_INITIATOR_H__
+
+#include "osd_protocol.h"
+#include "osd_types.h"
+
+#include <linux/blkdev.h>
+
+/* Note: "NI" in comments below means "Not Implemented yet" */
+
+/* Configure of code:
+ * #undef if you *don't* want OSD v1 support in runtime.
+ * If #defined the initiator will dynamically configure to encode OSD v1
+ * CDB's if the target is detected to be OSD v1 only.
+ * OSD v2 only commands, options, and attributes will be ignored if target
+ * is v1 only.
+ * If #defined will result in bigger/slower code (OK Slower maybe not)
+ * Q: Should this be CONFIG_SCSI_OSD_VER1_SUPPORT and set from Kconfig?
+ */
+#define OSD_VER1_SUPPORT y
+
+enum osd_std_version {
+	OSD_VER_NONE = 0,
+	OSD_VER1 = 1,
+	OSD_VER2 = 2,
+};
+
+/*
+ * Object-based Storage Device.
+ * This object represents an OSD device.
+ * It is not a full linux device in any way. It is only
+ * a place to hang resources associated with a Linux
+ * request Q and some default properties.
+ */
+struct osd_dev {
+	struct scsi_device *scsi_device;
+	unsigned def_timeout;
+
+#ifdef OSD_VER1_SUPPORT
+	enum osd_std_version version;
+#endif
+};
+
+/* Retrieve/return osd_dev(s) for use by Kernel clients */
+struct osd_dev *osduld_path_lookup(const char *dev_name); /*Use IS_ERR/ERR_PTR*/
+void osduld_put_device(struct osd_dev *od);
+
+/* Add/remove test ioctls from external modules */
+typedef int (do_test_fn)(struct osd_dev *od, unsigned cmd, unsigned long arg);
+int osduld_register_test(unsigned ioctl, do_test_fn *do_test);
+void osduld_unregister_test(unsigned ioctl);
+
+/* These are called by uld at probe time */
+void osd_dev_init(struct osd_dev *od, struct scsi_device *scsi_device);
+void osd_dev_fini(struct osd_dev *od);
+
+/* some hi level device operations */
+int osd_auto_detect_ver(struct osd_dev *od, void *caps);    /* GFP_KERNEL */
+
+/* we might want to use function vector in the future */
+static inline void osd_dev_set_ver(struct osd_dev *od, enum osd_std_version v)
+{
+#ifdef OSD_VER1_SUPPORT
+	od->version = v;
+#endif
+}
+
+struct osd_request;
+typedef void (osd_req_done_fn)(struct osd_request *or, void *private);
+
+struct osd_request {
+	struct osd_cdb cdb;
+	struct osd_data_out_integrity_info out_data_integ;
+	struct osd_data_in_integrity_info in_data_integ;
+
+	struct osd_dev *osd_dev;
+	struct request *request;
+
+	struct _osd_req_data_segment {
+		void *buff;
+		unsigned alloc_size; /* 0 here means: don't call kfree */
+		unsigned total_bytes;
+	} set_attr, enc_get_attr, get_attr;
+
+	struct _osd_io_info {
+		struct bio *bio;
+		u64 total_bytes;
+		struct request *req;
+		struct _osd_req_data_segment *last_seg;
+		u8 *pad_buff;
+	} out, in;
+
+	gfp_t alloc_flags;
+	unsigned timeout;
+	unsigned retries;
+	u8 sense[OSD_MAX_SENSE_LEN];
+	enum osd_attributes_mode attributes_mode;
+
+	osd_req_done_fn *async_done;
+	void *async_private;
+	int async_error;
+};
+
+/* OSD Version control */
+static inline bool osd_req_is_ver1(struct osd_request *or)
+{
+#ifdef OSD_VER1_SUPPORT
+	return or->osd_dev->version == OSD_VER1;
+#else
+	return false;
+#endif
+}
+
+/*
+ * How to use the osd library:
+ *
+ * osd_start_request
+ *	Allocates a request.
+ *
+ * osd_req_*
+ *	Call one of, to encode the desired operation.
+ *
+ * osd_add_{get,set}_attr
+ *	Optionally add attributes to the CDB, list or page mode.
+ *
+ * osd_finalize_request
+ *	Computes final data out/in offsets and signs the request,
+ *	making it ready for execution.
+ *
+ * osd_execute_request
+ *	May be called to execute it through the block layer. Other wise submit
+ *	the associated block request in some other way.
+ *
+ * After execution:
+ * osd_req_decode_sense
+ *	Decodes sense information to verify execution results.
+ *
+ * osd_req_decode_get_attr
+ *	Retrieve osd_add_get_attr_list() values if used.
+ *
+ * osd_end_request
+ *	Must be called to deallocate the request.
+ */
+
+/**
+ * osd_start_request - Allocate and initialize an osd_request
+ *
+ * @osd_dev:    OSD device that holds the scsi-device and default values
+ *              that the request is associated with.
+ * @gfp:        The allocation flags to use for request allocation, and all
+ *              subsequent allocations. This will be stored at
+ *              osd_request->alloc_flags, can be changed by user later
+ *
+ * Allocate osd_request and initialize all members to the
+ * default/initial state.
+ */
+struct osd_request *osd_start_request(struct osd_dev *od, gfp_t gfp);
+
+enum osd_req_options {
+	OSD_REQ_FUA = 0x08,	/* Force Unit Access */
+	OSD_REQ_DPO = 0x10,	/* Disable Page Out */
+
+	OSD_REQ_BYPASS_TIMESTAMPS = 0x80,
+};
+
+/**
+ * osd_finalize_request - Sign request and prepare request for execution
+ *
+ * @or:		osd_request to prepare
+ * @options:	combination of osd_req_options bit flags or 0.
+ * @cap:	A Pointer to an OSD_CAP_LEN bytes buffer that is received from
+ *              The security manager as capabilities for this cdb.
+ * @cap_key:	The cryptographic key used to sign the cdb/data. Can be null
+ *              if NOSEC is used.
+ *
+ * The actual request and bios are only allocated here, so are the get_attr
+ * buffers that will receive the returned attributes. Copy's @cap to cdb.
+ * Sign the cdb/data with @cap_key.
+ */
+int osd_finalize_request(struct osd_request *or,
+	u8 options, const void *cap, const u8 *cap_key);
+
+/**
+ * osd_execute_request - Execute the request synchronously through block-layer
+ *
+ * @or:		osd_request to Executed
+ *
+ * Calls blk_execute_rq to q the command and waits for completion.
+ */
+int osd_execute_request(struct osd_request *or);
+
+/**
+ * osd_execute_request_async - Execute the request without waitting.
+ *
+ * @or:                      - osd_request to Executed
+ * @done: (Optional)         - Called at end of execution
+ * @private:                 - Will be passed to @done function
+ *
+ * Calls blk_execute_rq_nowait to queue the command. When execution is done
+ * optionally calls @done with @private as parameter. @or->async_error will
+ * have the return code
+ */
+int osd_execute_request_async(struct osd_request *or,
+	osd_req_done_fn *done, void *private);
+
+/**
+ * osd_req_decode_sense_full - Decode sense information after execution.
+ *
+ * @or:           - osd_request to examine
+ * @osi           - Recievs a more detailed error report information (optional).
+ * @silent        - Do not print to dmsg (Even if enabled)
+ * @bad_obj_list  - Some commands act on multiple objects. Failed objects will
+ *                  be recieved here (optional)
+ * @max_obj       - Size of @bad_obj_list.
+ * @bad_attr_list - List of failing attributes (optional)
+ * @max_attr      - Size of @bad_attr_list.
+ *
+ * After execution, sense + return code can be analyzed using this function. The
+ * return code is the final disposition on the error. So it is possible that a
+ * CHECK_CONDITION was returned from target but this will return NO_ERROR, for
+ * example on recovered errors. All parameters are optional if caller does
+ * not need any returned information.
+ * Note: This function will also dump the error to dmsg according to settings
+ * of the SCSI_OSD_DPRINT_SENSE Kconfig value. Set @silent if you know the
+ * command would routinely fail, to not spam the dmsg file.
+ */
+struct osd_sense_info {
+	int key;		/* one of enum scsi_sense_keys */
+	int additional_code ;	/* enum osd_additional_sense_codes */
+	union { /* Sense specific information */
+		u16 sense_info;
+		u16 cdb_field_offset; 	/* scsi_invalid_field_in_cdb */
+	};
+	union { /* Command specific information */
+		u64 command_info;
+	};
+
+	u32 not_initiated_command_functions; /* osd_command_functions_bits */
+	u32 completed_command_functions; /* osd_command_functions_bits */
+	struct osd_obj_id obj;
+	struct osd_attr attr;
+};
+
+int osd_req_decode_sense_full(struct osd_request *or,
+	struct osd_sense_info *osi, bool silent,
+	struct osd_obj_id *bad_obj_list, int max_obj,
+	struct osd_attr *bad_attr_list, int max_attr);
+
+static inline int osd_req_decode_sense(struct osd_request *or,
+	struct osd_sense_info *osi)
+{
+	return osd_req_decode_sense_full(or, osi, false, NULL, 0, NULL, 0);
+}
+
+/**
+ * osd_end_request - return osd_request to free store
+ *
+ * @or:		osd_request to free
+ *
+ * Deallocate all osd_request resources (struct req's, BIOs, buffers, etc.)
+ */
+void osd_end_request(struct osd_request *or);
+
+/*
+ * CDB Encoding
+ *
+ * Note: call only one of the following methods.
+ */
+
+/*
+ * Device commands
+ */
+void osd_req_set_master_seed_xchg(struct osd_request *or, ...);/* NI */
+void osd_req_set_master_key(struct osd_request *or, ...);/* NI */
+
+void osd_req_format(struct osd_request *or, u64 tot_capacity);
+
+/* list all partitions
+ * @list header must be initialized to zero on first run.
+ *
+ * Call osd_is_obj_list_done() to find if we got the complete list.
+ */
+int osd_req_list_dev_partitions(struct osd_request *or,
+	osd_id initial_id, struct osd_obj_id_list *list, unsigned nelem);
+
+void osd_req_flush_obsd(struct osd_request *or,
+	enum osd_options_flush_scope_values);
+
+void osd_req_perform_scsi_command(struct osd_request *or,
+	const u8 *cdb, ...);/* NI */
+void osd_req_task_management(struct osd_request *or, ...);/* NI */
+
+/*
+ * Partition commands
+ */
+void osd_req_create_partition(struct osd_request *or, osd_id partition);
+void osd_req_remove_partition(struct osd_request *or, osd_id partition);
+
+void osd_req_set_partition_key(struct osd_request *or,
+	osd_id partition, u8 new_key_id[OSD_CRYPTO_KEYID_SIZE],
+	u8 seed[OSD_CRYPTO_SEED_SIZE]);/* NI */
+
+/* list all collections in the partition
+ * @list header must be init to zero on first run.
+ *
+ * Call osd_is_obj_list_done() to find if we got the complete list.
+ */
+int osd_req_list_partition_collections(struct osd_request *or,
+	osd_id partition, osd_id initial_id, struct osd_obj_id_list *list,
+	unsigned nelem);
+
+/* list all objects in the partition
+ * @list header must be init to zero on first run.
+ *
+ * Call osd_is_obj_list_done() to find if we got the complete list.
+ */
+int osd_req_list_partition_objects(struct osd_request *or,
+	osd_id partition, osd_id initial_id, struct osd_obj_id_list *list,
+	unsigned nelem);
+
+void osd_req_flush_partition(struct osd_request *or,
+	osd_id partition, enum osd_options_flush_scope_values);
+
+/*
+ * Collection commands
+ */
+void osd_req_create_collection(struct osd_request *or,
+	const struct osd_obj_id *);/* NI */
+void osd_req_remove_collection(struct osd_request *or,
+	const struct osd_obj_id *);/* NI */
+
+/* list all objects in the collection */
+int osd_req_list_collection_objects(struct osd_request *or,
+	const struct osd_obj_id *, osd_id initial_id,
+	struct osd_obj_id_list *list, unsigned nelem);
+
+/* V2 only filtered list of objects in the collection */
+void osd_req_query(struct osd_request *or, ...);/* NI */
+
+void osd_req_flush_collection(struct osd_request *or,
+	const struct osd_obj_id *, enum osd_options_flush_scope_values);
+
+void osd_req_get_member_attrs(struct osd_request *or, ...);/* V2-only NI */
+void osd_req_set_member_attrs(struct osd_request *or, ...);/* V2-only NI */
+
+/*
+ * Object commands
+ */
+void osd_req_create_object(struct osd_request *or, struct osd_obj_id *);
+void osd_req_remove_object(struct osd_request *or, struct osd_obj_id *);
+
+void osd_req_write(struct osd_request *or,
+	const struct osd_obj_id *, struct bio *data_out, u64 offset);
+void osd_req_append(struct osd_request *or,
+	const struct osd_obj_id *, struct bio *data_out);/* NI */
+void osd_req_create_write(struct osd_request *or,
+	const struct osd_obj_id *, struct bio *data_out, u64 offset);/* NI */
+void osd_req_clear(struct osd_request *or,
+	const struct osd_obj_id *, u64 offset, u64 len);/* NI */
+void osd_req_punch(struct osd_request *or,
+	const struct osd_obj_id *, u64 offset, u64 len);/* V2-only NI */
+
+void osd_req_flush_object(struct osd_request *or,
+	const struct osd_obj_id *, enum osd_options_flush_scope_values,
+	/*V2*/ u64 offset, /*V2*/ u64 len);
+
+void osd_req_read(struct osd_request *or,
+	const struct osd_obj_id *, struct bio *data_in, u64 offset);
+
+/*
+ * Root/Partition/Collection/Object Attributes commands
+ */
+
+/* get before set */
+void osd_req_get_attributes(struct osd_request *or, const struct osd_obj_id *);
+
+/* set before get */
+void osd_req_set_attributes(struct osd_request *or, const struct osd_obj_id *);
+
+/*
+ * Attributes appended to most commands
+ */
+
+/* Attributes List mode (or V2 CDB) */
+  /*
+   * TODO: In ver2 if at finalize time only one attr was set and no gets,
+   * then the Attributes CDB mode is used automatically to save IO.
+   */
+
+/* set a list of attributes. */
+int osd_req_add_set_attr_list(struct osd_request *or,
+	const struct osd_attr *, unsigned nelem);
+
+/* get a list of attributes */
+int osd_req_add_get_attr_list(struct osd_request *or,
+	const struct osd_attr *, unsigned nelem);
+
+/*
+ * Attributes list decoding
+ * Must be called after osd_request.request was executed
+ * It is called in a loop to decode the returned get_attr
+ * (see osd_add_get_attr)
+ */
+int osd_req_decode_get_attr_list(struct osd_request *or,
+	struct osd_attr *, int *nelem, void **iterator);
+
+/* Attributes Page mode */
+
+/*
+ * Read an attribute page and optionally set one attribute
+ *
+ * Retrieves the attribute page directly to a user buffer.
+ * @attr_page_data shall stay valid until end of execution.
+ * See osd_attributes.h for common page structures
+ */
+int osd_req_add_get_attr_page(struct osd_request *or,
+	u32 page_id, void *attr_page_data, unsigned max_page_len,
+	const struct osd_attr *set_one);
+
+#endif /* __OSD_LIB_H__ */
diff --git a/include/scsi/osd_protocol.h b/include/scsi/osd_protocol.h
new file mode 100644
index 000000000000..cd3cbf764650
--- /dev/null
+++ b/include/scsi/osd_protocol.h
@@ -0,0 +1,579 @@
+/*
+ * osd_protocol.h - OSD T10 standard C definitions.
+ *
+ * Copyright (C) 2008 Panasas Inc.  All rights reserved.
+ *
+ * Authors:
+ *   Boaz Harrosh <bharrosh@panasas.com>
+ *   Benny Halevy <bhalevy@panasas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ *
+ * This file contains types and constants that are defined by the protocol
+ * Note: All names and symbols are taken from the OSD standard's text.
+ */
+#ifndef __OSD_PROTOCOL_H__
+#define __OSD_PROTOCOL_H__
+
+#include <linux/types.h>
+#include <asm/unaligned.h>
+#include <scsi/scsi.h>
+
+enum {
+	OSDv1_ADDITIONAL_CDB_LENGTH = 192,
+	OSDv1_TOTAL_CDB_LEN = OSDv1_ADDITIONAL_CDB_LENGTH + 8,
+	OSDv1_CAP_LEN = 80,
+	/* Latest supported version */
+/* 	OSD_ADDITIONAL_CDB_LENGTH = 216,*/
+	OSD_ADDITIONAL_CDB_LENGTH =
+		OSDv1_ADDITIONAL_CDB_LENGTH, /* FIXME: Pete rev-001 sup */
+	OSD_TOTAL_CDB_LEN = OSD_ADDITIONAL_CDB_LENGTH + 8,
+/* 	OSD_CAP_LEN = 104,*/
+	OSD_CAP_LEN = OSDv1_CAP_LEN,/* FIXME: Pete rev-001 sup */
+
+	OSD_SYSTEMID_LEN = 20,
+	OSD_CRYPTO_KEYID_SIZE = 20,
+	/*FIXME: OSDv2_CRYPTO_KEYID_SIZE = 32,*/
+	OSD_CRYPTO_SEED_SIZE = 4,
+	OSD_CRYPTO_NONCE_SIZE = 12,
+	OSD_MAX_SENSE_LEN = 252, /* from SPC-3 */
+
+	OSD_PARTITION_FIRST_ID = 0x10000,
+	OSD_OBJECT_FIRST_ID = 0x10000,
+};
+
+/* (osd-r10 5.2.4)
+ * osd2r03: 5.2.3 Caching control bits
+ */
+enum osd_options_byte {
+	OSD_CDB_FUA = 0x08,	/* Force Unit Access */
+	OSD_CDB_DPO = 0x10,	/* Disable Page Out */
+};
+
+/*
+ * osd2r03: 5.2.5 Isolation.
+ * First 3 bits, V2-only.
+ * Also for attr 110h "default isolation method" at Root Information page
+ */
+enum osd_options_byte_isolation {
+	OSD_ISOLATION_DEFAULT = 0,
+	OSD_ISOLATION_NONE = 1,
+	OSD_ISOLATION_STRICT = 2,
+	OSD_ISOLATION_RANGE = 4,
+	OSD_ISOLATION_FUNCTIONAL = 5,
+	OSD_ISOLATION_VENDOR = 7,
+};
+
+/* (osd-r10: 6.7)
+ * osd2r03: 6.8 FLUSH, FLUSH COLLECTION, FLUSH OSD, FLUSH PARTITION
+ */
+enum osd_options_flush_scope_values {
+	OSD_CDB_FLUSH_ALL = 0,
+	OSD_CDB_FLUSH_ATTR_ONLY = 1,
+
+	OSD_CDB_FLUSH_ALL_RECURSIVE = 2,
+	/* V2-only */
+	OSD_CDB_FLUSH_ALL_RANGE = 2,
+};
+
+/* osd2r03: 5.2.10 Timestamps control */
+enum {
+	OSD_CDB_NORMAL_TIMESTAMPS = 0,
+	OSD_CDB_BYPASS_TIMESTAMPS = 0x7f,
+};
+
+/* (osd-r10: 5.2.2.1)
+ * osd2r03: 5.2.4.1 Get and set attributes CDB format selection
+ *	2 bits at second nibble of command_specific_options byte
+ */
+enum osd_attributes_mode {
+	/* V2-only */
+	OSD_CDB_SET_ONE_ATTR = 0x10,
+
+	OSD_CDB_GET_ATTR_PAGE_SET_ONE = 0x20,
+	OSD_CDB_GET_SET_ATTR_LISTS = 0x30,
+
+	OSD_CDB_GET_SET_ATTR_MASK = 0x30,
+};
+
+/* (osd-r10: 4.12.5)
+ * osd2r03: 4.14.5 Data-In and Data-Out buffer offsets
+ *	byte offset = mantissa * (2^(exponent+8))
+ *	struct {
+ *		unsigned mantissa: 28;
+ *		int exponent: 04;
+ *	}
+ */
+typedef __be32 __bitwise osd_cdb_offset;
+
+enum {
+	OSD_OFFSET_UNUSED = 0xFFFFFFFF,
+	OSD_OFFSET_MAX_BITS = 28,
+
+	OSDv1_OFFSET_MIN_SHIFT = 8,
+	OSD_OFFSET_MIN_SHIFT = 3,
+	OSD_OFFSET_MAX_SHIFT = 16,
+};
+
+/* Return the smallest allowed encoded offset that contains @offset.
+ *
+ * The actual encoded offset returned is @offset + *padding.
+ * (up to max_shift, non-inclusive)
+ */
+osd_cdb_offset __osd_encode_offset(u64 offset, unsigned *padding,
+	int min_shift, int max_shift);
+
+/* Minimum alignment is 256 bytes
+ * Note: Seems from std v1 that exponent can be from 0+8 to 0xE+8 (inclusive)
+ * which is 8 to 23 but IBM code restricts it to 16, so be it.
+ */
+static inline osd_cdb_offset osd_encode_offset_v1(u64 offset, unsigned *padding)
+{
+	return __osd_encode_offset(offset, padding,
+				OSDv1_OFFSET_MIN_SHIFT, OSD_OFFSET_MAX_SHIFT);
+}
+
+/* Minimum 8 bytes alignment
+ * Same as v1 but since exponent can be signed than a less than
+ * 256 alignment can be reached with small offsets (<2GB)
+ */
+static inline osd_cdb_offset osd_encode_offset_v2(u64 offset, unsigned *padding)
+{
+	return __osd_encode_offset(offset, padding,
+				   OSD_OFFSET_MIN_SHIFT, OSD_OFFSET_MAX_SHIFT);
+}
+
+/* osd2r03: 5.2.1 Overview */
+struct osd_cdb_head {
+	struct scsi_varlen_cdb_hdr varlen_cdb;
+/*10*/	u8		options;
+	u8		command_specific_options;
+	u8		timestamp_control;
+/*13*/	u8		reserved1[3];
+/*16*/	__be64		partition;
+/*24*/	__be64		object;
+/*32*/	union { /* V1 vs V2 alignment differences */
+		struct __osdv1_cdb_addr_len {
+/*32*/			__be32 		list_identifier;/* Rarely used */
+/*36*/			__be64		length;
+/*44*/			__be64		start_address;
+		} __packed v1;
+
+		struct __osdv2_cdb_addr_len {
+			/* called allocation_length in some commands */
+/*32*/			__be64	length;
+/*40*/			__be64	start_address;
+/*48*/			__be32 list_identifier;/* Rarely used */
+		} __packed v2;
+	};
+/*52*/	union { /* selected attributes mode Page/List/Single */
+		struct osd_attributes_page_mode {
+/*52*/			__be32		get_attr_page;
+/*56*/			__be32		get_attr_alloc_length;
+/*60*/			osd_cdb_offset	get_attr_offset;
+
+/*64*/			__be32		set_attr_page;
+/*68*/			__be32		set_attr_id;
+/*72*/			__be32		set_attr_length;
+/*76*/			osd_cdb_offset	set_attr_offset;
+/*80*/		} __packed attrs_page;
+
+		struct osd_attributes_list_mode {
+/*52*/			__be32		get_attr_desc_bytes;
+/*56*/			osd_cdb_offset	get_attr_desc_offset;
+
+/*60*/			__be32		get_attr_alloc_length;
+/*64*/			osd_cdb_offset	get_attr_offset;
+
+/*68*/			__be32		set_attr_bytes;
+/*72*/			osd_cdb_offset	set_attr_offset;
+			__be32 not_used;
+/*80*/		} __packed attrs_list;
+
+		/* osd2r03:5.2.4.2 Set one attribute value using CDB fields */
+		struct osd_attributes_cdb_mode {
+/*52*/			__be32		set_attr_page;
+/*56*/			__be32		set_attr_id;
+/*60*/			__be16		set_attr_len;
+/*62*/			u8		set_attr_val[18];
+/*80*/		} __packed attrs_cdb;
+/*52*/		u8 get_set_attributes_parameters[28];
+	};
+} __packed;
+/*80*/
+
+/*160 v1*/
+/*184 v2*/
+struct osd_security_parameters {
+/*160*/u8	integrity_check_value[OSD_CRYPTO_KEYID_SIZE];
+/*180*/u8	request_nonce[OSD_CRYPTO_NONCE_SIZE];
+/*192*/osd_cdb_offset	data_in_integrity_check_offset;
+/*196*/osd_cdb_offset	data_out_integrity_check_offset;
+} __packed;
+/*200 v1*/
+/*224 v2*/
+
+/* FIXME: osdv2_security_parameters */
+
+struct osdv1_cdb {
+	struct osd_cdb_head h;
+	u8 caps[OSDv1_CAP_LEN];
+	struct osd_security_parameters sec_params;
+} __packed;
+
+struct osdv2_cdb {
+	struct osd_cdb_head h;
+	u8 caps[OSD_CAP_LEN];
+	struct osd_security_parameters sec_params;
+	/* FIXME: osdv2_security_parameters */
+} __packed;
+
+struct osd_cdb {
+	union {
+		struct osdv1_cdb v1;
+		struct osdv2_cdb v2;
+		u8 buff[OSD_TOTAL_CDB_LEN];
+	};
+} __packed;
+
+static inline struct osd_cdb_head *osd_cdb_head(struct osd_cdb *ocdb)
+{
+	return (struct osd_cdb_head *)ocdb->buff;
+}
+
+/* define both version actions
+ * Ex name = FORMAT_OSD we have OSD_ACT_FORMAT_OSD && OSDv1_ACT_FORMAT_OSD
+ */
+#define OSD_ACT___(Name, Num) \
+	OSD_ACT_##Name = __constant_cpu_to_be16(0x8880 + Num), \
+	OSDv1_ACT_##Name = __constant_cpu_to_be16(0x8800 + Num),
+
+/* V2 only actions */
+#define OSD_ACT_V2(Name, Num) \
+	OSD_ACT_##Name = __constant_cpu_to_be16(0x8880 + Num),
+
+#define OSD_ACT_V1_V2(Name, Num1, Num2) \
+	OSD_ACT_##Name = __constant_cpu_to_be16(Num2), \
+	OSDv1_ACT_##Name = __constant_cpu_to_be16(Num1),
+
+enum osd_service_actions {
+	OSD_ACT_V2(OBJECT_STRUCTURE_CHECK,	0x00)
+	OSD_ACT___(FORMAT_OSD,			0x01)
+	OSD_ACT___(CREATE,			0x02)
+	OSD_ACT___(LIST,			0x03)
+	OSD_ACT_V2(PUNCH,			0x04)
+	OSD_ACT___(READ,			0x05)
+	OSD_ACT___(WRITE,			0x06)
+	OSD_ACT___(APPEND,			0x07)
+	OSD_ACT___(FLUSH,			0x08)
+	OSD_ACT_V2(CLEAR,			0x09)
+	OSD_ACT___(REMOVE,			0x0A)
+	OSD_ACT___(CREATE_PARTITION,		0x0B)
+	OSD_ACT___(REMOVE_PARTITION,		0x0C)
+	OSD_ACT___(GET_ATTRIBUTES,		0x0E)
+	OSD_ACT___(SET_ATTRIBUTES,		0x0F)
+	OSD_ACT___(CREATE_AND_WRITE,		0x12)
+	OSD_ACT___(CREATE_COLLECTION,		0x15)
+	OSD_ACT___(REMOVE_COLLECTION,		0x16)
+	OSD_ACT___(LIST_COLLECTION,		0x17)
+	OSD_ACT___(SET_KEY,			0x18)
+	OSD_ACT___(SET_MASTER_KEY,		0x19)
+	OSD_ACT___(FLUSH_COLLECTION,		0x1A)
+	OSD_ACT___(FLUSH_PARTITION,		0x1B)
+	OSD_ACT___(FLUSH_OSD,			0x1C)
+
+	OSD_ACT_V2(QUERY,			0x20)
+	OSD_ACT_V2(REMOVE_MEMBER_OBJECTS,	0x21)
+	OSD_ACT_V2(GET_MEMBER_ATTRIBUTES,	0x22)
+	OSD_ACT_V2(SET_MEMBER_ATTRIBUTES,	0x23)
+	OSD_ACT_V2(READ_MAP,			0x31)
+
+	OSD_ACT_V1_V2(PERFORM_SCSI_COMMAND,	0x8F7E, 0x8F7C)
+	OSD_ACT_V1_V2(SCSI_TASK_MANAGEMENT,	0x8F7F, 0x8F7D)
+	/* 0x8F80 to 0x8FFF are Vendor specific */
+};
+
+/* osd2r03: 7.1.3.2 List entry format for retrieving attributes */
+struct osd_attributes_list_attrid {
+	__be32 attr_page;
+	__be32 attr_id;
+} __packed;
+
+/*
+ * osd2r03: 7.1.3.3 List entry format for retrieved attributes and
+ *                  for setting attributes
+ * NOTE: v2 is 8-bytes aligned, v1 is not aligned.
+ */
+struct osd_attributes_list_element {
+	__be32 attr_page;
+	__be32 attr_id;
+	__be16 attr_bytes;
+	u8 attr_val[0];
+} __packed;
+
+enum {
+	OSDv1_ATTRIBUTES_ELEM_ALIGN = 1,
+	OSD_ATTRIBUTES_ELEM_ALIGN = 8,
+};
+
+enum {
+	OSD_ATTR_LIST_ALL_PAGES = 0xFFFFFFFF,
+	OSD_ATTR_LIST_ALL_IN_PAGE = 0xFFFFFFFF,
+};
+
+static inline unsigned osdv1_attr_list_elem_size(unsigned len)
+{
+	return ALIGN(len + sizeof(struct osd_attributes_list_element),
+		     OSDv1_ATTRIBUTES_ELEM_ALIGN);
+}
+
+static inline unsigned osdv2_attr_list_elem_size(unsigned len)
+{
+	return ALIGN(len + sizeof(struct osd_attributes_list_element),
+		     OSD_ATTRIBUTES_ELEM_ALIGN);
+}
+
+/*
+ * osd2r03: 7.1.3 OSD attributes lists (Table 184) — List type values
+ */
+enum osd_attr_list_types {
+	OSD_ATTR_LIST_GET = 0x1, 	/* descriptors only */
+	OSD_ATTR_LIST_SET_RETRIEVE = 0x9, /*descriptors/values variable-length*/
+	OSD_V2_ATTR_LIST_MULTIPLE = 0xE,  /* ver2, Multiple Objects lists*/
+	OSD_V1_ATTR_LIST_CREATE_MULTIPLE = 0xF,/*ver1, used by create_multple*/
+};
+
+/* osd2r03: 7.1.3.4 Multi-object retrieved attributes format */
+struct osd_attributes_list_multi_header {
+	__be64 object_id;
+	u8 object_type; /* object_type enum below */
+	u8 reserved[5];
+	__be16 list_bytes;
+	/* followed by struct osd_attributes_list_element's */
+};
+
+struct osdv1_attributes_list_header {
+	u8 type;	/* low 4-bit only */
+	u8 pad;
+	__be16 list_bytes; /* Initiator shall set to Zero. Only set by target */
+	/*
+	 * type=9 followed by struct osd_attributes_list_element's
+	 * type=E followed by struct osd_attributes_list_multi_header's
+	 */
+} __packed;
+
+static inline unsigned osdv1_list_size(struct osdv1_attributes_list_header *h)
+{
+	return be16_to_cpu(h->list_bytes);
+}
+
+struct osdv2_attributes_list_header {
+	u8 type;	/* lower 4-bits only */
+	u8 pad[3];
+/*4*/	__be32 list_bytes; /* Initiator shall set to zero. Only set by target */
+	/*
+	 * type=9 followed by struct osd_attributes_list_element's
+	 * type=E followed by struct osd_attributes_list_multi_header's
+	 */
+} __packed;
+
+static inline unsigned osdv2_list_size(struct osdv2_attributes_list_header *h)
+{
+	return be32_to_cpu(h->list_bytes);
+}
+
+/* (osd-r10 6.13)
+ * osd2r03: 6.15 LIST (Table 79) LIST command parameter data.
+ *	for root_lstchg below
+ */
+enum {
+	OSD_OBJ_ID_LIST_PAR = 0x1, /* V1-only. Not used in V2 */
+	OSD_OBJ_ID_LIST_LSTCHG = 0x2,
+};
+
+/*
+ * osd2r03: 6.15.2 LIST command parameter data
+ * (Also for LIST COLLECTION)
+ */
+struct osd_obj_id_list {
+	__be64 list_bytes; /* bytes in list excluding list_bytes (-8) */
+	__be64 continuation_id;
+	__be32 list_identifier;
+	u8 pad[3];
+	u8 root_lstchg;
+	__be64 object_ids[0];
+} __packed;
+
+static inline bool osd_is_obj_list_done(struct osd_obj_id_list *list,
+	bool *is_changed)
+{
+	*is_changed = (0 != (list->root_lstchg & OSD_OBJ_ID_LIST_LSTCHG));
+	return 0 != list->continuation_id;
+}
+
+/*
+ * osd2r03: 4.12.4.5 The ALLDATA security method
+ */
+struct osd_data_out_integrity_info {
+	__be64 data_bytes;
+	__be64 set_attributes_bytes;
+	__be64 get_attributes_bytes;
+	__be64 integrity_check_value;
+} __packed;
+
+struct osd_data_in_integrity_info {
+	__be64 data_bytes;
+	__be64 retrieved_attributes_bytes;
+	__be64 integrity_check_value;
+} __packed;
+
+struct osd_timestamp {
+	u8 time[6]; /* number of milliseconds since 1/1/1970 UT (big endian) */
+} __packed;
+/* FIXME: define helper functions to convert to/from osd time format */
+
+/*
+ * Capability & Security definitions
+ * osd2r03: 4.11.2.2 Capability format
+ * osd2r03: 5.2.8 Security parameters
+ */
+
+struct osd_key_identifier {
+	u8 id[7]; /* if you know why 7 please email bharrosh@panasas.com */
+} __packed;
+
+/* for osd_capability.format */
+enum {
+	OSD_SEC_CAP_FORMAT_NO_CAPS = 0,
+	OSD_SEC_CAP_FORMAT_VER1 = 1,
+	OSD_SEC_CAP_FORMAT_VER2 = 2,
+};
+
+/* security_method */
+enum {
+	OSD_SEC_NOSEC = 0,
+	OSD_SEC_CAPKEY = 1,
+	OSD_SEC_CMDRSP = 2,
+	OSD_SEC_ALLDATA = 3,
+};
+
+enum object_type {
+	OSD_SEC_OBJ_ROOT = 0x1,
+	OSD_SEC_OBJ_PARTITION = 0x2,
+	OSD_SEC_OBJ_COLLECTION = 0x40,
+	OSD_SEC_OBJ_USER = 0x80,
+};
+
+enum osd_capability_bit_masks {
+	OSD_SEC_CAP_APPEND	= BIT(0),
+	OSD_SEC_CAP_OBJ_MGMT	= BIT(1),
+	OSD_SEC_CAP_REMOVE	= BIT(2),
+	OSD_SEC_CAP_CREATE	= BIT(3),
+	OSD_SEC_CAP_SET_ATTR	= BIT(4),
+	OSD_SEC_CAP_GET_ATTR	= BIT(5),
+	OSD_SEC_CAP_WRITE	= BIT(6),
+	OSD_SEC_CAP_READ	= BIT(7),
+
+	OSD_SEC_CAP_NONE1	= BIT(8),
+	OSD_SEC_CAP_NONE2	= BIT(9),
+	OSD_SEC_CAP_NONE3	= BIT(10),
+	OSD_SEC_CAP_QUERY	= BIT(11), /*v2 only*/
+	OSD_SEC_CAP_M_OBJECT	= BIT(12), /*v2 only*/
+	OSD_SEC_CAP_POL_SEC	= BIT(13),
+	OSD_SEC_CAP_GLOBAL	= BIT(14),
+	OSD_SEC_CAP_DEV_MGMT	= BIT(15),
+};
+
+/* for object_descriptor_type (hi nibble used) */
+enum {
+	OSD_SEC_OBJ_DESC_NONE = 0,     /* Not allowed */
+	OSD_SEC_OBJ_DESC_OBJ = 1 << 4, /* v1: also collection */
+	OSD_SEC_OBJ_DESC_PAR = 2 << 4, /* also root */
+	OSD_SEC_OBJ_DESC_COL = 3 << 4, /* v2 only */
+};
+
+/* (osd-r10:4.9.2.2)
+ * osd2r03:4.11.2.2 Capability format
+ */
+struct osd_capability_head {
+	u8 format; /* low nibble */
+	u8 integrity_algorithm__key_version; /* MAKE_BYTE(integ_alg, key_ver) */
+	u8 security_method;
+	u8 reserved1;
+/*04*/	struct osd_timestamp expiration_time;
+/*10*/	u8 audit[20];
+/*30*/	u8 discriminator[12];
+/*42*/	struct osd_timestamp object_created_time;
+/*48*/	u8 object_type;
+/*49*/	u8 permissions_bit_mask[5];
+/*54*/	u8 reserved2;
+/*55*/	u8 object_descriptor_type; /* high nibble */
+} __packed;
+
+/*56 v1*/
+struct osdv1_cap_object_descriptor {
+	union {
+		struct {
+/*56*/			__be32 policy_access_tag;
+/*60*/			__be64 allowed_partition_id;
+/*68*/			__be64 allowed_object_id;
+/*76*/			__be32 reserved;
+		} __packed obj_desc;
+
+/*56*/		u8 object_descriptor[24];
+	};
+} __packed;
+/*80 v1*/
+
+/*56 v2*/
+struct osd_cap_object_descriptor {
+	union {
+		struct {
+/*56*/			__be32 allowed_attributes_access;
+/*60*/			__be32 policy_access_tag;
+/*64*/			__be16 boot_epoch;
+/*66*/			u8 reserved[6];
+/*72*/			__be64 allowed_partition_id;
+/*80*/			__be64 allowed_object_id;
+/*88*/			__be64 allowed_range_length;
+/*96*/			__be64 allowed_range_start;
+		} __packed obj_desc;
+
+/*56*/		u8 object_descriptor[48];
+	};
+} __packed;
+/*104 v2*/
+
+struct osdv1_capability {
+	struct osd_capability_head h;
+	struct osdv1_cap_object_descriptor od;
+} __packed;
+
+struct osd_capability {
+	struct osd_capability_head h;
+/* 	struct osd_cap_object_descriptor od;*/
+	struct osdv1_cap_object_descriptor od; /* FIXME: Pete rev-001 sup */
+} __packed;
+
+/**
+ * osd_sec_set_caps - set cap-bits into the capabilities header
+ *
+ * @cap:	The osd_capability_head to set cap bits to.
+ * @bit_mask: 	Use an ORed list of enum osd_capability_bit_masks values
+ *
+ * permissions_bit_mask is unaligned use below to set into caps
+ * in a version independent way
+ */
+static inline void osd_sec_set_caps(struct osd_capability_head *cap,
+	u16 bit_mask)
+{
+	/*
+	 *Note: The bits above are defined LE order this is because this way
+	 *      they can grow in the future to more then 16, and still retain
+	 *      there constant values.
+	 */
+	put_unaligned_le16(bit_mask, &cap->permissions_bit_mask);
+}
+
+#endif /* ndef __OSD_PROTOCOL_H__ */
diff --git a/include/scsi/osd_sec.h b/include/scsi/osd_sec.h
new file mode 100644
index 000000000000..4c09fee8ae1e
--- /dev/null
+++ b/include/scsi/osd_sec.h
@@ -0,0 +1,45 @@
+/*
+ * osd_sec.h - OSD security manager API
+ *
+ * Copyright (C) 2008 Panasas Inc.  All rights reserved.
+ *
+ * Authors:
+ *   Boaz Harrosh <bharrosh@panasas.com>
+ *   Benny Halevy <bhalevy@panasas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ *
+ */
+#ifndef __OSD_SEC_H__
+#define __OSD_SEC_H__
+
+#include "osd_protocol.h"
+#include "osd_types.h"
+
+/*
+ * Contains types and constants of osd capabilities and security
+ * encoding/decoding.
+ * API is trying to keep security abstract so initiator of an object
+ * based pNFS client knows as little as possible about security and
+ * capabilities. It is the Server's osd-initiator place to know more.
+ * Also can be used by osd-target.
+ */
+void osd_sec_encode_caps(void *caps, ...);/* NI */
+void osd_sec_init_nosec_doall_caps(void *caps,
+	const struct osd_obj_id *obj, bool is_collection, const bool is_v1);
+
+bool osd_is_sec_alldata(struct osd_security_parameters *sec_params);
+
+/* Conditionally sign the CDB according to security setting in ocdb
+ * with cap_key */
+void osd_sec_sign_cdb(struct osd_cdb *ocdb, const u8 *cap_key);
+
+/* Unconditionally sign the BIO data with cap_key.
+ * Check for osd_is_sec_alldata() was done prior to calling this. */
+void osd_sec_sign_data(void *data_integ, struct bio *bio, const u8 *cap_key);
+
+/* Version independent copy of caps into the cdb */
+void osd_set_caps(struct osd_cdb *cdb, const void *caps);
+
+#endif /* ndef __OSD_SEC_H__ */
diff --git a/include/scsi/osd_sense.h b/include/scsi/osd_sense.h
new file mode 100644
index 000000000000..ff9b33c773c7
--- /dev/null
+++ b/include/scsi/osd_sense.h
@@ -0,0 +1,260 @@
+/*
+ * osd_sense.h - OSD Related sense handling definitions.
+ *
+ * Copyright (C) 2008 Panasas Inc.  All rights reserved.
+ *
+ * Authors:
+ *   Boaz Harrosh <bharrosh@panasas.com>
+ *   Benny Halevy <bhalevy@panasas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ *
+ * This file contains types and constants that are defined by the protocol
+ * Note: All names and symbols are taken from the OSD standard's text.
+ */
+#ifndef __OSD_SENSE_H__
+#define __OSD_SENSE_H__
+
+#include <scsi/osd_protocol.h>
+
+/* SPC3r23 4.5.6 Sense key and sense code definitions table 27 */
+enum scsi_sense_keys {
+	scsi_sk_no_sense        = 0x0,
+	scsi_sk_recovered_error = 0x1,
+	scsi_sk_not_ready       = 0x2,
+	scsi_sk_medium_error    = 0x3,
+	scsi_sk_hardware_error  = 0x4,
+	scsi_sk_illegal_request = 0x5,
+	scsi_sk_unit_attention  = 0x6,
+	scsi_sk_data_protect    = 0x7,
+	scsi_sk_blank_check     = 0x8,
+	scsi_sk_vendor_specific = 0x9,
+	scsi_sk_copy_aborted    = 0xa,
+	scsi_sk_aborted_command = 0xb,
+	scsi_sk_volume_overflow = 0xd,
+	scsi_sk_miscompare      = 0xe,
+	scsi_sk_reserved        = 0xf,
+};
+
+/* SPC3r23 4.5.6 Sense key and sense code definitions table 28 */
+/* Note: only those which can be returned by an OSD target. Most of
+ *       these errors are taken care of by the generic scsi layer.
+ */
+enum osd_additional_sense_codes {
+	scsi_no_additional_sense_information			= 0x0000,
+	scsi_operation_in_progress				= 0x0016,
+	scsi_cleaning_requested					= 0x0017,
+	scsi_lunr_cause_not_reportable				= 0x0400,
+	scsi_logical_unit_is_in_process_of_becoming_ready	= 0x0401,
+	scsi_lunr_initializing_command_required			= 0x0402,
+	scsi_lunr_manual_intervention_required			= 0x0403,
+	scsi_lunr_operation_in_progress				= 0x0407,
+	scsi_lunr_selftest_in_progress				= 0x0409,
+	scsi_luna_asymmetric_access_state_transition		= 0x040a,
+	scsi_luna_target_port_in_standby_state			= 0x040b,
+	scsi_luna_target_port_in_unavailable_state		= 0x040c,
+	scsi_lunr_notify_enable_spinup_required			= 0x0411,
+	scsi_logical_unit_does_not_respond_to_selection		= 0x0500,
+	scsi_logical_unit_communication_failure			= 0x0800,
+	scsi_logical_unit_communication_timeout			= 0x0801,
+	scsi_logical_unit_communication_parity_error		= 0x0802,
+	scsi_error_log_overflow					= 0x0a00,
+	scsi_warning						= 0x0b00,
+	scsi_warning_specified_temperature_exceeded		= 0x0b01,
+	scsi_warning_enclosure_degraded				= 0x0b02,
+	scsi_write_error_unexpected_unsolicited_data		= 0x0c0c,
+	scsi_write_error_not_enough_unsolicited_data		= 0x0c0d,
+	scsi_invalid_information_unit				= 0x0e00,
+	scsi_invalid_field_in_command_information_unit		= 0x0e03,
+	scsi_read_error_failed_retransmission_request		= 0x1113,
+	scsi_parameter_list_length_error			= 0x1a00,
+	scsi_invalid_command_operation_code			= 0x2000,
+	scsi_invalid_field_in_cdb				= 0x2400,
+	osd_security_audit_value_frozen				= 0x2404,
+	osd_security_working_key_frozen				= 0x2405,
+	osd_nonce_not_unique					= 0x2406,
+	osd_nonce_timestamp_out_of_range			= 0x2407,
+	scsi_logical_unit_not_supported				= 0x2500,
+	scsi_invalid_field_in_parameter_list			= 0x2600,
+	scsi_parameter_not_supported				= 0x2601,
+	scsi_parameter_value_invalid				= 0x2602,
+	scsi_invalid_release_of_persistent_reservation		= 0x2604,
+	osd_invalid_dataout_buffer_integrity_check_value	= 0x260f,
+	scsi_not_ready_to_ready_change_medium_may_have_changed	= 0x2800,
+	scsi_power_on_reset_or_bus_device_reset_occurred	= 0x2900,
+	scsi_power_on_occurred					= 0x2901,
+	scsi_scsi_bus_reset_occurred				= 0x2902,
+	scsi_bus_device_reset_function_occurred			= 0x2903,
+	scsi_device_internal_reset				= 0x2904,
+	scsi_transceiver_mode_changed_to_single_ended		= 0x2905,
+	scsi_transceiver_mode_changed_to_lvd			= 0x2906,
+	scsi_i_t_nexus_loss_occurred				= 0x2907,
+	scsi_parameters_changed					= 0x2a00,
+	scsi_mode_parameters_changed				= 0x2a01,
+	scsi_asymmetric_access_state_changed			= 0x2a06,
+	scsi_priority_changed					= 0x2a08,
+	scsi_command_sequence_error				= 0x2c00,
+	scsi_previous_busy_status				= 0x2c07,
+	scsi_previous_task_set_full_status			= 0x2c08,
+	scsi_previous_reservation_conflict_status		= 0x2c09,
+	osd_partition_or_collection_contains_user_objects	= 0x2c0a,
+	scsi_commands_cleared_by_another_initiator		= 0x2f00,
+	scsi_cleaning_failure					= 0x3007,
+	scsi_enclosure_failure					= 0x3400,
+	scsi_enclosure_services_failure				= 0x3500,
+	scsi_unsupported_enclosure_function			= 0x3501,
+	scsi_enclosure_services_unavailable			= 0x3502,
+	scsi_enclosure_services_transfer_failure		= 0x3503,
+	scsi_enclosure_services_transfer_refused		= 0x3504,
+	scsi_enclosure_services_checksum_error			= 0x3505,
+	scsi_rounded_parameter					= 0x3700,
+	osd_read_past_end_of_user_object			= 0x3b17,
+	scsi_logical_unit_has_not_self_configured_yet		= 0x3e00,
+	scsi_logical_unit_failure				= 0x3e01,
+	scsi_timeout_on_logical_unit				= 0x3e02,
+	scsi_logical_unit_failed_selftest			= 0x3e03,
+	scsi_logical_unit_unable_to_update_selftest_log		= 0x3e04,
+	scsi_target_operating_conditions_have_changed		= 0x3f00,
+	scsi_microcode_has_been_changed				= 0x3f01,
+	scsi_inquiry_data_has_changed				= 0x3f03,
+	scsi_echo_buffer_overwritten				= 0x3f0f,
+	scsi_diagnostic_failure_on_component_nn_first		= 0x4080,
+	scsi_diagnostic_failure_on_component_nn_last		= 0x40ff,
+	scsi_message_error					= 0x4300,
+	scsi_internal_target_failure				= 0x4400,
+	scsi_select_or_reselect_failure				= 0x4500,
+	scsi_scsi_parity_error					= 0x4700,
+	scsi_data_phase_crc_error_detected			= 0x4701,
+	scsi_scsi_parity_error_detected_during_st_data_phase	= 0x4702,
+	scsi_asynchronous_information_protection_error_detected	= 0x4704,
+	scsi_protocol_service_crc_error				= 0x4705,
+	scsi_phy_test_function_in_progress			= 0x4706,
+	scsi_invalid_message_error				= 0x4900,
+	scsi_command_phase_error				= 0x4a00,
+	scsi_data_phase_error					= 0x4b00,
+	scsi_logical_unit_failed_self_configuration		= 0x4c00,
+	scsi_overlapped_commands_attempted			= 0x4e00,
+	osd_quota_error						= 0x5507,
+	scsi_failure_prediction_threshold_exceeded		= 0x5d00,
+	scsi_failure_prediction_threshold_exceeded_false	= 0x5dff,
+	scsi_voltage_fault					= 0x6500,
+};
+
+enum scsi_descriptor_types {
+	scsi_sense_information			= 0x0,
+	scsi_sense_command_specific_information	= 0x1,
+	scsi_sense_key_specific			= 0x2,
+	scsi_sense_field_replaceable_unit	= 0x3,
+	scsi_sense_stream_commands		= 0x4,
+	scsi_sense_block_commands		= 0x5,
+	osd_sense_object_identification		= 0x6,
+	osd_sense_response_integrity_check	= 0x7,
+	osd_sense_attribute_identification	= 0x8,
+	scsi_sense_ata_return			= 0x9,
+
+	scsi_sense_Reserved_first		= 0x0A,
+	scsi_sense_Reserved_last		= 0x7F,
+	scsi_sense_Vendor_specific_first	= 0x80,
+	scsi_sense_Vendor_specific_last		= 0xFF,
+};
+
+struct scsi_sense_descriptor { /* for picking into desc type */
+	u8	descriptor_type; /* one of enum scsi_descriptor_types */
+	u8	additional_length; /* n - 1 */
+	u8	data[];
+} __packed;
+
+/* OSD deploys only scsi descriptor_based sense buffers */
+struct scsi_sense_descriptor_based {
+/*0*/	u8 	response_code; /* 0x72 or 0x73 */
+/*1*/	u8 	sense_key; /* one of enum scsi_sense_keys (4 lower bits) */
+/*2*/	__be16	additional_sense_code; /* enum osd_additional_sense_codes */
+/*4*/	u8	Reserved[3];
+/*7*/	u8	additional_sense_length; /* n - 7 */
+/*8*/	struct	scsi_sense_descriptor ssd[0]; /* variable length, 1 or more */
+} __packed;
+
+/* some descriptors deployed by OSD */
+
+/* SPC3r23 4.5.2.3 Command-specific information sense data descriptor */
+/* Note: this is the same for descriptor_type=00 but with type=00 the
+ *        Reserved[0] == 0x80 (ie. bit-7 set)
+ */
+struct scsi_sense_command_specific_data_descriptor {
+/*0*/	u8	descriptor_type; /* (00h/01h) */
+/*1*/	u8	additional_length; /* (0Ah) */
+/*2*/	u8	Reserved[2];
+/*4*/	__be64  information;
+} __packed;
+/*12*/
+
+struct scsi_sense_key_specific_data_descriptor {
+/*0*/	u8	descriptor_type; /* (02h) */
+/*1*/	u8	additional_length; /* (06h) */
+/*2*/	u8	Reserved[2];
+/* SKSV, C/D, Reserved (2), BPV, BIT POINTER (3) */
+/*4*/	u8	sksv_cd_bpv_bp;
+/*5*/	__be16	value; /* field-pointer/progress-value/retry-count/... */
+/*7*/	u8	Reserved2;
+} __packed;
+/*8*/
+
+/* 4.16.2.1 OSD error identification sense data descriptor - table 52 */
+/* Note: these bits are defined LE order for easy definition, this way the BIT()
+ * number is the same as in the documentation. Below members at
+ * osd_sense_identification_data_descriptor are therefore defined __le32.
+ */
+enum osd_command_functions_bits {
+	OSD_CFB_COMMAND		 = BIT(4),
+	OSD_CFB_CMD_CAP_VERIFIED = BIT(5),
+	OSD_CFB_VALIDATION	 = BIT(7),
+	OSD_CFB_IMP_ST_ATT	 = BIT(12),
+	OSD_CFB_SET_ATT		 = BIT(20),
+	OSD_CFB_SA_CAP_VERIFIED	 = BIT(21),
+	OSD_CFB_GET_ATT		 = BIT(28),
+	OSD_CFB_GA_CAP_VERIFIED	 = BIT(29),
+};
+
+struct osd_sense_identification_data_descriptor {
+/*0*/	u8	descriptor_type; /* (06h) */
+/*1*/	u8	additional_length; /* (1Eh) */
+/*2*/	u8	Reserved[6];
+/*8*/	__le32	not_initiated_functions; /*osd_command_functions_bits*/
+/*12*/	__le32	completed_functions; /*osd_command_functions_bits*/
+/*16*/ 	__be64	partition_id;
+/*24*/	__be64	object_id;
+} __packed;
+/*32*/
+
+struct osd_sense_response_integrity_check_descriptor {
+/*0*/	u8	descriptor_type; /* (07h) */
+/*1*/	u8	additional_length; /* (20h) */
+/*2*/	u8	integrity_check_value[32]; /*FIXME: OSDv2_CRYPTO_KEYID_SIZE*/
+} __packed;
+/*34*/
+
+struct osd_sense_attributes_data_descriptor {
+/*0*/	u8	descriptor_type; /* (08h) */
+/*1*/	u8	additional_length; /* (n-2) */
+/*2*/	u8	Reserved[6];
+	struct osd_sense_attr {
+/*8*/		__be32	attr_page;
+/*12*/		__be32	attr_id;
+/*16*/	} sense_attrs[0]; /* 1 or more */
+} __packed;
+/*variable*/
+
+/* Dig into scsi_sk_illegal_request/scsi_invalid_field_in_cdb errors */
+
+/*FIXME: Support also field in CAPS*/
+#define OSD_CDB_OFFSET(F) offsetof(struct osd_cdb_head, F)
+
+enum osdv2_cdb_field_offset {
+	OSDv1_CFO_STARTING_BYTE	= OSD_CDB_OFFSET(v1.start_address),
+	OSD_CFO_STARTING_BYTE	= OSD_CDB_OFFSET(v2.start_address),
+	OSD_CFO_PARTITION_ID	= OSD_CDB_OFFSET(partition),
+	OSD_CFO_OBJECT_ID	= OSD_CDB_OFFSET(object),
+};
+
+#endif /* ndef __OSD_SENSE_H__ */
diff --git a/include/scsi/osd_types.h b/include/scsi/osd_types.h
new file mode 100644
index 000000000000..3f5e88cc75c0
--- /dev/null
+++ b/include/scsi/osd_types.h
@@ -0,0 +1,40 @@
+/*
+ * osd_types.h - Types and constants which are not part of the protocol.
+ *
+ * Copyright (C) 2008 Panasas Inc.  All rights reserved.
+ *
+ * Authors:
+ *   Boaz Harrosh <bharrosh@panasas.com>
+ *   Benny Halevy <bhalevy@panasas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ *
+ * Contains types and constants that are implementation specific and are
+ * used by more than one part of the osd library.
+ *     (Eg initiator/target/security_manager/...)
+ */
+#ifndef __OSD_TYPES_H__
+#define __OSD_TYPES_H__
+
+struct osd_systemid {
+	u8 data[OSD_SYSTEMID_LEN];
+};
+
+typedef u64 __bitwise osd_id;
+
+struct osd_obj_id {
+	osd_id partition;
+	osd_id id;
+};
+
+static const struct __weak osd_obj_id osd_root_object = {0, 0};
+
+struct osd_attr {
+	u32 attr_page;
+	u32 attr_id;
+	u16 len;		/* byte count of operand */
+	void *val_ptr;		/* in network order */
+};
+
+#endif /* ndef __OSD_TYPES_H__ */
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index a109165714d6..084478e14d24 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -9,7 +9,8 @@
 #define _SCSI_SCSI_H
 
 #include <linux/types.h>
-#include <scsi/scsi_cmnd.h>
+
+struct scsi_cmnd;
 
 /*
  * The maximum number of SG segments that we will put inside a
@@ -263,6 +264,7 @@ static inline int scsi_status_is_good(int status)
 #define TYPE_RAID           0x0c
 #define TYPE_ENCLOSURE      0x0d    /* Enclosure Services Device */
 #define TYPE_RBC	    0x0e
+#define TYPE_OSD            0x11
 #define TYPE_NO_LUN         0x7f
 
 /* SCSI protocols; these are taken from SPC-3 section 7.5 */
@@ -402,16 +404,6 @@ static inline int scsi_is_wlun(unsigned int lun)
 #define DRIVER_HARD         0x07
 #define DRIVER_SENSE	    0x08
 
-#define SUGGEST_RETRY       0x10
-#define SUGGEST_ABORT       0x20
-#define SUGGEST_REMAP       0x30
-#define SUGGEST_DIE         0x40
-#define SUGGEST_SENSE       0x80
-#define SUGGEST_IS_OK       0xff
-
-#define DRIVER_MASK         0x0f
-#define SUGGEST_MASK        0xf0
-
 /*
  * Internal return values.
  */
@@ -447,23 +439,6 @@ static inline int scsi_is_wlun(unsigned int lun)
 #define msg_byte(result)    (((result) >> 8) & 0xff)
 #define host_byte(result)   (((result) >> 16) & 0xff)
 #define driver_byte(result) (((result) >> 24) & 0xff)
-#define suggestion(result)  (driver_byte(result) & SUGGEST_MASK)
-
-static inline void set_msg_byte(struct scsi_cmnd *cmd, char status)
-{
-	cmd->result |= status << 8;
-}
-
-static inline void set_host_byte(struct scsi_cmnd *cmd, char status)
-{
-	cmd->result |= status << 16;
-}
-
-static inline void set_driver_byte(struct scsi_cmnd *cmd, char status)
-{
-	cmd->result |= status << 24;
-}
-
 
 #define sense_class(sense)  (((sense) >> 4) & 0x7)
 #define sense_error(sense)  ((sense) & 0xf)
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 855bf95963e7..43b50d36925c 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -291,4 +291,19 @@ static inline struct scsi_data_buffer *scsi_prot(struct scsi_cmnd *cmd)
 #define scsi_for_each_prot_sg(cmd, sg, nseg, __i)		\
 	for_each_sg(scsi_prot_sglist(cmd), sg, nseg, __i)
 
+static inline void set_msg_byte(struct scsi_cmnd *cmd, char status)
+{
+	cmd->result |= status << 8;
+}
+
+static inline void set_host_byte(struct scsi_cmnd *cmd, char status)
+{
+	cmd->result |= status << 16;
+}
+
+static inline void set_driver_byte(struct scsi_cmnd *cmd, char status)
+{
+	cmd->result |= status << 24;
+}
+
 #endif /* _SCSI_SCSI_CMND_H */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 01a4c58f8bad..3f566af3f101 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -340,6 +340,7 @@ extern int scsi_mode_select(struct scsi_device *sdev, int pf, int sp,
 			    struct scsi_sense_hdr *);
 extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout,
 				int retries, struct scsi_sense_hdr *sshdr);
+extern unsigned char *scsi_get_vpd_page(struct scsi_device *, u8 page);
 extern int scsi_device_set_state(struct scsi_device *sdev,
 				 enum scsi_device_state state);
 extern struct scsi_event *sdev_evt_alloc(enum scsi_device_event evt_type,
@@ -370,12 +371,6 @@ extern int scsi_execute_req(struct scsi_device *sdev, const unsigned char *cmd,
 			    int data_direction, void *buffer, unsigned bufflen,
 			    struct scsi_sense_hdr *, int timeout, int retries,
 			    int *resid);
-extern int scsi_execute_async(struct scsi_device *sdev,
-			      const unsigned char *cmd, int cmd_len, int data_direction,
-			      void *buffer, unsigned bufflen, int use_sg,
-			      int timeout, int retries, void *privdata,
-			      void (*done)(void *, char *, int, int),
-			      gfp_t gfp);
 
 static inline int __must_check scsi_device_reprobe(struct scsi_device *sdev)
 {
@@ -400,7 +395,8 @@ static inline unsigned int sdev_id(struct scsi_device *sdev)
  */
 static inline int scsi_device_online(struct scsi_device *sdev)
 {
-	return sdev->sdev_state != SDEV_OFFLINE;
+	return (sdev->sdev_state != SDEV_OFFLINE &&
+		sdev->sdev_state != SDEV_DEL);
 }
 static inline int scsi_device_blocked(struct scsi_device *sdev)
 {
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index b50aabe2861e..457588e1119b 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -88,7 +88,7 @@ struct iscsi_transport {
 	uint64_t host_param_mask;
 	struct iscsi_cls_session *(*create_session) (struct iscsi_endpoint *ep,
 					uint16_t cmds_max, uint16_t qdepth,
-					uint32_t sn, uint32_t *hn);
+					uint32_t sn);
 	void (*destroy_session) (struct iscsi_cls_session *session);
 	struct iscsi_cls_conn *(*create_conn) (struct iscsi_cls_session *sess,
 				uint32_t cid);
@@ -206,8 +206,6 @@ struct iscsi_cls_session {
 struct iscsi_cls_host {
 	atomic_t nr_scans;
 	struct mutex mutex;
-	struct workqueue_struct *scan_workq;
-	char scan_workq_name[20];
 };
 
 extern void iscsi_host_for_each_session(struct Scsi_Host *shost,
diff --git a/init/Kconfig b/init/Kconfig
index 68699137b147..14c483d2b7c9 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -101,6 +101,66 @@ config LOCALVERSION_AUTO
 
 	  which is done within the script "scripts/setlocalversion".)
 
+config HAVE_KERNEL_GZIP
+	bool
+
+config HAVE_KERNEL_BZIP2
+	bool
+
+config HAVE_KERNEL_LZMA
+	bool
+
+choice
+	prompt "Kernel compression mode"
+	default KERNEL_GZIP
+	depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA
+	help
+	  The linux kernel is a kind of self-extracting executable.
+	  Several compression algorithms are available, which differ
+	  in efficiency, compression and decompression speed.
+	  Compression speed is only relevant when building a kernel.
+	  Decompression speed is relevant at each boot.
+
+	  If you have any problems with bzip2 or lzma compressed
+	  kernels, mail me (Alain Knaff) <alain@knaff.lu>. (An older
+	  version of this functionality (bzip2 only), for 2.4, was
+	  supplied by Christian Ludwig)
+
+	  High compression options are mostly useful for users, who
+	  are low on disk space (embedded systems), but for whom ram
+	  size matters less.
+
+	  If in doubt, select 'gzip'
+
+config KERNEL_GZIP
+	bool "Gzip"
+	depends on HAVE_KERNEL_GZIP
+	help
+	  The old and tried gzip compression. Its compression ratio is
+	  the poorest among the 3 choices; however its speed (both
+	  compression and decompression) is the fastest.
+
+config KERNEL_BZIP2
+	bool "Bzip2"
+	depends on HAVE_KERNEL_BZIP2
+	help
+	  Its compression ratio and speed is intermediate.
+	  Decompression speed is slowest among the three.  The kernel
+	  size is about 10% smaller with bzip2, in comparison to gzip.
+	  Bzip2 uses a large amount of memory. For modern kernels you
+	  will need at least 8MB RAM or more for booting.
+
+config KERNEL_LZMA
+	bool "LZMA"
+	depends on HAVE_KERNEL_LZMA
+	help
+	  The most recent compression algorithm.
+	  Its ratio is best, decompression speed is between the other
+	  two. Compression is slowest.	The kernel size is about 33%
+	  smaller with LZMA in comparison to gzip.
+
+endchoice
+
 config SWAP
 	bool "Support for paging of anonymous memory (swap)"
 	depends on MMU && BLOCK
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index 0f0f0cf3ba9a..027a402708de 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -11,6 +11,9 @@
 #include "do_mounts.h"
 #include "../fs/squashfs/squashfs_fs.h"
 
+#include <linux/decompress/generic.h>
+
+
 int __initdata rd_prompt = 1;/* 1 = prompt for RAM disk, 0 = don't prompt */
 
 static int __init prompt_ramdisk(char *str)
@@ -29,7 +32,7 @@ static int __init ramdisk_start_setup(char *str)
 }
 __setup("ramdisk_start=", ramdisk_start_setup);
 
-static int __init crd_load(int in_fd, int out_fd);
+static int __init crd_load(int in_fd, int out_fd, decompress_fn deco);
 
 /*
  * This routine tries to find a RAM disk image to load, and returns the
@@ -38,15 +41,15 @@ static int __init crd_load(int in_fd, int out_fd);
  * numbers could not be found.
  *
  * We currently check for the following magic numbers:
- * 	minix
- * 	ext2
+ *	minix
+ *	ext2
  *	romfs
  *	cramfs
  *	squashfs
- * 	gzip
+ *	gzip
  */
-static int __init 
-identify_ramdisk_image(int fd, int start_block)
+static int __init
+identify_ramdisk_image(int fd, int start_block, decompress_fn *decompressor)
 {
 	const int size = 512;
 	struct minix_super_block *minixsb;
@@ -56,6 +59,7 @@ identify_ramdisk_image(int fd, int start_block)
 	struct squashfs_super_block *squashfsb;
 	int nblocks = -1;
 	unsigned char *buf;
+	const char *compress_name;
 
 	buf = kmalloc(size, GFP_KERNEL);
 	if (!buf)
@@ -69,18 +73,19 @@ identify_ramdisk_image(int fd, int start_block)
 	memset(buf, 0xe5, size);
 
 	/*
-	 * Read block 0 to test for gzipped kernel
+	 * Read block 0 to test for compressed kernel
 	 */
 	sys_lseek(fd, start_block * BLOCK_SIZE, 0);
 	sys_read(fd, buf, size);
 
-	/*
-	 * If it matches the gzip magic numbers, return 0
-	 */
-	if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
-		printk(KERN_NOTICE
-		       "RAMDISK: Compressed image found at block %d\n",
-		       start_block);
+	*decompressor = decompress_method(buf, size, &compress_name);
+	if (compress_name) {
+		printk(KERN_NOTICE "RAMDISK: %s image found at block %d\n",
+		       compress_name, start_block);
+		if (!*decompressor)
+			printk(KERN_EMERG
+			       "RAMDISK: %s decompressor not configured!\n",
+			       compress_name);
 		nblocks = 0;
 		goto done;
 	}
@@ -142,7 +147,7 @@ identify_ramdisk_image(int fd, int start_block)
 	printk(KERN_NOTICE
 	       "RAMDISK: Couldn't find valid RAM disk image starting at %d.\n",
 	       start_block);
-	
+
 done:
 	sys_lseek(fd, start_block * BLOCK_SIZE, 0);
 	kfree(buf);
@@ -157,6 +162,7 @@ int __init rd_load_image(char *from)
 	int nblocks, i, disk;
 	char *buf = NULL;
 	unsigned short rotate = 0;
+	decompress_fn decompressor = NULL;
 #if !defined(CONFIG_S390) && !defined(CONFIG_PPC_ISERIES)
 	char rotator[4] = { '|' , '/' , '-' , '\\' };
 #endif
@@ -169,12 +175,12 @@ int __init rd_load_image(char *from)
 	if (in_fd < 0)
 		goto noclose_input;
 
-	nblocks = identify_ramdisk_image(in_fd, rd_image_start);
+	nblocks = identify_ramdisk_image(in_fd, rd_image_start, &decompressor);
 	if (nblocks < 0)
 		goto done;
 
 	if (nblocks == 0) {
-		if (crd_load(in_fd, out_fd) == 0)
+		if (crd_load(in_fd, out_fd, decompressor) == 0)
 			goto successful_load;
 		goto done;
 	}
@@ -200,7 +206,7 @@ int __init rd_load_image(char *from)
 		       nblocks, rd_blocks);
 		goto done;
 	}
-		
+
 	/*
 	 * OK, time to copy in the data
 	 */
@@ -273,138 +279,48 @@ int __init rd_load_disk(int n)
 	return rd_load_image("/dev/root");
 }
 
-/*
- * gzip declarations
- */
-
-#define OF(args)  args
-
-#ifndef memzero
-#define memzero(s, n)     memset ((s), 0, (n))
-#endif
-
-typedef unsigned char  uch;
-typedef unsigned short ush;
-typedef unsigned long  ulg;
-
-#define INBUFSIZ 4096
-#define WSIZE 0x8000    /* window size--must be a power of two, and */
-			/*  at least 32K for zip's deflate method */
-
-static uch *inbuf;
-static uch *window;
-
-static unsigned insize;  /* valid bytes in inbuf */
-static unsigned inptr;   /* index of next byte to be processed in inbuf */
-static unsigned outcnt;  /* bytes in output buffer */
 static int exit_code;
-static int unzip_error;
-static long bytes_out;
+static int decompress_error;
 static int crd_infd, crd_outfd;
 
-#define get_byte()  (inptr < insize ? inbuf[inptr++] : fill_inbuf())
-		
-/* Diagnostic functions (stubbed out) */
-#define Assert(cond,msg)
-#define Trace(x)
-#define Tracev(x)
-#define Tracevv(x)
-#define Tracec(c,x)
-#define Tracecv(c,x)
-
-#define STATIC static
-#define INIT __init
-
-static int  __init fill_inbuf(void);
-static void __init flush_window(void);
-static void __init error(char *m);
-
-#define NO_INFLATE_MALLOC
-
-#include "../lib/inflate.c"
-
-/* ===========================================================================
- * Fill the input buffer. This is called only when the buffer is empty
- * and at least one byte is really needed.
- * Returning -1 does not guarantee that gunzip() will ever return.
- */
-static int __init fill_inbuf(void)
+static int __init compr_fill(void *buf, unsigned int len)
 {
-	if (exit_code) return -1;
-	
-	insize = sys_read(crd_infd, inbuf, INBUFSIZ);
-	if (insize == 0) {
-		error("RAMDISK: ran out of compressed data");
-		return -1;
-	}
-
-	inptr = 1;
-
-	return inbuf[0];
+	int r = sys_read(crd_infd, buf, len);
+	if (r < 0)
+		printk(KERN_ERR "RAMDISK: error while reading compressed data");
+	else if (r == 0)
+		printk(KERN_ERR "RAMDISK: EOF while reading compressed data");
+	return r;
 }
 
-/* ===========================================================================
- * Write the output window window[0..outcnt-1] and update crc and bytes_out.
- * (Used for the decompressed data only.)
- */
-static void __init flush_window(void)
+static int __init compr_flush(void *window, unsigned int outcnt)
 {
-    ulg c = crc;         /* temporary variable */
-    unsigned n, written;
-    uch *in, ch;
-    
-    written = sys_write(crd_outfd, window, outcnt);
-    if (written != outcnt && unzip_error == 0) {
-	printk(KERN_ERR "RAMDISK: incomplete write (%d != %d) %ld\n",
-	       written, outcnt, bytes_out);
-	unzip_error = 1;
-    }
-    in = window;
-    for (n = 0; n < outcnt; n++) {
-	    ch = *in++;
-	    c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-    }
-    crc = c;
-    bytes_out += (ulg)outcnt;
-    outcnt = 0;
+	int written = sys_write(crd_outfd, window, outcnt);
+	if (written != outcnt) {
+		if (decompress_error == 0)
+			printk(KERN_ERR
+			       "RAMDISK: incomplete write (%d != %d)\n",
+			       written, outcnt);
+		decompress_error = 1;
+		return -1;
+	}
+	return outcnt;
 }
 
 static void __init error(char *x)
 {
 	printk(KERN_ERR "%s\n", x);
 	exit_code = 1;
-	unzip_error = 1;
+	decompress_error = 1;
 }
 
-static int __init crd_load(int in_fd, int out_fd)
+static int __init crd_load(int in_fd, int out_fd, decompress_fn deco)
 {
 	int result;
-
-	insize = 0;		/* valid bytes in inbuf */
-	inptr = 0;		/* index of next byte to be processed in inbuf */
-	outcnt = 0;		/* bytes in output buffer */
-	exit_code = 0;
-	bytes_out = 0;
-	crc = (ulg)0xffffffffL; /* shift register contents */
-
 	crd_infd = in_fd;
 	crd_outfd = out_fd;
-	inbuf = kmalloc(INBUFSIZ, GFP_KERNEL);
-	if (!inbuf) {
-		printk(KERN_ERR "RAMDISK: Couldn't allocate gzip buffer\n");
-		return -1;
-	}
-	window = kmalloc(WSIZE, GFP_KERNEL);
-	if (!window) {
-		printk(KERN_ERR "RAMDISK: Couldn't allocate gzip window\n");
-		kfree(inbuf);
-		return -1;
-	}
-	makecrc();
-	result = gunzip();
-	if (unzip_error)
+	result = deco(NULL, 0, compr_fill, compr_flush, NULL, NULL, error);
+	if (decompress_error)
 		result = 1;
-	kfree(inbuf);
-	kfree(window);
 	return result;
 }
diff --git a/init/initramfs.c b/init/initramfs.c
index d9c941c0c3ca..7dcde7ea6603 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -390,11 +390,13 @@ static int __init write_buffer(char *buf, unsigned len)
 	return len - count;
 }
 
-static void __init flush_buffer(char *buf, unsigned len)
+static int __init flush_buffer(void *bufv, unsigned len)
 {
+	char *buf = (char *) bufv;
 	int written;
+	int origLen = len;
 	if (message)
-		return;
+		return -1;
 	while ((written = write_buffer(buf, len)) < len && !message) {
 		char c = buf[written];
 		if (c == '0') {
@@ -408,84 +410,28 @@ static void __init flush_buffer(char *buf, unsigned len)
 		} else
 			error("junk in compressed archive");
 	}
+	return origLen;
 }
 
-/*
- * gzip declarations
- */
+static unsigned my_inptr;   /* index of next byte to be processed in inbuf */
 
-#define OF(args)  args
-
-#ifndef memzero
-#define memzero(s, n)     memset ((s), 0, (n))
-#endif
-
-typedef unsigned char  uch;
-typedef unsigned short ush;
-typedef unsigned long  ulg;
-
-#define WSIZE 0x8000    /* window size--must be a power of two, and */
-			/*  at least 32K for zip's deflate method */
-
-static uch *inbuf;
-static uch *window;
-
-static unsigned insize;  /* valid bytes in inbuf */
-static unsigned inptr;   /* index of next byte to be processed in inbuf */
-static unsigned outcnt;  /* bytes in output buffer */
-static long bytes_out;
-
-#define get_byte()  (inptr < insize ? inbuf[inptr++] : -1)
-		
-/* Diagnostic functions (stubbed out) */
-#define Assert(cond,msg)
-#define Trace(x)
-#define Tracev(x)
-#define Tracevv(x)
-#define Tracec(c,x)
-#define Tracecv(c,x)
-
-#define STATIC static
-#define INIT __init
-
-static void __init flush_window(void);
-static void __init error(char *m);
-
-#define NO_INFLATE_MALLOC
-
-#include "../lib/inflate.c"
-
-/* ===========================================================================
- * Write the output window window[0..outcnt-1] and update crc and bytes_out.
- * (Used for the decompressed data only.)
- */
-static void __init flush_window(void)
-{
-	ulg c = crc;         /* temporary variable */
-	unsigned n;
-	uch *in, ch;
-
-	flush_buffer(window, outcnt);
-	in = window;
-	for (n = 0; n < outcnt; n++) {
-		ch = *in++;
-		c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-	}
-	crc = c;
-	bytes_out += (ulg)outcnt;
-	outcnt = 0;
-}
+#include <linux/decompress/generic.h>
 
 static char * __init unpack_to_rootfs(char *buf, unsigned len, int check_only)
 {
 	int written;
+	decompress_fn decompress;
+	const char *compress_name;
+	static __initdata char msg_buf[64];
+
 	dry_run = check_only;
 	header_buf = kmalloc(110, GFP_KERNEL);
 	symlink_buf = kmalloc(PATH_MAX + N_ALIGN(PATH_MAX) + 1, GFP_KERNEL);
 	name_buf = kmalloc(N_ALIGN(PATH_MAX), GFP_KERNEL);
-	window = kmalloc(WSIZE, GFP_KERNEL);
-	if (!window || !header_buf || !symlink_buf || !name_buf)
+
+	if (!header_buf || !symlink_buf || !name_buf)
 		panic("can't allocate buffers");
+
 	state = Start;
 	this_header = 0;
 	message = NULL;
@@ -505,22 +451,25 @@ static char * __init unpack_to_rootfs(char *buf, unsigned len, int check_only)
 			continue;
 		}
 		this_header = 0;
-		insize = len;
-		inbuf = buf;
-		inptr = 0;
-		outcnt = 0;		/* bytes in output buffer */
-		bytes_out = 0;
-		crc = (ulg)0xffffffffL; /* shift register contents */
-		makecrc();
-		gunzip();
+		decompress = decompress_method(buf, len, &compress_name);
+		if (decompress)
+			decompress(buf, len, NULL, flush_buffer, NULL,
+				   &my_inptr, error);
+		else if (compress_name) {
+			if (!message) {
+				snprintf(msg_buf, sizeof msg_buf,
+					 "compression method %s not configured",
+					 compress_name);
+				message = msg_buf;
+			}
+		}
 		if (state != Reset)
-			error("junk in gzipped archive");
-		this_header = saved_offset + inptr;
-		buf += inptr;
-		len -= inptr;
+			error("junk in compressed archive");
+		this_header = saved_offset + my_inptr;
+		buf += my_inptr;
+		len -= my_inptr;
 	}
 	dir_utime();
-	kfree(window);
 	kfree(name_buf);
 	kfree(symlink_buf);
 	kfree(header_buf);
@@ -579,7 +528,7 @@ static int __init populate_rootfs(void)
 	char *err = unpack_to_rootfs(__initramfs_start,
 			 __initramfs_end - __initramfs_start, 0);
 	if (err)
-		panic(err);
+		panic(err);	/* Failed to decompress INTERNAL initramfs */
 	if (initrd_start) {
 #ifdef CONFIG_BLK_DEV_RAM
 		int fd;
@@ -605,9 +554,12 @@ static int __init populate_rootfs(void)
 		printk(KERN_INFO "Unpacking initramfs...");
 		err = unpack_to_rootfs((char *)initrd_start,
 			initrd_end - initrd_start, 0);
-		if (err)
-			panic(err);
-		printk(" done\n");
+		if (err) {
+			printk(" failed!\n");
+			printk(KERN_EMERG "%s\n", err);
+		} else {
+			printk(" done\n");
+		}
 		free_initrd();
 #endif
 	}
diff --git a/init/main.c b/init/main.c
index 83697e160b3a..6bf83afd654d 100644
--- a/init/main.c
+++ b/init/main.c
@@ -14,6 +14,7 @@
 #include <linux/proc_fs.h>
 #include <linux/kernel.h>
 #include <linux/syscalls.h>
+#include <linux/stackprotector.h>
 #include <linux/string.h>
 #include <linux/ctype.h>
 #include <linux/delay.h>
@@ -135,14 +136,14 @@ unsigned int __initdata setup_max_cpus = NR_CPUS;
  * greater than 0, limits the maximum number of CPUs activated in
  * SMP mode to <NUM>.
  */
-#ifndef CONFIG_X86_IO_APIC
-static inline void disable_ioapic_setup(void) {};
-#endif
+
+void __weak arch_disable_smp_support(void) { }
 
 static int __init nosmp(char *str)
 {
 	setup_max_cpus = 0;
-	disable_ioapic_setup();
+	arch_disable_smp_support();
+
 	return 0;
 }
 
@@ -152,14 +153,14 @@ static int __init maxcpus(char *str)
 {
 	get_option(&str, &setup_max_cpus);
 	if (setup_max_cpus == 0)
-		disable_ioapic_setup();
+		arch_disable_smp_support();
 
 	return 0;
 }
 
 early_param("maxcpus", maxcpus);
 #else
-#define setup_max_cpus NR_CPUS
+const unsigned int setup_max_cpus = NR_CPUS;
 #endif
 
 /*
@@ -540,6 +541,12 @@ asmlinkage void __init start_kernel(void)
 	 */
 	lockdep_init();
 	debug_objects_early_init();
+
+	/*
+	 * Set up the the initial canary ASAP:
+	 */
+	boot_init_stack_canary();
+
 	cgroup_init_early();
 
 	local_irq_disable();
diff --git a/kernel/exit.c b/kernel/exit.c
index efd30ccf3858..167e1e3ad7c6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -980,12 +980,9 @@ static void check_stack_usage(void)
 {
 	static DEFINE_SPINLOCK(low_water_lock);
 	static int lowest_to_date = THREAD_SIZE;
-	unsigned long *n = end_of_stack(current);
 	unsigned long free;
 
-	while (*n == 0)
-		n++;
-	free = (unsigned long)n - (unsigned long)end_of_stack(current);
+	free = stack_not_used(current);
 
 	if (free >= lowest_to_date)
 		return;
diff --git a/kernel/fork.c b/kernel/fork.c
index 4854c2c4a82e..6715ebc3761d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -61,6 +61,7 @@
 #include <linux/proc_fs.h>
 #include <linux/blkdev.h>
 #include <trace/sched.h>
+#include <linux/magic.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -212,6 +213,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 {
 	struct task_struct *tsk;
 	struct thread_info *ti;
+	unsigned long *stackend;
+
 	int err;
 
 	prepare_to_copy(orig);
@@ -237,6 +240,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 		goto out;
 
 	setup_thread_stack(tsk, orig);
+	stackend = end_of_stack(tsk);
+	*stackend = STACK_END_MAGIC;	/* for overflow detection */
 
 #ifdef CONFIG_CC_STACKPROTECTOR
 	tsk->stack_canary = get_random_int();
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 03d0bed2b8d9..c687ba4363f2 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -46,7 +46,10 @@ void dynamic_irq_init(unsigned int irq)
 	desc->irq_count = 0;
 	desc->irqs_unhandled = 0;
 #ifdef CONFIG_SMP
-	cpumask_setall(&desc->affinity);
+	cpumask_setall(desc->affinity);
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+	cpumask_clear(desc->pending_mask);
+#endif
 #endif
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index f6cdda68e5c6..9ebf77968871 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -17,6 +17,7 @@
 #include <linux/kernel_stat.h>
 #include <linux/rculist.h>
 #include <linux/hash.h>
+#include <linux/bootmem.h>
 
 #include "internals.h"
 
@@ -69,6 +70,7 @@ int nr_irqs = NR_IRQS;
 EXPORT_SYMBOL_GPL(nr_irqs);
 
 #ifdef CONFIG_SPARSE_IRQ
+
 static struct irq_desc irq_desc_init = {
 	.irq	    = -1,
 	.status	    = IRQ_DISABLED,
@@ -76,9 +78,6 @@ static struct irq_desc irq_desc_init = {
 	.handle_irq = handle_bad_irq,
 	.depth      = 1,
 	.lock       = __SPIN_LOCK_UNLOCKED(irq_desc_init.lock),
-#ifdef CONFIG_SMP
-	.affinity   = CPU_MASK_ALL
-#endif
 };
 
 void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr)
@@ -115,6 +114,10 @@ static void init_one_irq_desc(int irq, struct irq_desc *desc, int cpu)
 		printk(KERN_ERR "can not alloc kstat_irqs\n");
 		BUG_ON(1);
 	}
+	if (!init_alloc_desc_masks(desc, cpu, false)) {
+		printk(KERN_ERR "can not alloc irq_desc cpumasks\n");
+		BUG_ON(1);
+	}
 	arch_init_chip_data(desc, cpu);
 }
 
@@ -123,7 +126,7 @@ static void init_one_irq_desc(int irq, struct irq_desc *desc, int cpu)
  */
 DEFINE_SPINLOCK(sparse_irq_lock);
 
-struct irq_desc *irq_desc_ptrs[NR_IRQS] __read_mostly;
+struct irq_desc **irq_desc_ptrs __read_mostly;
 
 static struct irq_desc irq_desc_legacy[NR_IRQS_LEGACY] __cacheline_aligned_in_smp = {
 	[0 ... NR_IRQS_LEGACY-1] = {
@@ -133,14 +136,10 @@ static struct irq_desc irq_desc_legacy[NR_IRQS_LEGACY] __cacheline_aligned_in_sm
 		.handle_irq = handle_bad_irq,
 		.depth	    = 1,
 		.lock	    = __SPIN_LOCK_UNLOCKED(irq_desc_init.lock),
-#ifdef CONFIG_SMP
-		.affinity   = CPU_MASK_ALL
-#endif
 	}
 };
 
-/* FIXME: use bootmem alloc ...*/
-static unsigned int kstat_irqs_legacy[NR_IRQS_LEGACY][NR_CPUS];
+static unsigned int *kstat_irqs_legacy;
 
 int __init early_irq_init(void)
 {
@@ -150,18 +149,30 @@ int __init early_irq_init(void)
 
 	init_irq_default_affinity();
 
+	 /* initialize nr_irqs based on nr_cpu_ids */
+	arch_probe_nr_irqs();
+	printk(KERN_INFO "NR_IRQS:%d nr_irqs:%d\n", NR_IRQS, nr_irqs);
+
 	desc = irq_desc_legacy;
 	legacy_count = ARRAY_SIZE(irq_desc_legacy);
 
+	/* allocate irq_desc_ptrs array based on nr_irqs */
+	irq_desc_ptrs = alloc_bootmem(nr_irqs * sizeof(void *));
+
+	/* allocate based on nr_cpu_ids */
+	/* FIXME: invert kstat_irgs, and it'd be a per_cpu_alloc'd thing */
+	kstat_irqs_legacy = alloc_bootmem(NR_IRQS_LEGACY * nr_cpu_ids *
+					  sizeof(int));
+
 	for (i = 0; i < legacy_count; i++) {
 		desc[i].irq = i;
-		desc[i].kstat_irqs = kstat_irqs_legacy[i];
+		desc[i].kstat_irqs = kstat_irqs_legacy + i * nr_cpu_ids;
 		lockdep_set_class(&desc[i].lock, &irq_desc_lock_class);
-
+		init_alloc_desc_masks(&desc[i], 0, true);
 		irq_desc_ptrs[i] = desc + i;
 	}
 
-	for (i = legacy_count; i < NR_IRQS; i++)
+	for (i = legacy_count; i < nr_irqs; i++)
 		irq_desc_ptrs[i] = NULL;
 
 	return arch_early_irq_init();
@@ -169,7 +180,10 @@ int __init early_irq_init(void)
 
 struct irq_desc *irq_to_desc(unsigned int irq)
 {
-	return (irq < NR_IRQS) ? irq_desc_ptrs[irq] : NULL;
+	if (irq_desc_ptrs && irq < nr_irqs)
+		return irq_desc_ptrs[irq];
+
+	return NULL;
 }
 
 struct irq_desc *irq_to_desc_alloc_cpu(unsigned int irq, int cpu)
@@ -178,10 +192,9 @@ struct irq_desc *irq_to_desc_alloc_cpu(unsigned int irq, int cpu)
 	unsigned long flags;
 	int node;
 
-	if (irq >= NR_IRQS) {
-		printk(KERN_WARNING "irq >= NR_IRQS in irq_to_desc_alloc: %d %d\n",
-				irq, NR_IRQS);
-		WARN_ON(1);
+	if (irq >= nr_irqs) {
+		WARN(1, "irq (%d) >= nr_irqs (%d) in irq_to_desc_alloc\n",
+			irq, nr_irqs);
 		return NULL;
 	}
 
@@ -223,9 +236,6 @@ struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {
 		.handle_irq = handle_bad_irq,
 		.depth = 1,
 		.lock = __SPIN_LOCK_UNLOCKED(irq_desc->lock),
-#ifdef CONFIG_SMP
-		.affinity = CPU_MASK_ALL
-#endif
 	}
 };
 
@@ -238,14 +248,16 @@ int __init early_irq_init(void)
 
 	init_irq_default_affinity();
 
+	printk(KERN_INFO "NR_IRQS:%d\n", NR_IRQS);
+
 	desc = irq_desc;
 	count = ARRAY_SIZE(irq_desc);
 
 	for (i = 0; i < count; i++) {
 		desc[i].irq = i;
+		init_alloc_desc_masks(&desc[i], 0, true);
 		desc[i].kstat_irqs = kstat_irqs_all[i];
 	}
-
 	return arch_early_irq_init();
 }
 
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index b60950bf5a16..ee1aa9f8e8b9 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -17,7 +17,14 @@ extern struct lock_class_key irq_desc_lock_class;
 extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
 extern void clear_kstat_irqs(struct irq_desc *desc);
 extern spinlock_t sparse_irq_lock;
+
+#ifdef CONFIG_SPARSE_IRQ
+/* irq_desc_ptrs allocated at boot time */
+extern struct irq_desc **irq_desc_ptrs;
+#else
+/* irq_desc_ptrs is a fixed size array */
 extern struct irq_desc *irq_desc_ptrs[NR_IRQS];
+#endif
 
 #ifdef CONFIG_PROC_FS
 extern void register_irq_proc(unsigned int irq, struct irq_desc *desc);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index ea119effe096..6458e99984c0 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -90,14 +90,14 @@ int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
 
 #ifdef CONFIG_GENERIC_PENDING_IRQ
 	if (desc->status & IRQ_MOVE_PCNTXT || desc->status & IRQ_DISABLED) {
-		cpumask_copy(&desc->affinity, cpumask);
+		cpumask_copy(desc->affinity, cpumask);
 		desc->chip->set_affinity(irq, cpumask);
 	} else {
 		desc->status |= IRQ_MOVE_PENDING;
-		cpumask_copy(&desc->pending_mask, cpumask);
+		cpumask_copy(desc->pending_mask, cpumask);
 	}
 #else
-	cpumask_copy(&desc->affinity, cpumask);
+	cpumask_copy(desc->affinity, cpumask);
 	desc->chip->set_affinity(irq, cpumask);
 #endif
 	desc->status |= IRQ_AFFINITY_SET;
@@ -119,16 +119,16 @@ static int setup_affinity(unsigned int irq, struct irq_desc *desc)
 	 * one of the targets is online.
 	 */
 	if (desc->status & (IRQ_AFFINITY_SET | IRQ_NO_BALANCING)) {
-		if (cpumask_any_and(&desc->affinity, cpu_online_mask)
+		if (cpumask_any_and(desc->affinity, cpu_online_mask)
 		    < nr_cpu_ids)
 			goto set_affinity;
 		else
 			desc->status &= ~IRQ_AFFINITY_SET;
 	}
 
-	cpumask_and(&desc->affinity, cpu_online_mask, irq_default_affinity);
+	cpumask_and(desc->affinity, cpu_online_mask, irq_default_affinity);
 set_affinity:
-	desc->chip->set_affinity(irq, &desc->affinity);
+	desc->chip->set_affinity(irq, desc->affinity);
 
 	return 0;
 }
diff --git a/kernel/irq/migration.c b/kernel/irq/migration.c
index bd72329e630c..e05ad9be43b7 100644
--- a/kernel/irq/migration.c
+++ b/kernel/irq/migration.c
@@ -18,7 +18,7 @@ void move_masked_irq(int irq)
 
 	desc->status &= ~IRQ_MOVE_PENDING;
 
-	if (unlikely(cpumask_empty(&desc->pending_mask)))
+	if (unlikely(cpumask_empty(desc->pending_mask)))
 		return;
 
 	if (!desc->chip->set_affinity)
@@ -38,13 +38,13 @@ void move_masked_irq(int irq)
 	 * For correct operation this depends on the caller
 	 * masking the irqs.
 	 */
-	if (likely(cpumask_any_and(&desc->pending_mask, cpu_online_mask)
+	if (likely(cpumask_any_and(desc->pending_mask, cpu_online_mask)
 		   < nr_cpu_ids)) {
-		cpumask_and(&desc->affinity,
-			    &desc->pending_mask, cpu_online_mask);
-		desc->chip->set_affinity(irq, &desc->affinity);
+		cpumask_and(desc->affinity,
+			    desc->pending_mask, cpu_online_mask);
+		desc->chip->set_affinity(irq, desc->affinity);
 	}
-	cpumask_clear(&desc->pending_mask);
+	cpumask_clear(desc->pending_mask);
 }
 
 void move_native_irq(int irq)
diff --git a/kernel/irq/numa_migrate.c b/kernel/irq/numa_migrate.c
index aef18ab6b75b..243d6121e50e 100644
--- a/kernel/irq/numa_migrate.c
+++ b/kernel/irq/numa_migrate.c
@@ -33,15 +33,22 @@ static void free_kstat_irqs(struct irq_desc *old_desc, struct irq_desc *desc)
 	old_desc->kstat_irqs = NULL;
 }
 
-static void init_copy_one_irq_desc(int irq, struct irq_desc *old_desc,
+static bool init_copy_one_irq_desc(int irq, struct irq_desc *old_desc,
 		 struct irq_desc *desc, int cpu)
 {
 	memcpy(desc, old_desc, sizeof(struct irq_desc));
+	if (!init_alloc_desc_masks(desc, cpu, false)) {
+		printk(KERN_ERR "irq %d: can not get new irq_desc cpumask "
+				"for migration.\n", irq);
+		return false;
+	}
 	spin_lock_init(&desc->lock);
 	desc->cpu = cpu;
 	lockdep_set_class(&desc->lock, &irq_desc_lock_class);
 	init_copy_kstat_irqs(old_desc, desc, cpu, nr_cpu_ids);
+	init_copy_desc_masks(old_desc, desc);
 	arch_init_copy_chip_data(old_desc, desc, cpu);
+	return true;
 }
 
 static void free_one_irq_desc(struct irq_desc *old_desc, struct irq_desc *desc)
@@ -71,12 +78,18 @@ static struct irq_desc *__real_move_irq_desc(struct irq_desc *old_desc,
 	node = cpu_to_node(cpu);
 	desc = kzalloc_node(sizeof(*desc), GFP_ATOMIC, node);
 	if (!desc) {
-		printk(KERN_ERR "irq %d: can not get new irq_desc for migration.\n", irq);
+		printk(KERN_ERR "irq %d: can not get new irq_desc "
+				"for migration.\n", irq);
+		/* still use old one */
+		desc = old_desc;
+		goto out_unlock;
+	}
+	if (!init_copy_one_irq_desc(irq, old_desc, desc, cpu)) {
 		/* still use old one */
+		kfree(desc);
 		desc = old_desc;
 		goto out_unlock;
 	}
-	init_copy_one_irq_desc(irq, old_desc, desc, cpu);
 
 	irq_desc_ptrs[irq] = desc;
 	spin_unlock_irqrestore(&sparse_irq_lock, flags);
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index aae3f742bcec..692363dd591f 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -20,11 +20,11 @@ static struct proc_dir_entry *root_irq_dir;
 static int irq_affinity_proc_show(struct seq_file *m, void *v)
 {
 	struct irq_desc *desc = irq_to_desc((long)m->private);
-	const struct cpumask *mask = &desc->affinity;
+	const struct cpumask *mask = desc->affinity;
 
 #ifdef CONFIG_GENERIC_PENDING_IRQ
 	if (desc->status & IRQ_MOVE_PENDING)
-		mask = &desc->pending_mask;
+		mask = desc->pending_mask;
 #endif
 	seq_cpumask(m, mask);
 	seq_putc(m, '\n');
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 483899578259..c7fd6692939d 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1130,7 +1130,7 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
 		return;
 	memset(&prstatus, 0, sizeof(prstatus));
 	prstatus.pr_pid = current->pid;
-	elf_core_copy_regs(&prstatus.pr_reg, regs);
+	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
 	buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS,
 		      	      &prstatus, sizeof(prstatus));
 	final_note(buf);
diff --git a/kernel/module.c b/kernel/module.c
index 77672233387f..f77ac320d0b5 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -51,6 +51,7 @@
 #include <linux/tracepoint.h>
 #include <linux/ftrace.h>
 #include <linux/async.h>
+#include <linux/percpu.h>
 
 #if 0
 #define DEBUGP printk
@@ -366,6 +367,34 @@ static struct module *find_module(const char *name)
 }
 
 #ifdef CONFIG_SMP
+
+#ifdef CONFIG_HAVE_DYNAMIC_PER_CPU_AREA
+
+static void *percpu_modalloc(unsigned long size, unsigned long align,
+			     const char *name)
+{
+	void *ptr;
+
+	if (align > PAGE_SIZE) {
+		printk(KERN_WARNING "%s: per-cpu alignment %li > %li\n",
+		       name, align, PAGE_SIZE);
+		align = PAGE_SIZE;
+	}
+
+	ptr = __alloc_reserved_percpu(size, align);
+	if (!ptr)
+		printk(KERN_WARNING
+		       "Could not allocate %lu bytes percpu data\n", size);
+	return ptr;
+}
+
+static void percpu_modfree(void *freeme)
+{
+	free_percpu(freeme);
+}
+
+#else /* ... !CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
+
 /* Number of blocks used and allocated. */
 static unsigned int pcpu_num_used, pcpu_num_allocated;
 /* Size of each block.  -ve means used. */
@@ -480,21 +509,6 @@ static void percpu_modfree(void *freeme)
 	}
 }
 
-static unsigned int find_pcpusec(Elf_Ehdr *hdr,
-				 Elf_Shdr *sechdrs,
-				 const char *secstrings)
-{
-	return find_sec(hdr, sechdrs, secstrings, ".data.percpu");
-}
-
-static void percpu_modcopy(void *pcpudest, const void *from, unsigned long size)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu)
-		memcpy(pcpudest + per_cpu_offset(cpu), from, size);
-}
-
 static int percpu_modinit(void)
 {
 	pcpu_num_used = 2;
@@ -513,7 +527,26 @@ static int percpu_modinit(void)
 	return 0;
 }
 __initcall(percpu_modinit);
+
+#endif /* CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
+
+static unsigned int find_pcpusec(Elf_Ehdr *hdr,
+				 Elf_Shdr *sechdrs,
+				 const char *secstrings)
+{
+	return find_sec(hdr, sechdrs, secstrings, ".data.percpu");
+}
+
+static void percpu_modcopy(void *pcpudest, const void *from, unsigned long size)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		memcpy(pcpudest + per_cpu_offset(cpu), from, size);
+}
+
 #else /* ... !CONFIG_SMP */
+
 static inline void *percpu_modalloc(unsigned long size, unsigned long align,
 				    const char *name)
 {
@@ -535,6 +568,7 @@ static inline void percpu_modcopy(void *pcpudst, const void *src,
 	/* pcpusec should be 0, and size of that section should be 0. */
 	BUG_ON(size != 0);
 }
+
 #endif /* CONFIG_SMP */
 
 #define MODINFO_ATTR(field)	\
diff --git a/kernel/panic.c b/kernel/panic.c
index 2a2ff36ff44d..32fe4eff1b89 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -74,6 +74,9 @@ NORET_TYPE void panic(const char * fmt, ...)
 	vsnprintf(buf, sizeof(buf), fmt, args);
 	va_end(args);
 	printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
+#ifdef CONFIG_DEBUG_BUGVERBOSE
+	dump_stack();
+#endif
 	bust_spinlocks(0);
 
 	/*
@@ -355,15 +358,18 @@ EXPORT_SYMBOL(warn_slowpath);
 #endif
 
 #ifdef CONFIG_CC_STACKPROTECTOR
+
 /*
  * Called when gcc's -fstack-protector feature is used, and
  * gcc detects corruption of the on-stack canary value
  */
 void __stack_chk_fail(void)
 {
-	panic("stack-protector: Kernel stack is corrupted");
+	panic("stack-protector: Kernel stack is corrupted in: %p\n",
+		__builtin_return_address(0));
 }
 EXPORT_SYMBOL(__stack_chk_fail);
+
 #endif
 
 core_param(panic, panic_timeout, int, 0644);
diff --git a/kernel/sched.c b/kernel/sched.c
index 9f8506d68fdc..f4c413bdd38d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6342,12 +6342,7 @@ void sched_show_task(struct task_struct *p)
 		printk(KERN_CONT " %016lx ", thread_saved_pc(p));
 #endif
 #ifdef CONFIG_DEBUG_STACK_USAGE
-	{
-		unsigned long *n = end_of_stack(p);
-		while (!*n)
-			n++;
-		free = (unsigned long)n - (unsigned long)end_of_stack(p);
-	}
+	free = stack_not_used(p);
 #endif
 	printk(KERN_CONT "%5lu %5d %6d\n", free,
 		task_pid_nr(p), task_pid_nr(p->real_parent));
@@ -9892,7 +9887,7 @@ cpuacct_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
 
 static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu)
 {
-	u64 *cpuusage = percpu_ptr(ca->cpuusage, cpu);
+	u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
 	u64 data;
 
 #ifndef CONFIG_64BIT
@@ -9911,7 +9906,7 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu)
 
 static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val)
 {
-	u64 *cpuusage = percpu_ptr(ca->cpuusage, cpu);
+	u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
 
 #ifndef CONFIG_64BIT
 	/*
@@ -10007,7 +10002,7 @@ static void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 	ca = task_ca(tsk);
 
 	for (; ca; ca = ca->parent) {
-		u64 *cpuusage = percpu_ptr(ca->cpuusage, cpu);
+		u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
 		*cpuusage += cputime;
 	}
 }
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index c79dc7844012..299d012b4394 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1122,12 +1122,13 @@ static struct task_struct *pick_next_highest_task_rt(struct rq *rq, int cpu)
 
 static DEFINE_PER_CPU(cpumask_var_t, local_cpu_mask);
 
-static inline int pick_optimal_cpu(int this_cpu, cpumask_t *mask)
+static inline int pick_optimal_cpu(int this_cpu,
+				   const struct cpumask *mask)
 {
 	int first;
 
 	/* "this_cpu" is cheaper to preempt than a remote processor */
-	if ((this_cpu != -1) && cpu_isset(this_cpu, *mask))
+	if ((this_cpu != -1) && cpumask_test_cpu(this_cpu, mask))
 		return this_cpu;
 
 	first = cpumask_first(mask);
@@ -1143,6 +1144,7 @@ static int find_lowest_rq(struct task_struct *task)
 	struct cpumask *lowest_mask = __get_cpu_var(local_cpu_mask);
 	int this_cpu = smp_processor_id();
 	int cpu      = task_cpu(task);
+	cpumask_var_t domain_mask;
 
 	if (task->rt.nr_cpus_allowed == 1)
 		return -1; /* No other targets possible */
@@ -1175,19 +1177,25 @@ static int find_lowest_rq(struct task_struct *task)
 	if (this_cpu == cpu)
 		this_cpu = -1; /* Skip this_cpu opt if the same */
 
-	for_each_domain(cpu, sd) {
-		if (sd->flags & SD_WAKE_AFFINE) {
-			cpumask_t domain_mask;
-			int       best_cpu;
+	if (alloc_cpumask_var(&domain_mask, GFP_ATOMIC)) {
+		for_each_domain(cpu, sd) {
+			if (sd->flags & SD_WAKE_AFFINE) {
+				int best_cpu;
 
-			cpumask_and(&domain_mask, sched_domain_span(sd),
-				    lowest_mask);
+				cpumask_and(domain_mask,
+					    sched_domain_span(sd),
+					    lowest_mask);
 
-			best_cpu = pick_optimal_cpu(this_cpu,
-						    &domain_mask);
-			if (best_cpu != -1)
-				return best_cpu;
+				best_cpu = pick_optimal_cpu(this_cpu,
+							    domain_mask);
+
+				if (best_cpu != -1) {
+					free_cpumask_var(domain_mask);
+					return best_cpu;
+				}
+			}
 		}
+		free_cpumask_var(domain_mask);
 	}
 
 	/*
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 9041ea7948fe..57d3f67f6f38 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -796,6 +796,11 @@ int __init __weak early_irq_init(void)
 	return 0;
 }
 
+int __init __weak arch_probe_nr_irqs(void)
+{
+	return 0;
+}
+
 int __init __weak arch_early_irq_init(void)
 {
 	return 0;
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 0cd415ee62a2..74541ca49536 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -170,7 +170,7 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
 	 * doesn't hit this CPU until we're ready. */
 	get_cpu();
 	for_each_online_cpu(i) {
-		sm_work = percpu_ptr(stop_machine_work, i);
+		sm_work = per_cpu_ptr(stop_machine_work, i);
 		INIT_WORK(sm_work, stop_cpu);
 		queue_work_on(i, stop_machine_wq, sm_work);
 	}
diff --git a/lib/Kconfig b/lib/Kconfig
index 54aaf4feaf6c..2a9c69f34482 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -98,6 +98,20 @@ config LZO_DECOMPRESS
 	tristate
 
 #
+# These all provide a common interface (hence the apparent duplication with
+# ZLIB_INFLATE; DECOMPRESS_GZIP is just a wrapper.)
+#
+config DECOMPRESS_GZIP
+	select ZLIB_INFLATE
+	tristate
+
+config DECOMPRESS_BZIP2
+	tristate
+
+config DECOMPRESS_LZMA
+	tristate
+
+#
 # Generic allocator support is selected if needed
 #
 config GENERIC_ALLOCATOR
diff --git a/lib/Makefile b/lib/Makefile
index 8bdc647e6d62..051a33a8e028 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -12,7 +12,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 idr.o int_sqrt.o extable.o prio_tree.o \
 	 sha1.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o prio_heap.o ratelimit.o show_mem.o \
-	 is_single_threaded.o plist.o
+	 is_single_threaded.o plist.o decompress.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
@@ -65,6 +65,10 @@ obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
 obj-$(CONFIG_LZO_COMPRESS) += lzo/
 obj-$(CONFIG_LZO_DECOMPRESS) += lzo/
 
+lib-$(CONFIG_DECOMPRESS_GZIP) += decompress_inflate.o
+lib-$(CONFIG_DECOMPRESS_BZIP2) += decompress_bunzip2.o
+lib-$(CONFIG_DECOMPRESS_LZMA) += decompress_unlzma.o
+
 obj-$(CONFIG_TEXTSEARCH) += textsearch.o
 obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
 obj-$(CONFIG_TEXTSEARCH_BM) += ts_bm.o
diff --git a/lib/decompress.c b/lib/decompress.c
new file mode 100644
index 000000000000..d2842f571674
--- /dev/null
+++ b/lib/decompress.c
@@ -0,0 +1,54 @@
+/*
+ * decompress.c
+ *
+ * Detect the decompression method based on magic number
+ */
+
+#include <linux/decompress/generic.h>
+
+#include <linux/decompress/bunzip2.h>
+#include <linux/decompress/unlzma.h>
+#include <linux/decompress/inflate.h>
+
+#include <linux/types.h>
+#include <linux/string.h>
+
+#ifndef CONFIG_DECOMPRESS_GZIP
+# define gunzip NULL
+#endif
+#ifndef CONFIG_DECOMPRESS_BZIP2
+# define bunzip2 NULL
+#endif
+#ifndef CONFIG_DECOMPRESS_LZMA
+# define unlzma NULL
+#endif
+
+static const struct compress_format {
+	unsigned char magic[2];
+	const char *name;
+	decompress_fn decompressor;
+} compressed_formats[] = {
+	{ {037, 0213}, "gzip", gunzip },
+	{ {037, 0236}, "gzip", gunzip },
+	{ {0x42, 0x5a}, "bzip2", bunzip2 },
+	{ {0x5d, 0x00}, "lzma", unlzma },
+	{ {0, 0}, NULL, NULL }
+};
+
+decompress_fn decompress_method(const unsigned char *inbuf, int len,
+				const char **name)
+{
+	const struct compress_format *cf;
+
+	if (len < 2)
+		return NULL;	/* Need at least this much... */
+
+	for (cf = compressed_formats; cf->name; cf++) {
+		if (!memcmp(inbuf, cf->magic, 2))
+			break;
+
+	}
+	if (name)
+		*name = cf->name;
+	return cf->decompressor;
+}
diff --git a/lib/decompress_bunzip2.c b/lib/decompress_bunzip2.c
new file mode 100644
index 000000000000..5d3ddb5fcfd9
--- /dev/null
+++ b/lib/decompress_bunzip2.c
@@ -0,0 +1,735 @@
+/* vi: set sw = 4 ts = 4: */
+/*	Small bzip2 deflate implementation, by Rob Landley (rob@landley.net).
+
+	Based on bzip2 decompression code by Julian R Seward (jseward@acm.org),
+	which also acknowledges contributions by Mike Burrows, David Wheeler,
+	Peter Fenwick, Alistair Moffat, Radford Neal, Ian H. Witten,
+	Robert Sedgewick, and Jon L. Bentley.
+
+	This code is licensed under the LGPLv2:
+		LGPL (http://www.gnu.org/copyleft/lgpl.html
+*/
+
+/*
+	Size and speed optimizations by Manuel Novoa III  (mjn3@codepoet.org).
+
+	More efficient reading of Huffman codes, a streamlined read_bunzip()
+	function, and various other tweaks.  In (limited) tests, approximately
+	20% faster than bzcat on x86 and about 10% faster on arm.
+
+	Note that about 2/3 of the time is spent in read_unzip() reversing
+	the Burrows-Wheeler transformation.  Much of that time is delay
+	resulting from cache misses.
+
+	I would ask that anyone benefiting from this work, especially those
+	using it in commercial products, consider making a donation to my local
+	non-profit hospice organization in the name of the woman I loved, who
+	passed away Feb. 12, 2003.
+
+		In memory of Toni W. Hagan
+
+		Hospice of Acadiana, Inc.
+		2600 Johnston St., Suite 200
+		Lafayette, LA 70503-3240
+
+		Phone (337) 232-1234 or 1-800-738-2226
+		Fax   (337) 232-1297
+
+		http://www.hospiceacadiana.com/
+
+	Manuel
+ */
+
+/*
+	Made it fit for running in Linux Kernel by Alain Knaff (alain@knaff.lu)
+*/
+
+
+#ifndef STATIC
+#include <linux/decompress/bunzip2.h>
+#endif /* !STATIC */
+
+#include <linux/decompress/mm.h>
+
+#ifndef INT_MAX
+#define INT_MAX 0x7fffffff
+#endif
+
+/* Constants for Huffman coding */
+#define MAX_GROUPS		6
+#define GROUP_SIZE   		50	/* 64 would have been more efficient */
+#define MAX_HUFCODE_BITS 	20	/* Longest Huffman code allowed */
+#define MAX_SYMBOLS 		258	/* 256 literals + RUNA + RUNB */
+#define SYMBOL_RUNA		0
+#define SYMBOL_RUNB		1
+
+/* Status return values */
+#define RETVAL_OK			0
+#define RETVAL_LAST_BLOCK		(-1)
+#define RETVAL_NOT_BZIP_DATA		(-2)
+#define RETVAL_UNEXPECTED_INPUT_EOF	(-3)
+#define RETVAL_UNEXPECTED_OUTPUT_EOF	(-4)
+#define RETVAL_DATA_ERROR		(-5)
+#define RETVAL_OUT_OF_MEMORY		(-6)
+#define RETVAL_OBSOLETE_INPUT		(-7)
+
+/* Other housekeeping constants */
+#define BZIP2_IOBUF_SIZE		4096
+
+/* This is what we know about each Huffman coding group */
+struct group_data {
+	/* We have an extra slot at the end of limit[] for a sentinal value. */
+	int limit[MAX_HUFCODE_BITS+1];
+	int base[MAX_HUFCODE_BITS];
+	int permute[MAX_SYMBOLS];
+	int minLen, maxLen;
+};
+
+/* Structure holding all the housekeeping data, including IO buffers and
+   memory that persists between calls to bunzip */
+struct bunzip_data {
+	/* State for interrupting output loop */
+	int writeCopies, writePos, writeRunCountdown, writeCount, writeCurrent;
+	/* I/O tracking data (file handles, buffers, positions, etc.) */
+	int (*fill)(void*, unsigned int);
+	int inbufCount, inbufPos /*, outbufPos*/;
+	unsigned char *inbuf /*,*outbuf*/;
+	unsigned int inbufBitCount, inbufBits;
+	/* The CRC values stored in the block header and calculated from the
+	data */
+	unsigned int crc32Table[256], headerCRC, totalCRC, writeCRC;
+	/* Intermediate buffer and its size (in bytes) */
+	unsigned int *dbuf, dbufSize;
+	/* These things are a bit too big to go on the stack */
+	unsigned char selectors[32768];		/* nSelectors = 15 bits */
+	struct group_data groups[MAX_GROUPS];	/* Huffman coding tables */
+	int io_error;			/* non-zero if we have IO error */
+};
+
+
+/* Return the next nnn bits of input.  All reads from the compressed input
+   are done through this function.  All reads are big endian */
+static unsigned int INIT get_bits(struct bunzip_data *bd, char bits_wanted)
+{
+	unsigned int bits = 0;
+
+	/* If we need to get more data from the byte buffer, do so.
+	   (Loop getting one byte at a time to enforce endianness and avoid
+	   unaligned access.) */
+	while (bd->inbufBitCount < bits_wanted) {
+		/* If we need to read more data from file into byte buffer, do
+		   so */
+		if (bd->inbufPos == bd->inbufCount) {
+			if (bd->io_error)
+				return 0;
+			bd->inbufCount = bd->fill(bd->inbuf, BZIP2_IOBUF_SIZE);
+			if (bd->inbufCount <= 0) {
+				bd->io_error = RETVAL_UNEXPECTED_INPUT_EOF;
+				return 0;
+			}
+			bd->inbufPos = 0;
+		}
+		/* Avoid 32-bit overflow (dump bit buffer to top of output) */
+		if (bd->inbufBitCount >= 24) {
+			bits = bd->inbufBits&((1 << bd->inbufBitCount)-1);
+			bits_wanted -= bd->inbufBitCount;
+			bits <<= bits_wanted;
+			bd->inbufBitCount = 0;
+		}
+		/* Grab next 8 bits of input from buffer. */
+		bd->inbufBits = (bd->inbufBits << 8)|bd->inbuf[bd->inbufPos++];
+		bd->inbufBitCount += 8;
+	}
+	/* Calculate result */
+	bd->inbufBitCount -= bits_wanted;
+	bits |= (bd->inbufBits >> bd->inbufBitCount)&((1 << bits_wanted)-1);
+
+	return bits;
+}
+
+/* Unpacks the next block and sets up for the inverse burrows-wheeler step. */
+
+static int INIT get_next_block(struct bunzip_data *bd)
+{
+	struct group_data *hufGroup = NULL;
+	int *base = NULL;
+	int *limit = NULL;
+	int dbufCount, nextSym, dbufSize, groupCount, selector,
+		i, j, k, t, runPos, symCount, symTotal, nSelectors,
+		byteCount[256];
+	unsigned char uc, symToByte[256], mtfSymbol[256], *selectors;
+	unsigned int *dbuf, origPtr;
+
+	dbuf = bd->dbuf;
+	dbufSize = bd->dbufSize;
+	selectors = bd->selectors;
+
+	/* Read in header signature and CRC, then validate signature.
+	   (last block signature means CRC is for whole file, return now) */
+	i = get_bits(bd, 24);
+	j = get_bits(bd, 24);
+	bd->headerCRC = get_bits(bd, 32);
+	if ((i == 0x177245) && (j == 0x385090))
+		return RETVAL_LAST_BLOCK;
+	if ((i != 0x314159) || (j != 0x265359))
+		return RETVAL_NOT_BZIP_DATA;
+	/* We can add support for blockRandomised if anybody complains.
+	   There was some code for this in busybox 1.0.0-pre3, but nobody ever
+	   noticed that it didn't actually work. */
+	if (get_bits(bd, 1))
+		return RETVAL_OBSOLETE_INPUT;
+	origPtr = get_bits(bd, 24);
+	if (origPtr > dbufSize)
+		return RETVAL_DATA_ERROR;
+	/* mapping table: if some byte values are never used (encoding things
+	   like ascii text), the compression code removes the gaps to have fewer
+	   symbols to deal with, and writes a sparse bitfield indicating which
+	   values were present.  We make a translation table to convert the
+	   symbols back to the corresponding bytes. */
+	t = get_bits(bd, 16);
+	symTotal = 0;
+	for (i = 0; i < 16; i++) {
+		if (t&(1 << (15-i))) {
+			k = get_bits(bd, 16);
+			for (j = 0; j < 16; j++)
+				if (k&(1 << (15-j)))
+					symToByte[symTotal++] = (16*i)+j;
+		}
+	}
+	/* How many different Huffman coding groups does this block use? */
+	groupCount = get_bits(bd, 3);
+	if (groupCount < 2 || groupCount > MAX_GROUPS)
+		return RETVAL_DATA_ERROR;
+	/* nSelectors: Every GROUP_SIZE many symbols we select a new
+	   Huffman coding group.  Read in the group selector list,
+	   which is stored as MTF encoded bit runs.  (MTF = Move To
+	   Front, as each value is used it's moved to the start of the
+	   list.) */
+	nSelectors = get_bits(bd, 15);
+	if (!nSelectors)
+		return RETVAL_DATA_ERROR;
+	for (i = 0; i < groupCount; i++)
+		mtfSymbol[i] = i;
+	for (i = 0; i < nSelectors; i++) {
+		/* Get next value */
+		for (j = 0; get_bits(bd, 1); j++)
+			if (j >= groupCount)
+				return RETVAL_DATA_ERROR;
+		/* Decode MTF to get the next selector */
+		uc = mtfSymbol[j];
+		for (; j; j--)
+			mtfSymbol[j] = mtfSymbol[j-1];
+		mtfSymbol[0] = selectors[i] = uc;
+	}
+	/* Read the Huffman coding tables for each group, which code
+	   for symTotal literal symbols, plus two run symbols (RUNA,
+	   RUNB) */
+	symCount = symTotal+2;
+	for (j = 0; j < groupCount; j++) {
+		unsigned char length[MAX_SYMBOLS], temp[MAX_HUFCODE_BITS+1];
+		int	minLen,	maxLen, pp;
+		/* Read Huffman code lengths for each symbol.  They're
+		   stored in a way similar to mtf; record a starting
+		   value for the first symbol, and an offset from the
+		   previous value for everys symbol after that.
+		   (Subtracting 1 before the loop and then adding it
+		   back at the end is an optimization that makes the
+		   test inside the loop simpler: symbol length 0
+		   becomes negative, so an unsigned inequality catches
+		   it.) */
+		t = get_bits(bd, 5)-1;
+		for (i = 0; i < symCount; i++) {
+			for (;;) {
+				if (((unsigned)t) > (MAX_HUFCODE_BITS-1))
+					return RETVAL_DATA_ERROR;
+
+				/* If first bit is 0, stop.  Else
+				   second bit indicates whether to
+				   increment or decrement the value.
+				   Optimization: grab 2 bits and unget
+				   the second if the first was 0. */
+
+				k = get_bits(bd, 2);
+				if (k < 2) {
+					bd->inbufBitCount++;
+					break;
+				}
+				/* Add one if second bit 1, else
+				 * subtract 1.  Avoids if/else */
+				t += (((k+1)&2)-1);
+			}
+			/* Correct for the initial -1, to get the
+			 * final symbol length */
+			length[i] = t+1;
+		}
+		/* Find largest and smallest lengths in this group */
+		minLen = maxLen = length[0];
+
+		for (i = 1; i < symCount; i++) {
+			if (length[i] > maxLen)
+				maxLen = length[i];
+			else if (length[i] < minLen)
+				minLen = length[i];
+		}
+
+		/* Calculate permute[], base[], and limit[] tables from
+		 * length[].
+		 *
+		 * permute[] is the lookup table for converting
+		 * Huffman coded symbols into decoded symbols.  base[]
+		 * is the amount to subtract from the value of a
+		 * Huffman symbol of a given length when using
+		 * permute[].
+		 *
+		 * limit[] indicates the largest numerical value a
+		 * symbol with a given number of bits can have.  This
+		 * is how the Huffman codes can vary in length: each
+		 * code with a value > limit[length] needs another
+		 * bit.
+		 */
+		hufGroup = bd->groups+j;
+		hufGroup->minLen = minLen;
+		hufGroup->maxLen = maxLen;
+		/* Note that minLen can't be smaller than 1, so we
+		   adjust the base and limit array pointers so we're
+		   not always wasting the first entry.  We do this
+		   again when using them (during symbol decoding).*/
+		base = hufGroup->base-1;
+		limit = hufGroup->limit-1;
+		/* Calculate permute[].  Concurently, initialize
+		 * temp[] and limit[]. */
+		pp = 0;
+		for (i = minLen; i <= maxLen; i++) {
+			temp[i] = limit[i] = 0;
+			for (t = 0; t < symCount; t++)
+				if (length[t] == i)
+					hufGroup->permute[pp++] = t;
+		}
+		/* Count symbols coded for at each bit length */
+		for (i = 0; i < symCount; i++)
+			temp[length[i]]++;
+		/* Calculate limit[] (the largest symbol-coding value
+		 *at each bit length, which is (previous limit <<
+		 *1)+symbols at this level), and base[] (number of
+		 *symbols to ignore at each bit length, which is limit
+		 *minus the cumulative count of symbols coded for
+		 *already). */
+		pp = t = 0;
+		for (i = minLen; i < maxLen; i++) {
+			pp += temp[i];
+			/* We read the largest possible symbol size
+			   and then unget bits after determining how
+			   many we need, and those extra bits could be
+			   set to anything.  (They're noise from
+			   future symbols.)  At each level we're
+			   really only interested in the first few
+			   bits, so here we set all the trailing
+			   to-be-ignored bits to 1 so they don't
+			   affect the value > limit[length]
+			   comparison. */
+			limit[i] = (pp << (maxLen - i)) - 1;
+			pp <<= 1;
+			base[i+1] = pp-(t += temp[i]);
+		}
+		limit[maxLen+1] = INT_MAX; /* Sentinal value for
+					    * reading next sym. */
+		limit[maxLen] = pp+temp[maxLen]-1;
+		base[minLen] = 0;
+	}
+	/* We've finished reading and digesting the block header.  Now
+	   read this block's Huffman coded symbols from the file and
+	   undo the Huffman coding and run length encoding, saving the
+	   result into dbuf[dbufCount++] = uc */
+
+	/* Initialize symbol occurrence counters and symbol Move To
+	 * Front table */
+	for (i = 0; i < 256; i++) {
+		byteCount[i] = 0;
+		mtfSymbol[i] = (unsigned char)i;
+	}
+	/* Loop through compressed symbols. */
+	runPos = dbufCount = symCount = selector = 0;
+	for (;;) {
+		/* Determine which Huffman coding group to use. */
+		if (!(symCount--)) {
+			symCount = GROUP_SIZE-1;
+			if (selector >= nSelectors)
+				return RETVAL_DATA_ERROR;
+			hufGroup = bd->groups+selectors[selector++];
+			base = hufGroup->base-1;
+			limit = hufGroup->limit-1;
+		}
+		/* Read next Huffman-coded symbol. */
+		/* Note: It is far cheaper to read maxLen bits and
+		   back up than it is to read minLen bits and then an
+		   additional bit at a time, testing as we go.
+		   Because there is a trailing last block (with file
+		   CRC), there is no danger of the overread causing an
+		   unexpected EOF for a valid compressed file.  As a
+		   further optimization, we do the read inline
+		   (falling back to a call to get_bits if the buffer
+		   runs dry).  The following (up to got_huff_bits:) is
+		   equivalent to j = get_bits(bd, hufGroup->maxLen);
+		 */
+		while (bd->inbufBitCount < hufGroup->maxLen) {
+			if (bd->inbufPos == bd->inbufCount) {
+				j = get_bits(bd, hufGroup->maxLen);
+				goto got_huff_bits;
+			}
+			bd->inbufBits =
+				(bd->inbufBits << 8)|bd->inbuf[bd->inbufPos++];
+			bd->inbufBitCount += 8;
+		};
+		bd->inbufBitCount -= hufGroup->maxLen;
+		j = (bd->inbufBits >> bd->inbufBitCount)&
+			((1 << hufGroup->maxLen)-1);
+got_huff_bits:
+		/* Figure how how many bits are in next symbol and
+		 * unget extras */
+		i = hufGroup->minLen;
+		while (j > limit[i])
+			++i;
+		bd->inbufBitCount += (hufGroup->maxLen - i);
+		/* Huffman decode value to get nextSym (with bounds checking) */
+		if ((i > hufGroup->maxLen)
+			|| (((unsigned)(j = (j>>(hufGroup->maxLen-i))-base[i]))
+				>= MAX_SYMBOLS))
+			return RETVAL_DATA_ERROR;
+		nextSym = hufGroup->permute[j];
+		/* We have now decoded the symbol, which indicates
+		   either a new literal byte, or a repeated run of the
+		   most recent literal byte.  First, check if nextSym
+		   indicates a repeated run, and if so loop collecting
+		   how many times to repeat the last literal. */
+		if (((unsigned)nextSym) <= SYMBOL_RUNB) { /* RUNA or RUNB */
+			/* If this is the start of a new run, zero out
+			 * counter */
+			if (!runPos) {
+				runPos = 1;
+				t = 0;
+			}
+			/* Neat trick that saves 1 symbol: instead of
+			   or-ing 0 or 1 at each bit position, add 1
+			   or 2 instead.  For example, 1011 is 1 << 0
+			   + 1 << 1 + 2 << 2.  1010 is 2 << 0 + 2 << 1
+			   + 1 << 2.  You can make any bit pattern
+			   that way using 1 less symbol than the basic
+			   or 0/1 method (except all bits 0, which
+			   would use no symbols, but a run of length 0
+			   doesn't mean anything in this context).
+			   Thus space is saved. */
+			t += (runPos << nextSym);
+			/* +runPos if RUNA; +2*runPos if RUNB */
+
+			runPos <<= 1;
+			continue;
+		}
+		/* When we hit the first non-run symbol after a run,
+		   we now know how many times to repeat the last
+		   literal, so append that many copies to our buffer
+		   of decoded symbols (dbuf) now.  (The last literal
+		   used is the one at the head of the mtfSymbol
+		   array.) */
+		if (runPos) {
+			runPos = 0;
+			if (dbufCount+t >= dbufSize)
+				return RETVAL_DATA_ERROR;
+
+			uc = symToByte[mtfSymbol[0]];
+			byteCount[uc] += t;
+			while (t--)
+				dbuf[dbufCount++] = uc;
+		}
+		/* Is this the terminating symbol? */
+		if (nextSym > symTotal)
+			break;
+		/* At this point, nextSym indicates a new literal
+		   character.  Subtract one to get the position in the
+		   MTF array at which this literal is currently to be
+		   found.  (Note that the result can't be -1 or 0,
+		   because 0 and 1 are RUNA and RUNB.  But another
+		   instance of the first symbol in the mtf array,
+		   position 0, would have been handled as part of a
+		   run above.  Therefore 1 unused mtf position minus 2
+		   non-literal nextSym values equals -1.) */
+		if (dbufCount >= dbufSize)
+			return RETVAL_DATA_ERROR;
+		i = nextSym - 1;
+		uc = mtfSymbol[i];
+		/* Adjust the MTF array.  Since we typically expect to
+		 *move only a small number of symbols, and are bound
+		 *by 256 in any case, using memmove here would
+		 *typically be bigger and slower due to function call
+		 *overhead and other assorted setup costs. */
+		do {
+			mtfSymbol[i] = mtfSymbol[i-1];
+		} while (--i);
+		mtfSymbol[0] = uc;
+		uc = symToByte[uc];
+		/* We have our literal byte.  Save it into dbuf. */
+		byteCount[uc]++;
+		dbuf[dbufCount++] = (unsigned int)uc;
+	}
+	/* At this point, we've read all the Huffman-coded symbols
+	   (and repeated runs) for this block from the input stream,
+	   and decoded them into the intermediate buffer.  There are
+	   dbufCount many decoded bytes in dbuf[].  Now undo the
+	   Burrows-Wheeler transform on dbuf.  See
+	   http://dogma.net/markn/articles/bwt/bwt.htm
+	 */
+	/* Turn byteCount into cumulative occurrence counts of 0 to n-1. */
+	j = 0;
+	for (i = 0; i < 256; i++) {
+		k = j+byteCount[i];
+		byteCount[i] = j;
+		j = k;
+	}
+	/* Figure out what order dbuf would be in if we sorted it. */
+	for (i = 0; i < dbufCount; i++) {
+		uc = (unsigned char)(dbuf[i] & 0xff);
+		dbuf[byteCount[uc]] |= (i << 8);
+		byteCount[uc]++;
+	}
+	/* Decode first byte by hand to initialize "previous" byte.
+	   Note that it doesn't get output, and if the first three
+	   characters are identical it doesn't qualify as a run (hence
+	   writeRunCountdown = 5). */
+	if (dbufCount) {
+		if (origPtr >= dbufCount)
+			return RETVAL_DATA_ERROR;
+		bd->writePos = dbuf[origPtr];
+		bd->writeCurrent = (unsigned char)(bd->writePos&0xff);
+		bd->writePos >>= 8;
+		bd->writeRunCountdown = 5;
+	}
+	bd->writeCount = dbufCount;
+
+	return RETVAL_OK;
+}
+
+/* Undo burrows-wheeler transform on intermediate buffer to produce output.
+   If start_bunzip was initialized with out_fd =-1, then up to len bytes of
+   data are written to outbuf.  Return value is number of bytes written or
+   error (all errors are negative numbers).  If out_fd!=-1, outbuf and len
+   are ignored, data is written to out_fd and return is RETVAL_OK or error.
+*/
+
+static int INIT read_bunzip(struct bunzip_data *bd, char *outbuf, int len)
+{
+	const unsigned int *dbuf;
+	int pos, xcurrent, previous, gotcount;
+
+	/* If last read was short due to end of file, return last block now */
+	if (bd->writeCount < 0)
+		return bd->writeCount;
+
+	gotcount = 0;
+	dbuf = bd->dbuf;
+	pos = bd->writePos;
+	xcurrent = bd->writeCurrent;
+
+	/* We will always have pending decoded data to write into the output
+	   buffer unless this is the very first call (in which case we haven't
+	   Huffman-decoded a block into the intermediate buffer yet). */
+
+	if (bd->writeCopies) {
+		/* Inside the loop, writeCopies means extra copies (beyond 1) */
+		--bd->writeCopies;
+		/* Loop outputting bytes */
+		for (;;) {
+			/* If the output buffer is full, snapshot
+			 * state and return */
+			if (gotcount >= len) {
+				bd->writePos = pos;
+				bd->writeCurrent = xcurrent;
+				bd->writeCopies++;
+				return len;
+			}
+			/* Write next byte into output buffer, updating CRC */
+			outbuf[gotcount++] = xcurrent;
+			bd->writeCRC = (((bd->writeCRC) << 8)
+				^bd->crc32Table[((bd->writeCRC) >> 24)
+				^xcurrent]);
+			/* Loop now if we're outputting multiple
+			 * copies of this byte */
+			if (bd->writeCopies) {
+				--bd->writeCopies;
+				continue;
+			}
+decode_next_byte:
+			if (!bd->writeCount--)
+				break;
+			/* Follow sequence vector to undo
+			 * Burrows-Wheeler transform */
+			previous = xcurrent;
+			pos = dbuf[pos];
+			xcurrent = pos&0xff;
+			pos >>= 8;
+			/* After 3 consecutive copies of the same
+			   byte, the 4th is a repeat count.  We count
+			   down from 4 instead *of counting up because
+			   testing for non-zero is faster */
+			if (--bd->writeRunCountdown) {
+				if (xcurrent != previous)
+					bd->writeRunCountdown = 4;
+			} else {
+				/* We have a repeated run, this byte
+				 * indicates the count */
+				bd->writeCopies = xcurrent;
+				xcurrent = previous;
+				bd->writeRunCountdown = 5;
+				/* Sometimes there are just 3 bytes
+				 * (run length 0) */
+				if (!bd->writeCopies)
+					goto decode_next_byte;
+				/* Subtract the 1 copy we'd output
+				 * anyway to get extras */
+				--bd->writeCopies;
+			}
+		}
+		/* Decompression of this block completed successfully */
+		bd->writeCRC = ~bd->writeCRC;
+		bd->totalCRC = ((bd->totalCRC << 1) |
+				(bd->totalCRC >> 31)) ^ bd->writeCRC;
+		/* If this block had a CRC error, force file level CRC error. */
+		if (bd->writeCRC != bd->headerCRC) {
+			bd->totalCRC = bd->headerCRC+1;
+			return RETVAL_LAST_BLOCK;
+		}
+	}
+
+	/* Refill the intermediate buffer by Huffman-decoding next
+	 * block of input */
+	/* (previous is just a convenient unused temp variable here) */
+	previous = get_next_block(bd);
+	if (previous) {
+		bd->writeCount = previous;
+		return (previous != RETVAL_LAST_BLOCK) ? previous : gotcount;
+	}
+	bd->writeCRC = 0xffffffffUL;
+	pos = bd->writePos;
+	xcurrent = bd->writeCurrent;
+	goto decode_next_byte;
+}
+
+static int INIT nofill(void *buf, unsigned int len)
+{
+	return -1;
+}
+
+/* Allocate the structure, read file header.  If in_fd ==-1, inbuf must contain
+   a complete bunzip file (len bytes long).  If in_fd!=-1, inbuf and len are
+   ignored, and data is read from file handle into temporary buffer. */
+static int INIT start_bunzip(struct bunzip_data **bdp, void *inbuf, int len,
+			     int (*fill)(void*, unsigned int))
+{
+	struct bunzip_data *bd;
+	unsigned int i, j, c;
+	const unsigned int BZh0 =
+		(((unsigned int)'B') << 24)+(((unsigned int)'Z') << 16)
+		+(((unsigned int)'h') << 8)+(unsigned int)'0';
+
+	/* Figure out how much data to allocate */
+	i = sizeof(struct bunzip_data);
+
+	/* Allocate bunzip_data.  Most fields initialize to zero. */
+	bd = *bdp = malloc(i);
+	memset(bd, 0, sizeof(struct bunzip_data));
+	/* Setup input buffer */
+	bd->inbuf = inbuf;
+	bd->inbufCount = len;
+	if (fill != NULL)
+		bd->fill = fill;
+	else
+		bd->fill = nofill;
+
+	/* Init the CRC32 table (big endian) */
+	for (i = 0; i < 256; i++) {
+		c = i << 24;
+		for (j = 8; j; j--)
+			c = c&0x80000000 ? (c << 1)^0x04c11db7 : (c << 1);
+		bd->crc32Table[i] = c;
+	}
+
+	/* Ensure that file starts with "BZh['1'-'9']." */
+	i = get_bits(bd, 32);
+	if (((unsigned int)(i-BZh0-1)) >= 9)
+		return RETVAL_NOT_BZIP_DATA;
+
+	/* Fourth byte (ascii '1'-'9'), indicates block size in units of 100k of
+	   uncompressed data.  Allocate intermediate buffer for block. */
+	bd->dbufSize = 100000*(i-BZh0);
+
+	bd->dbuf = large_malloc(bd->dbufSize * sizeof(int));
+	return RETVAL_OK;
+}
+
+/* Example usage: decompress src_fd to dst_fd.  (Stops at end of bzip2 data,
+   not end of file.) */
+STATIC int INIT bunzip2(unsigned char *buf, int len,
+			int(*fill)(void*, unsigned int),
+			int(*flush)(void*, unsigned int),
+			unsigned char *outbuf,
+			int *pos,
+			void(*error_fn)(char *x))
+{
+	struct bunzip_data *bd;
+	int i = -1;
+	unsigned char *inbuf;
+
+	set_error_fn(error_fn);
+	if (flush)
+		outbuf = malloc(BZIP2_IOBUF_SIZE);
+	else
+		len -= 4; /* Uncompressed size hack active in pre-boot
+			     environment */
+	if (!outbuf) {
+		error("Could not allocate output bufer");
+		return -1;
+	}
+	if (buf)
+		inbuf = buf;
+	else
+		inbuf = malloc(BZIP2_IOBUF_SIZE);
+	if (!inbuf) {
+		error("Could not allocate input bufer");
+		goto exit_0;
+	}
+	i = start_bunzip(&bd, inbuf, len, fill);
+	if (!i) {
+		for (;;) {
+			i = read_bunzip(bd, outbuf, BZIP2_IOBUF_SIZE);
+			if (i <= 0)
+				break;
+			if (!flush)
+				outbuf += i;
+			else
+				if (i != flush(outbuf, i)) {
+					i = RETVAL_UNEXPECTED_OUTPUT_EOF;
+					break;
+				}
+		}
+	}
+	/* Check CRC and release memory */
+	if (i == RETVAL_LAST_BLOCK) {
+		if (bd->headerCRC != bd->totalCRC)
+			error("Data integrity error when decompressing.");
+		else
+			i = RETVAL_OK;
+	} else if (i == RETVAL_UNEXPECTED_OUTPUT_EOF) {
+		error("Compressed file ends unexpectedly");
+	}
+	if (bd->dbuf)
+		large_free(bd->dbuf);
+	if (pos)
+		*pos = bd->inbufPos;
+	free(bd);
+	if (!buf)
+		free(inbuf);
+exit_0:
+	if (flush)
+		free(outbuf);
+	return i;
+}
+
+#define decompress bunzip2
diff --git a/lib/decompress_inflate.c b/lib/decompress_inflate.c
new file mode 100644
index 000000000000..839a329b4fc4
--- /dev/null
+++ b/lib/decompress_inflate.c
@@ -0,0 +1,167 @@
+#ifdef STATIC
+/* Pre-boot environment: included */
+
+/* prevent inclusion of _LINUX_KERNEL_H in pre-boot environment: lots
+ * errors about console_printk etc... on ARM */
+#define _LINUX_KERNEL_H
+
+#include "zlib_inflate/inftrees.c"
+#include "zlib_inflate/inffast.c"
+#include "zlib_inflate/inflate.c"
+
+#else /* STATIC */
+/* initramfs et al: linked */
+
+#include <linux/zutil.h>
+
+#include "zlib_inflate/inftrees.h"
+#include "zlib_inflate/inffast.h"
+#include "zlib_inflate/inflate.h"
+
+#include "zlib_inflate/infutil.h"
+
+#endif /* STATIC */
+
+#include <linux/decompress/mm.h>
+
+#define INBUF_LEN (16*1024)
+
+/* Included from initramfs et al code */
+STATIC int INIT gunzip(unsigned char *buf, int len,
+		       int(*fill)(void*, unsigned int),
+		       int(*flush)(void*, unsigned int),
+		       unsigned char *out_buf,
+		       int *pos,
+		       void(*error_fn)(char *x)) {
+	u8 *zbuf;
+	struct z_stream_s *strm;
+	int rc;
+	size_t out_len;
+
+	set_error_fn(error_fn);
+	rc = -1;
+	if (flush) {
+		out_len = 0x8000; /* 32 K */
+		out_buf = malloc(out_len);
+	} else {
+		out_len = 0x7fffffff; /* no limit */
+	}
+	if (!out_buf) {
+		error("Out of memory while allocating output buffer");
+		goto gunzip_nomem1;
+	}
+
+	if (buf)
+		zbuf = buf;
+	else {
+		zbuf = malloc(INBUF_LEN);
+		len = 0;
+	}
+	if (!zbuf) {
+		error("Out of memory while allocating input buffer");
+		goto gunzip_nomem2;
+	}
+
+	strm = malloc(sizeof(*strm));
+	if (strm == NULL) {
+		error("Out of memory while allocating z_stream");
+		goto gunzip_nomem3;
+	}
+
+	strm->workspace = malloc(flush ? zlib_inflate_workspacesize() :
+				 sizeof(struct inflate_state));
+	if (strm->workspace == NULL) {
+		error("Out of memory while allocating workspace");
+		goto gunzip_nomem4;
+	}
+
+	if (len == 0)
+		len = fill(zbuf, INBUF_LEN);
+
+	/* verify the gzip header */
+	if (len < 10 ||
+	   zbuf[0] != 0x1f || zbuf[1] != 0x8b || zbuf[2] != 0x08) {
+		if (pos)
+			*pos = 0;
+		error("Not a gzip file");
+		goto gunzip_5;
+	}
+
+	/* skip over gzip header (1f,8b,08... 10 bytes total +
+	 * possible asciz filename)
+	 */
+	strm->next_in = zbuf + 10;
+	/* skip over asciz filename */
+	if (zbuf[3] & 0x8) {
+		while (strm->next_in[0])
+			strm->next_in++;
+		strm->next_in++;
+	}
+	strm->avail_in = len - (strm->next_in - zbuf);
+
+	strm->next_out = out_buf;
+	strm->avail_out = out_len;
+
+	rc = zlib_inflateInit2(strm, -MAX_WBITS);
+
+	if (!flush) {
+		WS(strm)->inflate_state.wsize = 0;
+		WS(strm)->inflate_state.window = NULL;
+	}
+
+	while (rc == Z_OK) {
+		if (strm->avail_in == 0) {
+			/* TODO: handle case where both pos and fill are set */
+			len = fill(zbuf, INBUF_LEN);
+			if (len < 0) {
+				rc = -1;
+				error("read error");
+				break;
+			}
+			strm->next_in = zbuf;
+			strm->avail_in = len;
+		}
+		rc = zlib_inflate(strm, 0);
+
+		/* Write any data generated */
+		if (flush && strm->next_out > out_buf) {
+			int l = strm->next_out - out_buf;
+			if (l != flush(out_buf, l)) {
+				rc = -1;
+				error("write error");
+				break;
+			}
+			strm->next_out = out_buf;
+			strm->avail_out = out_len;
+		}
+
+		/* after Z_FINISH, only Z_STREAM_END is "we unpacked it all" */
+		if (rc == Z_STREAM_END) {
+			rc = 0;
+			break;
+		} else if (rc != Z_OK) {
+			error("uncompression error");
+			rc = -1;
+		}
+	}
+
+	zlib_inflateEnd(strm);
+	if (pos)
+		/* add + 8 to skip over trailer */
+		*pos = strm->next_in - zbuf+8;
+
+gunzip_5:
+	free(strm->workspace);
+gunzip_nomem4:
+	free(strm);
+gunzip_nomem3:
+	if (!buf)
+		free(zbuf);
+gunzip_nomem2:
+	if (flush)
+		free(out_buf);
+gunzip_nomem1:
+	return rc; /* returns Z_OK (0) if successful */
+}
+
+#define decompress gunzip
diff --git a/lib/decompress_unlzma.c b/lib/decompress_unlzma.c
new file mode 100644
index 000000000000..546f2f4c157e
--- /dev/null
+++ b/lib/decompress_unlzma.c
@@ -0,0 +1,647 @@
+/* Lzma decompressor for Linux kernel. Shamelessly snarfed
+ *from busybox 1.1.1
+ *
+ *Linux kernel adaptation
+ *Copyright (C) 2006  Alain < alain@knaff.lu >
+ *
+ *Based on small lzma deflate implementation/Small range coder
+ *implementation for lzma.
+ *Copyright (C) 2006  Aurelien Jacobs < aurel@gnuage.org >
+ *
+ *Based on LzmaDecode.c from the LZMA SDK 4.22 (http://www.7-zip.org/)
+ *Copyright (C) 1999-2005  Igor Pavlov
+ *
+ *Copyrights of the parts, see headers below.
+ *
+ *
+ *This program is free software; you can redistribute it and/or
+ *modify it under the terms of the GNU Lesser General Public
+ *License as published by the Free Software Foundation; either
+ *version 2.1 of the License, or (at your option) any later version.
+ *
+ *This program is distributed in the hope that it will be useful,
+ *but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *Lesser General Public License for more details.
+ *
+ *You should have received a copy of the GNU Lesser General Public
+ *License along with this library; if not, write to the Free Software
+ *Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#ifndef STATIC
+#include <linux/decompress/unlzma.h>
+#endif /* STATIC */
+
+#include <linux/decompress/mm.h>
+
+#define	MIN(a, b) (((a) < (b)) ? (a) : (b))
+
+static long long INIT read_int(unsigned char *ptr, int size)
+{
+	int i;
+	long long ret = 0;
+
+	for (i = 0; i < size; i++)
+		ret = (ret << 8) | ptr[size-i-1];
+	return ret;
+}
+
+#define ENDIAN_CONVERT(x) \
+  x = (typeof(x))read_int((unsigned char *)&x, sizeof(x))
+
+
+/* Small range coder implementation for lzma.
+ *Copyright (C) 2006  Aurelien Jacobs < aurel@gnuage.org >
+ *
+ *Based on LzmaDecode.c from the LZMA SDK 4.22 (http://www.7-zip.org/)
+ *Copyright (c) 1999-2005  Igor Pavlov
+ */
+
+#include <linux/compiler.h>
+
+#define LZMA_IOBUF_SIZE	0x10000
+
+struct rc {
+	int (*fill)(void*, unsigned int);
+	uint8_t *ptr;
+	uint8_t *buffer;
+	uint8_t *buffer_end;
+	int buffer_size;
+	uint32_t code;
+	uint32_t range;
+	uint32_t bound;
+};
+
+
+#define RC_TOP_BITS 24
+#define RC_MOVE_BITS 5
+#define RC_MODEL_TOTAL_BITS 11
+
+
+/* Called twice: once at startup and once in rc_normalize() */
+static void INIT rc_read(struct rc *rc)
+{
+	rc->buffer_size = rc->fill((char *)rc->buffer, LZMA_IOBUF_SIZE);
+	if (rc->buffer_size <= 0)
+		error("unexpected EOF");
+	rc->ptr = rc->buffer;
+	rc->buffer_end = rc->buffer + rc->buffer_size;
+}
+
+/* Called once */
+static inline void INIT rc_init(struct rc *rc,
+				       int (*fill)(void*, unsigned int),
+				       char *buffer, int buffer_size)
+{
+	rc->fill = fill;
+	rc->buffer = (uint8_t *)buffer;
+	rc->buffer_size = buffer_size;
+	rc->buffer_end = rc->buffer + rc->buffer_size;
+	rc->ptr = rc->buffer;
+
+	rc->code = 0;
+	rc->range = 0xFFFFFFFF;
+}
+
+static inline void INIT rc_init_code(struct rc *rc)
+{
+	int i;
+
+	for (i = 0; i < 5; i++) {
+		if (rc->ptr >= rc->buffer_end)
+			rc_read(rc);
+		rc->code = (rc->code << 8) | *rc->ptr++;
+	}
+}
+
+
+/* Called once. TODO: bb_maybe_free() */
+static inline void INIT rc_free(struct rc *rc)
+{
+	free(rc->buffer);
+}
+
+/* Called twice, but one callsite is in inline'd rc_is_bit_0_helper() */
+static void INIT rc_do_normalize(struct rc *rc)
+{
+	if (rc->ptr >= rc->buffer_end)
+		rc_read(rc);
+	rc->range <<= 8;
+	rc->code = (rc->code << 8) | *rc->ptr++;
+}
+static inline void INIT rc_normalize(struct rc *rc)
+{
+	if (rc->range < (1 << RC_TOP_BITS))
+		rc_do_normalize(rc);
+}
+
+/* Called 9 times */
+/* Why rc_is_bit_0_helper exists?
+ *Because we want to always expose (rc->code < rc->bound) to optimizer
+ */
+static inline uint32_t INIT rc_is_bit_0_helper(struct rc *rc, uint16_t *p)
+{
+	rc_normalize(rc);
+	rc->bound = *p * (rc->range >> RC_MODEL_TOTAL_BITS);
+	return rc->bound;
+}
+static inline int INIT rc_is_bit_0(struct rc *rc, uint16_t *p)
+{
+	uint32_t t = rc_is_bit_0_helper(rc, p);
+	return rc->code < t;
+}
+
+/* Called ~10 times, but very small, thus inlined */
+static inline void INIT rc_update_bit_0(struct rc *rc, uint16_t *p)
+{
+	rc->range = rc->bound;
+	*p += ((1 << RC_MODEL_TOTAL_BITS) - *p) >> RC_MOVE_BITS;
+}
+static inline void rc_update_bit_1(struct rc *rc, uint16_t *p)
+{
+	rc->range -= rc->bound;
+	rc->code -= rc->bound;
+	*p -= *p >> RC_MOVE_BITS;
+}
+
+/* Called 4 times in unlzma loop */
+static int INIT rc_get_bit(struct rc *rc, uint16_t *p, int *symbol)
+{
+	if (rc_is_bit_0(rc, p)) {
+		rc_update_bit_0(rc, p);
+		*symbol *= 2;
+		return 0;
+	} else {
+		rc_update_bit_1(rc, p);
+		*symbol = *symbol * 2 + 1;
+		return 1;
+	}
+}
+
+/* Called once */
+static inline int INIT rc_direct_bit(struct rc *rc)
+{
+	rc_normalize(rc);
+	rc->range >>= 1;
+	if (rc->code >= rc->range) {
+		rc->code -= rc->range;
+		return 1;
+	}
+	return 0;
+}
+
+/* Called twice */
+static inline void INIT
+rc_bit_tree_decode(struct rc *rc, uint16_t *p, int num_levels, int *symbol)
+{
+	int i = num_levels;
+
+	*symbol = 1;
+	while (i--)
+		rc_get_bit(rc, p + *symbol, symbol);
+	*symbol -= 1 << num_levels;
+}
+
+
+/*
+ * Small lzma deflate implementation.
+ * Copyright (C) 2006  Aurelien Jacobs < aurel@gnuage.org >
+ *
+ * Based on LzmaDecode.c from the LZMA SDK 4.22 (http://www.7-zip.org/)
+ * Copyright (C) 1999-2005  Igor Pavlov
+ */
+
+
+struct lzma_header {
+	uint8_t pos;
+	uint32_t dict_size;
+	uint64_t dst_size;
+} __attribute__ ((packed)) ;
+
+
+#define LZMA_BASE_SIZE 1846
+#define LZMA_LIT_SIZE 768
+
+#define LZMA_NUM_POS_BITS_MAX 4
+
+#define LZMA_LEN_NUM_LOW_BITS 3
+#define LZMA_LEN_NUM_MID_BITS 3
+#define LZMA_LEN_NUM_HIGH_BITS 8
+
+#define LZMA_LEN_CHOICE 0
+#define LZMA_LEN_CHOICE_2 (LZMA_LEN_CHOICE + 1)
+#define LZMA_LEN_LOW (LZMA_LEN_CHOICE_2 + 1)
+#define LZMA_LEN_MID (LZMA_LEN_LOW \
+		      + (1 << (LZMA_NUM_POS_BITS_MAX + LZMA_LEN_NUM_LOW_BITS)))
+#define LZMA_LEN_HIGH (LZMA_LEN_MID \
+		       +(1 << (LZMA_NUM_POS_BITS_MAX + LZMA_LEN_NUM_MID_BITS)))
+#define LZMA_NUM_LEN_PROBS (LZMA_LEN_HIGH + (1 << LZMA_LEN_NUM_HIGH_BITS))
+
+#define LZMA_NUM_STATES 12
+#define LZMA_NUM_LIT_STATES 7
+
+#define LZMA_START_POS_MODEL_INDEX 4
+#define LZMA_END_POS_MODEL_INDEX 14
+#define LZMA_NUM_FULL_DISTANCES (1 << (LZMA_END_POS_MODEL_INDEX >> 1))
+
+#define LZMA_NUM_POS_SLOT_BITS 6
+#define LZMA_NUM_LEN_TO_POS_STATES 4
+
+#define LZMA_NUM_ALIGN_BITS 4
+
+#define LZMA_MATCH_MIN_LEN 2
+
+#define LZMA_IS_MATCH 0
+#define LZMA_IS_REP (LZMA_IS_MATCH + (LZMA_NUM_STATES << LZMA_NUM_POS_BITS_MAX))
+#define LZMA_IS_REP_G0 (LZMA_IS_REP + LZMA_NUM_STATES)
+#define LZMA_IS_REP_G1 (LZMA_IS_REP_G0 + LZMA_NUM_STATES)
+#define LZMA_IS_REP_G2 (LZMA_IS_REP_G1 + LZMA_NUM_STATES)
+#define LZMA_IS_REP_0_LONG (LZMA_IS_REP_G2 + LZMA_NUM_STATES)
+#define LZMA_POS_SLOT (LZMA_IS_REP_0_LONG \
+		       + (LZMA_NUM_STATES << LZMA_NUM_POS_BITS_MAX))
+#define LZMA_SPEC_POS (LZMA_POS_SLOT \
+		       +(LZMA_NUM_LEN_TO_POS_STATES << LZMA_NUM_POS_SLOT_BITS))
+#define LZMA_ALIGN (LZMA_SPEC_POS \
+		    + LZMA_NUM_FULL_DISTANCES - LZMA_END_POS_MODEL_INDEX)
+#define LZMA_LEN_CODER (LZMA_ALIGN + (1 << LZMA_NUM_ALIGN_BITS))
+#define LZMA_REP_LEN_CODER (LZMA_LEN_CODER + LZMA_NUM_LEN_PROBS)
+#define LZMA_LITERAL (LZMA_REP_LEN_CODER + LZMA_NUM_LEN_PROBS)
+
+
+struct writer {
+	uint8_t *buffer;
+	uint8_t previous_byte;
+	size_t buffer_pos;
+	int bufsize;
+	size_t global_pos;
+	int(*flush)(void*, unsigned int);
+	struct lzma_header *header;
+};
+
+struct cstate {
+	int state;
+	uint32_t rep0, rep1, rep2, rep3;
+};
+
+static inline size_t INIT get_pos(struct writer *wr)
+{
+	return
+		wr->global_pos + wr->buffer_pos;
+}
+
+static inline uint8_t INIT peek_old_byte(struct writer *wr,
+						uint32_t offs)
+{
+	if (!wr->flush) {
+		int32_t pos;
+		while (offs > wr->header->dict_size)
+			offs -= wr->header->dict_size;
+		pos = wr->buffer_pos - offs;
+		return wr->buffer[pos];
+	} else {
+		uint32_t pos = wr->buffer_pos - offs;
+		while (pos >= wr->header->dict_size)
+			pos += wr->header->dict_size;
+		return wr->buffer[pos];
+	}
+
+}
+
+static inline void INIT write_byte(struct writer *wr, uint8_t byte)
+{
+	wr->buffer[wr->buffer_pos++] = wr->previous_byte = byte;
+	if (wr->flush && wr->buffer_pos == wr->header->dict_size) {
+		wr->buffer_pos = 0;
+		wr->global_pos += wr->header->dict_size;
+		wr->flush((char *)wr->buffer, wr->header->dict_size);
+	}
+}
+
+
+static inline void INIT copy_byte(struct writer *wr, uint32_t offs)
+{
+	write_byte(wr, peek_old_byte(wr, offs));
+}
+
+static inline void INIT copy_bytes(struct writer *wr,
+					 uint32_t rep0, int len)
+{
+	do {
+		copy_byte(wr, rep0);
+		len--;
+	} while (len != 0 && wr->buffer_pos < wr->header->dst_size);
+}
+
+static inline void INIT process_bit0(struct writer *wr, struct rc *rc,
+				     struct cstate *cst, uint16_t *p,
+				     int pos_state, uint16_t *prob,
+				     int lc, uint32_t literal_pos_mask) {
+	int mi = 1;
+	rc_update_bit_0(rc, prob);
+	prob = (p + LZMA_LITERAL +
+		(LZMA_LIT_SIZE
+		 * (((get_pos(wr) & literal_pos_mask) << lc)
+		    + (wr->previous_byte >> (8 - lc))))
+		);
+
+	if (cst->state >= LZMA_NUM_LIT_STATES) {
+		int match_byte = peek_old_byte(wr, cst->rep0);
+		do {
+			int bit;
+			uint16_t *prob_lit;
+
+			match_byte <<= 1;
+			bit = match_byte & 0x100;
+			prob_lit = prob + 0x100 + bit + mi;
+			if (rc_get_bit(rc, prob_lit, &mi)) {
+				if (!bit)
+					break;
+			} else {
+				if (bit)
+					break;
+			}
+		} while (mi < 0x100);
+	}
+	while (mi < 0x100) {
+		uint16_t *prob_lit = prob + mi;
+		rc_get_bit(rc, prob_lit, &mi);
+	}
+	write_byte(wr, mi);
+	if (cst->state < 4)
+		cst->state = 0;
+	else if (cst->state < 10)
+		cst->state -= 3;
+	else
+		cst->state -= 6;
+}
+
+static inline void INIT process_bit1(struct writer *wr, struct rc *rc,
+					    struct cstate *cst, uint16_t *p,
+					    int pos_state, uint16_t *prob) {
+  int offset;
+	uint16_t *prob_len;
+	int num_bits;
+	int len;
+
+	rc_update_bit_1(rc, prob);
+	prob = p + LZMA_IS_REP + cst->state;
+	if (rc_is_bit_0(rc, prob)) {
+		rc_update_bit_0(rc, prob);
+		cst->rep3 = cst->rep2;
+		cst->rep2 = cst->rep1;
+		cst->rep1 = cst->rep0;
+		cst->state = cst->state < LZMA_NUM_LIT_STATES ? 0 : 3;
+		prob = p + LZMA_LEN_CODER;
+	} else {
+		rc_update_bit_1(rc, prob);
+		prob = p + LZMA_IS_REP_G0 + cst->state;
+		if (rc_is_bit_0(rc, prob)) {
+			rc_update_bit_0(rc, prob);
+			prob = (p + LZMA_IS_REP_0_LONG
+				+ (cst->state <<
+				   LZMA_NUM_POS_BITS_MAX) +
+				pos_state);
+			if (rc_is_bit_0(rc, prob)) {
+				rc_update_bit_0(rc, prob);
+
+				cst->state = cst->state < LZMA_NUM_LIT_STATES ?
+					9 : 11;
+				copy_byte(wr, cst->rep0);
+				return;
+			} else {
+				rc_update_bit_1(rc, prob);
+			}
+		} else {
+			uint32_t distance;
+
+			rc_update_bit_1(rc, prob);
+			prob = p + LZMA_IS_REP_G1 + cst->state;
+			if (rc_is_bit_0(rc, prob)) {
+				rc_update_bit_0(rc, prob);
+				distance = cst->rep1;
+			} else {
+				rc_update_bit_1(rc, prob);
+				prob = p + LZMA_IS_REP_G2 + cst->state;
+				if (rc_is_bit_0(rc, prob)) {
+					rc_update_bit_0(rc, prob);
+					distance = cst->rep2;
+				} else {
+					rc_update_bit_1(rc, prob);
+					distance = cst->rep3;
+					cst->rep3 = cst->rep2;
+				}
+				cst->rep2 = cst->rep1;
+			}
+			cst->rep1 = cst->rep0;
+			cst->rep0 = distance;
+		}
+		cst->state = cst->state < LZMA_NUM_LIT_STATES ? 8 : 11;
+		prob = p + LZMA_REP_LEN_CODER;
+	}
+
+	prob_len = prob + LZMA_LEN_CHOICE;
+	if (rc_is_bit_0(rc, prob_len)) {
+		rc_update_bit_0(rc, prob_len);
+		prob_len = (prob + LZMA_LEN_LOW
+			    + (pos_state <<
+			       LZMA_LEN_NUM_LOW_BITS));
+		offset = 0;
+		num_bits = LZMA_LEN_NUM_LOW_BITS;
+	} else {
+		rc_update_bit_1(rc, prob_len);
+		prob_len = prob + LZMA_LEN_CHOICE_2;
+		if (rc_is_bit_0(rc, prob_len)) {
+			rc_update_bit_0(rc, prob_len);
+			prob_len = (prob + LZMA_LEN_MID
+				    + (pos_state <<
+				       LZMA_LEN_NUM_MID_BITS));
+			offset = 1 << LZMA_LEN_NUM_LOW_BITS;
+			num_bits = LZMA_LEN_NUM_MID_BITS;
+		} else {
+			rc_update_bit_1(rc, prob_len);
+			prob_len = prob + LZMA_LEN_HIGH;
+			offset = ((1 << LZMA_LEN_NUM_LOW_BITS)
+				  + (1 << LZMA_LEN_NUM_MID_BITS));
+			num_bits = LZMA_LEN_NUM_HIGH_BITS;
+		}
+	}
+
+	rc_bit_tree_decode(rc, prob_len, num_bits, &len);
+	len += offset;
+
+	if (cst->state < 4) {
+		int pos_slot;
+
+		cst->state += LZMA_NUM_LIT_STATES;
+		prob =
+			p + LZMA_POS_SLOT +
+			((len <
+			  LZMA_NUM_LEN_TO_POS_STATES ? len :
+			  LZMA_NUM_LEN_TO_POS_STATES - 1)
+			 << LZMA_NUM_POS_SLOT_BITS);
+		rc_bit_tree_decode(rc, prob,
+				   LZMA_NUM_POS_SLOT_BITS,
+				   &pos_slot);
+		if (pos_slot >= LZMA_START_POS_MODEL_INDEX) {
+			int i, mi;
+			num_bits = (pos_slot >> 1) - 1;
+			cst->rep0 = 2 | (pos_slot & 1);
+			if (pos_slot < LZMA_END_POS_MODEL_INDEX) {
+				cst->rep0 <<= num_bits;
+				prob = p + LZMA_SPEC_POS +
+					cst->rep0 - pos_slot - 1;
+			} else {
+				num_bits -= LZMA_NUM_ALIGN_BITS;
+				while (num_bits--)
+					cst->rep0 = (cst->rep0 << 1) |
+						rc_direct_bit(rc);
+				prob = p + LZMA_ALIGN;
+				cst->rep0 <<= LZMA_NUM_ALIGN_BITS;
+				num_bits = LZMA_NUM_ALIGN_BITS;
+			}
+			i = 1;
+			mi = 1;
+			while (num_bits--) {
+				if (rc_get_bit(rc, prob + mi, &mi))
+					cst->rep0 |= i;
+				i <<= 1;
+			}
+		} else
+			cst->rep0 = pos_slot;
+		if (++(cst->rep0) == 0)
+			return;
+	}
+
+	len += LZMA_MATCH_MIN_LEN;
+
+	copy_bytes(wr, cst->rep0, len);
+}
+
+
+
+STATIC inline int INIT unlzma(unsigned char *buf, int in_len,
+			      int(*fill)(void*, unsigned int),
+			      int(*flush)(void*, unsigned int),
+			      unsigned char *output,
+			      int *posp,
+			      void(*error_fn)(char *x)
+	)
+{
+	struct lzma_header header;
+	int lc, pb, lp;
+	uint32_t pos_state_mask;
+	uint32_t literal_pos_mask;
+	uint16_t *p;
+	int num_probs;
+	struct rc rc;
+	int i, mi;
+	struct writer wr;
+	struct cstate cst;
+	unsigned char *inbuf;
+	int ret = -1;
+
+	set_error_fn(error_fn);
+	if (!flush)
+		in_len -= 4; /* Uncompressed size hack active in pre-boot
+				environment */
+	if (buf)
+		inbuf = buf;
+	else
+		inbuf = malloc(LZMA_IOBUF_SIZE);
+	if (!inbuf) {
+		error("Could not allocate input bufer");
+		goto exit_0;
+	}
+
+	cst.state = 0;
+	cst.rep0 = cst.rep1 = cst.rep2 = cst.rep3 = 1;
+
+	wr.header = &header;
+	wr.flush = flush;
+	wr.global_pos = 0;
+	wr.previous_byte = 0;
+	wr.buffer_pos = 0;
+
+	rc_init(&rc, fill, inbuf, in_len);
+
+	for (i = 0; i < sizeof(header); i++) {
+		if (rc.ptr >= rc.buffer_end)
+			rc_read(&rc);
+		((unsigned char *)&header)[i] = *rc.ptr++;
+	}
+
+	if (header.pos >= (9 * 5 * 5))
+		error("bad header");
+
+	mi = 0;
+	lc = header.pos;
+	while (lc >= 9) {
+		mi++;
+		lc -= 9;
+	}
+	pb = 0;
+	lp = mi;
+	while (lp >= 5) {
+		pb++;
+		lp -= 5;
+	}
+	pos_state_mask = (1 << pb) - 1;
+	literal_pos_mask = (1 << lp) - 1;
+
+	ENDIAN_CONVERT(header.dict_size);
+	ENDIAN_CONVERT(header.dst_size);
+
+	if (header.dict_size == 0)
+		header.dict_size = 1;
+
+	if (output)
+		wr.buffer = output;
+	else {
+		wr.bufsize = MIN(header.dst_size, header.dict_size);
+		wr.buffer = large_malloc(wr.bufsize);
+	}
+	if (wr.buffer == NULL)
+		goto exit_1;
+
+	num_probs = LZMA_BASE_SIZE + (LZMA_LIT_SIZE << (lc + lp));
+	p = (uint16_t *) large_malloc(num_probs * sizeof(*p));
+	if (p == 0)
+		goto exit_2;
+	num_probs = LZMA_LITERAL + (LZMA_LIT_SIZE << (lc + lp));
+	for (i = 0; i < num_probs; i++)
+		p[i] = (1 << RC_MODEL_TOTAL_BITS) >> 1;
+
+	rc_init_code(&rc);
+
+	while (get_pos(&wr) < header.dst_size) {
+		int pos_state =	get_pos(&wr) & pos_state_mask;
+		uint16_t *prob = p + LZMA_IS_MATCH +
+			(cst.state << LZMA_NUM_POS_BITS_MAX) + pos_state;
+		if (rc_is_bit_0(&rc, prob))
+			process_bit0(&wr, &rc, &cst, p, pos_state, prob,
+				     lc, literal_pos_mask);
+		else {
+			process_bit1(&wr, &rc, &cst, p, pos_state, prob);
+			if (cst.rep0 == 0)
+				break;
+		}
+	}
+
+	if (posp)
+		*posp = rc.ptr-rc.buffer;
+	if (wr.flush)
+		wr.flush(wr.buffer, wr.buffer_pos);
+	ret = 0;
+	large_free(p);
+exit_2:
+	if (!output)
+		large_free(wr.buffer);
+exit_1:
+	if (!buf)
+		free(inbuf);
+exit_0:
+	return ret;
+}
+
+#define decompress unlzma
diff --git a/lib/zlib_inflate/inflate.h b/lib/zlib_inflate/inflate.h
index df8a6c92052d..3d17b3d1b21f 100644
--- a/lib/zlib_inflate/inflate.h
+++ b/lib/zlib_inflate/inflate.h
@@ -1,3 +1,6 @@
+#ifndef INFLATE_H
+#define INFLATE_H
+
 /* inflate.h -- internal inflate state definition
  * Copyright (C) 1995-2004 Mark Adler
  * For conditions of distribution and use, see copyright notice in zlib.h
@@ -105,3 +108,4 @@ struct inflate_state {
     unsigned short work[288];   /* work area for code table building */
     code codes[ENOUGH];         /* space for code tables */
 };
+#endif
diff --git a/lib/zlib_inflate/inftrees.h b/lib/zlib_inflate/inftrees.h
index 5f5219b1240e..b70b4731ac7a 100644
--- a/lib/zlib_inflate/inftrees.h
+++ b/lib/zlib_inflate/inftrees.h
@@ -1,3 +1,6 @@
+#ifndef INFTREES_H
+#define INFTREES_H
+
 /* inftrees.h -- header to use inftrees.c
  * Copyright (C) 1995-2005 Mark Adler
  * For conditions of distribution and use, see copyright notice in zlib.h
@@ -53,3 +56,4 @@ typedef enum {
 extern int zlib_inflate_table (codetype type, unsigned short *lens,
                              unsigned codes, code **table,
                              unsigned *bits, unsigned short *work);
+#endif
diff --git a/mm/Makefile b/mm/Makefile
index 72255be57f89..818569b68f46 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -30,6 +30,10 @@ obj-$(CONFIG_FAILSLAB) += failslab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
+ifdef CONFIG_HAVE_DYNAMIC_PER_CPU_AREA
+obj-$(CONFIG_SMP) += percpu.o
+else
 obj-$(CONFIG_SMP) += allocpercpu.o
+endif
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o page_cgroup.o
diff --git a/mm/allocpercpu.c b/mm/allocpercpu.c
index 4297bc41bfd2..1882923bc706 100644
--- a/mm/allocpercpu.c
+++ b/mm/allocpercpu.c
@@ -99,45 +99,51 @@ static int __percpu_populate_mask(void *__pdata, size_t size, gfp_t gfp,
 	__percpu_populate_mask((__pdata), (size), (gfp), &(mask))
 
 /**
- * percpu_alloc_mask - initial setup of per-cpu data
+ * alloc_percpu - initial setup of per-cpu data
  * @size: size of per-cpu object
- * @gfp: may sleep or not etc.
- * @mask: populate per-data for cpu's selected through mask bits
+ * @align: alignment
  *
- * Populating per-cpu data for all online cpu's would be a typical use case,
- * which is simplified by the percpu_alloc() wrapper.
- * Per-cpu objects are populated with zeroed buffers.
+ * Allocate dynamic percpu area.  Percpu objects are populated with
+ * zeroed buffers.
  */
-void *__percpu_alloc_mask(size_t size, gfp_t gfp, cpumask_t *mask)
+void *__alloc_percpu(size_t size, size_t align)
 {
 	/*
 	 * We allocate whole cache lines to avoid false sharing
 	 */
 	size_t sz = roundup(nr_cpu_ids * sizeof(void *), cache_line_size());
-	void *pdata = kzalloc(sz, gfp);
+	void *pdata = kzalloc(sz, GFP_KERNEL);
 	void *__pdata = __percpu_disguise(pdata);
 
+	/*
+	 * Can't easily make larger alignment work with kmalloc.  WARN
+	 * on it.  Larger alignment should only be used for module
+	 * percpu sections on SMP for which this path isn't used.
+	 */
+	WARN_ON_ONCE(align > SMP_CACHE_BYTES);
+
 	if (unlikely(!pdata))
 		return NULL;
-	if (likely(!__percpu_populate_mask(__pdata, size, gfp, mask)))
+	if (likely(!__percpu_populate_mask(__pdata, size, GFP_KERNEL,
+					   &cpu_possible_map)))
 		return __pdata;
 	kfree(pdata);
 	return NULL;
 }
-EXPORT_SYMBOL_GPL(__percpu_alloc_mask);
+EXPORT_SYMBOL_GPL(__alloc_percpu);
 
 /**
- * percpu_free - final cleanup of per-cpu data
+ * free_percpu - final cleanup of per-cpu data
  * @__pdata: object to clean up
  *
  * We simply clean up any per-cpu object left. No need for the client to
  * track and specify through a bis mask which per-cpu objects are to free.
  */
-void percpu_free(void *__pdata)
+void free_percpu(void *__pdata)
 {
 	if (unlikely(!__pdata))
 		return;
 	__percpu_depopulate_mask(__pdata, &cpu_possible_map);
 	kfree(__percpu_disguise(__pdata));
 }
-EXPORT_SYMBOL_GPL(percpu_free);
+EXPORT_SYMBOL_GPL(free_percpu);
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 51a0ccf61e0e..daf92713f7de 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -382,7 +382,6 @@ int __init reserve_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
 	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
 }
 
-#ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
 /**
  * reserve_bootmem - mark a page range as usable
  * @addr: starting address of the range
@@ -403,7 +402,6 @@ int __init reserve_bootmem(unsigned long addr, unsigned long size,
 
 	return mark_bootmem(start, end, 1, flags);
 }
-#endif /* !CONFIG_HAVE_ARCH_BOOTMEM_NODE */
 
 static unsigned long align_idx(struct bootmem_data *bdata, unsigned long idx,
 			unsigned long step)
@@ -429,8 +427,8 @@ static unsigned long align_off(struct bootmem_data *bdata, unsigned long off,
 }
 
 static void * __init alloc_bootmem_core(struct bootmem_data *bdata,
-				unsigned long size, unsigned long align,
-				unsigned long goal, unsigned long limit)
+					unsigned long size, unsigned long align,
+					unsigned long goal, unsigned long limit)
 {
 	unsigned long fallback = 0;
 	unsigned long min, max, start, sidx, midx, step;
@@ -530,17 +528,34 @@ find_block:
 	return NULL;
 }
 
+static void * __init alloc_arch_preferred_bootmem(bootmem_data_t *bdata,
+					unsigned long size, unsigned long align,
+					unsigned long goal, unsigned long limit)
+{
+#ifdef CONFIG_HAVE_ARCH_BOOTMEM
+	bootmem_data_t *p_bdata;
+
+	p_bdata = bootmem_arch_preferred_node(bdata, size, align, goal, limit);
+	if (p_bdata)
+		return alloc_bootmem_core(p_bdata, size, align, goal, limit);
+#endif
+	return NULL;
+}
+
 static void * __init ___alloc_bootmem_nopanic(unsigned long size,
 					unsigned long align,
 					unsigned long goal,
 					unsigned long limit)
 {
 	bootmem_data_t *bdata;
+	void *region;
 
 restart:
-	list_for_each_entry(bdata, &bdata_list, list) {
-		void *region;
+	region = alloc_arch_preferred_bootmem(NULL, size, align, goal, limit);
+	if (region)
+		return region;
 
+	list_for_each_entry(bdata, &bdata_list, list) {
 		if (goal && bdata->node_low_pfn <= PFN_DOWN(goal))
 			continue;
 		if (limit && bdata->node_min_pfn >= PFN_DOWN(limit))
@@ -618,6 +633,10 @@ static void * __init ___alloc_bootmem_node(bootmem_data_t *bdata,
 {
 	void *ptr;
 
+	ptr = alloc_arch_preferred_bootmem(bdata, size, align, goal, limit);
+	if (ptr)
+		return ptr;
+
 	ptr = alloc_bootmem_core(bdata, size, align, goal, limit);
 	if (ptr)
 		return ptr;
@@ -674,6 +693,10 @@ void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat, unsigned long size,
 {
 	void *ptr;
 
+	ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0);
+	if (ptr)
+		return ptr;
+
 	ptr = alloc_bootmem_core(pgdat->bdata, size, align, goal, 0);
 	if (ptr)
 		return ptr;
diff --git a/mm/filemap.c b/mm/filemap.c
index 23acefe51808..126d3973b3d1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1823,7 +1823,7 @@ static size_t __iovec_copy_from_user_inatomic(char *vaddr,
 		int copy = min(bytes, iov->iov_len - base);
 
 		base = 0;
-		left = __copy_from_user_inatomic_nocache(vaddr, buf, copy);
+		left = __copy_from_user_inatomic(vaddr, buf, copy);
 		copied += copy;
 		bytes -= copy;
 		vaddr += copy;
@@ -1851,8 +1851,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
 	if (likely(i->nr_segs == 1)) {
 		int left;
 		char __user *buf = i->iov->iov_base + i->iov_offset;
-		left = __copy_from_user_inatomic_nocache(kaddr + offset,
-							buf, bytes);
+		left = __copy_from_user_inatomic(kaddr + offset, buf, bytes);
 		copied = bytes - left;
 	} else {
 		copied = __iovec_copy_from_user_inatomic(kaddr + offset,
@@ -1880,7 +1879,7 @@ size_t iov_iter_copy_from_user(struct page *page,
 	if (likely(i->nr_segs == 1)) {
 		int left;
 		char __user *buf = i->iov->iov_base + i->iov_offset;
-		left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
+		left = __copy_from_user(kaddr + offset, buf, bytes);
 		copied = bytes - left;
 	} else {
 		copied = __iovec_copy_from_user_inatomic(kaddr + offset,
diff --git a/mm/percpu.c b/mm/percpu.c
new file mode 100644
index 000000000000..1aa5d8fbca12
--- /dev/null
+++ b/mm/percpu.c
@@ -0,0 +1,1326 @@
+/*
+ * linux/mm/percpu.c - percpu memory allocator
+ *
+ * Copyright (C) 2009		SUSE Linux Products GmbH
+ * Copyright (C) 2009		Tejun Heo <tj@kernel.org>
+ *
+ * This file is released under the GPLv2.
+ *
+ * This is percpu allocator which can handle both static and dynamic
+ * areas.  Percpu areas are allocated in chunks in vmalloc area.  Each
+ * chunk is consisted of num_possible_cpus() units and the first chunk
+ * is used for static percpu variables in the kernel image (special
+ * boot time alloc/init handling necessary as these areas need to be
+ * brought up before allocation services are running).  Unit grows as
+ * necessary and all units grow or shrink in unison.  When a chunk is
+ * filled up, another chunk is allocated.  ie. in vmalloc area
+ *
+ *  c0                           c1                         c2
+ *  -------------------          -------------------        ------------
+ * | u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
+ *  -------------------  ......  -------------------  ....  ------------
+ *
+ * Allocation is done in offset-size areas of single unit space.  Ie,
+ * an area of 512 bytes at 6k in c1 occupies 512 bytes at 6k of c1:u0,
+ * c1:u1, c1:u2 and c1:u3.  Percpu access can be done by configuring
+ * percpu base registers UNIT_SIZE apart.
+ *
+ * There are usually many small percpu allocations many of them as
+ * small as 4 bytes.  The allocator organizes chunks into lists
+ * according to free size and tries to allocate from the fullest one.
+ * Each chunk keeps the maximum contiguous area size hint which is
+ * guaranteed to be eqaul to or larger than the maximum contiguous
+ * area in the chunk.  This helps the allocator not to iterate the
+ * chunk maps unnecessarily.
+ *
+ * Allocation state in each chunk is kept using an array of integers
+ * on chunk->map.  A positive value in the map represents a free
+ * region and negative allocated.  Allocation inside a chunk is done
+ * by scanning this map sequentially and serving the first matching
+ * entry.  This is mostly copied from the percpu_modalloc() allocator.
+ * Chunks are also linked into a rb tree to ease address to chunk
+ * mapping during free.
+ *
+ * To use this allocator, arch code should do the followings.
+ *
+ * - define CONFIG_HAVE_DYNAMIC_PER_CPU_AREA
+ *
+ * - define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr() to translate
+ *   regular address to percpu pointer and back if they need to be
+ *   different from the default
+ *
+ * - use pcpu_setup_first_chunk() during percpu area initialization to
+ *   setup the first chunk containing the kernel static percpu area
+ */
+
+#include <linux/bitmap.h>
+#include <linux/bootmem.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/percpu.h>
+#include <linux/pfn.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/vmalloc.h>
+#include <linux/workqueue.h>
+
+#include <asm/cacheflush.h>
+#include <asm/sections.h>
+#include <asm/tlbflush.h>
+
+#define PCPU_SLOT_BASE_SHIFT		5	/* 1-31 shares the same slot */
+#define PCPU_DFL_MAP_ALLOC		16	/* start a map with 16 ents */
+
+/* default addr <-> pcpu_ptr mapping, override in asm/percpu.h if necessary */
+#ifndef __addr_to_pcpu_ptr
+#define __addr_to_pcpu_ptr(addr)					\
+	(void *)((unsigned long)(addr) - (unsigned long)pcpu_base_addr	\
+		 + (unsigned long)__per_cpu_start)
+#endif
+#ifndef __pcpu_ptr_to_addr
+#define __pcpu_ptr_to_addr(ptr)						\
+	(void *)((unsigned long)(ptr) + (unsigned long)pcpu_base_addr	\
+		 - (unsigned long)__per_cpu_start)
+#endif
+
+struct pcpu_chunk {
+	struct list_head	list;		/* linked to pcpu_slot lists */
+	struct rb_node		rb_node;	/* key is chunk->vm->addr */
+	int			free_size;	/* free bytes in the chunk */
+	int			contig_hint;	/* max contiguous size hint */
+	struct vm_struct	*vm;		/* mapped vmalloc region */
+	int			map_used;	/* # of map entries used */
+	int			map_alloc;	/* # of map entries allocated */
+	int			*map;		/* allocation map */
+	bool			immutable;	/* no [de]population allowed */
+	struct page		**page;		/* points to page array */
+	struct page		*page_ar[];	/* #cpus * UNIT_PAGES */
+};
+
+static int pcpu_unit_pages __read_mostly;
+static int pcpu_unit_size __read_mostly;
+static int pcpu_chunk_size __read_mostly;
+static int pcpu_nr_slots __read_mostly;
+static size_t pcpu_chunk_struct_size __read_mostly;
+
+/* the address of the first chunk which starts with the kernel static area */
+void *pcpu_base_addr __read_mostly;
+EXPORT_SYMBOL_GPL(pcpu_base_addr);
+
+/* optional reserved chunk, only accessible for reserved allocations */
+static struct pcpu_chunk *pcpu_reserved_chunk;
+/* offset limit of the reserved chunk */
+static int pcpu_reserved_chunk_limit;
+
+/*
+ * Synchronization rules.
+ *
+ * There are two locks - pcpu_alloc_mutex and pcpu_lock.  The former
+ * protects allocation/reclaim paths, chunks and chunk->page arrays.
+ * The latter is a spinlock and protects the index data structures -
+ * chunk slots, rbtree, chunks and area maps in chunks.
+ *
+ * During allocation, pcpu_alloc_mutex is kept locked all the time and
+ * pcpu_lock is grabbed and released as necessary.  All actual memory
+ * allocations are done using GFP_KERNEL with pcpu_lock released.
+ *
+ * Free path accesses and alters only the index data structures, so it
+ * can be safely called from atomic context.  When memory needs to be
+ * returned to the system, free path schedules reclaim_work which
+ * grabs both pcpu_alloc_mutex and pcpu_lock, unlinks chunks to be
+ * reclaimed, release both locks and frees the chunks.  Note that it's
+ * necessary to grab both locks to remove a chunk from circulation as
+ * allocation path might be referencing the chunk with only
+ * pcpu_alloc_mutex locked.
+ */
+static DEFINE_MUTEX(pcpu_alloc_mutex);	/* protects whole alloc and reclaim */
+static DEFINE_SPINLOCK(pcpu_lock);	/* protects index data structures */
+
+static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
+static struct rb_root pcpu_addr_root = RB_ROOT;	/* chunks by address */
+
+/* reclaim work to release fully free chunks, scheduled from free path */
+static void pcpu_reclaim(struct work_struct *work);
+static DECLARE_WORK(pcpu_reclaim_work, pcpu_reclaim);
+
+static int __pcpu_size_to_slot(int size)
+{
+	int highbit = fls(size);	/* size is in bytes */
+	return max(highbit - PCPU_SLOT_BASE_SHIFT + 2, 1);
+}
+
+static int pcpu_size_to_slot(int size)
+{
+	if (size == pcpu_unit_size)
+		return pcpu_nr_slots - 1;
+	return __pcpu_size_to_slot(size);
+}
+
+static int pcpu_chunk_slot(const struct pcpu_chunk *chunk)
+{
+	if (chunk->free_size < sizeof(int) || chunk->contig_hint < sizeof(int))
+		return 0;
+
+	return pcpu_size_to_slot(chunk->free_size);
+}
+
+static int pcpu_page_idx(unsigned int cpu, int page_idx)
+{
+	return cpu * pcpu_unit_pages + page_idx;
+}
+
+static struct page **pcpu_chunk_pagep(struct pcpu_chunk *chunk,
+				      unsigned int cpu, int page_idx)
+{
+	return &chunk->page[pcpu_page_idx(cpu, page_idx)];
+}
+
+static unsigned long pcpu_chunk_addr(struct pcpu_chunk *chunk,
+				     unsigned int cpu, int page_idx)
+{
+	return (unsigned long)chunk->vm->addr +
+		(pcpu_page_idx(cpu, page_idx) << PAGE_SHIFT);
+}
+
+static bool pcpu_chunk_page_occupied(struct pcpu_chunk *chunk,
+				     int page_idx)
+{
+	return *pcpu_chunk_pagep(chunk, 0, page_idx) != NULL;
+}
+
+/**
+ * pcpu_mem_alloc - allocate memory
+ * @size: bytes to allocate
+ *
+ * Allocate @size bytes.  If @size is smaller than PAGE_SIZE,
+ * kzalloc() is used; otherwise, vmalloc() is used.  The returned
+ * memory is always zeroed.
+ *
+ * CONTEXT:
+ * Does GFP_KERNEL allocation.
+ *
+ * RETURNS:
+ * Pointer to the allocated area on success, NULL on failure.
+ */
+static void *pcpu_mem_alloc(size_t size)
+{
+	if (size <= PAGE_SIZE)
+		return kzalloc(size, GFP_KERNEL);
+	else {
+		void *ptr = vmalloc(size);
+		if (ptr)
+			memset(ptr, 0, size);
+		return ptr;
+	}
+}
+
+/**
+ * pcpu_mem_free - free memory
+ * @ptr: memory to free
+ * @size: size of the area
+ *
+ * Free @ptr.  @ptr should have been allocated using pcpu_mem_alloc().
+ */
+static void pcpu_mem_free(void *ptr, size_t size)
+{
+	if (size <= PAGE_SIZE)
+		kfree(ptr);
+	else
+		vfree(ptr);
+}
+
+/**
+ * pcpu_chunk_relocate - put chunk in the appropriate chunk slot
+ * @chunk: chunk of interest
+ * @oslot: the previous slot it was on
+ *
+ * This function is called after an allocation or free changed @chunk.
+ * New slot according to the changed state is determined and @chunk is
+ * moved to the slot.  Note that the reserved chunk is never put on
+ * chunk slots.
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ */
+static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot)
+{
+	int nslot = pcpu_chunk_slot(chunk);
+
+	if (chunk != pcpu_reserved_chunk && oslot != nslot) {
+		if (oslot < nslot)
+			list_move(&chunk->list, &pcpu_slot[nslot]);
+		else
+			list_move_tail(&chunk->list, &pcpu_slot[nslot]);
+	}
+}
+
+static struct rb_node **pcpu_chunk_rb_search(void *addr,
+					     struct rb_node **parentp)
+{
+	struct rb_node **p = &pcpu_addr_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct pcpu_chunk *chunk;
+
+	while (*p) {
+		parent = *p;
+		chunk = rb_entry(parent, struct pcpu_chunk, rb_node);
+
+		if (addr < chunk->vm->addr)
+			p = &(*p)->rb_left;
+		else if (addr > chunk->vm->addr)
+			p = &(*p)->rb_right;
+		else
+			break;
+	}
+
+	if (parentp)
+		*parentp = parent;
+	return p;
+}
+
+/**
+ * pcpu_chunk_addr_search - search for chunk containing specified address
+ * @addr: address to search for
+ *
+ * Look for chunk which might contain @addr.  More specifically, it
+ * searchs for the chunk with the highest start address which isn't
+ * beyond @addr.
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ *
+ * RETURNS:
+ * The address of the found chunk.
+ */
+static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr)
+{
+	struct rb_node *n, *parent;
+	struct pcpu_chunk *chunk;
+
+	/* is it in the reserved chunk? */
+	if (pcpu_reserved_chunk) {
+		void *start = pcpu_reserved_chunk->vm->addr;
+
+		if (addr >= start && addr < start + pcpu_reserved_chunk_limit)
+			return pcpu_reserved_chunk;
+	}
+
+	/* nah... search the regular ones */
+	n = *pcpu_chunk_rb_search(addr, &parent);
+	if (!n) {
+		/* no exactly matching chunk, the parent is the closest */
+		n = parent;
+		BUG_ON(!n);
+	}
+	chunk = rb_entry(n, struct pcpu_chunk, rb_node);
+
+	if (addr < chunk->vm->addr) {
+		/* the parent was the next one, look for the previous one */
+		n = rb_prev(n);
+		BUG_ON(!n);
+		chunk = rb_entry(n, struct pcpu_chunk, rb_node);
+	}
+
+	return chunk;
+}
+
+/**
+ * pcpu_chunk_addr_insert - insert chunk into address rb tree
+ * @new: chunk to insert
+ *
+ * Insert @new into address rb tree.
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ */
+static void pcpu_chunk_addr_insert(struct pcpu_chunk *new)
+{
+	struct rb_node **p, *parent;
+
+	p = pcpu_chunk_rb_search(new->vm->addr, &parent);
+	BUG_ON(*p);
+	rb_link_node(&new->rb_node, parent, p);
+	rb_insert_color(&new->rb_node, &pcpu_addr_root);
+}
+
+/**
+ * pcpu_extend_area_map - extend area map for allocation
+ * @chunk: target chunk
+ *
+ * Extend area map of @chunk so that it can accomodate an allocation.
+ * A single allocation can split an area into three areas, so this
+ * function makes sure that @chunk->map has at least two extra slots.
+ *
+ * CONTEXT:
+ * pcpu_alloc_mutex, pcpu_lock.  pcpu_lock is released and reacquired
+ * if area map is extended.
+ *
+ * RETURNS:
+ * 0 if noop, 1 if successfully extended, -errno on failure.
+ */
+static int pcpu_extend_area_map(struct pcpu_chunk *chunk)
+{
+	int new_alloc;
+	int *new;
+	size_t size;
+
+	/* has enough? */
+	if (chunk->map_alloc >= chunk->map_used + 2)
+		return 0;
+
+	spin_unlock_irq(&pcpu_lock);
+
+	new_alloc = PCPU_DFL_MAP_ALLOC;
+	while (new_alloc < chunk->map_used + 2)
+		new_alloc *= 2;
+
+	new = pcpu_mem_alloc(new_alloc * sizeof(new[0]));
+	if (!new) {
+		spin_lock_irq(&pcpu_lock);
+		return -ENOMEM;
+	}
+
+	/*
+	 * Acquire pcpu_lock and switch to new area map.  Only free
+	 * could have happened inbetween, so map_used couldn't have
+	 * grown.
+	 */
+	spin_lock_irq(&pcpu_lock);
+	BUG_ON(new_alloc < chunk->map_used + 2);
+
+	size = chunk->map_alloc * sizeof(chunk->map[0]);
+	memcpy(new, chunk->map, size);
+
+	/*
+	 * map_alloc < PCPU_DFL_MAP_ALLOC indicates that the chunk is
+	 * one of the first chunks and still using static map.
+	 */
+	if (chunk->map_alloc >= PCPU_DFL_MAP_ALLOC)
+		pcpu_mem_free(chunk->map, size);
+
+	chunk->map_alloc = new_alloc;
+	chunk->map = new;
+	return 0;
+}
+
+/**
+ * pcpu_split_block - split a map block
+ * @chunk: chunk of interest
+ * @i: index of map block to split
+ * @head: head size in bytes (can be 0)
+ * @tail: tail size in bytes (can be 0)
+ *
+ * Split the @i'th map block into two or three blocks.  If @head is
+ * non-zero, @head bytes block is inserted before block @i moving it
+ * to @i+1 and reducing its size by @head bytes.
+ *
+ * If @tail is non-zero, the target block, which can be @i or @i+1
+ * depending on @head, is reduced by @tail bytes and @tail byte block
+ * is inserted after the target block.
+ *
+ * @chunk->map must have enough free slots to accomodate the split.
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ */
+static void pcpu_split_block(struct pcpu_chunk *chunk, int i,
+			     int head, int tail)
+{
+	int nr_extra = !!head + !!tail;
+
+	BUG_ON(chunk->map_alloc < chunk->map_used + nr_extra);
+
+	/* insert new subblocks */
+	memmove(&chunk->map[i + nr_extra], &chunk->map[i],
+		sizeof(chunk->map[0]) * (chunk->map_used - i));
+	chunk->map_used += nr_extra;
+
+	if (head) {
+		chunk->map[i + 1] = chunk->map[i] - head;
+		chunk->map[i++] = head;
+	}
+	if (tail) {
+		chunk->map[i++] -= tail;
+		chunk->map[i] = tail;
+	}
+}
+
+/**
+ * pcpu_alloc_area - allocate area from a pcpu_chunk
+ * @chunk: chunk of interest
+ * @size: wanted size in bytes
+ * @align: wanted align
+ *
+ * Try to allocate @size bytes area aligned at @align from @chunk.
+ * Note that this function only allocates the offset.  It doesn't
+ * populate or map the area.
+ *
+ * @chunk->map must have at least two free slots.
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ *
+ * RETURNS:
+ * Allocated offset in @chunk on success, -1 if no matching area is
+ * found.
+ */
+static int pcpu_alloc_area(struct pcpu_chunk *chunk, int size, int align)
+{
+	int oslot = pcpu_chunk_slot(chunk);
+	int max_contig = 0;
+	int i, off;
+
+	for (i = 0, off = 0; i < chunk->map_used; off += abs(chunk->map[i++])) {
+		bool is_last = i + 1 == chunk->map_used;
+		int head, tail;
+
+		/* extra for alignment requirement */
+		head = ALIGN(off, align) - off;
+		BUG_ON(i == 0 && head != 0);
+
+		if (chunk->map[i] < 0)
+			continue;
+		if (chunk->map[i] < head + size) {
+			max_contig = max(chunk->map[i], max_contig);
+			continue;
+		}
+
+		/*
+		 * If head is small or the previous block is free,
+		 * merge'em.  Note that 'small' is defined as smaller
+		 * than sizeof(int), which is very small but isn't too
+		 * uncommon for percpu allocations.
+		 */
+		if (head && (head < sizeof(int) || chunk->map[i - 1] > 0)) {
+			if (chunk->map[i - 1] > 0)
+				chunk->map[i - 1] += head;
+			else {
+				chunk->map[i - 1] -= head;
+				chunk->free_size -= head;
+			}
+			chunk->map[i] -= head;
+			off += head;
+			head = 0;
+		}
+
+		/* if tail is small, just keep it around */
+		tail = chunk->map[i] - head - size;
+		if (tail < sizeof(int))
+			tail = 0;
+
+		/* split if warranted */
+		if (head || tail) {
+			pcpu_split_block(chunk, i, head, tail);
+			if (head) {
+				i++;
+				off += head;
+				max_contig = max(chunk->map[i - 1], max_contig);
+			}
+			if (tail)
+				max_contig = max(chunk->map[i + 1], max_contig);
+		}
+
+		/* update hint and mark allocated */
+		if (is_last)
+			chunk->contig_hint = max_contig; /* fully scanned */
+		else
+			chunk->contig_hint = max(chunk->contig_hint,
+						 max_contig);
+
+		chunk->free_size -= chunk->map[i];
+		chunk->map[i] = -chunk->map[i];
+
+		pcpu_chunk_relocate(chunk, oslot);
+		return off;
+	}
+
+	chunk->contig_hint = max_contig;	/* fully scanned */
+	pcpu_chunk_relocate(chunk, oslot);
+
+	/* tell the upper layer that this chunk has no matching area */
+	return -1;
+}
+
+/**
+ * pcpu_free_area - free area to a pcpu_chunk
+ * @chunk: chunk of interest
+ * @freeme: offset of area to free
+ *
+ * Free area starting from @freeme to @chunk.  Note that this function
+ * only modifies the allocation map.  It doesn't depopulate or unmap
+ * the area.
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ */
+static void pcpu_free_area(struct pcpu_chunk *chunk, int freeme)
+{
+	int oslot = pcpu_chunk_slot(chunk);
+	int i, off;
+
+	for (i = 0, off = 0; i < chunk->map_used; off += abs(chunk->map[i++]))
+		if (off == freeme)
+			break;
+	BUG_ON(off != freeme);
+	BUG_ON(chunk->map[i] > 0);
+
+	chunk->map[i] = -chunk->map[i];
+	chunk->free_size += chunk->map[i];
+
+	/* merge with previous? */
+	if (i > 0 && chunk->map[i - 1] >= 0) {
+		chunk->map[i - 1] += chunk->map[i];
+		chunk->map_used--;
+		memmove(&chunk->map[i], &chunk->map[i + 1],
+			(chunk->map_used - i) * sizeof(chunk->map[0]));
+		i--;
+	}
+	/* merge with next? */
+	if (i + 1 < chunk->map_used && chunk->map[i + 1] >= 0) {
+		chunk->map[i] += chunk->map[i + 1];
+		chunk->map_used--;
+		memmove(&chunk->map[i + 1], &chunk->map[i + 2],
+			(chunk->map_used - (i + 1)) * sizeof(chunk->map[0]));
+	}
+
+	chunk->contig_hint = max(chunk->map[i], chunk->contig_hint);
+	pcpu_chunk_relocate(chunk, oslot);
+}
+
+/**
+ * pcpu_unmap - unmap pages out of a pcpu_chunk
+ * @chunk: chunk of interest
+ * @page_start: page index of the first page to unmap
+ * @page_end: page index of the last page to unmap + 1
+ * @flush: whether to flush cache and tlb or not
+ *
+ * For each cpu, unmap pages [@page_start,@page_end) out of @chunk.
+ * If @flush is true, vcache is flushed before unmapping and tlb
+ * after.
+ */
+static void pcpu_unmap(struct pcpu_chunk *chunk, int page_start, int page_end,
+		       bool flush)
+{
+	unsigned int last = num_possible_cpus() - 1;
+	unsigned int cpu;
+
+	/* unmap must not be done on immutable chunk */
+	WARN_ON(chunk->immutable);
+
+	/*
+	 * Each flushing trial can be very expensive, issue flush on
+	 * the whole region at once rather than doing it for each cpu.
+	 * This could be an overkill but is more scalable.
+	 */
+	if (flush)
+		flush_cache_vunmap(pcpu_chunk_addr(chunk, 0, page_start),
+				   pcpu_chunk_addr(chunk, last, page_end));
+
+	for_each_possible_cpu(cpu)
+		unmap_kernel_range_noflush(
+				pcpu_chunk_addr(chunk, cpu, page_start),
+				(page_end - page_start) << PAGE_SHIFT);
+
+	/* ditto as flush_cache_vunmap() */
+	if (flush)
+		flush_tlb_kernel_range(pcpu_chunk_addr(chunk, 0, page_start),
+				       pcpu_chunk_addr(chunk, last, page_end));
+}
+
+/**
+ * pcpu_depopulate_chunk - depopulate and unmap an area of a pcpu_chunk
+ * @chunk: chunk to depopulate
+ * @off: offset to the area to depopulate
+ * @size: size of the area to depopulate in bytes
+ * @flush: whether to flush cache and tlb or not
+ *
+ * For each cpu, depopulate and unmap pages [@page_start,@page_end)
+ * from @chunk.  If @flush is true, vcache is flushed before unmapping
+ * and tlb after.
+ *
+ * CONTEXT:
+ * pcpu_alloc_mutex.
+ */
+static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk, int off, int size,
+				  bool flush)
+{
+	int page_start = PFN_DOWN(off);
+	int page_end = PFN_UP(off + size);
+	int unmap_start = -1;
+	int uninitialized_var(unmap_end);
+	unsigned int cpu;
+	int i;
+
+	for (i = page_start; i < page_end; i++) {
+		for_each_possible_cpu(cpu) {
+			struct page **pagep = pcpu_chunk_pagep(chunk, cpu, i);
+
+			if (!*pagep)
+				continue;
+
+			__free_page(*pagep);
+
+			/*
+			 * If it's partial depopulation, it might get
+			 * populated or depopulated again.  Mark the
+			 * page gone.
+			 */
+			*pagep = NULL;
+
+			unmap_start = unmap_start < 0 ? i : unmap_start;
+			unmap_end = i + 1;
+		}
+	}
+
+	if (unmap_start >= 0)
+		pcpu_unmap(chunk, unmap_start, unmap_end, flush);
+}
+
+/**
+ * pcpu_map - map pages into a pcpu_chunk
+ * @chunk: chunk of interest
+ * @page_start: page index of the first page to map
+ * @page_end: page index of the last page to map + 1
+ *
+ * For each cpu, map pages [@page_start,@page_end) into @chunk.
+ * vcache is flushed afterwards.
+ */
+static int pcpu_map(struct pcpu_chunk *chunk, int page_start, int page_end)
+{
+	unsigned int last = num_possible_cpus() - 1;
+	unsigned int cpu;
+	int err;
+
+	/* map must not be done on immutable chunk */
+	WARN_ON(chunk->immutable);
+
+	for_each_possible_cpu(cpu) {
+		err = map_kernel_range_noflush(
+				pcpu_chunk_addr(chunk, cpu, page_start),
+				(page_end - page_start) << PAGE_SHIFT,
+				PAGE_KERNEL,
+				pcpu_chunk_pagep(chunk, cpu, page_start));
+		if (err < 0)
+			return err;
+	}
+
+	/* flush at once, please read comments in pcpu_unmap() */
+	flush_cache_vmap(pcpu_chunk_addr(chunk, 0, page_start),
+			 pcpu_chunk_addr(chunk, last, page_end));
+	return 0;
+}
+
+/**
+ * pcpu_populate_chunk - populate and map an area of a pcpu_chunk
+ * @chunk: chunk of interest
+ * @off: offset to the area to populate
+ * @size: size of the area to populate in bytes
+ *
+ * For each cpu, populate and map pages [@page_start,@page_end) into
+ * @chunk.  The area is cleared on return.
+ *
+ * CONTEXT:
+ * pcpu_alloc_mutex, does GFP_KERNEL allocation.
+ */
+static int pcpu_populate_chunk(struct pcpu_chunk *chunk, int off, int size)
+{
+	const gfp_t alloc_mask = GFP_KERNEL | __GFP_HIGHMEM | __GFP_COLD;
+	int page_start = PFN_DOWN(off);
+	int page_end = PFN_UP(off + size);
+	int map_start = -1;
+	int uninitialized_var(map_end);
+	unsigned int cpu;
+	int i;
+
+	for (i = page_start; i < page_end; i++) {
+		if (pcpu_chunk_page_occupied(chunk, i)) {
+			if (map_start >= 0) {
+				if (pcpu_map(chunk, map_start, map_end))
+					goto err;
+				map_start = -1;
+			}
+			continue;
+		}
+
+		map_start = map_start < 0 ? i : map_start;
+		map_end = i + 1;
+
+		for_each_possible_cpu(cpu) {
+			struct page **pagep = pcpu_chunk_pagep(chunk, cpu, i);
+
+			*pagep = alloc_pages_node(cpu_to_node(cpu),
+						  alloc_mask, 0);
+			if (!*pagep)
+				goto err;
+		}
+	}
+
+	if (map_start >= 0 && pcpu_map(chunk, map_start, map_end))
+		goto err;
+
+	for_each_possible_cpu(cpu)
+		memset(chunk->vm->addr + cpu * pcpu_unit_size + off, 0,
+		       size);
+
+	return 0;
+err:
+	/* likely under heavy memory pressure, give memory back */
+	pcpu_depopulate_chunk(chunk, off, size, true);
+	return -ENOMEM;
+}
+
+static void free_pcpu_chunk(struct pcpu_chunk *chunk)
+{
+	if (!chunk)
+		return;
+	if (chunk->vm)
+		free_vm_area(chunk->vm);
+	pcpu_mem_free(chunk->map, chunk->map_alloc * sizeof(chunk->map[0]));
+	kfree(chunk);
+}
+
+static struct pcpu_chunk *alloc_pcpu_chunk(void)
+{
+	struct pcpu_chunk *chunk;
+
+	chunk = kzalloc(pcpu_chunk_struct_size, GFP_KERNEL);
+	if (!chunk)
+		return NULL;
+
+	chunk->map = pcpu_mem_alloc(PCPU_DFL_MAP_ALLOC * sizeof(chunk->map[0]));
+	chunk->map_alloc = PCPU_DFL_MAP_ALLOC;
+	chunk->map[chunk->map_used++] = pcpu_unit_size;
+	chunk->page = chunk->page_ar;
+
+	chunk->vm = get_vm_area(pcpu_chunk_size, GFP_KERNEL);
+	if (!chunk->vm) {
+		free_pcpu_chunk(chunk);
+		return NULL;
+	}
+
+	INIT_LIST_HEAD(&chunk->list);
+	chunk->free_size = pcpu_unit_size;
+	chunk->contig_hint = pcpu_unit_size;
+
+	return chunk;
+}
+
+/**
+ * pcpu_alloc - the percpu allocator
+ * @size: size of area to allocate in bytes
+ * @align: alignment of area (max PAGE_SIZE)
+ * @reserved: allocate from the reserved chunk if available
+ *
+ * Allocate percpu area of @size bytes aligned at @align.
+ *
+ * CONTEXT:
+ * Does GFP_KERNEL allocation.
+ *
+ * RETURNS:
+ * Percpu pointer to the allocated area on success, NULL on failure.
+ */
+static void *pcpu_alloc(size_t size, size_t align, bool reserved)
+{
+	struct pcpu_chunk *chunk;
+	int slot, off;
+
+	if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) {
+		WARN(true, "illegal size (%zu) or align (%zu) for "
+		     "percpu allocation\n", size, align);
+		return NULL;
+	}
+
+	mutex_lock(&pcpu_alloc_mutex);
+	spin_lock_irq(&pcpu_lock);
+
+	/* serve reserved allocations from the reserved chunk if available */
+	if (reserved && pcpu_reserved_chunk) {
+		chunk = pcpu_reserved_chunk;
+		if (size > chunk->contig_hint ||
+		    pcpu_extend_area_map(chunk) < 0)
+			goto fail_unlock;
+		off = pcpu_alloc_area(chunk, size, align);
+		if (off >= 0)
+			goto area_found;
+		goto fail_unlock;
+	}
+
+restart:
+	/* search through normal chunks */
+	for (slot = pcpu_size_to_slot(size); slot < pcpu_nr_slots; slot++) {
+		list_for_each_entry(chunk, &pcpu_slot[slot], list) {
+			if (size > chunk->contig_hint)
+				continue;
+
+			switch (pcpu_extend_area_map(chunk)) {
+			case 0:
+				break;
+			case 1:
+				goto restart;	/* pcpu_lock dropped, restart */
+			default:
+				goto fail_unlock;
+			}
+
+			off = pcpu_alloc_area(chunk, size, align);
+			if (off >= 0)
+				goto area_found;
+		}
+	}
+
+	/* hmmm... no space left, create a new chunk */
+	spin_unlock_irq(&pcpu_lock);
+
+	chunk = alloc_pcpu_chunk();
+	if (!chunk)
+		goto fail_unlock_mutex;
+
+	spin_lock_irq(&pcpu_lock);
+	pcpu_chunk_relocate(chunk, -1);
+	pcpu_chunk_addr_insert(chunk);
+	goto restart;
+
+area_found:
+	spin_unlock_irq(&pcpu_lock);
+
+	/* populate, map and clear the area */
+	if (pcpu_populate_chunk(chunk, off, size)) {
+		spin_lock_irq(&pcpu_lock);
+		pcpu_free_area(chunk, off);
+		goto fail_unlock;
+	}
+
+	mutex_unlock(&pcpu_alloc_mutex);
+
+	return __addr_to_pcpu_ptr(chunk->vm->addr + off);
+
+fail_unlock:
+	spin_unlock_irq(&pcpu_lock);
+fail_unlock_mutex:
+	mutex_unlock(&pcpu_alloc_mutex);
+	return NULL;
+}
+
+/**
+ * __alloc_percpu - allocate dynamic percpu area
+ * @size: size of area to allocate in bytes
+ * @align: alignment of area (max PAGE_SIZE)
+ *
+ * Allocate percpu area of @size bytes aligned at @align.  Might
+ * sleep.  Might trigger writeouts.
+ *
+ * CONTEXT:
+ * Does GFP_KERNEL allocation.
+ *
+ * RETURNS:
+ * Percpu pointer to the allocated area on success, NULL on failure.
+ */
+void *__alloc_percpu(size_t size, size_t align)
+{
+	return pcpu_alloc(size, align, false);
+}
+EXPORT_SYMBOL_GPL(__alloc_percpu);
+
+/**
+ * __alloc_reserved_percpu - allocate reserved percpu area
+ * @size: size of area to allocate in bytes
+ * @align: alignment of area (max PAGE_SIZE)
+ *
+ * Allocate percpu area of @size bytes aligned at @align from reserved
+ * percpu area if arch has set it up; otherwise, allocation is served
+ * from the same dynamic area.  Might sleep.  Might trigger writeouts.
+ *
+ * CONTEXT:
+ * Does GFP_KERNEL allocation.
+ *
+ * RETURNS:
+ * Percpu pointer to the allocated area on success, NULL on failure.
+ */
+void *__alloc_reserved_percpu(size_t size, size_t align)
+{
+	return pcpu_alloc(size, align, true);
+}
+
+/**
+ * pcpu_reclaim - reclaim fully free chunks, workqueue function
+ * @work: unused
+ *
+ * Reclaim all fully free chunks except for the first one.
+ *
+ * CONTEXT:
+ * workqueue context.
+ */
+static void pcpu_reclaim(struct work_struct *work)
+{
+	LIST_HEAD(todo);
+	struct list_head *head = &pcpu_slot[pcpu_nr_slots - 1];
+	struct pcpu_chunk *chunk, *next;
+
+	mutex_lock(&pcpu_alloc_mutex);
+	spin_lock_irq(&pcpu_lock);
+
+	list_for_each_entry_safe(chunk, next, head, list) {
+		WARN_ON(chunk->immutable);
+
+		/* spare the first one */
+		if (chunk == list_first_entry(head, struct pcpu_chunk, list))
+			continue;
+
+		rb_erase(&chunk->rb_node, &pcpu_addr_root);
+		list_move(&chunk->list, &todo);
+	}
+
+	spin_unlock_irq(&pcpu_lock);
+	mutex_unlock(&pcpu_alloc_mutex);
+
+	list_for_each_entry_safe(chunk, next, &todo, list) {
+		pcpu_depopulate_chunk(chunk, 0, pcpu_unit_size, false);
+		free_pcpu_chunk(chunk);
+	}
+}
+
+/**
+ * free_percpu - free percpu area
+ * @ptr: pointer to area to free
+ *
+ * Free percpu area @ptr.
+ *
+ * CONTEXT:
+ * Can be called from atomic context.
+ */
+void free_percpu(void *ptr)
+{
+	void *addr = __pcpu_ptr_to_addr(ptr);
+	struct pcpu_chunk *chunk;
+	unsigned long flags;
+	int off;
+
+	if (!ptr)
+		return;
+
+	spin_lock_irqsave(&pcpu_lock, flags);
+
+	chunk = pcpu_chunk_addr_search(addr);
+	off = addr - chunk->vm->addr;
+
+	pcpu_free_area(chunk, off);
+
+	/* if there are more than one fully free chunks, wake up grim reaper */
+	if (chunk->free_size == pcpu_unit_size) {
+		struct pcpu_chunk *pos;
+
+		list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list)
+			if (pos != chunk) {
+				schedule_work(&pcpu_reclaim_work);
+				break;
+			}
+	}
+
+	spin_unlock_irqrestore(&pcpu_lock, flags);
+}
+EXPORT_SYMBOL_GPL(free_percpu);
+
+/**
+ * pcpu_setup_first_chunk - initialize the first percpu chunk
+ * @get_page_fn: callback to fetch page pointer
+ * @static_size: the size of static percpu area in bytes
+ * @reserved_size: the size of reserved percpu area in bytes
+ * @dyn_size: free size for dynamic allocation in bytes, -1 for auto
+ * @unit_size: unit size in bytes, must be multiple of PAGE_SIZE, -1 for auto
+ * @base_addr: mapped address, NULL for auto
+ * @populate_pte_fn: callback to allocate pagetable, NULL if unnecessary
+ *
+ * Initialize the first percpu chunk which contains the kernel static
+ * perpcu area.  This function is to be called from arch percpu area
+ * setup path.  The first two parameters are mandatory.  The rest are
+ * optional.
+ *
+ * @get_page_fn() should return pointer to percpu page given cpu
+ * number and page number.  It should at least return enough pages to
+ * cover the static area.  The returned pages for static area should
+ * have been initialized with valid data.  If @unit_size is specified,
+ * it can also return pages after the static area.  NULL return
+ * indicates end of pages for the cpu.  Note that @get_page_fn() must
+ * return the same number of pages for all cpus.
+ *
+ * @reserved_size, if non-zero, specifies the amount of bytes to
+ * reserve after the static area in the first chunk.  This reserves
+ * the first chunk such that it's available only through reserved
+ * percpu allocation.  This is primarily used to serve module percpu
+ * static areas on architectures where the addressing model has
+ * limited offset range for symbol relocations to guarantee module
+ * percpu symbols fall inside the relocatable range.
+ *
+ * @dyn_size, if non-negative, determines the number of bytes
+ * available for dynamic allocation in the first chunk.  Specifying
+ * non-negative value makes percpu leave alone the area beyond
+ * @static_size + @reserved_size + @dyn_size.
+ *
+ * @unit_size, if non-negative, specifies unit size and must be
+ * aligned to PAGE_SIZE and equal to or larger than @static_size +
+ * @reserved_size + if non-negative, @dyn_size.
+ *
+ * Non-null @base_addr means that the caller already allocated virtual
+ * region for the first chunk and mapped it.  percpu must not mess
+ * with the chunk.  Note that @base_addr with 0 @unit_size or non-NULL
+ * @populate_pte_fn doesn't make any sense.
+ *
+ * @populate_pte_fn is used to populate the pagetable.  NULL means the
+ * caller already populated the pagetable.
+ *
+ * If the first chunk ends up with both reserved and dynamic areas, it
+ * is served by two chunks - one to serve the core static and reserved
+ * areas and the other for the dynamic area.  They share the same vm
+ * and page map but uses different area allocation map to stay away
+ * from each other.  The latter chunk is circulated in the chunk slots
+ * and available for dynamic allocation like any other chunks.
+ *
+ * RETURNS:
+ * The determined pcpu_unit_size which can be used to initialize
+ * percpu access.
+ */
+size_t __init pcpu_setup_first_chunk(pcpu_get_page_fn_t get_page_fn,
+				     size_t static_size, size_t reserved_size,
+				     ssize_t dyn_size, ssize_t unit_size,
+				     void *base_addr,
+				     pcpu_populate_pte_fn_t populate_pte_fn)
+{
+	static struct vm_struct first_vm;
+	static int smap[2], dmap[2];
+	size_t size_sum = static_size + reserved_size +
+			  (dyn_size >= 0 ? dyn_size : 0);
+	struct pcpu_chunk *schunk, *dchunk = NULL;
+	unsigned int cpu;
+	int nr_pages;
+	int err, i;
+
+	/* santiy checks */
+	BUILD_BUG_ON(ARRAY_SIZE(smap) >= PCPU_DFL_MAP_ALLOC ||
+		     ARRAY_SIZE(dmap) >= PCPU_DFL_MAP_ALLOC);
+	BUG_ON(!static_size);
+	if (unit_size >= 0) {
+		BUG_ON(unit_size < size_sum);
+		BUG_ON(unit_size & ~PAGE_MASK);
+		BUG_ON(unit_size < PCPU_MIN_UNIT_SIZE);
+	} else
+		BUG_ON(base_addr);
+	BUG_ON(base_addr && populate_pte_fn);
+
+	if (unit_size >= 0)
+		pcpu_unit_pages = unit_size >> PAGE_SHIFT;
+	else
+		pcpu_unit_pages = max_t(int, PCPU_MIN_UNIT_SIZE >> PAGE_SHIFT,
+					PFN_UP(size_sum));
+
+	pcpu_unit_size = pcpu_unit_pages << PAGE_SHIFT;
+	pcpu_chunk_size = num_possible_cpus() * pcpu_unit_size;
+	pcpu_chunk_struct_size = sizeof(struct pcpu_chunk)
+		+ num_possible_cpus() * pcpu_unit_pages * sizeof(struct page *);
+
+	if (dyn_size < 0)
+		dyn_size = pcpu_unit_size - static_size - reserved_size;
+
+	/*
+	 * Allocate chunk slots.  The additional last slot is for
+	 * empty chunks.
+	 */
+	pcpu_nr_slots = __pcpu_size_to_slot(pcpu_unit_size) + 2;
+	pcpu_slot = alloc_bootmem(pcpu_nr_slots * sizeof(pcpu_slot[0]));
+	for (i = 0; i < pcpu_nr_slots; i++)
+		INIT_LIST_HEAD(&pcpu_slot[i]);
+
+	/*
+	 * Initialize static chunk.  If reserved_size is zero, the
+	 * static chunk covers static area + dynamic allocation area
+	 * in the first chunk.  If reserved_size is not zero, it
+	 * covers static area + reserved area (mostly used for module
+	 * static percpu allocation).
+	 */
+	schunk = alloc_bootmem(pcpu_chunk_struct_size);
+	INIT_LIST_HEAD(&schunk->list);
+	schunk->vm = &first_vm;
+	schunk->map = smap;
+	schunk->map_alloc = ARRAY_SIZE(smap);
+	schunk->page = schunk->page_ar;
+
+	if (reserved_size) {
+		schunk->free_size = reserved_size;
+		pcpu_reserved_chunk = schunk;	/* not for dynamic alloc */
+	} else {
+		schunk->free_size = dyn_size;
+		dyn_size = 0;			/* dynamic area covered */
+	}
+	schunk->contig_hint = schunk->free_size;
+
+	schunk->map[schunk->map_used++] = -static_size;
+	if (schunk->free_size)
+		schunk->map[schunk->map_used++] = schunk->free_size;
+
+	pcpu_reserved_chunk_limit = static_size + schunk->free_size;
+
+	/* init dynamic chunk if necessary */
+	if (dyn_size) {
+		dchunk = alloc_bootmem(sizeof(struct pcpu_chunk));
+		INIT_LIST_HEAD(&dchunk->list);
+		dchunk->vm = &first_vm;
+		dchunk->map = dmap;
+		dchunk->map_alloc = ARRAY_SIZE(dmap);
+		dchunk->page = schunk->page_ar;	/* share page map with schunk */
+
+		dchunk->contig_hint = dchunk->free_size = dyn_size;
+		dchunk->map[dchunk->map_used++] = -pcpu_reserved_chunk_limit;
+		dchunk->map[dchunk->map_used++] = dchunk->free_size;
+	}
+
+	/* allocate vm address */
+	first_vm.flags = VM_ALLOC;
+	first_vm.size = pcpu_chunk_size;
+
+	if (!base_addr)
+		vm_area_register_early(&first_vm, PAGE_SIZE);
+	else {
+		/*
+		 * Pages already mapped.  No need to remap into
+		 * vmalloc area.  In this case the first chunks can't
+		 * be mapped or unmapped by percpu and are marked
+		 * immutable.
+		 */
+		first_vm.addr = base_addr;
+		schunk->immutable = true;
+		if (dchunk)
+			dchunk->immutable = true;
+	}
+
+	/* assign pages */
+	nr_pages = -1;
+	for_each_possible_cpu(cpu) {
+		for (i = 0; i < pcpu_unit_pages; i++) {
+			struct page *page = get_page_fn(cpu, i);
+
+			if (!page)
+				break;
+			*pcpu_chunk_pagep(schunk, cpu, i) = page;
+		}
+
+		BUG_ON(i < PFN_UP(static_size));
+
+		if (nr_pages < 0)
+			nr_pages = i;
+		else
+			BUG_ON(nr_pages != i);
+	}
+
+	/* map them */
+	if (populate_pte_fn) {
+		for_each_possible_cpu(cpu)
+			for (i = 0; i < nr_pages; i++)
+				populate_pte_fn(pcpu_chunk_addr(schunk,
+								cpu, i));
+
+		err = pcpu_map(schunk, 0, nr_pages);
+		if (err)
+			panic("failed to setup static percpu area, err=%d\n",
+			      err);
+	}
+
+	/* link the first chunk in */
+	if (!dchunk) {
+		pcpu_chunk_relocate(schunk, -1);
+		pcpu_chunk_addr_insert(schunk);
+	} else {
+		pcpu_chunk_relocate(dchunk, -1);
+		pcpu_chunk_addr_insert(dchunk);
+	}
+
+	/* we're done */
+	pcpu_base_addr = (void *)pcpu_chunk_addr(schunk, 0, 0);
+	return pcpu_unit_size;
+}
+
+/*
+ * Embedding first chunk setup helper.
+ */
+static void *pcpue_ptr __initdata;
+static size_t pcpue_size __initdata;
+static size_t pcpue_unit_size __initdata;
+
+static struct page * __init pcpue_get_page(unsigned int cpu, int pageno)
+{
+	size_t off = (size_t)pageno << PAGE_SHIFT;
+
+	if (off >= pcpue_size)
+		return NULL;
+
+	return virt_to_page(pcpue_ptr + cpu * pcpue_unit_size + off);
+}
+
+/**
+ * pcpu_embed_first_chunk - embed the first percpu chunk into bootmem
+ * @static_size: the size of static percpu area in bytes
+ * @reserved_size: the size of reserved percpu area in bytes
+ * @dyn_size: free size for dynamic allocation in bytes, -1 for auto
+ * @unit_size: unit size in bytes, must be multiple of PAGE_SIZE, -1 for auto
+ *
+ * This is a helper to ease setting up embedded first percpu chunk and
+ * can be called where pcpu_setup_first_chunk() is expected.
+ *
+ * If this function is used to setup the first chunk, it is allocated
+ * as a contiguous area using bootmem allocator and used as-is without
+ * being mapped into vmalloc area.  This enables the first chunk to
+ * piggy back on the linear physical mapping which often uses larger
+ * page size.
+ *
+ * When @dyn_size is positive, dynamic area might be larger than
+ * specified to fill page alignment.  Also, when @dyn_size is auto,
+ * @dyn_size does not fill the whole first chunk but only what's
+ * necessary for page alignment after static and reserved areas.
+ *
+ * If the needed size is smaller than the minimum or specified unit
+ * size, the leftover is returned to the bootmem allocator.
+ *
+ * RETURNS:
+ * The determined pcpu_unit_size which can be used to initialize
+ * percpu access on success, -errno on failure.
+ */
+ssize_t __init pcpu_embed_first_chunk(size_t static_size, size_t reserved_size,
+				      ssize_t dyn_size, ssize_t unit_size)
+{
+	unsigned int cpu;
+
+	/* determine parameters and allocate */
+	pcpue_size = PFN_ALIGN(static_size + reserved_size +
+			       (dyn_size >= 0 ? dyn_size : 0));
+	if (dyn_size != 0)
+		dyn_size = pcpue_size - static_size - reserved_size;
+
+	if (unit_size >= 0) {
+		BUG_ON(unit_size < pcpue_size);
+		pcpue_unit_size = unit_size;
+	} else
+		pcpue_unit_size = max_t(size_t, pcpue_size, PCPU_MIN_UNIT_SIZE);
+
+	pcpue_ptr = __alloc_bootmem_nopanic(
+					num_possible_cpus() * pcpue_unit_size,
+					PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
+	if (!pcpue_ptr)
+		return -ENOMEM;
+
+	/* return the leftover and copy */
+	for_each_possible_cpu(cpu) {
+		void *ptr = pcpue_ptr + cpu * pcpue_unit_size;
+
+		free_bootmem(__pa(ptr + pcpue_size),
+			     pcpue_unit_size - pcpue_size);
+		memcpy(ptr, __per_cpu_load, static_size);
+	}
+
+	/* we're ready, commit */
+	pr_info("PERCPU: Embedded %zu pages at %p, static data %zu bytes\n",
+		pcpue_size >> PAGE_SHIFT, pcpue_ptr, static_size);
+
+	return pcpu_setup_first_chunk(pcpue_get_page, static_size,
+				      reserved_size, dyn_size,
+				      pcpue_unit_size, pcpue_ptr, NULL);
+}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 520a75980269..af58324c361a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -24,6 +24,7 @@
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
 #include <linux/bootmem.h>
+#include <linux/pfn.h>
 
 #include <asm/atomic.h>
 #include <asm/uaccess.h>
@@ -152,8 +153,8 @@ static int vmap_pud_range(pgd_t *pgd, unsigned long addr,
  *
  * Ie. pte at addr+N*PAGE_SIZE shall point to pfn corresponding to pages[N]
  */
-static int vmap_page_range(unsigned long start, unsigned long end,
-				pgprot_t prot, struct page **pages)
+static int vmap_page_range_noflush(unsigned long start, unsigned long end,
+				   pgprot_t prot, struct page **pages)
 {
 	pgd_t *pgd;
 	unsigned long next;
@@ -169,13 +170,22 @@ static int vmap_page_range(unsigned long start, unsigned long end,
 		if (err)
 			break;
 	} while (pgd++, addr = next, addr != end);
-	flush_cache_vmap(start, end);
 
 	if (unlikely(err))
 		return err;
 	return nr;
 }
 
+static int vmap_page_range(unsigned long start, unsigned long end,
+			   pgprot_t prot, struct page **pages)
+{
+	int ret;
+
+	ret = vmap_page_range_noflush(start, end, prot, pages);
+	flush_cache_vmap(start, end);
+	return ret;
+}
+
 static inline int is_vmalloc_or_module_addr(const void *x)
 {
 	/*
@@ -990,6 +1000,32 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t pro
 }
 EXPORT_SYMBOL(vm_map_ram);
 
+/**
+ * vm_area_register_early - register vmap area early during boot
+ * @vm: vm_struct to register
+ * @align: requested alignment
+ *
+ * This function is used to register kernel vm area before
+ * vmalloc_init() is called.  @vm->size and @vm->flags should contain
+ * proper values on entry and other fields should be zero.  On return,
+ * vm->addr contains the allocated address.
+ *
+ * DO NOT USE THIS FUNCTION UNLESS YOU KNOW WHAT YOU'RE DOING.
+ */
+void __init vm_area_register_early(struct vm_struct *vm, size_t align)
+{
+	static size_t vm_init_off __initdata;
+	unsigned long addr;
+
+	addr = ALIGN(VMALLOC_START + vm_init_off, align);
+	vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
+
+	vm->addr = (void *)addr;
+
+	vm->next = vmlist;
+	vmlist = vm;
+}
+
 void __init vmalloc_init(void)
 {
 	struct vmap_area *va;
@@ -1017,6 +1053,58 @@ void __init vmalloc_init(void)
 	vmap_initialized = true;
 }
 
+/**
+ * map_kernel_range_noflush - map kernel VM area with the specified pages
+ * @addr: start of the VM area to map
+ * @size: size of the VM area to map
+ * @prot: page protection flags to use
+ * @pages: pages to map
+ *
+ * Map PFN_UP(@size) pages at @addr.  The VM area @addr and @size
+ * specify should have been allocated using get_vm_area() and its
+ * friends.
+ *
+ * NOTE:
+ * This function does NOT do any cache flushing.  The caller is
+ * responsible for calling flush_cache_vmap() on to-be-mapped areas
+ * before calling this function.
+ *
+ * RETURNS:
+ * The number of pages mapped on success, -errno on failure.
+ */
+int map_kernel_range_noflush(unsigned long addr, unsigned long size,
+			     pgprot_t prot, struct page **pages)
+{
+	return vmap_page_range_noflush(addr, addr + size, prot, pages);
+}
+
+/**
+ * unmap_kernel_range_noflush - unmap kernel VM area
+ * @addr: start of the VM area to unmap
+ * @size: size of the VM area to unmap
+ *
+ * Unmap PFN_UP(@size) pages at @addr.  The VM area @addr and @size
+ * specify should have been allocated using get_vm_area() and its
+ * friends.
+ *
+ * NOTE:
+ * This function does NOT do any cache flushing.  The caller is
+ * responsible for calling flush_cache_vunmap() on to-be-mapped areas
+ * before calling this function and flush_tlb_kernel_range() after.
+ */
+void unmap_kernel_range_noflush(unsigned long addr, unsigned long size)
+{
+	vunmap_page_range(addr, addr + size);
+}
+
+/**
+ * unmap_kernel_range - unmap kernel VM area and flush cache and TLB
+ * @addr: start of the VM area to unmap
+ * @size: size of the VM area to unmap
+ *
+ * Similar to unmap_kernel_range_noflush() but flushes vcache before
+ * the unmapping and tlb after.
+ */
 void unmap_kernel_range(unsigned long addr, unsigned long size)
 {
 	unsigned long end = addr + size;
@@ -1267,6 +1355,7 @@ EXPORT_SYMBOL(vfree);
 void vunmap(const void *addr)
 {
 	BUG_ON(in_interrupt());
+	might_sleep();
 	__vunmap(addr, 0);
 }
 EXPORT_SYMBOL(vunmap);
@@ -1286,6 +1375,8 @@ void *vmap(struct page **pages, unsigned int count,
 {
 	struct vm_struct *area;
 
+	might_sleep();
+
 	if (count > num_physpages)
 		return NULL;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 63ec4bf89b29..52fea5b28ca6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1457,7 +1457,9 @@ static bool can_checksum_protocol(unsigned long features, __be16 protocol)
 		((features & NETIF_F_IP_CSUM) &&
 		 protocol == htons(ETH_P_IP)) ||
 		((features & NETIF_F_IPV6_CSUM) &&
-		 protocol == htons(ETH_P_IPV6)));
+		 protocol == htons(ETH_P_IPV6)) ||
+		((features & NETIF_F_FCOE_CRC) &&
+		 protocol == htons(ETH_P_FCOE)));
 }
 
 static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d5aaabbb7cb3..7f03373b8c07 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1375,10 +1375,10 @@ EXPORT_SYMBOL_GPL(snmp_fold_field);
 int snmp_mib_init(void *ptr[2], size_t mibsize)
 {
 	BUG_ON(ptr == NULL);
-	ptr[0] = __alloc_percpu(mibsize);
+	ptr[0] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
 	if (!ptr[0])
 		goto err0;
-	ptr[1] = __alloc_percpu(mibsize);
+	ptr[1] = __alloc_percpu(mibsize, __alignof__(unsigned long long));
 	if (!ptr[1])
 		goto err1;
 	return 0;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5caee609be06..c40debe51b38 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3377,7 +3377,7 @@ int __init ip_rt_init(void)
 	int rc = 0;
 
 #ifdef CONFIG_NET_CLS_ROUTE
-	ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct));
+	ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct), __alignof__(struct ip_rt_acct));
 	if (!ip_rt_acct)
 		panic("IP: failed to allocate ip_rt_acct\n");
 #endif
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index c18fa150b6fe..979619574f70 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -186,3 +186,17 @@ quiet_cmd_gzip = GZIP    $@
 cmd_gzip = gzip -f -9 < $< > $@
 
 
+# Bzip2
+# ---------------------------------------------------------------------------
+
+# Bzip2 does not include size in file... so we have to fake that
+size_append=$(CONFIG_SHELL) $(srctree)/scripts/bin_size
+
+quiet_cmd_bzip2 = BZIP2    $@
+cmd_bzip2 = (bzip2 -9 < $< && $(size_append) $<) > $@ || (rm -f $@ ; false)
+
+# Lzma
+# ---------------------------------------------------------------------------
+
+quiet_cmd_lzma = LZMA    $@
+cmd_lzma = (lzma -9 -c $< && $(size_append) $<) >$@ || (rm -f $@ ; false)
diff --git a/scripts/bin_size b/scripts/bin_size
new file mode 100644
index 000000000000..43e1b360cee6
--- /dev/null
+++ b/scripts/bin_size
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+if [ $# = 0 ] ; then
+   echo Usage: $0 file
+fi
+
+size_dec=`stat -c "%s" $1`
+size_hex_echo_string=`printf "%08x" $size_dec |
+     sed 's/\(..\)\(..\)\(..\)\(..\)/\\\\x\4\\\\x\3\\\\x\2\\\\x\1/g'`
+/bin/echo -ne $size_hex_echo_string
diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
new file mode 100644
index 000000000000..29493dc4528d
--- /dev/null
+++ b/scripts/gcc-x86_32-has-stack-protector.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -xc -c -O0 -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+if [ "$?" -eq "0" ] ; then
+	echo y
+else
+	echo n
+fi
diff --git a/scripts/gcc-x86_64-has-stack-protector.sh b/scripts/gcc-x86_64-has-stack-protector.sh
index 325c0a1b03b6..afaec618b395 100644
--- a/scripts/gcc-x86_64-has-stack-protector.sh
+++ b/scripts/gcc-x86_64-has-stack-protector.sh
@@ -1,6 +1,8 @@
 #!/bin/sh
 
-echo "int foo(void) { char X[200]; return 3; }" | $1 -S -xc -c -O0 -mcmodel=kernel -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -xc -c -O0 -mcmodel=kernel -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
 if [ "$?" -eq "0" ] ; then
-	echo $2
+	echo y
+else
+	echo n
 fi
diff --git a/scripts/gen_initramfs_list.sh b/scripts/gen_initramfs_list.sh
index 5f3415f28736..3eea8f15131b 100644
--- a/scripts/gen_initramfs_list.sh
+++ b/scripts/gen_initramfs_list.sh
@@ -5,7 +5,7 @@
 # Released under the terms of the GNU GPL
 #
 # Generate a cpio packed initramfs. It uses gen_init_cpio to generate
-# the cpio archive, and gzip to pack it.
+# the cpio archive, and then compresses it.
 # The script may also be used to generate the inputfile used for gen_init_cpio
 # This script assumes that gen_init_cpio is located in usr/ directory
 
@@ -16,8 +16,8 @@ usage() {
 cat << EOF
 Usage:
 $0 [-o <file>] [-u <uid>] [-g <gid>] {-d | <cpio_source>} ...
-	-o <file>      Create gzipped initramfs file named <file> using
-		       gen_init_cpio and gzip
+	-o <file>      Create compressed initramfs file named <file> using
+		       gen_init_cpio and compressor depending on the extension
 	-u <uid>       User ID to map to user ID 0 (root).
 		       <uid> is only meaningful if <cpio_source> is a
 		       directory.  "squash" forces all files to uid 0.
@@ -225,6 +225,7 @@ cpio_list=
 output="/dev/stdout"
 output_file=""
 is_cpio_compressed=
+compr="gzip -9 -f"
 
 arg="$1"
 case "$arg" in
@@ -233,11 +234,15 @@ case "$arg" in
 		echo "deps_initramfs := \\"
 		shift
 		;;
-	"-o")	# generate gzipped cpio image named $1
+	"-o")	# generate compressed cpio image named $1
 		shift
 		output_file="$1"
 		cpio_list="$(mktemp ${TMPDIR:-/tmp}/cpiolist.XXXXXX)"
 		output=${cpio_list}
+		echo "$output_file" | grep -q "\.gz$" && compr="gzip -9 -f"
+		echo "$output_file" | grep -q "\.bz2$" && compr="bzip2 -9 -f"
+		echo "$output_file" | grep -q "\.lzma$" && compr="lzma -9 -f"
+		echo "$output_file" | grep -q "\.cpio$" && compr="cat"
 		shift
 		;;
 esac
@@ -274,7 +279,7 @@ while [ $# -gt 0 ]; do
 	esac
 done
 
-# If output_file is set we will generate cpio archive and gzip it
+# If output_file is set we will generate cpio archive and compress it
 # we are carefull to delete tmp files
 if [ ! -z ${output_file} ]; then
 	if [ -z ${cpio_file} ]; then
@@ -287,7 +292,8 @@ if [ ! -z ${output_file} ]; then
 	if [ "${is_cpio_compressed}" = "compressed" ]; then
 		cat ${cpio_tfile} > ${output_file}
 	else
-		cat ${cpio_tfile} | gzip -f -9 - > ${output_file}
+		(cat ${cpio_tfile} | ${compr}  - > ${output_file}) \
+		|| (rm -f ${output_file} ; false)
 	fi
 	[ -z ${cpio_file} ] && rm ${cpio_tfile}
 fi
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 88921611b22e..7e62303133dc 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -415,8 +415,9 @@ static int parse_elf(struct elf_info *info, const char *filename)
 		const char *secstrings
 			= (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 		const char *secname;
+		int nobits = sechdrs[i].sh_type == SHT_NOBITS;
 
-		if (sechdrs[i].sh_offset > info->size) {
+		if (!nobits && sechdrs[i].sh_offset > info->size) {
 			fatal("%s is truncated. sechdrs[i].sh_offset=%lu > "
 			      "sizeof(*hrd)=%zu\n", filename,
 			      (unsigned long)sechdrs[i].sh_offset,
@@ -425,6 +426,8 @@ static int parse_elf(struct elf_info *info, const char *filename)
 		}
 		secname = secstrings + sechdrs[i].sh_name;
 		if (strcmp(secname, ".modinfo") == 0) {
+			if (nobits)
+				fatal("%s has NOBITS .modinfo\n", filename);
 			info->modinfo = (void *)hdr + sechdrs[i].sh_offset;
 			info->modinfo_len = sechdrs[i].sh_size;
 		} else if (strcmp(secname, "__ksymtab") == 0)
diff --git a/sound/drivers/Kconfig b/sound/drivers/Kconfig
index 0bcf14640fde..84714a65e5c8 100644
--- a/sound/drivers/Kconfig
+++ b/sound/drivers/Kconfig
@@ -33,7 +33,7 @@ if SND_DRIVERS
 
 config SND_PCSP
 	tristate "PC-Speaker support (READ HELP!)"
-	depends on PCSPKR_PLATFORM && X86_PC && HIGH_RES_TIMERS
+	depends on PCSPKR_PLATFORM && X86 && HIGH_RES_TIMERS
 	depends on INPUT
 	depends on EXPERIMENTAL
 	select SND_PCM
diff --git a/usr/Kconfig b/usr/Kconfig
index 86cecb59dd07..43a3a0fe8f29 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -44,3 +44,92 @@ config INITRAMFS_ROOT_GID
 	  owned by group root in the initial ramdisk image.
 
 	  If you are not sure, leave it set to "0".
+
+config RD_GZIP
+	bool "Initial ramdisk compressed using gzip"
+	default y
+	depends on BLK_DEV_INITRD=y
+	select DECOMPRESS_GZIP
+	help
+	  Support loading of a gzip encoded initial ramdisk or cpio buffer.
+	  If unsure, say Y.
+
+config RD_BZIP2
+	bool "Initial ramdisk compressed using bzip2"
+	default n
+	depends on BLK_DEV_INITRD=y
+	select DECOMPRESS_BZIP2
+	help
+	  Support loading of a bzip2 encoded initial ramdisk or cpio buffer
+	  If unsure, say N.
+
+config RD_LZMA
+	bool "Initial ramdisk compressed using lzma"
+	default n
+	depends on BLK_DEV_INITRD=y
+	select DECOMPRESS_LZMA
+	help
+	  Support loading of a lzma encoded initial ramdisk or cpio buffer
+	  If unsure, say N.
+
+choice
+	prompt "Built-in initramfs compression mode"
+	help
+	  This setting is only meaningful if the INITRAMFS_SOURCE is
+	  set. It decides by which algorithm the INITRAMFS_SOURCE will
+	  be compressed.
+	  Several compression algorithms are available, which differ
+	  in efficiency, compression and decompression speed.
+	  Compression speed is only relevant when building a kernel.
+	  Decompression speed is relevant at each boot.
+
+	  If you have any problems with bzip2 or lzma compressed
+	  initramfs, mail me (Alain Knaff) <alain@knaff.lu>.
+
+	  High compression options are mostly useful for users who
+	  are low on disk space (embedded systems), but for whom ram
+	  size matters less.
+
+	  If in doubt, select 'gzip'
+
+config INITRAMFS_COMPRESSION_NONE
+	bool "None"
+	help
+	  Do not compress the built-in initramfs at all. This may
+	  sound wasteful in space, but, you should be aware that the
+	  built-in initramfs will be compressed at a later stage
+	  anyways along with the rest of the kernel, on those
+	  architectures that support this.
+	  However, not compressing the initramfs may lead to slightly
+	  higher memory consumption during a short time at boot, while
+	  both the cpio image and the unpacked filesystem image will
+	  be present in memory simultaneously
+
+config INITRAMFS_COMPRESSION_GZIP
+	bool "Gzip"
+	depends on RD_GZIP
+	help
+	  The old and tried gzip compression. Its compression ratio is
+	  the poorest among the 3 choices; however its speed (both
+	  compression and decompression) is the fastest.
+
+config INITRAMFS_COMPRESSION_BZIP2
+	bool "Bzip2"
+	depends on RD_BZIP2
+	help
+	  Its compression ratio and speed is intermediate.
+	  Decompression speed is slowest among the three.  The initramfs
+	  size is about 10% smaller with bzip2, in comparison to gzip.
+	  Bzip2 uses a large amount of memory. For modern kernels you
+	  will need at least 8MB RAM or more for booting.
+
+config INITRAMFS_COMPRESSION_LZMA
+	bool "LZMA"
+	depends on RD_LZMA
+	help
+	  The most recent compression algorithm.
+	  Its ratio is best, decompression speed is between the other
+	  two. Compression is slowest.	The initramfs size is about 33%
+	  smaller with LZMA in comparison to gzip.
+
+endchoice
diff --git a/usr/Makefile b/usr/Makefile
index 201f27f8cbaf..b84894b3929d 100644
--- a/usr/Makefile
+++ b/usr/Makefile
@@ -6,13 +6,25 @@ klibcdirs:;
 PHONY += klibcdirs
 
 
+# No compression
+suffix_$(CONFIG_INITRAMFS_COMPRESSION_NONE)   =
+
+# Gzip, but no bzip2
+suffix_$(CONFIG_INITRAMFS_COMPRESSION_GZIP)   = .gz
+
+# Bzip2
+suffix_$(CONFIG_INITRAMFS_COMPRESSION_BZIP2)  = .bz2
+
+# Lzma
+suffix_$(CONFIG_INITRAMFS_COMPRESSION_LZMA)   = .lzma
+
 # Generate builtin.o based on initramfs_data.o
-obj-$(CONFIG_BLK_DEV_INITRD) := initramfs_data.o
+obj-$(CONFIG_BLK_DEV_INITRD) := initramfs_data$(suffix_y).o
 
-# initramfs_data.o contains the initramfs_data.cpio.gz image.
+# initramfs_data.o contains the compressed initramfs_data.cpio image.
 # The image is included using .incbin, a dependency which is not
 # tracked automatically.
-$(obj)/initramfs_data.o: $(obj)/initramfs_data.cpio.gz FORCE
+$(obj)/initramfs_data$(suffix_y).o: $(obj)/initramfs_data.cpio$(suffix_y) FORCE
 
 #####
 # Generate the initramfs cpio archive
@@ -25,28 +37,28 @@ ramfs-args  := \
         $(if $(CONFIG_INITRAMFS_ROOT_UID), -u $(CONFIG_INITRAMFS_ROOT_UID)) \
         $(if $(CONFIG_INITRAMFS_ROOT_GID), -g $(CONFIG_INITRAMFS_ROOT_GID))
 
-# .initramfs_data.cpio.gz.d is used to identify all files included
+# .initramfs_data.cpio.d is used to identify all files included
 # in initramfs and to detect if any files are added/removed.
 # Removed files are identified by directory timestamp being updated
 # The dependency list is generated by gen_initramfs.sh -l
-ifneq ($(wildcard $(obj)/.initramfs_data.cpio.gz.d),)
-	include $(obj)/.initramfs_data.cpio.gz.d
+ifneq ($(wildcard $(obj)/.initramfs_data.cpio.d),)
+	include $(obj)/.initramfs_data.cpio.d
 endif
 
 quiet_cmd_initfs = GEN     $@
       cmd_initfs = $(initramfs) -o $@ $(ramfs-args) $(ramfs-input)
 
-targets := initramfs_data.cpio.gz
+targets := initramfs_data.cpio.gz initramfs_data.cpio.bz2 initramfs_data.cpio.lzma initramfs_data.cpio
 # do not try to update files included in initramfs
 $(deps_initramfs): ;
 
 $(deps_initramfs): klibcdirs
-# We rebuild initramfs_data.cpio.gz if:
-# 1) Any included file is newer then initramfs_data.cpio.gz
+# We rebuild initramfs_data.cpio if:
+# 1) Any included file is newer then initramfs_data.cpio
 # 2) There are changes in which files are included (added or deleted)
-# 3) If gen_init_cpio are newer than initramfs_data.cpio.gz
+# 3) If gen_init_cpio are newer than initramfs_data.cpio
 # 4) arguments to gen_initramfs.sh changes
-$(obj)/initramfs_data.cpio.gz: $(obj)/gen_init_cpio $(deps_initramfs) klibcdirs
-	$(Q)$(initramfs) -l $(ramfs-input) > $(obj)/.initramfs_data.cpio.gz.d
+$(obj)/initramfs_data.cpio$(suffix_y): $(obj)/gen_init_cpio $(deps_initramfs) klibcdirs
+	$(Q)$(initramfs) -l $(ramfs-input) > $(obj)/.initramfs_data.cpio.d
 	$(call if_changed,initfs)
 
diff --git a/usr/initramfs_data.S b/usr/initramfs_data.S
index c2e1ad424f4a..7c6973d8d829 100644
--- a/usr/initramfs_data.S
+++ b/usr/initramfs_data.S
@@ -26,5 +26,5 @@ SECTIONS
 */
 
 .section .init.ramfs,"a"
-.incbin "usr/initramfs_data.cpio.gz"
+.incbin "usr/initramfs_data.cpio"
 
diff --git a/usr/initramfs_data.bz2.S b/usr/initramfs_data.bz2.S
new file mode 100644
index 000000000000..bc54d090365c
--- /dev/null
+++ b/usr/initramfs_data.bz2.S
@@ -0,0 +1,29 @@
+/*
+  initramfs_data includes the compressed binary that is the
+  filesystem used for early user space.
+  Note: Older versions of "as" (prior to binutils 2.11.90.0.23
+  released on 2001-07-14) dit not support .incbin.
+  If you are forced to use older binutils than that then the
+  following trick can be applied to create the resulting binary:
+
+
+  ld -m elf_i386  --format binary --oformat elf32-i386 -r \
+  -T initramfs_data.scr initramfs_data.cpio.gz -o initramfs_data.o
+   ld -m elf_i386  -r -o built-in.o initramfs_data.o
+
+  initramfs_data.scr looks like this:
+SECTIONS
+{
+       .init.ramfs : { *(.data) }
+}
+
+  The above example is for i386 - the parameters vary from architectures.
+  Eventually look up LDFLAGS_BLOB in an older version of the
+  arch/$(ARCH)/Makefile to see the flags used before .incbin was introduced.
+
+  Using .incbin has the advantage over ld that the correct flags are set
+  in the ELF header, as required by certain architectures.
+*/
+
+.section .init.ramfs,"a"
+.incbin "usr/initramfs_data.cpio.bz2"
diff --git a/usr/initramfs_data.gz.S b/usr/initramfs_data.gz.S
new file mode 100644
index 000000000000..890c8dd1d6bd
--- /dev/null
+++ b/usr/initramfs_data.gz.S
@@ -0,0 +1,29 @@
+/*
+  initramfs_data includes the compressed binary that is the
+  filesystem used for early user space.
+  Note: Older versions of "as" (prior to binutils 2.11.90.0.23
+  released on 2001-07-14) dit not support .incbin.
+  If you are forced to use older binutils than that then the
+  following trick can be applied to create the resulting binary:
+
+
+  ld -m elf_i386  --format binary --oformat elf32-i386 -r \
+  -T initramfs_data.scr initramfs_data.cpio.gz -o initramfs_data.o
+   ld -m elf_i386  -r -o built-in.o initramfs_data.o
+
+  initramfs_data.scr looks like this:
+SECTIONS
+{
+       .init.ramfs : { *(.data) }
+}
+
+  The above example is for i386 - the parameters vary from architectures.
+  Eventually look up LDFLAGS_BLOB in an older version of the
+  arch/$(ARCH)/Makefile to see the flags used before .incbin was introduced.
+
+  Using .incbin has the advantage over ld that the correct flags are set
+  in the ELF header, as required by certain architectures.
+*/
+
+.section .init.ramfs,"a"
+.incbin "usr/initramfs_data.cpio.gz"
diff --git a/usr/initramfs_data.lzma.S b/usr/initramfs_data.lzma.S
new file mode 100644
index 000000000000..e11469e48562
--- /dev/null
+++ b/usr/initramfs_data.lzma.S
@@ -0,0 +1,29 @@
+/*
+  initramfs_data includes the compressed binary that is the
+  filesystem used for early user space.
+  Note: Older versions of "as" (prior to binutils 2.11.90.0.23
+  released on 2001-07-14) dit not support .incbin.
+  If you are forced to use older binutils than that then the
+  following trick can be applied to create the resulting binary:
+
+
+  ld -m elf_i386  --format binary --oformat elf32-i386 -r \
+  -T initramfs_data.scr initramfs_data.cpio.gz -o initramfs_data.o
+   ld -m elf_i386  -r -o built-in.o initramfs_data.o
+
+  initramfs_data.scr looks like this:
+SECTIONS
+{
+       .init.ramfs : { *(.data) }
+}
+
+  The above example is for i386 - the parameters vary from architectures.
+  Eventually look up LDFLAGS_BLOB in an older version of the
+  arch/$(ARCH)/Makefile to see the flags used before .incbin was introduced.
+
+  Using .incbin has the advantage over ld that the correct flags are set
+  in the ELF header, as required by certain architectures.
+*/
+
+.section .init.ramfs,"a"
+.incbin "usr/initramfs_data.cpio.lzma"